
# Notebook 3 – Natural Language Classifier (NLC)
IBM Watson Natural Language Classifier uses machine learning algorithms to return the top matching predefined classes for short text input. 

*You* create and train a classifier that connects predefined classes to example texts, so that the service can apply those classes to new inputs.
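
As a rough illustration of what connecting classes to example texts looks like, the sketch below writes a tiny training file in the CSV layout used for NLC training data (example text in the first column, class label after it). The example texts, the class names, and the `build_training_csv` helper are made up for illustration; a real classifier needs many more examples per class.

```python
import csv
import io

# Hypothetical training examples: (text, class label) pairs.
EXAMPLES = [
    ("I'd like to change my address", "address_change"),
    ("I moved, please update my records", "address_change"),
    ("I want to return this shirt", "product_return"),
    ("How do I send back my order?", "product_return"),
]

def build_training_csv(examples):
    """Return CSV text with one 'text,class' row per example."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for text, label in examples:
        # csv.writer quotes any text that itself contains a comma
        writer.writerow([text, label])
    return buffer.getvalue()

training_csv = build_training_csv(EXAMPLES)
print(training_csv)
```

Using the `csv` module rather than string concatenation keeps texts that contain commas or quotes intact.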

https://www.ibm.com/watson/services/natural-language-classifier/ 
https://www.ibm.com/watson/developercloud/natural-language-classifier/api/v1 


## Install dependencies

In [5]:
# Imports: run this cell each time after restarting the kernel
#!pip install watson_developer_cloud
import watson_developer_cloud as watson
import json
from botocore.client import Config
import ibm_boto3


### Create Watson Natural Language Classifier service


### Add Credentials

Copy and paste the following snippet into the next cell, then fill in your own credentials:

```code
credentials_os = {
    'IBM_API_KEY_ID': '',
    'IAM_SERVICE_ID': '',
    'ENDPOINT': 'https://s3-api.us-geo.objectstorage.service.networklayer.com',
    'IBM_AUTH_ENDPOINT': 'https://iam.ng.bluemix.net/oidc/token',
    'BUCKET': '',
}

credentials_nlc = {
    "classifier_id": "",
    "username": "",
    "password": ""
}

```
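
Before running the remaining cells, it can help to confirm that no credential field was left blank. The check below is a minimal sketch; `missing_fields` is a hypothetical helper and the values in `example_nlc` are placeholders, not real credentials.

```python
def missing_fields(credentials, required):
    """Return the required keys that are absent or still empty."""
    return [key for key in required if not credentials.get(key)]

# Placeholder values for illustration only; substitute your own dictionary.
example_nlc = {"classifier_id": "", "username": "my-user", "password": "my-pass"}

missing = missing_fields(example_nlc, ["classifier_id", "username", "password"])
print(missing)  # ['classifier_id']
```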

In [6]:
# The code was removed by DSX for sharing.

In [7]:

# Create a Cloud Object Storage client from the credentials above
client = ibm_boto3.client(
    service_name='s3',
    ibm_api_key_id=credentials_os['IBM_API_KEY_ID'],
    ibm_auth_endpoint=credentials_os['IBM_AUTH_ENDPOINT'],
    config=Config(signature_version='oauth'),
    endpoint_url=credentials_os['ENDPOINT'])




### NLC

- `process_text()` goes through the text, fetches the sentences, concatenates them into one transcript, and splits it into chunks of `chunk_size` words.
- `classify()` calls the Natural Language Classifier endpoint and classifies each chunk of the transcript.
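
To make these two steps concrete, here is a small self-contained sketch of the concatenate-then-chunk logic, run on a made-up payload with the `results` / `alternatives` / `transcript` layout that `process_text()` expects. The sample JSON, the `to_chunks` name, and the chunk size of 5 are assumptions chosen to keep the output short; the notebook itself uses a chunk size of 30 and splits on single spaces, whereas this sketch uses `str.split()` so that trailing whitespace does not produce an empty word.

```python
import json

# Made-up payload shaped like the 'results' list that process_text() reads.
sample = json.dumps({
    "results": [
        {"alternatives": [{"transcript": "good morning can you give me some help "}]},
        {"alternatives": [{"transcript": "I'd like to change my address please "}]},
    ]
})

def to_chunks(text, words_per_chunk):
    """Join all sentence transcripts, then split into fixed-size word chunks."""
    transcript = "".join(
        result["alternatives"][0]["transcript"]
        for result in json.loads(text)["results"]
    )
    words = transcript.split()  # split() also drops the trailing space
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]

chunks = to_chunks(sample, words_per_chunk=5)
for chunk in chunks:
    print(chunk)
```

Each printed line is one chunk that would be sent to the classifier as a separate request.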

In [26]:
#NLC

from watson_developer_cloud import NaturalLanguageClassifierV1


natural_language_classifier = NaturalLanguageClassifierV1(
    username = credentials_nlc['username'],
    password = credentials_nlc['password'])

chunk_size = 30

def chunk_transcript(transcript, chunk_size):
    transcript = transcript.split(' ')
    return [ transcript[i:i+chunk_size] for i in range(0, len(transcript), chunk_size) ] # chunking data
    

def process_text(text):
    transcript=''
    for sentence in json.loads(text)['results']:
        transcript = transcript + sentence['alternatives'][0]['transcript'] # concatenate sentences
    transcript = chunk_transcript(transcript, chunk_size) # chunk the transcript
    return transcript

def classify(file_name):
    streaming_body = client.get_object(Bucket = credentials_os['BUCKET'], Key = file_name)['Body']
    transcript=streaming_body.read().decode("utf-8")
    analysis = {}
    for chunk in process_text(transcript):
        chunk = ' '.join(chunk)
        analysis[chunk] = natural_language_classifier.classify(credentials_nlc['classifier_id'], chunk)
    # Store the results under '<name>_nlc', derived from '<name>_text.json'
    client.put_object(Bucket = credentials_os['BUCKET'], Key = file_name.split('_')[0]+'_nlc', Body= json.dumps(analysis))
    return analysis


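def classify_transcript(file_name):
    # Only classify once the classifier has finished training
    status = natural_language_classifier.get_classifier(credentials_nlc['classifier_id'])
    if status['status'] != 'Available':
        raise RuntimeError('Classifier is not ready, status: ' + status['status'])
    return classify(file_name)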
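
The dictionary returned by `classify()` maps each chunk to the full classifier response. A minimal sketch of pulling out just the top class and its confidence per chunk follows; the `summarize` helper is hypothetical, and the sample response is an abbreviated, illustrative stand-in that uses the same `classes` and `top_class` keys as the real output.

```python
def summarize(analysis):
    """Map each chunk to (top_class, confidence of that class)."""
    summary = {}
    for chunk, response in analysis.items():
        top = response["top_class"]
        # Look up the confidence that belongs to the top class
        confidence = next(
            c["confidence"] for c in response["classes"] if c["class_name"] == top
        )
        summary[chunk] = (top, confidence)
    return summary

# Abbreviated, illustrative response; the values are rounded placeholders.
sample_analysis = {
    "good morning can you give me some help": {
        "classes": [
            {"class_name": "Hufflepuff", "confidence": 0.67},
            {"class_name": "Gryffindor", "confidence": 0.30},
        ],
        "top_class": "Hufflepuff",
    }
}

summary = summarize(sample_analysis)
for chunk, (label, confidence) in summary.items():
    print(f"{label} ({confidence:.2f}): {chunk}")
```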


In [27]:
text_files = ['sample1-addresschange-positive_text.json', 'sample2-address-negative_text.json', 'sample3-shirt-return-weather-chitchat_text.json', 'sample4-angryblender-sportschitchat-recovery_text.json', 'sample5-calibration-toneandcontext_text.json'] # we add audio files to COS pre-conference


classify_transcript(text_files[0])


{"good morning can you give me some help I'd like to change my address please my name is Ryan Smith I am from Sacramento California that's right my phone number": {'classes': [{'class_name': 'Hufflepuff',
    'confidence': 0.6729345769224545},
   {'class_name': 'Gryffindor', 'confidence': 0.2980201008381233},
   {'class_name': 'Ravenclaw', 'confidence': 0.019072750621833615},
   {'class_name': 'Slytherin', 'confidence': 0.00997257161758855}],
  'classifier_id': '340008x87-nlc-967',
  'text': "good morning can you give me some help I'd like to change my address please my name is Ryan Smith I am from Sacramento California that's right my phone number",
  'top_class': 'Hufflepuff',
  'url': 'https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/340008x87-nlc-967'},
 "is five five five one two one two yes that's me my old address is number one two three oak street my new address is five six seven pine street": {'classes': [{'class_name': 'Gryffindor',
    'confi