
# Notebook 3 – Natural Language Classifier (NLC)
IBM Watson Natural Language Classifier uses machine learning algorithms to return the top matching predefined classes for short text input. 

*You* create and train a classifier that connects predefined classes to example texts, so that the service can apply those classes to new inputs.

https://www.ibm.com/watson/services/natural-language-classifier/ 
https://www.ibm.com/watson/developercloud/natural-language-classifier/api/v1 


## Install dependencies

In [72]:
# Imports: run this cell each time you restart the kernel
# Uncomment the next line to install the Watson SDK if it is missing
#!pip install watson_developer_cloud
import watson_developer_cloud as watson
import json
from botocore.client import Config
import ibm_boto3


### Create Watson Natural Language Classifier service


### Add Credentials

Copy and paste the following snippet into the next cell, and add your own credentials:

```code
credentials_os = {
    'IBM_API_KEY_ID': '',
    'IAM_SERVICE_ID': '',
    'ENDPOINT': 'https://s3-api.us-geo.objectstorage.service.networklayer.com',
    'IBM_AUTH_ENDPOINT': 'https://iam.ng.bluemix.net/oidc/token',
    'BUCKET': '',
}

credentials_nlc = {
    "classifier_id": "",
    "username": "",
    "password": ""
}

```
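Before running the later cells, it can help to confirm that no credential field was left blank. The helper below is only a sketch: `missing_credentials` is a hypothetical name, not part of the Watson SDK, and `example_nlc` holds placeholder values for illustration.

```python
def missing_credentials(credentials):
    """Return the sorted names of entries whose value is empty or whitespace."""
    return sorted(
        key for key, value in credentials.items() if not str(value).strip()
    )


# Placeholder values for illustration only; substitute your own credentials.
example_nlc = {
    "classifier_id": "my-classifier-id",
    "username": "",
    "password": "",
}

# Lists the fields that still need to be filled in
print(missing_credentials(example_nlc))
```

In the notebook you could call the same helper on `credentials_os` and `credentials_nlc` after the credentials cell.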

## TRAIN the NLC by sending it a ground_truth.CSV file to process

TODO (Mamoon): add the Python logic here that uploads the CSV, prints the returned classifier ID, and then checks the training status until the classifier is ready.

Training can take about 10 minutes for a small ground-truth CSV and longer for more complex ones. For this tutorial, you can come back later for your own classifier, or use ours.

- Not ready message: "The classifier instance is in its training phase, not yet ready to accept classify requests"
- Ready message: "The classifier instance is now available and is ready to take classifier requests"
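The wait for training to finish can be sketched as a simple polling loop. This is a minimal illustration, not the notebook's own upload logic: `wait_until_available` is a hypothetical helper, and `get_status` is any callable that returns the classifier's current status string. In this notebook it could wrap `natural_language_classifier.get_classifier(credentials_nlc['classifier_id'])['status']`, which is the call used further down. The `'Training'` values in the demo are stand-ins.

```python
import time


def wait_until_available(get_status, poll_seconds=30, timeout_seconds=1800,
                         sleep=time.sleep):
    """Poll get_status() until it returns 'Available' or the timeout elapses.

    Returns True if the classifier became available, False on timeout.
    """
    waited = 0
    while True:
        if get_status() == 'Available':
            return True
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds


# Demo with a fake status source, so nothing is sent to the service
fake_statuses = iter(['Training', 'Training', 'Available'])
ready = wait_until_available(lambda: next(fake_statuses),
                             poll_seconds=0, sleep=lambda seconds: None)
print(ready)
```

Passing `sleep` as a parameter keeps the loop easy to try without real delays. The default 30-second interval and 30-minute timeout are arbitrary choices.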

### Classifier training? Waiting? No problem: for the lab we have pre-trained an NLC instance that is ready to query immediately with the provided credentials

In [73]:
# The code was removed by DSX for sharing.

In [74]:

# Cloud Object Storage client; the endpoint comes from the credentials cell
client = ibm_boto3.client(service_name='s3',
    ibm_api_key_id=credentials_os['IBM_API_KEY_ID'],
    ibm_auth_endpoint=credentials_os['IBM_AUTH_ENDPOINT'],
    config=Config(signature_version='oauth'),
    endpoint_url=credentials_os['ENDPOINT'])




### NLC

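- `process_text()` goes through the transcript JSON, concatenates the sentences into one transcript, and splits it into chunks of `chunk_size` words
- `classify()` calls the Natural Language Classifier endpoint to classify each chunk of the transcript
- `classify_transcript()` checks that the classifier is available before calling `classify()`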
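To see what the chunking step produces, here is a self-contained sketch that runs the same logic on a made-up transcript. The two functions mirror the ones in the next cell, except that `chunk_size` is passed as an argument here so the example stands alone. The sample JSON is invented for illustration and only imitates the `results` / `alternatives` / `transcript` layout the notebook reads.

```python
import json


def chunk_transcript(transcript, chunk_size):
    # Split on single spaces and group the words into lists of chunk_size
    words = transcript.split(' ')
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def process_text(text, chunk_size):
    # Concatenate the first alternative of every result, then chunk it
    transcript = ''
    for sentence in json.loads(text)['results']:
        transcript += sentence['alternatives'][0]['transcript']
    return chunk_transcript(transcript, chunk_size)


# Invented sample data shaped like the transcript JSON
sample = json.dumps({
    'results': [
        {'alternatives': [{'transcript': 'hello I would like '}]},
        {'alternatives': [{'transcript': 'to change my address '}]},
    ]
})

chunks = [' '.join(chunk) for chunk in process_text(sample, chunk_size=3)]
print(chunks)
# ['hello I would', 'like to change', 'my address ']
```

Because the text is split on single spaces, a transcript that ends with a space leaves a trailing space on the last chunk.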

In [77]:
#NLC

from watson_developer_cloud import NaturalLanguageClassifierV1

natural_language_classifier = NaturalLanguageClassifierV1(
    username = credentials_nlc['username'],
    password = credentials_nlc['password'])

# Number of words per chunk when splitting the aggregate transcript into smaller pieces
chunk_size = 25

def chunk_transcript(transcript, chunk_size):
    transcript = transcript.split(' ')
    return [ transcript[i:i+chunk_size] for i in range(0, len(transcript), chunk_size) ] # chunking data
    

def process_text(text):
    transcript=''
    for sentence in json.loads(text)['results']:
        transcript = transcript + sentence['alternatives'][0]['transcript'] # concatenate sentences
    transcript = chunk_transcript(transcript, chunk_size) # chunk the transcript
    return transcript

def classify(file_name):
    # Read the transcript JSON for this audio file from Cloud Object Storage
    base_name = file_name.split('.')[0]
    streaming_body = client.get_object(Bucket=credentials_os['BUCKET'], Key=base_name + '_text.json')['Body']
    transcript = streaming_body.read().decode("utf-8")
    # Classify each chunk; results are keyed by the chunk text
    analysis = {}
    for chunk in process_text(transcript):
        chunk = ' '.join(chunk)
        analysis[chunk] = natural_language_classifier.classify(credentials_nlc['classifier_id'], chunk)
    # Store the classification results back in the same bucket
    client.put_object(Bucket=credentials_os['BUCKET'], Key=base_name + '_nlc', Body=json.dumps(analysis))
    return analysis


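def classify_transcript(file_name):
    # Only classify once the classifier has finished training
    status = natural_language_classifier.get_classifier(credentials_nlc['classifier_id'])
    if status['status'] != 'Available':
        raise RuntimeError('Classifier is not ready yet: ' + status['status'])
    return classify(file_name)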


In [None]:
file_list = ['sample1-addresschange-positive.ogg',
             'sample2-address-negative.ogg',
             'sample3-shirt-return-weather-chitchat.ogg',
             'sample4-angryblender-sportschitchat-recovery.ogg',
             'sample5-calibration-toneandcontext.ogg',
             'jfk_1961_0525_speech_to_put_man_on_moon.ogg',
             'May 1 1969 Fred Rogers testifies before the Senate Subcommittee on Communications.ogg'
            ]

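# These audio files were added to COS before the conference. Remember to update this list if you add files in Notebook #1.
# Notebook #1 works with the .ogg audio; this notebook reads the matching <name>_text.json transcript.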

classify_transcript(file_list[0])

{'bye bye ': {'classes': [{'class_name': 'standard-conversation',
    'confidence': 0.9322813077613376},
   {'class_name': 'strong-signal-joy', 'confidence': 0.015946408018823546},
   {'class_name': 'strong-signal-satisfaction',
    'confidence': 0.011253473850558509},
   {'class_name': 'social-cue-exit', 'confidence': 0.008802616238827679},
   {'class_name': 'chit-chat', 'confidence': 0.007338829116471957},
   {'class_name': 'strong-signal-anger', 'confidence': 0.007010593800530975},
   {'class_name': 'strong-signal-miscommunication',
    'confidence': 0.004514950562333546},
   {'class_name': 'request-strong-signal-manager-request',
    'confidence': 0.0039424477335365075},
   {'class_name': 'request-disconnect', 'confidence': 0.0034904947081333022},
   {'class_name': 'social-cue-uncomfortable',
    'confidence': 0.0027621427078705625}],
  'classifier_id': 'f7ea68x308-nlc-917',
  'text': 'bye bye ',
  'top_class': 'standard-conversation',
  'url': 'https://gateway.watsonplatform.net/n
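The full response lists every class with its confidence for each chunk. To scan a long transcript, it can be easier to keep only the best class per chunk. The sketch below assumes the dictionary shape shown in the output above (chunk text mapped to a result with a `classes` list). `top_classes` is a hypothetical helper, and `example_analysis` is a trimmed, rounded stand-in for a real response.

```python
def top_classes(analysis, min_confidence=0.0):
    """Map each chunk to its highest-confidence (class_name, confidence) pair."""
    summary = {}
    for chunk, result in analysis.items():
        best = max(result['classes'], key=lambda c: c['confidence'])
        if best['confidence'] >= min_confidence:
            summary[chunk] = (best['class_name'], best['confidence'])
    return summary


# Trimmed, rounded stand-in for the classifier output shown above
example_analysis = {
    'bye bye ': {
        'classes': [
            {'class_name': 'standard-conversation', 'confidence': 0.93},
            {'class_name': 'social-cue-exit', 'confidence': 0.01},
        ],
        'top_class': 'standard-conversation',
    },
}

print(top_classes(example_analysis))
# {'bye bye ': ('standard-conversation', 0.93)}
```

Raising `min_confidence` drops chunks where even the best class is weak, which may help when looking only for strong signals.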

In [None]:
classify_transcript(file_list[6])
