
## Notebook 2 – Natural Language Understanding (NLU)
NLU analyzes text to extract meta-data from content such as concepts, entities, keywords, categories, relations and semantic roles.
https://www.ibm.com/watson/services/natural-language-understanding/ 
https://www.ibm.com/watson/developercloud/natural-language-understanding/api/v1/  


## Install dependencies

Python’s standard library is extensive and includes built-in modules such as `json`, which reads and writes JSON, a lightweight data-interchange format. https://docs.python.org/2/library/index.html and https://docs.python.org/2/library/json.html

IBM Watson Developer Cloud provides a Python client library for getting started quickly with the various Watson services. https://pypi.python.org/pypi/watson-developer-cloud

Using Python with IBM Cloud Object Storage (COS): Python support is provided through a library based on Boto 3 (imported below as `ibm_boto3`). It gives access to the object storage API and can source credentials. The IBM COS endpoint must be specified when creating a service resource or low-level client, as shown in the documentation: https://ibm-public-cos.github.io/crs-docs/python




In [49]:
#imports.... Run this each time after restarting the Kernel
#!pip install watson_developer_cloud
import watson_developer_cloud as watson
import json
from botocore.client import Config
import ibm_boto3


### Create Watson Natural Language Understanding service

For more information on creating Watson services, see Notebook 1

### Add Credentials

Copy and paste the following snippet into the next cell, and fill in your own credentials:

```code
credentials_os = {
    'IBM_API_KEY_ID': '',
    'IAM_SERVICE_ID': '',
    'ENDPOINT': 'https://s3-api.us-geo.objectstorage.service.networklayer.com',
    'IBM_AUTH_ENDPOINT': 'https://iam.ng.bluemix.net/oidc/token',
    'BUCKET': '',
}

credentials_nlu = {
    "url": "",
    "username": "",
    "password": ""
}

```
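
Before running the remaining cells, it can help to confirm that every placeholder in the dictionaries above has been filled in. The helper below is a minimal sketch: `missing_credentials` is a hypothetical name, not part of any Watson or COS library, and it only checks for empty values. The example dictionary and its URL are made up for illustration.

```python
def missing_credentials(credentials):
    """Return the sorted names of entries whose value is empty or whitespace."""
    return sorted(key for key, value in credentials.items()
                  if not str(value).strip())


# Hypothetical, partially filled-in dictionary used only for illustration.
example_credentials = {
    "url": "https://example.invalid/nlu",
    "username": "",
    "password": "",
}

print(missing_credentials(example_credentials))
```

Running the same check on `credentials_os` and `credentials_nlu` before creating the clients would surface a forgotten field early, rather than as an authentication error later.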

In [50]:
# The code was removed by DSX for sharing.

## Set-up Object storage

In [51]:
# For more information on creating Watson services, see Notebook 1

client = ibm_boto3.client(service_name='s3', 
    ibm_api_key_id=credentials_os['IBM_API_KEY_ID'],
    ibm_auth_endpoint=credentials_os['IBM_AUTH_ENDPOINT'],
    config=Config(signature_version='oauth'),
    endpoint_url=credentials_os['ENDPOINT'])  # use the endpoint from the credentials instead of a hard-coded copy



### NLU

- `process_text()` goes through the transcription results, concatenates the sentences into one transcript, and splits it into chunks of `chunk_size` words
- `analyze_transcript()` calls the Natural Language Understanding endpoint to analyze each chunk of the transcript
- `post_analysis()` processes the NLU response and prints the categories found for each chunk

In [69]:
#NLU

from watson_developer_cloud import NaturalLanguageUnderstandingV1
from watson_developer_cloud.natural_language_understanding.features import (
    v1 as Features)

natural_language_understanding = NaturalLanguageUnderstandingV1(
    version = '2017-02-27',
    username = credentials_nlu['username'],
    password = credentials_nlu['password'])

chunk_size = 30  # number of words per chunk when splitting a transcript
# e.g. a 290-word transcript yields 10 chunks: 9 with 30 words and 1 with 20 words; this approximates the 'time domain' for this lab

def chunk_transcript(transcript, chunk_size):
    transcript = transcript.split(' ')
    return [ transcript[i:i+chunk_size] for i in range(0, len(transcript), chunk_size) ] # chunking data

def process_text(text):
    transcript=''
    for sentence in json.loads(text)['results']:
        transcript = transcript + sentence['alternatives'][0]['transcript'] # concatenate sentences
    transcript = chunk_transcript(transcript, chunk_size) # chunk the transcript
    return transcript


def analyze_transcript(features, file_name):
    streaming_body = client.get_object(Bucket = credentials_os['BUCKET'], Key=file_name.split('.')[0]+'_text.json')['Body']
    transcript=streaming_body.read().decode("utf-8")
    nlu_analysis={}
    for chunk in process_text(transcript):
        if len(chunk) > 5:
            chunk = ' '.join(chunk)
            nlu_analysis[chunk] = natural_language_understanding.analyze(features, chunk, return_analyzed_text=True)
    # Derive the output key from the full file name (file_name[0] would be only its first character)
    res = client.put_object(Bucket=credentials_os['BUCKET'], Key=file_name.split('.')[0]+'_NLU.json', Body=json.dumps(nlu_analysis))
    return nlu_analysis

def post_analysis(result):
    for chunk in result.keys():
        categories = result[chunk]['categories']
        print('\nchunk: ', chunk)
        for category in categories:
            print('label: ', category['label'], ', score: ', category['score']) #add table instead of prints
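
To see what the chunking step produces without calling any service, the sketch below runs a close variant of `process_text()` and `chunk_transcript()` on a small hand-written payload. The payload is hypothetical and only mirrors the `results` / `alternatives` / `transcript` nesting that `process_text()` reads. The function names are illustrative, the chunk size of 4 is chosen just to keep the output short, and the sketch uses `split()` rather than `split(' ')`.

```python
import json


def chunk_words(transcript, chunk_size):
    """Split a transcript into lists of at most chunk_size words."""
    words = transcript.split()  # split() also drops empty tokens from repeated spaces
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def transcript_chunks(text, chunk_size):
    """Concatenate the first alternative of every result, then chunk it."""
    transcript = ""
    for sentence in json.loads(text)["results"]:
        transcript += sentence["alternatives"][0]["transcript"]
    return chunk_words(transcript, chunk_size)


# Hypothetical payload, written by hand for illustration only.
sample = json.dumps({
    "results": [
        {"alternatives": [{"transcript": "my old address is one two three oak street "}]},
        {"alternatives": [{"transcript": "my new address is five six seven pine street "}]},
    ]
})

chunks = transcript_chunks(sample, chunk_size=4)
for chunk in chunks:
    print(" ".join(chunk))
```

The final chunk holds whatever words remain, so it is usually shorter than the others; `analyze_transcript()` skips chunks of five words or fewer for that reason.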


In [70]:
file_list = ['sample1-addresschange-positive.ogg',
             'sample2-address-negative.ogg',
             'sample3-shirt-return-weather-chitchat.ogg',
             'sample4-angryblender-sportschitchat-recovery.ogg',
             'sample5-calibration-toneandcontext.ogg',
             'jfk_1961_0525_speech_to_put_man_on_moon.ogg',
             'May 1 1969 Fred Rogers testifies before the Senate Subcommittee on Communications.ogg']

features = {"concepts":{},"entities":{},"keywords":{},"categories":{},"emotion":{},"sentiment":{},"semantic_roles":{} }

In [71]:
result = analyze_transcript(features, file_list[0])

post_analysis(result)



chunk:  of said no other changes the only thing that I want to change is the address yes that's right yep very good yes thank you so much for help it
label:  /shopping/resources/contests and freebies , score:  0.226511
label:  /law, govt and politics/legal issues/legislation/tax laws , score:  0.16996
label:  /business and industrial , score:  0.169616

chunk:  is five five five one two one two yes that's me my old address is number one two three oak street my new address is five six seven pine street
label:  /business and industrial , score:  0.22775
label:  /real estate/apartments , score:  0.147439
label:  /travel/tourist facilities/hotel , score:  0.129948

chunk:  yes and the zip is nine zero two one zero yep that's right now the phone number stays the same that's right I would like to keep all the options
label:  /technology and computing/consumer electronics/telephones/mobile phones , score:  0.165103
label:  /travel/tourist destinations/mexico and central america , score:  0.1
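
The comment in `post_analysis()` suggests a table instead of prints. One way to get there, sketched under assumptions, is to flatten the result into rows first and format them afterwards. `category_rows` and `format_table` are hypothetical helpers, and the sample dictionary reuses two labels and scores from the output above with a shortened chunk text as its key.

```python
def category_rows(result):
    """Flatten {chunk: {'categories': [...]}} into (chunk, label, score) rows."""
    rows = []
    for chunk, analysis in result.items():
        for category in analysis.get("categories", []):
            rows.append((chunk, category["label"], category["score"]))
    return rows


def format_table(rows, chunk_width=30, label_width=60):
    """Render rows as fixed-width text lines, truncating long chunk text."""
    header = "  ".join(["chunk".ljust(chunk_width), "label".ljust(label_width), "score"])
    lines = [header]
    for chunk, label, score in rows:
        lines.append("  ".join([
            chunk[:chunk_width].ljust(chunk_width),
            label.ljust(label_width),
            "{:.6f}".format(score),
        ]))
    return "\n".join(lines)


# Sample shaped like the NLU result used above; the chunk key is shortened.
sample_result = {
    "is five five five one two one two": {"categories": [
        {"label": "/business and industrial", "score": 0.22775},
        {"label": "/real estate/apartments", "score": 0.147439},
    ]},
}

rows = category_rows(sample_result)
print(format_table(rows))
```

If pandas is available in the notebook environment, the same rows can be passed to `pandas.DataFrame(rows, columns=['chunk', 'label', 'score'])` to get a rendered table.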

In [72]:
results = analyze_transcript(features, file_list[6])

post_analysis(results)


WatsonApiException: Error: invalid request: content is empty, Code: 400
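
The 400 response above indicates that the service found no content to analyze in one of the requests. The notebook does not show which chunk triggered it, so the following is only a sketch of one defensive option: drop chunks that are blank or too short once empty tokens are removed, before they reach `analyze()`. `usable_chunks` and `min_words` are hypothetical names; the default of 6 mirrors the `len(chunk) > 5` check in `analyze_transcript()`.

```python
def usable_chunks(chunks, min_words=6):
    """Yield joined chunk text, skipping chunks that are too short or blank."""
    for chunk in chunks:
        words = [word for word in chunk if word.strip()]  # ignore empty tokens
        if len(words) >= min_words:
            yield " ".join(words)


# Hand-made input: one normal chunk, one made of empty tokens, one too short.
example_chunks = [
    ["the", "only", "thing", "I", "want", "to", "change"],
    ["", "", "", "", "", "", ""],
    ["thank", "you"],
]

texts = list(usable_chunks(example_chunks))
print(texts)
```

Another option would be to wrap the `analyze()` call in a `try`/`except` for `WatsonApiException` and skip the failing chunk. Which approach fits depends on why the content was empty for this file.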