# MIE1624H - Watson Analytics - Natural Language Understanding API Tutorial

The Watson Analytics Natural Language Understanding service analyzes provided text (in text, URL, or HTML format) for the specified semantic features.

It can extract entities, concepts, keywords, categories, relations, sentiment, and emotions from provided text.

Custom models can also be created with the Watson Knowledge Studio to detect custom entities and relations.

More information regarding this service can be found in the IBM Watson Documentation: 
https://www.ibm.com/watson/developercloud/natural-language-understanding/api/v1/?python#introduction

The Watson Language Classifier service also lets you build custom classifiers to classify texts, provided that you have training data.

You can pass the classifier a question, and it will return a key with the best matching answer based on the classes it was trained on.

More information about the classifier service can also be found in the IBM Watson Documentation:
https://www.ibm.com/watson/developercloud/natural-language-classifier/api/v1/#introduction

The possible features that can be extracted are as follows:

- **Concepts**: 
    - Returns the concept name, relevance score, and link to the concept's DBpedia page
    - e.g. given ibm.com, returns Social network service, Thomas J. Watson, and Lotus Software
- **Categories**:
    - Categorizes your content into a 5-level taxonomy and returns the top 3 categories
- **Emotion**:
    - Detects emotions conveyed by the entire body of text
- **Entities**:
    - Identify people, cities, organizations, and other types of entities present in the provided text
    - Can also identify emotions and sentiments related to the entities found
- **Keywords**:
    - Identify important keywords in the text
    - Can also identify emotions and sentiments related to the keywords found
- **MetaData**:
    - Get document metadata for html/url inputs such as author name, title, RSS/ATOM feeds, prominent page image, and publication date
- **Relations**:
    - Recognize when two entities are related, and identify the type of relation
    - E.g. "awardedTo" relation might connect the entities "Nobel Prize" and "Albert Einstein"
- **SemanticRoles**:
    - Parse sentences into subject, action, and object form
- **Sentiment**:
    - Analyze the general sentiment of your content or analyze the sentiment toward specific phrases found in the text
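
As a rough illustration of how these features are selected, the sketch below builds the kind of JSON body an analyze request carries, with one key per requested feature. The helper `build_analyze_request` and the `KNOWN_FEATURES` set are hypothetical names written for this example, and the body layout is an assumption based on the API documentation linked above; the SDK calls used later in this tutorial build the request for you.

```python
import json

# Feature keys in lower-case snake_case, matching the keys seen in responses.
# This set is written by hand for illustration.
KNOWN_FEATURES = {
    "concepts", "categories", "emotion", "entities", "keywords",
    "metadata", "relations", "semantic_roles", "sentiment",
}


def build_analyze_request(text, features):
    """Return a request body dict asking for the given features.

    `features` maps a feature name to its options dict, for example
    {"concepts": {"limit": 3}}. An empty dict means default options.
    """
    unknown = set(features) - KNOWN_FEATURES
    if unknown:
        raise ValueError("Unknown feature(s): {}".format(sorted(unknown)))
    return {"text": text, "features": dict(features)}


body = build_analyze_request(
    "IBM has operations in over 170 countries.",
    {"concepts": {"limit": 3}, "keywords": {"sentiment": True, "emotion": True}},
)
print(json.dumps(body, indent=2, sort_keys=True))
```

Several features can be requested in one call this way, which is why each feature has its own options object.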

#### Download necessary libraries

In [1]:
!pip install watson-developer-cloud



#### Import necessary libraries

In [2]:
from watson_developer_cloud import NaturalLanguageUnderstandingV1 as NLU
from watson_developer_cloud.natural_language_understanding_v1 \
    import Features, EntitiesOptions, KeywordsOptions, ConceptsOptions,\
        CategoriesOptions, EmotionOptions, SemanticRolesOptions, \
        MetadataOptions, SentimentOptions, RelationsOptions
import json

### Insert API credentials from Bluemix
This is found by going to the API's page and looking under 'Service Credentials'. The username and password can be viewed by clicking 'View Credentials' next to your Key Name.

**Copy the credentials and replace the contents of *watson_credentials.json*. Alternatively, you can just copy and paste the keys in the variables for username and password.**

#### Parse watson_credentials.json file

In [3]:
credentials = {}

with open('watson_credentials.json') as f:
    data = json.load(f)
    for k in data.keys():
        try:
            credentials[k] = {
                'username': data[k][0]['credentials']['username'],
                'password': data[k][0]['credentials']['password']
            }
        except KeyError:
            credentials[k] = {
                'api_key': data[k][0]['credentials']['api_key']
            }
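
The parsing loop implies a particular file layout: each service name maps to a list whose first element holds a `credentials` object. The sketch below spells out that assumed layout with placeholder values and applies the same logic as a function. The name `parse_credentials`, the second service name, and all values are hypothetical and only serve the illustration.

```python
import json

# Assumed layout of watson_credentials.json, inferred from the parsing loop.
# All values are placeholders, not real credentials.
EXAMPLE_JSON = """
{
  "natural-language-understanding": [
    {"credentials": {"username": "nlu-user", "password": "nlu-pass"}}
  ],
  "some-api-key-service": [
    {"credentials": {"api_key": "placeholder-key"}}
  ]
}
"""


def parse_credentials(data):
    """Collect username/password (or api_key) for each service in `data`."""
    parsed = {}
    for service, instances in data.items():
        creds = instances[0]["credentials"]
        if "username" in creds and "password" in creds:
            parsed[service] = {
                "username": creds["username"],
                "password": creds["password"],
            }
        else:
            # Services without username/password are expected to expose api_key.
            parsed[service] = {"api_key": creds["api_key"]}
    return parsed


credentials_example = parse_credentials(json.loads(EXAMPLE_JSON))
print(credentials_example["natural-language-understanding"]["username"])
```

Checking for the keys explicitly, rather than relying on a `KeyError`, makes it clearer which of the two credential styles a service uses.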

#### Get username and password for NLU
Enter API username and password manually or add to watson credentials file.

In [4]:
username = ''
password = ''

if username == '' and password == '':
    username = credentials['natural-language-understanding']['username']
    password = credentials['natural-language-understanding']['password']

## Create Natural Language Understanding Instance

In [5]:
nlu = NLU(
    username=username,
    password=password,
    version='2017-02-27'
)

## Create some sample texts to analyze

In [6]:
sample1 = "can't decide if i should even watch the #democraticdebate or is it not worth the migraine!? #conservative #republican2016 #tcot #ycot"
sample2 = "i would be afraid of taking questions too, if i were up to the crap harper's been up to. #cdnpoli"
sample3 = "IBM is an American multinational technology company headquartered in Armok, New York, United States, with operations in over 170 countries."

## Concepts Example

In [7]:
print ("Here we are going to analyze the following string for relevant concepts: \n{}".format(sample3))

Here we are going to analyze the following string for relevant concepts: 
IBM is an American multinational technology company headquartered in Armok, New York, United States, with operations in over 170 countries.


This function uses the natural language understanding object to analyze the provided text for the top 3 concepts in it:

In [8]:
def getConcepts(text):
    response=nlu.analyze(
        text=text,
        features=Features(
            concepts=ConceptsOptions(
                # Concept Options
                limit=3
            )
        )
    )
    print ('Raw Results: ')
    print(json.dumps(response, indent=2))
    
    concepts = {}
    for concept in response['concepts']:
        concepts[concept['text']] = concept['relevance']
                
    return concepts

Below are the raw results and the results in a dictionary format. This text returns the top 3 relevant concepts as United States, New York City, and U.S. state, with a lowest relevance score of 83.5%.

In [9]:
concepts = getConcepts(sample3)
print ('\nDictionary Format:')
concepts

Raw Results: 
{
  "concepts": [
    {
      "relevance": 0.939855,
      "dbpedia_resource": "http://dbpedia.org/resource/United_States",
      "text": "United States"
    },
    {
      "relevance": 0.880671,
      "dbpedia_resource": "http://dbpedia.org/resource/New_York_City",
      "text": "New York City"
    },
    {
      "relevance": 0.835175,
      "dbpedia_resource": "http://dbpedia.org/resource/U.S._state",
      "text": "U.S. state"
    }
  ],
  "usage": {
    "text_characters": 139,
    "features": 1,
    "text_units": 1
  },
  "language": "en"
}

Dictionary Format:


{'New York City': 0.880671, 'U.S. state': 0.835175, 'United States': 0.939855}
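
Because a plain dictionary does not present the concepts in relevance order, a small helper can rank them. The following is a minimal sketch; `rank_by_score` is a hypothetical name, and the input is copied from the dictionary output above.

```python
# Sample input copied from the dictionary output above.
concepts_example = {
    "New York City": 0.880671,
    "U.S. state": 0.835175,
    "United States": 0.939855,
}


def rank_by_score(scores):
    """Return (name, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Print each concept with its relevance formatted as a percentage.
for name, relevance in rank_by_score(concepts_example):
    print("{:<15} {:.1%}".format(name, relevance))
```

The same helper works for any of the name-to-score dictionaries built in this tutorial.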

## Categories Example

In [10]:
print ("Here we are going to analyze the following string for relevant categories: \n{}".format(sample3))

Here we are going to analyze the following string for relevant categories: 
IBM is an American multinational technology company headquartered in Armok, New York, United States, with operations in over 170 countries.


This function uses the natural language understanding object to analyze the provided text for the categories that it fits into.

In [11]:
def getCategories(text):
    response=nlu.analyze(
        text=text,
        features=Features(
            categories=CategoriesOptions(
            )
        )
    )
    print ('Raw Results: ')
    print(json.dumps(response, indent=2))
    
    categories = {}
    for category in response['categories']:
        categories[category['label']] = category['score']

    return categories

Below are the raw results and the results in a dictionary format. This text returns 3 categories: Technology and Computing, Business Operations, and Airlines, all with relatively low scores (max 22.5%).

In [12]:
categories = getCategories(sample3)
print ('\nDictionary Format:')
categories

Raw Results: 
{
  "usage": {
    "text_characters": 139,
    "features": 1,
    "text_units": 1
  },
  "categories": [
    {
      "score": 0.224545,
      "label": "/technology and computing"
    },
    {
      "score": 0.196078,
      "label": "/business and industrial/business operations"
    },
    {
      "score": 0.147978,
      "label": "/travel/transports/air travel/airlines"
    }
  ],
  "language": "en"
}

Dictionary Format:


{'/business and industrial/business operations': 0.196078,
 '/technology and computing': 0.224545,
 '/travel/transports/air travel/airlines': 0.147978}
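
Each label is a slash-separated path through the category taxonomy, so its depth and top-level category can be read off by splitting it. A minimal sketch, where `split_category` is a hypothetical helper and the input is copied from the dictionary output above:

```python
def split_category(label):
    """Split a label such as '/a/b/c' into its taxonomy levels ['a', 'b', 'c']."""
    return [level for level in label.split("/") if level]


# Sample input copied from the dictionary output above.
categories_example = {
    "/business and industrial/business operations": 0.196078,
    "/technology and computing": 0.224545,
    "/travel/transports/air travel/airlines": 0.147978,
}

for label, score in categories_example.items():
    levels = split_category(label)
    print("depth={} top-level={!r} score={}".format(len(levels), levels[0], score))
```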

## Emotion Example

In [13]:
print ("Here we are going to analyze the following string for relevant emotions: \n{}".format(sample1))

Here we are going to analyze the following string for relevant emotions: 
can't decide if i should even watch the #democraticdebate or is it not worth the migraine!? #conservative #republican2016 #tcot #ycot


This function uses the natural language understanding object to analyze the provided text to find the emotion that it is conveying.

In [14]:
def getEmotions(text):
    response=nlu.analyze(
        text=text,
        features=Features(
            emotion=EmotionOptions(
                # Emotion options
                #targets=['politic']
            )
        )
    )
    print ('Raw Results: ')
    print(json.dumps(response, indent=2))
    
    emotions = {}
    
    for k,v in response['emotion']['document']['emotion'].items():
        emotions[k] = v
    
    return emotions

Below are the raw results and the results in a dictionary format. The results feature 5 types of emotions and their associated scores. In this case, the strongest emotion was disgust.

In [15]:
emotions = getEmotions(sample1)
print ('\nDictionary Format:')
emotions

Raw Results: 
{
  "emotion": {
    "document": {
      "emotion": {
        "anger": 0.225086,
        "fear": 0.107364,
        "disgust": 0.629703,
        "joy": 0.01363,
        "sadness": 0.366987
      }
    }
  },
  "usage": {
    "text_characters": 133,
    "features": 1,
    "text_units": 1
  },
  "language": "en"
}

Dictionary Format:


{'anger': 0.225086,
 'disgust': 0.629703,
 'fear': 0.107364,
 'joy': 0.01363,
 'sadness': 0.366987}
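
The strongest emotion can also be picked out programmatically rather than by eye. A minimal sketch, assuming the emotions dictionary has the shape shown above; `dominant_emotion` is a hypothetical helper name.

```python
# Sample input copied from the dictionary output above.
emotions_example = {
    "anger": 0.225086,
    "disgust": 0.629703,
    "fear": 0.107364,
    "joy": 0.01363,
    "sadness": 0.366987,
}


def dominant_emotion(scores):
    """Return the (emotion, score) pair with the highest score, or None if empty."""
    if not scores:
        return None
    name = max(scores, key=scores.get)
    return name, scores[name]


print(dominant_emotion(emotions_example))
```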

## Entities Example

In [16]:
print ("Here we are going to analyze the following string for relevant entities: \n{}".format(sample3))

Here we are going to analyze the following string for relevant entities: 
IBM is an American multinational technology company headquartered in Armok, New York, United States, with operations in over 170 countries.


This function uses the natural language understanding object to analyze the provided text to find relevant entities in the text.

In [17]:
def getEntities(text):
    response=nlu.analyze(
        text=text,
        features=Features(
            entities=EntitiesOptions(
                # Entities Options
                #targets=['politic']
            )
        )
    )
    print ('Raw Results: ')
    print(json.dumps(response, indent=2))
    
    entities = {}
    
    for entity in response['entities']:
        entities[entity['text']] = entity['relevance']
    
    return entities

Below are the raw results and the results in a dictionary format. The service detects 4 entities in this text (IBM, Armok, New York, and United States) and assigns each a relevance score.

In [18]:
entities = getEntities(sample3)
print ('\nDictionary Format:')
entities

Raw Results: 
{
  "usage": {
    "text_characters": 139,
    "features": 1,
    "text_units": 1
  },
  "entities": [
    {
      "relevance": 0.33,
      "count": 1,
      "disambiguation": {
        "name": "IBM",
        "dbpedia_resource": "http://dbpedia.org/resource/IBM",
        "subtype": [
          "SoftwareLicense",
          "OperatingSystemDeveloper",
          "ProcessorManufacturer",
          "SoftwareDeveloper",
          "CompanyFounder",
          "ProgrammingLanguageDesigner",
          "ProgrammingLanguageDeveloper"
        ]
      },
      "text": "IBM",
      "type": "Company"
    },
    {
      "relevance": 0.33,
      "count": 1,
      "disambiguation": {
        "subtype": [
          "City"
        ]
      },
      "text": "Armok",
      "type": "Location"
    },
    {
      "relevance": 0.33,
      "count": 1,
      "disambiguation": {
        "name": "New York City",
        "dbpedia_resource": "http://dbpedia.org/resource/New_York_City",
        "subtype": 

{'Armok': 0.33, 'IBM': 0.33, 'New York': 0.33, 'United States': 0.33}
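
Since every entity here has the same relevance, the entity type is often the more useful field. The sketch below groups entities by type. It uses abbreviated records holding only the `text` and `type` fields as they appear in this tutorial's raw outputs, and `group_by_type` is a hypothetical helper name.

```python
from collections import defaultdict

# Abbreviated entity records: only 'text' and 'type', for entities whose
# type is visible in the raw outputs of this tutorial.
entities_example = [
    {"text": "IBM", "type": "Company"},
    {"text": "Armok", "type": "Location"},
    {"text": "New York", "type": "Location"},
]


def group_by_type(entity_list):
    """Return a dict mapping each entity type to the list of entity texts."""
    grouped = defaultdict(list)
    for entity in entity_list:
        grouped[entity["type"]].append(entity["text"])
    return dict(grouped)


print(group_by_type(entities_example))
```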

## Keywords Example

In [19]:
print ("Here we are going to analyze the following string for relevant keywords: \n{}".format(sample2))

Here we are going to analyze the following string for relevant keywords: 
i would be afraid of taking questions too, if i were up to the crap harper's been up to. #cdnpoli


This function uses the natural language understanding object to analyze the provided text and find relevant keywords, along with the emotions and sentiment associated with each keyword.

In [20]:
def getKeywords(text):
    response=nlu.analyze(
        text=text,
        features=Features(
            keywords = KeywordsOptions(
              # Keywords Option
              emotion=True,
              sentiment=True,
              limit=3
            )
        )
    )
    print ('Raw Results: ')
    print(json.dumps(response, indent=2))
    
    keywords = {}
    for keyword in response['keywords']:
        emotions = {}
        if 'emotion' in keyword:
            for k,v in keyword['emotion'].items():
                emotions[k] = v
        sentiment = {}
        if 'sentiment' in keyword:
            if 'label' in keyword['sentiment']:
                sentiment[keyword['sentiment']['label']] = keyword['sentiment']['score']
            else:
                sentiment = keyword['sentiment']['score']
                      
        keywords[keyword['text']] = {}
        keywords[keyword['text']]['relevance'] = keyword['relevance']
        keywords[keyword['text']]['emotions'] = emotions
        keywords[keyword['text']]['sentiment'] = sentiment
        
    return keywords

Below are the raw results and the results in a dictionary format. This text returns two relevant keywords: "crap harper" and "questions". "crap harper" is highly relevant with a strongly negative sentiment score. "questions" is less relevant but has a high score for fear and a strongly negative sentiment value.

In [21]:
keywords = getKeywords(sample2)
print ('\nDictionary Format:')
keywords

Raw Results: 
{
  "usage": {
    "text_characters": 97,
    "features": 1,
    "text_units": 1
  },
  "keywords": [
    {
      "relevance": 0.993118,
      "text": "crap harper",
      "sentiment": {
        "score": -0.832073,
        "label": "negative"
      }
    },
    {
      "relevance": 0.483106,
      "emotion": {
        "anger": 0.018854,
        "fear": 0.719337,
        "disgust": 0.028256,
        "joy": 0.010041,
        "sadness": 0.049893
      },
      "text": "questions",
      "sentiment": {
        "score": -0.850843,
        "label": "negative"
      }
    }
  ],
  "language": "en"
}

Dictionary Format:


{'crap harper': {'emotions': {},
  'relevance': 0.993118,
  'sentiment': {'negative': -0.832073}},
 'questions': {'emotions': {'anger': 0.018854,
   'disgust': 0.028256,
   'fear': 0.719337,
   'joy': 0.010041,
   'sadness': 0.049893},
  'relevance': 0.483106,
  'sentiment': {'negative': -0.850843}}}
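
The nested dictionary can be flattened into one row per keyword for easier comparison. This is a minimal sketch, assuming the dictionary shape produced by `getKeywords` above; `summarize_keywords` is a hypothetical helper, and the input is copied from the dictionary output.

```python
# Sample input copied from the dictionary output above.
keywords_example = {
    "crap harper": {
        "emotions": {},
        "relevance": 0.993118,
        "sentiment": {"negative": -0.832073},
    },
    "questions": {
        "emotions": {
            "anger": 0.018854, "disgust": 0.028256, "fear": 0.719337,
            "joy": 0.010041, "sadness": 0.049893,
        },
        "relevance": 0.483106,
        "sentiment": {"negative": -0.850843},
    },
}


def summarize_keywords(keywords):
    """Return (text, relevance, sentiment label, top emotion) rows, most relevant first."""
    rows = []
    for text, info in keywords.items():
        emotions = info.get("emotions") or {}
        top_emotion = max(emotions, key=emotions.get) if emotions else None
        sentiment = info.get("sentiment")
        # getKeywords stores sentiment as {label: score} when a label is present.
        if isinstance(sentiment, dict) and sentiment:
            label = next(iter(sentiment))
        else:
            label = None
        rows.append((text, info["relevance"], label, top_emotion))
    return sorted(rows, key=lambda row: row[1], reverse=True)


for row in summarize_keywords(keywords_example):
    print(row)
```

Keywords without emotion scores, such as the first one here, get `None` as their top emotion instead of raising an error.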

## Relations Example

In [22]:
print ("Here we are going to analyze the following string for relevant relations: \n{}".format(sample3))

Here we are going to analyze the following string for relevant relations: 
IBM is an American multinational technology company headquartered in Armok, New York, United States, with operations in over 170 countries.


This function uses the natural language understanding object to analyze the provided text to find relations between entities in the text.

In [23]:
def getRelations(text):
    response=nlu.analyze(
        text=text,
        features=Features(
            relations=RelationsOptions(
                # Relations Options
            )
        )
    )
    print ('Raw Results: ')
    print(json.dumps(response, indent=2))
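
`getRelations` only prints the raw response. If a compact form is wanted, each relation can be reduced to a tuple of its two arguments, its type, and its score. The sketch below assumes the response layout shown in the raw results of this section and uses a record abbreviated from the first relation there; `relation_triples` is a hypothetical helper name.

```python
# Record abbreviated from the first relation in this section's raw results.
relations_example = [
    {
        "score": 0.385011,
        "type": "basedIn",
        "arguments": [
            {"text": "multinational technology company"},
            {"text": "American"},
        ],
    }
]


def relation_triples(relations):
    """Return (first argument, relation type, second argument, score) tuples."""
    triples = []
    for relation in relations:
        args = relation.get("arguments", [])
        if len(args) < 2:
            # Skip malformed records that do not connect two arguments.
            continue
        triples.append(
            (args[0]["text"], relation["type"], args[1]["text"], relation["score"])
        )
    return triples


print(relation_triples(relations_example))
```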

Below are the raw results. The input text returns relations in the text such as "multinational technology company" is "basedIn" "American". 

In [24]:
getRelations(sample3)

Raw Results: 
{
  "relations": [
    {
      "score": 0.385011,
      "arguments": [
        {
          "entities": [
            {
              "text": "multinational technology company",
              "type": "Organization"
            }
          ],
          "text": "multinational technology company",
          "location": [
            19,
            51
          ]
        },
        {
          "entities": [
            {
              "disambiguation": {
                "subtype": [
                  "Country"
                ]
              },
              "text": "American",
              "type": "GeopoliticalEntity"
            }
          ],
          "text": "American",
          "location": [
            10,
            18
          ]
        }
      ],
      "sentence": "IBM is an American multinational technology company headquartered in Armok, New York, United States, with operations in over 170 countries.",
      "type": "basedIn"
    },
    {
      "score": 0.5071

## Semantic Roles Example

In [25]:
print ("Here we are going to analyze the following string for relevant semantic roles: \n{}".format(sample3))

Here we are going to analyze the following string for relevant semantic roles: 
IBM is an American multinational technology company headquartered in Armok, New York, United States, with operations in over 170 countries.


This function uses the natural language understanding object to analyze the provided text to find relevant semantic roles in the text.

In [26]:
def getSemanticRoles(text):
    response=nlu.analyze(
        text=text,
        features=Features(
            semantic_roles=SemanticRolesOptions(
                # Semantic Role Options
                entities=True,
                keywords=True,
                limit=50
            )
        )
    )
    print ('Raw Results: ')
    print(json.dumps(response, indent=2))
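
`getSemanticRoles` only prints the raw response. A possible way to condense it is to keep just the subject, action, and object text of each role. The sketch below is illustrative only: the sample record is hand-written rather than actual service output, the `subject` and `action` field names are assumptions (only `object` is visible in the raw results of this section), and `role_triples` is a hypothetical helper name.

```python
# Hand-written sample, NOT actual service output. The 'subject' and 'action'
# field names are assumed for illustration.
semantic_roles_example = [
    {
        "subject": {"text": "IBM"},
        "action": {"text": "is", "verb": {"text": "be"}},
        "object": {"text": "an American multinational technology company"},
    }
]


def role_triples(semantic_roles):
    """Return (subject, action, object) text tuples; missing parts become None."""
    triples = []
    for role in semantic_roles:
        subject = role.get("subject", {}).get("text")
        action = role.get("action", {}).get("text")
        obj = role.get("object", {}).get("text")
        triples.append((subject, action, obj))
    return triples


print(role_triples(semantic_roles_example))
```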

Below are the raw results. From this it can be seen that the action in the sentence is "is", the verb is "be", etc.

In [27]:
getSemanticRoles(sample3)

Raw Results: 
{
  "usage": {
    "text_characters": 139,
    "features": 1,
    "text_units": 1
  },
  "semantic_roles": [
    {
      "object": {
        "entities": [
          {
            "disambiguation": {
              "subtype": [
                "City"
              ]
            },
            "text": "Armok",
            "type": "Location"
          },
          {
            "disambiguation": {
              "name": "New York City",
              "dbpedia_resource": "http://dbpedia.org/resource/New_York_City",
              "subtype": [
                "PoliticalDistrict",
                "GovernmentalJurisdiction",
                "PlaceWithNeighborhoods",
                "WineRegion",
                "CityTown",
                "FilmScreeningVenue",
                "City"
              ]
            },
            "text": "New York",
            "type": "Location"
          },
          {
            "disambiguation": {
              "name": "United States",
            

## Sentiment Example

In [28]:
print ("Here we are going to analyze the following string for relevant sentiments: \n{}".format(sample3))

Here we are going to analyze the following string for relevant sentiments: 
IBM is an American multinational technology company headquartered in Armok, New York, United States, with operations in over 170 countries.


This function uses the natural language understanding object to analyze the provided text to find the relevant sentiment score.

In [29]:
def getSentiments(text):
    try:
        response=nlu.analyze(
            text=text,
            features=Features(
                sentiment=SentimentOptions(
                    # Sentiment Options
                )
            )
        )
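    except Exception as e:
        # If the request fails, report the error and return None; indexing
        # an empty placeholder response below would raise a TypeError.
        print ('Sentiment analysis failed: {}'.format(e))
        return None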
        
    print ('Raw Results: ')
    print(json.dumps(response, indent=2))
        
    return response['sentiment']['document']['score']

Below are the raw results and the relevant sentiment score for the provided text. Both tweets (sample1 and sample2) were classified as negative, with strongly negative sentiment scores of about -0.94 and -0.92.

In [30]:
sentiment = getSentiments(sample1)
print ("\nThe following text has a sentiment score of {}: {} ".format(sentiment,sample1))

Raw Results: 
{
  "usage": {
    "text_characters": 133,
    "features": 1,
    "text_units": 1
  },
  "language": "en",
  "sentiment": {
    "document": {
      "score": -0.939078,
      "label": "negative"
    }
  }
}

The following text has a sentiment score of -0.939078: can't decide if i should even watch the #democraticdebate or is it not worth the migraine!? #conservative #republican2016 #tcot #ycot 


In [31]:
sentiment = getSentiments(sample2)
print ("\nThe following text has a sentiment score of {}: {} ".format(sentiment,sample2))

Raw Results: 
{
  "usage": {
    "text_characters": 97,
    "features": 1,
    "text_units": 1
  },
  "language": "en",
  "sentiment": {
    "document": {
      "score": -0.924145,
      "label": "negative"
    }
  }
}

The following text has a sentiment score of -0.924145: i would be afraid of taking questions too, if i were up to the crap harper's been up to. #cdnpoli 
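
When many texts are processed, it can help to read the score and label defensively so that a missing or malformed response does not stop the run. A minimal sketch, assuming the response layout shown in the raw results above; `document_sentiment` is a hypothetical helper name.

```python
def document_sentiment(response):
    """Return (score, label) from an analyze response, or (None, None) if absent."""
    if not isinstance(response, dict):
        return None, None
    document = response.get("sentiment", {}).get("document", {})
    return document.get("score"), document.get("label")


# Response abbreviated from the raw results for the first tweet above.
response_example = {
    "language": "en",
    "sentiment": {"document": {"score": -0.939078, "label": "negative"}},
}

print(document_sentiment(response_example))
print(document_sentiment([]))  # a non-dict placeholder yields (None, None)
```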
