![ML Hub logo](https://github.com/SrStone/BlueMixJar/blob/master/ML_Hub_Logo.png?raw=true)

# IBM Watson Natural Language Understanding (NLU) 

## Introduction

IBM Watson Natural Language Understanding (NLU) uses natural language processing to analyze semantic features of any text. Provide plain text, HTML, or a public URL, and NLU returns results for the features you specify. NLU can perform the following analyses:

- **Concepts**: Identify general concepts that are referenced or alluded to in your content. Concepts that are detected typically have an associated link to a DBpedia resource.
- **Entities**: Detect important people, places, geopolitical entities and other types of entities in your content. Entity detection recognizes consecutive coreferences of each entity. 
- **Keywords**: Determine the most important keywords in your content. Keyword phrases are organized by relevance in the results.
- **Categories**: Categorize your content into a hierarchical 5-level taxonomy. For example, "Leonardo DiCaprio won an Oscar" returns "/art and entertainment/movies and tv/movies" as the most confident classification.
- **Sentiment**: Determine whether your content conveys positive or negative sentiment. Sentiment information can be returned for detected entities, keywords, or user-specified target phrases found in the text.
- **Emotion**: Detect anger, disgust, fear, joy, or sadness that is conveyed by your content. Emotion information can be returned for detected entities, keywords, or user-specified target phrases found in the text.
- **Relations**: Recognize when two entities are related, and identify the type of relation. For example, you can identify an "awardedTo" relation between an award and its recipient.
- **Semantic Role**: Parse sentences into subject-action-object form, and identify entities and keywords that are subjects or objects of an action.
- **Metadata**: Get author information, publication date, and the title of your HTML or URL content.
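The Python client used in this notebook ultimately sends a request to the service's `analyze` endpoint. As a rough sketch, assuming the v1 request body carries exactly one input field (`text`, `html`, or `url`) plus a `features` object keyed by feature name (check the Online API Doc for the authoritative schema), a request body could be assembled with the standard library alone. `build_analyze_payload` is a hypothetical helper, not part of the Watson SDK.

```python
import json

# Feature keys as assumed for the v1 analyze request body (verify against the API doc)
KNOWN_FEATURES = {
    "concepts", "entities", "keywords", "categories", "sentiment",
    "emotion", "relations", "semantic_roles", "metadata",
}
INPUT_FIELDS = ("text", "html", "url")


def build_analyze_payload(features, **source):
    """Hypothetical helper: build a JSON body for an NLU analyze request.

    features: dict mapping a feature name to its options, e.g. {"concepts": {"limit": 3}}
    source:   exactly one of text=..., html=..., or url=...
    """
    given = [name for name in INPUT_FIELDS if source.get(name) is not None]
    if len(given) != 1:
        raise ValueError("provide exactly one of text, html, or url")
    unknown = set(features) - KNOWN_FEATURES
    if unknown:
        raise ValueError("unknown features: %s" % sorted(unknown))
    payload = {given[0]: source[given[0]], "features": features}
    return json.dumps(payload, sort_keys=True)


body = build_analyze_payload({"concepts": {"limit": 3}}, text="Machine learning is fun.")
print(body)
```

Validating the input and feature names locally means a typo such as `"concept"` fails immediately instead of as a service error.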

Refer to the [Online API Doc](https://www.ibm.com/watson/developercloud/natural-language-understanding/api/v1/?curl#get-analyze) for the full API reference and usage details.


-----------------------
### Preparation 

1) Create an NLU service instance in Bluemix: [Log in and create the service](https://www.ibm.com/watson/developercloud/doc/natural-language-understanding/getting-started.html)

2) Obtain the NLU service credentials (username and password) from the Bluemix dashboard. You also pass an API version date (`version`) when creating the client.

3) To use this service in a Python notebook, install the client library by running this command in a notebook cell:

`!pip install --upgrade watson_developer_cloud`




### Step 1: Connecting to the service instance


```python
import json
from watson_developer_cloud import NaturalLanguageUnderstandingV1
import watson_developer_cloud.natural_language_understanding.features.v1 as Features

# Replace these placeholders with the username and password from your service credentials
natural_language_understanding = NaturalLanguageUnderstandingV1(
  username="YOUR_NLU_USERNAME",
  password="YOUR_NLU_PASSWORD",
  version="2017-02-27")
```
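Hard-coding credentials in a notebook makes them easy to leak when the notebook is shared. One alternative, sketched here, reads them from environment variables. The variable names `NLU_USERNAME`, `NLU_PASSWORD`, and `NLU_VERSION` and the helper `load_nlu_credentials` are hypothetical; the `version` fallback reuses the date from the cell above.

```python
import os


def load_nlu_credentials(environ=os.environ):
    """Read NLU credentials from (hypothetically named) environment variables.

    Raises KeyError naming the first missing variable, instead of failing
    later with an authentication error from the service.
    """
    creds = {}
    for key, var in (("username", "NLU_USERNAME"), ("password", "NLU_PASSWORD")):
        if var not in environ:
            raise KeyError("missing environment variable %s" % var)
        creds[key] = environ[var]
    # Fall back to the API version date used in this notebook if none is set
    creds["version"] = environ.get("NLU_VERSION", "2017-02-27")
    return creds


# Example with an explicit mapping instead of the real environment
print(sorted(load_nlu_credentials({"NLU_USERNAME": "user", "NLU_PASSWORD": "secret"})))
```

In the notebook, `NaturalLanguageUnderstandingV1(**load_nlu_credentials())` would then replace the literal keyword arguments.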

### Step 2: Analyzing semantic features of any text


**Concepts**: Identify general concepts that are referenced or alluded to in your content. Concepts that are detected typically have an associated link to a DBpedia resource.

```python
response = natural_language_understanding.analyze(
  text="Machine learning is a method of data analysis that automates \
  analytical model building.Using algorithms that iteratively learn from\
  data, machine learning allows computers to find hidden insights without\
  being explicitly programmed where to look.",
  features=[
    Features.Concepts(
      limit=3
    )
  ]
)

print(json.dumps(response, indent=2))
```

```json
{
  "usage": {
    "text_characters": 250,
    "features": 1,
    "text_units": 1
  },
  "language": "en",
  "concepts": [
    {
      "relevance": 0.966928,
      "text": "Machine learning",
      "dbpedia_resource": "http://dbpedia.org/resource/Machine_learning"
    },
    {
      "relevance": 0.850526,
      "text": "Computer",
      "dbpedia_resource": "http://dbpedia.org/resource/Computer"
    },
    {
      "relevance": 0.832159,
      "text": "Computer program",
      "dbpedia_resource": "http://dbpedia.org/resource/Computer_program"
    }
  ]
}
```
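Since `response` is passed straight to `json.dumps`, it behaves as a plain dictionary in this version of the client and can be post-processed directly. A minimal sketch that keeps only concepts at or above a relevance threshold, using the output above as hard-coded sample data so it runs without a service instance. The helper `relevant_concepts` and the 0.85 cutoff are illustrative choices, not NLU recommendations.

```python
# Sample data copied from the Concepts output above
concepts_response = {
    "concepts": [
        {"relevance": 0.966928, "text": "Machine learning",
         "dbpedia_resource": "http://dbpedia.org/resource/Machine_learning"},
        {"relevance": 0.850526, "text": "Computer",
         "dbpedia_resource": "http://dbpedia.org/resource/Computer"},
        {"relevance": 0.832159, "text": "Computer program",
         "dbpedia_resource": "http://dbpedia.org/resource/Computer_program"},
    ]
}


def relevant_concepts(resp, min_relevance):
    """Return (text, DBpedia link) pairs for concepts at or above min_relevance."""
    return [
        (c["text"], c.get("dbpedia_resource"))
        for c in resp.get("concepts", [])
        if c["relevance"] >= min_relevance
    ]


for text, link in relevant_concepts(concepts_response, 0.85):
    print("%s -> %s" % (text, link))
```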


**Categories**: Categorize your content into a hierarchical 5-level taxonomy. For example, "Leonardo DiCaprio won an Oscar" returns "/art and entertainment/movies and tv/movies" as the most confident classification.

```python
response = natural_language_understanding.analyze(
  url="www.td.com",
  features=[
    Features.Categories()
  ]
)

print(json.dumps(response, indent=2))
```

```json
{
  "usage": {
    "text_characters": 4540,
    "features": 1,
    "text_units": 1
  },
  "categories": [
    {
      "score": 0.935495,
      "label": "/finance/bank"
    },
    {
      "score": 0.512826,
      "label": "/finance/bank/bank account"
    },
    {
      "score": 0.433517,
      "label": "/business and industrial"
    }
  ],
  "language": "en",
  "retrieved_url": "https://www.tdbank.com"
}
```
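Each category `label` is a path through the taxonomy, with levels separated by `/`. A small sketch that picks the highest-scoring category and splits its label into levels, root first, using the output above as sample data. `category_levels` is a hypothetical helper.

```python
# Sample data copied from the Categories output above
categories_response = {
    "categories": [
        {"score": 0.935495, "label": "/finance/bank"},
        {"score": 0.512826, "label": "/finance/bank/bank account"},
        {"score": 0.433517, "label": "/business and industrial"},
    ]
}


def category_levels(label):
    """Split a label such as '/finance/bank' into ['finance', 'bank']."""
    # The leading '/' produces an empty first element, which is dropped here
    return [part for part in label.split("/") if part]


# Pick the highest-scoring category and print its levels, root first
best = max(categories_response["categories"], key=lambda c: c["score"])
print(" > ".join(category_levels(best["label"])))
```

The number of levels (`len(category_levels(label))`) also tells you how deep in the five-level taxonomy a classification goes.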


**Entities**: Detect important people, places, geopolitical entities and other types of entities in your content. Entity detection recognizes consecutive coreferences of each entity.

```python
response = natural_language_understanding.analyze(
  url="www.cnn.com",
  features=[
    Features.Entities(
        sentiment=True,
        limit=2
    )
  ]
)

print(json.dumps(response, indent=2))
```

```json
{
  "usage": {
    "text_characters": 2728,
    "features": 1,
    "text_units": 1
  },
  "entities": [
    {
      "count": 9,
      "sentiment": {
        "score": 0.0
      },
      "text": "CNN",
      "disambiguation": {
        "subtype": [
          "Broadcast",
          "AwardWinner",
          "RadioNetwork",
          "TVNetwork"
        ],
        "name": "CNN",
        "dbpedia_resource": "http://dbpedia.org/resource/CNN"
      },
      "relevance": 0.781907,
      "type": "Company"
    },
    {
      "count": 2,
      "sentiment": {
        "score": 0.0
      },
      "text": "CNN",
      "disambiguation": {
        "subtype": [
          "Broadcast",
          "AwardWinner",
          "Company",
          "RadioNetwork",
          "TVNetwork",
          "TelevisionStation"
        ],
        "name": "CNN",
        "dbpedia_resource": "http://dbpedia.org/resource/CNN"
      },
      "relevance": 0.498556,
      "type": "Broadcaster"
    }
  ],
```
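In the output above, "CNN" appears twice, once typed as `Company` and once as `Broadcaster`, each with its own `count`. If you only care about how often a name is mentioned, you might merge entries by their text. A sketch using the visible fields of the (truncated) output as sample data; `mentions_by_name` is a hypothetical helper, and merging by surface text is an assumption of this example that can conflate different entities sharing a name.

```python
from collections import defaultdict

# Sample data: the fields used below, copied from the Entities output above
entities_response = {
    "entities": [
        {"text": "CNN", "type": "Company", "count": 9, "relevance": 0.781907},
        {"text": "CNN", "type": "Broadcaster", "count": 2, "relevance": 0.498556},
    ]
}


def mentions_by_name(resp):
    """Sum mention counts per entity text and record every type it was given."""
    totals = defaultdict(int)
    types = defaultdict(set)
    for ent in resp.get("entities", []):
        totals[ent["text"]] += ent["count"]
        types[ent["text"]].add(ent["type"])
    return {name: (totals[name], sorted(types[name])) for name in totals}


print(mentions_by_name(entities_response))
```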
 

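**Sentiment** and **Emotion**: Determine whether your content conveys positive or negative sentiment, and detect anger, disgust, fear, joy, or sadness. Both can be returned for detected entities, keywords, or user-specified target phrases found in the text. This example enables `sentiment` and `emotion` on `Features.Keywords`, so the scores are reported per keyword.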

```python
response = natural_language_understanding.analyze(
  text="TD is a great company",
  features=[
    Features.Keywords(
      emotion=True,
      sentiment=True,
      limit=2
    )
  ]
)

print(json.dumps(response, indent=2))
```

```json
{
  "usage": {
    "text_characters": 21,
    "features": 1,
    "text_units": 1
  },
  "keywords": [
    {
      "relevance": 0.976062,
      "text": "TD",
      "emotion": {
        "anger": 0.018518,
        "joy": 0.843626,
        "sadness": 0.030468,
        "fear": 0.033944,
        "disgust": 0.018826
      },
      "sentiment": {
        "score": 0.801133,
        "label": "positive"
      }
    },
    {
      "relevance": 0.82803,
      "text": "great company",
      "emotion": {
        "anger": 0.018518,
        "joy": 0.843626,
        "sadness": 0.030468,
        "fear": 0.033944,
        "disgust": 0.018826
      },
      "sentiment": {
        "score": 0.801133,
        "label": "positive"
      }
    }
  ],
  "language": "en"
}
```
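Each keyword's `emotion` object holds a separate score per emotion. One simple way to summarize it is to report the highest-scoring emotion; the sketch below does that for the keywords above, used as sample data. Treating the maximum as "the" emotion is a simplification chosen for illustration, and `dominant_emotion` is a hypothetical helper.

```python
# Sample data copied from the keyword output above (sentiment omitted)
keywords_response = {
    "keywords": [
        {"text": "TD", "relevance": 0.976062,
         "emotion": {"anger": 0.018518, "joy": 0.843626, "sadness": 0.030468,
                     "fear": 0.033944, "disgust": 0.018826}},
        {"text": "great company", "relevance": 0.82803,
         "emotion": {"anger": 0.018518, "joy": 0.843626, "sadness": 0.030468,
                     "fear": 0.033944, "disgust": 0.018826}},
    ]
}


def dominant_emotion(emotion_scores):
    """Return the (emotion, score) pair with the highest score."""
    return max(emotion_scores.items(), key=lambda item: item[1])


for kw in keywords_response["keywords"]:
    name, score = dominant_emotion(kw["emotion"])
    print("%s: %s (%.2f)" % (kw["text"], name, score))
```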


### Step 3: Analyzing a press release from the Federal Reserve

URL: https://www.federalreserve.gov/newsevents/pressreleases/bcreg20170622a.htm

Extract keywords from the press release and run emotion and sentiment analysis on them:
- Define the URL to analyze
- Define the parameters for NLU
  * `Features.Keywords` requests keyword extraction
  * Setting `emotion=True` and `sentiment=True` adds emotion and sentiment scores to each keyword
  * `limit=2` returns at most two keywords

```python
response = natural_language_understanding.analyze(
  url="https://www.federalreserve.gov/newsevents/pressreleases/bcreg20170622a.htm",
  features=[
    Features.Keywords(
      emotion=True,
      sentiment=True,
      limit=2
    )
  ]
)

print(json.dumps(response, indent=2))
```

```json
{
  "usage": {
    "text_characters": 2998,
    "features": 1,
    "text_units": 1
  },
  "keywords": [
    {
      "relevance": 0.951286,
      "text": "bank holding companies",
      "emotion": {
        "anger": 0.061719,
        "joy": 0.095852,
        "sadness": 0.228201,
        "fear": 0.136248,
        "disgust": 0.013403
      },
      "sentiment": {
        "score": -0.566024,
        "label": "negative"
      }
    },
    {
      "relevance": 0.847884,
      "text": "Federal Reserve",
      "emotion": {
        "anger": 0.140472,
        "joy": 0.124019,
        "sadness": 0.186927,
        "fear": 0.320088,
        "disgust": 0.010946
      },
      "sentiment": {
        "score": 0.0,
        "label": "neutral"
      }
    }
  ],
  "language": "en",
  "retrieved_url": "https://www.federalreserve.gov/newsevents/pressreleases/bcreg20170622a.htm"
}
```


### Insights from output:
- **text**: The keyword text. **"bank holding companies"** and **"Federal Reserve"** are the two keywords returned.
- **relevance**: How relevant the keyword is to the document, from 0 (not relevant) to 1 (highly relevant).
- **sentiment**: The sentiment toward the keyword within the document, ranging from -1 to 1. Negative scores indicate negative sentiment, and positive scores indicate positive sentiment.
- **emotion**: Separate scores for anger, disgust, fear, joy, and sadness toward the keyword, each ranging from 0 to 1. A higher score means the emotion is more strongly conveyed; for example, **"Federal Reserve"** scores highest on fear (0.320088).
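The ranges listed above can double as a sanity check when you post-process results. A minimal sketch that flags any keyword field outside those ranges, run on the Federal Reserve output as sample data. `out_of_range` is a hypothetical helper, and it assumes the relevance (0 to 1), sentiment (-1 to 1), and emotion (0 to 1) ranges described above.

```python
# Sample data copied from the Federal Reserve keyword output above
fed_keywords = [
    {"text": "bank holding companies", "relevance": 0.951286,
     "sentiment": {"score": -0.566024, "label": "negative"},
     "emotion": {"anger": 0.061719, "joy": 0.095852, "sadness": 0.228201,
                 "fear": 0.136248, "disgust": 0.013403}},
    {"text": "Federal Reserve", "relevance": 0.847884,
     "sentiment": {"score": 0.0, "label": "neutral"},
     "emotion": {"anger": 0.140472, "joy": 0.124019, "sadness": 0.186927,
                 "fear": 0.320088, "disgust": 0.010946}},
]


def out_of_range(keyword):
    """List the fields of one keyword result that fall outside their expected range."""
    problems = []
    if not 0.0 <= keyword["relevance"] <= 1.0:
        problems.append("relevance")
    if not -1.0 <= keyword["sentiment"]["score"] <= 1.0:
        problems.append("sentiment")
    for name, score in sorted(keyword.get("emotion", {}).items()):
        if not 0.0 <= score <= 1.0:
            problems.append("emotion:" + name)
    return problems


for kw in fed_keywords:
    # An empty list means every score is within its expected range
    print("%s: %s" % (kw["text"], out_of_range(kw) or "ok"))
```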

### Step 4: Sentiment analysis of a press release from the Federal Reserve with target phrases

Analyze the sentiment toward specific target phrases in the press release at the URL:
- Define the URL to analyze
- Define the parameters for NLU
  * `Features.Sentiment` requests sentiment analysis
  * Set the target phrases to analyze: "economy" and "common equity capital"

```python
response = natural_language_understanding.analyze(
  url="https://www.federalreserve.gov/newsevents/pressreleases/bcreg20170622a.htm",
  features=[
    Features.Sentiment(
      # Report sentiment toward each of these target phrases
      targets=["economy", "common equity capital"]
    )
  ]
)

print(json.dumps(response, indent=2))
```

```json
{
  "usage": {
    "text_characters": 2998,
    "features": 1,
    "text_units": 1
  },
  "language": "en",
  "sentiment": {
    "document": {
      "score": -0.424257,
      "label": "negative"
    },
    "targets": [
      {
        "text": "economy",
        "score": -0.467679,
        "label": "negative"
      },
      {
        "text": "common equity capital",
        "score": -0.436162,
        "label": "negative"
      }
    ]
  },
  "retrieved_url": "https://www.federalreserve.gov/newsevents/pressreleases/bcreg20170622a.htm"
}
```


### Insights from output:
- **document**: Document-level sentiment. The press release as a whole is negative, with a score of -0.424257.
- **targets**: An array of per-target results. Each object contains the target text, its sentiment score, and a label. Both target phrases are scored as negative.
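To see whether a target phrase is discussed more or less negatively than the document overall, you can subtract the document score from each target score. A sketch using the Step 4 output as sample data; `target_vs_document` is a hypothetical helper, and reading the difference this way is an interpretation for illustration rather than something NLU reports.

```python
# Sample data copied from the Step 4 sentiment output above
step4_sentiment = {
    "document": {"score": -0.424257, "label": "negative"},
    "targets": [
        {"text": "economy", "score": -0.467679, "label": "negative"},
        {"text": "common equity capital", "score": -0.436162, "label": "negative"},
    ],
}


def target_vs_document(sent):
    """Return (target text, target score minus document score) pairs, most negative first."""
    doc_score = sent["document"]["score"]
    diffs = [(t["text"], t["score"] - doc_score) for t in sent.get("targets", [])]
    return sorted(diffs, key=lambda pair: pair[1])


for text, delta in target_vs_document(step4_sentiment):
    print("%s: %+.3f relative to the document" % (text, delta))
```

Here both targets come out slightly more negative than the document, with "economy" furthest below it.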

### Step 5: Analyzing a string of text

Analyze the sentiment toward target phrases in a passage of text:
- Define the text to analyze (the second paragraph of [this article](https://seekingalpha.com/article/4081708-june-fomc-announcement-rate-hike-balance-sheet-plans))
- Define the parameters for NLU
  * `Features.Sentiment` requests sentiment analysis
  * Set the target phrases to analyze: "FOMC" and "experts"
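In the code that follows, each backslash at the end of a line inside the `text` string joins the lines, but the indentation of the next line stays in the string, so the analyzed text contains runs of extra spaces (the Step 2 concepts example does the same). Adjacent string literals inside parentheses, which Python concatenates at compile time, avoid this. A minimal comparison on a shortened version of the passage:

```python
# Backslash continuation inside the literal: the next line's indentation is kept
continued = "There are still concerns at the FOMC \
  that the devaluation of our purchasing power is not occurring rapidly enough."

# Adjacent literals in parentheses are joined with no extra characters
concatenated = (
    "There are still concerns at the FOMC "
    "that the devaluation of our purchasing power is not occurring rapidly enough."
)

# The continued version carries the two indentation spaces as extra characters
print(len(continued) - len(concatenated))
print(repr(continued[32:45]))
```

Note that changing the style would also change the `text_characters` count reported in `usage`.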

```python
response = natural_language_understanding.analyze(
  text="There are still concerns at the FOMC (and in monetary officialdom in general) \
  that the devaluation of our purchasing power is not occurring rapidly enough. From its\
  own statistics, which exclude things most important to consumers such as food and energy, \
  price inflation dipped a bit to 1.7%. This, of course, is an utter outrage to the experts.",
  features=[
    Features.Sentiment(
      # Report sentiment toward each of these target phrases
      targets=["FOMC", "experts"]
    )
  ]
)

print(json.dumps(response, indent=2))
```

```json
{
  "usage": {
    "text_characters": 350,
    "features": 1,
    "text_units": 1
  },
  "language": "en",
  "sentiment": {
    "document": {
      "score": -0.885821,
      "label": "negative"
    },
    "targets": [
      {
        "text": "FOMC",
        "score": -0.667683,
        "label": "negative"
      },
      {
        "text": "experts",
        "score": -0.701179,
        "label": "negative"
      }
    ]
  }
}
```


### Insights from output:
- **document**: Document-level sentiment. The passage as a whole is negative, with a score of -0.885821.
- **targets**: Both target phrases are scored as negative in the passage.