# Project 3 - Using Natural Language Understanding and Sentiment

> <font color=red>**NOTE:** This Jupyter Notebook is a save-as of an existing IBM notebook from Scott D'Angelo's GitHub, with minor modifications by me. The cells are also out of chronological order: the IBM Watson NLU service limits the number of API calls, so I could not rerun the whole notebook to put the code back in order.</font>

In this portion of the workshop, we'll use an instance of [Watson Natural Language Understanding](https://cloud.ibm.com/catalog/services/natural-language-understanding) to gather insights into data.

Watson Natural Language Understanding is a cloud-native product that uses deep learning to extract metadata from text such as entities, keywords, categories, sentiment, emotion, relations, and syntax.
There is a rich [API](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python) that we will use along with the [Watson Python SDK](https://github.com/watson-developer-cloud/python-sdk) to analyze our data.

## Contents

- [1.0 Setup - install modules](#setup)
- [2.0 Test NLU APIs](#test)
- [3.0 Import Data and Setup Pandas Dataframe](#pandas)
- [4.0 Clean and Prepare Data for NLU scoring](#clean)
- [5.0 Get Sentiment by Row](#sentiment-row)

## 1.0 Setup - Install Modules<a name="setup"></a>

We use the [Watson Python SDK](https://github.com/watson-developer-cloud/python-sdk) to access the [NLU APIs](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python) programmatically.

Import the required Python modules from the Watson Python SDK.

In [1]:
```python
import json
from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson.natural_language_understanding_v1 import Features, CategoriesOptions, EmotionOptions, KeywordsOptions
```

In [94]:
```python
# Silence all warnings for the rest of the session
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np
from datetime import datetime
```

### 1.1 Add NLU credentials
Get the [IAM Authentication Key](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python#authentication) and [Service URL](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python#service-endpoint) that you obtained when you [Created a Watson NLU instance](https://github.ibm.com/IBMDeveloper/python-and-analytics/tree/addNLU/workshop/natural-language-understanding#create-a-watson-nlu-instance).

Add your [IAM Authentication Key](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python#authentication) below.

In [15]:
```python
IAM_KEY = 'your-iam-api-key'  # replace with your own key; do not publish a real key
```

Add your [NLU Service URL](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python#service-endpoint) below

In [16]:
```python
SERVICE_URL = 'https://api.us-south.natural-language-understanding.watson.cloud.ibm.com/instances/b3f729b2-7149-455c-902b-eea149f75e49'
```
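Hard-coding the key in a notebook means anyone who sees the notebook can use your NLU instance. A minimal sketch, assuming you export the key and URL as environment variables before starting Jupyter, reads them instead. `load_nlu_credentials`, `NLU_IAM_KEY`, and `NLU_SERVICE_URL` are hypothetical names, not part of the Watson SDK.

```python
import os


def load_nlu_credentials(key_var="NLU_IAM_KEY", url_var="NLU_SERVICE_URL"):
    """Return (iam_key, service_url) read from environment variables.

    The variable names are hypothetical defaults; pass whatever names you export.
    Raises KeyError listing every variable that is missing or empty.
    """
    missing = [name for name in (key_var, url_var) if not os.environ.get(name)]
    if missing:
        raise KeyError(f"Set the environment variable(s): {', '.join(missing)}")
    return os.environ[key_var], os.environ[url_var]
```

With both variables exported in your shell, `IAM_KEY, SERVICE_URL = load_nlu_credentials()` can stand in for the two hard-coded cells, and the notebook can be shared without exposing the key.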

## 2.0 Test NLU APIs <a name="test"></a>
Run a quick check to make sure everything is working. We'll use a [basic web page](https://www.ibm.com) to see how Watson Natural Language Understanding can extract categories when given a URL. [This example](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python#categories) comes from the Watson NLU documentation.

In [17]:
```python
authenticator = IAMAuthenticator(IAM_KEY)
natural_language_understanding = NaturalLanguageUnderstandingV1(version='2020-08-01', authenticator=authenticator)

natural_language_understanding.set_service_url(SERVICE_URL)

# Ask NLU for the top three categories of the IBM home page
response = natural_language_understanding.analyze(
    url='www.ibm.com',
    features=Features(categories=CategoriesOptions(limit=3))).get_result()

print(json.dumps(response, indent=2))
```

```json
{
  "usage": {
    "text_units": 1,
    "text_characters": 1358,
    "features": 1
  },
  "retrieved_url": "https://www.ibm.com/us-en/?ar=1",
  "language": "en",
  "categories": [
    {
      "score": 0.927228,
      "label": "/technology and computing/operating systems"
    },
    {
      "score": 0.907285,
      "label": "/business and industrial/business software"
    },
    {
      "score": 0.889748,
      "label": "/technology and computing/hardware/computer"
    }
  ]
}
```
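After `get_result()`, the response is an ordinary Python dict, so the categories can be post-processed without the SDK. The hypothetical helper below, a sketch rather than anything from the Watson API, picks the highest-scoring category and splits its hierarchical label into levels. The sample dict is a trimmed copy of the output printed above.

```python
def top_category(response):
    """Return (label, score) of the highest-scoring category in an NLU response.

    Returns None when the response has no "categories" entry.
    """
    categories = response.get("categories", [])
    if not categories:
        return None
    best = max(categories, key=lambda c: c["score"])
    return best["label"], best["score"]


# Trimmed copy of the response printed above
sample_response = {
    "language": "en",
    "categories": [
        {"score": 0.927228, "label": "/technology and computing/operating systems"},
        {"score": 0.907285, "label": "/business and industrial/business software"},
        {"score": 0.889748, "label": "/technology and computing/hardware/computer"},
    ],
}

label, score = top_category(sample_response)
# Category labels are slash-separated paths; split them to get each level
levels = label.strip("/").split("/")
print(levels, score)
```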


## 3.0 Import Data and Setup Pandas Dataframe<a name="pandas"></a>

In [5]:
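```python
# Path to the author's local copy of ibm_watson.csv; change it to where your copy lives
df = pd.read_csv('/Users/macbook/Google Drive/0. Ofilispeaks Business (Mac and Cloud)/9. Data Science/0. Python/General Assembly Training/Project 3/data/ibm_watson.csv')
df.head(5)
```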

| Unnamed: 0 | subreddit | title | status_char_length | status_word_count |
|---:|---|---|---:|---:|
| 0 | ProCreate | Giving this a go! | 17 | 4 |
| 1 | ProCreate | Recently got an iPad and have never done digit... | 109 | 21 |
| 2 | ProCreate | Occasionally can't draw in specific spots? | 42 | 6 |
| 3 | ProCreate | Day 1 • 365 challenge | 21 | 5 |
| 4 | ProCreate | First finished painting in procreate! Trying f... | 71 | 11 |
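The notebook does not show how `status_char_length` and `status_word_count` were built. The untruncated titles in the table are consistent with counting characters with `len()` and words by splitting on whitespace, so the sketch below assumes that; `title_features` is a hypothetical helper, not necessarily the author's method.

```python
def title_features(title):
    """Return (char_length, word_count) for a post title.

    Assumption: characters are counted with len() and words by splitting
    on whitespace, which matches the sample rows shown in the table.
    """
    return len(title), len(title.split())


# Titles copied from the table above (only the ones that are not truncated)
for title in ["Giving this a go!",
              "Occasionally can't draw in specific spots?",
              "Day 1 • 365 challenge"]:
    print(title_features(title), title)
```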


## 4.0 Clean and Prepare Data for NLU scoring<a name="clean"></a>

Next, let's find text that Watson NLU can score for sentiment. The `title` column holds the title of each Reddit post, and we keep only titles with more than five words.

In [20]:
```python
df[df['status_word_count'] > 5]['subreddit'].value_counts()
```

```
AdobeIllustrator    3595
ProCreate           3443
Name: subreddit, dtype: int64
```
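The mask `df['status_word_count'] > 5` keeps rows whose count is strictly greater than five, and `value_counts()` then tallies the surviving rows per subreddit. The same two steps in plain Python, applied to the five sample rows from `df.head(5)`, make the strict inequality visible: the five-word title "Day 1 • 365 challenge" is dropped. This is only an illustration of the logic, not a replacement for the pandas call.

```python
from collections import Counter

# (subreddit, status_word_count) pairs copied from df.head(5) above
rows = [
    ("ProCreate", 4),
    ("ProCreate", 21),
    ("ProCreate", 6),
    ("ProCreate", 5),
    ("ProCreate", 11),
]

# Equivalent of df[df['status_word_count'] > 5]: keep rows with more than five words
kept = [(sub, words) for sub, words in rows if words > 5]

# Equivalent of .value_counts() on the 'subreddit' column
counts = Counter(sub for sub, _ in kept)
print(counts.most_common())
```

The three kept rows are the same three titles that open `df_five` after the filter.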

In [25]:
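```python
# .copy() makes df_five an independent DataFrame, so later column
# assignments do not trigger pandas' SettingWithCopyWarning
df_five = df[df['status_word_count'] > 5].copy()
```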

In [26]:
```python
df_five.reset_index(inplace=True, drop=True)
```

In [27]:
```python
df_five.head(1357)
```

| Unnamed: 0 | subreddit | title | status_char_length | status_word_count |
|---:|---|---|---:|---:|
| 0 | ProCreate | Recently got an iPad and have never done digit... | 109 | 21 |
| 1 | ProCreate | Occasionally can't draw in specific spots? | 42 | 6 |
| 2 | ProCreate | First finished painting in procreate! Trying f... | 71 | 11 |
| 3 | ProCreate | I just bought an ipad, and downloaded Procreat... | 106 | 19 |
| 4 | ProCreate | First piece with Procreate– constructive feedb... | 58 | 7 |


In [28]:
```python
print(df_five['title'][0])
```

```
Recently got an iPad and have never done digital art before. Not perfect, but I think it’s ok for a beginner.
```


That is the kind of text we want to analyze. Next, we'll call Watson NLU on each title and store the raw results in a list called `responses`. We'll also fill a list called `normalize` with each response's keywords flattened by [json_normalize()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html), so that nested fields such as `sentiment.score` become ordinary columns.
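To see what `json_normalize()` does to a keyword entry, here is a simplified stand-in written with the standard library: nested dicts are collapsed into one level with dot-separated keys, which is where column names like `sentiment.score` come from. `flatten` is a hypothetical helper that ignores the list handling pandas offers, and the keyword values are invented for illustration; only the field layout (`text`, `relevance`, `count`, `sentiment`, `emotion`) follows the NLU output seen in this notebook.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level with dotted keys.

    A simplified analogue of what pandas.json_normalize does to each record;
    lists are left as-is rather than expanded into rows.
    """
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


# Illustrative keyword entry; the numbers are made up for the example
keyword = {
    "text": "digital art",
    "relevance": 0.9,
    "count": 1,
    "sentiment": {"score": -0.7, "label": "negative"},
    "emotion": {"sadness": 0.1, "joy": 0.2, "fear": 0.05, "disgust": 0.03, "anger": 0.04},
}
print(list(flatten(keyword)))
```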

## 5.0 Get Sentiment by Row<a name="sentiment-row"></a>
Now, let's derive sentiment and emotion information on a per-row basis, to provide more granularity.
The number of API calls that you can make to Watson NLU is [rate limited and dependent on your service plan](https://cloud.ibm.com/catalog/services/natural-language-understanding).
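Because calls are rate limited, a loop over thousands of titles may be throttled partway through. One common mitigation, sketched here as an assumption rather than anything the SDK provides, is to retry a failed call after an exponentially growing pause. `call_with_backoff` and its parameters are hypothetical names; `sleep` is injectable so the logic can be tested without waiting.

```python
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0,
                      should_retry=lambda exc: True, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff when it raises.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last exception once max_attempts calls have failed or
    should_retry(exc) returns False.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts - 1 or not should_retry(exc):
                raise
            sleep(base_delay * 2 ** attempt)
```

In the loop below you might wrap the request as `call_with_backoff(lambda: natural_language_understanding.analyze(...).get_result(), should_retry=lambda e: getattr(e, 'code', None) == 429)`, assuming the SDK's exception exposes the HTTP status as `code` (as `ibm_cloud_sdk_core.ApiException` does) and that throttling is reported as HTTP 429. A retry only smooths over short-term throttling; it does not increase the number of calls your plan allows.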

In [38]:
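```python
responses = []
normalize = []
for _, row in df_five.iterrows():
    # Ask NLU for the single most relevant keyword in each title,
    # with sentiment and emotion scores for that keyword
    response = natural_language_understanding.analyze(
        text=row['title'],
        language='en',
        features=Features(keywords=KeywordsOptions(sentiment=True, emotion=True, limit=1))
    ).get_result()
    # Flatten the nested keyword fields (e.g. sentiment.score) into columns
    normalize.append(pd.json_normalize(response['keywords']))
    responses.append(response)
```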

Add the `responses` and `normalize` lists to the `df_five` DataFrame as new columns. We can keep working with these new features directly, but more often we'll derive new DataFrames for our experiments and modify those instead.
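One way to derive such a new DataFrame is to reduce each stored response to one flat record. The sketch below assumes the request used `KeywordsOptions(sentiment=True, emotion=True, limit=1)` as in the loop above; `keyword_row` is a hypothetical helper, and the output column names simply mirror the ones that appear in the tables below, since the notebook does not show how the author built them.

```python
EMOTIONS = ("anger", "sadness", "joy", "fear", "disgust")


def keyword_row(response):
    """Flatten the top keyword of one NLU response into a single flat dict.

    Assumes the request used KeywordsOptions(sentiment=True, emotion=True,
    limit=1). If NLU found no keyword, the numeric fields default to 0.0.
    """
    keywords = response.get("keywords") or []
    if not keywords:
        row = {"keyword": None, "relevance_score": 0.0, "sentiment_score": 0.0}
        row.update({name: 0.0 for name in EMOTIONS})
        return row
    top = keywords[0]
    emotion = top.get("emotion", {})
    row = {
        "keyword": top.get("text"),
        "relevance_score": top.get("relevance", 0.0),
        "sentiment_score": top.get("sentiment", {}).get("score", 0.0),
    }
    row.update({name: emotion.get(name, 0.0) for name in EMOTIONS})
    return row
```

`pd.DataFrame([keyword_row(r) for r in responses])` would then give one row per title. Defaulting to 0.0 makes "no keyword found" look like neutral sentiment; using `None` (NaN in pandas) instead keeps the two cases apart if that distinction matters for your analysis.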

In [95]:
```python
df_five['response'] = responses
df_five.head()
```

| Unnamed: 0 | subreddit | title | status_char_length | status_word_count | response | normalized | anger | sentiment.score | sadness | joy | fear | disgust | keyword | relevance_score | sentiment_score |
|---:|---|---|---:|---:|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | ProCreate | Recently got an iPad and have never done digit... | 109 | 21 | {'usage': {'text_units': 1, 'text_characters':... | text relevance count sentiment.sc... | 0 | -0.701735 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | ProCreate | Occasionally can't draw in specific spots? | 42 | 6 | {'usage': {'text_units': 1, 'text_characters':... | text relevance count sentiment... | 0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | ProCreate | First finished painting in procreate! Trying f... | 71 | 11 | {'usage': {'text_units': 1, 'text_characters':... | text relevance count sentiment.score ... | 0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | ProCreate | I just bought an ipad, and downloaded Procreat... | 106 | 19 | {'usage': {'text_units': 1, 'text_characters':... | text relevance count sentiment.sco... | 0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | ProCreate | First piece with Procreate– constructive feedb... | 58 | 7 | {'usage': {'text_units': 1, 'text_characters':... | text relevance count sentiment.sc... | 0 | 0.932697 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


In [96]:
```python
df_five['normalized'] = normalize
df_five.head()
```

| Unnamed: 0 | subreddit | title | status_char_length | status_word_count | response | normalized | anger | sentiment.score | sadness | joy | fear | disgust | keyword | relevance_score | sentiment_score |
|---:|---|---|---:|---:|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | ProCreate | Recently got an iPad and have never done digit... | 109 | 21 | {'usage': {'text_units': 1, 'text_characters':... | text relevance count sentiment.sc... | 0 | -0.701735 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | ProCreate | Occasionally can't draw in specific spots? | 42 | 6 | {'usage': {'text_units': 1, 'text_characters':... | text relevance count sentiment... | 0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | ProCreate | First finished painting in procreate! Trying f... | 71 | 11 | {'usage': {'text_units': 1, 'text_characters':... | text relevance count sentiment.score ... | 0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | ProCreate | I just bought an ipad, and downloaded Procreat... | 106 | 19 | {'usage': {'text_units': 1, 'text_characters':... | text relevance count sentiment.sco... | 0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | ProCreate | First piece with Procreate– constructive feedb... | 58 | 7 | {'usage': {'text_units': 1, 'text_characters':... | text relevance count sentiment.sc... | 0 | 0.932697 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


In [44]:
```python
# Write the filtered DataFrame, including the NLU results, to a CSV file
df_five.to_csv('/data/2_Cleaned_IBM_Data/cleaned_ibm.csv', index=False)
print('Submission CSV is ready!')
```

```
Submission CSV is ready!
```
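Since the API limits meant this notebook could not be rerun end to end, caching raw NLU results on disk is one way to make reruns cheap. A minimal sketch, assuming results are keyed by the exact title text: `cached_analyze` and the cache file name are hypothetical, and `analyze` is any function that takes a string and returns a JSON-serializable dict.

```python
import json
from pathlib import Path


def cached_analyze(texts, analyze, cache_path):
    """Return one result per text, calling analyze() only for uncached texts.

    Results are appended to a JSON Lines file keyed by the exact text, so a
    rerun reuses earlier responses instead of spending new API calls.
    """
    path = Path(cache_path)
    cache = {}
    if path.exists():
        with path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    entry = json.loads(line)
                    cache[entry["text"]] = entry["result"]
    with path.open("a", encoding="utf-8") as f:
        for text in texts:
            if text not in cache:
                cache[text] = analyze(text)
                f.write(json.dumps({"text": text, "result": cache[text]}) + "\n")
    return [cache[text] for text in texts]
```

For NLU, `analyze` could be `lambda t: natural_language_understanding.analyze(text=t, language='en', features=Features(keywords=KeywordsOptions(sentiment=True, emotion=True, limit=1))).get_result()`, and `cached_analyze(df_five['title'].tolist(), analyze, 'nlu_cache.jsonl')` would replace the loop in section 5.0.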


"I am feeling 😕 today"