# Prolexitim Analytics (NLP)
## IBM Watson Natural Language Understanding (NLU) API Client
### NLU Sentiment and Emotion from Prolexitim NLP experiments. 
<hr>
May 2019.<br> Prolexitim dataset version 1.0 (MPGS-TFM-Submission).<br> 
Raúl Arrabales Moreno (Psicobótica / Serendeepia Research)<br>
<a target="_blank" href="http://www.conscious-robots.com/">http://www.conscious-robots.com/</a> <br>
<hr>

### Using the IBM Watson API Python client 
Prolexitim NLP experiments require automatic sentiment and emotion tagging.<br> 
This client uses the IBM NLU API to tag documents from the Prolexitim NLP narratives<br> 
(written by participants in the wave 1 experiments, using TAT cards as stimuli). 
<br>
More info at: https://github.com/watson-developer-cloud/python-sdk <br>



### IBM Watson Python API client install:

In [4]:
! pip install --upgrade "ibm-watson>=3.0.3"

Collecting ibm-watson>=3.0.3
  Downloading https://files.pythonhosted.org/packages/af/82/a0a8555e37c6822bc63b6170d6f30d3681c3ec987b3ab9b8f83f9b3887a3/ibm-watson-3.0.4.tar.gz (246kB)
Collecting websocket-client==0.48.0 (from ibm-watson>=3.0.3)
  Downloading https://files.pythonhosted.org/packages/8a/a1/72ef9aa26cfe1a75cee09fc1957e4723add9de098c15719416a1ee89386b/websocket_client-0.48.0-py2.py3-none-any.whl (198kB)
Collecting ibm_cloud_sdk_core>=0.2.0 (from ibm-watson>=3.0.3)
  Downloading https://files.pythonhosted.org/packages/50/61/2c197bf3898f3ef74b22adc595a15f315d8c45c5ca7586dbe93b8b947825/ibm-cloud-sdk-core-0.4.2.tar.gz
Building wheels for collected packages: ibm-watson, ibm-cloud-sdk-core
  Building wheel for ibm-watson (setup.py): started
  Building wheel for ibm-watson (setup.py): finished with status 'done'
  Stored in directory: C:\Users\array\AppData\Local\pip\Cache\wheels\b0\b4\42\15563549063278065c74f6fabcd9eae12666230edebbac9a66
  Building wheel for ibm-cloud-sdk-core (set

### NLU Service credentials (Raúl - Private):

In [1]:
'''
{
  "XXXXXXXXXXXXXXXXXXXXXXXXX
  "url": "https://gateway-lon.watsonplatform.net/natural-language-understanding/api"
}
'''

'\n{\n  "XXXXXXXXXXXXXXXXXXXXXXXXX\n  "url": "https://gateway-lon.watsonplatform.net/natural-language-understanding/api"\n}\n'

### IAM Authentication

In [12]:
from ibm_watson import NaturalLanguageUnderstandingV1

natural_language_understanding = NaturalLanguageUnderstandingV1(
    version='2018-03-16', # Not sure about version; IBM examples use this: "2018-03-16"
    iam_apikey='XXXXXXXXXXXXXXXXXXXXXXXXX',  # redacted (private); supply your own IAM API key
    url='https://gateway-lon.watsonplatform.net/natural-language-understanding/api'
)
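
One way to keep the private API key out of the notebook is to read it from the environment. The following is a minimal sketch, not the author's setup: the variable names `NLU_APIKEY` and `NLU_URL` and the helper `load_nlu_credentials` are hypothetical, and the default URL is simply the one used in this notebook.

```python
import os


def load_nlu_credentials(env=None):
    """Return (apikey, url) read from an environment-style mapping.

    NLU_APIKEY and NLU_URL are hypothetical variable names chosen for this sketch.
    """
    env = os.environ if env is None else env
    apikey = env.get("NLU_APIKEY")
    if not apikey:
        raise RuntimeError("Set NLU_APIKEY before creating the NLU client")
    # Fall back to the London endpoint used elsewhere in this notebook
    url = env.get(
        "NLU_URL",
        "https://gateway-lon.watsonplatform.net/natural-language-understanding/api",
    )
    return apikey, url
```

The returned pair could then be passed as `iam_apikey=apikey, url=url` when constructing `NaturalLanguageUnderstandingV1`.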

### NLU API - Emotions and Sentiment.
See https://cloud.ibm.com/apidocs/natural-language-understanding 

In [17]:
import json
from ibm_watson.natural_language_understanding_v1 import Features, EmotionOptions, SentimentOptions

### Making one test call and checking response

In [26]:
# Making the call for Emotion and Sentiment
response = natural_language_understanding.analyze(
    text='The man gets up to work, while his lover stays asleep in bed',
    features=Features(emotion=EmotionOptions(),
                      sentiment=SentimentOptions())).get_result()

In [27]:
# Check the result:
print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 60,
    "features": 2
  },
  "sentiment": {
    "document": {
      "score": 0.37321,
      "label": "positive"
    }
  },
  "language": "en",
  "emotion": {
    "document": {
      "emotion": {
        "sadness": 0.216717,
        "joy": 0.290798,
        "fear": 0.349397,
        "disgust": 0.044799,
        "anger": 0.229808
      }
    }
  }
}


In [217]:
# Getting the specific variables from JSON response
response["sentiment"]["document"]["score"]

0.37321

In [218]:
response["sentiment"]["document"]["label"]

'positive'

In [221]:
response["usage"]["text_characters"]

60

In [224]:
response["emotion"]["document"]["emotion"]["sadness"]

0.216717

In [225]:
response["emotion"]["document"]["emotion"]["joy"]

0.290798

In [226]:
response["emotion"]["document"]["emotion"]["fear"]

0.349397

In [228]:
response["emotion"]["document"]["emotion"]["disgust"]

0.044799

In [229]:
response["emotion"]["document"]["emotion"]["anger"]

0.229808
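
The individual lookups above can be gathered into one helper that returns a flat dictionary. This is only a sketch: `flatten_nlu_response` is a hypothetical helper name, the keys follow the `nlu-*` column names that this notebook adds to the dataframe later, and `sample` repeats the test response printed above.

```python
EMOTIONS = ("joy", "anger", "disgust", "sadness", "fear")


def flatten_nlu_response(response):
    """Map an NLU analyze() result to flat 'nlu-*' keys."""
    sentiment = response["sentiment"]["document"]
    emotion = response["emotion"]["document"]["emotion"]
    flat = {
        "nlu-sentiment": sentiment["score"],
        "nlu-label": sentiment["label"],
    }
    for name in EMOTIONS:
        flat["nlu-" + name] = emotion[name]
    return flat


# The test response shown above, reduced to the fields that are used
sample = {
    "sentiment": {"document": {"score": 0.37321, "label": "positive"}},
    "emotion": {"document": {"emotion": {
        "sadness": 0.216717, "joy": 0.290798, "fear": 0.349397,
        "disgust": 0.044799, "anger": 0.229808}}},
}

flat = flatten_nlu_response(sample)
print(flat)
```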

## Loading the dataset 
### Prolexitim Dataset 1.0 (May 2019)
This step includes a dataframe schema check, type fixes, and new columns for emotion and sentiment.

In [230]:
# Local file names:
nlp_dataset_path = "D:\\Dropbox-Array2001\\Dropbox\\UNI\\MPGS\\2_TFM\\Datos\\prolexitim-nlp-1.0.csv"
tas_dataset_path = "D:\\Dropbox-Array2001\\Dropbox\\UNI\\MPGS\\2_TFM\\Datos\\prolexitim-tas-1.0.csv"
print(nlp_dataset_path)
print(tas_dataset_path)

D:\Dropbox-Array2001\Dropbox\UNI\MPGS\2_TFM\Datos\prolexitim-nlp-1.0.csv
D:\Dropbox-Array2001\Dropbox\UNI\MPGS\2_TFM\Datos\prolexitim-tas-1.0.csv


In [231]:
# Load the CSVs into pandas dataframes (files were saved with UTF-8 encoding)
import pandas as pd 

nlp_df = pd.read_csv(nlp_dataset_path,header=0,delimiter=";") 
tas_df = pd.read_csv(tas_dataset_path,header=0,delimiter=";") 

In [232]:
# Check NLP Dataset
nlp_df.head()

Unnamed: 0,code,card,hum,mode,time,G-score,G-magnitude,Azure-TA,Text,Text-EN
0,b7adde8a9eec8ce92b5ee0507ce054a4,13V,1,T,200000,-2,2,62,Era un niño pensando en el granero pensando a ...,It was a child sitting in the barn and thinkin...
1,b7adde8a9eec8ce92b5ee0507ce054a4,18NM,2,T,200000,-5,5,41,"Una madre que está consolando a su hijo, despu...","A mother who is comforting her son, after givi..."
2,b7adde8a9eec8ce92b5ee0507ce054a4,12VN,0,T,200000,0,12,63,Un pantanal con una barca abandonada. A ver qu...,A swam with an abandoned boat. Let's see what ...
3,76ef63369f7d5b6597a543017e1ef578,12VN,0,T,200000,0,1,89,"Era un paraje muy bonito, con una barca, un po...","It was a beautiful place, with a boat, a littl..."
4,76ef63369f7d5b6597a543017e1ef578,10,2,T,200000,3,1,24,"Era una vez un matrimonio, que se quería muchí...","It was once a husband and wife, who loved each..."


In [233]:
nlp_df.dtypes

code           object
card           object
hum             int64
mode           object
time            int64
G-score        object
G-magnitude    object
Azure-TA       object
Text           object
Text-EN        object
dtype: object

In [234]:
# Check TAS-20 Dataset
tas_df.head()

Unnamed: 0,NLP,Code,TAS20,F1,F2,F3,Tas20Time,Sex,Gender,Age,Dhand,Studies,SClass,Siblings,SibPos,Origin,Resid,Rtime,Ethnic,Job
0,0,be8f0c722d0a0f4cd9d92c503e6f7583,42,16,10,16,254305,1,1,21,1,5,2,6,2,ES,ES,-1,Iberic,Psychology
1,1,608af5455da8c250a87f81a5ed5c1942,55,15,20,15,103425,1,1,42,2,7,2,5,5,ES,ES,-1,Iberic,Psychology
2,1,bc39e22ca5dba59fbd97c27987878f56,40,16,9,15,201637,2,2,22,1,5,2,2,2,ES,ES,-1,Iberic,Psychology
3,0,a2caa2eaccf99705bf39f6aeaee00ee3,40,13,10,17,242202,2,2,22,1,5,2,3,1,ES,ES,-1,Iberic,Psychology
4,1,20cd825cadb95a71763bad06e142c148,40,12,10,18,155945,2,2,22,1,5,2,1,1,ES,ES,-1,Iberic,Psychology


In [235]:
tas_df.dtypes

NLP           int64
Code         object
TAS20         int64
F1            int64
F2            int64
F3            int64
Tas20Time     int64
Sex           int64
Gender        int64
Age           int64
Dhand         int64
Studies       int64
SClass        int64
Siblings      int64
SibPos        int64
Origin       object
Resid        object
Rtime         int64
Ethnic       object
Job          object
dtype: object

In [236]:
# Remove empty column
# nlp_df = nlp_df.drop(columns="Unnamed: 10");
# nlp_df.head()

In [237]:
# NLP dataset schema fixing
# Convert strings that use a decimal comma into floats
# (regex=False: the comma is replaced as a literal character)
nlp_df['G-score'] = nlp_df['G-score'].str.replace(',', '.', regex=False)
nlp_df['G-score'] = nlp_df['G-score'].astype(float)

nlp_df['G-magnitude'] = nlp_df['G-magnitude'].str.replace(',', '.', regex=False)
nlp_df['G-magnitude'] = nlp_df['G-magnitude'].astype(float)

nlp_df['Azure-TA'] = nlp_df['Azure-TA'].str.replace(',', '.', regex=False)
nlp_df['Azure-TA'] = nlp_df['Azure-TA'].astype(float)


In [238]:
# check schema again
nlp_df.dtypes

code            object
card            object
hum              int64
mode            object
time             int64
G-score        float64
G-magnitude    float64
Azure-TA       float64
Text            object
Text-EN         object
dtype: object
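
As an alternative to replacing commas after loading, `pd.read_csv` accepts a `decimal` argument that parses decimal-comma numbers directly. The sketch below uses a small in-memory file with made-up rows (only the column names come from the dataset); it has not been run against the Prolexitim CSVs themselves.

```python
import io

import pandas as pd

# Toy semicolon-delimited data with decimal commas (illustrative values only)
raw = io.StringIO(
    "code;G-score;G-magnitude;Azure-TA\n"
    "a;-0,2;0,2;0,62\n"
    "b;-0,5;0,5;0,41\n"
)

# decimal="," tells the parser to treat the comma as the decimal separator
demo_df = pd.read_csv(raw, header=0, delimiter=";", decimal=",")
print(demo_df.dtypes)
```

With this option the three score columns should load as `float64` without a separate conversion step.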

In [239]:
# Add columns to nlp table for new data from IBM Watson NLU
nlp_df.insert(loc=len(nlp_df.columns), column='nlu-sentiment', value=0.0)
nlp_df.insert(loc=len(nlp_df.columns), column='nlu-label', value="")
nlp_df.insert(loc=len(nlp_df.columns), column='nlu-joy', value=0.0)
nlp_df.insert(loc=len(nlp_df.columns), column='nlu-anger', value=0.0)
nlp_df.insert(loc=len(nlp_df.columns), column='nlu-disgust', value=0.0)
nlp_df.insert(loc=len(nlp_df.columns), column='nlu-sadness', value=0.0)
nlp_df.insert(loc=len(nlp_df.columns), column='nlu-fear', value=0.0)

# Add columns to nlp table for new data from text processing
nlp_df.insert(loc=len(nlp_df.columns), column='es-len', value=0)
nlp_df.insert(loc=len(nlp_df.columns), column='en-len', value=0)


In [240]:
# Adding a unique row id (1-based); len() counts every row, including rows with a missing code
import numpy as np
nlp_df.insert(loc=0, column="RowId", value=np.arange(1, len(nlp_df) + 1))

In [241]:
nlp_df.head()

Unnamed: 0,RowId,code,card,hum,mode,time,G-score,G-magnitude,Azure-TA,Text,Text-EN,nlu-sentiment,nlu-label,nlu-joy,nlu-anger,nlu-disgust,nlu-sadness,nlu-fear,es-len,en-len
0,1,b7adde8a9eec8ce92b5ee0507ce054a4,13V,1,T,200000,-0.2,0.2,0.62,Era un niño pensando en el granero pensando a ...,It was a child sitting in the barn and thinkin...,0.0,,0.0,0.0,0.0,0.0,0.0,0,0
1,2,b7adde8a9eec8ce92b5ee0507ce054a4,18NM,2,T,200000,-0.5,0.5,0.41,"Una madre que está consolando a su hijo, despu...","A mother who is comforting her son, after givi...",0.0,,0.0,0.0,0.0,0.0,0.0,0,0
2,3,b7adde8a9eec8ce92b5ee0507ce054a4,12VN,0,T,200000,0.0,1.2,0.63,Un pantanal con una barca abandonada. A ver qu...,A swam with an abandoned boat. Let's see what ...,0.0,,0.0,0.0,0.0,0.0,0.0,0,0
3,4,76ef63369f7d5b6597a543017e1ef578,12VN,0,T,200000,0.0,0.1,0.89,"Era un paraje muy bonito, con una barca, un po...","It was a beautiful place, with a boat, a littl...",0.0,,0.0,0.0,0.0,0.0,0.0,0,0
4,5,76ef63369f7d5b6597a543017e1ef578,10,2,T,200000,0.3,0.1,0.24,"Era una vez un matrimonio, que se quería muchí...","It was once a husband and wife, who loved each...",0.0,,0.0,0.0,0.0,0.0,0.0,0,0


### Getting NLU estimates for sentiment and emotions
The estimates are added to the dataframe,<br>
along with simple text-length metrics.

In [261]:
# Process only the last 67 rows; swap in the commented line to process the whole dataframe
for index, row in nlp_df.tail(n=67).iterrows():
# for index, row in nlp_df.iterrows():
    # print(index)
    # print(row["Text-EN"])
    
    # Get the text in English and Spanish
    doc_en = row["Text-EN"]
    doc_es = row["Text"]
    
    
    # Get the NLU analysis from Watson (texts of 12 characters or fewer are skipped)
    if len(doc_en) > 12:
        response = natural_language_understanding.analyze(
            text = doc_en,
            features = Features(emotion=EmotionOptions(),
                                sentiment=SentimentOptions())).get_result()
        
        # Save the Watson NLU results
        nlp_df.at[index, "nlu-sentiment"] = response["sentiment"]["document"]["score"]
        nlp_df.at[index, "nlu-label"] = response["sentiment"]["document"]["label"] 
        nlp_df.at[index, "nlu-joy"] = response["emotion"]["document"]["emotion"]["joy"]
        nlp_df.at[index, "nlu-anger"] = response["emotion"]["document"]["emotion"]["anger"]
        nlp_df.at[index, "nlu-disgust"] = response["emotion"]["document"]["emotion"]["disgust"]
        nlp_df.at[index, "nlu-fear"] = response["emotion"]["document"]["emotion"]["fear"]
        nlp_df.at[index, "nlu-sadness"] = response["emotion"]["document"]["emotion"]["sadness"]
    
        # Save the text length
        nlp_df.at[index, "es-len"] = len(doc_es) 
        nlp_df.at[index, "en-len"] = len(doc_en)
    else:
        print("Text too short: " + doc_en)
        
    print("Row " + str(index) + " processed.")

Row 267 processed.
Row 268 processed.
Row 269 processed.
Row 270 processed.
Row 271 processed.
Row 272 processed.
Row 273 processed.
Row 274 processed.
Row 275 processed.
Row 276 processed.
Row 277 processed.
Row 278 processed.
Row 279 processed.
Row 280 processed.
Row 281 processed.
Row 282 processed.
Row 283 processed.
Row 284 processed.
Row 285 processed.
Row 286 processed.
Row 287 processed.
Row 288 processed.
Row 289 processed.
Row 290 processed.
Row 291 processed.
Row 292 processed.
Row 293 processed.
Row 294 processed.
Row 295 processed.
Row 296 processed.
Row 297 processed.
Row 298 processed.
Row 299 processed.
Row 300 processed.
Row 301 processed.
Row 302 processed.
Row 303 processed.
Row 304 processed.
Row 305 processed.
Row 306 processed.
Row 307 processed.
Row 308 processed.
Row 309 processed.
Row 310 processed.
Row 311 processed.
Row 312 processed.
Row 313 processed.
Row 314 processed.
Row 315 processed.
Row 316 processed.
Row 317 processed.
Row 318 processed.
Row 319 proc
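
A batch like the one above can stop midway if a single API call fails or a text cell is empty. The sketch below shows one possible way to make the loop more tolerant; it is not the code that produced these results. `annotate_texts` and `fake_analyze` are hypothetical names, and the analyzer is passed in as a plain callable so the logic can be tried without calling Watson.

```python
def annotate_texts(texts, analyze, min_length=12):
    """Call analyze(text) for each usable text and collect results by position.

    `analyze` is any callable taking a string and returning a dict; in this
    notebook it would wrap natural_language_understanding.analyze(...).
    Returns (results, skipped), where skipped maps position -> reason.
    """
    results, skipped = {}, {}
    for position, text in enumerate(texts):
        # Missing cells arrive as NaN (a float), so check the type first
        if not isinstance(text, str) or len(text) <= min_length:
            skipped[position] = "missing or too short"
            continue
        try:
            results[position] = analyze(text)
        except Exception as error:  # keep going; failed rows can be retried later
            skipped[position] = "failed: " + str(error)
    return results, skipped


def fake_analyze(text):
    """Stand-in for the Watson call, used only to exercise the loop."""
    if "boom" in text:
        raise ValueError("simulated API error")
    return {"length": len(text)}


results, skipped = annotate_texts(
    ["A mother who is comforting her son", "short", float("nan"), "boom boom boom boom"],
    fake_analyze,
)
print(results)
print(skipped)
```

Collecting the skipped positions makes it possible to retry only those rows instead of re-running a slice of the dataframe by hand.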

In [263]:
nlp_df

Unnamed: 0,RowId,code,card,hum,mode,time,G-score,G-magnitude,Azure-TA,Text,Text-EN,nlu-sentiment,nlu-label,nlu-joy,nlu-anger,nlu-disgust,nlu-sadness,nlu-fear,es-len,en-len
0,1,b7adde8a9eec8ce92b5ee0507ce054a4,13V,1,T,200000,-0.2,0.2,0.62,Era un niño pensando en el granero pensando a ...,It was a child sitting in the barn and thinkin...,-0.640157,negative,0.317920,0.143086,0.422023,0.173421,0.098997,115,124
1,2,b7adde8a9eec8ce92b5ee0507ce054a4,18NM,2,T,200000,-0.5,0.5,0.41,"Una madre que está consolando a su hijo, despu...","A mother who is comforting her son, after givi...",0.000000,neutral,0.285100,0.168727,0.057098,0.362623,0.109176,110,115
2,3,b7adde8a9eec8ce92b5ee0507ce054a4,12VN,0,T,200000,0.0,1.2,0.63,Un pantanal con una barca abandonada. A ver qu...,A swam with an abandoned boat. Let's see what ...,0.265769,positive,0.039779,0.205065,0.244164,0.164005,0.481812,93,96
3,4,76ef63369f7d5b6597a543017e1ef578,12VN,0,T,200000,0.0,0.1,0.89,"Era un paraje muy bonito, con una barca, un po...","It was a beautiful place, with a boat, a littl...",-0.353556,negative,0.208997,0.007244,0.008434,0.698307,0.190991,255,244
4,5,76ef63369f7d5b6597a543017e1ef578,10,2,T,200000,0.3,0.1,0.24,"Era una vez un matrimonio, que se quería muchí...","It was once a husband and wife, who loved each...",-0.552068,negative,0.367801,0.063256,0.095947,0.469062,0.103351,184,184
5,6,76ef63369f7d5b6597a543017e1ef578,1,1,T,200000,-0.3,0.9,0.34,Erase una vez un niño que se encontraba muy tr...,Once upon a time there was a child who was ver...,-0.865733,negative,0.006273,0.065443,0.113949,0.858103,0.208772,284,284
6,7,3a7bc6a0450eda9cc016324a2ee5b749,3VH,1,T,200000,-0.1,0.3,0.16,Alguien que está triste o cansado de algo y es...,Someone who is sad or tired of something and i...,-0.952534,negative,0.004444,0.069648,0.089484,0.938321,0.118041,156,156
7,8,3a7bc6a0450eda9cc016324a2ee5b749,11,0,T,200000,0.2,0.2,0.82,Erase una vez un bosque encantado en el cual n...,Once upon a time there was an enchanted forest...,0.000000,neutral,0.561453,0.058646,0.087763,0.215142,0.172113,168,167
8,9,3a7bc6a0450eda9cc016324a2ee5b749,13N,1,T,200000,0.8,0.8,0.69,Erase una vez un niño que le gustaba descubrir...,Once upon a time there was a child who liked t...,0.000000,neutral,0.529544,0.084414,0.044831,0.310398,0.095722,223,231
9,10,4509cf6e9d9a624a3a809bf96cfbdbd7,3VH,1,T,200000,-0.1,0.3,0.45,Pues no sé. Puede haber llegado por la noche d...,"Well, I do not know. She may have arrived at n...",0.000000,neutral,0.263468,0.086924,0.020744,0.158272,0.295343,146,186


### Export expanded dataset to CSV

In [264]:
# File paths for the enriched datasets (written tab-separated, despite the .csv extension)
new_nlp_dataset_path = "D:\\Dropbox-Array2001\\Dropbox\\UNI\\MPGS\\2_TFM\\Datos\\prolexitim-nlp-1.1.csv"
new_tas_dataset_path = "D:\\Dropbox-Array2001\\Dropbox\\UNI\\MPGS\\2_TFM\\Datos\\prolexitim-tas-1.1.csv"
nlp_df.to_csv(new_nlp_dataset_path, sep='\t', encoding='utf-8')
tas_df.to_csv(new_tas_dataset_path, sep='\t', encoding='utf-8')
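
Because the export uses `sep='\t'` and also writes the dataframe index as an unnamed first column, reading the files back needs matching options. The following sketch round-trips a toy dataframe through an in-memory buffer; the two rows are made up for illustration and only the column names come from this notebook.

```python
import io

import pandas as pd

# Toy frame standing in for the enriched dataset (illustrative values only)
demo_df = pd.DataFrame({"RowId": [1, 2], "nlu-label": ["negative", "neutral"]})

# Write tab-separated text, index included, as in the export cell above
buffer = io.StringIO()
demo_df.to_csv(buffer, sep="\t")
buffer.seek(0)

# sep="\t" matches the export; index_col=0 restores the unnamed index column
restored = pd.read_csv(buffer, sep="\t", index_col=0)
print(restored)
```

Passing `index=False` to `to_csv` would be an alternative if the extra index column is not wanted in the file.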