## Language Analysis of Alexithymic Discourse

<hr>

Alexithymic Language Project / raul@psicobotica.com / V2 release (sept 2020)

<hr>

### Sentiment Analysis

We apply several sentiment analysis methods here. The resulting sentiment variables might be used as part of the feature vectors for detecting alexithymia.

- Lexicon-based sentiment analysis. 
- Third-party API based sentiment analysis. 
- Discussion and caveats about training our own sentiment analysis model. 

<hr>

[More about Sentiment Analysis](https://en.wikipedia.org/wiki/Sentiment_analysis)


In [1]:
import pandas as pd 
import numpy as np
import ast

## Load features dataset
- Data is already pre-processed (1-Preprocessing). 
- Basic NLP features are already calculated (2-Features). 
- Some additional BoW features have been added (3-BoW).
- Some additional TF/IDF features have been added (3-TFIDF).
- N-Gram models have been generated (3-N-Grams). 
- PoS lists for each document identified (4-Lexicosemantics). 
- Writer's personality variables inferred (5-Personality).

In [2]:
feats_dataset_path = "https://raw.githubusercontent.com/raul-arrabales/alexithymic-lang/master/data/Prolexitim_v2_features_5.csv"
alex_df = pd.read_csv(feats_dataset_path, header=0, delimiter=";")
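
Token columns such as `Tokens_Stop` and `Tokens_Stem_S` are stored in the CSV as stringified Python lists, which is why later cells call `ast.literal_eval` on them. The following is a minimal sketch of a defensive parser; the helper name `parse_token_list` and the choice to map missing or malformed cells to an empty list are assumptions made for illustration, not part of the original notebook.

```python
import ast

def parse_token_list(cell):
    """Turn a stringified Python list (as stored in the CSV) into a real list."""
    # Missing cells are read by pandas as float NaN, not as strings
    if not isinstance(cell, str):
        return []
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        # Malformed cell: treated as empty in this sketch
        return []
    return list(value) if isinstance(value, (list, tuple)) else []

print(parse_token_list("['niño', 'tras', 'horas']"))
```

In the notebook it could be applied as `alex_df['Tokens_Stop'].apply(parse_token_list)`, assuming the column holds the stringified lists described above.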

In [5]:
alex_df.sample(2)

| | Code | TAS20 | F1 | F2 | F3 | Gender | Age | Card | T_Metaphors | T_ToM | ... | consumption_preferences_music_playing | consumption_preferences_music_latin | consumption_preferences_music_rock | consumption_preferences_music_classical | consumption_preferences_read_frequency | consumption_preferences_books_entertainment_magazines | consumption_preferences_books_non_fiction | consumption_preferences_books_financial_investing | consumption_preferences_books_autobiographies | consumption_preferences_volunteer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 361 | 1c9636c6a36ba79f847db0589528df65 | 66 | 26 | 19 | 21 | 2 | 18 | 9VH | 0 | 1 | ... | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 20cd825cadb95a71763bad06e142c148 | 40 | 12 | 10 | 18 | 2 | 22 | 1 | 0 | 1 | ... | | | | | | | | | | |


## Lexicon-based sentiment Analysis

### Load sentiment lexicon models
- Previously generated (1b-SA-Lexicons). 
- We have lists of positive and negative words/stems obtained from the Multilingual Sentiment Project (MSP). 
- We also have our own Spanish translation of AFINN-165 (AFINN-165-ES).

In [6]:
# Lexicon files are already available at Github: 
path_MSP_Pos_Words = 'https://raw.githubusercontent.com/raul-arrabales/alexithymic-lang/master/lexicon/MSP_Pos_Words.csv'
path_MSP_Neg_Words = 'https://raw.githubusercontent.com/raul-arrabales/alexithymic-lang/master/lexicon/MSP_Neg_Words.csv'
path_MSP_Pos_StemsS = 'https://raw.githubusercontent.com/raul-arrabales/alexithymic-lang/master/lexicon/MSP_Pos_StemsS.csv'
path_MSP_Neg_StemsS = 'https://raw.githubusercontent.com/raul-arrabales/alexithymic-lang/master/lexicon/MSP_Neg_StemsS.csv'
path_AFINN_165_ES = 'https://raw.githubusercontent.com/raul-arrabales/alexithymic-lang/master/lexicon/AFINN-165-es.csv'

In [10]:
# Get the lexicons in memory as dataframes
MSP_Pos_Words_df = pd.read_csv(path_MSP_Pos_Words, header=0, delimiter=";")
MSP_Neg_Words_df = pd.read_csv(path_MSP_Neg_Words, header=0, delimiter=";")
MSP_Pos_StemsS_df = pd.read_csv(path_MSP_Pos_StemsS, header=0, delimiter=";")
MSP_Neg_StemsS_df = pd.read_csv(path_MSP_Neg_StemsS, header=0, delimiter=";")
AFINN_165_ES_df = pd.read_csv(path_AFINN_165_ES, header=0, delimiter=";")

In [16]:
AFINN_165_ES_df.sample(4)

| | Word | Score | Word_ES | Stem_ES_P | Stem_ES_S |
|---|---|---|---|---|---|
| 586 | crazier | -2 | más loco | más loco | mas loc |
| 2592 | threaten | -2 | threaten | threaten | threat |
| 2748 | vested | 1 | establecido | establecido | establec |
| 2148 | rebel | -2 | rebelde | rebeld | rebeld |


In [22]:
# Use sets in the case of MSP, where there is no specific score
MSP_Pos_Words_set = set(MSP_Pos_Words_df.Pos)
MSP_Neg_Words_set = set(MSP_Neg_Words_df.Neg)
MSP_Pos_StemsS_set = set(MSP_Pos_StemsS_df.Pos)
MSP_Neg_StemsS_set = set(MSP_Neg_StemsS_df.Neg)

In [26]:
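import random
# random.sample() no longer accepts sets (Python 3.11+), so convert to lists first
print(random.sample(list(MSP_Pos_Words_set), 4))
print(random.sample(list(MSP_Neg_Words_set), 4))
print(random.sample(list(MSP_Pos_StemsS_set), 4))
print(random.sample(list(MSP_Neg_StemsS_set), 4))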

['accesible', 'prolífico', 'estabilizar', 'estelar']
['adherente', 'cuestiones', 'descarada', 'imprecisiones']
['subvencion', 'aug', 'prefir', 'lind']
['invis', 'villan', 'devor', 'deprim']


### Calculate sentiment per document
- Based on MSP word matches. 
- Based on MSP snowball stems matches. 
- Based on AFINN word matches. 
- Based on AFINN snowball stems matches. 

In [59]:
# Calculates the sentiment score based on MSP pos/neg words/stems and doc length
def get_MSP_Sent(doc, posWordsSet, negWordsSet):
    """
    Parameters
    ----------
    doc : list
        List of stopped tokens / snowballed stems extracted from a text. 
    posWordsSet : set
        Set of positive words (or stems)
    negWordsSet : set
        Set of negative words (or stems)
    
    Returns
    -------
    score: float
        (Positive matches - negative matches) / length of token list.
        Returns 0.0 for an empty token list.
        
    """
    # Avoid a division by zero on empty documents
    if len(doc) == 0:
        return 0.0
    
    score = 0
    
    for tok in doc: 
        if tok in posWordsSet:
            score += 1
        if tok in negWordsSet:
            score -= 1
            
    return score / len(doc)
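
As a quick sanity check of the scoring rule, here is a small worked example. The function `msp_score` is a compact stand-in written for this illustration, and the two toy sets are invented (they are not taken from the MSP lexicon).

```python
def msp_score(tokens, pos_set, neg_set):
    """(positive matches - negative matches) / number of tokens."""
    if not tokens:
        return 0.0
    hits = sum((tok in pos_set) - (tok in neg_set) for tok in tokens)
    return hits / len(tokens)

# Toy lexicon, for illustration only
toy_pos = {'bueno', 'feliz'}
toy_neg = {'triste'}

# Two positive matches, one negative match, four tokens: (2 - 1) / 4
print(msp_score(['niño', 'feliz', 'bueno', 'triste'], toy_pos, toy_neg))  # 0.25
```

Because the score is normalized by document length, it stays within [-1, 1] as long as no token is in both sets.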

In [28]:
alex_df.columns[0:40]

Index(['Code', 'TAS20', 'F1', 'F2', 'F3', 'Gender', 'Age', 'Card',
       'T_Metaphors', 'T_ToM', 'T_FP', 'T_Interpret', 'T_Desc', 'T_Confussion',
       'Text', 'Alex_A', 'Alex_B', 'Words', 'Sentences', 'Tokens',
       'Tokens_Stop', 'Tokens_Stem_P', 'Tokens_Stem_S', 'POS', 'NER', 'DEP',
       'Lemmas_CNLP', 'Lemmas_Spacy', 'Chars', 'avgWL', 'avgSL', 'Pun_Count',
       'Stop_Count', 'RawTokens', 'Title_Count', 'Upper_Count', 'PRON_Count',
       'DET_Count', 'ADV_Count', 'VERB_Count'],
      dtype='object')

In [60]:
# Test function with words
test_tokens = ast.literal_eval(alex_df['Tokens_Stop'][159])
print(test_tokens)
get_MSP_Sent(test_tokens, MSP_Pos_Words_set, MSP_Neg_Words_set)

['niño', 'tras', 'horas', 'práctica', 'infructuosas', 'violín', 'punto', 'abandonar', 'estudios', 'lugar', 'ello', 'tras', 'dejarlo', 'reposar', 'mesa', 'explorar', 'estructura', 'materiales', 'funcionamiento', 'supo', 'encontrar', 'transformar', 'frustración', 'camino', 'aprendizaje', 'interactivo', 'instrumento', 'compañero', 'nuevas', 'aventuras', 'llenas', 'asombro']


-0.03125

In [61]:
# Test function with stems
test_tokens = ast.literal_eval(alex_df['Tokens_Stem_S'][159])
print(test_tokens)
get_MSP_Sent(test_tokens, MSP_Pos_StemsS_set, MSP_Neg_StemsS_set)

['niñ', 'tras', 'hor', 'practic', 'infructu', 'violin', 'punt', 'abandon', 'estudi', 'lug', 'ello', 'tras', 'dej', 'rep', 'mes', 'explor', 'estructur', 'material', 'funcion', 'sup', 'encontr', 'transform', 'frustracion', 'camin', 'aprendizaj', 'interact', 'instrument', 'compañer', 'nuev', 'aventur', 'llen', 'asombr']


-0.03125

In [66]:
# Now, add the MSP sentiment score to all exemplars
alex_df['MSP_Words'] = alex_df.Tokens_Stop.apply(lambda x: get_MSP_Sent(ast.literal_eval(x), MSP_Pos_Words_set, MSP_Neg_Words_set))
alex_df['MSP_Stems'] = alex_df.Tokens_Stem_S.apply(lambda x: get_MSP_Sent(ast.literal_eval(x), MSP_Pos_StemsS_set, MSP_Neg_StemsS_set))

In [69]:
alex_df[['MSP_Words','MSP_Stems']].describe()

| | MSP_Words | MSP_Stems |
|---|---|---|
| count | 381.0 | 381.0 |
| mean | -0.034257 | -0.072769 |
| std | 0.135139 | 0.201612 |
| min | -0.6 | -0.8 |
| 25% | -0.1 | -0.2 |
| 50% | 0.0 | -0.058824 |
| 75% | 0.04 | 0.057143 |
| max | 0.3 | 0.5 |


Let's do AFINN now: 
- Polarity: (positive scores - negative scores).
- Mean Intensity: mean(sum(positives),sum(abs(negatives))). 
- Max Intensity: max(sum(positives),sum(abs(negatives))).
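
Written out, and assuming $s(t)$ denotes the AFINN score of a matched token $t$ and $M$ the matched tokens of a document (this notation is introduced here only for illustration), the three measures are:

$$
\begin{aligned}
P &= \sum_{t \in M,\ s(t) > 0} s(t), \qquad N = \sum_{t \in M,\ s(t) < 0} |s(t)| \\
\text{Polarity} &= P - N \\
\text{Mean Intensity} &= \frac{P + N}{2} \\
\text{Max Intensity} &= \max(P, N)
\end{aligned}
$$

For instance, matched scores of $1, 3, 1, -2, 1$ give $P = 6$ and $N = 2$, hence a polarity of $4$, a mean intensity of $4$ and a max intensity of $6$.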

In [152]:
# Calculates the sentiment score (polarity) based on AFINN-ES score
def get_AFINN_Polarity(doc, isStem, AFINN):
    """
    Parameters
    ----------
    doc : list
        List of stopped tokens / snowballed stems extracted from a text. 
    isStem: boolean
        True means stem, false indicates word. 
    AFINN : dataframe
        Dataframe with words, stems and their corresponding sentiment score.
    
    Returns
    -------
    score: int
        Sum of the signed AFINN scores of all matched tokens,
        i.e. sum(positive scores) - sum(abs(negative scores)).
        
    """
    score = 0
    
    if (isStem):
        for tok in doc: 
            points = AFINN[AFINN['Stem_ES_S'] == tok]['Score']
            if (len(points) == 1):
                # print("Word: " + tok + " -> " + str(points.item()))
                score += points.item()
    else: 
        for tok in doc: 
            points = AFINN[AFINN['Word_ES'] == tok]['Score']
            if (len(points) == 1):
                # print("Word: " + tok + " -> " + str(points.item()))
                score += points.item()
            
    return score
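
The function above filters the whole AFINN dataframe once per token and only counts tokens that match exactly one row. A possible alternative, sketched here under assumptions, is to build a dictionary once and reuse it. The helper names `build_unique_lookup` and `polarity` and the toy rows are hypothetical; in the notebook the rows could come from `zip(AFINN_165_ES_df['Stem_ES_S'], AFINN_165_ES_df['Score'])`.

```python
from collections import Counter

def build_unique_lookup(rows):
    """Map key -> score, keeping only keys that occur in exactly one row."""
    rows = list(rows)
    counts = Counter(key for key, _ in rows)
    return {key: score for key, score in rows if counts[key] == 1}

def polarity(doc, lookup):
    """Sum of the signed scores of all tokens found in the lookup."""
    return sum(lookup.get(tok, 0) for tok in doc)

# Toy rows for illustration; 'rebeld' appears twice, so it is ambiguous
toy_rows = [('buen', 3), ('vergüenz', -2), ('amig', 1),
            ('rebeld', -2), ('rebeld', -2)]
lookup = build_unique_lookup(toy_rows)

# 'rebeld' is ignored, mirroring the len(points) == 1 rule: 3 - 2 + 1
print(polarity(['buen', 'vergüenz', 'amig', 'rebeld'], lookup))  # 2
```

Dropping ambiguous keys keeps the same behaviour as the `len(points) == 1` check, while each token lookup becomes a constant-time dictionary access.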

In [150]:
# Test function with words
test_tokens = ast.literal_eval(alex_df['Tokens_Stop'][69])
print(test_tokens)
get_AFINN_Polarity(test_tokens, False, AFINN_165_ES_df)

['niño', 'obligaban', 'tocar', 'violín', 'gustaba', 'música', 'bueno', 'tocando', 'instrumentos', 'padres', 'obligaban', 'ir', 'conservatorio', 'tomar', 'clases', 'música', 'daba', 'vergüenza', 'hacerlo', 'delante', 'amigos', 'llegar', 'casa', 'estudiar', 'partituras', 'aburría', 'agobiaba', 'vez']


1

In [151]:
# Test function with stems
test_tokens = ast.literal_eval(alex_df['Tokens_Stem_S'][69])
print(test_tokens)
get_AFINN_Polarity(test_tokens, True, AFINN_165_ES_df)

['niñ', 'oblig', 'toc', 'violin', 'gust', 'music', 'buen', 'toc', 'instrument', 'padr', 'oblig', 'ir', 'conservatori', 'tom', 'clas', 'music', 'dab', 'vergüenz', 'hac', 'delant', 'amig', 'lleg', 'cas', 'estudi', 'partitur', 'aburr', 'agobi', 'vez']
Word: oblig -> 1
Word: buen -> 3
Word: oblig -> 1
Word: vergüenz -> -2
Word: amig -> 1


4

In [159]:
# Calculates the sentiment mean intensity based on AFINN-ES score
def get_AFINN_Intensity(doc, isStem, AFINN):
    """
    Parameters
    ----------
    doc : list
        List of stopped tokens / snowballed stems extracted from a text. 
    isStem: boolean
        True means stem, false indicates word. 
    AFINN : dataframe
        Dataframe with words, stems and their corresponding sentiment score.
    
    Returns
    -------
    score: float
         mean(sum(positives),sum(abs(negatives))).
        
    """
    positives = 0
    negatives = 0
    
    if (isStem):
        for tok in doc: 
            points = AFINN[AFINN['Stem_ES_S'] == tok]['Score']
            if (len(points) == 1):
                score = points.item()
                # print("Word: " + tok + " -> " + str(score))
                if (score > 0):
                    positives += score
                else: 
                    negatives += abs(score)
    else: 
        for tok in doc: 
            points = AFINN[AFINN['Word_ES'] == tok]['Score']
            if (len(points) == 1):
                score = points.item()
                # print("Word: " + tok + " -> " + str(score))
                if (score > 0):
                    positives += score
                else: 
                    negatives += abs(score)
                            
    return ((positives+negatives)/2)

In [156]:
# Test function with words
test_tokens = ast.literal_eval(alex_df['Tokens_Stop'][69])
print(test_tokens)
get_AFINN_Intensity(test_tokens, False, AFINN_165_ES_df)

['niño', 'obligaban', 'tocar', 'violín', 'gustaba', 'música', 'bueno', 'tocando', 'instrumentos', 'padres', 'obligaban', 'ir', 'conservatorio', 'tomar', 'clases', 'música', 'daba', 'vergüenza', 'hacerlo', 'delante', 'amigos', 'llegar', 'casa', 'estudiar', 'partituras', 'aburría', 'agobiaba', 'vez']
Word: bueno -> 3
Word: vergüenza -> -2


2.5

In [158]:
# Test function with stems
test_tokens = ast.literal_eval(alex_df['Tokens_Stem_S'][69])
print(test_tokens)
get_AFINN_Intensity(test_tokens, True, AFINN_165_ES_df)

['niñ', 'oblig', 'toc', 'violin', 'gust', 'music', 'buen', 'toc', 'instrument', 'padr', 'oblig', 'ir', 'conservatori', 'tom', 'clas', 'music', 'dab', 'vergüenz', 'hac', 'delant', 'amig', 'lleg', 'cas', 'estudi', 'partitur', 'aburr', 'agobi', 'vez']
Word: oblig -> 1
Word: buen -> 3
Word: oblig -> 1
Word: vergüenz -> -2
Word: amig -> 1


4.0

In [166]:
# Calculates the sentiment maximum intensity based on AFINN-ES score
def get_AFINN_Max_Intensity(doc, isStem, AFINN):
    """
    Parameters
    ----------
    doc : list
        List of stopped tokens / snowballed stems extracted from a text. 
    isStem: boolean
        True means stem, false indicates word. 
    AFINN : dataframe
        Dataframe with words, stems and their corresponding sentiment score.
    
    Returns
    -------
    score: float
         max(sum(positives),sum(abs(negatives))).
        
    """
    positives = 0
    negatives = 0
    
    if (isStem):
        for tok in doc: 
            points = AFINN[AFINN['Stem_ES_S'] == tok]['Score']
            if (len(points) == 1):
                score = points.item()
                # print("Word: " + tok + " -> " + str(score))
                if (score > 0):
                    positives += score
                else: 
                    negatives += abs(score)
    else: 
        for tok in doc: 
            points = AFINN[AFINN['Word_ES'] == tok]['Score']
            if (len(points) == 1):
                score = points.item()
                # print("Word: " + tok + " -> " + str(score))
                if (score > 0):
                    positives += score
                else: 
                    negatives += abs(score)
                            
    if (positives >= negatives):
        return positives
    else:
        return negatives
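
The three AFINN functions share the same matching loop, so all three measures could be derived in a single pass. The sketch below assumes a plain dictionary from token to score; the function name `afinn_features` is hypothetical, and the stem scores are the ones shown in the notebook's debug output for document 69.

```python
def afinn_features(doc, lookup):
    """Return polarity, mean intensity and max intensity in one pass."""
    positives = 0
    negatives = 0
    for tok in doc:
        score = lookup.get(tok)
        if score is None:
            continue  # token not in the lexicon
        if score > 0:
            positives += score
        else:
            negatives += abs(score)
    return {
        'polarity': positives - negatives,
        'mean_intensity': (positives + negatives) / 2,
        'max_intensity': max(positives, negatives),
    }

# Stem scores as printed by the notebook for document 69
stem_scores = {'oblig': 1, 'buen': 3, 'vergüenz': -2, 'amig': 1}
doc = ['niñ', 'oblig', 'toc', 'buen', 'oblig', 'vergüenz', 'amig']

print(afinn_features(doc, stem_scores))
# {'polarity': 4, 'mean_intensity': 4.0, 'max_intensity': 6}
```

These values agree with the stem-based outputs of the three separate functions (4, 4.0 and 6).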

In [161]:
# Test function with words
test_tokens = ast.literal_eval(alex_df['Tokens_Stop'][69])
print(test_tokens)
get_AFINN_Max_Intensity(test_tokens, False, AFINN_165_ES_df)

['niño', 'obligaban', 'tocar', 'violín', 'gustaba', 'música', 'bueno', 'tocando', 'instrumentos', 'padres', 'obligaban', 'ir', 'conservatorio', 'tomar', 'clases', 'música', 'daba', 'vergüenza', 'hacerlo', 'delante', 'amigos', 'llegar', 'casa', 'estudiar', 'partituras', 'aburría', 'agobiaba', 'vez']
Word: bueno -> 3
Word: vergüenza -> -2


3

In [164]:
# Test function with stems
test_tokens = ast.literal_eval(alex_df['Tokens_Stem_S'][69])
print(test_tokens)
get_AFINN_Max_Intensity(test_tokens, True, AFINN_165_ES_df)

['niñ', 'oblig', 'toc', 'violin', 'gust', 'music', 'buen', 'toc', 'instrument', 'padr', 'oblig', 'ir', 'conservatori', 'tom', 'clas', 'music', 'dab', 'vergüenz', 'hac', 'delant', 'amig', 'lleg', 'cas', 'estudi', 'partitur', 'aburr', 'agobi', 'vez']
Word: oblig -> 1
Word: buen -> 3
Word: oblig -> 1
Word: vergüenz -> -2
Word: amig -> 1


6

In [167]:
# Now, add the AFINN sentiment scores to all exemplars
alex_df['AFINN_Words_Pol'] = alex_df.Tokens_Stop.apply(lambda x: get_AFINN_Polarity(ast.literal_eval(x), False, AFINN_165_ES_df))
alex_df['AFINN_Stems_Pol'] = alex_df.Tokens_Stem_S.apply(lambda x: get_AFINN_Polarity(ast.literal_eval(x), True, AFINN_165_ES_df))
alex_df['AFINN_Words_Int'] = alex_df.Tokens_Stop.apply(lambda x: get_AFINN_Intensity(ast.literal_eval(x), False, AFINN_165_ES_df))
alex_df['AFINN_Stems_Int'] = alex_df.Tokens_Stem_S.apply(lambda x: get_AFINN_Intensity(ast.literal_eval(x), True, AFINN_165_ES_df))
alex_df['AFINN_Words_Max'] = alex_df.Tokens_Stop.apply(lambda x: get_AFINN_Max_Intensity(ast.literal_eval(x), False, AFINN_165_ES_df))
alex_df['AFINN_Stems_Max'] = alex_df.Tokens_Stem_S.apply(lambda x: get_AFINN_Max_Intensity(ast.literal_eval(x), True, AFINN_165_ES_df))

In [169]:
alex_df[['AFINN_Words_Pol',
         'AFINN_Stems_Pol','AFINN_Words_Int','AFINN_Stems_Int',
         'AFINN_Words_Max','AFINN_Stems_Max']].describe()

| | AFINN_Words_Pol | AFINN_Stems_Pol | AFINN_Words_Int | AFINN_Stems_Int | AFINN_Words_Max | AFINN_Stems_Max |
|---|---|---|---|---|---|---|
| count | 381.0 | 381.0 | 381.0 | 381.0 | 381.0 | 381.0 |
| mean | -0.456693 | -0.314961 | 1.48294 | 1.748031 | 2.406824 | 2.716535 |
| std | 2.854606 | 2.892915 | 1.752546 | 1.806648 | 2.684234 | 2.667241 |
| min | -11.0 | -11.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | -2.0 | -2.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 0.0 | 0.0 | 1.0 | 1.5 | 2.0 | 2.0 |
| 75% | 0.0 | 1.0 | 2.0 | 2.5 | 4.0 | 4.0 |
| max | 11.0 | 11.0 | 10.0 | 9.0 | 15.0 | 14.0 |


In [171]:
alex_df[['AFINN_Words_Pol',
         'AFINN_Stems_Pol','AFINN_Words_Int','AFINN_Stems_Int',
         'AFINN_Words_Max','AFINN_Stems_Max']].corr()

| | AFINN_Words_Pol | AFINN_Stems_Pol | AFINN_Words_Int | AFINN_Stems_Int | AFINN_Words_Max | AFINN_Stems_Max |
|---|---|---|---|---|---|---|
| AFINN_Words_Pol | 1.0 | 0.457984 | -0.064684 | 0.052638 | -0.117186 | 0.045166 |
| AFINN_Stems_Pol | 0.457984 | 1.0 | 0.001014 | -0.020763 | -0.00955 | -0.03718 |
| AFINN_Words_Int | -0.064684 | 0.001014 | 1.0 | 0.620746 | 0.961139 | 0.601903 |
| AFINN_Stems_Int | 0.052638 | -0.020763 | 0.620746 | 1.0 | 0.57063 | 0.954758 |
| AFINN_Words_Max | -0.117186 | -0.00955 | 0.961139 | 0.57063 | 1.0 | 0.572275 |
| AFINN_Stems_Max | 0.045166 | -0.03718 | 0.601903 | 0.954758 | 0.572275 | 1.0 |


In [170]:
alex_df[['MSP_Stems','AFINN_Stems_Pol']].corr()

| | MSP_Stems | AFINN_Stems_Pol |
|---|---|---|
| MSP_Stems | 1.0 | 0.247254 |
| AFINN_Stems_Pol | 0.247254 | 1.0 |


In [174]:
# We have 8 new features related to sentiment. 
alex_df.columns[len(alex_df.columns)-8:len(alex_df.columns)]

Index(['MSP_Words', 'MSP_Stems', 'AFINN_Words_Pol', 'AFINN_Stems_Pol',
       'AFINN_Words_Int', 'AFINN_Stems_Int', 'AFINN_Words_Max',
       'AFINN_Stems_Max'],
      dtype='object')

## Sentiment Analysis using IBM Cloud
- Watson Natural Language Understanding (NLU) API

[Watson NLU](https://www.ibm.com/es-es/cloud/watson-natural-language-understanding)

[API Guide - Python](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python)

[Limited support for Spanish Language](https://cloud.ibm.com/docs/natural-language-understanding?topic=natural-language-understanding-language-support#spanish)

- Basically, Watson NLU provides a sentiment score for Spanish, but not the details about different emotions (for that, we'd need to translate first into English). 

In [194]:
import json
from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_watson.natural_language_understanding_v1 import Features, EmotionOptions, SentimentOptions
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson import ApiException

In [175]:
# Watson NLU endpoint for Europe
NLU_URL = 'https://gateway-lon.watsonplatform.net/natural-language-understanding/api'

In [178]:
# API key stored in local file apikey.json
with open('apikey.json') as f:
    apikeydata = json.load(f)

In [180]:
# apikeydata.get('NLU_key')

In [182]:
authenticator = IAMAuthenticator(apikeydata.get('NLU_key'))
natural_language_understanding = NaturalLanguageUnderstandingV1(
    version='2020-08-01',
    authenticator=authenticator
)

In [183]:
natural_language_understanding.set_service_url(NLU_URL)

In [188]:
# Testing the API
# Making the call for Emotion and Sentiment
response = natural_language_understanding.analyze(
    text='Probamos con una frase en español, aunque IBM dice que el idioma español no está soportado para las funciones de emociones, pero sí para las de análisis del sentimiento.',
    features=Features(emotion=EmotionOptions(),
                      sentiment=SentimentOptions())).get_result()

In [224]:
# Testing the API
# Making the call for Sentiment only
response = natural_language_understanding.analyze(
    text='dos palabras más',
    features=Features(sentiment=SentimentOptions())).get_result()

In [225]:
# Check the result:
print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 16,
    "features": 1
  },
  "sentiment": {
    "document": {
      "score": 0,
      "label": "neutral"
    }
  },
  "language": "es"
}


In [226]:
response['sentiment']['document']['score']

0
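
Reading the score with chained indexing raises a `KeyError` if any level of the response is missing. A more defensive reader could look like the sketch below, which assumes the response has the JSON shape printed above; the helper name `extract_sentiment` is hypothetical and not part of the Watson SDK.

```python
import math

def extract_sentiment(response):
    """Return (score, label) from an NLU-style response dict, or (nan, None)."""
    document = response.get('sentiment', {}).get('document', {})
    score = document.get('score')
    label = document.get('label')
    if score is None:
        return math.nan, label
    return float(score), label

# Same structure as the response printed above
sample = {
    "usage": {"text_units": 1, "text_characters": 16, "features": 1},
    "sentiment": {"document": {"score": 0, "label": "neutral"}},
    "language": "es",
}

print(extract_sentiment(sample))  # (0.0, 'neutral')
```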

Let's annotate our data with the sentiment score from IBM NLU

In [227]:
# This function calls the Watson NLU API and gets the sentiment score
# for a given plain text in Spanish.
def get_Watson_Sentiment(text_es):
    """
    Parameters
    ----------
    text_es : str
        Document to be analyzed in Spanish
    
    Returns
    -------
    score: float
        Document sentiment analysis results (sentiment score)
        
    """
    try:
        json_response = natural_language_understanding.analyze(
            text = text_es,
            features = Features(sentiment=SentimentOptions())).get_result()
        
    except ApiException as ex:
        # Report the failure first, then return NaN so the row stays unscored
        print("Method failed with status code " + str(ex.code) + ": " + ex.message)
        return np.nan
        
    return json_response['sentiment']['document']['score']

In [None]:
# Apply for all rows: 
# alex_df['Watson_Sent'] = alex_df.Text.apply(lambda x: get_Watson_Sentiment(x))

In [228]:
# Create empty column
alex_df['Watson_Sent'] = np.nan

In [231]:
# Let's do that iteratively
watson_col = alex_df.columns.get_loc('Watson_Sent')
for i in range(len(alex_df)):
    
    # Skip one-word texts: the API seems to require a minimum number of words
    if ( len(alex_df['Text'].iloc[i].split()) > 1 ):
        
        # Get the results for user i:
        score = get_Watson_Sentiment(alex_df['Text'].iloc[i])
        print("("+str(i)+") Watson said: " + str(score))
        
        # Update the feature vectors with a single positional assignment
        # (chained indexing would trigger a SettingWithCopyWarning)
        alex_df.iloc[i, watson_col] = score
        

(0) Watson said: -0.621737


(1) Watson said: -0.946776
(2) Watson said: -0.809901
(3) Watson said: -0.964077
(4) Watson said: -0.599994
(5) Watson said: -0.8963
(6) Watson said: 0
(7) Watson said: 0.720764
(8) Watson said: -0.733411
(9) Watson said: 0.315651
(10) Watson said: 0.630662
(11) Watson said: -0.405908
(12) Watson said: -0.861215
(13) Watson said: 0
(14) Watson said: -0.5975
(15) Watson said: 0.424281
(16) Watson said: -0.891803
(17) Watson said: -0.531513
(18) Watson said: 0.450271
(19) Watson said: -0.326503
(20) Watson said: 0
(21) Watson said: -0.429521
(22) Watson said: -0.701011
(23) Watson said: -0.855875
(24) Watson said: 0
(25) Watson said: 0
(26) Watson said: -0.592164
(27) Watson said: 0
(28) Watson said: 0
(29) Watson said: 0
(30) Watson said: -0.70638
(31) Watson said: -0.709653
(32) Watson said: -0.814213
(33) Watson said: 0.93461
(34) Watson said: 0
(35) Watson said: -0.866115
(36) Watson said: 0
(37) Watson said: 0.673749
(38) Watson said: 0
(39) Watson said: 0.457191
(40) Watson said: -

(314) Watson said: -0.716789
(315) Watson said: -0.774191
(316) Watson said: -0.685945
(317) Watson said: 0.307415
(318) Watson said: 0.512047
(319) Watson said: 0.76681
(320) Watson said: -0.543033
(321) Watson said: -0.32615
(322) Watson said: -0.374754
(323) Watson said: 0
(324) Watson said: 0
(325) Watson said: 0.433333
(326) Watson said: -0.847622
(327) Watson said: -0.475316
(328) Watson said: 0.251176
(329) Watson said: -0.974369
(330) Watson said: -0.440927
(331) Watson said: 0
(332) Watson said: -0.753944
(333) Watson said: -0.670537
(334) Watson said: 0
(335) Watson said: -0.625833
(336) Watson said: -0.479581
(337) Watson said: 0


ERROR:root:unsupported text language: ca
Traceback (most recent call last):
  File "C:\Users\array\Anaconda3\lib\site-packages\ibm_cloud_sdk_core\base_service.py", line 225, in send
    response.status_code, http_response=response)
ibm_cloud_sdk_core.api_exception.ApiException: Error: unsupported text language: ca, Code: 400 , X-global-transaction-id: 3505c58fcb072500ae17abcb1c5e507a


(338) Watson said: nan
(339) Watson said: -0.787782
(340) Watson said: -0.385569
(341) Watson said: 0.984114
(342) Watson said: 0.946816
(343) Watson said: -0.920059
(344) Watson said: 0
(345) Watson said: 0.834984
(346) Watson said: 0
(347) Watson said: -0.577162
(348) Watson said: 0.46053
(349) Watson said: 0.758709
(350) Watson said: 0.831549
(351) Watson said: -0.799399
(352) Watson said: -0.932474
(353) Watson said: -0.853616
(354) Watson said: -0.599355
(355) Watson said: -0.553014
(356) Watson said: 0
(357) Watson said: -0.519449
(358) Watson said: 0.426008
(359) Watson said: -0.529973
(360) Watson said: -0.587762
(361) Watson said: -0.92285
(362) Watson said: 0
(363) Watson said: -0.715093
(364) Watson said: -0.907211
(365) Watson said: -0.313718
(366) Watson said: 0
(367) Watson said: -0.634458
(368) Watson said: -0.815829
(369) Watson said: 0
(370) Watson said: -0.883665
(371) Watson said: 0.757983
(372) Watson said: -0.940531
(373) Watson said: 0.517471
(374) Watson said: -0
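
The annotation loop above skips very short texts and leaves failed calls as NaN (for example, the Catalan text at index 338). The same logic can be sketched as a reusable helper that works with any scoring function. The names `annotate` and `fake_scorer` are hypothetical, and the fake scorer only imitates a remote API for illustration; it does not call Watson.

```python
import math

def annotate(texts, scorer, min_words=2):
    """Score each text; short texts and failed calls yield NaN."""
    scores = []
    for text in texts:
        if not isinstance(text, str) or len(text.split()) < min_words:
            scores.append(math.nan)
            continue
        try:
            scores.append(scorer(text))
        except Exception as ex:
            print("Scoring failed: " + str(ex))
            scores.append(math.nan)
    return scores

def fake_scorer(text):
    # Stand-in for a remote API: rejects one text, returns a fixed score otherwise
    if 'bon dia' in text:
        raise ValueError("unsupported text language: ca")
    return -0.5

print(annotate(['hola', 'dos palabras más', 'bon dia a tothom'], fake_scorer))
```

In the notebook, the result could be assigned in one step, e.g. `alex_df['Watson_Sent'] = annotate(alex_df['Text'], get_Watson_Sentiment)`, which also avoids row-by-row assignment.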

In [232]:
alex_df.Watson_Sent.describe()

count    380.000000
mean      -0.108048
std        0.584759
min       -0.996518
25%       -0.617178
50%        0.000000
75%        0.424713
max        0.991864
Name: Watson_Sent, dtype: float64

In [233]:
alex_df[['MSP_Stems','AFINN_Stems_Pol','Watson_Sent']].corr()

| | MSP_Stems | AFINN_Stems_Pol | Watson_Sent |
|---|---|---|---|
| MSP_Stems | 1.0 | 0.247254 | 0.447891 |
| AFINN_Stems_Pol | 0.247254 | 1.0 | 0.26541 |
| Watson_Sent | 0.447891 | 0.26541 | 1.0 |


## Save features data in a new version of the CSV

In [234]:
# We have 9 new features related to sentiment. 
alex_df.columns[len(alex_df.columns)-9:len(alex_df.columns)]

Index(['MSP_Words', 'MSP_Stems', 'AFINN_Words_Pol', 'AFINN_Stems_Pol',
       'AFINN_Words_Int', 'AFINN_Stems_Int', 'AFINN_Words_Max',
       'AFINN_Stems_Max', 'Watson_Sent'],
      dtype='object')

In [235]:
# Save Updated features dataset
Feats_6_path = "D:\\Dropbox-Array2001\\Dropbox\\DataSets\\Prolexitim-Dataset\\Prolexitim_v2_features_6.csv"
alex_df.to_csv(Feats_6_path, sep=';', encoding='utf-8', index=False)