# NLP & Sentiment Analysis.

At this stage of the project we will work with several pre-trained models (libraries) and try to determine which one gives the best results.

Ideally, we would develop a custom model tailored to the project's particular needs, which would probably yield more accurate results. For now, however, the goal is to lay the project's foundations and obtain a minimum viable product that demonstrates its potential. We have too little time to develop and train our own model, so we rely on pre-trained NLP models to analyze the sentiment of the tweets and, through it, users' perception of the brands.

The sentiment analyses are run through functions previously defined in *'src/nlp_functions.py'*.

In [1]:
from src.nlp_functions import *

import pandas as pd

In [2]:
brands_tweets = pd.read_pickle('data/brands_tokens.pkl')

In [3]:
brands_tweets.head()

Unnamed: 0,text,hashtags,brand_attribute,brand,token
0,nobody cares about nike in russia russia is al...,[],quality,nike,"[care, nike, russia, russia, adidas]"
1,ye green nike hoodie waala two weeks pehle put...,[],quality,nike,"[ye, green, nike, hoodie, waala, week, pehle, ..."
2,nike okundaye also knowns nike twins seven sev...,"[#womengiant, #Documentwomen]",quality,nike,"[nike, okundaye, knowns, nike, twin, seven, se..."
3,day four of maxmadness air max month nike air ...,"[#maxmadness, #AirMaxMonth, #airmaxgang, #kotd...",quality,nike,"[day, maxmadness, air, max, month, nike, air, ..."
4,thank you sir,[],quality,nike,"[thank, sir]"


### TextBlob (spaCy)

In [4]:
'''INTRO!!!'''

# Brief explanation.

# Make sure to understand properly how it works.
# Watch out for the installations (I have added it everywhere).


'INTRO!!!'

In [5]:
brands_tweets['blob_scores'] = brands_tweets.token.apply(blob_scoring)

In [6]:
brands_tweets.sample(10)

Unnamed: 0,text,hashtags,brand_attribute,brand,token,blob_scores
20218,fantastic gents hope the day went well,[],quality,adidas,"[fantastic, gent, hope, day, went]",0.4
18581,nike stops selling shoes online in russia beca...,[],quality,adidas,"[nike, stop, selling, shoe, online, russia, fi...",0.0
14057,nike air jordan one low elevated bred dq one e...,[],quality,adidas,"[nike, air, jordan, low, elevated, bred, dq, z...",0.0
7576,news flash we re not watching anymore that inc...,[],quality,adidas,"[news, flash, watching, anymore, includes, rem...",0.0
18057,ad restock via nike us nike air force one mid ...,[],price,nike,"[ad, restock, nike, nike, air, force, mid, nyc...",0.2
7171,i like them and would get them but they usuall...,[],quality,adidas,"[like, usually, size, wide, look, smaller, one]",-0.116667
15712,i got a nike air force one collab,[],quality,nike,"[got, nike, air, force, collab]",0.0
13574,these are the vibez im trying to be on,[],quality,adidas,"[vibez, im, trying]",0.0
13898,nike air jordan one four retro low shocking pi...,[],quality,nike,"[nike, air, jordan, retro, low, shocking, pink...",-0.366667
8544,how do i become nike shoes quick,[],quality,adidas,"[nike, shoe, quick]",0.333333


### VADER.

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a sentiment analysis library that focuses on social media text. It emphasizes rules that capture the essence of text as it typically appears on social networks. An interesting feature of VADER is that it is designed to work on text that has undergone only very basic cleaning (keeping emojis, exclamation marks, etc.).

We will run it on the already cleaned data, since we expect that to give better results. Although the process is not shown here, we also tried the original raw data, but VADER failed to analyze the sentiment correctly and the vast majority of the negativity, neutrality and positivity scores were 0. Ideally, however, we would evaluate the accuracy of the analysis on tweets at different levels of cleaning (from the most basic to the most thorough) and compare the models to identify the best performer (including analyses carried out with models other than VADER).

https://towardsdatascience.com/fine-grained-sentiment-analysis-in-python-part-1-2697bb111ed4
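
The comparison proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: it presumes that a small hand-labeled sample exists and that each model's scores have already been mapped to `'neg'`, `'neu'` or `'pos'`. The helper `accuracy`, the model names and every label below are hypothetical placeholders, not project results.

```python
# Hypothetical sketch: compare models (or cleaning levels) on a hand-labeled sample.
# All labels below are invented placeholders, not results from this project.

def accuracy(predicted, gold):
    """Share of tweets whose predicted label matches the hand-assigned one."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(p == g for p, g in zip(predicted, gold))
    return hits / len(gold)


gold = ['pos', 'neu', 'neg', 'pos']  # assumed manual annotations

predictions = {
    'vader_clean': ['pos', 'neu', 'neu', 'pos'],
    'vader_raw': ['neu', 'neu', 'neu', 'pos'],
    'textblob_clean': ['pos', 'pos', 'neg', 'neu'],
}

# Accuracy per candidate, then the name of the highest-scoring one.
results = {name: accuracy(labels, gold) for name, labels in predictions.items()}
best = max(results, key=results.get)
print(results, best)
```

Plain accuracy is the simplest choice here; with classes as unbalanced as the scores in this notebook, a per-class metric would probably be more informative.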

In [7]:
brands_tweets['vader_scores'] = brands_tweets.token.apply(vader_scoring)

In [8]:
brands_tweets.sample(10)

Unnamed: 0,text,hashtags,brand_attribute,brand,token,blob_scores,vader_scores
9433,savings ad nike air jordan one retro high og b...,[],quality,nike,"[saving, ad, nike, air, jordan, retro, high, o...",0.16,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound..."
24503,yes i do b embarrassed when all my adidas sock...,[],quality,nike,"[yes, b, embarrassed, adidas, sock, dirty, wea...",-0.1,"{'neg': 0.29, 'neu': 0.43, 'pos': 0.28, 'compo..."
25120,thanks,[],quality,adidas,[thanks],0.2,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound..."
16690,nike pulling all their products from foot lock...,[],quality,nike,"[nike, pulling, product, foot, locker, basical...",0.3,"{'neg': 0.207, 'neu': 0.632, 'pos': 0.161, 'co..."
28526,sneaker scouts the nike blazer mid seven seven...,"[#SneakerScouts, #ad]",price,nike,"[sneaker, scout, nike, blazer, mid, seven, sev...",0.058333,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound..."
9496,right like it s a white tank top with nike wri...,[],quality,adidas,"[right, like, white, tank, nike, written, thin...",0.142857,"{'neg': 0.0, 'neu': 0.551, 'pos': 0.449, 'comp..."
27993,all were in a game played with a nike basketba...,[],quality,adidas,"[game, played, nike, basketball, shame]",-0.4,"{'neg': 0.365, 'neu': 0.353, 'pos': 0.282, 'co..."
15321,he don t know queen,[],quality,nike,"[know, queen]",0.0,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound..."
25204,are those boots sponsored by nike,[],quality,adidas,"[boot, sponsored, nike]",0.0,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound..."
25254,nike town mane,[],quality,adidas,"[nike, town, mane]",0.0,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound..."


In [9]:
brands_tweets.blob_scores.value_counts()

 0.000000    27355
 0.650000     2451
 0.500000     1945
 0.400000     1429
 0.200000     1280
             ...  
 0.155682        1
 0.174107        1
 0.573016        1
 0.055556        1
-0.040476        1
Name: blob_scores, Length: 2226, dtype: int64

In [10]:
type(brands_tweets.vader_scores[2].reset_index().vader_scores[0])

dict
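
Since each `vader_scores` entry is a dict with the keys `neg`, `neu`, `pos` and `compound`, one way to work with it is to reduce it to a coarse label based on `compound`. The sketch below assumes the ±0.05 cut-offs commonly suggested in VADER's documentation; whether they suit this dataset is untested. The function name `label_from_vader` and the example values are hypothetical.

```python
# Sketch: turn one VADER result dict into a coarse sentiment label.
# The +/-0.05 cut-offs follow the convention in VADER's documentation;
# their suitability for these tweets is an assumption.

def label_from_vader(scores, threshold=0.05):
    compound = scores['compound']  # normalized overall score in [-1, 1]
    if compound >= threshold:
        return 'pos'
    if compound <= -threshold:
        return 'neg'
    return 'neu'


# Made-up values, only to show the expected dict shape.
example = {'neg': 0.0, 'neu': 0.5, 'pos': 0.5, 'compound': 0.5}
print(label_from_vader(example))
```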

In [11]:
brands_tweets.reset_index(inplace = True)

# Analysis Results.

### Blob Scores

In [12]:
brands_tweets.blob_scores.mean()

0.10486408248893689
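
Because exact zeros dominate the distribution (27355 rows in the `value_counts` output above), the overall mean is pulled toward zero. The sketch below, on made-up scores, reports the share of zero scores next to the mean with and without them. `toy_scores` is a hypothetical stand-in for `brands_tweets.blob_scores`.

```python
import pandas as pd

# Invented scores standing in for brands_tweets.blob_scores.
toy_scores = pd.Series([0.0, 0.0, 0.0, 0.5, -0.25, 0.25])

zero_share = (toy_scores == 0).mean()               # fraction of exactly-zero scores
mean_all = toy_scores.mean()                        # mean including the zeros
mean_nonzero = toy_scores[toy_scores != 0].mean()   # mean over non-zero scores only

print(zero_share, mean_all, mean_nonzero)
```

Reporting both figures makes it explicit how much of a brand-level mean comes from the tweets that received a score at all.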

**Nike results:**

In [13]:
nike_price = brands_tweets[(brands_tweets['brand'] == 'nike') & (brands_tweets['brand_attribute'] == 'price') & (brands_tweets['blob_scores'] != 0.000)]

In [14]:
print('Nike price: ', nike_price.blob_scores.mean())

Nike price:  0.109937374381409


In [15]:
nike_quality = brands_tweets[(brands_tweets['brand'] == 'nike') & (brands_tweets['brand_attribute'] == 'quality') & (brands_tweets['blob_scores'] != 0.000)]

In [16]:
print('Nike quality: ', nike_quality.blob_scores.mean())

Nike quality:  0.20489180285136294


**Adidas results:**

In [17]:
adidas_price = brands_tweets[(brands_tweets['brand'] == 'adidas') & (brands_tweets['brand_attribute'] == 'price') & (brands_tweets['blob_scores'] != 0.000)]

In [18]:
print('Adidas price: ', adidas_price.blob_scores.mean())

Adidas price:  0.11120344708668521


In [19]:
adidas_quality = brands_tweets[(brands_tweets['brand'] == 'adidas') & (brands_tweets['brand_attribute'] == 'quality') & (brands_tweets['blob_scores'] != 0.000)]

In [20]:
print('Adidas quality: ', adidas_quality.blob_scores.mean())

Adidas quality:  0.20460581742491246


**Asics results:**

In [21]:
asics_price = brands_tweets[(brands_tweets['brand'] == 'asics') & (brands_tweets['brand_attribute'] == 'price') & (brands_tweets['blob_scores'] != 0.000)]

In [22]:
print('Asics price: ', asics_price.blob_scores.mean())

Asics price:  0.10397568056206592


In [23]:
asics_quality = brands_tweets[(brands_tweets['brand'] == 'asics') & (brands_tweets['brand_attribute'] == 'quality') & (brands_tweets['blob_scores'] != 0.000)]

In [24]:
print('Asics quality: ', asics_quality.blob_scores.mean())

Asics quality:  0.23039710194948845


**Reebok results:**

In [25]:
reebok_price = brands_tweets[(brands_tweets['brand'] == 'reebok') & (brands_tweets['brand_attribute'] == 'price') & (brands_tweets['blob_scores'] != 0.000)]

In [30]:
print('Reebok price: ', reebok_price.blob_scores.mean())

Reebok price:  0.16575064075064078


In [27]:
reebok_quality = brands_tweets[(brands_tweets['brand'] == 'reebok') & (brands_tweets['brand_attribute'] == 'quality') & (brands_tweets['blob_scores'] != 0.000)]

In [31]:
print('Reebok quality: ', reebok_quality.blob_scores.mean())

Reebok quality:  0.2610846382136843
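
The eight filter-and-mean cells above follow the same pattern, so they could probably be collapsed into a single `groupby`. The sketch below uses a made-up frame: only the column names `brand`, `brand_attribute` and `blob_scores` come from the notebook, and the rows are invented.

```python
import pandas as pd

# Invented rows standing in for brands_tweets; only the column names are real.
toy = pd.DataFrame({
    'brand': ['nike', 'nike', 'nike', 'adidas', 'adidas', 'adidas'],
    'brand_attribute': ['price', 'price', 'quality', 'price', 'quality', 'quality'],
    'blob_scores': [0.5, 0.0, 0.25, -0.5, 0.75, 0.25],
})

# Same filter as the cells above: drop tweets with a score of exactly zero.
scored = toy[toy['blob_scores'] != 0]

# Mean score per brand and attribute: brands as rows, attributes as columns.
summary = (
    scored.groupby(['brand', 'brand_attribute'])['blob_scores']
    .mean()
    .unstack('brand_attribute')
)
print(summary)
```

Applied to `brands_tweets`, the same three steps would yield one table with a row per brand, which makes the price and quality means easier to compare than separate `print` calls.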


### VADER

In [29]:
print('Nike price: ', nike_price.vader_scores.apply(lambda scores: scores['compound']).mean())

In [None]:
nike_price.vader_scores[13]

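In [None]:
def compound_score(scores):
    # Each VADER result is a dict with 'neg', 'neu', 'pos' and 'compound' keys.
    return scores['compound']

nike_price.vader_scores.apply(compound_score).mean()
nike_price.vader_scores.apply(lambda scores: scores['neg']).mean()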
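
Because `vader_scores` holds one dict per tweet, a column-wise mean needs the dicts unpacked first. One possible approach, sketched on invented rows, is to expand the dicts into separate columns and aggregate them like any other numeric column. The frame `toy` and its values are hypothetical.

```python
import pandas as pd

# Invented rows; the dict keys mirror the VADER output shown earlier.
toy = pd.DataFrame({
    'brand': ['nike', 'nike', 'adidas'],
    'brand_attribute': ['price', 'price', 'price'],
    'vader_scores': [
        {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0},
        {'neg': 0.0, 'neu': 0.5, 'pos': 0.5, 'compound': 0.5},
        {'neg': 0.5, 'neu': 0.5, 'pos': 0.0, 'compound': -0.25},
    ],
})

# Expand each dict into its own columns, keeping the original row index.
vader_cols = pd.DataFrame(toy['vader_scores'].tolist(), index=toy.index)
expanded = toy.drop(columns='vader_scores').join(vader_cols)

# Mean of every VADER component per brand and attribute.
by_brand = expanded.groupby(['brand', 'brand_attribute'])[
    ['neg', 'neu', 'pos', 'compound']
].mean()
print(by_brand)
```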

# TESTS

In [None]:
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

In [None]:
# The default part of speech is noun, so 'are' comes back unchanged; pos='v' yields 'be'.
lemmatizer.lemmatize('are')

In [None]:
from textblob import Word

a = Word('are')
a.lemmatize('v')

In [None]:
import spacy

nlp = spacy.load('en_core_web_sm')

In [None]:
sent = 'Gus is helping organize a developer'

for token in nlp(sent):
    print(token, token.lemma_)

In [None]:
from spacy import displacy

doc = nlp(sent)

displacy.render(doc, style='dep', jupyter=True)