# NLP & Sentiment Analysis.

In this stage of the project we will work with several pre-trained models (libraries) and try to evaluate which one gives us the best results.

Clearly, it would be preferable to develop our own ad-hoc model tailored to the project's specific needs, which would probably yield more accurate results. However, we must not forget that, for now, our goal is to lay the foundations of the project and obtain a minimum viable product that demonstrates its potential. We have far too little time to develop and train our own model, which is why we opt for pre-trained NLP models to analyze the sentiment of the tweets and, therefore, users' perception of the brands.

The sentiment analyses are run through functions previously defined in `src/nlp_functions.py`.

In [1]:
from src.nlp_functions import *

import pandas as pd

In [2]:
brands_tweets = pd.read_pickle('data/brands_tokens.pkl')

In [3]:
brands_tweets.head()

| | text | hashtags | brand_attribute | brand | token |
|---|---|---|---|---|---|
| 0 | nobody cares about nike in russia russia is al... | [] | quality | nike | [care, nike, russia, russia, adidas] |
| 1 | ye green nike hoodie waala two weeks pehle put... | [] | quality | nike | [ye, green, nike, hoodie, waala, week, pehle, ... |
| 2 | nike okundaye also knowns nike twins seven sev... | [#womengiant, #Documentwomen] | quality | nike | [nike, okundaye, knowns, nike, twin, seven, se... |
| 3 | day four of maxmadness air max month nike air ... | [#maxmadness, #AirMaxMonth, #airmaxgang, #kotd... | quality | nike | [day, maxmadness, air, max, month, nike, air, ... |
| 4 | thank you sir | [] | quality | nike | [thank, sir] |


### TextBlob (spaCy)

In [4]:
'''INTRO!!!'''

# Brief explanation.

# Work out properly how it works.
# Watch out for installations (I've added them everywhere).


'INTRO!!!'

In [5]:
brands_tweets['blob_scores'] = brands_tweets.token.apply(blob_scoring)

In [6]:
brands_tweets.sample(10)

| | text | hashtags | brand_attribute | brand | token | blob_scores |
|---|---|---|---|---|---|---|
| 20410 | got to love it | [] | quality | adidas | [got, love] | 0.5 |
| 12147 | new invention the skinny horse man is a car ni... | [] | price | adidas | [new, invention, skinny, horse, man, car, nike... | 0.136364 |
| 24069 | so good i had to share check out all the items... | [#poshmark, #fashion, #style, #shopmycloset, #... | quality | nike | [good, share, check, item, loving, poshmark, f... | 0.65 |
| 15092 | get him some shoes or nice nike sweater or som... | [] | quality | adidas | [shoe, nice, nike, sweater] | 0.6 |
| 15859 | i miss nike but seeing him do so well with his... | [] | quality | nike | [miss, nike, seeing, mom, make, happy] | 0.8 |
| 28047 | did nike design the paint job on all these car... | [#] | quality | nike | [nike, design, paint, job, car, horrible, plac... | -1.0 |
| 19429 | six love mr amp mrs zepeda fyi zepeda cia s im... | [] | price | nike | [love, mr, amp, mr, zepeda, fyi, zepeda, cia, ... | 0.225 |
| 9071 | we re they nike socks made by slave labor in c... | [] | quality | nike | [nike, sock, slave, labor, china] | 0.0 |
| 26895 | so good i had to share check out all the items... | [#poshmark, #fashion, #style, #shopmycloset, #... | quality | adidas | [good, share, check, item, loving, poshmark, f... | 0.65 |
| 13253 | check out the nike react element five five ken... | [] | quality | nike | [check, nike, react, element, kendrick, lamar,... | 0.4 |


### VADER.

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analysis library tailored to social media text. Its rules are designed to capture the features of language typically found on social networks. Notably, VADER is meant to work on text that has received only very basic cleaning, keeping emojis, exclamation marks, and similar signals.

We will nevertheless test it on the already cleaned data, because we expect better results there. We also tried it on the raw original data (the process is not shown here). It failed to analyze the sentiments correctly, and the vast majority of the negativity, neutrality, and positivity scores were 0.

Ideally, however, we would evaluate the accuracy of the analysis on tweets with different levels of cleaning, from the most basic to the most thorough. We would then compare the models to identify the best-performing one, including models other than VADER.

https://towardsdatascience.com/fine-grained-sentiment-analysis-in-python-part-1-2697bb111ed4

In [7]:
brands_tweets['vader_scores'] = brands_tweets.token.apply(vader_scoring)

In [8]:
brands_tweets.sample(10)

| | text | hashtags | brand_attribute | brand | token | blob_scores | vader_scores |
|---|---|---|---|---|---|---|---|
| 20458 | work rn i have on nike sweats a gucci hoodie a... | [] | quality | adidas | [work, rn, nike, sweat, gucci, hoodie, ugg, sl... | 0.0375 | {'neg': 0.135, 'neu': 0.865, 'pos': 0.0, 'comp... |
| 23276 | yep i have a tracksuit from foschini that look... | [] | price | nike | [yep, tracksuit, foschini, look, feel, better,... | 0.4 | {'neg': 0.102, 'neu': 0.557, 'pos': 0.341, 'co... |
| 8449 | wonder if i can buy that popcaan nike poster | [] | quality | adidas | [wonder, buy, popcaan, nike, poster] | 0.0 | {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound... |
| 12195 | nike which uses slave labor in china | [] | quality | adidas | [nike, us, slave, labor, china] | 0.0 | {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound... |
| 23484 | naw analysis nike should be the next luxury fa... | [] | quality | adidas | [naw, analysis, nike, luxury, fashion, house] | 0.0 | {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound... |
| 27575 | check out this listing i just added to my posh... | [#Poshmark, #shopmycloset] | quality | nike | [check, listing, added, poshmark, closet, nike... | 0.0 | {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound... |
| 8968 | mf u said if he sells yeezy he gets one zero z... | [] | price | adidas | [mf, u, said, sell, yeezy, get, zero, zero, pr... | 0.0 | {'neg': 0.142, 'neu': 0.749, 'pos': 0.109, 'co... |
| 3782 | check out this listing i just added to my posh... | [#Poshmark, #shopmycloset] | quality | nike | [check, listing, added, poshmark, closet, nike... | 0.016667 | {'neg': 0.0, 'neu': 0.872, 'pos': 0.128, 'comp... |
| 24818 | check out this listing i just added to my posh... | [#Poshmark, #shopmycloset] | quality | nike | [check, listing, added, poshmark, closet, nike... | 0.066667 | {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound... |
| 1871 | so good i had to share check out all the items... | [#poshmark, #fashion, #style, #shopmycloset, #... | quality | nike | [good, share, check, item, loving, poshmark, f... | 0.65 | {'neg': 0.0, 'neu': 0.383, 'pos': 0.617, 'comp... |


In [15]:
brands_tweets.blob_scores.value_counts()

 0.000000    25739
 0.650000     2119
 0.500000     1853
 0.400000     1343
 0.200000     1236
             ...  
-0.003333        1
-0.057778        1
 0.068667        1
 0.022321        1
-0.135714        1
Name: blob_scores, Length: 2049, dtype: int64
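The value counts show that 0.0 is by far the most frequent polarity (25,739 rows). Presumably this is mostly tweets in which TextBlob finds no lexicon words, and it is why the per-brand results that follow exclude zero scores. The two helpers below quantify that effect. They are hypothetical and are shown on made-up toy values, not on the real column.

```python
import pandas as pd


def zero_score_share(scores: pd.Series) -> float:
    """Fraction of rows whose score is exactly 0.0."""
    return float((scores == 0).mean())


def nonzero_mean(scores: pd.Series) -> float:
    """Mean score after dropping exact zeros, mirroring the != 0.000 filters."""
    return float(scores[scores != 0].mean())


toy = pd.Series([0.0, 0.0, 0.5, -1.0, 0.65])  # toy values, not project data
print(zero_score_share(toy))  # 0.4
print(nonzero_mean(toy))
```

On the real data, `zero_score_share(brands_tweets.blob_scores)` gives the neutral share directly. Comparing `brands_tweets.blob_scores.mean()` with `nonzero_mean(brands_tweets.blob_scores)` shows how much the zeros pull the overall average toward 0.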

In [78]:
# Each cell of vader_scores holds a dict with the keys 'neg', 'neu', 'pos' and 'compound'
type(brands_tweets.vader_scores.iloc[2])

dict
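Because each `vader_scores` cell is a `dict`, column access such as `vader_scores['neg']` looks up a row label rather than a dict key. One common way to make the four VADER components usable is to expand the dicts into their own columns. The sketch below uses a toy frame with made-up values; the `vader_` prefix is an arbitrary choice.

```python
import pandas as pd

# Toy frame mimicking the structure of brands_tweets (values are made up)
df = pd.DataFrame({
    "brand": ["nike", "adidas"],
    "vader_scores": [
        {"neg": 0.1, "neu": 0.6, "pos": 0.3, "compound": 0.4},
        {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0},
    ],
})

# One column per dict key, aligned on the original index
expanded = pd.DataFrame(df["vader_scores"].tolist(), index=df.index).add_prefix("vader_")
df = df.join(expanded)

print(df[["brand", "vader_neg", "vader_compound"]])
print(df["vader_compound"].mean())
```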

In [80]:
brands_tweets.reset_index(inplace=True)

# Analysis Results.

### Blob Scores

In [30]:
brands_tweets.blob_scores.mean()

0.10324134810990024

In [39]:
nike_price = brands_tweets[(brands_tweets['brand'] == 'nike') & (brands_tweets['brand_attribute'] == 'price') & (brands_tweets['blob_scores'] != 0.000)]

In [45]:
print('Nike price: ', nike_price.blob_scores.mean())

Nike price:  0.109937374381409


In [51]:
nike_quality = brands_tweets[(brands_tweets['brand'] == 'nike') & (brands_tweets['brand_attribute'] == 'quality') & (brands_tweets['blob_scores'] != 0.000)]

In [44]:
print('Nike quality: ', nike_quality.blob_scores.mean())

Nike quality:  0.20489180285136294


In [46]:
adidas_price = brands_tweets[(brands_tweets['brand'] == 'adidas') & (brands_tweets['brand_attribute'] == 'price') & (brands_tweets['blob_scores'] != 0.000)]

In [47]:
print('Adidas price: ', adidas_price.blob_scores.mean())

Adidas price:  0.11120344708668521


In [67]:
adidas_quality = brands_tweets[(brands_tweets['brand'] == 'adidas') & (brands_tweets['brand_attribute'] == 'quality') & (brands_tweets['blob_scores'] != 0.000)]

In [68]:
print('Adidas quality: ', adidas_quality.blob_scores.mean())

Adidas quality:  0.20460581742491246
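The four filtered means above can also be computed in one step with `groupby`. The sketch assumes the same columns (`brand`, `brand_attribute`, `blob_scores`) and the same exclusion of zero scores. The helper name is hypothetical, and the toy values are made up.

```python
import pandas as pd


def mean_nonzero_by_group(df: pd.DataFrame, score_col: str = "blob_scores") -> pd.DataFrame:
    """Mean of non-zero scores for every (brand, brand_attribute) pair."""
    nonzero = df[df[score_col] != 0]
    return (
        nonzero.groupby(["brand", "brand_attribute"])[score_col]
        .agg(["mean", "count"])  # count shows how many tweets back each mean
        .reset_index()
    )


toy = pd.DataFrame({
    "brand": ["nike", "nike", "nike", "adidas", "adidas"],
    "brand_attribute": ["price", "price", "quality", "price", "quality"],
    "blob_scores": [0.2, 0.0, 0.6, -0.4, 0.5],
})  # made-up values

print(mean_nonzero_by_group(toy))
```

Applied to `brands_tweets`, this should reproduce the four printed values in a single table. It also reports the number of tweets behind each mean, which helps judge how reliable each average is.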


### VADER

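In [None]:
# Each vader_scores cell is a dict, so extract the 'compound' value before averaging
print('Nike price: ', nike_price.vader_scores.apply(lambda scores: scores['compound']).mean())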

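In [90]:
def compound_score(vader_series):
    """Mean VADER compound score of a Series whose cells are VADER dicts."""
    return vader_series.apply(lambda scores: scores['compound']).mean()

# nike_price.vader_scores['neg'] looks up a row label, not a dict key, and raised KeyError: 'neg';
# extract the key from each dict instead
nike_price.vader_scores.apply(lambda scores: scores['neg']).mean()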
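VADER's documentation suggests cut-offs of ±0.05 on the compound score for labeling text as positive, neutral, or negative. The sketch below implements that mapping (the function name `label_from_compound` is hypothetical) and applies it to toy dicts shaped like the `vader_scores` cells.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a label using the ±0.05 cut-offs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


scores = [{"compound": 0.6249}, {"compound": 0.0}, {"compound": -0.34}]  # toy dicts
print([label_from_compound(s["compound"]) for s in scores])
# ['positive', 'neutral', 'negative']
```

On the real column, `brands_tweets.vader_scores.apply(lambda d: label_from_compound(d['compound'])).value_counts()` would give the label distribution.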

# Tests

In [9]:
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

In [10]:
# WordNetLemmatizer defaults to pos='n' (noun), so the verb 'are' is returned unchanged
lemmatizer.lemmatize('are')

'are'

In [11]:
from textblob import Word

a = Word('are')
# pos='v' makes the WordNet-based lemmatizer treat the word as a verb
a.lemmatize('v')

'be'

In [12]:
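# spacy must be imported explicitly; without this line the cell raised NameError: name 'spacy' is not defined
# The model itself is installed with: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load('en_core_web_sm')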

In [None]:
sent = 'Gus is helping organize a developer'

for token in nlp(sent):
    print(token.text, token.lemma_)

In [None]:
from spacy import displacy

doc = nlp(sent)

displacy.render(doc, style='dep', jupyter=True)