# NLP & Sentiment Analysis.

In this stage of the project we will work with several pre-trained models (libraries) and try to evaluate which one gives the best results.

Clearly, it would be desirable to develop a custom, ad hoc model tailored to the project's particular needs, which would probably allow us to achieve more accurate results. However, for now the goal is to lay the project's foundations and obtain a minimum viable product that demonstrates its potential. We have too little time to develop and train our own model, which is why we opt for pre-trained NLP models to analyze the sentiment of the tweets and, therefore, users' perception of the brands.

The sentiment analyses are run through functions previously defined in *src/nlp_functions.py*.

In [1]:
from src.nlp_functions import *

import pandas as pd

In [2]:
brands_tweets = pd.read_pickle('data/sports_equipment_brands_tokens.pkl')

In [3]:
brands_tweets.head()

Unnamed: 0,text,hashtags,brand_attribute,brand,token
0,nobody cares about nike in russia russia is al...,[],quality,nike,"[care, nike, russia, russia, adidas]"
1,ye green nike hoodie waala two weeks pehle put...,[],quality,nike,"[ye, green, nike, hoodie, waala, week, pehle, ..."
2,nike okundaye also knowns nike twins seven sev...,"[#womengiant, #Documentwomen]",quality,nike,"[nike, okundaye, knowns, nike, twin, seven, se..."
3,day four of maxmadness air max month nike air ...,"[#maxmadness, #AirMaxMonth, #airmaxgang, #kotd...",quality,nike,"[day, maxmadness, air, max, month, nike, air, ..."
4,thank you sir,[],quality,nike,"[thank, sir]"


### TextBlob (spaCy)

In [4]:
'''INTRO!!!'''

# Brief explanation.

# Make sure to understand how it works.
# Mind the installations (I have added them everywhere).


'INTRO!!!'

In [5]:
brands_tweets['blob_scores'] = brands_tweets.token.apply(blob_scoring)

In [6]:
brands_tweets.sample(10)

Unnamed: 0,text,hashtags,brand_attribute,brand,token,blob_scores
29320,one of my favorite pics i took this az high sc...,[],quality,nike,"[favorite, pic, took, az, high, school, girl, ...",0.273333
2289,checks over stripes custom made nike rug made ...,[],quality,adidas,"[check, stripe, custom, nike, rug, raffle, win...",0.0
9904,size uk one two nike air force one x space jam...,[],quality,nike,"[size, uk, nike, air, force, x, space, jam, co...",0.136364
5322,nike blazer mid cop or drop gt gt gt ad sneake...,"[#Nike, #AD, #sneakers, #sneakerhead, #streets...",quality,nike,"[nike, blazer, mid, cop, drop, gt, gt, gt, ad,...",0.0
382,yessirr,[],quality,puma,[yessirr],0.0
21487,yeah destroying your own property that ll work...,[],quality,nike,"[yeah, destroying, property, work, burn, nike,...",-0.2
27551,restock nike dunk low gold white silver univer...,[],price,nike,"[restock, nike, dunk, low, gold, white, silver...",0.0
6561,we will be fine on a neutral court any venue i...,[],price,nike,"[fine, neutral, court, venue, sec, tough, play...",0.015873
14026,check out the nike dunk high black white two z...,[],quality,adidas,"[check, nike, dunk, high, black, white, zero, ...",0.098333
23503,nike acg gore tex misery ridge shell jacket e ...,"[#eBay🇺🇸, #Men, #Activewear, #Jackets]",quality,nike,"[nike, acg, gore, tex, misery, ridge, shell, j...",0.0


### VADER.

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a sentiment analysis library focused on social media text. It therefore emphasizes rules that capture the essence of the text typically seen on social networks. An interesting aspect of VADER is that it is designed to work on text that has undergone only very basic cleaning (keeping emojis, exclamation marks, etc.). We will try it on the already cleaned data, since we expect it to give better results (although the process is not shown here, we tried it on the original raw data, but it failed to analyze the sentiment correctly and the vast majority of the negativity, neutrality and positivity scores were 0). Ideally, however, we would evaluate the accuracy of the analysis on tweets with different levels of cleaning (from the most basic to the most thorough) and compare the models to identify the one that performs best (including the analyses carried out with models other than VADER).

https://towardsdatascience.com/fine-grained-sentiment-analysis-in-python-part-1-2697bb111ed4
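
For each text VADER returns a dictionary with the keys `neg`, `neu`, `pos` and `compound`. A minimal sketch of turning the `compound` value into a label, assuming the ±0.05 cut-offs that the VADER project's README suggests as typical thresholds. `label_from_compound` is a hypothetical helper (it is not part of `src/nlp_functions.py`) and the sample dictionary holds illustrative values only:

```python
def label_from_compound(scores, threshold=0.05):
    """Map a VADER-style score dict to 'positive', 'negative' or 'neutral'."""
    compound = scores['compound']
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# Illustrative values, not taken from the dataset.
example = {'neg': 0.0, 'neu': 0.6, 'pos': 0.4, 'compound': 0.5}
print(label_from_compound(example))
```

A label of this kind could make it easier to compare VADER against other models on a hand-labeled sample, since every model's output is then reduced to the same three classes.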

In [7]:
brands_tweets['vader_scores'] = brands_tweets.token.apply(vader_scoring)

In [8]:
brands_tweets.sample(10)

Unnamed: 0,text,hashtags,brand_attribute,brand,token,blob_scores,vader_scores
2104,glad to see og pairs being pulled out and not ...,[],quality,adidas,"[glad, og, pair, pulled, new, thing]",0.318182,"{'neg': 0.0, 'neu': 0.625, 'pos': 0.375, 'comp..."
23723,my son wants nike air force i ve been around a...,[],quality,adidas,"[son, want, nike, air, force, mall, ke, e, bat...",0.0,"{'neg': 0.0, 'neu': 0.874, 'pos': 0.126, 'comp..."
18803,hate that happened to you,[],quality,nike,"[hate, happened]",-0.8,"{'neg': 0.787, 'neu': 0.213, 'pos': 0.0, 'comp..."
19206,he doing it just like nike,[],quality,adidas,"[like, nike]",0.0,"{'neg': 0.0, 'neu': 0.286, 'pos': 0.714, 'comp..."
124,not sure i follow tweet says last pic before u...,[],quality,umbro,"[sure, follow, tweet, say, pic, umbro, badge, ...",0.5,"{'neg': 0.0, 'neu': 0.753, 'pos': 0.247, 'comp..."
19429,six love mr amp mrs zepeda fyi zepeda cia s im...,[],price,nike,"[love, mr, amp, mr, zepeda, fyi, zepeda, cia, ...",0.225,"{'neg': 0.072, 'neu': 0.749, 'pos': 0.18, 'com..."
8452,its baby shark girls amp boy watch baby shark ...,[],quality,adidas,"[baby, shark, girl, amp, boy, watch, baby, sha...",0.1,"{'neg': 0.0, 'neu': 0.849, 'pos': 0.151, 'comp..."
3850,why are any companies doing business in russia...,[],price,adidas,"[company, business, russia, time, starve, russ...",0.0,"{'neg': 0.259, 'neu': 0.536, 'pos': 0.205, 'co..."
17454,feeling some serious keurig nike etc vibes rig...,[],price,nike,"[feeling, keurig, nike, etc, vibe, right]",0.285714,"{'neg': 0.0, 'neu': 0.769, 'pos': 0.231, 'comp..."
3486,check out this listing i just added to my posh...,"[#Poshmark, #shopmycloset]",quality,puma,"[check, listing, added, poshmark, closet, sold...",0.0,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound..."


In [9]:
brands_tweets.blob_scores.value_counts()

 0.000000    31878
 0.650000     2747
 0.500000     2220
 0.400000     1657
 0.200000     1398
             ...  
 0.009524        1
-0.343750        1
-0.035556        1
 0.118182        1
 0.477778        1
Name: blob_scores, Length: 2545, dtype: int64

In [10]:
type(brands_tweets.vader_scores[2].reset_index().vader_scores[0])

dict

In [11]:
brands_tweets.reset_index(inplace = True)
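
Before the index is reset, `brands_tweets.vader_scores[2]` returns a Series instead of a single dictionary, which suggests that the index labels are repeated (presumably because several per-brand tables were stacked; that origin is an assumption). A small sketch on made-up data showing the effect. It uses `drop=True` because it works on a bare Series, whereas the cell above resets the index of the whole DataFrame:

```python
import pandas as pd

# Made-up scores; the repeated index labels mimic stacking several DataFrames.
scores = pd.Series(
    [{'compound': 0.5}, {'compound': -0.2}, {'compound': 0.0}, {'compound': 0.1}],
    index=[0, 1, 0, 1],
    name='vader_scores',
)

# With duplicated labels, selecting label 0 yields every matching row (a Series).
duplicated_lookup = scores.loc[0]
print(type(duplicated_lookup).__name__)

# After resetting the index each label is unique, so the lookup yields one dict.
unique_scores = scores.reset_index(drop=True)
unique_lookup = unique_scores.loc[0]
print(type(unique_lookup).__name__, unique_lookup['compound'])
```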

# Analysis Results.

### Blob Scores

In [12]:
brands_tweets.blob_scores.mean()

0.1053805303017492

**Nike results:**

In [13]:
nike_price = brands_tweets[(brands_tweets['brand'] == 'nike') & (brands_tweets['brand_attribute'] == 'price') & (brands_tweets['blob_scores'] != 0.000)]

In [14]:
print('Nike price: ', nike_price.blob_scores.mean())

Nike price:  0.109937374381409


In [15]:
nike_quality = brands_tweets[(brands_tweets['brand'] == 'nike') & (brands_tweets['brand_attribute'] == 'quality') & (brands_tweets['blob_scores'] != 0.000)]

In [16]:
print('Nike quality: ', nike_quality.blob_scores.mean())

Nike quality:  0.20489180285136294


**Adidas results:**

In [17]:
adidas_price = brands_tweets[(brands_tweets['brand'] == 'adidas') & (brands_tweets['brand_attribute'] == 'price') & (brands_tweets['blob_scores'] != 0.000)]

In [18]:
print('Adidas price: ', adidas_price.blob_scores.mean())

Adidas price:  0.11120344708668521


In [19]:
adidas_quality = brands_tweets[(brands_tweets['brand'] == 'adidas') & (brands_tweets['brand_attribute'] == 'quality') & (brands_tweets['blob_scores'] != 0.000)]

In [20]:
print('Adidas quality: ', adidas_quality.blob_scores.mean())

Adidas quality:  0.20460581742491246


**Asics results:**

In [21]:
asics_price = brands_tweets[(brands_tweets['brand'] == 'asics') & (brands_tweets['brand_attribute'] == 'price') & (brands_tweets['blob_scores'] != 0.000)]

In [22]:
print('Asics price: ', asics_price.blob_scores.mean())

Asics price:  0.10397568056206592


In [23]:
asics_quality = brands_tweets[(brands_tweets['brand'] == 'asics') & (brands_tweets['brand_attribute'] == 'quality') & (brands_tweets['blob_scores'] != 0.000)]

In [24]:
print('Asics quality: ', asics_quality.blob_scores.mean())

Asics quality:  0.23039710194948845


**Reebok results:**

In [25]:
reebok_price = brands_tweets[(brands_tweets['brand'] == 'reebok') & (brands_tweets['brand_attribute'] == 'price') & (brands_tweets['blob_scores'] != 0.000)]

In [26]:
print('Reebok price: ', reebok_price.blob_scores.mean())

Reebok price:  0.16575064075064078


In [27]:
reebok_quality = brands_tweets[(brands_tweets['brand'] == 'reebok') & (brands_tweets['brand_attribute'] == 'quality') & (brands_tweets['blob_scores'] != 0.000)]

In [28]:
print('Reebok quality: ', reebok_quality.blob_scores.mean())

Reebok quality:  0.2610846382136843


**Skechers results:**

In [29]:
skechers_price = brands_tweets[(brands_tweets['brand'] == 'skechers') & (brands_tweets['brand_attribute'] == 'price') & (brands_tweets['blob_scores'] != 0.000)]

In [30]:
print('Skechers price: ', skechers_price.blob_scores.mean())

Skechers price:  0.20445661163847706


In [31]:
skechers_quality = brands_tweets[(brands_tweets['brand'] == 'skechers') & (brands_tweets['brand_attribute'] == 'quality') & (brands_tweets['blob_scores'] != 0.000)]

In [32]:
print('Skechers quality: ', skechers_quality.blob_scores.mean())

Skechers quality:  0.2719384021938369


**Under Armour results:**

In [33]:
under_armour_price = brands_tweets[(brands_tweets['brand'] == 'under armour') & (brands_tweets['brand_attribute'] == 'price') & (brands_tweets['blob_scores'] != 0.000)]

In [34]:
print('Under Armour price: ', under_armour_price.blob_scores.mean())

Under Armour price:  0.11430521809290652


In [35]:
under_armour_quality = brands_tweets[(brands_tweets['brand'] == 'under armour') & (brands_tweets['brand_attribute'] == 'quality') & (brands_tweets['blob_scores'] != 0.000)]

In [36]:
print('Under Armour quality: ', under_armour_quality.blob_scores.mean())

Under Armour quality:  0.08316104229232908


**Umbro results:**

In [37]:
umbro_price = brands_tweets[(brands_tweets['brand'] == 'umbro') & (brands_tweets['brand_attribute'] == 'price') & (brands_tweets['blob_scores'] != 0.000)]

In [38]:
print('Umbro price: ', umbro_price.blob_scores.mean())

Umbro price:  0.21035246646357758


In [39]:
umbro_quality = brands_tweets[(brands_tweets['brand'] == 'umbro') & (brands_tweets['brand_attribute'] == 'quality') & (brands_tweets['blob_scores'] != 0.000)]

In [40]:
print('Umbro quality: ', umbro_quality.blob_scores.mean())

Umbro quality:  0.2218498321710396


**Puma results:**

In [41]:
puma_price = brands_tweets[(brands_tweets['brand'] == 'puma') & (brands_tweets['brand_attribute'] == 'price') & (brands_tweets['blob_scores'] != 0.000)]

In [42]:
print('Puma price: ', puma_price.blob_scores.mean())

Puma price:  0.17975266982289456


In [43]:
puma_quality = brands_tweets[(brands_tweets['brand'] == 'puma') & (brands_tweets['brand_attribute'] == 'quality') & (brands_tweets['blob_scores'] != 0.000)]

In [44]:
print('Puma quality: ', puma_quality.blob_scores.mean())

Puma quality:  0.23911617268566362
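
The per-brand cells in this section repeat the same filter for every brand and attribute. A more compact alternative, sketched here on a small made-up DataFrame that reuses the notebook's column names, is a single `groupby`. As in the cells above, rows whose score is exactly 0 are excluded before averaging:

```python
import pandas as pd

# Made-up rows using the notebook's column names.
toy = pd.DataFrame({
    'brand': ['nike', 'nike', 'nike', 'puma', 'puma', 'puma'],
    'brand_attribute': ['price', 'price', 'quality', 'price', 'quality', 'quality'],
    'blob_scores': [0.2, 0.0, 0.5, -0.1, 0.4, 0.6],
})

# Drop zero scores, then average per brand and attribute.
mean_scores = (
    toy[toy['blob_scores'] != 0]
    .groupby(['brand', 'brand_attribute'])['blob_scores']
    .mean()
    .unstack('brand_attribute')
)
print(mean_scores)
```

Applied to `brands_tweets`, the same expression would give one row per brand with a `price` and a `quality` column, which makes the brands easier to compare side by side.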


### VADER

In [48]:
'''With a reset index we can now handle the scores as a dictionary!!!'''

'With a reset index we can now handle the scores as a dictionary!!!'

In [51]:
brands_tweets.vader_scores[0]['compound']

0.4939

In [54]:
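# Each entry of vader_scores is a dict, so extract 'compound' per row before averaging.
# Note: this slice covers the first 10 rows of the whole dataset, not only Nike price tweets.
print('Mean compound (first 10 rows): ', brands_tweets.vader_scores[:10].apply(lambda scores: scores['compound']).mean())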

In [None]:
nike_price.vader_scores[13]

In [None]:
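def compound_score(scores):
    # Return the overall (compound) polarity from a VADER score dict.
    return scores['compound']

# The column holds dicts, so pull out the 'neg' value per row before averaging.
nike_price.vader_scores.apply(lambda scores: scores['neg']).mean()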
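
Because each entry of `vader_scores` is a dictionary, indexing the column itself with a key such as `'compound'` raises a `KeyError`. One way around this, shown as a sketch on made-up data, is to expand the dictionaries into separate numeric columns first:

```python
import pandas as pd

# Made-up rows; each vader_scores entry is a dict shaped like VADER's output.
toy = pd.DataFrame({
    'brand': ['nike', 'nike', 'adidas'],
    'vader_scores': [
        {'neg': 0.0, 'neu': 0.5, 'pos': 0.5, 'compound': 0.6},
        {'neg': 0.4, 'neu': 0.6, 'pos': 0.0, 'compound': -0.4},
        {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0},
    ],
})

# Turn the list of dicts into a DataFrame with one column per key.
expanded = pd.DataFrame(toy['vader_scores'].tolist(), index=toy.index)
toy = toy.join(expanded)

# The scores can now be averaged like any other numeric column.
compound_by_brand = toy.groupby('brand')['compound'].mean()
print(compound_by_brand)
```

Once the `neg`, `neu`, `pos` and `compound` columns exist, the same filters used for `blob_scores` can be applied to them directly.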

# TESTS

In [None]:
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

In [None]:
# lemmatize() treats the word as a noun by default (pos='n'); pass pos='v' for verbs.
lemmatizer.lemmatize('are')

In [None]:
from textblob import Word

a = Word('are')
a.lemmatize('v')

In [None]:
import spacy

nlp = spacy.load('en_core_web_sm')

In [None]:
sent = 'Gus is helping organize a developer'

for token in nlp(sent):
    print(token, token.lemma_)

In [None]:
from spacy import displacy

doc = nlp(sent)

displacy.render(doc, style='dep', jupyter=True)