# NLP & Sentiment Analysis

In this stage of the project we will work with several pre-trained models (libraries) and try to evaluate which one gives the best results.

Ideally, we would develop our own ad-hoc model tailored to the project's particular needs, which would probably yield more accurate results. For now, however, the goal is to lay the project's foundations and obtain a minimum viable product that demonstrates its potential. We have very little time to develop and train our own model, so we opt for pre-trained NLP models to analyze the sentiment of the tweets and, therefore, users' perception of the brands.

### TextBlob (spaCy)

In [3]:
'''INTRO!!!'''

# Brief explanation.

# Make sure to understand how it works.
# Watch out for the installations (I have put it in everywhere).


'INTRO!!!'

In [1]:
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')
text = 'I had a really horrible day. It was the worst day ever! But every now and then I have a really good day that makes me happy.'
doc = nlp(text)
doc._.blob.polarity                            # Polarity: -0.125
doc._.blob.subjectivity                        # Subjectivity: 0.9
doc._.blob.sentiment_assessments.assessments   # Assessments: [(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]
doc._.blob.ngrams()   

[WordList(['I', 'had', 'a']),
 WordList(['had', 'a', 'really']),
 WordList(['a', 'really', 'horrible']),
 WordList(['really', 'horrible', 'day']),
 WordList(['horrible', 'day', 'It']),
 WordList(['day', 'It', 'was']),
 WordList(['It', 'was', 'the']),
 WordList(['was', 'the', 'worst']),
 WordList(['the', 'worst', 'day']),
 WordList(['worst', 'day', 'ever']),
 WordList(['day', 'ever', 'But']),
 WordList(['ever', 'But', 'every']),
 WordList(['But', 'every', 'now']),
 WordList(['every', 'now', 'and']),
 WordList(['now', 'and', 'then']),
 WordList(['and', 'then', 'I']),
 WordList(['then', 'I', 'have']),
 WordList(['I', 'have', 'a']),
 WordList(['have', 'a', 'really']),
 WordList(['a', 'really', 'good']),
 WordList(['really', 'good', 'day']),
 WordList(['good', 'day', 'that']),
 WordList(['day', 'that', 'makes']),
 WordList(['that', 'makes', 'me']),
 WordList(['makes', 'me', 'happy'])]
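
The trigram list above can be reproduced with a plain sliding window over the words. The following is a minimal sketch, not TextBlob's actual implementation: the `word_ngrams` helper and its regex tokenizer are assumptions made for illustration, and `n=3` mirrors the window size seen in the output.

```python
import re

def word_ngrams(text, n=3):
    """Return every run of n consecutive words in text (hypothetical helper)."""
    # Crude tokenizer (an assumption): keep letters and apostrophes, drop punctuation.
    words = re.findall(r"[A-Za-z']+", text)
    # Slide a window of width n over the word list.
    return [words[i:i + n] for i in range(len(words) - n + 1)]

text = ("I had a really horrible day. It was the worst day ever! "
        "But every now and then I have a really good day that makes me happy.")
trigrams = word_ngrams(text)
print(len(trigrams))
print(trigrams[0], trigrams[-1])
```

For this sentence the sketch yields the same number of trigrams as the output above, starting with `['I', 'had', 'a']` and ending with `['makes', 'me', 'happy']`.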

In [2]:
from textblob import TextBlob

text = "I had a really horrible day. It was the worst day ever! But every now and then I have a really good day that makes me happy."
blob = TextBlob(text)

print(blob.sentiment_assessments.polarity)
# -0.125

print(blob.sentiment_assessments.subjectivity)
# 0.9

print(blob.sentiment_assessments.assessments)
# [(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]


-0.125
0.9
[(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]
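
The overall scores appear to be plain averages of the individual assessments, which helps when interpreting them: two strongly negative chunks and two positive ones net out to a mildly negative polarity. The sketch below checks this arithmetic using only the values printed above; it is a verification of the numbers, not a description of TextBlob's internals.

```python
from statistics import fmean

# Assessments copied from the output above:
# (matched words, polarity, subjectivity, fourth field which is None here).
assessments = [
    (['really', 'horrible'], -1.0, 1.0, None),
    (['worst', '!'], -1.0, 1.0, None),
    (['really', 'good'], 0.7, 0.6000000000000001, None),
    (['happy'], 0.8, 1.0, None),
]

# Average the polarity and subjectivity columns separately.
polarity = fmean(a[1] for a in assessments)
subjectivity = fmean(a[2] for a in assessments)
print(round(polarity, 3), round(subjectivity, 3))
```

The rounded averages match the reported polarity (-0.125) and subjectivity (0.9).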


### VADER

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a sentiment analysis library focused on social media text. It emphasizes rules that capture the essence of the kind of text typically found on social networks. An interesting feature of VADER is that it is designed to work on text that has received only very basic cleaning (keeping emojis, exclamation marks, etc.). We will apply it to the already-cleaned data, since we expect this to give a better result: although the process is not shown here, we also tried it on the original raw data, but it failed to analyze the sentiments correctly and the vast majority of the negativity, neutrality and positivity scores were 0. Ideally, however, we would evaluate the accuracy of the analysis on tweets at different levels of cleaning (from the most basic to the most thorough) and compare the models to identify the best-performing one (including analyses carried out with models other than VADER).

https://towardsdatascience.com/fine-grained-sentiment-analysis-in-python-part-1-2697bb111ed4

In [4]:
from src.nlp_functions import *

In [6]:
import pandas as pd

brands_tweets = pd.read_pickle('data/brands_tokens.pkl')

In [7]:
brands_tweets.head()

| | text | hashtags | brand_attribute | brand | token |
|---|---|---|---|---|---|
| 0 | nobody cares about nike in russia russia is al... | [] | quality | nike | [care, nike, russia, russia, adidas] |
| 1 | ye green nike hoodie waala two weeks pehle put... | [] | quality | nike | [ye, green, nike, hoodie, waala, week, pehle, ... |
| 2 | nike okundaye also knowns nike twins seven sev... | ['#womengiant', '#Documentwomen'] | quality | nike | [nike, okundaye, knowns, nike, twin, seven, se... |
| 3 | day four of maxmadness air max month nike air ... | ['#maxmadness', '#AirMaxMonth', '#airmaxgang',... | quality | nike | [day, maxmadness, air, max, month, nike, air, ... |
| 4 | thank you sir | [] | quality | nike | [thank, sir] |


In [8]:
brands_tweets['vader_scores'] = brands_tweets.token.apply(vader_scoring)
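
The `vader_scoring` helper comes from `src.nlp_functions`, which is not shown in this notebook. A minimal sketch of what such a function might look like follows, assuming it joins the token list back into a string and relies on the `vaderSentiment` package; the lazy import, the cached `_default_analyzer` and the optional `analyzer` parameter are illustrative choices, not the project's actual code.

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def _default_analyzer():
    # Imported lazily and cached so the lexicon is loaded only once.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer()

def vader_scoring(tokens, analyzer=None):
    """Hypothetical stand-in for src.nlp_functions.vader_scoring.

    Joins the tokens into one string and returns the dict produced by
    polarity_scores (keys 'neg', 'neu', 'pos', 'compound').
    """
    if analyzer is None:
        analyzer = _default_analyzer()
    return analyzer.polarity_scores(" ".join(tokens))
```

Reusing a single analyzer matters when the function is applied row by row with `apply`, since building a new one for every tweet would reload the lexicon each time.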

In [9]:
brands_tweets.sample(10)

| | text | hashtags | brand_attribute | brand | token | vader_scores |
|---|---|---|---|---|---|---|
| 26586 | the nike go flyease surfaces in another multic... | ['#sneakersapp'] | quality | nike | [nike, flyease, surface, multicolor, sneakersapp] | {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound... |
| 7967 | just in nike streetgato white lime glow | [] | quality | nike | [nike, streetgato, white, lime, glow] | {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound... |
| 1476 | for the first time the fruity pebbles lebron f... | [] | quality | nike | [time, fruity, pebble, lebron, set, retail, re... | {'neg': 0.0, 'neu': 0.748, 'pos': 0.252, 'comp... |
| 4839 | nike ikea close russian stores as sanctions tr... | [] | quality | nike | [nike, ikea, close, russian, store, sanction, ... | {'neg': 0.208, 'neu': 0.792, 'pos': 0.0, 'comp... |
| 26706 | sizes up to one two for the vast grey electric... | [] | price | nike | [size, vast, grey, electric, green, nike, spac... | {'neg': 0.0, 'neu': 0.809, 'pos': 0.191, 'comp... |
| 47081 | so dumb it s ike when people were burning thei... | [] | quality | adidas | [dumb, ike, people, burning, nike, gear, like,... | {'neg': 0.525, 'neu': 0.35, 'pos': 0.125, 'com... |
| 41527 | my number one actually wore em to see batman t... | [] | quality | adidas | [number, actually, wore, em, batman, tuesday, ... | {'neg': 0.0, 'neu': 0.645, 'pos': 0.355, 'comp... |
| 38498 | so good i had to share check out all the items... | ['#poshmark', '#fashion', '#style', '#shopmycl... | quality | adidas | [good, share, check, item, loving, poshmark, f... | {'neg': 0.0, 'neu': 0.383, 'pos': 0.617, 'comp... |
| 38801 | ad dropped via nike us nike air sesh white var... | [] | quality | adidas | [ad, dropped, nike, nike, air, sesh, white, va... | {'neg': 0.0, 'neu': 0.826, 'pos': 0.174, 'comp... |
| 39417 | by any means supreme x nike sb dunk drops toda... | [] | price | adidas | [mean, supreme, x, nike, sb, dunk, drop, today... | {'neg': 0.128, 'neu': 0.305, 'pos': 0.567, 'co... |
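
Each `vader_scores` entry is a dict, so comparing brands usually requires reducing it to a single label first. A common convention is to threshold the `compound` value at ±0.05. The sketch below assumes that convention; the `label_from_compound` helper and the sample dicts are hypothetical and are not rows from the dataset.

```python
def label_from_compound(scores, threshold=0.05):
    """Map a VADER score dict to 'positive', 'negative' or 'neutral'."""
    compound = scores['compound']
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'

# Illustrative score dicts (made-up values, not rows from the dataset).
examples = [
    {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0},
    {'neg': 0.0, 'neu': 0.5, 'pos': 0.5, 'compound': 0.6},
    {'neg': 0.4, 'neu': 0.6, 'pos': 0.0, 'compound': -0.5},
]
print([label_from_compound(s) for s in examples])
```

On the DataFrame this could be applied as `brands_tweets.vader_scores.apply(label_from_compound)`, storing the result in a new column of your choice.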


In [10]:
import spacy

from nltk.corpus import stopwords
from spacy.lang.en.stop_words import STOP_WORDS

from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

In [22]:
# With no pos argument the word is treated as a noun, so the verb 'are' comes back unchanged.
lemmatizer.lemmatize('are')

'are'

In [23]:
nlp = spacy.load('en_core_web_sm')

In [24]:
sent = 'Gus is helping organize a developer'

for token in nlp(sent):
    print(token, token.lemma_)

Gus Gus
is be
helping helping
organize organize
a a
developer developer
