# Polarity Analysis

Polarity analysis consists of assigning a numeric value (positive or negative) to words or expressions according to the emotion they most commonly convey.

### Vader Sentiment Analysis
VADER Sentiment Analysis is a sentiment analysis tool based on <b>lexical rules</b>. It works from a set of words, or <b>lexicon</b>, whose meaning is mostly associated with positive or negative expressions. When analyzing a sentence, VADER takes into account every word that may affect the sentiment expressed and returns an estimate of the proportion of positive and negative expressions.

Each word recognized by VADER has an associated positive or negative <b>numeric value</b>, depending on the emotion it expresses. A word that is not in the list of rated words is ignored in the evaluation.
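As a rough illustration of the lookup described above, the following sketch scores a sentence against a tiny hand-made lexicon. The words and valence values in `TOY_LEXICON` are invented for the example and are not taken from the real VADER lexicon; the real tool also applies rules (negation, intensifiers, punctuation, capitalization) that are left out here.

```python
# Toy lexicon: the words and valences below are made up for illustration,
# they are NOT the values shipped with VADER.
TOY_LEXICON = {"great": 3.0, "good": 2.0, "boring": -1.5, "waste": -2.0}

def toy_valences(sentence):
    """Return the valence of every word found in the toy lexicon."""
    words = sentence.lower().split()
    # Words missing from the lexicon are skipped, so they do not affect the result
    return [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]

print(toy_valences("a good movie with a boring ending"))
```

Only "good" and "boring" contribute a value; every other word is ignored because it is not in the lexicon.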

When it analyzes the sentiment of a text, VADER returns four values:
- `pos`: proportion of the content classified as positive
- `neu`: proportion of the content classified as neutral
- `neg`: proportion of the content classified as negative
- `compound`: compound score, the sum of the obtained values normalized between -1 and 1
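The normalization behind the compound score can be pictured with the sketch below. It assumes the form used in the reference vaderSentiment implementation, x / sqrt(x² + α) with α = 15; the function name `normalize_valence` is chosen here only for illustration.

```python
import math

def normalize_valence(total, alpha=15.0):
    """Squash an unbounded sum of valences into the interval (-1, 1).

    Assumption: the x / sqrt(x^2 + alpha) form with alpha = 15.
    """
    return total / math.sqrt(total * total + alpha)

# Larger sums move toward -1 or 1 without ever reaching them
for total in (-4.0, 0.0, 0.5, 4.0):
    print(total, round(normalize_valence(total), 4))
```

A sum of 0 stays at 0, and the function is symmetric, so equally strong positive and negative texts get scores of equal magnitude.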


In [38]:
import nltk
nltk.download('vader_lexicon')
#conda install -c conda-forge vadersentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /home/andy/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


### Getting the polarity

In [20]:
sentence = "This is a great product! Very good quality"
# Instantiate the analyzer
sentiment_analyzer = SentimentIntensityAnalyzer()
# Analyze the polarity of the sentence
analisis = sentiment_analyzer.polarity_scores(sentence)
print(analisis)

{'neg': 0.0, 'neu': 0.442, 'pos': 0.558, 'compound': 0.8217}
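A common way to turn the compound score into a label is to compare it against a pair of thresholds. The sketch below uses ±0.05, the cut-offs suggested in the VADER documentation; the helper name `label_from_compound` and its `threshold` parameter are hypothetical.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

# Scores copied from the output above
analisis = {'neg': 0.0, 'neu': 0.442, 'pos': 0.558, 'compound': 0.8217}
print(label_from_compound(analisis['compound']))  # Positive
```

Other thresholds can be passed in when a wider or narrower neutral band is wanted.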


## Example

In [35]:
import pandas as pd
df = pd.read_csv("movie-reviews.csv")
df

Unnamed: 0,reviews
0,Excellent movie. The acting was great and it i...
1,A bit boring at first but enjoyable.
2,Very boring. You cant tell it has a low budget.
3,I loved it! Best movie of the year!
4,The special effects looked cheap and the plot ...
5,The actors did a good job but the script is te...
6,Boring. So boring.
7,Not the worst movie I've seen
8,A waste of time


In [37]:
df["negative"] = ""
df["neutral"] = ""
df["positive"] = ""
df["result"] = ""
for index, row in df.iterrows():
    # Analyze each review
    analisis = sentiment_analyzer.polarity_scores(row['reviews'])
    # Write through df.at: assigning to `row` may only change a copy
    df.at[index, "negative"] = analisis["neg"]
    df.at[index, "neutral"] = analisis["neu"]
    df.at[index, "positive"] = analisis["pos"]
    # Decide which values count as positive or negative
    # (strict comparisons, so a compound of exactly 0 reaches "Neutral")
    if analisis['compound'] > 0:
        df.at[index, "result"] = "Positive"
    elif analisis['compound'] < 0:
        df.at[index, "result"] = "Negative"
    else:
        df.at[index, "result"] = "Neutral"
df

Unnamed: 0,reviews,negative,neutral,positive,result
0,Excellent movie. The acting was great and it i...,0.0,0.469,0.531,Positive
1,A bit boring at first but enjoyable.,0.157,0.476,0.367,Positive
2,Very boring. You cant tell it has a low budget.,0.37,0.63,0.0,Negative
3,I loved it! Best movie of the year!,0.0,0.409,0.591,Positive
4,The special effects looked cheap and the plot ...,0.0,0.803,0.197,Positive
5,The actors did a good job but the script is te...,0.275,0.596,0.129,Negative
6,Boring. So boring.,0.83,0.17,0.0,Negative
7,Not the worst movie I've seen,0.0,0.603,0.397,Positive
8,A waste of time,0.483,0.517,0.0,Negative


# Exercise

- Fetch from the API tweets in English that are not retweets and contain the hashtag #SpiderMan2.
- Clean the data.
- Evaluate the polarity.

In [77]:
import os
from dotenv import load_dotenv
# Load the values from the .env file into the environment variables
load_dotenv()
# Read the token into a variable
bearer_token = os.environ.get("BEARER_TOKEN")
#print (bearer_token)

In [78]:
url = "https://api.twitter.com/2/tweets/search/recent"

In [79]:
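params = {
    # `-is:retweet` excludes retweets; appending `lang:en` to the query
    # would also restrict the search to tweets in English
    'query': '#SpiderMan2 -is:retweet',
    'tweet.fields':'created_at',
    'max_results':100
}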

In [80]:
headers = {
    "Authorization": f"Bearer {bearer_token}",
    "User-Agent":"TweeetHunch"
} 

In [81]:
import requests
response = requests.get(url, headers=headers, params=params)
print(response)
# Raise an exception if the request was not successful
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
print(response.json())

<Response [200]>
{'data': [{'created_at': '2021-10-14T13:09:24.000Z', 'id': '1448637059623686153', 'text': '@strahzimate hello Aaron how are you today I want to present you a new suit for Peter Parker is the fusion of spidey and the CAP I hope you like it and they approve it #insomniacgames #Spiderverse  #SpidermanPS5  #SpiderMan2 🙏🏼🕷️🕷️🕷️🕷️ https://t.co/bslaAWAwns'}, {'created_at': '2021-10-14T12:06:08.000Z', 'id': '1448621138058358790', 'text': 'SPIDERMAN MOVIES RANKED #spiderman #spidermannowayhome #marvel #SpiderManPS5 #amazingspiderman #SpiderMan2 #spiderman3 #spiderverse #venom #TobeyMaguire #tomholland #andrewgarfield https://t.co/VCyD2gCAKX https://t.co/2z5km8TNgk https://t.co/RJbSudDOOp'}, {'created_at': '2021-10-14T12:04:03.000Z', 'id': '1448620617469681668', 'text': 'SPIDERMAN MOVIES RANKED #spiderman #spidermannowayhome #marvel #SpiderManPS5 #amazingspiderman #SpiderMan2 #spiderman3 #spiderverse #venom #TobeyMaguire #tomholland #andrewgarfield https://t.co/6KscRxZn1P https:
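When a search matches nothing, the payload may arrive without a `data` key, in which case `response.json()['data']` would raise a `KeyError`. The sketch below guards against that case; `extract_tweets` is a hypothetical helper, and the sample payloads are invented to mimic the shape of the response shown above.

```python
def extract_tweets(payload):
    """Return the list of tweets in a search payload, or [] if there are none."""
    return payload.get("data", [])

# Invented payloads that mimic the structure of the response
with_results = {"data": [{"id": "1", "text": "Great game #SpiderMan2"}],
                "meta": {"result_count": 1}}
without_results = {"meta": {"result_count": 0}}

print(len(extract_tweets(with_results)))     # 1
print(len(extract_tweets(without_results)))  # 0
```

The returned list can be passed to `pd.json_normalize` in the same way as `response.json()['data']`.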

In [82]:
import pandas as pd
import re, string
df = pd.json_normalize(response.json()['data'])
df

Unnamed: 0,created_at,id,text
0,2021-10-14T13:09:24.000Z,1448637059623686153,@strahzimate hello Aaron how are you today I w...
1,2021-10-14T12:06:08.000Z,1448621138058358790,SPIDERMAN MOVIES RANKED #spiderman #spidermann...
2,2021-10-14T12:04:03.000Z,1448620617469681668,SPIDERMAN MOVIES RANKED #spiderman #spidermann...
3,2021-10-14T11:34:08.000Z,1448613085980217348,New details about Marvel's Spider-Man 2 game l...
4,2021-10-14T11:00:01.000Z,1448604500697092099,"🕷️ Además de en su película, #Venom será prot..."
...,...,...,...
95,2021-10-13T02:37:28.000Z,1448115643895615489,“You have a train to catch…”\n\n#SpiderMan2 #Q...
96,2021-10-13T02:36:49.000Z,1448115477432111113,One of the greatest fight scenes of all time #...
97,2021-10-13T02:35:47.000Z,1448115218761011203,And thus begins the iconic train scene 🕷🐙 #Spi...
98,2021-10-13T02:34:38.000Z,1448114930423586822,One more time for the people in the back! Spid...


In [85]:
URL_REGEX = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
MENTIONS_REGEX = r"(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9-_]+)"
HASHTAG_REGEX = r"#"



# Assign the result back instead of using inplace=True on a column selection
df["text"] = df["text"].replace(URL_REGEX, '', regex=True)
df["text"] = df["text"].replace(MENTIONS_REGEX, '', regex=True)
df["text"] = df["text"].replace(HASHTAG_REGEX, '', regex=True)
df["text"] = df["text"].replace(r"[^A-Za-z0-9 | \n]+", ' ', regex=True)
df["text"] = df["text"].replace(r"\t", ' ', regex=True)
# Escape the punctuation so every character is literal inside the character class
df["text"] = df["text"].replace('[{}]'.format(re.escape(string.punctuation)), ' ', regex=True)


df["text"] = df["text"].str.lower()

df

Unnamed: 0,id,text
0,1448637059623686153,hello aaron how are you today i want to prese...
1,1448621138058358790,spiderman movies ranked spiderman spidermannow...
2,1448620617469681668,spiderman movies ranked spiderman spidermannow...
3,1448613085980217348,new details about marvel s spider man 2 game l...
4,1448604500697092099,adem s de en su pel cula venom ser protag...
...,...,...
95,1448115643895615489,you have a train to catch \n\nspiderman2 quar...
96,1448115477432111113,one of the greatest fight scenes of all time s...
97,1448115218761011203,and thus begins the iconic train scene spide...
98,1448114930423586822,one more time for the people in the back spid...
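The same cleaning steps can be applied to a single string, which makes them easier to check in isolation. The sketch below is a simplified variant: its URL and mention patterns are deliberately shorter than `URL_REGEX` and `MENTIONS_REGEX`, and the function name `clean_tweet` is hypothetical.

```python
import re

def clean_tweet(text):
    """Simplified cleaning: drop URLs and mentions, keep hashtag words."""
    text = re.sub(r"https?://\S+", "", text)       # remove URLs
    text = re.sub(r"@\w+", "", text)               # remove @mentions
    text = text.replace("#", "")                   # keep the word, drop the '#'
    text = re.sub(r"[^A-Za-z0-9\s]+", " ", text)   # other symbols become spaces
    text = re.sub(r"\s+", " ", text)               # collapse repeated whitespace
    return text.strip().lower()

print(clean_tweet("@user Loved it! #SpiderMan2 https://t.co/abc123"))
# loved it spiderman2
```

Like the cell above, this discards accented letters and emoji, so it only suits text that is expected to be in English.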


In [87]:
df = df.drop(columns='id')

# Tokenization

In [88]:
from nltk.tokenize import TweetTokenizer
# Instantiate the tokenizer
tt = TweetTokenizer()
# Apply the tokenizer to the column
tokenized_text = df['text'].apply(tt.tokenize)
df["tokenized_text"] = tokenized_text
df

Unnamed: 0,text,tokenized_text
0,hello aaron how are you today i want to prese...,"[hello, aaron, how, are, you, today, i, want, ..."
1,spiderman movies ranked spiderman spidermannow...,"[spiderman, movies, ranked, spiderman, spiderm..."
2,spiderman movies ranked spiderman spidermannow...,"[spiderman, movies, ranked, spiderman, spiderm..."
3,new details about marvel s spider man 2 game l...,"[new, details, about, marvel, s, spider, man, ..."
4,adem s de en su pel cula venom ser protag...,"[adem, s, de, en, su, pel, cula, venom, ser, p..."
...,...,...
95,you have a train to catch \n\nspiderman2 quar...,"[you, have, a, train, to, catch, spiderman, 2,..."
96,one of the greatest fight scenes of all time s...,"[one, of, the, greatest, fight, scenes, of, al..."
97,and thus begins the iconic train scene spide...,"[and, thus, begins, the, iconic, train, scene,..."
98,one more time for the people in the back spid...,"[one, more, time, for, the, people, in, the, b..."


In [111]:
df["negative"] = ""
df["neutral"] = ""
df["positive"] = ""
df["result"] = ""
df["compound"] = ""
for index, row in df.iterrows():
    # Analyze each tweet
    analisis = sentiment_analyzer.polarity_scores(row['text'])
    # Write through df.at: assigning to `row` may only change a copy
    df.at[index, "negative"] = analisis["neg"]
    df.at[index, "neutral"] = analisis["neu"]
    df.at[index, "positive"] = analisis["pos"]
    df.at[index, "compound"] = analisis["compound"]
    # Decide which values count as positive or negative
    # (compound never exceeds 1, so no upper bound check is needed)
    if analisis['compound'] >= 0.5:
        df.at[index, "result"] = "Positive"
    elif analisis['compound'] <= 0:
        # A compound of exactly 0 also falls in this branch
        df.at[index, "result"] = "Negative"
    else:
        df.at[index, "result"] = "Neutral"

df

Unnamed: 0,text,negative,neutral,positive,result,compound
0,hello aaron how are you today i want to prese...,0,0.839,0.161,Positive,0.6908
1,spiderman movies ranked spiderman spidermannow...,0,0.833,0.167,Neutral,0.4215
2,spiderman movies ranked spiderman spidermannow...,0,0.833,0.167,Neutral,0.4215
3,new details about marvel s spider man 2 game l...,0.143,0.683,0.174,Neutral,0.128
4,adem s de en su pel cula venom ser protag...,0.087,0.913,0,Negative,-0.296
...,...,...,...,...,...,...
95,you have a train to catch \n\nspiderman2 quar...,0,1,0,Negative,0
96,one of the greatest fight scenes of all time s...,0.165,0.57,0.266,Neutral,0.3818
97,and thus begins the iconic train scene spide...,0,1,0,Negative,0
98,one more time for the people in the back spid...,0,1,0,Negative,0



In [113]:
# Export the results
df.to_csv('polarity')