# Polarity Analysis

Polarity analysis consists of assigning a numeric value (positive or negative) to words or expressions according to the emotion they most commonly convey.

### VADER Sentiment Analysis
VADER Sentiment Analysis is a sentiment analysis tool based on <b>lexical rules</b>. It works from a set of words, or <b>lexicon</b>, whose semantic meaning is mostly associated with positive or negative expressions. When analyzing a sentence, VADER takes into account every word that may affect the sentiment expressed and returns an estimate of the proportion of positive and negative expressions.

Each word recognized by VADER has an associated <b>numeric value</b>, positive or negative depending on the emotion it expresses. A word that is not in the list of scored words is ignored during the evaluation.
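
The lookup idea can be sketched with a toy lexicon. The words and scores below are made up for illustration and are not VADER's actual lexicon; the real tool also applies rules (negation, intensifiers, punctuation, capitalization) that this sketch leaves out.

```python
# Toy lexicon: hypothetical words and valences, NOT taken from VADER.
TOY_LEXICON = {"great": 3.0, "good": 2.0, "bad": -2.5, "awful": -3.0}


def word_valences(sentence: str) -> list:
    """Return the valence of every word found in the toy lexicon.

    Words that are not in the lexicon are skipped, mirroring the
    behaviour described above.
    """
    valences = []
    for word in sentence.lower().split():
        token = word.strip(".,!?")  # drop surrounding punctuation
        if token in TOY_LEXICON:
            valences.append(TOY_LEXICON[token])
    return valences


print(word_valences("This is a great product! Very good quality"))
```

Only "great" and "good" contribute here; every other word is absent from the toy lexicon and is therefore ignored.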

When it analyzes the sentiment of a text, VADER returns four values:
- Proportion of the content that falls in the positive category
- Proportion of the content that falls in the neutral category
- Proportion of the content that falls in the negative category
- Compound score: the sum of the valences obtained, normalized to the range -1 to 1
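
As a sketch of how a compound score can stay inside that range: the vaderSentiment source normalizes the summed valence x as x / sqrt(x² + α), with α = 15 by default. The function below reproduces only that final step; the name `normalize_compound` and the sample sums are illustrative assumptions.

```python
import math


def normalize_compound(valence_sum: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)


# Illustrative sums: larger magnitudes approach -1 or 1 without reaching them.
for total in (-8.0, -1.0, 0.0, 1.0, 8.0):
    print(total, round(normalize_compound(total), 4))
```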


In [1]:
import nltk
nltk.download('vader_lexicon')
#conda install -c conda-forge vadersentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /home/santiago/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


### Getting the polarity

In [2]:
sentence = "This is a great product! Very good quality"
# Instantiate the analyzer
sentiment_analyzer = SentimentIntensityAnalyzer()
# Analyze the polarity of the sentence
analisis = sentiment_analyzer.polarity_scores(sentence)
print(analisis)

{'neg': 0.0, 'neu': 0.442, 'pos': 0.558, 'compound': 0.8217}


## Example

In [3]:
import pandas as pd
df = pd.read_csv("movie-reviews.csv")
df

FileNotFoundError: [Errno 2] No such file or directory: 'movie-reviews.csv'

In [None]:
df["negative"] = 0.0
df["neutral"] = 0.0
df["positive"] = 0.0
df["result"] = ""
for index, row in df.iterrows():
    # Analyze each review
    analisis = sentiment_analyzer.polarity_scores(row['reviews'])
    # Write through df.at: assigning to `row` may only modify a copy
    df.at[index, "negative"] = analisis["neg"]
    df.at[index, "neutral"] = analisis["neu"]
    df.at[index, "positive"] = analisis["pos"]
    # Decide which compound values count as positive or negative.
    # Strict comparisons keep a compound score of exactly 0 as Neutral.
    if analisis['compound'] > 0:
        df.at[index, "result"] = "Positive"
    elif analisis['compound'] < 0:
        df.at[index, "result"] = "Negative"
    else:
        df.at[index, "result"] = "Neutral"
df
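
The labelling step can also be isolated in a small function. The cutoffs of ±0.05 used here are the ones commonly suggested in the VADER documentation; treat them as an assumption to tune for each dataset. The name `label_from_compound` is illustrative.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


for score in (0.8217, 0.0, -0.4):
    print(score, label_from_compound(score))
```

Keeping the cutoffs in one place makes it easy to check that the three branches are all reachable and to reuse the same rule for reviews and tweets.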

# Exercise

- Fetch from the API Tweets in English that are not retweets and that contain the hashtag #SpiderMan2.
- Clean the data
- Evaluate the polarity

In [26]:
import requests
import os
from dotenv import load_dotenv
import pandas as pd
import string
from nltk.tokenize import TweetTokenizer
from wordcloud import WordCloud
import matplotlib.pyplot as plt

import nltk
nltk.download('vader_lexicon')
#conda install -c conda-forge vadersentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
url = "https://api.twitter.com/2/tweets/search/recent"
# Load the values from the .env file into the environment variables
load_dotenv()
# Read the token value into a variable
bearer_token = os.environ.get("BEARER_TOKEN")
sentiment_analyzer = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /home/santiago/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [41]:
params = {
    'query': '#FarCry6  lang:en -is:retweet',
    'tweet.fields':'created_at',
    'max_results':40
}
headers = {
    "Authorization": f"Bearer {bearer_token}",
    "User-Agent":"v2FullArchiveSearchPython"
}
response = requests.get(url, headers=headers, params=params)
print(response)
# Raise an exception if the response is not successful
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
print(response.json())

<Response [200]>
{'data': [{'created_at': '2021-10-14T13:38:58.000Z', 'id': '1448644502328127493', 'text': '#FarCry6 #AssassinsCreed leap of faith. Or so we wished it will be \n\nhttps://t.co/TXvQ21Qw39'}, {'created_at': '2021-10-14T13:38:08.000Z', 'id': '1448644290536624137', 'text': 'Highest point of the map!? #FarCry6  #FarCryContest https://t.co/zP5GFV0dXt'}, {'created_at': '2021-10-14T13:38:06.000Z', 'id': '1448644284845015044', 'text': 'Join us on https://t.co/b1DuMSVUTU from 8pm tonight, for our Review Discussion and LIVE play through of Far Cry 6! #games #gaming #gamer #farcry6 #openworld #review https://t.co/tWvxdKGfcH'}, {'created_at': '2021-10-14T13:36:48.000Z', 'id': '1448643955562663938', 'text': 'Beautiful!\n#FarCry6 #Stadia https://t.co/RvHDWxNluv'}, {'created_at': '2021-10-14T13:36:38.000Z', 'id': '1448643916169822227', 'text': '#GameAwards best Performance of 2021 has gotta be Nisa Gunduz.. hands down!!!\n\n#FarCry6 @UbisoftToronto #FarCryContest #ArtisticofSociety #VP
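
A sketch of two small helpers around this request, assuming the v2 recent-search behaviour described in Twitter's documentation: `max_results` is expected to be between 10 and 100, and a response with no matching tweets may omit the `data` key. The helper names `build_search_params` and `extract_tweets` are hypothetical.

```python
def build_search_params(hashtag: str, lang: str = "en", max_results: int = 40) -> dict:
    """Build the query parameters for a hashtag search that excludes retweets."""
    if not 10 <= max_results <= 100:
        raise ValueError("max_results is expected to be between 10 and 100")
    return {
        "query": f"#{hashtag.lstrip('#')} lang:{lang} -is:retweet",
        "tweet.fields": "created_at",
        "max_results": max_results,
    }


def extract_tweets(payload: dict) -> list:
    """Return the list of tweets, or an empty list when 'data' is missing."""
    return payload.get("data", [])


print(build_search_params("FarCry6"))
print(extract_tweets({"meta": {"result_count": 0}}))
```

Using `extract_tweets(response.json())` before building the DataFrame avoids a `KeyError` when the search returns nothing.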

In [42]:
df = pd.json_normalize(response.json()['data'])
df

| | created_at | id | text |
|---|---|---|---|
| 0 | 2021-10-14T13:38:58.000Z | 1448644502328127493 | #FarCry6 #AssassinsCreed leap of faith. Or so ... |
| 1 | 2021-10-14T13:38:08.000Z | 1448644290536624137 | Highest point of the map!? #FarCry6 #FarCryCo... |
| 2 | 2021-10-14T13:38:06.000Z | 1448644284845015044 | Join us on https://t.co/b1DuMSVUTU from 8pm to... |
| 3 | 2021-10-14T13:36:48.000Z | 1448643955562663938 | Beautiful!\n#FarCry6 #Stadia https://t.co/RvHD... |
| 4 | 2021-10-14T13:36:38.000Z | 1448643916169822227 | #GameAwards best Performance of 2021 has gotta... |
| 5 | 2021-10-14T13:35:53.000Z | 1448643726625050624 | Jackpot! \n#FarCry6 #Stadia https://t.co/fgxa5... |
| 6 | 2021-10-14T13:34:59.000Z | 1448643499218202628 | Will the real SLIM SHADY please stand up! 🤣\n#... |
| 7 | 2021-10-14T13:32:50.000Z | 1448642957297344524 | Scenery looking beautiful as ever. 🥰\n\n#FarCr... |
| 8 | 2021-10-14T13:32:13.000Z | 1448642805144850440 | WTF #2 - [Far Cry 6]\n\n@PlayStation\n@Ubisoft... |
| 9 | 2021-10-14T13:31:42.000Z | 1448642674798391304 | It has begun.\n\n#FarCry6 https://t.co/OBMrcgBuEb |


In [43]:
URL_REGEX = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
MENTIONS_REGEX = r"(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9-_]+)"
HASHTAG_REGEX = r"#"



# Assign the result back instead of using inplace=True on a column selection
df["text"] = df["text"].replace(URL_REGEX, '', regex=True)
df["text"] = df["text"].replace(MENTIONS_REGEX, '', regex=True)
df["text"] = df["text"].replace(HASHTAG_REGEX, '', regex=True)
# Replace anything that is not alphanumeric, a space, "|" or a newline with a space
df["text"] = df["text"].replace(r"[^A-Za-z0-9 | \n]+", ' ', regex=True)
df["text"] = df["text"].replace(r"\t", ' ', regex=True)
df["text"] = df["text"].replace('[{}]'.format(string.punctuation), ' ', regex=True)
df["text"] = df["text"].replace(r"\n", '', regex=True)

df["text"] = df["text"].str.lower()
# Keep only the cleaned text
df = df.drop(columns=['created_at', 'id'])
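
The same cleaning steps can be sketched for a single string with the standard `re` module. The patterns below are deliberately simplified stand-ins for `URL_REGEX` and `MENTIONS_REGEX`, and the function name `clean_tweet` is an assumption made for illustration.

```python
import re

# Simplified patterns (assumptions), much looser than the regexes above.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]+")


def clean_tweet(text: str) -> str:
    """Strip URLs, mentions, '#' signs and punctuation, then lowercase."""
    text = URL_RE.sub(" ", text)        # remove links
    text = MENTION_RE.sub(" ", text)    # remove @mentions
    text = text.replace("#", "")        # keep the hashtag word, drop the sign
    text = NON_ALNUM_RE.sub(" ", text)  # punctuation and emoji become spaces
    return " ".join(text.lower().split())  # collapse whitespace and newlines


print(clean_tweet("Beautiful!\n#FarCry6 #Stadia https://t.co/RvHDWxNluv"))
```

One design point to keep in mind: VADER treats exclamation marks, capitalization and emoticons as intensity cues, so it may be worth scoring the text before lowercasing and stripping punctuation, and using the cleaned text only for tokenization.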

In [44]:
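df["negative"] = 0.0
df["neutral"] = 0.0
df["positive"] = 0.0
df["result"] = ""
for index, row in df.iterrows():
    # Analyze each tweet
    analisis = sentiment_analyzer.polarity_scores(row['text'])
    # Write through df.at: assigning to `row` may only modify a copy
    df.at[index, "negative"] = analisis["neg"]
    df.at[index, "neutral"] = analisis["neu"]
    df.at[index, "positive"] = analisis["pos"]
    # Decide which compound values count as positive or negative.
    # Note: with these cutoffs a compound score of 0 (fully neutral text)
    # is labelled "Negative"; the compound score ranges from -1 to 1.
    if analisis['compound'] >= 0.7:
        df.at[index, "result"] = "Positive"
    elif analisis['compound'] <= 0.3:
        df.at[index, "result"] = "Negative"
    else:
        df.at[index, "result"] = "Neutral"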


df

| | text | negative | neutral | positive | result |
|---|---|---|---|---|---|
| 0 | farcry6 assassinscreed leap of faith or so we... | 0.0 | 0.797 | 0.203 | Neutral |
| 1 | highest point of the map farcry6 farcrycontest | 0.0 | 1.0 | 0.0 | Negative |
| 2 | join us on from 8pm tonight for our review d... | 0.108 | 0.732 | 0.16 | Negative |
| 3 | beautiful farcry6 stadia | 0.0 | 0.339 | 0.661 | Neutral |
| 4 | gameawards best performance of 2021 has gotta ... | 0.0 | 0.826 | 0.174 | Neutral |
| 5 | jackpot farcry6 stadia | 0.0 | 1.0 | 0.0 | Negative |
| 6 | will the real slim shady please stand up far... | 0.0 | 0.796 | 0.204 | Neutral |
| 7 | scenery looking beautiful as ever farcry6 | 0.0 | 0.562 | 0.438 | Neutral |
| 8 | wtf 2 far cry 6 farcry6 farcry | 0.58 | 0.42 | 0.0 | Negative |
| 9 | it has begun farcry6 | 0.0 | 1.0 | 0.0 | Negative |


In [45]:
# Tokenize

tt = TweetTokenizer()

tokenized_text = df['text'].apply(tt.tokenize)
df["tokenized_text"] = tokenized_text
df

| | text | negative | neutral | positive | result | tokenized_text |
|---|---|---|---|---|---|---|
| 0 | farcry6 assassinscreed leap of faith or so we... | 0.0 | 0.797 | 0.203 | Neutral | [farcry, 6, assassinscreed, leap, of, faith, o... |
| 1 | highest point of the map farcry6 farcrycontest | 0.0 | 1.0 | 0.0 | Negative | [highest, point, of, the, map, farcry, 6, farc... |
| 2 | join us on from 8pm tonight for our review d... | 0.108 | 0.732 | 0.16 | Negative | [join, us, on, from, 8p, m, tonight, for, our,... |
| 3 | beautiful farcry6 stadia | 0.0 | 0.339 | 0.661 | Neutral | [beautiful, farcry, 6, stadia] |
| 4 | gameawards best performance of 2021 has gotta ... | 0.0 | 0.826 | 0.174 | Neutral | [gameawards, best, performance, of, 2021, has,... |
| 5 | jackpot farcry6 stadia | 0.0 | 1.0 | 0.0 | Negative | [jackpot, farcry, 6, stadia] |
| 6 | will the real slim shady please stand up far... | 0.0 | 0.796 | 0.204 | Neutral | [will, the, real, slim, shady, please, stand, ... |
| 7 | scenery looking beautiful as ever farcry6 | 0.0 | 0.562 | 0.438 | Neutral | [scenery, looking, beautiful, as, ever, farcry... |
| 8 | wtf 2 far cry 6 farcry6 farcry | 0.58 | 0.42 | 0.0 | Negative | [wtf, 2, far, cry, 6, farcry, 6, farcry] |
| 9 | it has begun farcry6 | 0.0 | 1.0 | 0.0 | Negative | [it, has, begun, farcry, 6] |


In [60]:
from nltk.corpus import stopwords
nltk.download('stopwords')
# Use a distinct name so the `stopwords` module is not shadowed
stop_words = set(stopwords.words('english'))

def stop_func(tokenized_text: list) -> list:
    # Keep only the tokens that are not English stopwords
    no_stopwords_data = []
    for x in tokenized_text:
        if x.lower() not in stop_words:
            no_stopwords_data.append(x)
    return no_stopwords_data


positive = []
negative = []
neutral = []
for index, row in df.iterrows():
    # Collect the tokens of every tweet labelled as positive
    if row["result"] == "Positive":
        positive.extend(row["tokenized_text"])
stop_func(positive)
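
A natural follow-up to stopword removal is counting the remaining tokens, for example as input for a word cloud. The sketch below uses only the standard library; `STOP_WORDS` is a tiny hypothetical stand-in for NLTK's English list, and the sample tokens are written in the style of the tokenized output above.

```python
from collections import Counter

# Tiny stand-in stopword set (hypothetical); NLTK's English list is much longer.
STOP_WORDS = {"the", "of", "as", "it", "has", "or", "so", "we"}


def remove_stopwords(tokens: list) -> list:
    """Return the tokens that are not in the stopword set."""
    return [token for token in tokens if token.lower() not in STOP_WORDS]


# Sample tokens for illustration only.
tokens = ["scenery", "looking", "beautiful", "as", "ever", "farcry", "6",
          "beautiful", "farcry", "6", "stadia"]
counts = Counter(remove_stopwords(tokens))
print(counts.most_common(3))
```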