# Polarity Analysis

Polarity analysis consists of assigning a numeric value (positive or negative) to words or expressions according to the emotion they most commonly convey.

### VADER Sentiment Analysis
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a sentiment analysis tool based on <b>lexical rules</b>. It works from a set of words, or <b>lexicon</b>, whose semantic meaning is mostly associated with positive or negative expressions. When analyzing a sentence, VADER takes into account every word that can affect the sentiment expressed and returns the estimated proportion of positive, neutral, and negative content.

Each word that VADER recognizes has an associated <b>numeric value</b>, positive or negative depending on the emotion it expresses. A word that is not in the list of rated words contributes no positive or negative value to the evaluation.
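
To make the lookup idea concrete, here is a minimal sketch of lexicon-based scoring. The three-word `TOY_LEXICON` and its values are invented for illustration and are not VADER's actual lexicon; the real tool also applies rules for negation, intensifiers, punctuation, and capitalization that this sketch leaves out.

```python
# Toy lexicon: hypothetical words and valence values, for illustration only
TOY_LEXICON = {"great": 3.0, "good": 2.0, "boring": -1.5}


def word_valences(sentence: str) -> list:
    """Return (word, valence) pairs; unrated words get a valence of 0.0."""
    words = sentence.lower().split()
    return [(word, TOY_LEXICON.get(word, 0.0)) for word in words]


pairs = word_valences("A good but boring movie")
total = sum(valence for _, valence in pairs)
print(pairs)
print(total)
```

Only "good" and "boring" move the total; the other words are looked up, not found, and left at zero.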

When it analyzes the sentiment of a text, VADER returns four values:
- `pos`: proportion of the content that falls within the positive classification
- `neu`: proportion of the content that falls within the neutral classification
- `neg`: proportion of the content that falls within the negative classification
- `compound`: sum of the valence scores obtained, normalized between -1 and 1
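
As a rough illustration of how the four values relate to per-word valences, the sketch below derives them from a plain list of numbers. The normalization `x / sqrt(x^2 + 15)` and the +1/-1 offsets follow what the vaderSentiment package appears to do internally; the function names `normalize` and `summarize` are assumptions made for this example, and the library's rule-based adjustments are omitted.

```python
import math


def normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into the range (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def summarize(valences: list) -> dict:
    """Build neg/neu/pos proportions and a compound score from word valences."""
    # Each rated word also counts as one word, hence the +1 / -1 offsets
    pos_sum = sum(v + 1 for v in valences if v > 0)
    neg_sum = sum(v - 1 for v in valences if v < 0)
    neu_count = sum(1 for v in valences if v == 0)
    total = pos_sum + abs(neg_sum) + neu_count
    if total == 0:
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0, "compound": 0.0}
    return {
        "neg": round(abs(neg_sum) / total, 3),
        "neu": round(neu_count / total, 3),
        "pos": round(pos_sum / total, 3),
        "compound": round(normalize(sum(valences)), 4),
    }


# Hypothetical valences for a five-word sentence
scores = summarize([0.0, 2.0, 0.0, -1.5, 0.0])
print(scores)
```

The three proportions add up to roughly 1, while `compound` stays strictly inside the interval from -1 to 1 however large the valence sum grows.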


In [1]:
# conda install -c conda-forge vadersentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

### Getting the polarity

In [2]:
sentence = "This is a great product! Very good quality"
# Instantiate the analyzer
sentiment_analyzer = SentimentIntensityAnalyzer()
# Analyze the polarity of the sentence
analisis = sentiment_analyzer.polarity_scores(sentence)
print(analisis)

{'neg': 0.0, 'neu': 0.442, 'pos': 0.558, 'compound': 0.8217}


## Example

In [3]:
import pandas as pd

df = pd.read_csv("movie-reviews.csv")
df

Unnamed: 0,reviews
0,Excellent movie. The acting was great and it i...
1,A bit boring at first but enjoyable.
2,Very boring. You cant tell it has a low budget.
3,I loved it! Best movie of the year!
4,The special effects looked cheap and the plot ...
5,The actors did a good job but the script is te...
6,Boring. So boring.
7,Not the worst movie I've seen
8,A waste of time


In [4]:
df["negative"] = 0.0
df["neutral"] = 0.0
df["positive"] = 0.0
df["result"] = ""
for index, row in df.iterrows():
    # Analyze each review
    analisis = sentiment_analyzer.polarity_scores(row["reviews"])
    # iterrows() yields copies, so write the values back through df.at
    df.at[index, "negative"] = analisis["neg"]
    df.at[index, "neutral"] = analisis["neu"]
    df.at[index, "positive"] = analisis["pos"]
    # Decide which compound values count as positive or negative.
    # Strict comparisons keep the Neutral branch reachable when compound is 0.
    if analisis["compound"] > 0:
        df.at[index, "result"] = "Positive"
    elif analisis["compound"] < 0:
        df.at[index, "result"] = "Negative"
    else:
        df.at[index, "result"] = "Neutral"
df

Unnamed: 0,reviews,negative,neutral,positive,result
0,Excellent movie. The acting was great and it i...,0.0,0.469,0.531,Positive
1,A bit boring at first but enjoyable.,0.157,0.476,0.367,Positive
2,Very boring. You cant tell it has a low budget.,0.37,0.63,0.0,Negative
3,I loved it! Best movie of the year!,0.0,0.409,0.591,Positive
4,The special effects looked cheap and the plot ...,0.0,0.803,0.197,Positive
5,The actors did a good job but the script is te...,0.275,0.596,0.129,Negative
6,Boring. So boring.,0.83,0.17,0.0,Negative
7,Not the worst movie I've seen,0.0,0.603,0.397,Positive
8,A waste of time,0.483,0.517,0.0,Negative
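
The labeling rule can be isolated in a small function, which makes the cut-off explicit and easy to test. This is a sketch: the name `label_from_compound` is hypothetical, and the default of 0.05 is the cut-off commonly suggested for VADER's compound score, not a value taken from the cell above.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    # Scores close to zero are treated as neutral
    return "Neutral"


# 0.8217 is the compound score printed for the sample sentence earlier
print(label_from_compound(0.8217))
print(label_from_compound(0.0))
print(label_from_compound(-0.4))
```

A symmetric threshold leaves a neutral band around zero, so text with no rated words is not forced into either class.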


# Exercise

- Fetch from the API tweets in English that are not retweets and that contain the hashtag #FarCry6.
- Clean the data.
- Evaluate the polarity.

In [5]:
import requests
import os
from dotenv import load_dotenv
import pandas as pd
import string
from nltk.tokenize import TweetTokenizer

# conda install -c conda-forge vadersentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

url = "https://api.twitter.com/2/tweets/search/recent"
# Load the values from the .env file into the environment variables
load_dotenv()
# Read the token value into a variable
bearer_token = os.environ.get("BEARER_TOKEN")
sentiment_analyzer = SentimentIntensityAnalyzer()

In [6]:
params = {
    "query": "#FarCry6 lang:en -is:retweet",
    "tweet.fields": "created_at",
    "max_results": 40,
}
headers = {
    "Authorization": f"Bearer {bearer_token}",
    "User-Agent": "v2FullArchiveSearchPython",
}
response = requests.get(url, headers=headers, params=params)
print(response)
# Raise an exception if the response is not successful
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
print(response.json())

<Response [200]>
{'data': [{'created_at': '2021-10-23T14:31:10.000Z', 'id': '1451919127665430534', 'text': 'Far Cry 4 - Part 15 https://t.co/QSSJEhPtsp via @YouTube #farcry #farcry2 #farcry3 #farcry4 #farcry5 #farcry6 #farcrynewdawn #farcryprimal #pc #xbox #xboxone #xbone #XboxSeriesS #XboxSeriesX #ps4 #ps4pro #ps5'}, {'created_at': '2021-10-23T14:29:51.000Z', 'id': '1451918800086048769', 'text': '#FarCry6 available for both PS4 and PS5.. We have a few copies remaining. \n\n#PS4: 480gh #PS5: 550gh \nKeep your orders coming 💯💯 https://t.co/r57ppW5cLN'}, {'created_at': '2021-10-23T14:26:39.000Z', 'id': '1451917994322546692', 'text': 'Far Cry 5\n\n@FarCrygame @Ubisoft @UbisoftMTL @UbisoftQuebec @UbisoftItalia #FarCry5 #FarCry #Ubisoft #FarCry6 #VirtualPhotography #VPGamers #GamerGram https://t.co/Kv38SsvA21'}, {'created_at': '2021-10-23T14:25:34.000Z', 'id': '1451917718240825351', 'text': 'Down by the waterside.\n\n#FarCry6 #VirtualPhotography #VGPUnite #XboxSeriesX https://t.co/xYPpkDAs3
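
The query string combines a hashtag with the `lang:` and `-is:retweet` operators. A small helper can build the parameters for any hashtag. This is a sketch: `build_search_params` is a hypothetical name, and the 10 to 100 range check reflects the documented limits of `max_results` for the recent search endpoint as we understand them.

```python
def build_search_params(hashtag: str, lang: str = "en", max_results: int = 40) -> dict:
    """Build query parameters for a recent-search request on one hashtag."""
    if not 10 <= max_results <= 100:
        raise ValueError("max_results must be between 10 and 100")
    tag = hashtag.lstrip("#")  # accept the hashtag with or without "#"
    # Match the hashtag, restrict the language, and exclude retweets
    query = f"#{tag} lang:{lang} -is:retweet"
    return {
        "query": query,
        "tweet.fields": "created_at",
        "max_results": max_results,
    }


params = build_search_params("FarCry6")
print(params["query"])
```

The resulting dictionary can be passed as `params` to `requests.get`, exactly like the hand-written one.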

In [7]:
df = pd.json_normalize(response.json()["data"])
df

Unnamed: 0,created_at,id,text
0,2021-10-23T14:31:10.000Z,1451919127665430534,Far Cry 4 - Part 15 https://t.co/QSSJEhPtsp vi...
1,2021-10-23T14:29:51.000Z,1451918800086048769,#FarCry6 available for both PS4 and PS5.. We h...
2,2021-10-23T14:26:39.000Z,1451917994322546692,Far Cry 5\n\n@FarCrygame @Ubisoft @UbisoftMTL ...
3,2021-10-23T14:25:34.000Z,1451917718240825351,Down by the waterside.\n\n#FarCry6 #VirtualPho...
4,2021-10-23T14:21:10.000Z,1451916611926073344,I GOT A NEW PET GUYS #PS5Share #FarCry6 https:...
5,2021-10-23T14:20:05.000Z,1451916340223221764,"Happy Saturday Morning, I'm ready to jump into..."
6,2021-10-23T14:17:54.000Z,1451915791113277444,New episode up! Link in Bio!\n#FarCry6 #YouTub...
7,2021-10-23T14:16:55.000Z,1451915541766098953,Now I'm going to play #FarCry6 to calm my nerv...
8,2021-10-23T14:14:48.000Z,1451915008993071104,”Postcards from Yara”\n#FarCry6 \n0504\n@...
9,2021-10-23T14:13:17.000Z,1451914628808810496,Miami Vice meets #FarCry6 \n\nClick for full I...


In [8]:
URL_REGEX = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
MENTIONS_REGEX = r"(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9-_]+)"
HASHTAG_REGEX = r"#"

# Apply the substitutions in order. Assigning the result back to the column
# avoids chained in-place replacement, which recent pandas versions may not
# propagate to df.
replacements = [
    (URL_REGEX, ""),  # remove URLs
    (MENTIONS_REGEX, ""),  # remove @mentions
    (HASHTAG_REGEX, ""),  # drop the "#" but keep the hashtag text
    (r"[^A-Za-z0-9 | \n]+", " "),  # replace other symbols with a space
    (r"\t", " "),  # replace tabs with a space
    ("[{}]".format(string.punctuation), " "),  # replace remaining punctuation
    (r"\n", ""),  # remove line breaks (this joins the words on either side)
]
for pattern, replacement in replacements:
    df["text"] = df["text"].replace(pattern, replacement, regex=True)

df["text"] = df["text"].str.lower()
# Keep only the text column; drop(columns=...) replaces the positional axis argument
df = df.drop(columns=["created_at", "id"])
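
The same cleaning steps can be tried on a single string with the standard `re` module, which is handy for checking the result before touching the whole DataFrame. This is a sketch: the URL and mention patterns are deliberately simplified stand-ins for the longer expressions used in the cell, `clean_tweet` is a hypothetical name, and the sample URL is made up. Unlike the cell, line breaks are replaced with a space so that adjacent words stay separate.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")  # simplified pattern (assumption)
MENTION_PATTERN = re.compile(r"@\w+")  # simplified pattern (assumption)


def clean_tweet(text: str) -> str:
    """Strip URLs, mentions, and symbols, then lowercase the text."""
    text = URL_PATTERN.sub("", text)
    text = MENTION_PATTERN.sub("", text)
    text = text.replace("#", "")  # keep the hashtag word itself
    # Every run of other characters, line breaks included, becomes one space
    text = re.sub(r"[^A-Za-z0-9]+", " ", text)
    return text.strip().lower()


sample = "Down by the waterside.\n\n#FarCry6 @Ubisoft https://t.co/abc"
cleaned = clean_tweet(sample)
print(cleaned)
```

Keep in mind that VADER uses capitalization and punctuation as intensity cues, so aggressive cleaning like this can flatten the scores it returns.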


In [9]:
df["negative"] = 0.0
df["neutral"] = 0.0
df["positive"] = 0.0
df["result"] = ""
for index, row in df.iterrows():
    # Analyze each tweet
    analisis = sentiment_analyzer.polarity_scores(row["text"])
    # iterrows() yields copies, so write the values back through df.at
    df.at[index, "negative"] = analisis["neg"]
    df.at[index, "neutral"] = analisis["neu"]
    df.at[index, "positive"] = analisis["pos"]
    # Decide which compound values count as positive or negative.
    # With these thresholds, text with a compound score of 0 is labeled Negative.
    if analisis["compound"] >= 0.7:
        df.at[index, "result"] = "Positive"
    elif analisis["compound"] <= 0.3:
        df.at[index, "result"] = "Negative"
    else:
        df.at[index, "result"] = "Neutral"


df

Unnamed: 0,text,negative,neutral,positive,result
0,far cry 4 part 15 via farcry farcry2 farcr...,0.124,0.876,0.0,Negative
1,farcry6 available for both ps4 and ps5 we hav...,0.0,1.0,0.0,Negative
2,far cry 5 farcry5 farcry ubisoft farcry6 v...,0.256,0.744,0.0,Negative
3,down by the waterside farcry6 virtualphotograp...,0.0,1.0,0.0,Negative
4,i got a new pet guys ps5share farcry6,0.0,1.0,0.0,Negative
5,happy saturday morning i m ready to jump into...,0.118,0.665,0.217,Neutral
6,new episode up link in bio farcry6 youtuber g...,0.0,1.0,0.0,Negative
7,now i m going to play farcry6 to calm my nerve...,0.154,0.712,0.134,Negative
8,postcards from yara farcry6 0504 thephotom...,0.0,1.0,0.0,Negative
9,miami vice meets farcry6 click for full imagev...,0.0,1.0,0.0,Negative


In [10]:
# Tokenize

tt = TweetTokenizer()

tokenized_text = df["text"].apply(tt.tokenize)
df["tokenized_text"] = tokenized_text
df

Unnamed: 0,text,negative,neutral,positive,result,tokenized_text
0,far cry 4 part 15 via farcry farcry2 farcr...,0.124,0.876,0.0,Negative,"[far, cry, 4, part, 15, via, farcry, farcry, 2..."
1,farcry6 available for both ps4 and ps5 we hav...,0.0,1.0,0.0,Negative,"[farcry, 6, available, for, both, ps4, and, ps..."
2,far cry 5 farcry5 farcry ubisoft farcry6 v...,0.256,0.744,0.0,Negative,"[far, cry, 5, farcry, 5, farcry, ubisoft, farc..."
3,down by the waterside farcry6 virtualphotograp...,0.0,1.0,0.0,Negative,"[down, by, the, waterside, farcry, 6, virtualp..."
4,i got a new pet guys ps5share farcry6,0.0,1.0,0.0,Negative,"[i, got, a, new, pet, guys, ps5share, farcry, 6]"
5,happy saturday morning i m ready to jump into...,0.118,0.665,0.217,Neutral,"[happy, saturday, morning, i, m, ready, to, ju..."
6,new episode up link in bio farcry6 youtuber g...,0.0,1.0,0.0,Negative,"[new, episode, up, link, in, bio, farcry, 6, y..."
7,now i m going to play farcry6 to calm my nerve...,0.154,0.712,0.134,Negative,"[now, i, m, going, to, play, farcry, 6, to, ca..."
8,postcards from yara farcry6 0504 thephotom...,0.0,1.0,0.0,Negative,"[postcards, from, yara, farcry, 6, 0504, theph..."
9,miami vice meets farcry6 click for full imagev...,0.0,1.0,0.0,Negative,"[miami, vice, meets, farcry, 6, click, for, fu..."


In [11]:
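from nltk.corpus import stopwords

# Requires the NLTK stop word corpus: run nltk.download("stopwords") once if it is missing.
# A separate name keeps the imported stopwords module from being shadowed.
stop_words = set(stopwords.words("english"))


def stop_func(tokenized_text: list) -> list:
    """Return the tokens that are not English stop words."""
    return [x for x in tokenized_text if x.lower() not in stop_words]


positive = []
negative = []
neutral = []
# Collect the tokens of every tweet labeled as positive
for index, row in df.iterrows():
    if row["result"] == "Positive":
        positive.extend(row["tokenized_text"])
stop_func(positive)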

['returnal',
 'goty',
 'next',
 'list',
 'farcry',
 '6',
 'good',
 'love',
 'everything',
 'check',
 'fishing',
 'serious',
 'done',
 'well',
 'literally',
 'relaxes',
 'hey',
 'hope',
 'ur',
 'grerat',
 'going',
 'live',
 'farcry',
 '6',
 'hope',
 'see',
 'might',
 'nice',
 'farcry',
 '6',
 'could',
 'patch',
 'code',
 'allowed',
 'game',
 'remember',
 'turned',
 'car',
 'radio',
 'least',
 'vehicle',
 'ideally',
 'vehicles',
 'amp',
 'even',
 'better',
 'still',
 'volume',
 'slider',
 'menus',
 'including',
 'entirely',
 'would',
 'ideal']
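
One possible next step with a token list like this is to count how often each word appears. The sketch below uses `collections.Counter` on a handful of tokens copied from the output above; the short `tokens` list is only a sample, not the full result.

```python
from collections import Counter

# A few tokens copied from the filtered list above
tokens = ["farcry", "6", "good", "love", "hope", "farcry", "6", "hope", "nice"]

# Count the occurrences of each token
counts = Counter(tokens)
print(counts.most_common(3))
```

Tokens such as "farcry" and "6" come from the search hashtag itself, so they would usually be filtered out before interpreting the counts.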