<a href="https://colab.research.google.com/github/gomesluiz/pln-na-pratica/blob/main/u4-03-nlp-analise-de-sentimentos-pratica-1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sentiment Analysis



In [1]:
!pip install textblob==0.17.1
!pip install vaderSentiment==3.3.2
!pip install transformers==4.38.2

Collecting vaderSentiment==3.3.2
  Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl (125 kB)
Installing collected packages: vaderSentiment
Successfully installed vaderSentiment-3.3.2


In [2]:
# Import the core modules used throughout the notebook.
import re
import datetime
import sys

import transformers
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

print("Packages imported successfully! Notebook ready to use.")

Packages imported successfully! Notebook ready to use.


In [3]:
# Define the utility functions used in the notebook.

def formata_msg(nivel, msg):
    """
    Format a log message with its severity level, a timestamp,
    and the message text.

    Parameters:
    - nivel (str): Severity level of the message (e.g. 'INFO', 'ERROR', 'WARNING').
    - msg (str): The log message itself.

    Returns:
    - str: The formatted log message.
    """
    timestamp = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')

    return f"[{nivel}] {timestamp} - {msg}"

print(formata_msg("INFO", "Utility functions ready for use."))
print(formata_msg("INFO", f"Python version: {sys.version}"))

[INFO] 2024-04-04 14:38:10 - Utility functions ready for use.
[INFO] 2024-04-04 14:38:10 - Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]


In [4]:
texto_1 = "The movie was so awesome."
texto_2 = "The food here tastes terrible."

In [None]:
print(formata_msg("INFO",f"\n{texto_1}\n{texto_2}"))

[INFO] 2024-04-02 13:47:40 - 
The movie was so awesome.
The food here tastes terrible.


## Bag-of-Words

In [15]:
import pandas as pd
dados = pd.read_csv("data.csv")
dados.head()

|   | Sentence | Sentiment |
|---|----------|-----------|
| 0 | The GeoSolutions technology will leverage Bene... | positive |
| 1 | $ESI on lows, down $1.50 to $2.50 BK a real po... | negative |
| 2 | For the last quarter of 2010 , Componenta 's n... | positive |
| 3 | According to the Finnish-Russian Chamber of Co... | neutral |
| 4 | The Swedish buyout firm has sold its remaining... | neutral |


In [16]:
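from nltk.tokenize import RegexpTokenizer
# Keep runs of letters and digits. The range must be A-Z: A-z would also match
# the punctuation characters that sit between 'Z' and 'a' in ASCII.
tokenizador = RegexpTokenizer(r'[a-zA-Z0-9]+')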

In [17]:
from sklearn.feature_extraction.text import CountVectorizer
# Count unigrams only, drop English stop words, and tokenize with the regex tokenizer above.
cv = CountVectorizer(stop_words='english', ngram_range=(1, 1), tokenizer=tokenizador.tokenize)
frequencias = cv.fit_transform(dados["Sentence"])
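
To make the bag-of-words step less opaque, here is a minimal standard-library sketch of the idea behind `CountVectorizer`: tokenize, build a sorted vocabulary, then count tokens per sentence. The helper names (`tokenize`, `build_vocabulary`, `vectorize`) are hypothetical and are not part of scikit-learn. The sketch omits stop-word removal and returns plain lists rather than a sparse matrix.

```python
import re
from collections import Counter

# Alphanumeric pattern in the spirit of the notebook's tokenizer (an assumption).
TOKEN_RE = re.compile(r"[a-zA-Z0-9]+")

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())

def build_vocabulary(sentences):
    """Map each distinct token to a column index, in sorted order."""
    tokens = {tok for sentence in sentences for tok in tokenize(sentence)}
    return {tok: i for i, tok in enumerate(sorted(tokens))}

def vectorize(sentence, vocabulary):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(sentence))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector

# Toy sentences reused from earlier in the notebook.
train = ["The movie was so awesome.", "The food here tastes terrible."]
vocab = build_vocabulary(train)
print(vocab)
print(vectorize("The movie, the food: awesome!", vocab))
```

The vocabulary is built from one set of sentences and then applied to a new one, which mirrors how a fitted vectorizer treats unseen text: words it has never seen simply contribute nothing.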



In [20]:
# Splitting the data into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(frequencias, dados["Sentiment"], test_size=0.20, random_state=42)

In [21]:
# Training the model
from sklearn.naive_bayes import MultinomialNB
MNB = MultinomialNB()
MNB.fit(X_train, Y_train)
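
As a rough illustration of what a multinomial Naive Bayes classifier estimates during `fit`, the sketch below computes class priors and Laplace-smoothed token likelihoods in pure Python. It is a simplified sketch, not scikit-learn's implementation. The functions `train_nb` and `predict_nb` and the toy documents are hypothetical, and `alpha=1.0` is chosen to match the default smoothing of `MultinomialNB`.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods from token lists."""
    vocab = sorted({tok for doc in docs for tok in doc})
    class_docs = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)
    # Prior: fraction of training documents in each class.
    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_like = {}
    for c in class_docs:
        # Additive (Laplace) smoothing keeps unseen tokens from zeroing a class.
        total = sum(token_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {
            t: math.log((token_counts[c][t] + alpha) / total) for t in vocab
        }
    return log_prior, log_like

def predict_nb(doc, log_prior, log_like):
    """Pick the class with the highest log posterior; unknown tokens are skipped."""
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in doc if t in log_like[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Toy, illustrative documents (already tokenized); not taken from data.csv.
docs = [["awesome", "movie"], ["great", "food"],
        ["terrible", "food"], ["awful", "movie"]]
labels = ["positive", "positive", "negative", "negative"]
log_prior, log_like = train_nb(docs, labels)
print(predict_nb(["awesome", "food"], log_prior, log_like))
print(predict_nb(["terrible", "movie"], log_prior, log_like))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.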

In [22]:
# Calculating the accuracy score of the model
from sklearn import metrics
predicted = MNB.predict(X_test)
# accuracy_score expects the true labels first, then the predictions.
acc = metrics.accuracy_score(Y_test, predicted)
print(formata_msg("INFO",f"Accuracy: {acc}"))

[INFO] 2024-04-04 15:01:35 - Accuracy: 0.6817792985457656
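
An accuracy figure is easier to interpret next to a trivial baseline, since the dataset has at least three classes and their balance is not shown here. The sketch below computes the accuracy of always predicting the most frequent training label. The function names and the label lists are illustrative assumptions, not values from `data.csv`.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_baseline(y_train, y_test):
    """Predict the most frequent training label for every test item."""
    majority = Counter(y_train).most_common(1)[0][0]
    return majority, accuracy(y_test, [majority] * len(y_test))

# Illustrative labels only; replace with the real Y_train / Y_test values.
y_train = ["neutral", "positive", "neutral", "negative", "neutral", "positive"]
y_test = ["neutral", "positive", "neutral", "negative"]
label, baseline_acc = majority_baseline(y_train, y_test)
print(label, baseline_acc)
```

With scikit-learn available, `DummyClassifier(strategy="most_frequent")` gives the same kind of reference point without custom code.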


## TextBlob

In [None]:
# Compute the polarity of each text
p_1 = TextBlob(texto_1).sentiment.polarity
p_2 = TextBlob(texto_2).sentiment.polarity
print(formata_msg("INFO",f"\n{p_1}\n{p_2}"))

[INFO] 2024-04-02 13:50:31 - 
1.0
-1.0


In [None]:
# Compute the subjectivity of each text
s_1 = TextBlob(texto_1).sentiment.subjectivity
s_2 = TextBlob(texto_2).sentiment.subjectivity
print(formata_msg("INFO",f"\n{s_1}\n{s_2}"))

[INFO] 2024-04-02 13:51:43 - 
1.0
1.0


## VADER


In [None]:
sentiment = SentimentIntensityAnalyzer()
se_1 = sentiment.polarity_scores(texto_1)
se_2 = sentiment.polarity_scores(texto_2)
print(formata_msg("INFO",f"\n{se_1}\n{se_2}"))

[INFO] 2024-04-02 13:57:58 - 
{'neg': 0.0, 'neu': 0.433, 'pos': 0.567, 'compound': 0.7384}
{'neg': 0.437, 'neu': 0.563, 'pos': 0.0, 'compound': -0.4767}
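
VADER returns scores rather than a label, so a common next step is to threshold the `compound` value. The sketch below uses the ±0.05 cut-offs suggested in the VADER documentation as typical values. The function name `vader_label` is hypothetical, and the threshold may need tuning for a specific dataset.

```python
def vader_label(scores, threshold=0.05):
    """Turn a VADER score dict into 'positive', 'negative' or 'neutral'."""
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Score dicts copied from the output above.
se_1 = {'neg': 0.0, 'neu': 0.433, 'pos': 0.567, 'compound': 0.7384}
se_2 = {'neg': 0.437, 'neu': 0.563, 'pos': 0.0, 'compound': -0.4767}
print(vader_label(se_1))
print(vader_label(se_2))
```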


## Transformers


In [None]:
from transformers import pipeline
sentiment_pipeline = pipeline("sentiment-analysis")
textos = [texto_1, texto_2]
se = sentiment_pipeline(textos)
print(formata_msg("INFO",f"\n{se}"))

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

[INFO] 2024-04-02 14:07:52 - 
[{'label': 'POSITIVE', 'score': 0.9998677968978882}, {'label': 'NEGATIVE', 'score': 0.9991149306297302}]
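
The pipeline returns a label plus the model's confidence in that label, which is not directly comparable to the signed scores from TextBlob or VADER. One possible convenience, sketched below, is to fold the label into the sign. The helper `signed_score` is hypothetical, and the result is still a classifier confidence, not a measure of sentiment intensity.

```python
def signed_score(result):
    """Map a {'label', 'score'} dict to a value in [-1, 1]."""
    label = result["label"].upper()
    if label == "POSITIVE":
        return result["score"]
    if label == "NEGATIVE":
        return -result["score"]
    raise ValueError(f"Unexpected label: {result['label']}")

# Results copied from the pipeline output above.
se = [{'label': 'POSITIVE', 'score': 0.9998677968978882},
      {'label': 'NEGATIVE', 'score': 0.9991149306297302}]
print([signed_score(r) for r in se])
```

This sketch assumes a binary model with `POSITIVE` and `NEGATIVE` labels, as in the default checkpoint reported in the log; models with other label sets would need their own mapping.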
