# Sentiment Analysis in Python


1. VADER (Valence Aware Dictionary and Sentiment Reasoner) - bag-of-words approach
2. RoBERTa pretrained model
3. Hugging Face pipeline


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('ggplot')

import nltk

In [2]:
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Loukik\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [3]:
nltk.download('maxent_ne_chunker')
nltk.download('words')

[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     C:\Users\Loukik\AppData\Roaming\nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to
[nltk_data]     C:\Users\Loukik\AppData\Roaming\nltk_data...
[nltk_data]   Package words is already up-to-date!


True

In [4]:
df = pd.read_csv('reviews.csv')
df.head(2)

Unnamed: 0,text
0,This toothpaste is amazing! My teeth feel so c...
1,This toothbrush is the worst! It's too hard an...


In [5]:
print(df.shape)

(10, 1)


# VADER (Valence Aware Dictionary and Sentiment Reasoner) - Bag of Words approach


VADER looks up every word of a sentence in a lexicon that gives each word a positive, negative, or neutral value. It then adds up the values of all the words and reports the resulting scores, which show whether the sentence as a whole is (1) positive, (2) negative, or (3) neutral.
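
The adding-up idea can be sketched with a toy lexicon. This is only an illustration, not VADER itself: `TOY_LEXICON`, its values, the `toy_sentiment` helper, and the 0.5 threshold are all made up for the example, and the real VADER lexicon and scoring are more involved.

```python
# Toy bag-of-words scorer: NOT the real VADER lexicon or algorithm.
# The lexicon values and the threshold are assumptions for illustration.
TOY_LEXICON = {
    "amazing": 3.0,
    "good": 2.0,
    "clean": 1.5,
    "worst": -3.0,
    "disappointed": -2.5,
    "hard": -1.0,
}


def toy_sentiment(text, threshold=0.5):
    """Sum the per-word values and map the total to a label."""
    # Lowercase each word and strip simple punctuation from its ends
    words = [w.strip(".,!?").lower() for w in text.split()]
    # Words missing from the lexicon count as neutral (0.0)
    total = sum(TOY_LEXICON.get(w, 0.0) for w in words)
    if total > threshold:
        label = "positive"
    elif total < -threshold:
        label = "negative"
    else:
        label = "neutral"
    return total, label


print(toy_sentiment("This toothpaste is amazing!"))
print(toy_sentiment("This toothbrush is the worst!"))
print(toy_sentiment("This is a toothbrush."))
```

Each word contributes independently here, which is what makes this a bag-of-words approach.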



# RoBERTa Pretrained Model
- Uses a model trained on a large corpus of data
- A transformer model accounts for each word and also for its context, that is, how it relates to the other words
- VADER does not model the relationships between words, so it can reach the wrong conclusion when the wording is ambiguous or the sentiment depends on context
- RoBERTa takes the context of the whole sentence into account when scoring it
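
The context limitation can be seen in a small sketch of a pure bag-of-words scorer. The lexicon and the `bag_of_words_score` helper are hypothetical, and VADER itself adds rule-based heuristics (for negation and intensifiers, for example) on top of its lexicon, so this shows the weakness of word counting in general rather than VADER's exact behaviour.

```python
from collections import Counter

# Hypothetical mini-lexicon for illustration only (not VADER's real values).
TOY_LEXICON = {"good": 2.0, "bad": -2.0}


def bag_of_words_score(text):
    """Score a text from its word counts alone, ignoring word order."""
    counts = Counter(w.strip(".,!?").lower() for w in text.split())
    return sum(TOY_LEXICON.get(word, 0.0) * n for word, n in counts.items())


a = "The taste is good, not bad."
b = "The taste is bad, not good."

# Both sentences contain exactly the same words, so the scores are
# identical even though the meanings are opposite.
print(bag_of_words_score(a), bag_of_words_score(b))
```

A transformer model reads the tokens in order, so it is in a position to tell these two sentences apart.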

In [6]:
%pip install scipy

Note: you may need to restart the kernel to use updated packages.


In [7]:
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from scipy.special import softmax  # converts the model's raw outputs to probabilities between 0 and 1



In [8]:
MODEL = "cardiffnlp/twitter-roberta-base-sentiment"
# from_pretrained downloads the tokenizer files and the model weights on first use
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

pytorch_model.bin: 100%|██████████| 499M/499M [00:47<00:00, 10.5MB/s] 
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


In [16]:
example = df['text'][2]
example

"I'm really disappointed with this mouthwash. It doesn't taste good and it doesn't seem to work very well."

In [18]:
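def polarity_scores_roberta(example):
    # Tokenize the text and return PyTorch tensors
    encoded_text = tokenizer(example, return_tensors='pt')
    output = model(**encoded_text)
    # output[0] holds the logits; [0] selects the single item in the batch
    scores = output[0][0].detach().numpy()
    # Convert the logits to probabilities that sum to 1
    scores = softmax(scores)
    # Label order for this model: negative, neutral, positive
    scores_dict = {
        'roberta_neg': scores[0],
        'roberta_neu': scores[1],
        'roberta_pos': scores[2]
    }
    return scores_dict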


polarity_scores_roberta(example)

{'roberta_neg': 0.983375,
 'roberta_neu': 0.014249509,
 'roberta_pos': 0.0023755375}
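
The softmax step used in `polarity_scores_roberta` can be sketched in plain Python. The `softmax_sketch` helper and the logit values below are hypothetical and chosen only for illustration; the notebook itself uses `scipy.special.softmax` on the model's real outputs.

```python
import math


def softmax_sketch(logits):
    """Convert raw scores (logits) to probabilities that sum to 1."""
    # Subtracting the max improves numerical stability and
    # does not change the result.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits in the order (negative, neutral, positive)
logits = [3.0, -1.0, -3.0]
probs = softmax_sketch(logits)

# Pick the label with the highest probability
labels = ["negative", "neutral", "positive"]
best = labels[probs.index(max(probs))]
print(probs, best)
```

The largest logit always receives the largest probability, so the predicted label is the same before and after softmax; the conversion mainly makes the three scores comparable as shares of 1.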