# Sentiment Analysis using Roberta

<span style="color:grey">

The main objective of this project is to use RoBERTa, a pre-trained model trained on a large number of tweets, and to build a Streamlit application that analyzes the sentiment of the text a user provides.

For example:

User input: 'I am so happy today'

Result: Positive

</span>



### Flow of this project:



<div class='blue'>
<span style="color:grey">    

i)   Set up the environment by installing the necessary modules (such as numpy, scipy, pandas) and loading the RoBERTa model from Hugging Face.

ii)  Import a CSV file containing Amazon product reviews and check how well RoBERTa performs by comparing its scores with our intuition.

iii) Use RoBERTa to create a function that takes a string from the user and returns the sentiment (Positive, Negative, or Neutral) along with a matching emoji.

iv)  Use Streamlit to create a small application that does what the function in (iii) does.
</span>
</div>


#### Step-1: Setting up the environment

In [None]:
#pip install transformers

In [82]:
import numpy as np
import pandas as pd
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer
from scipy.special import softmax
import streamlit as st

In [2]:
MODEL = "cardiffnlp/twitter-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)


In [3]:
# Sanity check: confirm the tokenizer loaded by encoding a short string

text = 'Hello, man'
tokenizer(text, return_tensors='pt')

{'input_ids': tensor([[    0, 31414,     6,   313,     2]]), 'attention_mask': tensor([[1, 1, 1, 1, 1]])}
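
The returned dictionary holds two tensors: `input_ids` (one integer per token) and `attention_mask` (1 for every token the model should attend to). In RoBERTa's vocabulary, ids 0 and 2 are the special start (`<s>`) and end (`</s>`) markers, so 'Hello, man' itself accounts for only the three ids in between. A rough sketch in plain Python, with the ids copied from the output above and a hypothetical helper name `strip_special_tokens`:

```python
# Ids copied from the tokenizer output above.
input_ids = [0, 31414, 6, 313, 2]
attention_mask = [1, 1, 1, 1, 1]

# RoBERTa's special-token ids: <s> = 0, <pad> = 1, </s> = 2.
SPECIAL_IDS = {0, 1, 2}

def strip_special_tokens(ids):
    """Hypothetical helper: keep only the ids that stand for actual text."""
    return [i for i in ids if i not in SPECIAL_IDS]

content_ids = strip_special_tokens(input_ids)
print(content_ids)
# The mask has one entry per id.
print(len(input_ids) == len(attention_mask))
```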

#### Step-2: Importing a CSV of user reviews and viewing the RoBERTa model's response

In [5]:
data = pd.read_csv('Reviews.csv')

In [6]:
data.head(10)

|  | Id | ProductId | UserId | ProfileName | HelpfulnessNumerator | HelpfulnessDenominator | Score | Time | Summary | Text |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | B001E4KFG0 | A3SGXH7AUHU8GW | delmartian | 1 | 1 | 5 | 1303862400 | Good Quality Dog Food | I have bought several of the Vitality canned d... |
| 1 | 2 | B00813GRG4 | A1D87F6ZCVE5NK | dll pa | 0 | 0 | 1 | 1346976000 | Not as Advertised | Product arrived labeled as Jumbo Salted Peanut... |
| 2 | 3 | B000LQOCH0 | ABXLMWJIXXAIN | Natalia Corres "Natalia Corres" | 1 | 1 | 4 | 1219017600 | "Delight" says it all | This is a confection that has been around a fe... |
| 3 | 4 | B000UA0QIQ | A395BORC6FGVXV | Karl | 3 | 3 | 2 | 1307923200 | Cough Medicine | If you are looking for the secret ingredient i... |
| 4 | 5 | B006K2ZZ7K | A1UQRSCLF8GW1T | Michael D. Bigham "M. Wassir" | 0 | 0 | 5 | 1350777600 | Great taffy | Great taffy at a great price. There was a wid... |
| 5 | 6 | B006K2ZZ7K | ADT0SRK1MGOEU | Twoapennything | 0 | 0 | 4 | 1342051200 | Nice Taffy | I got a wild hair for taffy and ordered this f... |
| 6 | 7 | B006K2ZZ7K | A1SP2KVKFXXRU1 | David C. Sullivan | 0 | 0 | 5 | 1340150400 | Great! Just as good as the expensive brands! | This saltwater taffy had great flavors and was... |
| 7 | 8 | B006K2ZZ7K | A3JRGQVEQN31IQ | Pamela G. Williams | 0 | 0 | 5 | 1336003200 | Wonderful, tasty taffy | This taffy is so good. It is very soft and ch... |
| 8 | 9 | B000E7L2R4 | A1MZYO9TZK0BBI | R. James | 1 | 1 | 5 | 1322006400 | Yay Barley | Right now I'm mostly just sprouting this so my... |
| 9 | 10 | B00171APVA | A21BT40VZCCYT4 | Carol A. Reed | 0 | 0 | 5 | 1351209600 | Healthy Dog Food | This is a very healthy dog food. Good for thei... |


In [18]:
data.shape  

(568454, 10)

In [20]:
data = data.head(1000)  # Original data is too large so selecting only the first 1000 to work on
data.shape

(1000, 10)

In [9]:
example_sentence = data['Text'][0]
example_sentence

'I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than  most.'

In [16]:
# Check the sentiment using the RoBERTa model

encoded_input = tokenizer(example_sentence, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()  # raw logits for the single input
scores = softmax(scores)                # convert logits to probabilities

# Class order for this model: 0 = negative, 1 = neutral, 2 = positive
print('negative: ', scores[0])
print('neutral:  ', scores[1])
print('positive: ', scores[2])

negative:  0.009624252
neutral:   0.049980428
positive:  0.9403953
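
The model itself returns raw logits; `softmax` is what turns them into three probabilities that sum to 1. A minimal sketch of that step in plain Python, using made-up logits rather than values produced by the model (the name `softmax_list` is hypothetical):

```python
import math

def softmax_list(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits in the model's class order: negative, neutral, positive.
example_logits = [-2.0, 0.5, 3.0]
probs = softmax_list(example_logits)

for label, p in zip(("negative", "neutral", "positive"), probs):
    print(f"{label}: {p:.4f}")
```

The largest logit always receives the largest probability, so the predicted class is the same before and after the softmax; the softmax only makes the scores easier to read.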


In [29]:
# Repeat the check on a review that reads as clearly negative
review = data['Text'][50]
encoded_input = tokenizer(review, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()
scores = softmax(scores)

print(review, '\n')

print('negative: ', scores[0])
print('neutral:  ', scores[1])
print('positive: ', scores[2])

This oatmeal is not good. Its mushy, soft, I don't like it. Quaker Oats is the way to go. 

negative:  0.97635514
neutral:   0.020687476
positive:  0.0029573706
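
Both spot checks match intuition. To go beyond individual reviews, one option is to compare predictions against the `Score` column (the star rating). The sketch below assumes that 1-2 stars mean Negative, 3 means Neutral, and 4-5 mean Positive; that mapping, the helper names, and the sample pairs are illustrative assumptions and not part of the original notebook.

```python
def label_from_stars(stars):
    """Assumed mapping from a 1-5 star rating to a coarse sentiment label."""
    if stars <= 2:
        return "Negative"
    if stars == 3:
        return "Neutral"
    return "Positive"

def agreement_rate(pairs):
    """Fraction of (stars, predicted_label) pairs where both labels agree."""
    if not pairs:
        return 0.0
    hits = sum(1 for stars, predicted in pairs
               if label_from_stars(stars) == predicted)
    return hits / len(pairs)

# Made-up (stars, predicted label) pairs, for illustration only.
sample_pairs = [(5, "Positive"), (1, "Negative"), (3, "Positive"), (4, "Positive")]
print(agreement_rate(sample_pairs))
```

In the notebook, the pairs could be built from `data['Score']` and the label predicted for each entry of `data['Text']`.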


#### Step-4: Creating a function that takes a string from the user and returns the sentiment (Positive, Negative, or Neutral) along with a matching emoji


In [72]:
U1F60A = '\U0001F60A'  # Smiling face with smiling eyes (used for Positive)
U1F612 = '\U0001F612'  # Unamused face (used for Negative)
U1F610 = '\U0001F610'  # Neutral face (used for Neutral)


In [73]:
def sentiment_finder(text):
    """Return (label, emoji) for the sentiment the RoBERTa model assigns to `text`."""
    # Truncate long inputs: a RoBERTa-base model accepts at most 512 tokens
    encoded_input = tokenizer(text, return_tensors='pt', truncation=True, max_length=512)
    output = model(**encoded_input)
    scores = output[0][0].detach().numpy()
    scores = softmax(scores)

    # Index of the highest of the three scores (0 = negative, 1 = neutral, 2 = positive)
    index = int(np.argmax(scores))
    if index == 0:
        return 'Negative', U1F612
    elif index == 1:
        return 'Neutral', U1F610
    else:
        return 'Positive', U1F60A
    

In [77]:
print('Review from user: ',data['Text'][4], '\n')

print('The sentiment predicted from roberta model is: ',sentiment_finder(data['Text'][4]))

Review from user:  Great taffy at a great price.  There was a wide assortment of yummy taffy.  Delivery was very quick.  If your a taffy lover, this is a deal. 

The sentiment predicted from roberta model is:  ('Positive', '😊')
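
`sentiment_finder` returns only the winning label. If the probability behind that label is also of interest, for example to flag uncertain predictions, the selection step can return it as well. A sketch under that assumption; the name `pick_label` and the example scores are hypothetical, and the scores are assumed to be in the order negative, neutral, positive:

```python
LABELS = ("Negative", "Neutral", "Positive")
EMOJIS = ("\U0001F612", "\U0001F610", "\U0001F60A")  # unamused, neutral, smiling

def pick_label(scores):
    """Return (label, emoji, confidence) for the highest of three scores."""
    index = max(range(len(scores)), key=lambda i: scores[i])
    return LABELS[index], EMOJIS[index], float(scores[index])

# Hypothetical softmax output, not produced by the model.
label, emoji, confidence = pick_label([0.10, 0.25, 0.65])
print(label, emoji, round(confidence, 2))
```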


#### Step-5: Making a Streamlit app that asks the user to enter text and returns its sentiment (using the RoBERTa model)

In [83]:
from streamlit_jupyter import StreamlitPatcher, tqdm
StreamlitPatcher().jupyter() 

In [85]:
st.title("Sentiment Analysis")
temp = """
       <div style = "background-color:tomato; padding:10px">
       <h2 style= "color:white; text-align:center;"> Real Time Sentiment Analysis </h2> 
       </div>
"""
       
st.markdown(temp, unsafe_allow_html=True)
text = st.text_input("Text", "")
if st.button("Predict"):
    # sentiment_finder returns a (label, emoji) tuple; show it as one string
    label, emoji = sentiment_finder(text)
    st.success(f"{label} {emoji}")
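
Pressing Predict with an empty box still sends the empty string to the model. One way to guard against that is to normalize the input first and skip the prediction when nothing is left. The helper below is a hypothetical addition, written in plain Python so it can be tried outside Streamlit:

```python
def clean_input(text):
    """Collapse runs of whitespace and trim the ends; returns '' for blank input."""
    return " ".join(text.split())

for raw in ["  I am   so happy today ", "   ", ""]:
    cleaned = clean_input(raw)
    if cleaned:
        print("predict on:", repr(cleaned))
    else:
        print("skip: nothing to analyze")
```

Inside the button handler, the app could call `clean_input(text)` and show an `st.warning(...)` message instead of a prediction when the result is empty.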
