
In [69]:
import torch
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import requests
import re


# **Instantiate the Model**

In [70]:
tokenizer = AutoTokenizer.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")

# **Encode and Calculate Sentiment**

In [71]:
# encode a sentence into token IDs
tokens = tokenizer.encode("What a worst of resources", return_tensors='pt')
tokens[0]

tensor([  101, 11523,   143, 43060, 10108, 19030,   102])

In [72]:
# decode the token IDs back into text
tokenizer.decode(tokens[0])

'[CLS] what a worst of resources [SEP]'
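
The two outputs above line up one-to-one, which makes it easy to see what each ID stands for. A minimal sketch that pairs them, using only the values printed above. Splitting on whitespace works here only because every word in this sentence maps to a single ID; rarer words can be split into several sub-word pieces, in which case `tokenizer.convert_ids_to_tokens` is the more general route.

```python
# IDs and decoded text copied from the two cell outputs above
ids = [101, 11523, 143, 43060, 10108, 19030, 102]
decoded = "[CLS] what a worst of resources [SEP]"

# Pair each ID with its token (valid here because the counts match: 7 and 7)
pairs = list(zip(ids, decoded.split()))

for token_id, token in pairs:
    print(f"{token_id:>6}  {token}")
```

The first and last entries are the special `[CLS]` and `[SEP]` markers that the tokenizer adds around the sentence.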

In [73]:
# run the encoded sentence through the model
result = model(tokens)
result

SequenceClassifierOutput(loss=None, logits=tensor([[ 4.2731,  1.9730, -0.3172, -2.6130, -2.6067]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [74]:
# pick the index of the highest logit with torch.argmax
torch.argmax(result.logits)


tensor(0)

In [75]:
# convert the class index (0-4) to a star rating (1-5)
int(torch.argmax(result.logits)) + 1  # the higher the rating, the more positive the sentiment

1
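
The argmax keeps only the winning class. A minimal sketch, assuming you also want to see how strongly the model favours that class: apply a softmax to the logits printed above. The logits are copied from the model output earlier in this section, and the helper name `softmax` is illustrative. Inside the notebook, `torch.softmax(result.logits, dim=-1)` should give the same probabilities.

```python
import math

# Logits copied from the SequenceClassifierOutput shown above
logits = [4.2731, 1.9730, -0.3172, -2.6130, -2.6067]


def softmax(values):
    """Turn raw logits into probabilities that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(logits)
stars = probs.index(max(probs)) + 1  # class index 0-4 becomes rating 1-5

for rating, p in enumerate(probs, start=1):
    print(f"{rating} star(s): {p:.3f}")
print("predicted rating:", stars)
```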

# **Collect Reviews**

In [76]:
# Extract reviews from the Sitejabber website for sentiment analysis
r = requests.get("https://www.sitejabber.com/reviews/hawalili.com")
soup = BeautifulSoup(r.text, "html.parser")
regex = re.compile(".*margin-bottom:10px.*")
results = soup.find_all('p', {'style': regex})
reviews = [result.text for result in results]
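
BeautifulSoup applies a compiled pattern to an attribute with `search()`, so the pattern only has to occur somewhere inside the `style` value. A minimal sketch of that matching step with plain `re`, using made-up style strings and a simplified pattern (both are assumptions for illustration, not the page's actual markup):

```python
import re

# Simplified, hypothetical pattern: no leading or trailing ".*" is needed with search()
pattern = re.compile(r"margin-bottom:10px")

# Made-up style attribute values
styles = [
    "margin-bottom:10px;",
    "color:#333; margin-bottom:10px; font-size:14px",
    "margin-bottom:10px",
    "margin-bottom:20px;",
]

# Keep only the styles in which the pattern occurs anywhere
matched = [s for s in styles if pattern.search(s)]

for s in matched:
    print("match:", s)
```

Only the last string is rejected, because its margin value differs.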

In [77]:
# show the raw HTML returned by the request
r.text



In [78]:
# first result from the soup
results[0].text

'Always good to find good shop. And this one is exactly one! Good looking shirts and other things! Thank you will buy again'

In [79]:
# the reviews collected
reviews[1:3]

["I've found the BEST place to shop for my wardrobe! I have 13 of these super light, super graphics shiirts in my closet. Yesterday, I ordered 15 more after an exhaustive review of the web site. Now, it'll be 28! In my building, I'm now well known for my shirt wardrobe. Some are beautiful, some exotic, some funny, ALL fun. The best part is the price. $6.00 up to perhaps $25, but mostly economical. Dressing up can be fun, too!",
 "If it sounds too good to be true it usually is. The shirts they show on their website look great but what will show up at your front door is completely different, I would be embarrassed to wear these shirts in public, and the sizes are completely off. If you have a problem you have to pay all kinds of fees and spend time trying to return it. This is not an Amazon like company. Don't waste your money buying anything from this company,"]

# **Load Reviews into DataFrame and Score**

In [80]:
# using pandas, create a dataframe
df = pd.DataFrame(reviews, columns=['Reviews'])
df.head()

|   | Reviews |
|---|---------|
| 0 | Always good to find good shop. And this one is... |
| 1 | I've found the BEST place to shop for my wardr... |
| 2 | If it sounds too good to be true it usually is... |
| 3 | I purchased 5 shirts from this site, the quali... |
| 4 | There were 2 shirts with a similar print, one ... |


In [81]:
# create a sentiment scoring function
def sentiment_score(review):
    # encode the review, run the model, and map class index 0-4 to a 1-5 rating
    tokens = tokenizer.encode(review, return_tensors='pt')
    result = model(tokens)
    return int(torch.argmax(result.logits)) + 1
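
BERT-style models accept at most 512 tokens, so a very long review can make the function above fail. One possible guard, sketched under the assumption that dropping the tail of an overlong review is acceptable: ask the tokenizer to truncate. The function name `sentiment_score_truncated` and its parameters are hypothetical; `truncation` and `max_length` are arguments of `tokenizer.encode`. The tokenizer and model are passed in explicitly so the helper does not rely on notebook globals.

```python
def sentiment_score_truncated(review, tokenizer, model, max_length=512):
    """Score a review from 1 to 5, cutting the input at max_length tokens."""
    tokens = tokenizer.encode(
        review,
        return_tensors="pt",
        truncation=True,        # drop tokens beyond max_length
        max_length=max_length,
    )
    # logits has shape (1, 5); take the single row as a plain Python list
    logits = model(tokens).logits[0].tolist()
    # index of the largest logit (0-4) becomes a rating (1-5)
    return logits.index(max(logits)) + 1
```

In the notebook this would be called as `sentiment_score_truncated(text, tokenizer, model)`.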

In [86]:

# testing the function with the reviews
sentiment_score(df["Reviews"].iloc[1])

4

In [88]:
# apply the function to every review in the dataframe using .apply
df['Sentiments'] = df['Reviews'].apply(sentiment_score)
df.head()

|   | Reviews | Sentiments |
|---|---------|------------|
| 0 | Always good to find good shop. And this one is... | 5 |
| 1 | I've found the BEST place to shop for my wardr... | 4 |
| 2 | If it sounds too good to be true it usually is... | 1 |
| 3 | I purchased 5 shirts from this site, the quali... | 1 |
| 4 | There were 2 shirts with a similar print, one ... | 1 |
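
Once every review has a score, a quick tally shows how the ratings are distributed. A minimal sketch using only the five scores displayed above (not the full dataframe); in the notebook, `df['Sentiments'].value_counts()` and `df['Sentiments'].mean()` would do the same over all rows.

```python
from collections import Counter
from statistics import mean

# Scores copied from the five rows shown above
scores = [5, 4, 1, 1, 1]

counts = Counter(scores)   # how many reviews received each rating
average = mean(scores)     # average rating over these five reviews

for rating in sorted(counts):
    print(f"{rating} star(s): {counts[rating]} review(s)")
print("average rating:", average)
```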


# **Trying the Sentiment Analysis on Another Site**