# Sentiment Analyzer with Transformer

Sentiment analysis of customer reviews scraped from a review website (Yelp)

**Install the dependency packages**

In [137]:
!pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio===0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

Looking in links: https://download.pytorch.org/whl/torch_stable.html


In [138]:
!pip install transformers



In [139]:
!pip install tensorflow



Packages for web scraping

In [140]:
!pip install requests beautifulsoup4



In [141]:
from transformers import AutoTokenizer,AutoModelForSequenceClassification
import torch
import numpy as np
import pandas as pd
import tensorflow as tf

Use the pre-trained transformer from the Hugging Face model hub:
https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment

In [142]:
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

In [143]:
tokens = tokenizer('I love this pizza')
tokens

{'input_ids': [101, 151, 11157, 10372, 59371, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1]}

In [144]:
model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

Encode the text as a PyTorch tensor of token IDs


In [145]:
import torch
# tokens = tokenizer.encode('I love this pizza',return_tensors='tf')
tokens = tokenizer.encode('I love this pizza',return_tensors='pt')
tokens

tensor([[  101,   151, 11157, 10372, 59371,   102]])

In [146]:
output = model(tokens)
output.logits

tensor([[-2.2257, -2.5091, -0.9815,  1.2772,  3.5961]],
       grad_fn=<AddmmBackward0>)

In [147]:
int(torch.argmax(output.logits)) + 1

5
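The argmax above yields a star rating but hides how confident the model is. As a minimal sketch, assuming the logits printed above are copied in as plain floats, a softmax turns them into one probability per star rating. Inside the notebook, `torch.softmax(output.logits, dim=-1)` should give the same values. The helper name `softmax` is introduced here for illustration only.

```python
import math

# Logits copied from the cell output above (one value per star rating, 1 to 5).
logits = [-2.2257, -2.5091, -0.9815, 1.2772, 3.5961]

def softmax(values):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
stars = probs.index(max(probs)) + 1  # class index 0 corresponds to 1 star

for rating, p in enumerate(probs, start=1):
    print(f"{rating} star(s): {p:.3f}")
print(f"Predicted rating: {stars}")
```

The predicted rating matches the argmax result, and the probabilities show how much weight the model puts on the neighbouring ratings.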

Unit Test Code

In [148]:
def unit_test():
  # Prompt repeatedly and stop as soon as the user types 'quit',
  # so the sentinel word itself is never scored.
  while True:
    test_data = input("Please enter the unit test input for the Sentiment Analysis Model, or enter 'quit': ")
    if test_data.strip().lower() == 'quit':
      break
    tokens = tokenizer.encode(test_data,return_tensors='pt')
    with torch.no_grad():  # inference only, no gradients needed
      output = model(tokens)
    sentiment = int(torch.argmax(output.logits)) + 1
    print(f'The sentiment value is : {sentiment}')
if __name__ =='__main__':
  unit_test()

Please enter the unit test input for the Sentiment Analysis Model, or enter 'quit': quit


Web-scraping the reviews

In [149]:
import requests
from bs4 import BeautifulSoup
import re

In [150]:
# This Yelp URL was chosen arbitrarily for testing purposes
url_data = requests.get('https://www.yelp.ca/biz/seven-lives-tacos-y-mariscos-toronto')
url_data.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(url_data.text,'html.parser')
# Keep the <p> tags whose class name contains "comment"
regex = re.compile('.*comment.*')
results = soup.find_all('p', {'class':regex})
reviews = [result.text for result in results]
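To make the class filter in the scraping cell concrete, here is a small sketch of which class names the pattern keeps. The class names below are hypothetical, made up for illustration; the real ones on the page are generated and may change. BeautifulSoup applies a compiled pattern with `search()`, so the check below mirrors that behaviour using only the standard library.

```python
import re

# Same pattern as in the scraping cell.
regex = re.compile('.*comment.*')

# Hypothetical class names, invented for this example.
class_names = ['comment__09f24__abc', 'raw-comment', 'css-1x9ee72', 'heading']

# A class name is kept when the pattern is found anywhere inside it.
matched = [name for name in class_names if regex.search(name)]
print(matched)

# Because search() scans the whole string, the shorter pattern
# 'comment' selects the same names.
simple = re.compile('comment')
matched_simple = [name for name in class_names if simple.search(name)]
print(matched_simple == matched)
```

Any `<p>` tag carrying at least one class that contains "comment" would be selected, which is why the filter can also pick up unrelated elements if the page reuses that word.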

In [151]:
type(reviews),len(reviews)

(list, 10)

Get the data into a dataframe format

In [152]:
df_data = pd.DataFrame(np.array(reviews),columns=['reviews'])
df_data.head(2)

                                             reviews
0  TLDR: best taco place in Toronto, would eat ev...
1  The kings of tacos in Toronto.Seven Lives has ...


In [153]:
df_data['reviews']

0    TLDR: best taco place in Toronto, would eat ev...
1    The kings of tacos in Toronto.Seven Lives has ...
2    The best tacos in town!! Amazing selection, so...
3    It's sooooooo good ! I've been wanting to try ...
4    So every time I would look up what to eat in T...
5    MMMM tacos!! And this place does it BIG!They m...
6    Heard about this place from friends in the are...
7    Good Location | Authentic | Quick ServiceI vis...
8    Craving Tacos and Seven Lives is usually my go...
9    Super fun to try! The tacos were incredibly fl...
Name: reviews, dtype: object

In [154]:
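def sentiment(review):
  # Truncate at the token level so the input never exceeds the model's 512-token limit
  tokens = tokenizer(review,truncation=True,max_length=512,return_tensors='pt')
  with torch.no_grad():  # inference only, no gradients needed
    # tokenizer(...) returns a dict-like encoding, so unpack it into keyword arguments
    output = model(**tokens)
  return int(torch.argmax(output.logits)) + 1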

In [157]:
df_data['reviews'].iloc[1]

"The kings of tacos in Toronto.Seven Lives has always been a major player in the taco scene in the city, and even after many years, they continue to reign supreme. Most diners go for their fish tacos, but I wanted to see how the meat-based tacos fared. Safe to say that they were exceptional. I tried the pollo asado (chicken) and suadero (beef) tacos. Both were filled with flavour and left me wanting to buy more. They're quite big and two will leave most people satisfied. During busy times (especially in the summer) it can get incredibly busy with super long lineups, so make ample time for yourself before you go. Cheers.Check out @foodpharaoh on Instagram for more food!"

In [160]:
# Keep only the first 512 characters of each review. This is a rough guard against
# the model's 512-token input limit: it cuts characters, not tokens.
df_data['sentimentscore'] = df_data['reviews'].apply(lambda x: sentiment(x[:512]))
df_data
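Once each review has a 1 to 5 score, it is often easier to read the results as coarse labels. The sketch below is one possible mapping, not part of the original notebook: the function name `label_for`, the thresholds, and the example scores are all assumptions chosen for illustration.

```python
def label_for(stars):
    """Map a 1-5 star score to a coarse label (thresholds are an assumption)."""
    if stars <= 2:
        return 'negative'
    if stars == 3:
        return 'neutral'
    return 'positive'

# Hypothetical scores, not results produced by the notebook.
example_scores = [5, 4, 1, 3]
labels = [label_for(s) for s in example_scores]
print(labels)
```

In the notebook, the same function could be applied to the new column with `df_data['sentimentscore'].map(label_for)`.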