# **1. Import Dependencies**

In [33]:
import pandas as pd
import numpy as np
import torch
import requests
import re
from bs4 import BeautifulSoup
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# **2. Instantiate Model and Tokenizer**

This is a `bert-base-multilingual-uncased` model finetuned for sentiment analysis on product reviews in six languages: English, Dutch, German, French, Spanish, and Italian. It predicts the sentiment of the review as a number of stars (between 1 and 5).

This model is intended for direct use as a sentiment analysis model for product reviews in any of the six languages above or for further finetuning on related sentiment analysis tasks.

In [6]:
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# **3. Encode and Calculate Sentiment**

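We can use `tokenizer.encode()` to tokenize a sentence. It first splits the text into tokens (whole words or WordPiece subwords, lowercased because this model is uncased), adds the special `[CLS]` and `[SEP]` tokens, and then converts each token into an integer ID. That ID is the token's index in the vocabulary, which the model uses to look up the token's embedding.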

In [7]:
tokens = tokenizer.encode("I loved it")
tokens  # input IDs, one per token plus [CLS] and [SEP], used to look up the embeddings

[101, 151, 46747, 10197, 102]
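To make the splitting step concrete, here is a toy, dependency-free sketch of greedy longest-match-first WordPiece, the subword scheme BERT tokenizers are based on. The vocabulary `TOY_VOCAB` and the helpers `wordpiece` and `toy_encode` are hypothetical: the real multilingual vocabulary is far larger, uses different IDs, also splits off punctuation, and actually keeps "loved" as a single token (ID 46747 above). The toy vocabulary deliberately lacks "loved" so you can see a word split into `love` + `##d`.

```python
# Hypothetical toy vocabulary; the real vocab.txt is much larger with different IDs.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "i": 4, "love": 5, "##d": 6, "it": 7}


def wordpiece(word, vocab):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # pieces after the first are marked with ##
            if candidate in vocab:
                piece = candidate
                break
            end -= 1  # try a shorter substring
        if piece is None:
            return ["[UNK]"]  # nothing matched: the whole word becomes unknown
        pieces.append(piece)
        start = end
    return pieces


def toy_encode(text, vocab):
    """Lowercase, split on whitespace, WordPiece each word, add special tokens, map to IDs."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    return tokens, [vocab[t] for t in tokens]


toy_tokens, toy_ids = toy_encode("I loved it", TOY_VOCAB)
print(toy_tokens)  # ['[CLS]', 'i', 'love', '##d', 'it', '[SEP]']
print(toy_ids)     # [2, 4, 5, 6, 7, 3]
```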

To get back the original sentence, we can simply decode the IDs. The result is lowercased (the model is uncased) and still includes the special tokens.

In [8]:
tokenizer.decode(tokens) # back to readable sentence along with the special tokens

'[CLS] i loved it [SEP]'
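Decoding is roughly the reverse lookup. The sketch below, again with a hypothetical `TOY_VOCAB` and a hypothetical `toy_decode` helper, maps IDs back to tokens and glues `##` continuation pieces onto the previous token; the toy treats `[CLS]`, `[SEP]` and `[PAD]` as special. The real `tokenizer.decode()` also accepts `skip_special_tokens=True` if you want the sentence without `[CLS]` and `[SEP]`.

```python
# Hypothetical toy vocabulary; real IDs differ.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "i": 4, "love": 5, "##d": 6, "it": 7}
ID_TO_TOKEN = {idx: tok for tok, idx in TOY_VOCAB.items()}
SPECIAL_TOKENS = {"[PAD]", "[CLS]", "[SEP]"}


def toy_decode(ids, skip_special_tokens=False):
    """Map IDs back to tokens, then merge '##' pieces into the preceding word."""
    tokens = [ID_TO_TOKEN[i] for i in ids]
    if skip_special_tokens:
        tokens = [t for t in tokens if t not in SPECIAL_TOKENS]
    text = ""
    for tok in tokens:
        if tok.startswith("##"):
            text += tok[2:]  # continuation piece: attach without a space
        else:
            text += (" " if text else "") + tok
    return text


print(toy_decode([2, 4, 5, 6, 7, 3]))                            # [CLS] i loved it [SEP]
print(toy_decode([2, 4, 5, 6, 7, 3], skip_special_tokens=True))  # i loved it
```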

We can choose the format of the encoded tokens with `return_tensors`: pass `'tf'` for TensorFlow tensors, `'pt'` for PyTorch tensors, or `'np'` for NumPy arrays.

In [9]:
tokens = tokenizer.encode("I loved it", return_tensors='pt') 
tokens 

tensor([[  101,   151, 46747, 10197,   102]])

This returns a 2D tensor of shape `(batch_size, sequence_length)`, here `(1, 5)`, so we index it with `[0]` to decode the first (and only) sequence.

In [10]:
tokenizer.decode(tokens[0])

'[CLS] i loved it [SEP]'
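The leading dimension exists because models consume batches. As a rough illustration (a hypothetical `pad_batch` helper written with NumPy, not the tokenizer's own code), this is how several ID sequences of different lengths can be stacked into one `(batch_size, max_len)` array, with an attention mask marking the real tokens. It assumes the padding ID is 0, which is `[PAD]` in standard BERT vocabularies; check `tokenizer.pad_token_id` for your model. In practice, `tokenizer(list_of_texts, padding=True, return_tensors='pt')` does this for you.

```python
import numpy as np


def pad_batch(sequences, pad_id=0):
    """Stack variable-length ID lists into a (batch_size, max_len) array plus an attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = np.full((len(sequences), max_len), pad_id, dtype=np.int64)
    attention_mask = np.zeros((len(sequences), max_len), dtype=np.int64)
    for row, seq in enumerate(sequences):
        input_ids[row, :len(seq)] = seq      # copy the real IDs, the rest stays padding
        attention_mask[row, :len(seq)] = 1   # 1 = real token, 0 = padding
    return input_ids, attention_mask


single = [101, 151, 46747, 10197, 102]       # IDs of "I loved it" from the cells above
ids, mask = pad_batch([single])
print(ids.shape)                             # (1, 5)
print(ids[0].tolist() == single)             # True, so ids[0] is the sentence itself

# Second row is [CLS][SEP] only, i.e. an empty sentence, to show padding
batch_ids, batch_mask = pad_batch([single, [101, 102]])
print(batch_ids.tolist())   # [[101, 151, 46747, 10197, 102], [101, 102, 0, 0, 0]]
print(batch_mask.tolist())  # [[1, 1, 1, 1, 1], [1, 1, 0, 0, 0]]
```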

We can now pass the tokens to the model. It returns one raw score (a logit) for each of the 5 classes, i.e. 1 to 5 stars. These are not probabilities, but the class with the highest logit is also the one with the highest probability, so it is our predicted rating.

In [11]:
res = model(tokens)
res

SequenceClassifierOutput(loss=None, logits=tensor([[-2.1030, -1.9658, -0.2264,  1.3489,  2.2864]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [12]:
res.logits

tensor([[-2.1030, -1.9658, -0.2264,  1.3489,  2.2864]],
       grad_fn=<AddmmBackward0>)

So we can simply use `torch.argmax()` to get the index of the highest logit. However, we need to add 1, because indices start at 0 while our scale runs from **1 to 5**, not 0 to 4.

In [13]:
torch.argmax(res.logits)+1

tensor(5)

So on the scale of 1 to 5 the predicted sentiment is 5, which makes sense since our sentence was "I loved it".
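The values in `res.logits` are raw scores, not probabilities; a softmax converts them into probabilities that sum to 1. The sketch below does this in plain Python on the logits printed above (in PyTorch, `torch.softmax(res.logits, dim=-1)` computes the same thing). Softmax preserves the ordering of the scores, so the argmax + 1 answer is still 5 stars, with a 5-star probability of roughly 0.67.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-2.1030, -1.9658, -0.2264, 1.3489, 2.2864]  # res.logits[0] from the cell above
probs = softmax(logits)
stars = max(range(len(probs)), key=probs.__getitem__) + 1  # index 0..4 -> 1..5 stars

for star, p in enumerate(probs, start=1):
    print(f"{star} star(s): {p:.3f}")
print("Predicted:", stars)  # Predicted: 5
```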

# **4. Scrape Reviews from a Website**

I will scrape the reviews from this website. It has only 7 reviews, but oh well.

Each review is inside a `<header>` tag, which sits inside a `div` whose class contains "author".

In [17]:
req = requests.get("https://www.kutchi-iti.com/2-testimonials.php")
req.raise_for_status()  # stop early on HTTP errors such as 404
soup = BeautifulSoup(req.text, 'html.parser')

# Match any <div> with a class name containing "author"
regex = re.compile('author')
results = soup.find_all('div', {'class': regex})

# Extract content inside <header> tags
header_contents = [div.find('header').text.strip() for div in results if div.find('header')]

header_contents

['With the help of Kutchi-ITI I have substantially improved my understanding of Information Technology and Networking. Kutchi-ITI is simply an example of studying in a congenial environment.',
 'Here at Kutchi-ITI, you meet all the good, friendly people - from students to lecturers. You enjoy freedom to think, to express yourself and to succeed.',
 'The knowledge and skills I have gained at Kutchi-ITI will empower me to help others. I am very excited by what I can do and I have many ideas that I want to implement which will be useful to my country.',
 'I feel Kutchi-ITI offered me the best opportunity for development of my skills in community interaction and development.',
 '“World in one Place” is not only a slogan but lived reality. The multicultural environment that is unique to Kutchi-ITI provides me with the opportunity to become friends with people from all different corners of the world.',
 'Kutchi-ITI is a great institution. The staff and instructors are always helpful and are 

In [18]:
len(header_contents)

7

In [19]:
header_contents[0]

'With the help of Kutchi-ITI I have substantially improved my understanding of Information Technology and Networking. Kutchi-ITI is simply an example of studying in a congenial environment.'
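To see what the selector is doing without BeautifulSoup, here is a rough dependency-free equivalent built on Python's standard `html.parser`. It is only a sketch: `ReviewExtractor` is a hypothetical class, and `SAMPLE_HTML` is made-up markup that mimics the structure described above (a `<header>` inside a `div` whose class contains "author"); the real page's HTML may differ. Like the `re.compile('author')` filter, it uses `search`, so a class such as `author-box` matches. Unlike `div.find('header')`, which takes only the first `<header>` in each matching div, it collects every `<header>` inside one.

```python
import re
from html.parser import HTMLParser

AUTHOR_CLASS = re.compile("author")  # same pattern as the BeautifulSoup cell


class ReviewExtractor(HTMLParser):
    """Collect the text of <header> elements nested inside a <div> whose class contains 'author'."""

    def __init__(self):
        super().__init__()
        self.div_stack = []  # one flag per open <div>: does its class match?
        self.in_header = False
        self.buffer = []
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            self.div_stack.append(any(AUTHOR_CLASS.search(c) for c in classes))
        elif tag == "header" and any(self.div_stack):
            self.in_header = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()
        elif tag == "header" and self.in_header:
            self.reviews.append("".join(self.buffer).strip())
            self.in_header = False

    def handle_data(self, data):
        if self.in_header:
            self.buffer.append(data)


# Made-up markup that only mimics the structure described above
SAMPLE_HTML = """
<div class="testimonials">
  <div class="author-box"><header> Great teachers. </header><p>Student A</p></div>
  <div class="author-box"><header>Friendly campus.</header></div>
  <div class="footer"><header>Not a review</header></div>
</div>
"""

parser = ReviewExtractor()
parser.feed(SAMPLE_HTML)
print(parser.reviews)  # ['Great teachers.', 'Friendly campus.']
```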

# **5. Load the Reviews into a Pandas DataFrame**

To make things easier, let's load the reviews into a Pandas DataFrame.

In [21]:
df = pd.DataFrame(header_contents, columns=['review'])
df

|   | review |
|---|--------|
| 0 | With the help of Kutchi-ITI I have substantial... |
| 1 | Here at Kutchi-ITI, you meet all the good, fri... |
| 2 | The knowledge and skills I have gained at Kutc... |
| 3 | I feel Kutchi-ITI offered me the best opportun... |
| 4 | “World in one Place” is not only a slogan but ... |
| 5 | Kutchi-ITI is a great institution. The staff a... |
| 6 | Kutchi-ITI gives a great opportunity for stude... |


To get a particular review, we can use `loc[index]`, which selects by index label (here the labels are simply 0 to 6).

In [32]:
df.review.loc[0]

'With the help of Kutchi-ITI I have substantially improved my understanding of Information Technology and Networking. Kutchi-ITI is simply an example of studying in a congenial environment.'
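`loc` looks rows up by index label, while `iloc` uses integer position. They coincide here only because the DataFrame has the default 0-to-6 index; after filtering they diverge, as this small sketch with made-up reviews (in a separate `demo_df`, so the notebook's `df` is untouched) shows.

```python
import pandas as pd

# Made-up reviews, not the scraped ones
demo_df = pd.DataFrame({"review": ["great", "okay", "bad", "superb"]})
long_reviews = demo_df[demo_df.review.str.len() > 4]  # keeps the rows labelled 0 and 3

print(long_reviews.index.tolist())   # [0, 3]
print(long_reviews.review.loc[3])    # superb (label 3)
print(long_reviews.review.iloc[1])   # superb (second row by position)
try:
    long_reviews.review.loc[1]       # label 1 was filtered out
except KeyError:
    print("KeyError: no row labelled 1")
```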

Now we will create a function that tokenizes a review and returns its predicted star rating.

In [38]:
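def sentiment_analyse(review):
    """Return the predicted star rating (1-5) for a single review."""
    # Truncate to BERT's 512-token limit so long reviews don't raise an error
    tokens = tokenizer.encode(review, return_tensors='pt', truncation=True, max_length=512)
    with torch.no_grad():  # inference only, no gradients needed
        result = torch.argmax(model(tokens).logits) + 1
    return int(result)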

In [39]:
sentiment_analyse(df.review.loc[0])

5
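One caveat: `torch.argmax(x)` without a `dim` argument returns the index into the *flattened* tensor. That works in `sentiment_analyse` because each call encodes a single review (batch size 1), but it would break if you batched several reviews; there you would want `torch.argmax(logits, dim=-1) + 1`. The plain-Python sketch below mimics both behaviours with hypothetical helpers `flat_argmax` and `rowwise_stars`; the first row reuses the "I loved it" logits from above and the second row is invented for illustration.

```python
def flat_argmax(matrix):
    """Index into the flattened matrix, like torch.argmax(x) with no dim."""
    flat = [v for row in matrix for v in row]
    return max(range(len(flat)), key=flat.__getitem__)


def rowwise_stars(matrix):
    """Per-row argmax + 1, like torch.argmax(x, dim=-1) + 1."""
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in matrix]


batch_logits = [
    [-2.1030, -1.9658, -0.2264, 1.3489, 2.2864],  # "I loved it" logits from above
    [2.5, 1.0, -0.5, -1.2, -2.0],                  # invented, clearly negative review
]
print(flat_argmax(batch_logits) + 1)  # 6: one number for the whole batch, not even a valid star
print(rowwise_stars(batch_logits))    # [5, 1]: one rating per review
```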

Let's run it on all the reviews.

In [43]:
for review in df.review:
    print("Sentence: {} \n-Sentiment: {}".format(review, sentiment_analyse(review)))

Sentence: With the help of Kutchi-ITI I have substantially improved my understanding of Information Technology and Networking. Kutchi-ITI is simply an example of studying in a congenial environment. 
-Sentiment: 5
Sentence: Here at Kutchi-ITI, you meet all the good, friendly people - from students to lecturers. You enjoy freedom to think, to express yourself and to succeed. 
-Sentiment: 5
Sentence: The knowledge and skills I have gained at Kutchi-ITI will empower me to help others. I am very excited by what I can do and I have many ideas that I want to implement which will be useful to my country. 
-Sentiment: 5
Sentence: I feel Kutchi-ITI offered me the best opportunity for development of my skills in community interaction and development. 
-Sentiment: 5
Sentence: “World in one Place” is not only a slogan but lived reality. The multicultural environment that is unique to Kutchi-ITI provides me with the opportunity to become friends with people from all different corners of the world. 

The model rates every review 5 stars, which fits, since all 7 reviews are highly positive. Now let's add a column to the DataFrame with the sentiment of each review.

In [44]:
df['sentiment'] = [sentiment_analyse(sentence) for sentence in df.review]
df

|   | review | sentiment |
|---|--------|-----------|
| 0 | With the help of Kutchi-ITI I have substantial... | 5 |
| 1 | Here at Kutchi-ITI, you meet all the good, fri... | 5 |
| 2 | The knowledge and skills I have gained at Kutc... | 5 |
| 3 | I feel Kutchi-ITI offered me the best opportun... | 5 |
| 4 | “World in one Place” is not only a slogan but ... | 5 |
| 5 | Kutchi-ITI is a great institution. The staff a... | 5 |
| 6 | Kutchi-ITI gives a great opportunity for stude... | 5 |
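An equivalent, slightly more idiomatic way to fill the column is `Series.map`, and `value_counts()` gives a quick summary of the ratings. The sketch uses made-up reviews in a separate `demo_df` and a hypothetical keyword-based `fake_sentiment` as a stand-in for `sentiment_analyse`, so it runs without the model; in the notebook you would write `df['sentiment'] = df.review.map(sentiment_analyse)`.

```python
import pandas as pd


def fake_sentiment(review):
    """Hypothetical stand-in for sentiment_analyse(): 5 if 'great' appears, else 3."""
    return 5 if "great" in review.lower() else 3


demo_df = pd.DataFrame({"review": ["A great institution.", "It was fine.", "Great staff!"]})
demo_df["sentiment"] = demo_df.review.map(fake_sentiment)  # calls the function once per review

print(demo_df.sentiment.tolist())                      # [5, 3, 5]
print(demo_df.sentiment.value_counts().sort_index())   # number of reviews per star rating
```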


In [46]:
model.save_pretrained('artifacts/bert-model')
tokenizer.save_pretrained('artifacts/bert-tokenizer')

('artifacts/bert-tokenizer\\tokenizer_config.json',
 'artifacts/bert-tokenizer\\special_tokens_map.json',
 'artifacts/bert-tokenizer\\vocab.txt',
 'artifacts/bert-tokenizer\\added_tokens.json',
 'artifacts/bert-tokenizer\\tokenizer.json')