# Sentiment Analysis With **BERT**

### 1. Dependencies

In [1]:
!pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio===0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

In [2]:
!pip install transformers requests beautifulsoup4 pandas numpy

In [3]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests
from bs4 import BeautifulSoup
import re

### 2. Loading the Model

In [4]:
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

### 3. Scoring a Sample Sentence

In [5]:
tokens = tokenizer.encode('It was good but couldve been better. Great', return_tensors='pt')

In [6]:
result = model(tokens)

In [7]:
result.logits

tensor([[-2.7768, -1.2353,  1.4419,  1.9804,  0.4584]],
       grad_fn=<AddmmBackward>)

In [8]:
int(torch.argmax(result.logits))+1

4
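
The `+1` in the cell above shifts the 0-based index of the largest logit to a 1 to 5 star rating. As a minimal sketch in plain Python (no model needed), the logits printed above can also be passed through a softmax to see how the scores compare as probabilities. The helper names `softmax` and `star_rating` are illustrative and not part of the original notebook.

```python
import math

# Logits copied from the output of cell [7] above.
logits = [-2.7768, -1.2353, 1.4419, 1.9804, 0.4584]

def softmax(values):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def star_rating(values):
    """Return the 1-based position of the largest score (1 to 5 stars)."""
    return values.index(max(values)) + 1

probs = softmax(logits)
for stars, p in enumerate(probs, start=1):
    print(f"{stars} star(s): {p:.3f}")
print("predicted rating:", star_rating(logits))
```

Softmax preserves the ordering of the logits, so the predicted rating is the same whether the argmax is taken before or after it.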

### 4. Extracting Data

In [9]:
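# Download the Yelp business page.
r = requests.get('https://www.yelp.com/biz/social-brew-cafe-pyrmont')
soup = BeautifulSoup(r.text, 'html.parser')
# Select <p> tags whose class name contains "comment".
regex = re.compile('.*comment.*')
results = soup.find_all('p', {'class':regex})
# Keep only the text of each matching paragraph.
reviews = [result.text for result in results]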
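
The pattern `.*comment.*` is meant to pick up any class name containing the substring `comment`. A small sketch with the standard `re` module shows which names would qualify. The class names below are made up for illustration and are not taken from the Yelp page.

```python
import re

regex = re.compile('.*comment.*')

# Hypothetical class names, invented for this example.
class_names = ['comment__09f24__gu0rG', 'raw-comment', 'css-1e4fdj9', 'Comment']

# Keep the names in which the pattern is found.
matching = [name for name in class_names if regex.search(name)]
print(matching)
```

The match is case-sensitive, so `Comment` is skipped. Under `search`, the shorter pattern `comment` would select the same names.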

In [10]:
import numpy as np
import pandas as pd

In [11]:
df = pd.DataFrame(np.array(reviews), columns=['review'])

In [12]:
df['review'].iloc[0]

'I went here a little while ago- a beautiful morning,a lovely little brew house on a quaint street corner- perfection.I went to this cafe with my step-daughter Lucille.She was always raving about how great it was to her mother, so I thought it would be a nice idea to go here with her for her birthday... boy was I wrong.She announced her hatred for me while I was waiting for my extra large iced frappé. It felt like hours of awkward silence once she said those four words; "you\'re a low-life."Was it in my mind, or was my drink taking ages to arrive? The hands on the clock didn\'t budge from the last time I glanced at them- 7:43AM, where the fuck is my drink?"Why do you always feel you have to be my friend? You\'re not my dad!" She fired.I could only sit there, my head facing down towards the floral tablecloth that lay beneath my quivering arms. The bullet lodged in my heart.I don\'t understand why she hates me so much; is it my jokes? The funny way I walk? The fact that I often scream my

In [13]:
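def sentiment_score(review):
    # Tokenize the review text into a tensor of input IDs.
    tokens = tokenizer.encode(review, return_tensors='pt')
    # Run the classifier; gradients are not needed for inference.
    with torch.no_grad():
        result = model(tokens)
    # The five logits map to 1-5 stars, so shift the 0-based argmax by one.
    return int(torch.argmax(result.logits))+1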

In [14]:
sentiment_score(df['review'].iloc[1])

5

### 5. Applying to Dataset

In [15]:
df['sentiment'] = df['review'].apply(lambda x: sentiment_score(x[:512]))
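
Slicing with `x[:512]` keeps the first 512 characters, whereas the model's input limit of 512 applies to tokens. One alternative, sketched below on the assumption that the tokenizer's own truncation is acceptable here, is to pass `truncation=True` and `max_length=512` to `tokenizer.encode`. The function name `sentiment_score_truncated` and its explicit `tokenizer` and `model` parameters are illustrative choices, not part of the original notebook.

```python
def sentiment_score_truncated(review, tokenizer, model, max_length=512):
    """Score a review from 1 to 5 stars, truncating by tokens rather than characters."""
    # Let the tokenizer cut the input to at most max_length tokens.
    tokens = tokenizer.encode(
        review,
        return_tensors='pt',
        truncation=True,
        max_length=max_length,
    )
    result = model(tokens)
    # argmax gives a 0-based class index; add 1 to get the star rating.
    return int(result.logits.argmax()) + 1
```

With the objects defined earlier, this could be applied as `df['review'].apply(lambda x: sentiment_score_truncated(x, tokenizer, model))`.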

In [16]:
df.head()

| | review | sentiment |
|---|---|---|
| 0 | I went here a little while ago- a beautiful mo... | 2 |
| 1 | I came to Social brew cafe for brunch while ex... | 5 |
| 2 | Ricotta hot cakes! These were so yummy. I ate ... | 5 |
| 3 | Good coffee and toasts. Straight up and down -... | 5 |
| 4 | We came for brunch twice in our week-long visi... | 4 |


In [17]:
df['review'].iloc[3]

'Good coffee and toasts. Straight up and down - hits the spot with nothing mind blowing. Solid and tasty. \xa0Good work'