# Project: Sentiment Analysis with BERT Neural Network
### Natural Language Processing (NLP)

## Workflow 

1. Install Transformers
2. Perform Sentiment Scoring using BERT
3. Scrape reviews from Yelp and score them

## 1. Install and Import Dependencies

In [2]:
!pip3 install torch torchvision torchaudio

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [3]:
!pip install transformers requests beautifulsoup4

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.22.1-py3-none-any.whl (4.9 MB)
Collecting huggingface-hub<1.0,>=0.9.0
  Downloading huggingface_hub-0.9.1-py3-none-any.whl (120 kB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.9.1 tokenizers-0.12.1 transformers-4.22.1


In [4]:
# importing libraries

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests
from bs4 import BeautifulSoup
import re

# 'AutoTokenizer' converts a string into a sequence of token ids that the NLP model can take as input
# 'AutoModelForSequenceClassification' loads a pretrained transformer with a classification head, which we use as our sentiment model
# 'torch' provides the tensors the model consumes and returns (e.g. torch.argmax on the output scores)
# 'requests' downloads the review page from Yelp
# 'BeautifulSoup' parses the page's HTML so we can traverse the DOM and extract only the data we need
# 're' builds the regular expression used to select the review paragraphs


## 2. Model

In [7]:
# load the pretrained tokenizer and sentiment model (it predicts a rating of 1 to 5 stars)

tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')


## 3. Encode and Calculate Sentiment

In [8]:
tokens = tokenizer.encode('I hate this, absolutely the worst', return_tensors='pt')

tokens[0]

tensor([  101,   151, 39487, 10372,   117, 35925, 10563, 10103, 43060,   102])

In [9]:
# decoding the tokens

tokenizer.decode(tokens[0])

'[CLS] i hate this, absolutely the worst [SEP]'
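The tokenizer splits text into WordPiece sub-word units, adds the special `[CLS]` and `[SEP]` markers, and maps each unit to an integer id; `decode` reverses that mapping. The sketch below shows the greedy longest-match-first lookup on a tiny, hypothetical vocabulary. Only `[CLS]`=101 and `[SEP]`=102 match the real model (they are visible in the encoded output above); every other id, and the exact sub-word split of "absolutely", are made up for illustration.

```python
import re

# Hypothetical toy vocabulary; only [CLS]=101 and [SEP]=102 match the real model.
TOY_VOCAB = {
    "[CLS]": 101, "[SEP]": 102, "[UNK]": 100,
    "i": 1, "hate": 2, "this": 3, ",": 4,
    "absolute": 5, "##ly": 6, "the": 7, "worst": 8,
}
ID_TO_TOKEN = {i: t for t, i in TOY_VOCAB.items()}


def wordpiece(word, vocab):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no valid split: the whole word becomes unknown
        pieces.append(piece)
        start = end
    return pieces


def toy_encode(text, vocab=TOY_VOCAB):
    # Lowercase (uncased model), split into words and punctuation, add special tokens.
    words = re.findall(r"\w+|[^\w\s]", text.lower())
    tokens = ["[CLS]"] + [p for w in words for p in wordpiece(w, vocab)] + ["[SEP]"]
    return [vocab[t] for t in tokens]


def toy_decode(ids):
    text = " ".join(ID_TO_TOKEN[i] for i in ids)
    return text.replace(" ##", "")  # glue continuation pieces back onto their word


ids = toy_encode("I hate this, absolutely the worst")
print(ids)              # [101, 1, 2, 3, 4, 5, 6, 7, 8, 102]
print(toy_decode(ids))  # [CLS] i hate this , absolutely the worst [SEP]
```

Unlike the real `tokenizer.decode`, this toy decoder does not remove the space before punctuation.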

In [10]:
# run the encoded tokens through the model to get the sentiment scores

result = model(tokens)

In [12]:
result

# The model returns raw, unnormalized scores (logits), one per class; it is not a one-hot vector
# The five classes are the ratings 1 to 5 stars, and the position with the highest score is the predicted rating
# e.g. logits [.9, .2, .1, -.2, -.5] give a rating of 1

SequenceClassifierOutput(loss=None, logits=tensor([[ 5.0607,  1.6029, -1.0123, -3.0154, -1.8469]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [14]:
result.logits

tensor([[ 5.0607,  1.6029, -1.0123, -3.0154, -1.8469]],
       grad_fn=<AddmmBackward0>)

In [15]:
int(torch.argmax(result.logits)) + 1

1
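Because the logits are unnormalized, they are not probabilities. The stdlib sketch below applies a softmax to the logits printed above to turn them into probabilities. Softmax preserves order, so the index of the largest probability equals the index of the largest logit, and adding 1 maps that 0-based index to a 1 to 5 star rating, just like `int(torch.argmax(result.logits)) + 1`.

```python
import math


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def star_rating(scores):
    # Index of the highest score (0-4) shifted to a 1-5 star rating.
    return max(range(len(scores)), key=lambda i: scores[i]) + 1


logits = [5.0607, 1.6029, -1.0123, -3.0154, -1.8469]  # values from result.logits above
probs = softmax(logits)
print([round(p, 3) for p in probs])
print(star_rating(logits), star_rating(probs))  # 1 1
```

In the notebook itself, `torch.nn.functional.softmax(result.logits, dim=-1)` computes the same probabilities as a tensor.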

## 4. Collect Reviews

In [17]:
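# scraper code

r = requests.get('https://www.yelp.com/biz/mejico-sydney-2')  # download the restaurant's review page
soup = BeautifulSoup(r.text, 'html.parser')                     # parse the HTML
regex = re.compile('.*comment.*')                               # any class name containing "comment"
results = soup.find_all('p', {'class': regex})                  # review paragraphs had such a class when scraped
reviews = [result.text for result in results]                   # keep only the review text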
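`find_all('p', {'class': regex})` keeps the `<p>` tags that have a CSS class matching the pattern. BeautifulSoup tests a compiled regex with `search`, so a plain `comment` pattern is enough. The sketch below runs the same filter offline, using only the standard library's `html.parser` on a hypothetical HTML snippet. The class names in the snippet are made up, and Yelp's real markup changes over time.

```python
import re
from html.parser import HTMLParser


class CommentParagraphs(HTMLParser):
    """Collect the text of <p> tags that have a class value matching `pattern`."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.reviews = []
        self._inside = False  # True while inside a matching <p>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag != "p":
            return
        classes = (dict(attrs).get("class") or "").split()
        if any(self.pattern.search(c) for c in classes):
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)  # includes text from nested tags such as <span>

    def handle_endtag(self, tag):
        if tag == "p" and self._inside:
            self.reviews.append("".join(self._buffer).strip())
            self._inside = False


# Hypothetical markup standing in for a downloaded review page.
html = """
<div>
  <p class="comment__x1 raw__y2"><span>Great tacos.</span></p>
  <p class="nav-link">Sign in</p>
  <p class="comment__x1">Slow service, &amp; loud.</p>
</div>
"""

parser = CommentParagraphs(r"comment")
parser.feed(html)
parser.close()
print(parser.reviews)  # ['Great tacos.', 'Slow service, & loud.']
```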

In [19]:
results[2].text

"Out of all the restaurants that I tried in Sydney, this was definitely the most reasonably priced one offering good food of course. We started off with Margarita's - they have $10 margaritas on Mondays (or perhaps all weekdays). We lost count of how many we had, they were so good. On to the food, we tried a little bit of everything - we ordered corn lollipops, jalapeño poppers, grilled halloumi, batata bravas to start and we shared 2 items from the grill which I can't remember (probably because of the margarita's). We ended with churros & chocolate sauce which was awesome. Service was good and the staff waiting us was very friendly. He also recommended us portions for certain items to match our party size. For instance, we didn't have to order 2 portions of an appetizer instead we could order one and a half portion which I think was great. It allowed us to chose more dishes than more quantity of the same dish."

## 5. Load Reviews into DataFrame and Score

In [20]:
import numpy as np
import pandas as pd

In [21]:
df = pd.DataFrame(np.array(reviews), columns=['review'])

In [23]:
df

|   | review |
|---|--------|
| 0 | The food is fresh and tasty. The scallop cevi... |
| 1 | Don't come here expecting legit Mexican food b... |
| 2 | Out of all the restaurants that I tried in Syd... |
| 3 | We came here on a Thursday night @ 5pm and by ... |
| 4 | Have been here twice and have absolutely loved... |
| 5 | I was pleasantly surprised at what a great job... |
| 6 | Really nice (upmarket) Mexican restaurant. Goo... |
| 7 | If you're looking for a quiet little romantic ... |
| 8 | The service at this place was top notch - the ... |
| 9 | Ordered feed me for $59 along with that.. Food... |


In [24]:
df['review'].iloc[0]

'The food is fresh and tasty. \xa0The scallop ceviche started the lunch. The scallops were tender with a great acidity and use of mango and peppers. The steak was tender and I got the hint of tequila in the sauce. I enjoyed a watermelon salad that complimented the the steak. The portions are good, but a stretch if you are sharing. My only down point is the service. They really only showed up to present my next plate and never checked to see if I wanted another drink (which I did).Enjoyed the food.'
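The scraped text still contains HTML leftovers such as the non-breaking space (`\xa0`) visible above. The notebook scores the raw text. If you want to normalize whitespace first, here is a minimal cleanup sketch (the helper name `clean_review` is hypothetical). `str.split()` with no argument treats `\xa0`, tabs and newlines as whitespace, so re-joining the pieces collapses every run into a single space.

```python
def clean_review(text: str) -> str:
    # str.split() with no separator splits on any Unicode whitespace,
    # including the non-breaking space '\xa0'; joining collapses each run to one space.
    return " ".join(text.split())


sample = "The food is fresh and tasty. \xa0The scallop ceviche started the lunch."
print(clean_review(sample))
# The food is fresh and tasty. The scallop ceviche started the lunch.
```

Applied to the DataFrame, this would be `df['review'] = df['review'].apply(clean_review)`.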

In [28]:
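# creating a function to run the model

def sentiment_score(review):
    # truncation=True caps the input at BERT's 512-token limit
    tokens = tokenizer.encode(review, return_tensors='pt', truncation=True, max_length=512)
    with torch.no_grad():  # inference only, so no gradients are needed
        result = model(tokens)
    return int(torch.argmax(result.logits)) + 1  # 0-based class index -> 1-5 star rating

# SENTIMENT FUNCTION: wrapping the encode -> predict -> argmax steps in one function makes it easy to score many reviews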

In [31]:
sentiment_score(df['review'].iloc[2])

5

In [32]:
df['sentiment'] = df['review'].apply(lambda x: sentiment_score(x[:512]))  # x[:512] keeps the first 512 characters, a rough guard for BERT's 512-token limit

df['review']

0    The food is fresh and tasty.  The scallop cevi...
1    Don't come here expecting legit Mexican food b...
2    Out of all the restaurants that I tried in Syd...
3    We came here on a Thursday night @ 5pm and by ...
4    Have been here twice and have absolutely loved...
5    I was pleasantly surprised at what a great job...
6    Really nice (upmarket) Mexican restaurant. Goo...
7    If you're looking for a quiet little romantic ...
8    The service at this place was top notch - the ...
9    Ordered feed me for $59 along with that.. Food...
Name: review, dtype: object
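`x[:512]` truncates by characters. This only roughly respects BERT's limit of 512 tokens (including `[CLS]` and `[SEP]`), and it silently drops the rest of long reviews. One alternative, which the notebook does not use, is to split the token ids into windows that each fit the limit, score every window, and combine the results. The sketch below covers only the splitting step, on plain lists of ids. The helper name `chunk_ids` is hypothetical, and 101/102 are the `[CLS]`/`[SEP]` ids seen in the encoded example above.

```python
from typing import List

CLS_ID, SEP_ID = 101, 102  # special-token ids from the encoded example above


def chunk_ids(ids: List[int], max_len: int = 512) -> List[List[int]]:
    """Split encoded ids ([CLS] ... [SEP]) into windows of at most max_len ids.

    Each window is re-wrapped with [CLS] and [SEP] so it is a valid model input.
    """
    body = ids[1:-1]    # strip the original [CLS] and [SEP]
    room = max_len - 2  # space left for ordinary tokens in each window
    windows = [body[i:i + room] for i in range(0, len(body), room)] or [[]]
    return [[CLS_ID] + w + [SEP_ID] for w in windows]


# A fake 1,000-id review: [CLS], 998 ordinary ids, [SEP].
fake = [CLS_ID] + list(range(1000, 1998)) + [SEP_ID]
chunks = chunk_ids(fake)
print([len(c) for c in chunks])  # [512, 490]
```

With the notebook's objects, each window could be passed to `model(torch.tensor([chunk]))` and the resulting ratings averaged. Averaging is a heuristic, not something the model defines.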

In [33]:
df

|   | review | sentiment |
|---|--------|-----------|
| 0 | The food is fresh and tasty. The scallop cevi... | 4 |
| 1 | Don't come here expecting legit Mexican food b... | 3 |
| 2 | Out of all the restaurants that I tried in Syd... | 5 |
| 3 | We came here on a Thursday night @ 5pm and by ... | 4 |
| 4 | Have been here twice and have absolutely loved... | 5 |
| 5 | I was pleasantly surprised at what a great job... | 5 |
| 6 | Really nice (upmarket) Mexican restaurant. Goo... | 4 |
| 7 | If you're looking for a quiet little romantic ... | 2 |
| 8 | The service at this place was top notch - the ... | 5 |
| 9 | Ordered feed me for $59 along with that.. Food... | 2 |
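Once every review has a rating, the scores can be summarized. The stdlib sketch below uses the ten ratings from the scored table. On the DataFrame itself, `df['sentiment'].mean()` and `df['sentiment'].value_counts().sort_index()` return the same figures.

```python
from collections import Counter
from statistics import mean

# Star ratings copied from the scored DataFrame above (rows 0-9).
ratings = [4, 3, 5, 4, 5, 5, 4, 2, 5, 2]

average = mean(ratings)
distribution = dict(sorted(Counter(ratings).items()))  # rating -> number of reviews

print(average)       # 3.9
print(distribution)  # {2: 2, 3: 1, 4: 3, 5: 4}
```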
