### Install and Import Dependencies

In [1]:
!pip3 install torch torchvision torchaudio

Collecting torch
  Downloading torch-1.10.0-cp38-none-macosx_10_9_x86_64.whl (147.1 MB)
Collecting torchvision
  Downloading torchvision-0.11.1-cp38-cp38-macosx_10_9_x86_64.whl (1.2 MB)
Collecting torchaudio
  Downloading torchaudio-0.10.0-cp38-cp38-macosx_10_9_x86_64.whl (2.4 MB)
Installing collected packages: torch, torchvision, torchaudio
Successfully installed torch-1.10.0 torchaudio-0.10.0 torchvision-0.11.1


In [2]:
!pip install transformers requests beautifulsoup4 pandas numpy

Collecting transformers
  Downloading transformers-4.12.5-py3-none-any.whl (3.1 MB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading tokenizers-0.10.3-cp38-cp38-macosx_10_11_x86_64.whl (2.2 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.2.0-py3-none-any.whl (61 kB)
Collecting sacremoses
  Downloading sacremoses-0.0.46-py3-none-any.whl (895 kB)
Installing collected packages: tokenizers, sacremoses, huggingface-hub, transformers
Successfully installed huggingface-hub-0.2.0 sacremoses-0.0.46 tokenizers-0.10.3 transformers-4.12.5


The dependencies are installed; we now need to import them into the notebook.

In [3]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests
from bs4 import BeautifulSoup
import re

### Setup Model

In [4]:
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

### Encode and Calculate Sentiment

In [5]:
tokens = tokenizer.encode('This experience was awful, hated every minute, a waste of money and time', return_tensors='pt')

In [6]:
tokens

tensor([[  101, 10372, 16277, 10140, 37079, 15836,   117, 39487, 10163, 13667,
         18471,   117,   143, 43346, 10108, 15033, 10110, 10573,   102]])

The string has been converted into a sequence of token IDs.

We can also decode the IDs back into text. Index with [0] so that only the first (and only) row of the tensor is passed.

In [7]:
tokenizer.decode(tokens[0])

'[CLS] this experience was awful, hated every minute, a waste of money and time [SEP]'

Now pass the sequence of numbers to the model.

In [8]:
result = model(tokens)

In [9]:
result

SequenceClassifierOutput(loss=None, logits=tensor([[ 4.9333,  2.0599, -0.8983, -2.9996, -2.2736]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

The output contains the logits: one raw score per sentiment class. The position with the highest score gives the sentiment rating. Here the highest score, 4.9333, is at position 0, which corresponds to the lowest rating, so this is a very negative sentiment: 1 out of 5.

torch.argmax returns the position of the highest score directly, which makes the rating easy to spot.

In [11]:
torch.argmax(result.logits)

tensor(0)

Let's turn it into a readable result.

In [13]:
# positions are 0-based and ratings run from 1 to 5, so add 1
print(int(torch.argmax(result.logits)) + 1, 'out of 5')

1 out of 5
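
The logits above are unnormalised scores, so their absolute values are hard to interpret. As a minimal sketch, a softmax turns them into probabilities for each star rating. The version below uses only the standard library and the logits copied from the output above; the `softmax` helper is written here for illustration and is not part of the notebook.

```python
import math

# Logits copied from the model output above (one score per rating, 1 to 5 stars).
logits = [4.9333, 2.0599, -0.8983, -2.9996, -2.2736]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)

# Position of the largest probability (0-based), then shift to a 1-5 rating.
best_index = max(range(len(probs)), key=probs.__getitem__)
stars = best_index + 1

for rating, p in enumerate(probs, start=1):
    print(f"{rating} star(s): {p:.3f}")
print(f"Predicted rating: {stars} out of 5")
```

Softmax preserves the ordering of the scores, so the predicted rating is the same one that torch.argmax gives on the raw logits.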


### Collect Reviews

In [16]:
# download the page and parse the HTML
r = requests.get('https://www.yelp.co.uk/biz/the-british-museum-london')
soup = BeautifulSoup(r.text, 'html.parser')
# find every <p> tag whose class name contains 'comment'
regex = re.compile('.*comment.*')
results = soup.find_all('p', {'class':regex})
# keep only the text of each comment, dropping the HTML markup
reviews = [result.text for result in results]
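
To see what the class filter does, here is a small sketch that applies the same pattern to a few class names. The class names are made up for illustration and are not taken from the actual page. BeautifulSoup tests a compiled pattern against each class with its `search()` method, so the sketch does the same.

```python
import re

# Same pattern as in the cell above.
regex = re.compile('.*comment.*')

# Hypothetical class names, invented for this example.
class_names = [
    'comment__09f24__gu0rG',
    'raw__09f24__T4Ezm',
    'css-comment-text',
    'heading',
]

# Keep the names in which the pattern is found anywhere.
matching = [name for name in class_names if regex.search(name)]
print(matching)
```

Because `search()` looks for the pattern anywhere in the string, `re.compile('comment')` would select the same class names as `'.*comment.*'`.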

In [17]:
reviews[0]

'Come feast your eyes on the finest collection of historical artifacts imperialism can buy! Jokes and legitimate discussion about the complicated topic of "stolen history" aside, the British Museum is a magnificent collection of history and art from around the world and throughout the ages. From Egyptian sculptures and monuments (including the Rosetta Stone, an ancient tablet which proved the key to deciphering Egyptian hieroglyphics) to contemporary and traditional African woodcarvings to the Mitsubishi Corporation Japanese Galleries (including the famous "Great Wave" painting), the British Museum, like another beloved British icon, can take you on a fascinating trip through time and space.The museum is free to visit, and although pre-booked tickets are recommended, in my experience I haven\'t needed them. There are also a few premium rotating exhibitions which charge an entrance fee (or are free for members), such as the Nero exhibit on display currently.Overall, the British Museum i

### Load Reviews into Dataframe and Score 

In [19]:
import pandas as pd
import numpy as np

In [21]:
df = pd.DataFrame(np.array(reviews), columns=['review'])

In [25]:
#df.head()  #gives first 5 reviews

In [26]:
#df.tail()  #gives last 5 reviews

In [28]:
df['review'].iloc[0]  #retrieves the first review

'Come feast your eyes on the finest collection of historical artifacts imperialism can buy! Jokes and legitimate discussion about the complicated topic of "stolen history" aside, the British Museum is a magnificent collection of history and art from around the world and throughout the ages. From Egyptian sculptures and monuments (including the Rosetta Stone, an ancient tablet which proved the key to deciphering Egyptian hieroglyphics) to contemporary and traditional African woodcarvings to the Mitsubishi Corporation Japanese Galleries (including the famous "Great Wave" painting), the British Museum, like another beloved British icon, can take you on a fascinating trip through time and space.The museum is free to visit, and although pre-booked tickets are recommended, in my experience I haven\'t needed them. There are also a few premium rotating exhibitions which charge an entrance fee (or are free for members), such as the Nero exhibit on display currently.Overall, the British Museum i

Now let's write a function that combines these steps.

In [30]:
def sentiment_score(review):
    # encode one review string, run the model, and map the top class (0-4) to a 1-5 rating
    tokens = tokenizer.encode(review, return_tensors='pt')
    result = model(tokens)
    return int(torch.argmax(result.logits))+1

In [31]:
sentiment_score(df['review'].iloc[0])

5

We now have the sentiment score for just the first review in the dataframe.

In [36]:
# x[:512] keeps the first 512 characters (not tokens) of each review,
# a rough guard against the model's 512-token input limit
df['sentiment'] = df['review'].apply(lambda x: sentiment_score(x[:512]))
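
Slicing with `x[:512]` cuts the review by characters, while the model's limit is counted in tokens. The sketch below illustrates the difference with a toy whitespace tokenizer. `toy_tokenize` is a hypothetical stand-in so the example runs without downloading the model; the real tokenizer splits text differently.

```python
MAX_TOKENS = 512

def toy_tokenize(text):
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""
    return text.split()

# A long dummy review: 600 words, well over 512 characters.
review = "great " * 600

# What x[:512] does: keep the first 512 characters.
char_truncated = review[:512]

# Truncating on tokens instead: keep the first 512 tokens.
token_truncated = toy_tokenize(review)[:MAX_TOKENS]

print("tokens kept by character slicing:", len(toy_tokenize(char_truncated)))
print("tokens kept by token truncation:", len(token_truncated))
```

Character slicing stays within the limit here but discards text the model could still have read. With the Hugging Face tokenizer, passing `truncation=True, max_length=512` to `tokenizer.encode` truncates on tokens instead.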

In [37]:
df

| | review | sentiment |
|---|---|---|
| 0 | Come feast your eyes on the finest collection ... | 5 |
| 1 | [Pre-COVID post]Absolutely gorgeous museum wit... | 5 |
| 2 | I love that British museum is free. I've been ... | 5 |
| 3 | Where do I even begin?! I mean, it's all just ... | 5 |
| 4 | The system fell apart with covidSeniors don't ... | 1 |
| 5 | This is the museum of stolen goods. Anyone sho... | 1 |
| 6 | Amazing venue with jaw droppingly wonderful ar... | 5 |
| 7 | I last went thirty years ago. Was rather surpr... | 3 |
| 8 | One of the cultural landmarks of the U.K. if n... | 5 |
| 9 | The British Museum they call it. But there's a... | 2 |


The dataframe now has a sentiment column holding the score for each review.
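
As a quick follow-up, the ten scores shown in the table above can be summarised with the standard library. This is only a sketch: the scores are copied by hand from the output rather than read from `df`, and the variable names are illustrative.

```python
from collections import Counter
from statistics import mean

# Sentiment scores copied from the dataframe output above (rows 0 to 9).
scores = [5, 5, 5, 5, 1, 1, 5, 3, 5, 2]

# How many reviews received each rating.
counts = Counter(scores)
for rating in sorted(counts):
    print(f"{rating} out of 5: {counts[rating]} review(s)")

# Average rating across the ten reviews.
average = mean(scores)
print("average:", average)
```

In the notebook itself, `df['sentiment'].value_counts()` and `df['sentiment'].mean()` give the same summary directly from the dataframe.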