<a href="https://colab.research.google.com/github/j-hartmann/siebert/blob/main/Yelp_Transformer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# install libraries
!pip install transformers -q

In [2]:
# check gpu
!nvidia-smi

Wed Jun  7 09:58:51 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   53C    P8    12W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [3]:
# load libraries
from transformers import pipeline
from google.colab import files
import pandas as pd
from google.colab import drive

In [None]:
# upload dataset
files.upload()
# alternatively, mount Google Drive instead: drive.mount("/content/drive/")

In [5]:
# specify your filename
input_filename = "/content/yelp_labelled.txt"  # note: you can right-click on your file and copy/paste the path

In [6]:
# read in the tab-separated file (no header row)
d = pd.read_csv(input_filename, delimiter="\t", header=None)
d.columns = ["text", "class"]
texts = d['text'].astype('str').tolist()
print(texts[0:5])  # print first 5 rows

['Wow... Loved this place.', 'Crust is not good.', 'Not tasty and the texture was just nasty.', 'Stopped by during the late May bank holiday off Rick Steve recommendation and loved it.', 'The selection on the menu was great and so were the prices.']
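The file is expected to hold one review per line, with the sentence and its 0/1 label separated by a tab and no header row. A minimal self-contained sketch of the same parsing step, assuming that layout, reads an in-memory string instead of the uploaded file. The three sentences mirror the printed output above; their labels are assumed here for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for yelp_labelled.txt: "<sentence>\t<label>" per line
raw = (
    "Wow... Loved this place.\t1\n"
    "Crust is not good.\t0\n"
    "Not tasty and the texture was just nasty.\t0\n"
)

# Same call as in the notebook: tab-delimited, no header row
sample = pd.read_csv(io.StringIO(raw), delimiter="\t", header=None)
sample.columns = ["text", "class"]

print(sample.shape)              # (3, 2)
print(sample["class"].tolist())  # [1, 0, 0]
```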


In [None]:
# load siebert
model_name = "siebert/sentiment-roberta-large-english"
get_sentiment = pipeline(model=model_name, device=0)  # device=0 runs on the first GPU; device=-1 runs on CPU

In [8]:
# predict sentiment
predictions = pd.DataFrame(get_sentiment(texts))['label']
predictions[0:5]  # print first 5 rows

0    POSITIVE
1    NEGATIVE
2    NEGATIVE
3    POSITIVE
4    POSITIVE
Name: label, dtype: object
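For a sentiment-analysis (text-classification) pipeline, each input string yields a dictionary with a `label` and a `score` key, which is why wrapping the result in `pd.DataFrame` produces a `label` column. The sketch below runs without downloading SiEBERT: `fake_output` is a hand-written stand-in for what `get_sentiment(texts)` returns, and its scores are invented for illustration. It also shows that the `score` column, which the cell above discards, can be kept to inspect low-confidence predictions; the 0.975 cutoff is an arbitrary assumption.

```python
import pandas as pd

# Hypothetical stand-in for get_sentiment(texts); scores are made up, not model output
fake_output = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.98},
    {"label": "NEGATIVE", "score": 0.97},
]

# One row per input text, one column per dictionary key
out = pd.DataFrame(fake_output)
print(out.columns.tolist())   # ['label', 'score']
print(out["label"].tolist())  # ['POSITIVE', 'NEGATIVE', 'NEGATIVE']

# Keep the confidence scores next to the labels, e.g. to flag uncertain cases
low_confidence = out[out["score"] < 0.975]
print(len(low_confidence))    # 1
```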

In [9]:
# transform positive/negative to 1/0
predictions = [1 if x == 'POSITIVE' else 0 for x in predictions.to_list()]
predictions[0:5]  # print first 5 rows

[1, 0, 0, 1, 1]
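The list comprehension maps every label other than `POSITIVE` to 0, so an unexpected label (for example, from a model with a different label set) would silently count as negative. An alternative sketch, offered as a design option rather than the notebook's own approach, maps labels through an explicit dictionary with `Series.map` and fails loudly on anything unmapped. `LABEL_TO_INT` and `labels_to_ints` are hypothetical names introduced here.

```python
import pandas as pd

# Hypothetical explicit mapping; any label not listed here is treated as an error
LABEL_TO_INT = {"POSITIVE": 1, "NEGATIVE": 0}


def labels_to_ints(labels: pd.Series) -> pd.Series:
    """Map sentiment labels to 1/0 and raise if a label is not in LABEL_TO_INT."""
    mapped = labels.map(LABEL_TO_INT)  # unmapped labels become NaN
    if mapped.isna().any():
        unknown = sorted(set(labels[mapped.isna()]))
        raise ValueError(f"unexpected labels: {unknown}")
    return mapped.astype(int)


labels = pd.Series(["POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE", "POSITIVE"])
print(labels_to_ints(labels).tolist())  # [1, 0, 0, 1, 1]
```

With this variant, `labels_to_ints(pd.Series(["POSITIVE", "neutral"]))` raises `ValueError` instead of recording "neutral" as 0.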

In [10]:
# compute accuracy (share of predictions that match the gold labels)
(pd.Series(predictions, index=d.index) == d['class']).mean()

0.963
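Accuracy is the share of sentences whose predicted label equals the gold label; on the Yelp file's 1,000 sentences, 0.963 corresponds to 963 correct predictions. A small sketch on toy labels (invented for illustration, not taken from the dataset) computes the same quantity and adds a confusion matrix with `pd.crosstab`, which shows whether the errors fall mostly on positive or on negative sentences.

```python
import pandas as pd

# Toy gold labels and predictions (hypothetical, for illustration only)
gold = pd.Series([1, 0, 0, 1, 1, 0, 1, 0], name="gold")
pred = pd.Series([1, 0, 1, 1, 1, 0, 0, 0], name="pred")

# Accuracy: fraction of positions where prediction and gold label agree
accuracy = (pred == gold).mean()
print(accuracy)  # 0.75

# Confusion matrix: rows are gold labels, columns are predicted labels
cm = pd.crosstab(gold, pred)
print(cm)
```

Here one negative sentence is predicted positive and one positive sentence is predicted negative, so 6 of 8 predictions are correct.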

### **Sources**

1.   **Data:** https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences
2.   **Model (SiEBERT):** https://huggingface.co/siebert/sentiment-roberta-large-english
3.   **Publication:** Hartmann, J., Heitmann, M., Siebert, C., & Schamp, C. (2023). More than a feeling: Accuracy and application of sentiment analysis. *International Journal of Research in Marketing*, 40(1), 75-87. https://doi.org/10.1016/j.ijresmar.2022.05.005
