In [8]:
from transformers import pipeline
from datasets import load_dataset
from tqdm import tqdm
import pandas as pd
import plotly.express as px
import textwrap

In this notebook I'll showcase basic sentiment analysis on tweets using Hugging Face's `pipeline` interface.

In [9]:
classifier = pipeline(task="sentiment-analysis", model="distilbert/distilbert-base-uncased-finetuned-sst-2-english")

Device set to use cuda:0


Let's load a few tweets and classify them.

In [10]:
dataset = load_dataset("carblacac/twitter-sentiment-analysis", split="train")

In [11]:
inputs = dataset['text'][:200]
truth = dataset['feeling'][:200]

In [12]:
answers = classifier(inputs)

In [13]:
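# One row per tweet: raw text, ground-truth label, and the raw pipeline output dict
df = pd.DataFrame({'text': inputs, 'truth': truth, 'output': answers})
# The dataset encodes the label as 0/1; map it to the label names the model uses
df['truth'] = df['truth'].map({0: 'NEGATIVE', 1: 'POSITIVE'})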

In [14]:
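# Predicted label
df['sentiment'] = df['output'].apply(lambda t: t['label'])
# The pipeline returns the score of the predicted label only; convert it to P(NEGATIVE)
df['score'] = df['output'].apply(lambda t: t['score'] if t['label'] == 'NEGATIVE' else 1 - t['score'])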
df

,text,truth,output,sentiment,score
0,@fa6ami86 so happy that salman won. btw the 1...,NEGATIVE,"{'label': 'POSITIVE', 'score': 0.9986536502838...",POSITIVE,0.001346
1,@phantompoptart .......oops.... I guess I'm ki...,NEGATIVE,"{'label': 'NEGATIVE', 'score': 0.9995519518852...",NEGATIVE,0.999552
2,@bradleyjp decidedly undecided. Depends on the...,POSITIVE,"{'label': 'NEGATIVE', 'score': 0.9778768420219...",NEGATIVE,0.977877
3,@Mountgrace lol i know! its so frustrating isn...,POSITIVE,"{'label': 'NEGATIVE', 'score': 0.9986138343811...",NEGATIVE,0.998614
4,@kathystover Didn't go much of any where - Lif...,POSITIVE,"{'label': 'NEGATIVE', 'score': 0.9996030926704...",NEGATIVE,0.999603
...,...,...,...,...,...
195,@jonasbrothers http://twitpic.com/5t2p6 - JONA...,NEGATIVE,"{'label': 'NEGATIVE', 'score': 0.9842454791069...",NEGATIVE,0.984245
196,there's this man in our roof.. like literally ...,NEGATIVE,"{'label': 'NEGATIVE', 'score': 0.9850871562957...",NEGATIVE,0.985087
197,"emma love you to death, miss you tons. wish y...",NEGATIVE,"{'label': 'POSITIVE', 'score': 0.9997510313987...",POSITIVE,0.000249
198,"@evila_elf lucky for you, then. I always seem ...",POSITIVE,"{'label': 'NEGATIVE', 'score': 0.9510592818260...",NEGATIVE,0.951059
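The `score` column above is the probability of the NEGATIVE class, not the confidence of the predicted label. As a minimal sketch of that conversion outside of pandas (the helper name `negative_probability` and the example dicts are hypothetical; only the `label`/`score` keys come from the output shown above):

```python
def negative_probability(result):
    """Map one pipeline result dict to P(NEGATIVE)."""
    # The result carries the score of the winning label only, so for a binary
    # classifier a POSITIVE prediction with score s means P(NEGATIVE) = 1 - s.
    if result["label"] == "NEGATIVE":
        return result["score"]
    return 1.0 - result["score"]


# Hypothetical results in the same shape as the `output` column
examples = [
    {"label": "POSITIVE", "score": 0.9},
    {"label": "NEGATIVE", "score": 0.75},
]
probs = [negative_probability(r) for r in examples]
```

With this convention, values near 1 mean "confidently negative" and values near 0 mean "confidently positive", which is what the plots further down put on the x axis.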


Plotly Express tooltips do not wrap text, so we insert line breaks by hand to keep long tweets from being awkwardly cut off in the visualization.

In [15]:
df['wrapped_text'] = df['text'].apply(lambda x: '<br>'.join(textwrap.wrap(x, width=50)))
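To see what the wrapping does to a single string, here is a small standalone sketch. The helper name `wrap_for_hover` and the sample sentence are made up for illustration; the call to `textwrap.wrap` is the same one used in the cell above.

```python
import textwrap


def wrap_for_hover(text, width=50):
    # textwrap.wrap breaks on whitespace into lines of at most `width` characters;
    # joining with <br> turns them into line breaks in the hover tooltip
    return "<br>".join(textwrap.wrap(text, width=width))


# Hypothetical sample text, wrapped more tightly than in the notebook
sample = "there is this man on our roof and he has been hammering since seven in the morning"
wrapped = wrap_for_hover(sample, width=30)
print(wrapped)
```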

In [16]:
from sklearn.metrics import classification_report, ConfusionMatrixDisplay, confusion_matrix

report = classification_report(df['truth'], df['sentiment'])
print(report)

              precision    recall  f1-score   support

    NEGATIVE       0.68      0.85      0.76        89
    POSITIVE       0.85      0.68      0.75       111

    accuracy                           0.76       200
   macro avg       0.77      0.76      0.75       200
weighted avg       0.77      0.76      0.75       200



It looks like the model performs reasonably well (accuracy 0.76, F1 ≈ 0.75 for both classes), but it seems poorly calibrated: it makes wrong predictions with high confidence. For ambiguous tweets (e.g. "busy busy! getting ready for my open house!") it should predict a score closer to 0.5.
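One way to make "wrong with high confidence" concrete is to list the misclassified tweets whose predicted class received a probability above some threshold. The sketch below is only an illustration: the function name, the 0.9 threshold, and the toy data are assumptions, and the inputs are taken to be truth labels plus P(NEGATIVE) values like the `truth` and `score` columns.

```python
def high_confidence_errors(truth, p_negative, threshold=0.9):
    """Return indices of wrong predictions made with confidence >= threshold."""
    flagged = []
    for i, (label, p) in enumerate(zip(truth, p_negative)):
        predicted = "NEGATIVE" if p >= 0.5 else "POSITIVE"
        confidence = max(p, 1.0 - p)  # probability assigned to the predicted class
        if predicted != label and confidence >= threshold:
            flagged.append(i)
    return flagged


# Hypothetical toy data, not taken from the dataset
truth = ["NEGATIVE", "POSITIVE", "POSITIVE", "NEGATIVE"]
p_negative = [0.98, 0.97, 0.40, 0.30]
flagged = high_confidence_errors(truth, p_negative)
```

On the notebook's data this could be called as `high_confidence_errors(df['truth'], df['score'])`, and the returned positions used with `df.iloc[...]` to read the offending tweets.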

In [17]:
px.scatter(df, x='score', hover_data=['wrapped_text'], color='truth')

The model really likes to make extreme predictions. It's hard to give a quantitative estimate of the calibration because we only have binary labels; adding a "NEUTRAL" or "AMBIGUOUS" class to the dataset might help mitigate this problem.
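Binary labels do still allow a coarse check: group the predictions into probability bins and compare the mean predicted P(NEGATIVE) in each bin with the fraction of tweets actually labeled NEGATIVE. This is only a rough sketch under that assumption; it cannot tell whether an individual tweet is genuinely ambiguous, and the function name, bin count, and toy data are hypothetical.

```python
def reliability_table(truth, p_negative, n_bins=5):
    """Rows of (bin_low, bin_high, count, mean predicted P(NEG), observed NEG fraction)."""
    bins = [[] for _ in range(n_bins)]
    for label, p in zip(truth, p_negative):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, label == "NEGATIVE"))

    rows = []
    for i, items in enumerate(bins):
        if not items:
            continue  # skip empty bins
        mean_p = sum(p for p, _ in items) / len(items)
        frac_neg = sum(1 for _, is_neg in items if is_neg) / len(items)
        rows.append((i / n_bins, (i + 1) / n_bins, len(items), mean_p, frac_neg))
    return rows


# Hypothetical toy data, not taken from the dataset
truth = ["NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE", "POSITIVE"]
p_negative = [0.95, 0.99, 0.92, 0.05, 0.03]
table = reliability_table(truth, p_negative)
```

For a well-calibrated model the last two values in each row would be close to each other; with only 200 tweets, most of them in the two extreme bins, the estimate for the middle bins would be very noisy.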

In [18]:
px.histogram(df, x='score', nbins=20)