# Pretrained Models Evaluation

Hugging Face hosts sentiment analysis models that use state-of-the-art models such as `BERT` and `DistilBERT` as the base model.

The goal is to see how pretrained sentiment analysis models perform on a dataset of crypto-related Reddit comments.

1. Test existing pretrained deep learning sentiment analysis models against the given dataset.
2. Calculate accuracy against the given dataset.
3. Measure inference time on the longest comment (701 tokens).

In [1]:
import emoji
import pandas as pd
from sklearn.metrics import accuracy_score
import torch
from transformers import DistilBertTokenizer, AutoTokenizer, pipeline



In [28]:
pd.set_option('max_colwidth', None)

## Test Distilbert Sentiment Analysis Model

Model source: https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english?text=I+like+you.+I+love+you

This model is a distilbert model fine-tuned on [SST2](https://huggingface.co/datasets/sst2). 
This model outputs negative and positive sentiment labels.

In [199]:
distilbert_classifier = pipeline(model="distilbert-base-uncased-finetuned-sst-2-english")

In [2]:
crypto_sentiment_dataset_path = '../datasets/crypto_reddit_sentiment.csv'

In [174]:
df = pd.read_csv(crypto_sentiment_dataset_path)
df = df.drop_duplicates(subset='Comment Text')

### Emoji handling

During data exploration, I noticed that some comments contain emojis, so this cell checks whether the Distilbert tokenizer has emojis in its vocabulary. If it doesn't, I'll convert them into text for additional signal.

In [66]:
text_with_emojis = "Unbelievable 😒 🙄"

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
tokenizer.tokenize(text_with_emojis)

['unbelievable', '[UNK]', '[UNK]']

In [68]:
demojized_text = emoji.demojize(text_with_emojis)
print(demojized_text)
" ".join(tokenizer.tokenize(demojized_text))

Unbelievable :unamused_face: :face_with_rolling_eyes:


'unbelievable : una ##mus ##ed _ face : : face _ with _ rolling _ eyes :'
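As the tokenizer output above shows, the colons and underscores of each demojized alias become separate tokens. One possible variation, which this notebook does not use, is to strip those delimiters so the alias reads as plain words. The sketch below uses only the standard library; `alias_to_words` is a hypothetical helper name, and the pattern only covers simple aliases made of letters, digits, underscores, and hyphens.

```python
import re

# Matches simple emoji aliases such as ":unamused_face:".
# Requiring a leading letter avoids touching things like "12:30:".
ALIAS_PATTERN = re.compile(r":([A-Za-z][A-Za-z0-9_\-]*):")


def alias_to_words(text):
    """Replace ':some_alias:' with 'some alias' (hypothetical helper)."""
    return ALIAS_PATTERN.sub(lambda m: m.group(1).replace("_", " "), text)


demojized = "Unbelievable :unamused_face: :face_with_rolling_eyes:"
print(alias_to_words(demojized))
```

Whether this helps accuracy is untested here; it only changes what the tokenizer sees.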

In [225]:
def get_predictions(classifier, row, max_length=None):
    """Return the (label, score) of the best non-neutral prediction for a row."""
    text = row['clean_text']
    # top_k=2 returns the two highest-scoring labels, best first
    result = classifier(text, truncation=True, max_length=max_length, top_k=2)
    pred = result[0]
    # if the top prediction is neutral, pick the next best prediction
    if pred['label'].lower() == 'neutral':
        pred = result[1]
    return pred['label'].lower(), pred['score']
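To make the selection rule in `get_predictions` concrete, here is a minimal sketch that applies the same rule to hand-written results instead of a real model call. The scores are made up for illustration, and `pick_non_neutral` is a hypothetical name; the only assumption carried over from the notebook is that the classifier returns a list of `{'label', 'score'}` dicts with the best prediction first.

```python
def pick_non_neutral(result):
    """Apply the same rule as get_predictions to a ranked result list."""
    pred = result[0]
    # A neutral top prediction is skipped in favor of the runner-up.
    if pred["label"].lower() == "neutral":
        pred = result[1]
    return pred["label"].lower(), pred["score"]


# Made-up scores, shaped like the ranked output the notebook indexes into.
three_class = [
    {"label": "neutral", "score": 0.55},
    {"label": "negative", "score": 0.40},
]
two_class = [
    {"label": "POSITIVE", "score": 0.98},
    {"label": "NEGATIVE", "score": 0.02},
]

print(pick_non_neutral(three_class))
print(pick_non_neutral(two_class))
```

Because neutral is always skipped, a three-label model can only ever contribute positive or negative (or bullish/bearish) labels to the accuracy calculation, which matches the binary labels in the dataset.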

In [175]:
df['clean_text'] = df['Comment Text'].apply(emoji.demojize)

In [177]:
# lowercasing sentiment labels for accuracy calculation later
df['ground_truth'] = df['Sentiment'].apply(lambda s: s.lower())

In [213]:
df[['pred_label_distilbert', 'pred_score_distilbert']] = df.apply(lambda x: get_predictions(distilbert_classifier, x), axis=1, result_type="expand")

In [214]:
accuracy_score(df['ground_truth'], df['pred_label_distilbert'])

0.7192028985507246

Accuracy is lower than expected. This may be because the model was fine-tuned on movie reviews, which are very different from crypto content.

Based on the [Paperswithcode SOTA for Sentiment Analysis on SST2](https://paperswithcode.com/sota/sentiment-analysis-on-sst-2-binary), there are also larger, more accurate model architectures, such as Roberta, that may be worth trying.

## Test Roberta Sentiment Analysis

Model source: https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest

This model is a Roberta model fine-tuned on the [TweetEval](https://huggingface.co/datasets/tweet_eval) dataset.
This model outputs negative, neutral, and positive sentiment labels.

In [118]:
roberta_classifier = pipeline('sentiment-analysis', 
                              model="cardiffnlp/twitter-roberta-base-sentiment-latest")

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [226]:
# Maximum length set to 511 - https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest/discussions/2
df[['pred_label_roberta', 'pred_score_roberta']] = df.apply(lambda x: get_predictions(roberta_classifier, x, max_length=511), axis=1, result_type="expand")

In [227]:
df['pred_label_roberta'].value_counts()

negative    293
positive    259
Name: pred_label_roberta, dtype: int64

In [228]:
accuracy_score(df['ground_truth'], df['pred_label_roberta'])

0.8786231884057971

Accuracy is much better. This may be due to the model architecture and the fine-tuning dataset (tweets rather than movie reviews).
I also found a model that was fine-tuned specifically on crypto social media data, so I'm trying that next.

## Test CryptoBert Sentiment Analysis

Model source: https://huggingface.co/ElKulako/cryptobert

This model is a BERTweet model (a Roberta-style language model pretrained on English tweets) fine-tuned on crypto social media data.

This model outputs bearish, neutral, and bullish sentiment labels. According to [this discussion](https://huggingface.co/ElKulako/cryptobert/discussions/2), accuracy is around 70%. 

In [None]:
cryptobert_classifier = pipeline('sentiment-analysis', model="ElKulako/cryptobert")

In [259]:
df[['pred_label_crypto', 'pred_score_crypto']] = df.apply(lambda x: get_predictions(cryptobert_classifier, x, max_length=128), axis=1, result_type="expand")

In [260]:
df['pred_processed_label_crypto'] = df['pred_label_crypto'].apply(lambda x: 'positive' if x == 'bullish' else 'negative')
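The one-line mapping above sends every label other than `bullish` to `negative`. That is safe here because `get_predictions` never returns `neutral`, but an explicit lookup makes the assumption visible and fails loudly on anything unexpected. A minimal sketch of that alternative; `CRYPTO_TO_SENTIMENT` and `to_sentiment` are hypothetical names, not part of the notebook:

```python
# Explicit mapping from CryptoBert labels to the dataset's sentiment labels.
CRYPTO_TO_SENTIMENT = {"bullish": "positive", "bearish": "negative"}


def to_sentiment(label):
    """Map a CryptoBert label, raising on anything unexpected (hypothetical helper)."""
    try:
        return CRYPTO_TO_SENTIMENT[label]
    except KeyError:
        raise ValueError(f"unexpected CryptoBert label: {label!r}") from None


print([to_sentiment(label) for label in ["bullish", "bearish"]])
```

In the notebook this could be applied with `df['pred_label_crypto'].apply(to_sentiment)` in place of the lambda.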

In [261]:
accuracy_score(df['ground_truth'], df['pred_processed_label_crypto'])

0.7572463768115942

In [262]:
df['pred_label_crypto'].value_counts()

bullish    406
bearish    146
Name: pred_label_crypto, dtype: int64

In [263]:
# Check negative sentiment mislabelled as positive
df[(df['ground_truth'] == 'negative') & (df['pred_processed_label_crypto'] == 'positive')][['clean_text', 'ground_truth', 'pred_processed_label_crypto', 'pred_label_roberta']]

Unnamed: 0,clean_text,ground_truth,pred_processed_label_crypto,pred_label_roberta
5,"or, how about this. terra was a bad investment because all cryptos operate as if they are ponzi schemes.\n\n",negative,positive,negative
6,"His first company ticket monster was a major flop, which was the critical reason why I never touched that dog shit.",negative,positive,negative
8,Fuck Do Kwon and his 2.0 version,negative,positive,negative
12,"Unpopular opinion: crypto sucks, criticism topic\nPERSPECTIVE\nBanks suck but crypto aint much better. Higly speculative and unstable. Just as much bullshit as there is on wallstreet. Leverage, shorting, futures, longs, market manipulation and what not. People gambling with their money.\n\nYour keys arn't really safe, what do you want to do, implement a chip in your arm? What if the chip breaks? Keep it digitally somewhere? Dig a hole in the ground? Most people keep it on centralised exchanges anyway. What if you die, it won't go to your relatives?\n\nIt fucks with your mental and emotional state too, because of the unstable nature of it. One day you think you have money, the next day 20% is gone. People who buy early are basicly just taking other peoples investment when they sell. Call it pyramid games. The space is flooding with scammers, annoying shilling shouting idiots and and what not.\n\nBitcoin and the likes are highly energy inefficient. Yes but look at the regular financial systems energy use you will say, blah blah....this is like saying u can dump your plastic in the river because there's a plastic soup already in the ocean. Yes here and there they use solar panels...well why not use them for more usefull things as they are costly to make. Yes but proof of stake you will say, yes...it's basicly being kept in the air by people that are again....gambling with their money. All these company's, can't they just build stuff without thousands of people throwing their money at them, hoping they will get rich temselves.\n\nBlockchains are cool in their decentralised way, but all the speculation on top of it...\n\nWhile everyone is shouting at digital coins, society remains a shithole. Salaries are low for most people, housing prices are still going through the roof. Crypto won't safe us. 
Wealthy people will get more wealthy at the cost of others, and some might get lucky..the lucky few who are in it early and take other peoples money when they finally cash out.\n\nSorry for the rant, but this is kind of how I'm starting to feel about the whole thing.",negative,positive,negative
14,"No shit. Crypto isn't safe at all. Between the volatility, rugpulls, honeypots and other scams, if there's one thing we don't have is safety. Thus, the higher returns are what make this market survive.\n\n",negative,positive,negative
...,...,...,...,...
533,"Not a store of value\n\nNot a currency\n\nNot a hedge against money printer\n\nWhat is it?\n\nI just want to understand current understanding aside from lowest effort "" ponzi / scam "" but truly i am at loss to understand and explain what function does bitcoin solve or is solving , because\n\nIts also carbon negative , toxic because coal mines are now running some mining operations and that sounds positively awful and very much seems like bitcoin is digital cigarette blowing fumes to plebs while letting easy cheap energy sources become stronghold for even more authoritarian rules ( Russia , Kazakhstan etc )\n\nSo what do you think is the point of bitcoin aside from speculative bubble? It looks like elaborate tulip mania the longer we dont answer them",negative,positive,negative
543,"I've found that nearly all coins are either useless, or are only there to serve a purpose or problem that the other coins themselves create. Like governance tokens - don't need those if no crypto. Proctopcs where you can lend crypto, so others can borrow crypto, so they can ... Buy crypto.\n\nIt's a nearly fully self-referential and closed loop system. I real,Ed that, got ros of all ambitions and dreams about it, and just be a cold blooded trader. It helped.",negative,positive,negative
548,More like 99.9%. Out of 20000 coins thats 20. Im still being very generous. My honest belief is that about 5 coins have actual utility.\n\n,negative,positive,positive
550,Think the hype gone now we stuck holding bags,negative,positive,negative


## Quick Prediction Response Time Check

In [254]:
longest_reddit_comment = 'why I don\'t like crypto, ethically, financially and technologically:\n\nproof of work mining (e.g. Bitcoin) wastes ungodly amounts of electricity (until the world is 100% renewables mining displaces more useful work and thus creates emissions) and computing capacity\nproof of space is the same but for making the world a worse place by driving up storage costs for everybody\ncoin speculation is a massive ponzi scheme and no crypto fan acknowledges this\nthe e.g. bitcoin network throughput is ~dozens of qps which would merely be hilarious if it wasn\'t burning as much coal as Australia\njust on pure waste: Bitcoin production is estimated to generate between 22 and 22.9 million metric tons of carbon dioxide emissions a year, or between the levels produced by Jordan and Sri Lanka, a 2019 study in scientific journal Joule found. (source) - ie this game is an entire additional country of environmental damage for ~no useful gain\nno one has found an actual use for any of it except for online drug dealing (which requires niche coins since Bitcoin is massive distributed public ledger)\nthe level of market manipulation and accounting fraud by eg Tether (and now USD "coin" admitted lying about the backing) would normally send people to jail but has for some reason got a free pass\nthe clueless marketing by fanboys who only hope to enrich themselves by creating another layer to the pyramid\nthe endlessly stupid projects it has spawned where blockchain is seen as a solution to anything aside from the niche case of "distributed tamper evident journal"\nit has single handedly made mass international ransomware a viable business model by allowing a way to receive cash outside the banking system\nthe general enablement of all sorts of scams, e.g. the endless kids on this very subreddit promoting "12% crypto savings accounts"\nIt isn\'t nearly as secure as people think it is. 
Any determined state actor could subvert the network.\nDeflation isn\'t actually a good thing in a currency. It ultimately favours the wealthy who can hold and dont need to spend.\nMining (proof of stake or work) allow the wealthy to invest in mining equipment to control or dominate the market.\nUltimately, no government on earth is going to give up its monetary policy as that is a key lever of control.\nessentially everyone involved at this point is just trying to hustle the next generation - any hope any of it would serve as a useful technological or financial system has gone)\nthe massive mis-represnetaion of how bitcoin in particular works, despite claims, it does not offer any of the following:\nanonymity - the entire blockchain is public and diligently recorded on millions of nodes\ndistributed freedom - a small number of mining cartels control almost all the hash power and normal people use one of ten crypto currency exchanges for everything anyway\nunencumbered transactions - it\'s comically slow and inefficient\nconvenient purchases - wallets are unwieldly for normal people, no reversibility\nstore of value - volatility is hilarious and all anyone cares about us BTC:USD\nthings I do like about cryptocurrency:\n\nmonero is pretty clever\nethereum smart contracts are pretty scifi\nwatching people learn 5000 years of history of why we have banking regulation in a single decade has been fun\nthere will be some massive financial catastrophes to watch that will mostly only harm people who did not read my previous list\nthe earlier state of every single exchange either getting robbed or exit scamming their own fanboys was quite funny in retrospect\n(thx to u/TwentyCharactersShor for some extra items).\n\nI used to just think cryptocurrency was a fun hack but a bad idea for the mainstream, but I\'m now pretty convinced it should be banned purely on resource grounds, since it seems impossible to force the externalities to be addressed\n\nIt\'s possible that someone 
will invent a cryptocurrency that\'s efficient to a level that it\'s not morally reprehensible to use, but how would it take off? the existing cryptocurrency interests want bitcoin to succeed purely because they\'re the top of the pyramid and also selling the shovels and so now anything new has to compete aginst the real financial sector and the entrenched cryptocurrency players.'
median_reddit_comment = 'His first company ticket monster was a major flop, which was the critical reason why I never touched that dog shit.'

In [164]:
%%timeit -r 10
distilbert_classifier(longest_reddit_comment, truncation=True)

188 ms ± 15.3 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)


In [165]:
%%timeit -r 10
roberta_classifier(longest_reddit_comment, truncation=True, max_length=511)

393 ms ± 16.4 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)


In [256]:
%%timeit -r 10
cryptobert_classifier(longest_reddit_comment, truncation=True, max_length=511)

382 ms ± 16 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)


In [173]:
%%timeit -r 10
# 1000ms / 15.6ms = 64 comments / sec
distilbert_classifier(median_reddit_comment, truncation=True)

15.6 ms ± 633 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
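The comment in the last timing cell converts latency to throughput. The same arithmetic can be applied to all four measurements; the sketch below copies the mean latencies from the `%%timeit` outputs above and assumes comments are processed one at a time, with no batching.

```python
# Mean latencies in milliseconds, copied from the %%timeit outputs above.
latency_ms = {
    "distilbert, longest comment": 188,
    "roberta, longest comment": 393,
    "cryptobert, longest comment": 382,
    "distilbert, median comment": 15.6,
}


def per_second(ms):
    """Comments per second for sequential, unbatched calls."""
    return 1000 / ms


for name, ms in latency_ms.items():
    print(f"{name}: {per_second(ms):.1f} comments/sec")

# How much slower the larger models are than Distilbert on the longest comment.
roberta_ratio = latency_ms["roberta, longest comment"] / latency_ms["distilbert, longest comment"]
cryptobert_ratio = latency_ms["cryptobert, longest comment"] / latency_ms["distilbert, longest comment"]
print(f"roberta: {roberta_ratio:.2f}x, cryptobert: {cryptobert_ratio:.2f}x")
```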


## Summary

Based on [Sentiment Analysis on SST2 data](https://paperswithcode.com/sota/sentiment-analysis-on-sst-2-binary), the state-of-the-art models are:
1. Roberta
2. Bert large
3. Distilbert
4. Bert base

### Accuracy on given crypto dataset
1. Distilbert - Accuracy is 71.920%
2. Roberta base fine-tuned on tweets sentiment - Accuracy is 87.862%
3. CryptoBert, bertweet model fine-tuned on crypto sentiment - Accuracy is 75.725%
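The accuracy figures can be turned back into counts of correct predictions. This is a small sketch of that arithmetic: the row count of 552 is taken from the Roberta `value_counts` output (293 + 259), and the accuracies are the raw `accuracy_score` outputs from the cells above.

```python
n_comments = 293 + 259  # rows in df, from the Roberta value_counts output

accuracies = {
    "distilbert": 0.7192028985507246,
    "roberta": 0.8786231884057971,
    "cryptobert": 0.7572463768115942,
}

# accuracy = correct / total, so correct = accuracy * total (rounded to an integer).
correct_counts = {name: round(acc * n_comments) for name, acc in accuracies.items()}

for name, correct in correct_counts.items():
    wrong = n_comments - correct
    print(f"{name}: {correct} of {n_comments} correct, {wrong} wrong")
```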

Distilbert's prediction speed is about **2x** faster than the other models on the longest comment (188 ms vs. 382 to 393 ms).

### Next Steps
1. Create a training dataset from https://www.kaggle.com/datasets/leukipp/reddit-crypto-data and any other readily available crypto-related Reddit comments.
2. Create weak labels with the best-performing sentiment analysis model, Roberta.
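One way the weak-labeling step could look is sketched below. It is not part of this notebook: `weak_label` is a hypothetical helper, the 0.9 confidence threshold is an arbitrary assumption, and `fake_classifier` stands in for a real model call that returns `(label, score)` the way `get_predictions` does.

```python
def weak_label(comments, classifier, min_score=0.9):
    """Keep (text, label) pairs whose score clears min_score (hypothetical helper)."""
    labelled = []
    for text in comments:
        label, score = classifier(text)
        # Low-confidence predictions are dropped rather than used as labels.
        if score >= min_score:
            labelled.append((text, label))
    return labelled


def fake_classifier(text):
    """Stand-in for a real model; returns made-up (label, score) pairs."""
    return ("negative", 0.95) if "scam" in text else ("positive", 0.60)


comments = ["this coin is a scam", "might go up, who knows"]
print(weak_label(comments, fake_classifier))
```

Filtering on confidence trades dataset size for label quality; the right threshold would need to be checked against the labeled dataset used above.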