# Fine-tuned DistilBERT Evaluation

In [2]:
import emoji
import pandas as pd
from sklearn.metrics import accuracy_score
from transformers import pipeline

In [3]:
# Show full comment text instead of truncating long cells
pd.set_option('display.max_colwidth', None)

In [4]:
model = pipeline(model="mwkby/distilbert-base-uncased-sentiment-reddit-crypto")

In [5]:
crypto_sentiment_dataset_path = '../datasets/crypto_reddit_sentiment.csv'

In [6]:
df = pd.read_csv(crypto_sentiment_dataset_path)
df = df.drop_duplicates(subset='Comment Text')
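`drop_duplicates(subset='Comment Text')` keeps the first occurrence of each comment (pandas' default `keep='first'`), so a comment posted more than once is scored only once in the accuracy below. Here is a plain-Python sketch of the same rule. `drop_duplicates_by` and the sample rows are hypothetical and exist only for illustration.

```python
from typing import Dict, List

def drop_duplicates_by(rows: List[Dict[str, str]], key: str) -> List[Dict[str, str]]:
    """Keep the first row seen for each distinct value of `key`, preserving order."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept

# Hypothetical rows: the same comment appears twice.
rows = [
    {"Comment Text": "HODL", "Sentiment": "Positive"},
    {"Comment Text": "Life support", "Sentiment": "Negative"},
    {"Comment Text": "HODL", "Sentiment": "Positive"},
]
deduped = drop_duplicates_by(rows, "Comment Text")
print(len(deduped))  # 2
```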

In [7]:
df['clean_text'] = df['Comment Text'].apply(emoji.demojize)
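`emoji.demojize` rewrites emoji as `:name:` text. The false-positive table below contains `:pile_of_poo:`, for example. This lets the tokenizer see ordinary words rather than characters its uncased vocabulary probably does not cover. The sketch below is only a rough stdlib approximation built on `unicodedata`, not the emoji package's implementation. `approx_demojize` is a hypothetical helper, and its names will not always match the package's.

```python
import unicodedata

def approx_demojize(text: str) -> str:
    """Rough stand-in for emoji.demojize: replace each 'Symbol, other' (So)
    character with :lowercase_unicode_name:. Multi-codepoint emoji (ZWJ
    sequences, skin tones, flags) are not combined into one name."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            out.append(f":{name.lower().replace(' ', '_')}:" if name else ch)
        else:
            out.append(ch)
    return "".join(out)

print(approx_demojize("People will buy \U0001F4A9"))  # People will buy :pile_of_poo:
```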

In [11]:
def get_predictions(row):
    text = row['clean_text']
    result = model(text, truncation=True)[0]
    return result['label'].lower(), result['score']
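`get_predictions` calls the model once per row through `df.apply`. A text-classification pipeline also accepts a list of strings and returns one `{'label': ..., 'score': ...}` dict per input, so the whole column can be scored in a single call. The sketch below assumes that list-in/list-out behaviour. `fake_model` is a hypothetical stand-in that lets the snippet run without downloading the model, and `get_predictions_batched` is also hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Prediction = Dict[str, object]

def fake_model(texts: List[str], truncation: bool = True) -> List[Prediction]:
    """Hypothetical stand-in for the pipeline: one {'label', 'score'} dict per text."""
    return [
        {"label": "NEGATIVE" if "drop" in t.lower() else "POSITIVE", "score": 0.9}
        for t in texts
    ]

def get_predictions_batched(
    texts: List[str], model: Callable[..., List[Prediction]]
) -> Tuple[List[str], List[float]]:
    """Call the model once on all texts, then split lowercase labels and scores."""
    results = model(texts, truncation=True)
    labels = [str(r["label"]).lower() for r in results]
    scores = [float(r["score"]) for r in results]
    return labels, scores

labels, scores = get_predictions_batched(
    ["You gotta think a few years out", "it can drop another 80%!"], fake_model
)
print(labels)  # ['positive', 'negative']
```

With the real pipeline, the notebook equivalent would be `df['pred_label'], df['pred_score'] = get_predictions_batched(df['clean_text'].tolist(), model)`.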

In [12]:
df['ground_truth'] = df['Sentiment'].str.lower()

In [13]:
df[['pred_label', 'pred_score']] = df.apply(get_predictions, axis=1, result_type="expand")

In [14]:
df['pred_label'].value_counts()

negative    287
positive    265
Name: pred_label, dtype: int64

In [16]:
accuracy_score(df['ground_truth'], df['pred_label'])

0.8641304347826086
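`accuracy_score` is the fraction of comments whose predicted label equals the ground-truth label. The plain-Python version below makes that explicit. `accuracy` is a hypothetical helper and the toy labels are not from the dataset. With 552 de-duplicated comments (287 + 265 predictions), the score above corresponds to 477 correct predictions.

```python
from typing import Sequence

def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Toy labels (hypothetical): 3 of 4 match.
print(accuracy(["positive", "negative", "positive", "negative"],
               ["positive", "negative", "negative", "negative"]))  # 0.75

# 477 correct out of 552 comments reproduces the notebook's score.
print(477 / 552)
```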

In [17]:
df['ground_truth'].value_counts()

positive    296
negative    256
Name: ground_truth, dtype: int64
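Because the task is binary, these two `value_counts` outputs and the accuracy together determine the full confusion matrix. Predicted negatives minus true negatives gives FN - FP, and 552 - 477 errors gives FN + FP. The sketch below works through that arithmetic. `confusion_from_marginals` is a hypothetical helper, and `positive` is assumed to be the positive class.

```python
from typing import Dict

def confusion_from_marginals(true_pos: int, true_neg: int,
                             pred_pos: int, pred_neg: int,
                             accuracy: float) -> Dict[str, int]:
    """Recover TP/TN/FP/FN for a binary task from class totals and accuracy."""
    n = true_pos + true_neg
    if n != pred_pos + pred_neg:
        raise ValueError("true and predicted totals disagree")
    errors = n - round(accuracy * n)   # FN + FP
    diff = pred_neg - true_neg         # FN - FP
    if (errors + diff) % 2:
        raise ValueError("inputs are not consistent with integer counts")
    fn = (errors + diff) // 2
    fp = errors - fn
    return {"tp": true_pos - fn, "tn": true_neg - fp, "fp": fp, "fn": fn}

cm = confusion_from_marginals(296, 256, 265, 287, 0.8641304347826086)
print(cm)  # {'tp': 243, 'tn': 234, 'fp': 22, 'fn': 53}
```

By this arithmetic there are 53 false negatives and 22 false positives. The two error tables below show 10 rows each, so they appear to list only part of the errors.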

In [18]:
false_negative_condition = (df['ground_truth'] == 'positive') & (df['pred_label'] == 'negative')

df[false_negative_condition][['clean_text', 'ground_truth', 'pred_label', 'pred_score']]

index,clean_text,ground_truth,pred_label,pred_score
25,"Laugh all you want, but it's practically inevitable that a crypto coin will take over as the world exchange currency. It would take a pretty big reversal at this point for it to not be Bitcoin, but I suppose it's possible.",positive,negative,0.920343
34,Cause you are still pegging BTC to the dollar when actually 1 Bitcoin = 1 Bitcoin. While $1 = .75c after inflation.,positive,negative,0.986351
55,"Bitcoin is a lot of things right now. It’s a doomsday currency, it’s an inflation hedge, it’s a speculative asset, it’s central banking hedge, it’s a technology play, plus hundred more things. All of things are components of its price and value. People buy, sell and hold for different reasons. You can’t look at it and say “why is it dropping due to inflation, it’s a hedge against inflation.” Also you can’t say it’s a hedge against central banks, because monetary policy and central banks is in probably one of the worst spots it’s been in a while. Yet BTC isn’t at an all time high.\n\nBTC is wearing many hats right now don’t let anyone tell you it’s not any one of a number of different things.",positive,negative,0.997295
80,You gotta think a few years out with CRO. It’s still early,positive,negative,0.924622
89,"What is dead may never die, Bitcoin has been declared dead like 400 times, it always comes back.",positive,negative,0.997153
106,DCA is only reason I don't care if price drops anymore. Just getting more for less. I'll take it.,positive,negative,0.969097
140,"allways wanted to invest money in to stoks but it is unnecessary complicated and expensive plus the returns you get with stoks are most often just inflation. Crypto has everything, it is accessible, it is cheap, it can return huge gains, it operates 24/7, and I know a thing or two about technology\n\n",positive,negative,0.726737
143,"Bitcoin is a hedge against bank bailouts, its literally in the genisis block.\n\nBitcoin is a hedge against the entire financial system from central banks to commercial banks.\n\n99% of people will not buy bitcoin to hedge run away inflation. But 99% of people will buy bitcoin when their bank collapses, their credit card stops working and their grocery store accepts lightning payments.",positive,negative,0.943313
148,"ETH has the most development going on and that's ultimately more important than the blockchain itself. But it's not useable by most people right now. It's not a finished product. No crypto can really justify it's market cap with performance at this point. That's why people say we're still early - there is speculative value placed on cryptos that may or may not pan out.\n\nWith the ETH 2.0 roadmap, ETH has the best chance to be scalable and developer friendly.",positive,negative,0.922464
154,Ethereum is the future. People FUD because they missed the boat or want to buy in cheap.,positive,negative,0.974675
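The boolean mask selects positive comments the model labelled negative. Sorting those rows by `pred_score` puts the errors the model was most confident about at the top, which is usually the best place to start reading. The sketch below uses a few rows copied from the table above plus one hypothetical correctly classified row (id 999) to show the mask filtering it out. `confident_errors` is a hypothetical helper.

```python
from typing import Dict, List

Row = Dict[str, object]

# Rows 25, 34, 55 and 80 come from the table above; row 999 is hypothetical.
rows: List[Row] = [
    {"id": 25, "ground_truth": "positive", "pred_label": "negative", "pred_score": 0.920343},
    {"id": 34, "ground_truth": "positive", "pred_label": "negative", "pred_score": 0.986351},
    {"id": 55, "ground_truth": "positive", "pred_label": "negative", "pred_score": 0.997295},
    {"id": 80, "ground_truth": "positive", "pred_label": "negative", "pred_score": 0.924622},
    {"id": 999, "ground_truth": "positive", "pred_label": "positive", "pred_score": 0.99},
]

def confident_errors(rows: List[Row], truth: str, pred: str) -> List[Row]:
    """Rows with ground_truth == truth and pred_label == pred, highest score first."""
    hits = [r for r in rows if r["ground_truth"] == truth and r["pred_label"] == pred]
    return sorted(hits, key=lambda r: r["pred_score"], reverse=True)

print([r["id"] for r in confident_errors(rows, "positive", "negative")])  # [55, 34, 80, 25]
```

In the notebook, the equivalent would be `df[false_negative_condition].sort_values('pred_score', ascending=False)`.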


In [19]:
false_positive_condition = (df['ground_truth'] == 'negative') & (df['pred_label'] == 'positive')

df[false_positive_condition][['clean_text', 'ground_truth', 'pred_label', 'pred_score']]

index,clean_text,ground_truth,pred_label,pred_score
10,it can drop another 80%!,negative,positive,0.750709
32,Crypto is just another scratch it ticket and tax on the poor. It’s a digi pipe dream. Don’t fall for it,negative,positive,0.502293
33,"Nooooo, it can’t be!!! I was told it was a hedge against inflation!! Everyone said it was the best store of value! At this rate, all some people will be able to afford with their BTC are some pretty tulips.",negative,positive,0.978677
110,"Wow, glad I cashed out when I did.",negative,positive,0.998271
127,With room to drop further too.,negative,positive,0.631701
138,The funny thing is that people have already added more liquidity. You can see on Algo Explorer that folks have already put some money in to your shid coin unless those are other wallets you’re associated with.\n\nWhich is to say: case in point. People will buy :pile_of_poo: even if it’s labeled very clearly on the packaging and smells very much like :pile_of_poo:.,negative,positive,0.995996
226,I prefer gold investment trading.,negative,positive,0.963054
256,Two words- free fall !!,negative,positive,0.93211
262,Tether is also backed by not much more than hope and prayers\n\n,negative,positive,0.858846
268,Life support,negative,positive,0.996869
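Accuracy hides how the two error types are split. The per-class counts can be derived from the two `value_counts` outputs and the accuracy score, treating `positive` as the positive class. Since FN - FP = 287 - 256 and FN + FP = 552 - 477, this gives TP = 243, TN = 234, FP = 22 and FN = 53. The sketch below turns those counts into precision, recall and F1 for each class. `prf` is a hypothetical helper.

```python
from typing import Tuple

# Counts derived arithmetically from the value_counts outputs and the accuracy.
tp, tn, fp, fn = 243, 234, 22, 53

def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one class given its TP, FP and FN counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# For the "negative" class the roles swap: its true positives are tn,
# its false positives are fn, and its false negatives are fp.
for label, scores in (("positive", prf(tp, fp, fn)), ("negative", prf(tn, fn, fp))):
    print(label, [round(s, 3) for s in scores])
```

With these counts, recall on positive comments (about 0.82) is noticeably lower than on negative ones (about 0.91). This matches the larger number of positive comments predicted as negative.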


## Some Observations

* Accuracy improved by 14.493% over the DistilBERT model fine-tuned on SST-2.
* The model appears to struggle with sarcasm and with comments that mix positive and negative sentiment, for example:

>Wow, glad I cashed out when I did.

>biggest pyramid scheme in the history of the world. very impressive no?

>Munger believes the USD will fail in the next 100 years. Since they understand how frail our current monetary system is, it's madness they can't see the value/utility of Bitcoin
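The notebook does not show the SST-2 baseline's accuracy, and it does not say whether the 14.493% gain is absolute or relative. The arithmetic check below tests both readings, and both are assumptions. The variable names are illustrative only.

```python
n = 552                          # 287 + 265 comments after de-duplication
final_acc = 0.8641304347826086   # accuracy_score output above
gain = 0.14493                   # the reported "14.493%"

# Assumption A: the gain is absolute (percentage points).
baseline_a = final_acc - gain
# Assumption B: the gain is relative to the baseline accuracy.
baseline_b = final_acc / (1 + gain)

# Implied number of correct baseline predictions under each reading.
print(round(baseline_a * n, 1))  # 397.0
print(round(baseline_b * n, 1))  # 416.6
```

Only the absolute reading gives a whole number of correct predictions (397 of 552). This suggests, without proving it, that the SST-2 model scored about 0.719 and that the figure is a percentage-point gain.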