# CryptoProphet
## Notebook's Goal
> Identify the coins mentioned in each tweet (to use as features afterwards)

In [1]:
# import custom packages
from src.helpers import load_df_tweets, get_mentioned_coins
from src.paths import LOCAL_PROCESSED_DATA_PATH

# import official packages
from tqdm.notebook import tqdm  # tqdm._tqdm_notebook is deprecated
import pandas as pd
import numpy as np

# sets the maximum column width used when displaying dataframes
pd.set_option('display.max_colwidth', 100)

# loads tweets
df = load_df_tweets()

# tags each tweet with the list of coins it mentions
df['mentioned_coins'] = df.words.progress_apply(get_mentioned_coins)

# counts mentions per coin across all tweets
df['mentioned_coins'].explode().value_counts()

100%|██████████████████████████████████████████████████████████████████████████████████| 47/47 [00:39<00:00,  1.18it/s]
100%|██████████████████████████████████████████████████████████████████████| 127642/127642 [00:00<00:00, 203577.43it/s]
100%|████████████████████████████████████████████████████████████████████████| 127642/127642 [01:46<00:00, 1201.72it/s]


BTC     7763
ETH     2540
ADA      997
LTC      754
USDT     363
DOGE     271
BNB       34
Name: mentioned_coins, dtype: int64
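
The helper `get_mentioned_coins` lives in `src.helpers` and is not shown in this notebook. As a rough illustration only, a keyword-based tagger could look like the sketch below. The alias table, the function name `get_mentioned_coins_sketch`, and the assumption that `df.words` holds a list of tokens per tweet are all hypothetical and may differ from the real helper.

```python
import re

# Hypothetical alias table: ticker -> lowercase keywords that count as a mention.
# The real table used by src.helpers is not shown in this notebook.
COIN_ALIASES = {
    "BTC": {"btc", "bitcoin"},
    "ETH": {"eth", "ethereum"},
    "ADA": {"ada", "cardano"},
}


def get_mentioned_coins_sketch(words):
    """Return the sorted tickers whose aliases appear among the tweet's words."""
    # Lowercase each word and split it on non-alphanumerics,
    # so '$ETH' yields 'eth' and 'btc-only' yields 'btc' and 'only'
    tokens = {tok for w in words for tok in re.findall(r"[a-z0-9]+", w.lower())}
    return sorted(t for t, aliases in COIN_ALIASES.items() if tokens & aliases)


print(get_mentioned_coins_sketch(["Bitcoin", "is", "retesting", "$ETH"]))
# → ['BTC', 'ETH']
```

Matching on whole tokens rather than substrings avoids false hits such as "ada" inside "canada", at the cost of missing aliases glued to other characters.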

In [2]:
# presents coin mentions by influencer (top 20 coin/user pairs)
mentions = df.explode('mentioned_coins').groupby(['mentioned_coins', 'user_screen_name']).size()
mentions.sort_values(ascending=False).head(20)

mentioned_coins  user_screen_name
BTC              rogerkver           1222
ETH              ethereum             885
ADA              Cardano              824
LTC              SatoshiLite          725
BTC              BTCFoundation        673
                 100trillionUSD       635
                 TheMoonCarl          596
                 BarrySilbert         513
                 APompliano           474
                 danheld              393
                 ErikVoorhees         357
                 SatoshiLite          279
ETH              VitalikButerin       277
BTC              aantonop             259
                 TheCryptoLark        250
                 IvanOnTech           221
ETH              TheCryptoLark        175
                 BarrySilbert         144
                 IvanOnTech           136
BTC              TheCryptoDog         129
dtype: int64
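
To make the `explode` + `groupby` step above easier to follow, here is a small sketch on made-up data (the users and tweets are invented; only the column names mirror the notebook). It shows that a tweet with an empty coin list becomes a single NaN row after `explode`, and that `groupby` drops that row by default.

```python
import pandas as pd

# Toy data, invented for illustration; column names mirror the notebook's
toy = pd.DataFrame({
    "user_screen_name": ["alice", "alice", "bob"],
    "mentioned_coins": [["BTC", "ETH"], [], ["BTC"]],
})

# explode yields one row per (tweet, coin); an empty list becomes one NaN row
exploded = toy.explode("mentioned_coins")

# groupby drops NaN keys by default, so tweets without mentions are not counted
counts = exploded.groupby(["mentioned_coins", "user_screen_name"]).size()
print(counts)
```

This is also why a later cell calls `dropna()` before collecting the unique coins.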

In [44]:
# spot-checks classification results on a few sampled tweets per coin
# grabs unique coins (explode turns empty lists into NaN, hence the dropna)
coins = df.mentioned_coins.explode().dropna().unique()
for c in coins:
    mask = df.mentioned_coins.apply(lambda x: c in x)
    print(f'Tweets with {c} Mentions')
    print(df[mask].full_text.sample(5, random_state=1))
    print(20*'=')

Tweets with BTC Mentions
1645      @CaitlinLong_ Maybe in the US, but non-US can use btc-only exchanges to do cash and carry or cov...
115841    RT @saifedean: In the latest Bitcoin standard podcast we host @ToneVays to discuss if trading is...
19703     I still go to bed every night wondering if I'm going to wake up the next morning and see bitcoin...
110510                                           Bitcoin is retesting the 200-day MA! https://t.co/kpTmD3WpqP
110625                                                                           Bitcoin is looking BULLISH 💪
Name: full_text, dtype: object
Tweets with ETH Mentions
33859     Currently, 1% of all ETH in circulation is being staked. 1.2m $ETH that is taken out of circulat...
61708        Jeffrey Wilcke (obscuren), lead Go Developer for Ethereum at #bitcoinexpo http://t.co/Qsqb9O6gAC
60894     Reminder: If you need help with your Ethereum 'pre-sale' wallet, please contact helpdesk@ethereu...
124303    RT @AnettRolikova: What does 
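
One caveat with the spot-check loop: `Series.sample(5)` raises a `ValueError` when a coin has fewer than five matching tweets. A minimal sketch of a guarded variant follows; the helper name `sample_mentions` and the toy data are hypothetical and not part of `src.helpers`.

```python
import pandas as pd


def sample_mentions(df, coin, n=5, seed=1):
    """Return up to n random tweets whose mentioned_coins list contains coin."""
    mask = df["mentioned_coins"].apply(lambda coins: coin in coins)
    matches = df.loc[mask, "full_text"]
    # Cap n at the number of matches; Series.sample raises if n exceeds it
    return matches.sample(min(n, len(matches)), random_state=seed)


# Toy data, invented for illustration
toy = pd.DataFrame({
    "full_text": ["Bitcoin is looking BULLISH", "gm", "BTC and ETH"],
    "mentioned_coins": [["BTC"], [], ["BTC", "ETH"]],
})

print(sample_mentions(toy, "BTC"))
```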

In [3]:
# exports data to local path
df_path = LOCAL_PROCESSED_DATA_PATH / 'tweets_with_mentioned_coins_20211013.json'
df.reset_index(drop=True).to_json(df_path, orient='index')
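
A quick way to check that the list-valued `mentioned_coins` column survives the export is a round trip on toy data, sketched below. The file name and data are invented for illustration, and the sketch writes to a temporary directory rather than `LOCAL_PROCESSED_DATA_PATH`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy data, invented for illustration
toy = pd.DataFrame({
    "full_text": ["Bitcoin is looking BULLISH", "gm", "BTC and ETH"],
    "mentioned_coins": [["BTC"], [], ["BTC", "ETH"]],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy_tweets.json"
    # Same export call as in the notebook: one JSON object per row, keyed by index
    toy.reset_index(drop=True).to_json(path, orient="index")
    # Reading back needs the same orient to rebuild rows and columns correctly
    loaded = pd.read_json(path, orient="index")

print(loaded["mentioned_coins"].tolist())
```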

## Conclusion
> Judging from the sampled tweets above, the weak classifier identifies mentioned coins with reasonable accuracy