# BTC Prediction (Tweet Sentiment Analysis)

## This notebook contains:
- Scraping historical tweets with twint (an open-source Twitter scraper)
- Cleaning the tweets: removing mentions, URLs, and non-alphanumeric characters
- Calculating the daily mean sentiment and saving it to a CSV file

In [2]:
import twint
import nest_asyncio
import pandas as pd
import numpy as np
import csv
nest_asyncio.apply()

### Configuring Twint

#### What it will search for:
- Tweets that contain the cashtag $BTCUSD
- From our targeted time range: 2019-09-02 to 2020-11-17
- Language set to English
- A minimum of 33 likes, to filter out low-engagement tweets, on the assumption that a tweet more people like is probably more reliable
- Limited to 10,000 tweets
- Retweets filtered out, to get an even spread of unique tweets

In [4]:
# configure Twint and run scrape

c = twint.Config()
c.Search = "$BTCUSD"
c.Since = "2019-09-02"
c.Until = "2020-11-17"
c.Lang = "en"
c.Min_likes = 33
c.Limit = 10000
c.Pandas = True
c.Filter_retweets = True
c.Count = True

twint.run.Search(c)

1327885997183479809 2020-11-15 03:07:47 -0500 <DaveCrypto83> #bitcoin #ew update $BTCUSD 1 day and 4hr charted on #bitfinex  Targetting low 17k first(4hr chart) then pullback to under side diagonal (see 1d chart).  Diagonal  lining up to finish around 20k🤑  I do ❤️ this count the structure is a beauty😍  Guys Pls Retweet &amp; Like 🙏  https://t.co/Kfm4WHZ3Vn
1327703359307001857 2020-11-14 15:02:03 -0500 <Ozono_Merval> $BTCUSD por cruces de MM fueron compra las zonas USD 3.500 &amp; USD 6.500 Ahora hay que seguirlo de forma fina, ya que las MM están en zona de doble top Compresión semanal  https://t.co/2lHNhUfjZe
1327699317201760256 2020-11-14 14:45:59 -0500 <Ozono_Merval> No vas a encontrar a otro loco que hable de AT de Medias Móviles 😁 $BTCUSD son claras las 2 zonas de correcciones, a priori de ondas 2 &amp; 4 y estaríamos en onda 5 &lt;pero&gt; debe romper su máximo histórico para confirmarlo #Bitcoin compresión mensual  https://t.co/sx79bdRAIH
1327653784324108289 2020-11-14 11:45:03

1299643372618035200 2020-08-29 05:41:40 -0500 <rektcapital> $BTC $BTCUSD #Bitcoin   Bitcoin is on the cusp of Weekly Closing above $11.400-$11.600 for the third straight week  This has never, ever been done before  https://t.co/yIvfUkQYrm
1299558473684774914 2020-08-29 00:04:19 -0500 <TrendSpider> Make sure to check out all the charts below!!  Charts reviewed tonight: $SPY $QQQ $IWM  $VIX $FB $AMZN $AAPL $NFLX $GOOG $MSFT $SPOT $SHOP $RKT $INTC $BIGC $BYND $ROKU $TWTR $OSTK $SLV $CRON $AMD $NVDA $CRM  $VIAC $FSLY $SPCE $BA $DKNG $NKLA  $SQ $BTCUSD $ETHUSD $XRPUSD &amp; more!!
1299297907804450819 2020-08-28 06:48:55 -0500 <rektcapital> $BTC $BTCUSD #Bitcoin - Monthly  Only a few days away from the 1M Close...  The historic July candle broke $10.800 (red) for the first time since Dec' 2017   Whereas price turned $10.800 into support in August   Is BTC setting itself up for trend continuation in September?  https://t.co/i2OTuXwyt6
1299117275627233280 2020-08-27 18:51:09 -0500 <tradertroy8

1280090700567150592 2020-07-06 06:46:20 -0500 <rektcapital> $BTC $BTCUSD #Bitcoin Dominance - 1D   Breaking down from a multi-month uptrend line (orange)  Clean breakdown from here means Altseason continues  #cryptocurrency #Crypto  https://t.co/ppzyLx9LFO
1279747049047670790 2020-07-05 08:00:47 -0500 <Yodaskk> So, a small recap of all the bull &amp; bear arguments I found past weeks  $btc $btcusd #BTC    #Thread  1/x
1279587653369896962 2020-07-04 21:27:24 -0500 <marketoccultat1> $BTCUSD Natal solar cycle ☀️#bitcoin #btc  https://t.co/ecXWnKjM3t
1278704626980839425 2020-07-02 10:58:34 -0500 <BTC_JackSparrow> $BTCUSD denominated in Gold (vs $USD)  vs  $MSFT dot(.)com bubble re-accumulation  https://t.co/fxjBgHa1Th
1277772880021868544 2020-06-29 21:16:09 -0500 <TrendSpider> Holy moly that was a lot of charts....   Please give us a like if this was helpful!  Reviewed: $SPY $QQQ $IWM $XBI $XLE $XLF $BTCUSD  $ETHUSD $GLD  $SLV $FB $AMZN $AAPL $NFLX $GOOG $CRON $CGC $TLRY $ATVI $GE $MRO $BA

1242092945508945922 2020-03-23 10:16:49 -0500 <CryptoHamsterIO> Bitcoin is continuing to follow the Bump-and-Run Reversal pattern.  👀 $BTC $BTCUSD #bitcoin  https://t.co/II2lObqmtX
1241777441997303810 2020-03-22 13:23:07 -0500 <rektcapital> $BTC $BTCUSD #Bitcoin   Reminds me of a Hanging Man candlestick formation  Should this be the case, Bitcoin will see further downside from here  #Crypto  https://t.co/tE0qYXxMcu
1241698037770092545 2020-03-22 08:07:36 -0500 <rektcapital> $BTC $BTCUSD #Bitcoin - Monthly  The green box is a key point of reference in terms of directional bias for Bitcoin heading into the Monthly close  A close below the green would see price occupy the lower half of this macro triangular market structure in April  #Crypto  https://t.co/24qEAgjaoe
1240519045603983360 2020-03-19 02:02:42 -0500 <BTC_JackSparrow> Longing $BTCUSD is essentially shorting the US dollar  I'd say that is a great idea right now
1240281586110738438 2020-03-18 10:19:08 -0500 <HyenukChu> Para mante

1198674436955742208 2019-11-24 13:47:10 -0500 <TechCharts> $BTCUSD is below the year-long average. Long-term average at 7.9K becomes the new resistance. Downtrend  https://t.co/bSIWq0gqQo
1198082415068991489 2019-11-22 22:34:41 -0500 <CryptoHamsterIO> $BTC $BTCUSD #bitcoin  https://t.co/BAmdYzL83a
1197507459805519872 2019-11-21 08:30:01 -0500 <rektcapital> Less than 180 days until the next #Bitcoin Halving  Green boxes show the mid-November to early January period prior to each of $BTC's Halvings  Historically, $BTCUSD has enjoyed a few weeks of uninterrupted upside movement heading into the New Year  Will this time be different?  https://t.co/QEEMj17Djj
1197489759997833217 2019-11-21 07:19:41 -0500 <Yodaskk> $BTC $BTCUSD  I think both the structures at 3k bottom and right now are similar  -Fake-out to the downside into scampump (exagerated by the Xi pump) -Slow bleed on decreasing volume (less sellers) -Price  fake-out below 78.6% fib retracement into old support - Pump  https://t.co/

In [5]:
# helper: list the columns available in twint's tweets dataframe

def available_columns():
    return twint.output.panda.Tweets_df.columns

In [6]:
# helper: select the given columns from twint's tweets dataframe

def twint_to_pandas(columns):
    return twint.output.panda.Tweets_df[columns]

In [7]:
available_columns()

Index(['id', 'conversation_id', 'created_at', 'date', 'timezone', 'place',
       'tweet', 'language', 'hashtags', 'cashtags', 'user_id', 'user_id_str',
       'username', 'name', 'day', 'hour', 'link', 'urls', 'photos', 'video',
       'thumbnail', 'retweet', 'nlikes', 'nreplies', 'nretweets', 'quote_url',
       'search', 'near', 'geo', 'source', 'user_rt_id', 'user_rt',
       'retweet_id', 'reply_to', 'retweet_date', 'translate', 'trans_src',
       'trans_dest'],
      dtype='object')

In [8]:
# create dataframe from tweets

df_tweets = twint_to_pandas(["date", "username", "tweet"])

In [9]:
# reset index

df_tweets.reset_index(drop=True, inplace=True)

In [10]:
df_tweets.head(20)

Unnamed: 0,date,username,tweet
0,2020-11-15 03:07:47,DaveCrypto83,#bitcoin #ew update $BTCUSD 1 day and 4hr char...
1,2020-11-14 15:02:03,Ozono_Merval,$BTCUSD por cruces de MM fueron compra las zon...
2,2020-11-14 14:45:59,Ozono_Merval,No vas a encontrar a otro loco que hable de AT...
3,2020-11-14 11:45:03,RadioSilentplay,$BTCUSD 15982 UPDATE: Monthly Bull Run: Elli...
4,2020-11-14 07:34:08,FeraSY1,#Bitcoin 4H TF 2 Important zone to watch for a...
5,2020-11-13 15:21:52,RadioSilentplay,$MARA 2.44 just broke all MA here $BTCUSD ex...
6,2020-11-13 05:15:16,Yodaskk,$btc $btcusd #BTC 2017 20k top &amp; today's...
7,2020-11-13 02:53:12,FeraSY1,$BTC.D ( #Bitcoin Dominance) 16 Hours till ...
8,2020-11-12 21:29:51,canuck2usa,"$BTCUSD - I realize some of you DGAF about it,..."
9,2020-11-12 15:31:57,FeraSY1,"#Bitcoin HTFs Chart Weekly/Monthly based, No ..."


In [11]:
df_tweets.shape

(464, 3)

In [13]:
df_tweets.index

RangeIndex(start=0, stop=464, step=1)

In [14]:
df_tweets.head()

Unnamed: 0,date,username,tweet
0,2020-11-15 03:07:47,DaveCrypto83,#bitcoin #ew update $BTCUSD 1 day and 4hr char...
1,2020-11-14 15:02:03,Ozono_Merval,$BTCUSD por cruces de MM fueron compra las zon...
2,2020-11-14 14:45:59,Ozono_Merval,No vas a encontrar a otro loco que hable de AT...
3,2020-11-14 11:45:03,RadioSilentplay,$BTCUSD 15982 UPDATE: Monthly Bull Run: Elli...
4,2020-11-14 07:34:08,FeraSY1,#Bitcoin 4H TF 2 Important zone to watch for a...


In [15]:
# save tweets dataframe to csv file

df_tweets.to_csv('tweets_btcusd.csv')

In [18]:
# read the saved CSV back into a new dataframe

df1_tweets=pd.read_csv("tweets_btcusd.csv", index_col=[0], parse_dates=True)

In [19]:
df1_tweets.shape

(464, 3)

In [20]:
df1_tweets.head()

Unnamed: 0,date,username,tweet
0,2020-11-15 03:07:47,DaveCrypto83,#bitcoin #ew update $BTCUSD 1 day and 4hr char...
1,2020-11-14 15:02:03,Ozono_Merval,$BTCUSD por cruces de MM fueron compra las zon...
2,2020-11-14 14:45:59,Ozono_Merval,No vas a encontrar a otro loco que hable de AT...
3,2020-11-14 11:45:03,RadioSilentplay,$BTCUSD 15982 UPDATE: Monthly Bull Run: Elli...
4,2020-11-14 07:34:08,FeraSY1,#Bitcoin 4H TF 2 Important zone to watch for a...


In [21]:
# set index as "date" column

df1_tweets.set_index(['date'], inplace=True)

In [22]:
df1_tweets.head()

date,username,tweet
2020-11-15 03:07:47,DaveCrypto83,#bitcoin #ew update $BTCUSD 1 day and 4hr char...
2020-11-14 15:02:03,Ozono_Merval,$BTCUSD por cruces de MM fueron compra las zon...
2020-11-14 14:45:59,Ozono_Merval,No vas a encontrar a otro loco que hable de AT...
2020-11-14 11:45:03,RadioSilentplay,$BTCUSD 15982 UPDATE: Monthly Bull Run: Elli...
2020-11-14 07:34:08,FeraSY1,#Bitcoin 4H TF 2 Important zone to watch for a...
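The reload and `set_index` steps could also be done in one `read_csv` call that parses the `date` column while loading. The following is only a sketch of that alternative: the two CSV rows and the usernames `user_a` and `user_b` are made up for illustration, and `io.StringIO` stands in for the real `tweets_btcusd.csv` file.

```python
import io

import pandas as pd

# Hypothetical stand-in for tweets_btcusd.csv (same column layout, invented rows).
csv_text = """,date,username,tweet
0,2020-11-15 03:07:47,user_a,first tweet
1,2020-11-14 15:02:03,user_b,second tweet
"""

# Parse "date" while reading and use it as the index;
# usecols skips the unnamed integer index column written by to_csv.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["date", "username", "tweet"],
    index_col="date",
    parse_dates=["date"],
)

print(type(df.index).__name__)
print(df)
```

Loading the dates this way yields a `DatetimeIndex` directly, so no separate conversion is needed before resampling.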


In [23]:
# import re for regular expression operations to clean tweets

import re

In [24]:
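# strip @mentions, URLs and non-alphanumeric characters, collapse whitespace,
# and store the result in a new column (raw string avoids invalid-escape warnings)

df1_tweets['tweet_text_clean']=df1_tweets['tweet'].apply(lambda row: ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", row).split()))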

In [25]:
df1_tweets.head(20)

date,username,tweet,tweet_text_clean
2020-11-15 03:07:47,DaveCrypto83,#bitcoin #ew update $BTCUSD 1 day and 4hr char...,bitcoin ew update BTCUSD 1 day and 4hr charted...
2020-11-14 15:02:03,Ozono_Merval,$BTCUSD por cruces de MM fueron compra las zon...,BTCUSD por cruces de MM fueron compra las zona...
2020-11-14 14:45:59,Ozono_Merval,No vas a encontrar a otro loco que hable de AT...,No vas a encontrar a otro loco que hable de AT...
2020-11-14 11:45:03,RadioSilentplay,$BTCUSD 15982 UPDATE: Monthly Bull Run: Elli...,BTCUSD 15982 UPDATE Monthly Bull Run Elliott W...
2020-11-14 07:34:08,FeraSY1,#Bitcoin 4H TF 2 Important zone to watch for a...,Bitcoin 4H TF 2 Important zone to watch for an...
2020-11-13 15:21:52,RadioSilentplay,$MARA 2.44 just broke all MA here $BTCUSD ex...,MARA 2 44 just broke all MA here BTCUSD expect...
2020-11-13 05:15:16,Yodaskk,$btc $btcusd #BTC 2017 20k top &amp; today's...,btc btcusd BTC 2017 20k top amp today s Price ...
2020-11-13 02:53:12,FeraSY1,$BTC.D ( #Bitcoin Dominance) 16 Hours till ...,BTC D Bitcoin Dominance 16 Hours till current ...
2020-11-12 21:29:51,canuck2usa,"$BTCUSD - I realize some of you DGAF about it,...",BTCUSD I realize some of you DGAF about it but...
2020-11-12 15:31:57,FeraSY1,"#Bitcoin HTFs Chart Weekly/Monthly based, No ...",Bitcoin HTFs Chart Weekly Monthly based No Muc...
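One side effect is visible in the cleaned column: HTML entities such as `&amp;` lose their punctuation but leave the word `amp` behind. A possible refinement, sketched here under the assumption that decoding entities first is acceptable, is to run `html.unescape` before applying the same pattern. The helper name `clean_tweet` and the sample tweet are hypothetical.

```python
import html
import re

# Same pattern as the cleaning cell, compiled once as a raw string.
PATTERN = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+://\S+)")


def clean_tweet(text: str) -> str:
    """Decode HTML entities, then strip mentions, URLs and punctuation."""
    decoded = html.unescape(text)  # "&amp;" becomes "&", which the pattern removes
    return " ".join(PATTERN.sub(" ", decoded).split())


# Invented sample tweet for illustration only.
sample = "$btc $btcusd #BTC 2017 20k top &amp; today's price @someone https://t.co/abc123"
print(clean_tweet(sample))
```

Applied to the dataframe, this would replace the lambda with `df1_tweets['tweet'].apply(clean_tweet)`.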


## Getting the sentiment for each tweet

Sentiment is measured with TextBlob's polarity score, a value between -1 and 1 that indicates how negative or positive the wording of a text is. Each scraped tweet mentioning $BTCUSD is labelled by the sign of its polarity:

- A negative polarity (less than 0) is labelled -1.
- A polarity equal to 0 is neutral and labelled 0.
- A positive polarity (more than 0) is labelled 1.

The labels are then averaged for each day, since we are only trying to predict the daily price.
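To make the arithmetic concrete, here is a minimal plain-Python sketch of the labelling and daily averaging. The polarity values are made up for illustration (they are not TextBlob output), and the names `label` and `scored` are hypothetical; the notebook itself does this with pandas.

```python
from collections import defaultdict
from statistics import mean


def label(polarity: float) -> int:
    """Map a polarity score to -1 (negative), 0 (neutral) or 1 (positive)."""
    if polarity > 0:
        return 1
    if polarity < 0:
        return -1
    return 0


# (day, polarity) pairs with invented polarity values.
scored = [
    ("2019-09-02", -0.30),
    ("2019-09-03", 0.00),
    ("2019-09-03", 0.45),
]

# Group the labels by day, then average them.
labels_by_day = defaultdict(list)
for day, polarity in scored:
    labels_by_day[day].append(label(polarity))

daily_mean = {day: mean(labels) for day, labels in labels_by_day.items()}
print(daily_mean)
```

A day with one neutral and one positive tweet averages to 0.5, and a day with a single negative tweet averages to -1.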

In [26]:
# import TextBlob for sentiment analysis 

from textblob import TextBlob

In [27]:
# label each clean tweet by the sign of its TextBlob polarity: 1, 0 or -1

df1_tweets['tweet_sentiment'] = np.sign(
    df1_tweets['tweet_text_clean'].apply(lambda text: TextBlob(text).sentiment.polarity)
).astype(int)

In [28]:
df1_tweets.tail()

date,username,tweet,tweet_text_clean,tweet_sentiment
2019-09-06 12:10:16,SimmonsSupreme,Pretty amazing ROI in #tier3 over the last few...,Pretty amazing ROI in tier3 over the last few ...,1
2019-09-06 11:26:02,SwenLink,$BTCUSD: Skeptical of this breakout. Would pr...,BTCUSD Skeptical of this breakout Would prefer...,-1
2019-09-03 07:42:44,cointradernik,This would shaft the most market participants:...,This would shaft the most market participants ...,1
2019-09-03 05:49:09,CryptoHamsterIO,"$BTCUSD, 1D, RSI Every time, when there is a b...",BTCUSD 1D RSI Every time when there is a break...,0
2019-09-02 00:59:39,loomdart,"the ""6k break"" for $BTCUSD was preceded by a s...",the 6k break for BTCUSD was preceded by a stee...,-1


In [29]:
# convert the index to a DatetimeIndex (required for resampling)

df1_tweets.index = pd.to_datetime(df1_tweets.index)

In [30]:
# resample to daily frequency, taking the mean sentiment for each day (returns a Series)

tweets_indicator_df=df1_tweets.resample('D').tweet_sentiment.mean()

In [31]:
tweets_indicator_df.head()

date
2019-09-02   -1.0
2019-09-03    0.5
2019-09-04    NaN
2019-09-05    NaN
2019-09-06    0.0
Freq: D, Name: tweet_sentiment, dtype: float64

In [37]:
# use forward fill to replace NaN values (days with no tweets)

tweets_indicator_df = tweets_indicator_df.ffill()
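As a small illustration of what resampling and forward filling do, the sketch below rebuilds the first few days of the series by hand. The values mirror the first rows of the daily output; the `capped` variant, which uses `limit=1`, is a hypothetical alternative and not something the notebook does.

```python
import pandas as pd

# Daily means for the three days that actually have tweets.
daily = pd.Series(
    [-1.0, 0.5, 0.0],
    index=pd.to_datetime(["2019-09-02", "2019-09-03", "2019-09-06"]),
    name="tweet_sentiment",
)

resampled = daily.resample("D").mean()  # inserts NaN rows for 09-04 and 09-05
filled = resampled.ffill()              # carries the last observed value forward
capped = resampled.ffill(limit=1)       # hypothetical: fill at most one missing day

print(filled)
print(capped)
```

Forward filling assumes that sentiment persists unchanged on days without tweets; capping the fill is one way to limit how long a stale value is reused.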

In [38]:
type(tweets_indicator_df)

pandas.core.series.Series

In [39]:
tweets_indicator_df.shape

(441,)
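The length of 441 matches the number of calendar days between the oldest and newest tweets in the tables above, counting both endpoints. A quick check with the standard library:

```python
from datetime import date

first_day = date(2019, 9, 2)    # date of the oldest tweet
last_day = date(2020, 11, 15)   # date of the newest tweet

# Number of calendar days in the range, counting both endpoints.
n_days = (last_day - first_day).days + 1
print(n_days)  # 441
```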

In [40]:
# save tweet sentiment indicator to csv file

tweets_indicator_df.to_csv('btc_sentiment.csv')