# Quick Analysis of Bitcoin Tweets with the Tweepy Streaming API, Pandas and Plot.ly


This notebook is a simple analysis of tweets I collected with the Tweepy streaming API. The collector tracks a given keyword (in this case 'bitcoin') and writes selected fields from the Twitter JSON data into a CSV file, which we then read into a pandas DataFrame for analysis.

[How Do People Feel?](#How-Do-People-Feel?)

[Tweets Over Time](#Tweets-Over-Time)

[Tweet Mentions](#Tweet-Mentions)

[Top 20 Tweeters with most followers](#Top-20-Tweeters-with-most-followers)

[Tweets Per User Per Hour](#Tweets-Per-User-Per-Hour)
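
The collection step described in the introduction is not part of this notebook. As a rough sketch of what "writing selected fields to a CSV" can look like, the snippet below picks fields out of one decoded tweet and writes a CSV line. The payload is a made-up example, `tweet_to_row` is a hypothetical helper, and the mapping from JSON keys to the column names used later is my assumption, not the original collector code. The `feelings` column is left out here because it is computed separately.

```python
import csv
import io
import json

# Column names used later in this notebook (without 'feelings').
FIELDS = ['created_at', 'text', 'screen_name', 'followers_count',
          'favourite_count', 'hashtaglist', 'device_used']

def tweet_to_row(status):
    """Pick the fields of interest out of one decoded tweet (a dict)."""
    user = status.get('user', {})
    hashtags = [tag['text'] for tag in status.get('entities', {}).get('hashtags', [])]
    return {
        'created_at': status.get('created_at'),
        'text': status.get('text'),
        'screen_name': user.get('screen_name'),
        'followers_count': user.get('followers_count'),
        'favourite_count': status.get('favorite_count'),
        'hashtaglist': hashtags,
        'device_used': status.get('source'),
    }

# Hypothetical payload, shaped like a tweet as delivered by the streaming API.
raw = json.dumps({
    'created_at': 'Fri Mar 23 00:40:34 +0000 2018',
    'text': 'free coins #bitcoin',
    'favorite_count': 0,
    'source': '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
    'user': {'screen_name': 'example_user', 'followers_count': 42},
    'entities': {'hashtags': [{'text': 'bitcoin'}]},
})

row = tweet_to_row(json.loads(raw))

# Write the row to an in-memory buffer; a real collector would append to a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writerow(row)
csv_line = buffer.getvalue()
```

Because the hashtag list is written as its string form, it lands in the CSV with square brackets, which is why the brackets have to be cleaned up later in the notebook.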


In [83]:
# Set up the environment by importing the libraries needed for this notebook

import sys
import os
import pandas as pd
import cufflinks as cf
import plotly as py
import plotly.graph_objs as go
import datetime
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot  # offline plotting inside the notebook


In [84]:
init_notebook_mode(connected=True) #allow plotly graphs to work inside notebook

In [85]:
column_name=['created_at','text','screen_name','followers_count','favourite_count','hashtaglist','device_used','feelings']

In [86]:
bitcoin_tweets='bitcointweets.csv'
tweet=pd.read_csv(bitcoin_tweets,parse_dates=['created_at'],encoding='utf-8',names=column_name) #parse dates

In [87]:
pd.set_option('display.max_columns',None)
pd.set_option('display.expand_frame_repr', False)

In [88]:
tweet.head()

Unnamed: 0,created_at,text,screen_name,followers_count,favourite_count,hashtaglist,device_used,feelings
0,2018-03-23 00:40:32,"RT @ALXTOKEN: Paul Krugman, Nobel Luddite. I h...",myresumerocket,16522,0,[],"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",['neutral']
1,2018-03-23 00:40:34,@lopp @_Kevin_Pham @psycho_sage @naval But @Pr...,BitMocro,1295,0,[u'Bitcoin'],"<a href=""http://twitter.com/download/android"" ...",['neutral']
2,2018-03-23 00:40:35,RT @tippereconomy: Another use case for #block...,hojachotopur,6090,0,"[u'blockchain', u'Tipper', u'TipperEconomy']","<a href=""http://twitter.com"" rel=""nofollow"">Tw...",['positive']
3,2018-03-23 00:40:36,free coins https://t.co/DiuoePJdap,denies_distro,2626,0,[],"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",['positive']
4,2018-03-23 00:40:36,RT @payvxofficial: WE are happy to announce th...,aditzgraha,184,0,[],"<a href=""http://twitter.com/download/android"" ...",['positive']


In [89]:
tweet=tweet[~tweet['text'].str.contains('RT')] #remove retweets
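
One caveat with the filter above: `str.contains('RT')` matches the letters "RT" anywhere in the text, so a tweet containing a word such as "SMART" would be dropped as well. A minimal sketch of a stricter variant, assuming retweets always begin with `RT @` (as they do in the rows shown above); the sample rows are made up for illustration:

```python
import pandas as pd

# Made-up sample: one retweet, one ordinary tweet containing "RT", one plain tweet.
sample = pd.DataFrame({'text': [
    'RT @someone: bitcoin is up',
    'SMART contracts and bitcoin',
    'free coins',
]})

# Substring match: also drops the second row because "SMART" contains "RT".
loose = sample[~sample['text'].str.contains('RT')]

# Prefix match: drops only rows that begin with the retweet marker "RT @".
# .copy() gives an independent frame, so adding columns later does not warn.
strict = sample[~sample['text'].str.startswith('RT @')].copy()
```

The counts reported in the rest of this notebook come from the original substring filter, so they would change slightly with the stricter version.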

In [90]:
tweet.head()

Unnamed: 0,created_at,text,screen_name,followers_count,favourite_count,hashtaglist,device_used,feelings
1,2018-03-23 00:40:34,@lopp @_Kevin_Pham @psycho_sage @naval But @Pr...,BitMocro,1295,0,[u'Bitcoin'],"<a href=""http://twitter.com/download/android"" ...",['neutral']
3,2018-03-23 00:40:36,free coins https://t.co/DiuoePJdap,denies_distro,2626,0,[],"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",['positive']
5,2018-03-23 00:40:36,Copy successful traders automatically with Bit...,VictorS61164810,14,0,[],"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",['positive']
7,2018-03-23 00:40:37,One click to start mining cryptocurrencies tog...,cloud_speaker,6560,0,"[u'bitcoin', u'PaaS', u'cloudnetwork']","<a href=""http://itunes.apple.com/us/app/twitte...",['neutral']
8,2018-03-23 00:40:38,"first speaker @digitsu\n\n""how we can get bitc...",MADinMelbourne,2991,0,[],"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",['positive']


In [91]:
tweet['hashtaglist']=tweet['hashtaglist'].str.strip('[]') #cleanup square brackets
tweet['feelings']=tweet['feelings'].str.strip('[]')
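
Stripping the brackets leaves the quotes and `u` prefixes in place. If real Python lists are wanted instead, one option is `ast.literal_eval` from the standard library, which safely parses list literals like the ones stored in the CSV. This is only a sketch of an alternative, not what the notebook does; the sample values are copied from the rows above.

```python
import ast

# String forms as they appear in the hashtaglist and feelings columns.
raw_values = [
    "[u'blockchain', u'Tipper', u'TipperEconomy']",
    "[]",
    "['neutral']",
]

# literal_eval only evaluates Python literals, so it is safe on untrusted text.
parsed = [ast.literal_eval(value) for value in raw_values]
```

On a DataFrame column the same idea would be `tweet['hashtaglist'].apply(ast.literal_eval)`, applied before any brackets are stripped.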

In [92]:
tweet.head() # data looks OK now; we can ignore the u'' (unicode) prefixes

Unnamed: 0,created_at,text,screen_name,followers_count,favourite_count,hashtaglist,device_used,feelings
1,2018-03-23 00:40:34,@lopp @_Kevin_Pham @psycho_sage @naval But @Pr...,BitMocro,1295,0,u'Bitcoin',"<a href=""http://twitter.com/download/android"" ...",'neutral'
3,2018-03-23 00:40:36,free coins https://t.co/DiuoePJdap,denies_distro,2626,0,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",'positive'
5,2018-03-23 00:40:36,Copy successful traders automatically with Bit...,VictorS61164810,14,0,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",'positive'
7,2018-03-23 00:40:37,One click to start mining cryptocurrencies tog...,cloud_speaker,6560,0,"u'bitcoin', u'PaaS', u'cloudnetwork'","<a href=""http://itunes.apple.com/us/app/twitte...",'neutral'
8,2018-03-23 00:40:38,"first speaker @digitsu\n\n""how we can get bitc...",MADinMelbourne,2991,0,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",'positive'


## How Do People Feel? 

The feelings column was produced with the sentiment.polarity property of the TextBlob library, which scores the sentiment of the tweet text. 
A polarity below 0 is labelled negative, a polarity of exactly 0 is neutral, and anything above 0 is considered positive. 
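
The labelling rule can be sketched as a small function. `label_polarity` is a hypothetical name of my own; the labelling code used during collection is not shown in this notebook. With TextBlob installed, the score would come from `TextBlob(text).sentiment.polarity`, which ranges from -1.0 to 1.0.

```python
def label_polarity(polarity):
    """Map a polarity score to one of the three labels used in the feelings column."""
    if polarity < 0:
        return 'negative'
    if polarity == 0:
        return 'neutral'
    return 'positive'

# Example scores standing in for TextBlob(text).sentiment.polarity values.
labels = [label_polarity(score) for score in (-0.4, 0.0, 0.25)]
```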


In [93]:
feelings=tweet.groupby(['feelings'])['feelings'].count()

In [94]:
feelings.head()

feelings
'negative'     3177
'neutral'     10894
'positive'     8936
Name: feelings, dtype: int64

In [95]:
feelings.iplot(kind='bar',color="blue")

Neutral is the largest category, with 10,894 of 23,007 tweets (about 47%), followed by positive (about 39%) and negative (about 14%).
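
To put the counts in proportion, they can be turned into percentage shares. A small sketch that rebuilds the Series from the counts printed above rather than from the CSV:

```python
import pandas as pd

# Counts copied from the feelings output above.
counts = pd.Series(
    {"'negative'": 3177, "'neutral'": 10894, "'positive'": 8936},
    name='feelings',
)

# Share of each label as a percentage of all tweets, rounded to one decimal.
shares = (counts / counts.sum() * 100).round(1)
```

On the full DataFrame, `tweet['feelings'].value_counts(normalize=True)` would give the same proportions directly.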

## Tweets Over Time

To find this, we first convert created_at to the local time zone. I am in Singapore, so I add 8 hours to the UTC timestamps (UTC+8) to see the local time where I am. From there we create a column holding the hour of day and count the tweets in each hour. 

In [96]:
utc_offset = +8 #singapore time 
tweet['Local_Time']=tweet.created_at + pd.to_timedelta(utc_offset,unit='h')
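
Adding a fixed offset works for Singapore because it has no daylight saving time. An alternative is to let pandas do the conversion with a named time zone, which also handles zones where the offset changes during the year. A sketch on two made-up timestamps, assuming `created_at` is in UTC:

```python
import pandas as pd

# Two sample UTC timestamps; the second one rolls over to the next local day.
created = pd.Series(pd.to_datetime(['2018-03-23 00:40:34', '2018-03-23 18:05:00']))

# Fixed-offset approach used in this notebook.
fixed_offset = created + pd.to_timedelta(8, unit='h')

# Time-zone-aware approach: mark the values as UTC, then convert.
tz_aware = created.dt.tz_localize('UTC').dt.tz_convert('Asia/Singapore')

# Drop the zone information again to compare with the fixed-offset result.
local_naive = tz_aware.dt.tz_localize(None)

# Hour of day in local time.
hours = tz_aware.dt.hour
```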

In [97]:
tweet.head() # checking our new column 

Unnamed: 0,created_at,text,screen_name,followers_count,favourite_count,hashtaglist,device_used,feelings,Local_Time
1,2018-03-23 00:40:34,@lopp @_Kevin_Pham @psycho_sage @naval But @Pr...,BitMocro,1295,0,u'Bitcoin',"<a href=""http://twitter.com/download/android"" ...",'neutral',2018-03-23 08:40:34
3,2018-03-23 00:40:36,free coins https://t.co/DiuoePJdap,denies_distro,2626,0,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",'positive',2018-03-23 08:40:36
5,2018-03-23 00:40:36,Copy successful traders automatically with Bit...,VictorS61164810,14,0,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",'positive',2018-03-23 08:40:36
7,2018-03-23 00:40:37,One click to start mining cryptocurrencies tog...,cloud_speaker,6560,0,"u'bitcoin', u'PaaS', u'cloudnetwork'","<a href=""http://itunes.apple.com/us/app/twitte...",'neutral',2018-03-23 08:40:37
8,2018-03-23 00:40:38,"first speaker @digitsu\n\n""how we can get bitc...",MADinMelbourne,2991,0,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",'positive',2018-03-23 08:40:38


In [98]:
tweet['hour']=tweet['Local_Time'].dt.hour # extract the hour of day with the vectorised .dt accessor

In [99]:
tweet.head() #great we have the hour column

Unnamed: 0,created_at,text,screen_name,followers_count,favourite_count,hashtaglist,device_used,feelings,Local_Time,hour
1,2018-03-23 00:40:34,@lopp @_Kevin_Pham @psycho_sage @naval But @Pr...,BitMocro,1295,0,u'Bitcoin',"<a href=""http://twitter.com/download/android"" ...",'neutral',2018-03-23 08:40:34,8
3,2018-03-23 00:40:36,free coins https://t.co/DiuoePJdap,denies_distro,2626,0,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",'positive',2018-03-23 08:40:36,8
5,2018-03-23 00:40:36,Copy successful traders automatically with Bit...,VictorS61164810,14,0,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",'positive',2018-03-23 08:40:36,8
7,2018-03-23 00:40:37,One click to start mining cryptocurrencies tog...,cloud_speaker,6560,0,"u'bitcoin', u'PaaS', u'cloudnetwork'","<a href=""http://itunes.apple.com/us/app/twitte...",'neutral',2018-03-23 08:40:37,8
8,2018-03-23 00:40:38,"first speaker @digitsu\n\n""how we can get bitc...",MADinMelbourne,2991,0,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",'positive',2018-03-23 08:40:38,8


In [100]:
tweets_per_hour=tweet.groupby(['hour'])['text'].count().reset_index(name='tweets_count') # group by hour to get the number of tweets in each hour
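
A groupby only returns the hours that actually occur in the data, so an hour with no tweets would be missing from the plot rather than shown as zero. A small sketch, on made-up data, of filling in all 24 hours:

```python
import pandas as pd

# Made-up sample: two tweets at hour 8 and three at hour 15.
sample = pd.DataFrame({'hour': [8, 8, 15, 15, 15], 'text': list('abcde')})

# Same grouping as above, kept as a Series indexed by hour.
per_hour = sample.groupby('hour')['text'].count()

# Reindex to every hour of the day, using 0 where there were no tweets.
full_day = (
    per_hour.reindex(range(24), fill_value=0)
    .rename_axis('hour')
    .reset_index(name='tweets_count')
)
```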

In [101]:
tweets_per_hour.iplot(kind='scatter',x='hour',y='tweets_count',title="Tweets Per Hour",xTitle="Hour",yTitle="Tweet Count",color='green')

15:00 (3 pm local time) seems to be a popular hour for tweets.

## Tweet Mentions

A tweet mention is any tweet in which one user calls out to another. Here we flag tweets whose text starts with '@', so mentions that appear later in the text are not counted.

In [102]:
tweet['mention']=tweet['text'].str.startswith('@') #returns a boolean for all that match and adds new column
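
If mentions anywhere in the text should count, a regular expression can be used instead of `startswith`. This is a sketch of an alternative, not the method used for the charts in this notebook; the pattern and sample texts are my own.

```python
import pandas as pd

sample = pd.Series([
    '@lopp But what about fees?',
    'first speaker @digitsu on stage',
    'free coins',
    'mail me at user@example.com',
])

# Notebook approach: only tweets that begin with '@'.
starts_with_mention = sample.str.startswith('@')

# Alternative: an '@handle' anywhere, as long as the '@' does not follow a
# word character (which would make it look like an e-mail address).
contains_mention = sample.str.contains(r'(?<!\w)@\w+', regex=True)
```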

In [103]:
tweet.head() # notice new column 'mention'

Unnamed: 0,created_at,text,screen_name,followers_count,favourite_count,hashtaglist,device_used,feelings,Local_Time,hour,mention
1,2018-03-23 00:40:34,@lopp @_Kevin_Pham @psycho_sage @naval But @Pr...,BitMocro,1295,0,u'Bitcoin',"<a href=""http://twitter.com/download/android"" ...",'neutral',2018-03-23 08:40:34,8,True
3,2018-03-23 00:40:36,free coins https://t.co/DiuoePJdap,denies_distro,2626,0,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",'positive',2018-03-23 08:40:36,8,False
5,2018-03-23 00:40:36,Copy successful traders automatically with Bit...,VictorS61164810,14,0,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",'positive',2018-03-23 08:40:36,8,False
7,2018-03-23 00:40:37,One click to start mining cryptocurrencies tog...,cloud_speaker,6560,0,"u'bitcoin', u'PaaS', u'cloudnetwork'","<a href=""http://itunes.apple.com/us/app/twitte...",'neutral',2018-03-23 08:40:37,8,False
8,2018-03-23 00:40:38,"first speaker @digitsu\n\n""how we can get bitc...",MADinMelbourne,2991,0,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",'positive',2018-03-23 08:40:38,8,False


In [104]:
# keep only the tweets flagged as mentions, count them per account and take the top 20
mentions=tweet[tweet['mention']==True].groupby(['screen_name','mention'])['mention'].count().sort_values(ascending=False).reset_index(name='mentions').head(20)

In [105]:
mentions.iplot(kind='bar',x='screen_name',y='mentions',color='darkblue',xTitle='Tweet Account',yTitle="Mentions Count",title="Top 20 Tweeters by Mentions Made")

## Top 20 Tweeters with most followers

In [106]:
followers=tweet[['screen_name','followers_count']].sort_values(by='followers_count',ascending=False).drop_duplicates('screen_name').head(20)

In [107]:
followers

Unnamed: 0,screen_name,followers_count
50368,TechCrunch,10194437
37077,RT_com,2682400
6986,TheNextWeb,1849938
8140,nucfootball,1476307
49740,AFP,1441588
8887,nypost,1370648
42235,ARYNEWSOFFICIAL,1286473
16816,PopSci,1264695
14515,icokingmaker,1250852
32036,GeorgeMentz,1247009


Here we first build a DataFrame containing only screen_name and followers_count, sort it by followers_count in descending order (ascending=False) so the largest values come first, and then drop duplicate screen_name rows. Because drop_duplicates keeps the first occurrence, each account retains its highest follower count.  
We can now plot this. 
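
The order of the two steps matters: sorting first is what makes the surviving row per account the one with the highest count. A small sketch on made-up data, with a groupby-based alternative for comparison:

```python
import pandas as pd

# Made-up sample: account 'a' appears twice with different follower counts.
sample = pd.DataFrame({
    'screen_name': ['a', 'b', 'a', 'c'],
    'followers_count': [100, 50, 120, 70],
})

# Sort descending, then keep the first (largest) row for each account.
top = (
    sample.sort_values(by='followers_count', ascending=False)
    .drop_duplicates('screen_name')
)

# Alternative: take the maximum per account, then sort.
top_alt = (
    sample.groupby('screen_name')['followers_count']
    .max()
    .sort_values(ascending=False)
)
```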

In [108]:
followers.iplot(kind='bar',x='screen_name',xTitle='Tweet Account',yTitle='Number Of Followers', title="Top 20 With Most Followers",color='purple')

## Tweets Per User Per Hour

For this we can use a simple bubble chart with four encodings: the x axis is the hour of day, the y axis is the number of tweets, the bubble size is also the number of tweets, and the categories are the screen_name values.

In [109]:

tweet_per_account=tweet.groupby(['screen_name','hour'])['text'].count().sort_values(ascending=False).reset_index(name='num_of_tweets').head(20)

In [110]:
tweet_per_account.iplot(kind='bubble',x='hour',y='num_of_tweets',size='num_of_tweets',categories='screen_name',text='screen_name',xTitle="Hour of day",yTitle='Tweet Count',title='Tweets per User')

In [111]:
hashtags=tweet.groupby(['hashtaglist','hour'])['hour'].count().sort_values(ascending=False).reset_index(name='hashtag_count').head(30)

In [112]:
hashtags.head()

Unnamed: 0,hashtaglist,hour,hashtag_count
0,,15,2060
1,,9,1722
2,,10,1594
3,,11,1548
4,,13,1524


In [113]:
# skip the first 9 rows; the leading rows shown above have an empty hashtaglist
hashtags[9:].iplot(kind='bubble',x='hour',y='hashtag_count',size='hashtag_count',categories='hashtaglist')
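
Skipping rows by position depends on how many of the top rows happen to have no hashtags. A sketch of filtering by value instead, on made-up data, assuming that tweets without hashtags end up with an empty string in `hashtaglist` after the bracket clean-up:

```python
import pandas as pd

# Made-up sample: two tweets without hashtags, three with.
sample = pd.DataFrame({
    'hashtaglist': ['', "u'bitcoin'", '', "u'bitcoin'", "u'crypto'"],
    'hour': [15, 15, 9, 15, 9],
})

# Keep only the rows that actually carry at least one hashtag.
tagged = sample[sample['hashtaglist'].str.len() > 0]

# Same grouping as in the notebook, now without the empty group.
counts = (
    tagged.groupby(['hashtaglist', 'hour'])['hour']
    .count()
    .sort_values(ascending=False)
    .reset_index(name='hashtag_count')
)
```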