# Quick Analysis of Bitcoin Tweets with the Tweepy Streaming API, Pandas and Plotly


This notebook is a quick analysis of tweets I collected with Tweepy's streaming API. The collector tracks a given keyword (here, 'bitcoin') and writes selected fields from the Twitter JSON payload to a CSV file, which we then read into a pandas DataFrame for analysis.

[How Do People Feel?](#How-Do-People-Feel?)

[Tweets Over Time](#Tweets-Over-Time)

[Tweet Mentions](#Tweet-Mentions)

[Top 20 Tweeters with most followers](#Top-20-Tweeters-with-most-followers)

[Tweets Per User Per Hour](#Tweets-Per-User-Per-Hour)
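
The collection step described in the introduction is not part of this notebook. As a rough sketch of what "writing selected fields to a CSV" could look like, the snippet below flattens one tweet's JSON into a row using only the standard library. The helper name `tweet_to_row` and the sample tweet are hypothetical, Tweepy itself is not used, and the `feelings` column is left out because it is computed separately. The field names assume the Twitter v1.1 tweet object (which spells it `favorite_count`).

```python
import csv
import io
import json

# Hypothetical helper (not from the original collector): flatten one tweet,
# given as a Twitter v1.1 JSON string, into a list of selected fields.
def tweet_to_row(raw_json):
    status = json.loads(raw_json)
    user = status.get('user', {})
    hashtags = [h['text'] for h in status.get('entities', {}).get('hashtags', [])]
    return [
        status.get('created_at'),
        status.get('text'),
        user.get('screen_name'),
        user.get('followers_count'),
        status.get('favorite_count'),
        hashtags,                    # written to the CSV as a bracketed string
        status.get('source'),        # the client used to post the tweet
    ]

# Made-up sample tweet, for illustration only.
sample = json.dumps({
    'created_at': 'Wed Aug 27 13:08:45 +0000 2008',
    'text': 'Bitcoin is up #bitcoin',
    'user': {'screen_name': 'alice', 'followers_count': 42},
    'favorite_count': 0,
    'entities': {'hashtags': [{'text': 'bitcoin'}]},
    'source': 'Twitter Web Client',
})

buffer = io.StringIO()               # stands in for an open CSV file
csv.writer(buffer).writerow(tweet_to_row(sample))
print(buffer.getvalue())
```

Writing a Python list through `csv.writer` stores its string form, brackets included, which would explain why the hashtag column needs its square brackets stripped later on.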


In [5]:
#Setup the Environment by importing the libraries needed for this notebook

import sys
import os
import pandas as pd
import cufflinks as cf
import plotly as py
import plotly.graph_objs as go
import datetime
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot #offline plotting inside the notebook


In [6]:
init_notebook_mode(connected=True) #allow plotly graphs to work inside notebook

In [7]:
column_name=['created_at','text','screen_name','followers_count','favourite_count','hashtaglist','device_used','feelings']

In [8]:
bitcoin_tweets='../input/bitcointweets.csv'
tweet=pd.read_csv(bitcoin_tweets,parse_dates=['created_at'],encoding='utf-8',names=column_name) #parse dates

In [9]:
pd.set_option('display.max_columns',None)
pd.set_option('display.expand_frame_repr', False)

In [10]:
tweet.head()

In [11]:
tweet=tweet[~tweet['text'].str.contains(r'\bRT @',na=False)] #remove retweets: match the 'RT @user' marker, not any 'RT' substring (e.g. 'START')

In [12]:
tweet.head()

In [13]:
tweet['hashtaglist']=tweet['hashtaglist'].str.strip('[]') #cleanup square brackets
tweet['feelings']=tweet['feelings'].str.strip('[]')

In [14]:
tweet.head() #data looks ok now. we can ignore the utf characters

## How Do People Feel? 

The feelings column was produced with the sentiment.polarity property of the TextBlob library, which scores the sentiment of the tweet text. 
A polarity below 0 is negative, a polarity of exactly 0 is neutral, and a polarity above 0 is positive. 
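
The thresholds above can be sketched as a small labelling function. The name `label_polarity` is hypothetical and this is not necessarily how the feelings column was built. The TextBlob call appears only in a comment so the snippet runs without the library installed.

```python
# Hypothetical helper: map a polarity score in [-1.0, 1.0] to a label
# using the thresholds described above.
def label_polarity(polarity):
    if polarity < 0:
        return 'negative'
    if polarity == 0:
        return 'neutral'
    return 'positive'

# With TextBlob installed, the score for a tweet could be obtained with:
#   from textblob import TextBlob
#   polarity = TextBlob(text).sentiment.polarity
for score in (-0.4, 0.0, 0.25):
    print(score, label_polarity(score))
```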


In [19]:
feelings=tweet.groupby(['feelings'])['feelings'].count().reset_index(name='feelings_count')

In [20]:
feelings.head()

In [23]:
#feelings.iplot(kind='bar',color="blue")
iplot([go.Bar(x=feelings['feelings'],y=feelings['feelings_count'])])

Overall, most tweets appear to fall into the neutral category.

## Tweets Over Time

To find this, we first convert created_at to the local time zone. I am in Singapore, so I add 8 hours to UTC to see each tweet in my local time. From there we add a column holding the hour of day and count the tweets in each hour. 

In [24]:
utc_offset = +8 #singapore time 
tweet['Local_Time']=tweet.created_at + pd.to_timedelta(utc_offset,unit='h')
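
As an alternative to adding a fixed offset, pandas can perform the conversion with time zone names. The sketch below assumes the timestamps are naive (no time zone attached) and represent UTC; the sample values are made up.

```python
import pandas as pd

# Two made-up UTC timestamps standing in for the created_at column.
created_at = pd.Series(pd.to_datetime(['2018-03-23 00:30:00',
                                       '2018-03-23 17:45:00']))

# Mark the naive timestamps as UTC, then convert to Singapore time (UTC+8).
local_time = created_at.dt.tz_localize('UTC').dt.tz_convert('Asia/Singapore')

# Hour of day in local time.
hours = local_time.dt.hour
print(hours.tolist())  # [8, 1]
```

Using a named zone keeps the offset correct for locations that observe daylight saving time, which a hard-coded number of hours would not.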

In [25]:
tweet.head() # checking our new column 

In [26]:
tweet['hour']=tweet['Local_Time'].dt.hour #hour of day (0-23) in local time

In [27]:
tweet.head() #great we have the hour column

In [28]:
tweets_per_hour=tweet.groupby(['hour'])['text'].count().reset_index(name='tweets_count') # grouped to get the total number of tweets in each hour of the day

In [33]:
#tweets_per_hour.iplot(kind='scatter',x='hour',y='tweets_count',title="Tweets Per Hour",xTitle="Hour",yTitle="Tweet Count",color='green')
iplot([go.Scatter(x=tweets_per_hour['hour'],y=tweets_per_hour['tweets_count'])])

15:00 (3 pm) local time seems to be a popular hour for tweets.

## Tweet Mentions

A tweet mention is a tweet in which one user calls out to another. Here we only look for tweet text that starts with '@', so mentions that appear later in the text are not counted.
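
For comparison, here is a small sketch that pulls out every handle mentioned anywhere in a tweet, not only at the start. The helper `extract_mentions` and the pattern are illustrative assumptions, not part of the original analysis; the pattern treats any run of word characters after '@' as a handle and skips '@' signs inside e-mail addresses.

```python
import re

# '@' must not be preceded by a word character, which rules out
# addresses such as me@example.com.
MENTION_RE = re.compile(r'(?<!\w)@(\w+)')

# Hypothetical helper: return all handles mentioned in a tweet's text.
def extract_mentions(text):
    return MENTION_RE.findall(text)

print(extract_mentions('Thanks @alice and @bob_1 for the tip'))
print(extract_mentions('mail me at me@example.com'))
```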

In [35]:
tweet['mention']=tweet['text'].str.startswith('@') #returns a boolean for all that match and adds new column

In [36]:
tweet.head() # notice new column 'mention'

In [37]:
# keep only tweets flagged as mentions, count them per account and take the top 20
mentions=tweet[tweet['mention']==True].groupby(['screen_name','mention'])['mention'].count().sort_values(ascending=False).reset_index(name='mentions').head(20)

In [40]:
#mentions.iplot(kind='bar',x='screen_name',y='mentions',color='darkblue',xTitle='Tweet Account',yTitle="Mentions Count",title="Top 20 Tweeters that have mentioned")
iplot([go.Bar(x=mentions['screen_name'],y=mentions['mentions'])])

## Top 20 Tweeters with most followers

In [41]:
followers=tweet[['screen_name','followers_count']].sort_values(by='followers_count',ascending=False).drop_duplicates('screen_name').head(20)

In [42]:
followers

Here we first created a dataframe containing only screen_name and followers_count, sorted it by followers_count in descending order so that the largest number is at the top, and then dropped duplicate screen_name rows so that each account appears once with its highest follower count.  
We can now plot this.

In [43]:
#followers.iplot(kind='bar',x='screen_name',xTitle='Tweet Account',yTitle='Number Of Followers', title="Top 20 With Most Followers",color='purple')
iplot([go.Bar(x=followers['screen_name'],y=followers['followers_count'])])
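
The sort-then-drop-duplicates step used in this section can also be written as a group-wise maximum followed by `nlargest`. The sketch below uses made-up sample data and shows one possible equivalent, not the notebook's own method.

```python
import pandas as pd

# Made-up sample: account 'a' appears twice with different follower counts.
sample = pd.DataFrame({
    'screen_name': ['a', 'b', 'a', 'c'],
    'followers_count': [10, 50, 12, 7],
})

# Highest follower count seen per account, then the top 2 accounts.
top = (sample.groupby('screen_name')['followers_count']
             .max()
             .nlargest(2)
             .reset_index())
print(top)
```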

## Tweets Per User Per Hour

For this we can use a simple grouped bar chart with three dimensions: the x axis is the hour of day, the y axis is the number of tweets, and the legend shows the Twitter accounts.

In [46]:

tweet_per_account=tweet.groupby(['screen_name','hour'])['text'].count().sort_values(ascending=False).reset_index(name='num_of_tweets').head(20)

In [51]:
tweet_per_account.head()

In [88]:
accounts=list(set(tweet_per_account.screen_name))

In [89]:
traces=[]

In [90]:
for account in accounts:
    df=tweet_per_account[tweet_per_account['screen_name'].isin([account])]
    traces.append(go.Bar(x=df['hour'],y=df.num_of_tweets,name=account))

In [91]:
iplot({'data': traces,'layout':go.Layout(title='Tweets per account',barmode='group',xaxis={'tickangle': 30},margin={'b': 100})}) #valid barmode values are 'stack', 'group', 'overlay' and 'relative'
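
To inspect the same numbers as a table, the per-account counts can be pivoted so that each account becomes a column and each row an hour. This is a sketch on made-up data with the same column names as tweet_per_account; hours in which an account did not tweet are filled with 0.

```python
import pandas as pd

# Made-up sample shaped like tweet_per_account.
sample = pd.DataFrame({
    'screen_name': ['a', 'a', 'b'],
    'hour': [9, 15, 15],
    'num_of_tweets': [4, 2, 7],
})

# One row per hour, one column per account.
wide = sample.pivot(index='hour', columns='screen_name',
                    values='num_of_tweets').fillna(0)
print(wide)
```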