In [27]:
%matplotlib inline
import matplotlib.pyplot as plt

import seaborn as sns
from IPython import display

import pandas as pd
import twitter

A basic Twitter grab: pull a user's tweets through the API and load them into a pandas dataframe.

## make a twitter dev account and get api keys

First, we need access to the twitter api, which one gets over at [twitter's dev site](https://dev.twitter.com/). Sign up as a dev, then [go to the twitter apps site](https://apps.twitter.com/) and click create a new app. This gives you four, yes four, thingamajigs you need to access the API: a consumer key, a consumer secret, an access token and an access token secret. Why four? Why can't it just be one thing?

Now this notebook is in github, so step 1 is to put all four of the secret codes in a file which doesn't get uploaded to github (for example, by listing it in `.gitignore`). Python has a [built-in module called configparser](https://docs.python.org/3/library/configparser.html) which parses config files, so I have a `config.ini` text file which looks like:

```
[twitter]

c_key = this_is_a_fake_to_be_replaced_by_real_thingamajig
c_secret = this_is_a_fake_to_be_replaced_by_real_thingamajig 

a_token = this_is_a_fake_to_be_replaced_by_real_thingamajig
a_secret = this_is_a_fake_to_be_replaced_by_real_thingamajig
```

### Now to read the keys into our python script/notebook

In [17]:
# api keys are in config.ini to keep them outside of this public notebook
import configparser
config = configparser.ConfigParser()
config.read('config.ini')

print(f'The config file has the following sections: {config.sections()}')

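if "twitter" in config:
    twit = config['twitter']
else:
    # fail early instead of hitting a NameError on `twit` further down
    raise KeyError("config.ini has no [twitter] section")

# check to see if we got all the keys needed to access the twitter api
list(twit)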

The config file has the following sections: ['twitter']


['c_key', 'c_secret', 'a_token', 'a_secret']
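
Eyeballing the list works for four keys, but the same check can be automated. The following is a minimal sketch, assuming the key names from the `config.ini` shown earlier; `REQUIRED_KEYS` and `missing_twitter_keys` are hypothetical names made up for this illustration, and the demo config is built in memory so no real secrets are involved.

```python
import configparser

# Key names taken from the config.ini layout shown earlier (hypothetical constant)
REQUIRED_KEYS = ("c_key", "c_secret", "a_token", "a_secret")


def missing_twitter_keys(config, section="twitter"):
    """Return the required key names that are absent or empty in `section`."""
    if section not in config:
        return list(REQUIRED_KEYS)
    values = config[section]
    return [key for key in REQUIRED_KEYS if not values.get(key, "").strip()]


# Demo on an in-memory config that deliberately leaves out a_secret
demo = configparser.ConfigParser()
demo.read_string("""
[twitter]
c_key = fake
c_secret = fake
a_token = fake
""")
print(missing_twitter_keys(demo))  # ['a_secret']
```

With the real file, calling `missing_twitter_keys(config)` after `config.read('config.ini')` should return an empty list when all four keys are filled in.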

## using python to access the twitter api

Now, there are many [twitter api libraries](https://dev.twitter.com/resources/twitter-libraries), but I'm using the [python-twitter module](https://github.com/bear/python-twitter), just because it seems popular and is the first one listed under python libraries.

In [26]:
## define the necessary keys
cKey = twit["c_key"]
cSecret = twit["c_secret"]
aKey = twit["a_token"]
aSecret = twit["a_secret"]

## create the api object with the python-twitter library
api = twitter.Api(consumer_key=cKey,
                  consumer_secret=cSecret,
                  access_token_key=aKey,
                  access_token_secret=aSecret)
api.VerifyCredentials()

User(ID=7914, ScreenName=KO)

All right! We have a successful api connection to twitter!

### get tweets from a user

This grabs the tweets along with a bunch of metadata for each tweet:

In [111]:
## get the user timeline with screen_name = 'KO'
statuses = api.GetUserTimeline(screen_name = 'KO')
print(f"so we got {len(statuses)} statuses, printing the first:")
status = statuses[0]
status

so we got 20 statuses, printing the first:


Status(ID=895087330561675264, ScreenName=KO, Created=Wed Aug 09 01:00:24 +0000 2017, Text='@ahmadomar55 @abido \n\nyour iPhone is about get very dated: https://t.co/sxH8Dc1Ev1 https://t.co/knpDHcHtEh')

So each status is an instance of the [class representing the twitter status object](http://python-twitter.readthedocs.io/en/latest/twitter.html#twitter.models.Status).

Now, the status object can be returned as a dictionary, which is handy since we can use that to build a pandas dataframe:

In [109]:
## create a data frame
## first get a list of pandas Series
pdSeriesList = [pd.Series(t.AsDict()) for t in statuses]

## then create the data frame
data = pd.DataFrame(pdSeriesList)

data.head(2)

Unnamed: 0,created_at,favorite_count,favorited,hashtags,id,id_str,in_reply_to_screen_name,in_reply_to_status_id,in_reply_to_user_id,lang,...,quoted_status_id,quoted_status_id_str,retweet_count,retweeted,retweeted_status,source,text,urls,user,user_mentions
0,Sat Aug 05 01:08:43 +0000 2017,,,[],893639875223666688,893639875223666688,sharmeenalikhan,8.935853e+17,321927176.0,en,...,,,,,,"<a href=""http://twitter.com/download/android"" ...",@sharmeenalikhan That's how I felt when the Mo...,[],{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...,"[{'id': 321927176, 'name': 'sharmeen', 'screen..."
1,Sat Aug 05 01:08:04 +0000 2017,,True,[],893639709515108353,893639709515108353,,,,en,...,8.936198e+17,8.936197627601674e+17,3.0,True,{'created_at': 'Fri Aug 04 23:50:35 +0000 2017...,"<a href=""http://twitter.com/download/android"" ...",RT @shakirhusain: Take a bow @afewmofilms http...,[{'expanded_url': 'https://twitter.com/godfath...,{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...,"[{'id': 88913700, 'name': 'Shakir Husain', 'sc..."


In [103]:
data.columns

Index(['created_at', 'favorite_count', 'favorited', 'hashtags', 'id', 'id_str',
       'in_reply_to_screen_name', 'in_reply_to_status_id',
       'in_reply_to_user_id', 'lang', 'quoted_status', 'quoted_status_id',
       'quoted_status_id_str', 'retweet_count', 'retweeted',
       'retweeted_status', 'source', 'text', 'truncated', 'urls', 'user',
       'user_mentions'],
      dtype='object')
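
One thing to note about these columns: `created_at` comes back as a plain string, not a datetime. A minimal sketch of parsing it with the standard library, assuming every timestamp follows the format seen in the outputs above (`TWITTER_TIME_FORMAT` and `parse_created_at` are hypothetical names, and `%a`/`%b` assume an English locale):

```python
from datetime import datetime

# Format inferred from strings like 'Wed Aug 09 01:00:24 +0000 2017' (an assumption)
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(text):
    """Turn a created_at string into a timezone-aware datetime."""
    return datetime.strptime(text, TWITTER_TIME_FORMAT)


created = parse_created_at("Wed Aug 09 01:00:24 +0000 2017")
print(created.isoformat())  # 2017-08-09T01:00:24+00:00
```

For the whole column, something like `pd.to_datetime(data["created_at"], format=TWITTER_TIME_FORMAT)` should do the same job in one go.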

## grabbing more tweets

See [twitter timeline doc](https://dev.twitter.com/rest/public/timelines) - this says you can grab at most 200 tweets in one request, for a max of 3,200 tweets altogether.

Now we only grabbed the first 20 tweets with the above, so we need a function which keeps making requests for older tweets until we hit twitter's 3,200-tweet limit:

In [161]:
def get_tweets(user="KO", limit=50):
    # initial batch of tweets
    statuses = api.GetUserTimeline(screen_name=user, count=limit)

    ## create a data frame
    ## first get a list of pandas Series
    pdSeriesList = [pd.Series(t.AsDict()) for t in statuses]

    ## then create the data frame
    tweets = pd.DataFrame(pdSeriesList)

    # now to grab the older ones, until a batch comes back with fewer than 20 tweets
    while len(statuses) >= 20:
        # get the last tweet id and subtract one to make sure we don't get a duplicate tweet
        last_tweet_id = int(tweets["id"].iloc[-1]) - 1
        statuses = api.GetUserTimeline(screen_name=user, max_id=last_tweet_id, count=limit)

        pdSeriesList = [pd.Series(t.AsDict()) for t in statuses]
        # DataFrame.append was removed in pandas 2.0, so stack the new batch with pd.concat
        tweets = pd.concat([tweets, pd.DataFrame(pdSeriesList)], ignore_index=True)

    return tweets

tweets = get_tweets()

In [162]:
print(tweets.shape)
tweets.head()

(3204, 23)


Unnamed: 0,created_at,favorite_count,favorited,hashtags,id,id_str,in_reply_to_screen_name,in_reply_to_status_id,in_reply_to_user_id,lang,...,quoted_status_id_str,retweet_count,retweeted,retweeted_status,source,text,truncated,urls,user,user_mentions
0,Wed Aug 09 01:00:24 +0000 2017,,,[],895087330561675264,895087330561675264,ahmadomar55,,259198715.0,en,...,,,,,"<a href=""http://itunes.apple.com/us/app/twitte...",@ahmadomar55 @abido \n\nyour iPhone is about g...,,[{'expanded_url': 'http://appleinsider.com/art...,{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...,"[{'id': 259198715, 'name': 'AO', 'screen_name'..."
1,Tue Aug 08 04:08:05 +0000 2017,,True,[{'text': 'longreads'}],894772174766174208,894772174766174208,,,,en,...,,57.0,True,{'created_at': 'Tue Aug 08 03:27:09 +0000 2017...,"<a href=""http://nuzzel.com/"" rel=""nofollow"">Nu...","RT @NickBryantNY: ""How America lost its mind"" ...",,[{'expanded_url': 'https://www.theatlantic.com...,{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...,"[{'id': 289236003, 'name': 'Nick Bryant', 'scr..."
2,Tue Aug 08 04:02:25 +0000 2017,,,[],894770750716084225,894770750716084225,,,,en,...,8.944536732329286e+17,4.0,True,{'created_at': 'Mon Aug 07 07:09:43 +0000 2017...,"<a href=""http://nuzzel.com/"" rel=""nofollow"">Nu...",RT @shakirhusain: This is an excellent piece. ...,,[{'expanded_url': 'https://twitter.com/titojou...,{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...,"[{'id': 88913700, 'name': 'Shakir Husain', 'sc..."
3,Tue Aug 08 03:42:24 +0000 2017,,,[],894765713004478465,894765713004478465,,,,en,...,,1504.0,True,{'created_at': 'Tue Aug 08 00:58:05 +0000 2017...,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",RT @KrangTNelson: Henry Kissinger is an evil m...,,[],{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...,"[{'id': 4517565515, 'name': 'LT. COLONEL KRANG..."
4,Tue Aug 08 01:04:29 +0000 2017,,,[],894725970669613058,894725970669613058,,,,en,...,,42.0,True,{'created_at': 'Fri Jul 28 13:43:12 +0000 2017...,"<a href=""http://itunes.apple.com/us/app/twitte...",RT @ZachJCarter: Placing sanctions on Iran for...,,[],{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...,"[{'id': 755826044225986560, 'name': 'Zach Cart..."
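
The `max_id` paging idea is easier to see without the network in the way. Here is a rough sketch that runs against a fake timeline: `fetch_page` is a hypothetical stand-in for `api.GetUserTimeline` (newest first, only ids at or below `max_id`), and `fetch_all` stops on the first empty page, which is a slightly different stopping rule from the "fewer than 20" check in `get_tweets`.

```python
def fetch_page(timeline, max_id=None, count=200):
    """Hypothetical stand-in for api.GetUserTimeline.

    Returns up to `count` tweets, newest first, whose id is <= max_id.
    """
    eligible = [t for t in timeline if max_id is None or t["id"] <= max_id]
    return eligible[:count]


def fetch_all(timeline, count=200):
    """Keep requesting older pages until a request comes back empty."""
    collected = []
    max_id = None
    while True:
        page = fetch_page(timeline, max_id=max_id, count=count)
        if not page:
            break
        collected.extend(page)
        # oldest id seen so far, minus one, so the next page has no duplicate
        max_id = page[-1]["id"] - 1
    return collected


# Fake timeline of 450 tweets with ids 450 (newest) down to 1 (oldest)
fake_timeline = [{"id": i} for i in range(450, 0, -1)]
tweets_demo = fetch_all(fake_timeline)
print(len(tweets_demo), tweets_demo[0]["id"], tweets_demo[-1]["id"])  # 450 450 1
```

Here 450 tweets take three non-empty requests of 200, 200 and 50, plus one final empty request that ends the loop.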


## we got tweets in a dataframe! 

In [220]:
t = tweets['text'].tolist()
t[:3]

['@ahmadomar55 @abido \n\nyour iPhone is about get very dated: https://t.co/sxH8Dc1Ev1 https://t.co/knpDHcHtEh',
 'RT @NickBryantNY: "How America lost its mind" - brilliant essay by Kurt Andersen #longreads @TheAtlantic  https://t.co/OC2qTYITND',
 'RT @shakirhusain: This is an excellent piece. Forgot to mention private doctors who operate on cash only. https://t.co/s2laWXIWMr']
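
As a first "do something" with the text, here is a small sketch that flags retweets and pulls out @-mentions from the raw string alone. It is only a text heuristic (the dataframe already carries `retweeted_status` and `user_mentions` columns, which are the more reliable source), and `MENTION_PATTERN` and `summarize_tweet` are hypothetical names invented for this example.

```python
import re

# Matches @handles made of letters, digits and underscores
MENTION_PATTERN = re.compile(r"@(\w+)")


def summarize_tweet(text):
    """Guess from the text alone whether it is a retweet and who is mentioned."""
    return {
        "is_retweet": text.startswith("RT @"),
        "mentions": MENTION_PATTERN.findall(text),
    }


sample = ('RT @NickBryantNY: "How America lost its mind" - brilliant essay by '
          'Kurt Andersen #longreads @TheAtlantic  https://t.co/OC2qTYITND')
print(summarize_tweet(sample))
# {'is_retweet': True, 'mentions': ['NickBryantNY', 'TheAtlantic']}
```

Mapping it over the list, for example `[summarize_tweet(text) for text in t]`, gives one summary per tweet.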