## Notebook 1 - Get Tweets using GetOldTweets
The purpose of this notebook is to gather the tweets. Scott Tarlow, a contact shared with me by my advisor Joshua Cook, directed me to the GetOldTweets-python repo (https://github.com/Jefferson-Henrique/GetOldTweets-python) to circumvent the limits of the Twitter API. Here are some of the tweaks I made to GetOldTweets to tailor it to my needs:  
    
**Problem:**  
The functionality was written for Python2 and I'm using Python3.  

**Fix:**  
Replaced the `urllib2` references with their `urllib` equivalents in the `getJsonReponse` function within `TweetManager.py`.   
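
For context, here is a minimal sketch of the kind of substitution involved. In Python 3 the functionality of `urllib2` lives in `urllib.request` and `urllib.parse`. The URL, query, and header values below are placeholders for illustration, not the ones `TweetManager.py` actually uses.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Placeholder values for illustration only.
base_url = 'https://example.com/search'
params = {'q': 'from:sfspca since:2009-03-01 until:2009-04-01'}
headers = [('User-Agent', 'Mozilla/5.0')]

# Python 2: urllib.urlencode  ->  Python 3: urllib.parse.urlencode
full_url = base_url + '?' + urllib.parse.urlencode(params)

# Python 2: cookielib.CookieJar  ->  Python 3: http.cookiejar.CookieJar
cookie_jar = http.cookiejar.CookieJar()

# Python 2: urllib2.build_opener and urllib2.HTTPCookieProcessor
# Python 3: urllib.request.build_opener and urllib.request.HTTPCookieProcessor
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookie_jar))
opener.addheaders = headers

# opener.open(full_url) would send the request; it is left out here so the
# sketch runs without network access.
print(full_url)
```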
    
**Problem:**  
I also discovered that retrieval time seemed to grow exponentially with the number of tweets requested in a single query.  

**Fix:**  
Added functionality to pull tweets one month at a time. This takes about 1 minute per 500 tweets, compared to about 100 minutes per 500 tweets before.   

**Problem:**  
Originally, I wasn't getting tweets from the last day of each month because the end date is not inclusive.  

**Fix:**  
With the exception of the very first month, each month's query starts on the end date of the previous month's query, so the last day of each month is picked up by the next query.  
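
Another way to handle the non-inclusive end date is to end each window on the first day of the following month. This is a sketch of an alternative, not what this notebook does. The helper name `month_windows` is hypothetical, and the sketch assumes `since` is inclusive while `until` is exclusive. Under that assumption every day is covered exactly once, including February 29 in leap years.

```python
from datetime import date


def month_windows(first_year, last_year):
    """Yield (since, until) ISO date strings, one pair per calendar month.

    `until` is the first day of the following month, so a search that
    treats it as exclusive still includes the last day of the month.
    """
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            since = date(year, month, 1)
            if month == 12:
                until = date(year + 1, 1, 1)
            else:
                until = date(year, month + 1, 1)
            yield since.isoformat(), until.isoformat()


windows = list(month_windows(2008, 2017))
print(windows[0])   # first window
print(windows[-1])  # last window
```

Each `(since, until)` pair could then be passed to `setSince` and `setUntil` in place of the hard-coded month and day strings.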
  
**Problem:**  
Tweets sent as replies to other tweets were being included. I only want original tweets.  
  
**Fix:**  
I added a check to the `getTweets` function within `TweetManager.py` that skips replies.  
  
**Problem:**  
The original code references a section of the Twitter API response that no longer exists. This led to an empty username column in the data frame.  
  
**Fix:**  
I found where the username now sits in the response and updated the code accordingly.

In [1]:
cd /home/jovyan/capstone/GetOldTweets-python/

/home/jovyan/capstone/GetOldTweets-python


In [2]:
import pandas as pd

In [3]:
import got3 as got

# def get_tweets(username, since=None, until=None):
def get_tweets(username):
    from collections import defaultdict
    d = defaultdict(list)
    
    attributes = ['date', 'favorites', 'retweets', 'hashtags', 'id', 'mentions',
                  'text', 'urls', 'username', 'permalink', 'author_id']

    # created to efficiently pull tweets one month at a time    
#     years = ['2017', '2016', '2015', '2014', '2013', '2012', '2011', '2010', '2009', '2008']
    years = ['2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017']
#     months = [('12', '31'), ('11', '30'), ('10', '31'), ('09', '30'), ('08', '31'), ('07', '31'),
#               ('06', '30'), ('05', '31'), ('04', '30'), ('03', '31'), ('02', '28'), ('01', '31')]
    months = [('01', '31'), ('02', '28'), ('03', '31'), ('04', '30'), ('05', '31'), ('06', '30'),
              ('07', '31'), ('08', '31'), ('09', '30'), ('10', '31'), ('11', '30'), ('12', '31')]
    total = 0
    for year in years:
        for month in months:
            # account for non-inclusive end of month by starting where previous month left off
            if year == '2008' and month[0] == '01':
                since = year + '-' + month[0] + '-01'
            else:
                since = until
            until = year + '-' + month[0] + '-' + month[1]
            tweetCriteria = got.manager.TweetCriteria().setUsername(username).setSince(since).setUntil(until)
            tweet_list = got.manager.TweetManager.getTweets(tweetCriteria)                                                                                             
            print('{} tweets from {} to {}.'.format(len(tweet_list), since, until))
            total += len(tweet_list)
            for tweet in tweet_list:
                for att in attributes:
                    d[att].append(getattr(tweet, att))
#         print('Year: {}, Running Total: {}'.format(year, total))
    print('{} total tweets for {}.'.format(total, username))
    return pd.DataFrame(d)

In [4]:
def get_and_export(filename, handle):
    data = get_tweets(str(handle))
    data.to_pickle('../data/' + str(filename) + '_tweets.p')

In [5]:
# list of tuples containing filename for pickle and twitter handle
all_spcas = [('sfspca', 'sfspca'), ('pspca', 'PSPCA'), ('houston', 'HoustonSPCA'), ('texas', 'spcaoftexas'),
            ('tulsa', 'Tulsa_SPCA'), ('richmond', 'RichmondSPCA'), ('ontario', 'OntarioSPCA'), 
            ('alberta', 'FMSPCA'), ('bc', 'BC_SPCA')]

In [6]:
count = 0
for filename, handle in all_spcas:
    count += 1
    print('Pulling {} of {}: {}'.format(count, len(all_spcas), handle))
    get_and_export(filename, handle)

Pulling 1 of 9: sfspca
0 tweets from 2008-01-01 to 2008-01-31.
0 tweets from 2008-01-31 to 2008-02-28.
0 tweets from 2008-02-28 to 2008-03-31.
0 tweets from 2008-03-31 to 2008-04-30.
0 tweets from 2008-04-30 to 2008-05-31.
0 tweets from 2008-05-31 to 2008-06-30.
0 tweets from 2008-06-30 to 2008-07-31.
0 tweets from 2008-07-31 to 2008-08-31.
0 tweets from 2008-08-31 to 2008-09-30.
0 tweets from 2008-09-30 to 2008-10-31.
0 tweets from 2008-10-31 to 2008-11-30.
0 tweets from 2008-11-30 to 2008-12-31.
0 tweets from 2008-12-31 to 2009-01-31.
0 tweets from 2009-01-31 to 2009-02-28.
21 tweets from 2009-02-28 to 2009-03-31.
21 tweets from 2009-03-31 to 2009-04-30.
24 tweets from 2009-04-30 to 2009-05-31.
95 tweets from 2009-05-31 to 2009-06-30.
26 tweets from 2009-06-30 to 2009-07-31.
51 tweets from 2009-07-31 to 2009-08-31.
54 tweets from 2009-08-31 to 2009-09-30.
62 tweets from 2009-09-30 to 2009-10-31.
36 tweets from 2009-10-31 to 2009-11-30.
50 tweets from 2009-11-30 to 2009-12-31.
14 twee

6 tweets from 2014-08-31 to 2014-09-30.
13 tweets from 2014-09-30 to 2014-10-31.
26 tweets from 2014-10-31 to 2014-11-30.
81 tweets from 2014-11-30 to 2014-12-31.
66 tweets from 2014-12-31 to 2015-01-31.
48 tweets from 2015-01-31 to 2015-02-28.
10 tweets from 2015-02-28 to 2015-03-31.
16 tweets from 2015-03-31 to 2015-04-30.
20 tweets from 2015-04-30 to 2015-05-31.
39 tweets from 2015-05-31 to 2015-06-30.
19 tweets from 2015-06-30 to 2015-07-31.
48 tweets from 2015-07-31 to 2015-08-31.
17 tweets from 2015-08-31 to 2015-09-30.
39 tweets from 2015-09-30 to 2015-10-31.
115 tweets from 2015-10-31 to 2015-11-30.
114 tweets from 2015-11-30 to 2015-12-31.
88 tweets from 2015-12-31 to 2016-01-31.
84 tweets from 2016-01-31 to 2016-02-28.
117 tweets from 2016-02-28 to 2016-03-31.
117 tweets from 2016-03-31 to 2016-04-30.
107 tweets from 2016-04-30 to 2016-05-31.
102 tweets from 2016-05-31 to 2016-06-30.
115 tweets from 2016-06-30 to 2016-07-31.
110 tweets from 2016-07-31 to 2016-08-31.
96 tweets

9 tweets from 2011-03-31 to 2011-04-30.
11 tweets from 2011-04-30 to 2011-05-31.
7 tweets from 2011-05-31 to 2011-06-30.
2 tweets from 2011-06-30 to 2011-07-31.
4 tweets from 2011-07-31 to 2011-08-31.
1 tweets from 2011-08-31 to 2011-09-30.
14 tweets from 2011-09-30 to 2011-10-31.
23 tweets from 2011-10-31 to 2011-11-30.
23 tweets from 2011-11-30 to 2011-12-31.
23 tweets from 2011-12-31 to 2012-01-31.
122 tweets from 2012-01-31 to 2012-02-28.
9 tweets from 2012-02-28 to 2012-03-31.
18 tweets from 2012-03-31 to 2012-04-30.
52 tweets from 2012-04-30 to 2012-05-31.
123 tweets from 2012-05-31 to 2012-06-30.
147 tweets from 2012-06-30 to 2012-07-31.
193 tweets from 2012-07-31 to 2012-08-31.
210 tweets from 2012-08-31 to 2012-09-30.
181 tweets from 2012-09-30 to 2012-10-31.
127 tweets from 2012-10-31 to 2012-11-30.
88 tweets from 2012-11-30 to 2012-12-31.
110 tweets from 2012-12-31 to 2013-01-31.
110 tweets from 2013-01-31 to 2013-02-28.
102 tweets from 2013-02-28 to 2013-03-31.
133 tweets f

0 tweets from 2008-01-01 to 2008-01-31.
0 tweets from 2008-01-31 to 2008-02-28.
0 tweets from 2008-02-28 to 2008-03-31.
0 tweets from 2008-03-31 to 2008-04-30.
0 tweets from 2008-04-30 to 2008-05-31.
0 tweets from 2008-05-31 to 2008-06-30.
0 tweets from 2008-06-30 to 2008-07-31.
0 tweets from 2008-07-31 to 2008-08-31.
0 tweets from 2008-08-31 to 2008-09-30.
0 tweets from 2008-09-30 to 2008-10-31.
0 tweets from 2008-10-31 to 2008-11-30.
0 tweets from 2008-11-30 to 2008-12-31.
0 tweets from 2008-12-31 to 2009-01-31.
38 tweets from 2009-01-31 to 2009-02-28.
49 tweets from 2009-02-28 to 2009-03-31.
35 tweets from 2009-03-31 to 2009-04-30.
38 tweets from 2009-04-30 to 2009-05-31.
43 tweets from 2009-05-31 to 2009-06-30.
42 tweets from 2009-06-30 to 2009-07-31.
32 tweets from 2009-07-31 to 2009-08-31.
33 tweets from 2009-08-31 to 2009-09-30.
43 tweets from 2009-09-30 to 2009-10-31.
43 tweets from 2009-10-31 to 2009-11-30.
44 tweets from 2009-11-30 to 2009-12-31.
46 tweets from 2009-12-31 to 

369 tweets from 2014-07-31 to 2014-08-31.
302 tweets from 2014-08-31 to 2014-09-30.
304 tweets from 2014-09-30 to 2014-10-31.
286 tweets from 2014-10-31 to 2014-11-30.
288 tweets from 2014-11-30 to 2014-12-31.
294 tweets from 2014-12-31 to 2015-01-31.
288 tweets from 2015-01-31 to 2015-02-28.
300 tweets from 2015-02-28 to 2015-03-31.
156 tweets from 2015-03-31 to 2015-04-30.
155 tweets from 2015-04-30 to 2015-05-31.
117 tweets from 2015-05-31 to 2015-06-30.
200 tweets from 2015-06-30 to 2015-07-31.
224 tweets from 2015-07-31 to 2015-08-31.
249 tweets from 2015-08-31 to 2015-09-30.
253 tweets from 2015-09-30 to 2015-10-31.
233 tweets from 2015-10-31 to 2015-11-30.
253 tweets from 2015-11-30 to 2015-12-31.
239 tweets from 2015-12-31 to 2016-01-31.
247 tweets from 2016-01-31 to 2016-02-28.
262 tweets from 2016-02-28 to 2016-03-31.
248 tweets from 2016-03-31 to 2016-04-30.
243 tweets from 2016-04-30 to 2016-05-31.
244 tweets from 2016-05-31 to 2016-06-30.
287 tweets from 2016-06-30 to 2016

48 tweets from 2010-12-31 to 2011-01-31.
85 tweets from 2011-01-31 to 2011-02-28.
87 tweets from 2011-02-28 to 2011-03-31.
76 tweets from 2011-03-31 to 2011-04-30.
123 tweets from 2011-04-30 to 2011-05-31.
86 tweets from 2011-05-31 to 2011-06-30.
57 tweets from 2011-06-30 to 2011-07-31.
90 tweets from 2011-07-31 to 2011-08-31.
110 tweets from 2011-08-31 to 2011-09-30.
77 tweets from 2011-09-30 to 2011-10-31.
58 tweets from 2011-10-31 to 2011-11-30.
34 tweets from 2011-11-30 to 2011-12-31.
44 tweets from 2011-12-31 to 2012-01-31.
45 tweets from 2012-01-31 to 2012-02-28.
80 tweets from 2012-02-28 to 2012-03-31.
44 tweets from 2012-03-31 to 2012-04-30.
142 tweets from 2012-04-30 to 2012-05-31.
135 tweets from 2012-05-31 to 2012-06-30.
109 tweets from 2012-06-30 to 2012-07-31.
184 tweets from 2012-07-31 to 2012-08-31.
160 tweets from 2012-08-31 to 2012-09-30.
126 tweets from 2012-09-30 to 2012-10-31.
80 tweets from 2012-10-31 to 2012-11-30.
91 tweets from 2012-11-30 to 2012-12-31.
157 twee

In [None]:
df_sfspca = get_tweets('sfspca')

In [None]:
df_sfspca.to_pickle('../data/sfspca_tweets.p')

In [None]:
df_pspca = get_tweets('PSPCA')

In [None]:
df_pspca.to_pickle('../data/pspca_tweets.p')

In [None]:
df_houston = get_tweets('HoustonSPCA')

In [None]:
df_houston.to_pickle('../data/houston_tweets.p')

In [None]:
df_texas = get_tweets('spcaoftexas')

In [None]:
df_texas.to_pickle('../data/texas_tweets.p')

In [None]:
df_tulsa = get_tweets('Tulsa_SPCA')

In [None]:
df_tulsa.to_pickle('../data/tulsa_tweets.p')

In [None]:
df_richmond = get_tweets('RichmondSPCA')

In [None]:
df_richmond.to_pickle('../data/richmond_tweets.p')

In [None]:
df_ontario = get_tweets('OntarioSPCA')

In [None]:
df_ontario.to_pickle('../data/ontario_tweets.p')

In [None]:
df_alberta = get_tweets('FMSPCA')

In [None]:
df_alberta.to_pickle('../data/alberta_tweets.p')

In [None]:
df_bc = get_tweets('BC_SPCA')

In [None]:
df_bc.to_pickle('../data/bc_tweets.p')