# Gather tweets for current legislators
### Big picture:
For each Twitter ID we have (source: https://github.com/unitedstates/congress-legislators), we gather a good chunk of that user's tweets and save the JSON to flat files (one file per user ID), using code adapted from https://gist.github.com/yanofsky/5436496.

## Basic setup

In [1]:
import tweepy

from keys import twitter
import json
import pandas as pd
from tqdm import tqdm

import os.path

In [2]:
data_relative_path = "../data/lincoln/"
tweets_location = data_relative_path + "congress_searched_tweets/"
# log_location = "./logs/stream_congress_log.txt"

## Load data

In [3]:
# Get the user IDs
df = pd.read_csv(data_relative_path + "current_social_media.csv", dtype=str)
user_ids = df.twitter_id
good_user_ids = []
for uid in user_ids:
    try:
        # Missing IDs load as NaN, and int(NaN) raises ValueError, so they are skipped
        int(uid)
        good_user_ids.append(uid)
    except ValueError:
        pass
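The same filtering can be sketched without the try/except, assuming every valid ID is a string made up only of digits. The helper name `keep_numeric_ids` and the sample values are hypothetical, used only for illustration. Note that this is slightly stricter than the `int()` check above: it also rejects strings with a sign or surrounding whitespace.

```python
def keep_numeric_ids(values):
    """Keep only string values made up entirely of digits (drops NaN and blanks)."""
    return [v for v in values if isinstance(v, str) and v.isdigit()]


# Hypothetical sample: pandas represents missing cells as float NaN even with dtype=str
sample_ids = ["1234567", float("nan"), "89", "", "42"]
filtered_ids = keep_numeric_ids(sample_ids)
print(filtered_ids)
```

With the notebook's data this would be called as `keep_numeric_ids(df.twitter_id)`.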

## Gather data from a user's Twitter timeline
#### The next two cells are very much adapted from https://gist.github.com/yanofsky/5436496

In [4]:
auth = tweepy.OAuthHandler(twitter["consumer_key"], twitter["consumer_secret"])
auth.set_access_token(twitter["access_token"], twitter["access_token_secret"])
api = tweepy.API(auth)

In [17]:
def get_all_tweets(user_id):
    #initialize a list to hold all the tweepy Tweets
    alltweets = []

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(user_id = user_id,count=200)
    
    #save most recent tweets
    alltweets.extend(new_tweets)
    
    # some users haven't tweeted ever!
    if len(alltweets) != 0:
    
        #save the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        #keep grabbing tweets until there are no tweets left to grab
        while len(new_tweets) > 0:
            print ("getting tweets before %s" % (oldest))

            #all subsequent requests use the max_id param to prevent duplicates
            new_tweets = api.user_timeline(user_id = user_id,count=200,max_id=oldest)

            #save most recent tweets
            alltweets.extend(new_tweets)

            #update the id of the oldest tweet less one
            oldest = alltweets[-1].id - 1

            print ("...%s tweets downloaded so far" % (len(alltweets)))
        
    with open(tweets_location + user_id + '.json', 'a') as f:
        for tweet in alltweets:
            f.write(json.dumps(tweet._json) + '\n')
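The `max_id` paging pattern used in `get_all_tweets` can be seen in isolation with a stand-in for the API call. This is a minimal sketch, not the Twitter API: `fetch_page` is a hypothetical function that mimics a newest-first timeline of integer tweet IDs, and `collect_all` repeats the same loop shape as the cell above.

```python
def fetch_page(timeline, count, max_id=None):
    """Hypothetical stand-in for a timeline request.

    Returns up to `count` IDs, newest first, that are <= max_id.
    """
    eligible = [t for t in timeline if max_id is None or t <= max_id]
    return eligible[:count]


def collect_all(timeline, count=200):
    """Page backwards through the timeline until an empty page comes back."""
    collected = []
    page = fetch_page(timeline, count)
    while page:
        collected.extend(page)
        # Ask only for IDs strictly older than the oldest one seen so far
        oldest = collected[-1] - 1
        page = fetch_page(timeline, count, max_id=oldest)
    return collected


# Made-up timeline of 334 IDs, newest first
timeline = list(range(1000, 0, -3))
result = collect_all(timeline, count=50)
print(len(result), len(set(result)))
```

Subtracting one from the oldest ID is what keeps pages from overlapping, since `max_id` is inclusive in this sketch.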


#### Call the gathering function
#### Needs to be rerun when we hit a rate limit (IDs that already have a file are skipped)

In [115]:
for guid in tqdm(good_user_ids):
    # if the file exists, we've already done this one
    # this is not the smoothest way to do this, but oh well
    # http://stackoverflow.com/questions/82831/how-do-i-check-whether-a-file-exists-using-python?page=1&tab=votes#tab-top
    if not os.path.isfile(tweets_location + guid + '.json'):
        get_all_tweets(guid)


  0%|          | 0/526 [00:00<?, ?it/s]

getting tweets before 842554842421846015
...400 tweets downloaded so far
getting tweets before 829453799207104511
...600 tweets downloaded so far
getting tweets before 816343380867293183
...604 tweets downloaded so far
getting tweets before 813499617052532735
...604 tweets downloaded so far



 90%|████████▉ | 472/526 [00:04<00:00, 105.11it/s]

getting tweets before 839900219605204994





...400 tweets downloaded so far
getting tweets before 504291042267791359
...598 tweets downloaded so far
getting tweets before 460776013648367615
...797 tweets downloaded so far
getting tweets before 423134143094870016
...996 tweets downloaded so far
getting tweets before 378206411051655167
...1196 tweets downloaded so far
getting tweets before 326453750166331393
...1362 tweets downloaded so far
getting tweets before 292002090283319295
...1362 tweets downloaded so far
getting tweets before 543415574639165439
...400 tweets downloaded so far
getting tweets before 458953764779466752


           90%|████████▉ | 472/526 [00:17<00:02, 26.63it/s]

...600 tweets downloaded so far
getting tweets before 373896059774201855
...800 tweets downloaded so far
getting tweets before 301471657275564032
...823 tweets downloaded so far
getting tweets before 289019147944345599
...823 tweets downloaded so far


 90%|█████████ | 474/526 [00:19<01:59,  2.30s/it] 

getting tweets before 839920348091006978
...400 tweets downloaded so far
getting tweets before 819306645331312640
...418 tweets downloaded so far
getting tweets before 816347277665140735
...418 tweets downloaded so far


 90%|█████████ | 475/526 [00:22<01:59,  2.33s/it]

getting tweets before 837799762762022911
...400 tweets downloaded so far
getting tweets before 827389061417439232
...600 tweets downloaded so far
getting tweets before 819235318876557311
...638 tweets downloaded so far
getting tweets before 816680663504080896
...638 tweets downloaded so far


 90%|█████████ | 476/526 [00:25<02:10,  2.62s/it]

getting tweets before 846791503112495104
...400 tweets downloaded so far
getting tweets before 839195449995087871
...600 tweets downloaded so far
getting tweets before 826857817583865856
...720 tweets downloaded so far
getting tweets before 816374604885151744
...720 tweets downloaded so far


 91%|█████████ | 477/526 [00:31<02:52,  3.52s/it]

getting tweets before 827524274093178885


 91%|█████████ | 478/526 [00:32<02:20,  2.92s/it]

...326 tweets downloaded so far
getting tweets before 816434214392524799
...326 tweets downloaded so far


 91%|█████████ | 479/526 [00:33<01:46,  2.26s/it]

getting tweets before 816332285591044096
...124 tweets downloaded so far
getting tweets before 843922424542482431
...400 tweets downloaded so far
getting tweets before 831362295259394047
...599 tweets downloaded so far
getting tweets before 822473193864753155
...687 tweets downloaded so far
getting tweets before 817062778028625919
...687 tweets downloaded so far


 91%|█████████▏| 481/526 [00:39<01:45,  2.35s/it]

getting tweets before 816317963469025279
...99 tweets downloaded so far
getting tweets before 817099078702088193


 92%|█████████▏| 482/526 [00:41<01:42,  2.34s/it]

...88 tweets downloaded so far
getting tweets before 819556791596052479


 92%|█████████▏| 483/526 [00:43<01:35,  2.22s/it]

...201 tweets downloaded so far
getting tweets before 819227072937816063
...201 tweets downloaded so far


 92%|█████████▏| 484/526 [00:44<01:18,  1.88s/it]

getting tweets before 816416677541904384
...181 tweets downloaded so far


 92%|█████████▏| 485/526 [00:45<01:05,  1.59s/it]

getting tweets before 829004915821383683
...84 tweets downloaded so far


 92%|█████████▏| 486/526 [00:46<00:54,  1.36s/it]

getting tweets before 820796192992935935
...105 tweets downloaded so far
getting tweets before 836702114340560896
...311 tweets downloaded so far
getting tweets before 816335725092360191
...311 tweets downloaded so far


 93%|█████████▎| 487/526 [00:47<00:56,  1.44s/it]

getting tweets before 821836214257012738
...211 tweets downloaded so far
getting tweets before 818721454212714495
...211 tweets downloaded so far


 93%|█████████▎| 488/526 [00:49<00:54,  1.45s/it]

getting tweets before 840269952502382592
...400 tweets downloaded so far
getting tweets before 826184847341322240
...498 tweets downloaded so far
getting tweets before 816672073871458303
...498 tweets downloaded so far


 93%|█████████▎| 489/526 [00:51<01:05,  1.78s/it]

getting tweets before 819190129667571712
...217 tweets downloaded so far
getting tweets before 816378469047095295


 93%|█████████▎| 490/526 [00:53<01:03,  1.76s/it]

...217 tweets downloaded so far
getting tweets before 822190985522401279


 93%|█████████▎| 491/526 [00:56<01:16,  2.19s/it]

...241 tweets downloaded so far
getting tweets before 816277504751206399
...241 tweets downloaded so far
getting tweets before 817494052354584575
...186 tweets downloaded so far


 94%|█████████▎| 492/526 [00:57<01:02,  1.84s/it]

getting tweets before 819925966038429696
...228 tweets downloaded so far
getting tweets before 816305653471969280


 94%|█████████▎| 493/526 [00:59<00:58,  1.78s/it]

...228 tweets downloaded so far
getting tweets before 840272528446754815
...382 tweets downloaded so far
getting tweets before 816724999151161343


 94%|█████████▍| 494/526 [01:02<01:07,  2.11s/it]

...382 tweets downloaded so far
getting tweets before 834049988288188415
...337 tweets downloaded so far
getting tweets before 816296203243622400
...337 tweets downloaded so far


 94%|█████████▍| 496/526 [01:04<00:47,  1.59s/it]

getting tweets before 816454949257314303
...69 tweets downloaded so far


 95%|█████████▍| 498/526 [01:05<00:28,  1.01s/it]

getting tweets before 844204534381592575
...48 tweets downloaded so far
getting tweets before 828981050705641471


 95%|█████████▍| 499/526 [01:06<00:26,  1.03it/s]

...129 tweets downloaded so far
getting tweets before 836307873152008192
...215 tweets downloaded so far
getting tweets before 827283497228259327


 95%|█████████▌| 500/526 [01:09<00:43,  1.68s/it]

...215 tweets downloaded so far
getting tweets before 849332235559948288
...400 tweets downloaded so far
getting tweets before 844209044097290239
...600 tweets downloaded so far
getting tweets before 837678514488893440
...800 tweets downloaded so far
getting tweets before 833522304734871551
...1000 tweets downloaded so far
getting tweets before 826229788738277379
...1200 tweets downloaded so far
getting tweets before 819008303686676480
...1277 tweets downloaded so far
getting tweets before 816121037154906111
...1277 tweets downloaded so far


 95%|█████████▌| 501/526 [01:16<01:21,  3.28s/it]

getting tweets before 828954638925885440
...260 tweets downloaded so far
getting tweets before 816714397825564672
...260 tweets downloaded so far


 95%|█████████▌| 502/526 [01:18<01:07,  2.81s/it]

getting tweets before 816362470843551743


 96%|█████████▌| 503/526 [01:19<00:51,  2.25s/it]

...157 tweets downloaded so far
getting tweets before 826169502161264639
...243 tweets downloaded so far
getting tweets before 816301938010652672


 96%|█████████▌| 504/526 [01:22<00:56,  2.57s/it]

...243 tweets downloaded so far


 96%|█████████▌| 505/526 [01:24<00:48,  2.32s/it]

getting tweets before 817178608955518975
...146 tweets downloaded so far
getting tweets before 837369473778737151
...326 tweets downloaded so far
getting tweets before 819388164259381247
...326 tweets downloaded so far


 96%|█████████▌| 506/526 [01:26<00:47,  2.39s/it]

getting tweets before 836965701638320130
...367 tweets downloaded so far
getting tweets before 816299921166974976
...367 tweets downloaded so far


 96%|█████████▋| 507/526 [01:28<00:41,  2.16s/it]

getting tweets before 836763599288823808


 97%|█████████▋| 508/526 [01:29<00:34,  1.91s/it]

...237 tweets downloaded so far
getting tweets before 819410838717140996
...237 tweets downloaded so far
getting tweets before 846538811337265151
...400 tweets downloaded so far
getting tweets before 836653416491397119
...600 tweets downloaded so far
getting tweets before 824422858524856320
...694 tweets downloaded so far
getting tweets before 816722082239279103


 97%|█████████▋| 509/526 [01:33<00:40,  2.40s/it]

...694 tweets downloaded so far
getting tweets before 817459756369481728


 97%|█████████▋| 510/526 [01:35<00:37,  2.34s/it]

...114 tweets downloaded so far
getting tweets before 849302818498514943
...400 tweets downloaded so far
getting tweets before 844643093186052107
...600 tweets downloaded so far
getting tweets before 839496122624315391
...800 tweets downloaded so far
getting tweets before 833029082011074559
...999 tweets downloaded so far
getting tweets before 825451823599386624
...1199 tweets downloaded so far
getting tweets before 817461000702197760
...1211 tweets downloaded so far
getting tweets before 817098872531144703


 97%|█████████▋| 511/526 [01:41<00:53,  3.58s/it]

...1211 tweets downloaded so far
getting tweets before 818548732044132355
...213 tweets downloaded so far
getting tweets before 816370141143384063


 97%|█████████▋| 512/526 [01:43<00:42,  3.05s/it]

...213 tweets downloaded so far
getting tweets before 830450704401694720
...273 tweets downloaded so far
getting tweets before 816668131653545984


 98%|█████████▊| 513/526 [01:50<00:53,  4.15s/it]

...273 tweets downloaded so far
getting tweets before 829391087018057727
...268 tweets downloaded so far
getting tweets before 816381373732659205


 98%|█████████▊| 514/526 [02:10<01:45,  8.82s/it]

...268 tweets downloaded so far
getting tweets before 817454285696958463
...398 tweets downloaded so far
getting tweets before 675414820460564479
...597 tweets downloaded so far
getting tweets before 595283508366540799
...797 tweets downloaded so far
getting tweets before 514528039481671679
...997 tweets downloaded so far
getting tweets before 417745093190103040
...1197 tweets downloaded so far
getting tweets before 325340561974374399
...1397 tweets downloaded so far
getting tweets before 236565465831452671
...1597 tweets downloaded so far
getting tweets before 73477783103340544
...1797 tweets downloaded so far
getting tweets before 6272644141
...1858 tweets downloaded so far
getting tweets before 1870428520
...1858 tweets downloaded so far


 98%|█████████▊| 515/526 [02:21<01:46,  9.68s/it]

getting tweets before 844684734420701183
...400 tweets downloaded so far
getting tweets before 822261295311417345
...418 tweets downloaded so far
getting tweets before 819992373354631167


 98%|█████████▊| 516/526 [02:25<01:17,  7.71s/it]

...418 tweets downloaded so far


 98%|█████████▊| 517/526 [02:26<00:51,  5.70s/it]

getting tweets before 816793313810518015
...129 tweets downloaded so far


 98%|█████████▊| 518/526 [02:26<00:32,  4.11s/it]

getting tweets before 818878415927410688
...3 tweets downloaded so far
getting tweets before 817400853674004480
...208 tweets downloaded so far
getting tweets before 816333100405968895


 99%|█████████▊| 519/526 [02:28<00:24,  3.57s/it]

...208 tweets downloaded so far
getting tweets before 840213006571229183


 99%|█████████▉| 520/526 [02:29<00:17,  2.88s/it]

...11 tweets downloaded so far
getting tweets before 839206373388877823
...400 tweets downloaded so far
getting tweets before 820001473723502597
...444 tweets downloaded so far
getting tweets before 816424932993302527
...444 tweets downloaded so far


 99%|█████████▉| 521/526 [02:32<00:14,  2.85s/it]

getting tweets before 827238540937412608
...259 tweets downloaded so far
getting tweets before 818890976236290049
...259 tweets downloaded so far


 99%|█████████▉| 522/526 [02:34<00:10,  2.56s/it]

getting tweets before 853003746241953791
...400 tweets downloaded so far
getting tweets before 849756342206189567
...599 tweets downloaded so far
getting tweets before 847572067369091071
...799 tweets downloaded so far
getting tweets before 845411092398428159
...999 tweets downloaded so far
getting tweets before 844295903594860544
...1198 tweets downloaded so far
getting tweets before 842390817436254209
...1398 tweets downloaded so far
getting tweets before 839963447668400131
...1598 tweets downloaded so far
getting tweets before 837754277594439679
...1798 tweets downloaded so far
getting tweets before 834933729860583424
...1998 tweets downloaded so far
getting tweets before 833371514586042372
...2197 tweets downloaded so far
getting tweets before 829830320874622975
...2397 tweets downloaded so far
getting tweets before 826154007542460415
...2595 tweets downloaded so far
getting tweets before 821124209082384384
...2703 tweets downloaded so far
getting tweets before 816311625540075521
.

 99%|█████████▉| 523/526 [03:13<00:39, 13.30s/it]

getting tweets before 831563849484607487
...317 tweets downloaded so far
getting tweets before 816275664596766719


100%|█████████▉| 524/526 [03:16<00:20, 10.32s/it]

...317 tweets downloaded so far


100%|█████████▉| 525/526 [03:17<00:07,  7.62s/it]

getting tweets before 816816691774529535
...67 tweets downloaded so far


100%|██████████| 526/526 [03:19<00:00,  5.77s/it]

getting tweets before 829803754715283457
...127 tweets downloaded so far
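The skip-if-done check in the loop above can also be pulled into a small helper that lists which IDs still need to be fetched before a rerun. This is a sketch under the assumption that each finished user has a `<user_id>.json` file in the output directory; `pending_ids` is a hypothetical name, and the demo uses a temporary directory with made-up IDs.

```python
import os
import tempfile


def pending_ids(user_ids, out_dir):
    """Return the IDs that do not yet have a <user_id>.json file in out_dir."""
    return [
        uid for uid in user_ids
        if not os.path.isfile(os.path.join(out_dir, uid + ".json"))
    ]


with tempfile.TemporaryDirectory() as tmp:
    # Pretend one user was already downloaded on an earlier run
    open(os.path.join(tmp, "111.json"), "w").close()
    remaining = pending_ids(["111", "222", "333"], tmp)

print(remaining)
```

In the notebook this would be called as `pending_ids(good_user_ids, tweets_location)`, which also shows how much work is left after a rate limit interrupts a run.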





## Miscellaneous Debugging Code

Check that the number of saved files matches the number of valid user IDs.

In [5]:
located = 0
for guid in good_user_ids:
    if os.path.isfile(tweets_location + guid + '.json'):
        located += 1
print(located, len(good_user_ids), located == len(good_user_ids))

526 526 True
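Beyond checking that each file exists, one might also confirm that a file parses and count its tweets. The sketch below assumes the one-JSON-object-per-line layout written by `get_all_tweets`; `count_tweets` is a hypothetical helper, and the sample file and tweets are made up.

```python
import json
import os
import tempfile


def count_tweets(path):
    """Count non-empty lines in a file, parsing each one to confirm it is valid JSON."""
    n = 0
    with open(path) as f:
        for line in f:
            if line.strip():
                json.loads(line)  # raises an error if the line is not valid JSON
                n += 1
    return n


with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "123.json")
    # Write two made-up tweets in the same one-object-per-line layout
    with open(sample_path, "a") as f:
        for tweet in [{"id": 2, "text": "b"}, {"id": 1, "text": "a"}]:
            f.write(json.dumps(tweet) + "\n")
    n_tweets = count_tweets(sample_path)

print(n_tweets)
```

A file with zero lines would correspond to a user with no tweets returned, since the function above still creates the file in that case.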
