# Introduction

This notebook uses Twython to collect tweets. Here we query Twitter for the historic tweets of a single user (my sibling), who has consented to their tweets being used as practice NLP data.

## Imports

In [1]:
import json
from twython import Twython
import time

## Import API Keys From `secret.json`

In [2]:
with open('secret.json') as f:
    keys = json.load(f)
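
A missing or misspelled entry in `secret.json` would otherwise only surface as a `KeyError` when the client is built. A minimal sketch of an earlier check, assuming the four key names this notebook reads; the helper name `load_keys` is hypothetical and not part of Twython:

```python
import json

# Key names this notebook expects to find in secret.json
REQUIRED_KEYS = (
    'TWITTER_APP_KEY',
    'TWITTER_APP_KEY_SECRET',
    'TWITTER_ACCESS_TOKEN',
    'TWITTER_ACCESS_TOKEN_SECRET',
)

def load_keys(path='secret.json'):
    """Load API credentials and fail early if any expected key is absent."""
    with open(path) as f:
        keys = json.load(f)
    missing = [k for k in REQUIRED_KEYS if k not in keys]
    if missing:
        raise KeyError('secret.json is missing: ' + ', '.join(missing))
    return keys
```

Calling `keys = load_keys()` would then stand in for the two-line `open`/`json.load` cell.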

## Instantiate `Twython` Object

In [3]:
t = Twython(app_key = keys['TWITTER_APP_KEY'], 
            app_secret = keys['TWITTER_APP_KEY_SECRET'], 
            oauth_token = keys['TWITTER_ACCESS_TOKEN'], 
            oauth_token_secret = keys['TWITTER_ACCESS_TOKEN_SECRET'])

# Retrieve Data

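At time of writing, my sibling has **5,207 tweets.** We will retrieve these in time-staggered batches, because free Twitter dev accounts are rate-limited. The standard search endpoint returns at most 100 tweets per request, so any `count` above 100 is capped. It also indexes only roughly the last seven days of tweets, so older tweets may not come back from this query. The final total will fall short of 5,207 by an unknown amount, because some of these are surely retweets and we don't want those.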
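
The batch arithmetic can be checked directly. A small sketch, assuming the 5,207-tweet total above and the 310-second pause used in the retrieval loop; the function names are hypothetical. It compares a page size of 150 with a page size of 100 (an assumption here: the documented per-request maximum of the standard search endpoint):

```python
import math

def batches_needed(total_tweets, per_batch):
    """Number of requests needed to page through total_tweets."""
    return math.ceil(total_tweets / per_batch)

def total_wait_hours(batches, pause_seconds=310):
    """Hours spent sleeping if every batch is followed by one pause."""
    return batches * pause_seconds / 3600

# Print page size, batch count, and hours of waiting for each page size
for per_batch in (150, 100):
    n = batches_needed(5207, per_batch)
    print(per_batch, n, round(total_wait_hours(n), 2))
```

At 150 per request the total needs 35 batches; at 100 per request it needs 53.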

In [None]:
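# Our first batch has no last_id, and pulls tweets up to his most recent
last_id = None

# Up to 53 batches of 100 tweets each, enough to cover 5,207 tweets
for i in range(53):
    
    # Query for his tweets, no retweets (the search endpoint caps count at 100)
    search = t.search(q = 'from:taxpurposes -filter:retweets',
                      max_id = last_id,
                      tweet_mode = 'extended',
                      count = 100)
    
    # Stop once there is nothing older to retrieve
    if not search['statuses']:
        break
    
    # Save tweets locally as JSON, appending to earlier batches.
    # data.update(search) would replace the 'statuses' list each time.
    try:
        with open('tweets.json') as f:
            data = json.load(f)
    except FileNotFoundError:
        data = {}
    data.setdefault('statuses', []).extend(search['statuses'])
    with open('tweets.json', 'w') as f:
        json.dump(data, f)
        
    # max_id is inclusive, so the next search stops just below the oldest tweet
    last_id = search['statuses'][-1]['id'] - 1
    
    # Wait, so as to appease the Twitter Dev Gods
    time.sleep(310)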
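
The `max_id` paging used in the loop above can be exercised offline, without touching the API. A minimal sketch, assuming `max_id` is inclusive (it returns tweets with IDs less than or equal to the given value); `fake_search` is a hypothetical stand-in for the real search call and works on bare integer IDs:

```python
def fake_search(all_ids, max_id=None, count=100):
    """Stand-in for a search call: newest-first IDs no greater than max_id."""
    eligible = [i for i in all_ids if max_id is None or i <= max_id]
    return sorted(eligible, reverse=True)[:count]

def collect_all(all_ids, count=100, max_batches=1000):
    """Page backwards through all_ids using inclusive max_id semantics."""
    collected = []
    max_id = None
    for _ in range(max_batches):
        page = fake_search(all_ids, max_id=max_id, count=count)
        if not page:
            break  # nothing older is left
        collected.extend(page)
        # max_id is inclusive, so step just below the oldest ID seen
        max_id = page[-1] - 1
    return collected

ids = list(range(1000, 1250))
result = collect_all(ids, count=100)
```

Passing the oldest ID back unchanged would return that same tweet again at the top of the next page, which is why the sketch subtracts one.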

# JSON Test

In [51]:
# This works

# with open('tweets.json') as f:
#     data = json.load(f)

# data.update(c_dict)

# with open('tweets.json', 'w') as f:
#     json.dump(data, f)
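
One caveat about `dict.update`: when both dictionaries share a top-level key, the incoming value replaces the stored one. Search responses all carry a `statuses` key, so updating with a second response would discard the first response's tweets. A minimal sketch of the difference, using made-up payloads; `merge_statuses` is a hypothetical helper, not part of Twython:

```python
def merge_statuses(stored, response):
    """Append the response's tweets to stored['statuses'], skipping repeated IDs."""
    seen = {tweet['id'] for tweet in stored.get('statuses', [])}
    merged = list(stored.get('statuses', []))
    for tweet in response.get('statuses', []):
        if tweet['id'] not in seen:
            merged.append(tweet)
            seen.add(tweet['id'])
    # Return a new dict so the stored input is left untouched
    return {**stored, 'statuses': merged}

# Made-up payloads with one overlapping tweet ID
first = {'statuses': [{'id': 3}, {'id': 2}]}
second = {'statuses': [{'id': 2}, {'id': 1}]}

overwritten = dict(first)
overwritten.update(second)  # keeps only the second response's tweets

merged = merge_statuses(first, second)  # keeps tweets from both, once each
```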