

## Twitter Object Documentation

#### Twitter Source Documentation for info on tweet objects:

https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet

https://developer.twitter.com/en/docs/twitter-api/tweets/lookup/api-reference/get-tweets


#### Tweet object structure:
-- Tweet object has a long list of root-level fields such as id, text, and created_at

-- Tweet object is parent to several child objects, including user, media, poll, and place

-- Use the field parameter "tweet.fields" when requesting root-level fields on the tweet object

- tweet.fields:  attachments, author_id, context_annotations, conversation_id, created_at, 
entities, geo, id, in_reply_to_user_id, lang, non_public_metrics, public_metrics, organic_metrics, 
promoted_metrics, possibly_sensitive, referenced_tweets, source, text, withheld


-- Child object parameters (query parameters):

- media.fields:  duration_ms, height, media_key, preview_image_url, type, url, width, public_metrics, 
non_public_metrics, organic_metrics, promoted_metrics

- place.fields:  contained_within, country, country_code, full_name, geo, id, name, place_type

- poll.fields:  duration_minutes, end_datetime, id, options, voting_status

- user.fields:  created_at, description, entities, id, location, name, 
pinned_tweet_id, profile_image_url, protected, public_metrics, url, username, verified, withheld

-- Other query parameters:

- expansions:  attachments.poll_ids, attachments.media_keys, author_id, entities.mentions.username, 
geo.place_id, in_reply_to_user_id, referenced_tweets.id, referenced_tweets.id.author_id
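
As a rough illustration of how the field parameters and expansions listed above combine into one request URL, here is a minimal sketch using only the standard library. The helper name `build_search_url` and its arguments are hypothetical (not part of any Twitter library); the endpoint is the recent-search URL this notebook queries.

```python
from urllib.parse import quote, urlencode

# Recent-search endpoint used by the request cells in this notebook
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_url(query, tweet_fields=(), expansions=(), place_fields=(), max_results=100):
    """Return a search URL with comma-joined field lists (hypothetical helper)."""
    params = {"query": query, "max_results": max_results}
    if tweet_fields:
        params["tweet.fields"] = ",".join(tweet_fields)
    if expansions:
        params["expansions"] = ",".join(expansions)
    if place_fields:
        params["place.fields"] = ",".join(place_fields)
    # quote_via=quote percent-encodes spaces as %20 rather than '+'
    return f"{SEARCH_URL}?{urlencode(params, quote_via=quote)}"


example_url = build_search_url(
    "covid -is:retweet lang:en",
    tweet_fields=["author_id", "created_at", "public_metrics"],
    expansions=["geo.place_id"],
    place_fields=["country", "country_code"],
)
print(example_url)
```

Each field list is sent as a single comma-separated value, and child-object fields such as `place.fields` are paired here with the matching expansion (`geo.place_id`).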



In [40]:
## Import dependencies

# Utilities
import requests
import json
import pandas as pd
import time

# Keys
from config import bearer_token


In [41]:
##  Set authorization for Twitter     

headers = {"Authorization": "Bearer {}".format(bearer_token)}

In [71]:
## Set URL and query parameters; create response object

# Set query keywords
query = "covid -is:retweet lang:en"

# Set place fields to return place location if available
# (place objects are only returned when the geo.place_id expansion is also requested)
place_fields = "place.fields=contained_within,country,country_code,geo"

# Set tweet fields to return
tweet_fields = "tweet.fields=author_id,lang,public_metrics,created_at"

# Set max results per page
max_results = 'max_results=100'

# Set URL; the first request carries no pagination token, so the format
# string has exactly one placeholder per argument
url = "https://api.twitter.com/2/tweets/search/recent?query={}&{}&{}&{}".format(
    query, place_fields, tweet_fields, max_results)

# Create response object
response = requests.request("GET", url, headers=headers)


In [73]:
len(response.json()['data'])

99

In [74]:
## Paginate through JSON responses for as long as a next_token is returned in the response metadata

counter = 1

# Create list to store responses
json_output = []

# Pagination parameter for the next request (empty for the first page)
next_token = ''

while True:
    
    # Append the pagination token (if any) so each request fetches the next page
    page_url = f"{url}&{next_token}" if next_token else url
    
    # Submit twitter request
    response = requests.request("GET", page_url, headers=headers)
    
    # Check status before reading the body; error responses have no 'data' key
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    page = response.json()
    
    # Print first tweet text of the page to confirm unique pulls
    if page.get('data'):
        print(page['data'][0]['text'])
    
    # Save JSON response to json_output list
    json_output.append(page)
    
    # Check whether 'next_token' is in the response metadata; if so, update next_token for the next request; if not, break
    if 'next_token' not in page['meta']:
        break
    next_token = f"next_token={page['meta']['next_token']}"
    print(next_token)
    print(f"Request: {counter}")
    print(f"Tweets pulled: {counter*100}")  # approximate: assumes full pages of 100
    print('-------------------------------------------------')
    
    # Pause 20 seconds between requests to stay well within the API rate limits
    time.sleep(20)
    
    # Iterate counter
    counter += 1

    
print("")
print("-----------------")
print("API Query Results")
print("-----------------")
print(f"Pages:            {len(json_output)}")
print(f"Tweets per Page:  {len(json_output[0]['data'])}")



200
@realDonaldTrump You’re off your rocker, Donnie.  Either you’re the ultimate sore loser or you’ve dipped into your Covid-infected son’s cokebag.
next_token=b26v89c19zqg8o3fosbvxqzrf4rjxtqi8g5vzu367imbh
Tweets pulled: 100
-------------------------------------------------
200
As COVID test sites reach capacity, Washington state leaders urge restraint and judgment https://t.co/yOusQK0JuK
next_token=b26v89c19zqg8o3fosbvxqzrf4rtml8jp16r8nk94b4al
Tweets pulled: 200
-------------------------------------------------


KeyboardInterrupt: 
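
The pagination logic above can also be sketched as a small generator that is independent of the HTTP call, which makes it easy to try without hitting the API. This is a sketch under assumptions: `paginate` and `fetch_page` are hypothetical names, and `fetch_page` stands for any callable that takes the current token (`None` for the first page) and returns the decoded JSON page.

```python
import time


def paginate(fetch_page, pause_seconds=0.0, max_pages=None):
    """Yield response pages until a page arrives without a next_token.

    fetch_page is a hypothetical callable: it receives the current
    next_token (None for the first page) and returns the decoded page.
    """
    next_token = None
    pages_seen = 0
    while max_pages is None or pages_seen < max_pages:
        page = fetch_page(next_token)
        yield page
        pages_seen += 1
        next_token = page.get("meta", {}).get("next_token")
        if next_token is None:
            break
        # Wait between requests, as the loop above does with time.sleep(20)
        time.sleep(pause_seconds)


# Stand-in for the API: three fake pages chained by next_token
fake_pages = {
    None: {"data": [{"id": "1"}], "meta": {"next_token": "a"}},
    "a": {"data": [{"id": "2"}], "meta": {"next_token": "b"}},
    "b": {"data": [{"id": "3"}], "meta": {}},
}

pages = list(paginate(fake_pages.get))
print(len(pages))
```

In the notebook, `fetch_page` could wrap the `requests` call and pass the token as the `next_token` query parameter; `max_pages` gives a way to stop without a `KeyboardInterrupt`.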

In [70]:
json_output[0]

{'data': [{'lang': 'en',
   'text': 'I’m kinda possessive too.. I feel like covid wiped your girl clean like a white board like I forgot all kinds of important ish. School, work, passwords.... wait no I always forget passwords.. but work and school... 🤥',
   'created_at': '2020-11-20T04:12:26.000Z',
   'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'author_id': '236094999',
   'id': '1329638710770589697'},
  {'lang': 'en',
   'text': "@heyrabeckkkkk I am now.  I'm tired too.  I got a headwind before the election &amp; now I have to take a breather.  My closest coworker today bet me (again) that elections can't be corrupted, overturned, &amp; I'm a conspiracy theorist on COVID. I'll wait....ugh",
   'created_at': '2020-11-20T04:12:26.000Z',
   'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'author_id': '352458151',
   'id': '1329638710305042432'},
  {'lang': 'en',
  

In [54]:
response.json()['data'][0]['text']

'@philothea13 @TPCarney How’s that working out for you?\nhttps://t.co/HI8asSyONV'

In [55]:
# Review first item in json_output
json_output[0]['data'][0]['text']

'I’m kinda possessive too.. I feel like covid wiped your girl clean like a white board like I forgot all kinds of important ish. School, work, passwords.... wait no I always forget passwords.. but work and school... 🤥'

In [56]:
# Store raw data file 
unformatted_df = pd.DataFrame(json_output)
unformatted_df.to_csv('../Twitter_Data/unformatted_data.csv', index=False)
unformatted_df


Unnamed: 0,data,meta
0,"[{'lang': 'en', 'text': 'I’m kinda possessive ...","{'newest_id': '1329638710770589697', 'oldest_i..."
1,"[{'lang': 'en', 'id': '1329638795470278657', '...","{'newest_id': '1329638795470278657', 'oldest_i..."
2,"[{'author_id': '283604227', 'public_metrics': ...","{'newest_id': '1329638878001684481', 'oldest_i..."
3,[{'text': '@CP24 If we have the most strict lo...,"{'newest_id': '1329638962546307073', 'oldest_i..."
4,"[{'lang': 'en', 'text': 'Dr. Fauci says vaccin...","{'newest_id': '1329639050660114432', 'oldest_i..."
...,...,...
1001,"[{'lang': 'en', 'text': 'Covid gadget alert: T...","{'newest_id': '1329852356369010691', 'oldest_i..."
1002,"[{'author_id': '779890695695896576', 'public_m...","{'newest_id': '1329852440401866756', 'oldest_i..."
1003,"[{'lang': 'en', 'text': '@FDRLST It's all abou...","{'newest_id': '1329852528276754436', 'oldest_i..."
1004,"[{'public_metrics': {'retweet_count': 0, 'repl...","{'newest_id': '1329852608924815360', 'oldest_i..."
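
Because `data` and `meta` are nested structures, the CSV above stores each one as the string form of a Python list or dict, and reading the file back yields strings rather than lists. A minimal sketch of an alternative, assuming a raw JSON file is acceptable for the archive; the function names, sample page, and temporary path are made up for illustration.

```python
import json
import os
import tempfile


def save_pages(pages, path):
    """Write the list of response pages to a JSON file, keeping nesting intact."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(pages, f, ensure_ascii=False)


def load_pages(path):
    """Read the pages back as Python lists and dicts."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Made-up sample page with the same shape as one element of json_output
sample_pages = [
    {"data": [{"id": "1", "text": "I’m here 🤥"}], "meta": {"result_count": 1}}
]

raw_path = os.path.join(tempfile.mkdtemp(), "raw_pages.json")
save_pages(sample_pages, raw_path)
restored = load_pages(raw_path)
print(restored == sample_pages)
```

The round trip preserves the nested lists, so the flattening step can be rerun later from the saved file.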


In [57]:
# Loop through json_output to grab tweet data; the outer loop walks the pages, the inner loop walks each page's data block

# Dictionary to store output
tweet_data = {}
tweet_data['ID'] = []
tweet_data['Date'] = []
tweet_data['Retweets'] = []
tweet_data['Replies'] = []
tweet_data['Likes'] = []
tweet_data['Quotes'] = []
tweet_data['Tweet'] = []


for page in json_output:
    
    for tweet in page['data']:
        
        # Select data (tweet_id avoids shadowing the built-in id)
        tweet_id = tweet['id']
        text = tweet['text']
        date = tweet['created_at']

        # Append selected data to dictionary
        tweet_data['ID'].append(tweet_id)
        tweet_data['Date'].append(date)
        tweet_data['Tweet'].append(text)
        
        # Select engagement counts; one lookup per tweet is enough
        metrics = tweet['public_metrics']
        retweet_count = metrics['retweet_count']
        reply_count = metrics['reply_count']
        like_count = metrics['like_count']
        quote_count = metrics['quote_count']
        
        # Append selected data to dictionary
        tweet_data['Retweets'].append(retweet_count)
        tweet_data['Replies'].append(reply_count)
        tweet_data['Likes'].append(like_count)
        tweet_data['Quotes'].append(quote_count)
        
        
#        print(retweet_count, reply_count, like_count, quote_count)
#        print('-------------------------------------------------')
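
The same flattening can be written as a function that returns one dict per tweet. This is a sketch, not the notebook's method: `flatten_pages` is a hypothetical helper and the sample page is made up. It uses `.get` so that a page without a `data` block, or a tweet without `public_metrics`, does not raise a `KeyError`.

```python
def flatten_pages(pages):
    """Return one flat dict per tweet across all response pages."""
    rows = []
    for page in pages:
        # A page with no results may lack the 'data' key entirely
        for tweet in page.get("data", []):
            metrics = tweet.get("public_metrics", {})
            rows.append({
                "ID": tweet["id"],
                "Date": tweet.get("created_at"),
                "Retweets": metrics.get("retweet_count"),
                "Replies": metrics.get("reply_count"),
                "Likes": metrics.get("like_count"),
                "Quotes": metrics.get("quote_count"),
                "Tweet": tweet.get("text"),
            })
    return rows


# Made-up pages shaped like the API responses stored in json_output
sample = [
    {"data": [{"id": "10", "text": "first", "created_at": "2020-11-20T04:12:26.000Z",
               "public_metrics": {"retweet_count": 1, "reply_count": 0,
                                  "like_count": 2, "quote_count": 0}}],
     "meta": {"result_count": 1}},
    {"meta": {"result_count": 0}},
]

rows = flatten_pages(sample)
print(rows[0])
```

A list of dicts like `rows` can be passed straight to `pd.DataFrame(rows)`, giving the same columns as `tweet_data`.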


In [60]:
# Review organized data
df = pd.DataFrame(tweet_data)
df.tail()

Unnamed: 0,ID,Date,Retweets,Replies,Likes,Quotes,Tweet
10047,1329852691141443584,2020-11-20T18:22:43.000Z,0,0,0,0,"According to the latest COVID-19 data, 250,000..."
10048,1329852691032510466,2020-11-20T18:22:43.000Z,0,0,0,0,I work answering an RN COVID hotline and can v...
10049,1329852690923479042,2020-11-20T18:22:43.000Z,0,0,0,0,The Broncos announced that Sunday will be thei...
10050,1329852689539362817,2020-11-20T18:22:43.000Z,0,0,0,0,"@Stats_Burger @numbers_truth ""die from COVID-1..."
10051,1329852689493049344,2020-11-20T18:22:43.000Z,0,0,0,0,Lmao well why would you? Stupid Americans have...


In [61]:
# Filter by mask
mask_df = df[df['Tweet'].str.contains('mask|face covering')]
mask_df

Unnamed: 0,ID,Date,Retweets,Replies,Likes,Quotes,Tweet
36,1329638955268993025,2020-11-20T04:13:25.000Z,0,0,0,0,@DarciSings @Zigmanfreud @GavinNewsom He’s a c...
55,1329639128007249921,2020-11-20T04:14:06.000Z,0,0,0,0,Top WHO official in Europe says face masks cou...
67,1329639209666301952,2020-11-20T04:14:25.000Z,0,0,0,0,"For the love of God, everyone: Wear your masks..."
80,1329639385864704000,2020-11-20T04:15:07.000Z,0,0,0,0,@JoeBiden If masks stop the spread of COVID it...
84,1329639383260082176,2020-11-20T04:15:07.000Z,0,0,0,0,"#FridayThoughts BEEN MASKING FOR MONTHS, Yet s..."
...,...,...,...,...,...,...,...
9982,1329852187934154753,2020-11-20T18:20:43.000Z,0,0,0,0,How many Republican politicians vs Democrat po...
10002,1329852356369010691,2020-11-20T18:21:23.000Z,0,0,0,0,Covid gadget alert: Think your trusty mask nee...
10005,1329852354334765061,2020-11-20T18:21:23.000Z,0,0,0,0,@UHOHitsMark @drmkry @aVoice4MA6 @Don_Fabbri T...
10027,1329852525831327746,2020-11-20T18:22:04.000Z,0,0,0,0,@GatoPolitico5 @tvanhorbeck @cnnbrk I AM SO BE...


In [62]:
# Filter by vaccine
vaccine_df = df[df['Tweet'].str.contains('vaccine')]
vaccine_df

Unnamed: 0,ID,Date,Retweets,Replies,Likes,Quotes,Tweet
14,1329638788507856897,2020-11-20T04:12:45.000Z,0,0,0,0,Coronavirus | Oxford University COVID-19 vacci...
22,1329638876877516800,2020-11-20T04:13:06.000Z,0,0,0,0,Serum Institute of India’s CEO Adar Poonawalla...
41,1329639050387496960,2020-11-20T04:13:47.000Z,0,0,0,0,@berrybram @channel5_tv Some things come back ...
49,1329639039348170752,2020-11-20T04:13:45.000Z,0,0,0,0,@NicolaSturgeon\n@ScotGovFM\n@theSNP\nWATCH fi...
69,1329639205337788416,2020-11-20T04:14:24.000Z,0,0,0,0,Covid vaccine to send flight costs skyrocketin...
...,...,...,...,...,...,...,...
9871,1329851166335721473,2020-11-20T18:16:40.000Z,0,0,0,0,@IndianExpress Check out this blog to know wha...
9902,1329851508821733378,2020-11-20T18:18:01.000Z,0,0,0,0,Geraldo Rivera suggests naming COVID-19 vaccin...
9962,1329852017112543232,2020-11-20T18:20:02.000Z,0,0,0,0,"Covid vaccine How Indian airlines, airports ar..."
9989,1329852182623969281,2020-11-20T18:20:42.000Z,0,0,0,0,@PSformaldehyde @EdwardNorton @neal_katyal @Pr...


In [63]:
# Filter by lockdown
lockdown_df = df[df['Tweet'].str.contains('lockdown|quarantine|stay at home')]
lockdown_df

Unnamed: 0,ID,Date,Retweets,Replies,Likes,Quotes,Tweet
16,1329638784229646336,2020-11-20T04:12:44.000Z,0,0,0,0,@crankyoldman13 @ladalavara @dallasrbaird This...
30,1329638962546307073,2020-11-20T04:13:26.000Z,0,0,0,0,@CP24 If we have the most strict lockdown at t...
55,1329639128007249921,2020-11-20T04:14:06.000Z,0,0,0,0,Top WHO official in Europe says face masks cou...
84,1329639383260082176,2020-11-20T04:15:07.000Z,0,0,0,0,"#FridayThoughts BEEN MASKING FOR MONTHS, Yet s..."
88,1329639380726673409,2020-11-20T04:15:06.000Z,0,0,0,0,@DebiePritchard @GavinNewsom It must physicall...
...,...,...,...,...,...,...,...
9868,1329851166813941760,2020-11-20T18:16:40.000Z,0,0,0,0,France 'still far' from lifting Covid-19 lockd...
9958,1329851927157469188,2020-11-20T18:19:41.000Z,0,0,0,0,Where can I go on holiday after lockdown? #ukl...
9963,1329852016852611074,2020-11-20T18:20:02.000Z,0,0,0,0,Newark officials say residents need to “stay a...
9976,1329852101577625600,2020-11-20T18:20:23.000Z,0,0,0,0,@sc_wadsy Boris J seems to be determined to de...


In [64]:
# Filter by social distancing
distance_df = df[df['Tweet'].str.contains('social distance|social distancing')]
distance_df

Unnamed: 0,ID,Date,Retweets,Replies,Likes,Quotes,Tweet
88,1329639380726673409,2020-11-20T04:15:06.000Z,0,0,0,0,@DebiePritchard @GavinNewsom It must physicall...
188,1329640226898132997,2020-11-20T04:18:28.000Z,0,0,0,0,@1997Aggies @SylvesterL_ @kayleighmcenany @Gov...
329,1329641413592698886,2020-11-20T04:23:11.000Z,0,0,1,0,Did someone say...a panda-emic dining experien...
353,1329641673656242189,2020-11-20T04:24:13.000Z,0,0,0,0,"Wear a mask, social distance, and be mindful. ..."
980,1329647016289337345,2020-11-20T04:45:26.000Z,0,0,0,0,I hope you selfish motherfuckers who don’t tak...
1193,1329648796033196035,2020-11-20T04:52:31.000Z,0,0,0,0,Second wave of covid has hit delhi very badly ...
1547,1329754148489875457,2020-11-20T11:51:09.000Z,0,0,0,0,@redwoodricker @katnandu @MaddowBlog @KamalaHa...
1737,1329782113198071808,2020-11-20T13:42:16.000Z,0,0,0,0,@Lilacarn_ @rob_miller12345 According to COVID...
1950,1329783982855172099,2020-11-20T13:49:42.000Z,0,0,0,0,@MattGertz Why not call it the anti-Trump vacc...
2032,1329784662512848901,2020-11-20T13:52:24.000Z,0,0,0,0,pls stay safe wear mask and keep social distan...


In [65]:
# Push dataframes to CSV

df.to_csv('../Twitter_Data/tweet_data1.csv', index=False)
mask_df.to_csv('../Twitter_Data/mask_data1.csv', index=False)
vaccine_df.to_csv('../Twitter_Data/vaccine_data1.csv', index=False)
lockdown_df.to_csv('../Twitter_Data/lockdown_data1.csv', index=False)
distance_df.to_csv('../Twitter_Data/distance_data1.csv', index=False)