## Hydrating Twitter IDs

Twitter's Terms of Service do not allow sharing a complete collected dataset. Instead, one may only publish derivative/aggregated data or lists of (tweet/user) ids. To recreate a dataset, these ids can then be used to request the full data via the Twitter API (assuming it hasn't been deleted since).

The act of turning a list of ids into a list of full data objects is sometimes called "hydrating" and is what this notebook will show you how to do.

This notebook only shows you how to hydrate user ids. While the process is easily adapted to tweet ids, I would recommend a more complex approach than the one taken here to make hydrating a collection of millions of tweets feasible (using multiple accounts, for example).


In [None]:
# get ids to hydrate (one id per line; blank lines are skipped)

with open("user-ids.txt") as f:
    user_ids = [line.strip() for line in f if line.strip()]


In [None]:
print(user_ids[:10])
print("Number of ids:", len(user_ids))

Twitter's users/lookup API endpoint returns a maximum of 100 user objects per request, so let's chunk our list into lists of at most 100 ids each.

In [None]:
id_chunks = [user_ids[i:i + 100] for i in range(0, len(user_ids), 100)]

len(id_chunks)
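
To check how the chunking behaves when the number of ids is not a multiple of 100, here is a small sketch on made-up ids. The helper name `chunked` is hypothetical and simply wraps the same slicing as above.

```python
def chunked(items, size=100):
    """Split a list into consecutive lists of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Made-up ids for illustration only, not real Twitter ids.
example_ids = [str(n) for n in range(250)]
example_chunks = chunked(example_ids)

# The last chunk holds the remainder.
print([len(c) for c in example_chunks])
```

The last chunk is simply shorter than the others, which the lookup endpoint accepts since 100 is only an upper limit.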

To connect to the Twitter API, you will need authentication. For details see [the Python Twitter Tools documentation](https://github.com/sixohsix/twitter/tree/master#working-with-oauth).

Once you have your authentication tokens, insert them below to continue.

In [None]:
#### config ####

token = ""
token_secret = ""
consumer_key = ""
consumer_secret = ""




In [None]:
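# import python twitter tools (explicit names instead of a wildcard import)

from twitter import Twitter, OAuth

# get twitter connection object
# retry=True is an option of the Twitter constructor: requests that fail
# because of rate limiting (HTTP 429) or 502/503/504 errors are retried

t = Twitter(auth=OAuth(token, token_secret, consumer_key, consumer_secret), retry=True)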

# check credentials

t.account.verify_credentials()

If that worked, we can start downloading the user objects.

In [None]:
user_objects = []

for chunk in id_chunks:
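    # One request per chunk of up to 100 comma-separated ids.
    # In Python Twitter Tools, retry is an option of the Twitter constructor,
    # not of the individual call, so it is not passed here.
    user_objects.extend(t.users.lookup(user_id=",".join(chunk), _method="POST"))
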
len(user_objects)

We now have a list of dictionaries built from the JSON reply of the Twitter API, with keys like "id", "screen_name", etc.

You will notice that we only received a part of the 11189 user objects. (At the time of writing, the code above returns slightly over 9800 user objects.) This happens because the other accounts have since been deleted, either by Twitter moderation or through user choice.
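
To see which ids did not come back, the requested ids can be compared with the ids in the returned objects. This is a minimal sketch, not part of the original notebook: `find_missing_ids` is a hypothetical helper, and it assumes each user object carries an `id_str` key (falling back to `id`). The example data is made up.

```python
def find_missing_ids(requested_ids, user_objects):
    """Return the requested ids that have no matching user object, in order."""
    # Assumption: each user object has "id_str" (string) or "id" (integer).
    returned = {str(u.get("id_str", u.get("id"))) for u in user_objects}
    return [i for i in requested_ids if str(i) not in returned]


# Small made-up example instead of real API output.
example_requested = ["1", "2", "3"]
example_objects = [
    {"id": 1, "id_str": "1", "screen_name": "a"},
    {"id": 3, "screen_name": "c"},
]
missing = find_missing_ids(example_requested, example_objects)
print(missing)
```

With the real data, calling `find_missing_ids(user_ids, user_objects)` would give the list of ids that could not be hydrated.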

These user objects can now be pickled for later processing in Python or saved in some other format at your leisure.
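
As one possible way to do this, the sketch below pickles the list and also writes it as JSON Lines (one user object per line), which is easier to read from other tools. The function names and the file names `user-objects.pickle` and `user-objects.jsonl` are hypothetical, and the example uses made-up data in a temporary directory. It assumes the user objects are plain, JSON-serializable dictionaries.

```python
import json
import os
import pickle
import tempfile


def save_user_objects(user_objects, pickle_path, jsonl_path):
    """Save user objects as a pickle file and as JSON Lines."""
    with open(pickle_path, "wb") as f:
        pickle.dump(user_objects, f)
    with open(jsonl_path, "w", encoding="utf-8") as f:
        for user in user_objects:
            # One JSON document per line; keep non-ASCII names readable.
            f.write(json.dumps(user, ensure_ascii=False) + "\n")


def load_user_objects(pickle_path):
    """Load user objects back from a pickle file."""
    with open(pickle_path, "rb") as f:
        return pickle.load(f)


# Round trip with made-up data in a temporary directory.
example = [{"id": 1, "screen_name": "a"}, {"id": 2, "screen_name": "b"}]
with tempfile.TemporaryDirectory() as tmp:
    pickle_path = os.path.join(tmp, "user-objects.pickle")
    jsonl_path = os.path.join(tmp, "user-objects.jsonl")
    save_user_objects(example, pickle_path, jsonl_path)
    restored = load_user_objects(pickle_path)
    with open(jsonl_path, encoding="utf-8") as f:
        jsonl_lines = f.read().splitlines()
```

Only unpickle files you created yourself, since loading a pickle from an untrusted source can execute arbitrary code. The JSON Lines file does not have that problem.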

Have fun. :)