# rehydratoR - example JSON file download

This Jupyter Notebook shows how to use the rehydratoR R package, which facilitates the replication of Twitter-based research by providing a convenient function to download lists of tweets.

The input for the package is a list of tweet ID numbers. See https://archive.org/details/gaza-tweets for an example.

The output of the package is the downloaded tweets, saved as JSON files.

This package limits the rate of tweet downloading so that Twitter's limit of 90,000 tweets per 15 minutes is not exceeded. When tweets are downloaded to JSON files, a new JSON file is created for every 90,000 tweet ID numbers.
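
The grouping described above can be sketched in a few lines of base R. This only illustrates the arithmetic and is not the package's internal code: `group_size`, `window_hours`, and `plan_groups` are hypothetical names, and the estimate assumes one group is downloaded per 15-minute window.

```r
# Hypothetical helper: split tweet IDs into rate-limit-sized groups.
group_size   <- 90000   # tweets allowed per rate-limit window (from the text above)
window_hours <- 0.25    # one 15-minute window, expressed in hours

plan_groups <- function(ids) {
  # Group 1 holds the first 90,000 IDs, group 2 the next 90,000, and so on.
  group_index <- ceiling(seq_along(ids) / group_size)
  groups <- split(ids, group_index)
  list(groups = groups,
       n_groups = length(groups),
       estimated_hours = length(groups) * window_hours)
}

placeholder_ids <- seq_len(200000)   # stand-in values, not real tweet IDs
plan <- plan_groups(placeholder_ids)
plan$n_groups         # 3 groups: 90,000 + 90,000 + 20,000 IDs
plan$estimated_hours  # 0.75
```

Under the same assumption, a list of 4058754 IDs would need 46 groups, or roughly 11.5 hours.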

Tweets that have been deleted or made private cannot be downloaded.

A Twitter developer account is required. You can apply for a developer account at https://developer.twitter.com.

Download the Gaza Tweets from archive.org.

In [1]:
cat(system('curl -L -O https://archive.org/download/gaza-tweets/ids.txt.gz', intern=TRUE), sep='\n')




Decompress the gzip file with the tweet ids.

In [2]:
cat(system('gunzip ids.txt.gz', intern=TRUE), sep='\n')
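
On systems without a `gunzip` command, the same step can be approximated in base R. This is a minimal sketch, assuming the archive holds plain text with one tweet id per line; `gunzip_text` is a hypothetical helper and not part of rehydratoR.

```r
# Hypothetical helper: write the decompressed contents of a gzip text file.
gunzip_text <- function(gz_path, out_path) {
  con <- gzfile(gz_path, open = "rt")   # open the gzip file for text reading
  on.exit(close(con))
  lines <- readLines(con)
  writeLines(lines, out_path)
  invisible(length(lines))              # number of lines written
}

# Usage with the file downloaded above (left commented out so the sketch
# does not depend on the download having run):
# gunzip_text("ids.txt.gz", "ids.txt")
```

Unlike `gunzip`, this sketch leaves the `.gz` file in place and reads the whole file into memory before writing it out.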




Use the head command to truncate ids.txt to the first 100 tweet ids; the full list is kept as ids-orig.txt. Skip this cell to process all of the tweet ids. The original file contains 4058754 tweet ids, so downloading all of them can take a long time.

In [3]:
cat(system('mv ids.txt ids-orig.txt; head -n 100 ids-orig.txt > ids.txt', intern=TRUE), sep='\n')
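
The same truncation can be done without shell commands. The sketch below uses only base R; `keep_first_lines` is a hypothetical helper name, and it assumes the file has one tweet id per line.

```r
# Hypothetical helper: keep only the first n lines of a text file,
# saving the untouched original under backup_path first.
keep_first_lines <- function(path, backup_path, n = 100) {
  if (!file.rename(path, backup_path)) {
    stop("could not rename ", path)
  }
  writeLines(readLines(backup_path, n = n), path)
  invisible(path)
}

# Equivalent of the cell above (commented out so it is not run twice):
# keep_first_lines("ids.txt", "ids-orig.txt", n = 100)
```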




Load the required libraries.

In [4]:
library(rehydratoR)

Store the name of the tweet id file in a variable.

In [5]:
tweet_ids_file <- "ids.txt" 

You will need to create a Twitter app at https://developer.twitter.com/en/apps and copy its API keys into the variables below.

In [6]:
consumerKey <- ''
consumerSecret <- ''
accessToken <- ''
accessTokenSecret <- ''
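
If you would rather not paste keys into a notebook that may be shared, one option is to read them from environment variables. This is a sketch only: `read_credential` and the variable names in the usage comments are assumptions, not something rehydratoR requires.

```r
# Hypothetical helper: read one credential from an environment variable
# and fail early with a clear message if it is missing.
read_credential <- function(var_name) {
  value <- Sys.getenv(var_name, unset = "")
  if (!nzchar(value)) {
    stop("environment variable ", var_name, " is not set")
  }
  value
}

# The variable names below are assumptions; use whatever names you export.
# consumerKey       <- read_credential("TWITTER_CONSUMER_KEY")
# consumerSecret    <- read_credential("TWITTER_CONSUMER_SECRET")
# accessToken       <- read_credential("TWITTER_ACCESS_TOKEN")
# accessTokenSecret <- read_credential("TWITTER_ACCESS_TOKEN_SECRET")
```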

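Read the tweet ids into a data frame. The numerals = 'no.loss' argument prevents the long ids from being converted to numbers when that conversion would lose precision.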

In [7]:
tweet_ids <- data.frame(read.table(tweet_ids_file, numerals = 'no.loss'))
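
To see why the ids should not be read as ordinary numbers, the following base R sketch round-trips one 18-digit tweet id through a double. The id is one that appears in this notebook's output; the `type.convert` call at the end is included to illustrate the documented `numerals` option, and the id is expected to stay a character string there.

```r
id_text <- "993792695016591361"   # an 18-digit tweet id

# Doubles represent every integer exactly only up to 2^53.
2^53 < as.numeric(id_text)        # TRUE: the id is beyond the exact range

# Round-tripping through a double changes the trailing digits.
round_trip <- sprintf("%.0f", as.numeric(id_text))
identical(round_trip, id_text)    # FALSE

# With numerals = "no.loss", base R is documented to skip a conversion
# that would lose accuracy and keep the value as text instead.
kept <- type.convert(id_text, as.is = TRUE, numerals = "no.loss")
class(kept)
```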

Run the rehydrator to download all of the tweets to example_001.json.

In [8]:
# Download tweets
rehydratoR(consumerKey, consumerSecret, accessToken, accessTokenSecret, tweet_ids, 'example')

2020-11-20 15:44:29 - Total number of Tweet ids: 100
2020-11-20 15:44:29 - Splitting tweets into 1 groups
 Estimated download time is 0.25 hours

2020-11-20 15:44:29 - Start group: 1
Registered S3 method overwritten by 'openssl':
  method      from
  print.bytes Rcpp
2020-11-20 15:44:29 - Total tweets downloaded: 58


Use the head command to display the first 10 lines of example_001.json.

In [9]:
cat(system('head -n 10 example_001.json', intern=TRUE), sep='\n')

[{"user_id":"57261519","status_id":"993792695016591361","created_at":"2018-05-08 10:00:16","screen_name":"Metro_TV","text":"Yayasan Media Group bersama Mer-C membuka kesempatan untuk warga Indonesia membantu pembangunan tahap kedua Rumah Sakit Indonesia di Gaza #BerbagiUntukPalestina https://t.co/yBLs8tk2K3","source":"Sprout Social","display_text_width":160,"is_quote":false,"is_retweet":false,"favorite_count":3,"retweet_count":4,"hashtags":["BerbagiUntukPalestina"],"symbols":[null],"urls_url":[null],"urls_t.co":[null],"urls_expanded_url":[null],"media_url":["http://pbs.twimg.com/ext_tw_video_thumb/993792631854518272/pu/img/Bj7pVg773txed_a0.jpg"],"media_t.co":["https://t.co/yBLs8tk2K3"],"media_expanded_url":["https://twitter.com/Metro_TV/status/993792695016591361/video/1"],"media_type":["photo"],"ext_media_url":["http://pbs.twimg.com/ext_tw_video_thumb/993792631854518272/pu/img/Bj7pVg773txed_a0.jpg"],"ext_media_t.co":["https://t.co/yBLs8tk2K3"],"ext_media_expanded_url":["https://twitter
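
Because deleted or private tweets are not returned, it can be useful to list which requested ids are missing from the output. The sketch below is a rough regex-based approach in base R, assuming the `"status_id":"..."` layout shown above; `extract_status_ids` is a hypothetical helper, and a real JSON parser would be more robust for anything beyond a quick check.

```r
# Hypothetical helper: collect the value of every field named exactly
# "status_id" from JSON text (fields such as "quoted_status_id" do not match).
extract_status_ids <- function(json_lines) {
  json_text <- paste(json_lines, collapse = "\n")
  hits <- regmatches(json_text,
                     gregexpr('"status_id":"[0-9]+"', json_text))[[1]]
  gsub("[^0-9]", "", hits)   # strip the field name and quotes, keep the digits
}

# Tiny stand-in for the contents of example_001.json.
sample_json <- '[{"user_id":"1","status_id":"101"},{"user_id":"2","status_id":"103"}]'
requested   <- c("101", "102", "103")

downloaded <- extract_status_ids(sample_json)
setdiff(requested, downloaded)   # "102": requested but not returned
```

With the real files, `json_lines` could come from `readLines("example_001.json", warn = FALSE)` and `requested` from `as.character(tweet_ids[[1]])`.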

Remove the downloaded tweet id source files.

In [10]:
cat(system('rm ids.txt ids-orig.txt', intern=TRUE), sep='\n')


