# rehydratoR - example tibble download

This Jupyter Notebook shows how to use the rehydratoR R package, which facilitates the replication of Twitter-based research by providing a convenient function to download lists of tweets.

The input for the package is a list of tweet ID numbers. See https://archive.org/details/gaza-tweets for an example.

The output of the package is the downloaded tweets as a [tibble](https://CRAN.R-project.org/package=tibble).

The package throttles downloads so that Twitter's limit of 90,000 tweets per 15 minutes is not exceeded.
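
To illustrate how pacing against such a limit can work, here is a minimal sketch. It is not rehydratoR's actual implementation: the helper name `batch_ids`, the stand-in ids, and the commented-out `download()` call are all assumptions made for illustration.

```r
# Hypothetical helper: split a vector of ids into batches no larger than the limit.
batch_ids <- function(ids, batch_size = 90000) {
  split(ids, ceiling(seq_along(ids) / batch_size))
}

ids <- as.character(seq_len(200000))   # stand-in ids, not real tweet ids
batches <- batch_ids(ids)

length(batches)    # number of 15-minute windows needed
lengths(batches)   # size of each batch

# A download loop would then pause between batches, for example:
# for (b in batches) { download(b); Sys.sleep(15 * 60) }   # download() is hypothetical
```

With 200,000 stand-in ids this yields three batches of 90,000, 90,000, and 20,000 ids.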

Tweets that have been deleted or made private cannot be downloaded.

A Twitter developer account is required. You can apply for a developer account at https://developer.twitter.com.

Download the Gaza tweet ids from archive.org.

In [1]:
cat(system('curl -L -O https://archive.org/download/gaza-tweets/ids.txt.gz', intern=TRUE), sep='\n')




Decompress the gzip file with the tweet ids.

In [2]:
cat(system('gunzip ids.txt.gz', intern=TRUE), sep='\n')




Use the head command to make ids.txt contain only the first 100 tweet ids. Skip this cell to process all of the tweet ids. The original file contains 4058754 tweet ids, and downloading all of them can take a long time.

In [3]:
cat(system('mv ids.txt ids-orig.txt; head -n 100 ids-orig.txt > ids.txt', intern=TRUE), sep='\n')
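
As a rough estimate of how long the full download would take, the sketch below assumes the limit of 90,000 tweets per 15 minutes quoted earlier and one full waiting window between batches. That pacing is an assumption, not a measurement of rehydratoR's behavior.

```r
n_ids <- 4058754        # ids in the original file
limit <- 90000          # tweets allowed per window
window_minutes <- 15

windows <- ceiling(n_ids / limit)                   # batches needed
wait_hours <- (windows - 1) * window_minutes / 60   # pauses between batches
c(windows = windows, wait_hours = wait_hours)
```

Under these assumptions the download needs 46 windows, so the waiting time alone is about 11 hours before counting the download time itself.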




Load the rehydratoR library.

In [4]:
library(rehydratoR)

Set the file name for the saved tweets and the name of the file that holds the tweet ids.

In [5]:
saved_tweets <- "tweets.Rda"
tweet_ids_file <- "ids.txt"

Load the tweets into a tibble. You will need to create a Twitter app at https://developer.twitter.com/en/apps and copy its API keys into the cell below.

In [6]:
# If the saved_tweets file exists, load the tweets from disk; otherwise download them from Twitter
if(file.exists(saved_tweets)){
  load(saved_tweets)
} else {
  # Get Twitter api keys from https://developer.twitter.com/en/apps 
  consumerKey <- ''
  consumerSecret <- ''
  accessToken <- ''
  accessTokenSecret <- ''

  # Read tweet ids
  tweet_ids <- data.frame(read.table(tweet_ids_file, numerals = 'no.loss'))

  # Download the tweets using rehydratoR
  tweets <- rehydratoR(consumerKey, consumerSecret, accessToken, accessTokenSecret, tweet_ids)

  # Save tweets to disk
  save(tweets, file=saved_tweets)
}
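
The `numerals = 'no.loss'` argument in the cell above matters because tweet ids are 64-bit integers, which are too large for R's double-precision numbers to represent exactly. The sketch below shows the difference using a made-up 18-digit id (not taken from the dataset). The `stringsAsFactors = FALSE` argument is added here only so the result is character on older R versions.

```r
id_text <- "993792690123456789"   # made-up 18-digit id for illustration

# Converting to a double silently rounds to the nearest representable value,
# so printing it back no longer reproduces the original digits.
as_number <- as.numeric(id_text)
sprintf("%.0f", as_number) == id_text

# Reading with numerals = "no.loss" skips the lossy conversion and keeps the id as text.
ids <- read.table(text = id_text, numerals = "no.loss", stringsAsFactors = FALSE)
as.character(ids$V1) == id_text
```

The first comparison is `FALSE` and the second is `TRUE`, which is why the ids should stay as text all the way through.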

Display the downloaded tweets.

In [7]:
show(tweets)

# A tibble: 58 x 90
   user_id status_id created_at          screen_name text  source
   <chr>   <chr>     <dttm>              <chr>       <chr> <chr> 
 1 572615… 99379269… 2018-05-08 10:00:16 Metro_TV    Yaya… Sprou…
 2 188055… 99379308… 2018-05-08 10:01:49 MKhalaifa   #Gaz… Twitt…
 3 322716… 99378903… 2018-05-08 09:45:43 SayakEnver  @bar… Twitt…
 4 297931… 99379065… 2018-05-08 09:52:10 _Bleam      RAW:… Twitt…
 5 231381… 99379147… 2018-05-08 09:55:27 uplink_shi… "💐劇場… Tweet…
 6 473371… 99377146… 2018-05-08 08:35:55 OmarElQatt… "شئٌ…  Twitt…
 7 959549… 99377407… 2018-05-08 08:46:17 UweFelber1  @mim… Twitt…
 8 280082… 99379152… 2018-05-08 09:55:38 giokonta    RAW:… Twitt…
 9 787624… 99378960… 2018-05-08 09:47:59 Meli_Posit… Donc… Twitt…
10 430858… 99377739… 2018-05-08 08:59:29 Fake_Sheikh "“My… Twitt…
# … with 48 more rows, and 84 more variables: display_text_width <dbl>,
#   reply_to_status_id <chr>, reply_to_user_id <chr>,
#   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <l
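
If the tibble has fewer rows than the number of ids requested, some tweets were probably deleted or made private. The sketch below lists the ids that did not come back, using made-up stand-in data. With real data the two inputs would be `tweet_ids$V1` (the default column name from `read.table`) and `tweets$status_id` (the column shown in the output above).

```r
requested <- c("1001", "1002", "1003", "1004")   # made-up ids
returned  <- data.frame(status_id = c("1002", "1004"),
                        stringsAsFactors = FALSE)

# Ids that were requested but are absent from the downloaded tweets.
missing_ids <- setdiff(requested, returned$status_id)
missing_ids

# Share of requested ids that were not returned.
length(missing_ids) / length(requested)
```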

Remove the downloaded tweet id source files.

In [8]:
cat(system('rm ids.txt ids-orig.txt', intern=TRUE), sep='\n')


