
Twitter reply-to-retweet ratio scraping code

This is the scraping and front-end code used to acquire and visualize the data discussed in A Quick Look at the Reply-to-Retweet Ratio.


Requirements: Python 3.6+ (f-strings!)

$ git clone
$ cd tweetratio
$ python3 -m virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ mkdir -p raw minified csv  # for output


To download realDonaldTrump's last 3,200 tweets as JSON and add a reply_count field to each tweet, run

>>> import tweetratio
>>> tweetratio.get_user('realDonaldTrump')

This code has to scrape as well as make API calls, so it will take 30-60 minutes, depending on the speed of your internet connection.

The tweets can then be found in raw/realDonaldTrump.json.
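
As a rough illustration of what can be done with the raw file, the sketch below computes a reply-to-retweet ratio per tweet. It assumes the file holds a JSON list of tweet objects, each carrying the reply_count field added above plus the retweet_count and id_str fields found in Twitter API tweet objects. The helper names load_tweets and reply_to_retweet_ratios are hypothetical and are not part of tweetratio.

```python
import json
from pathlib import Path


def load_tweets(path):
    """Load tweets from a JSON file (assumed to be a list of tweet dicts)."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def reply_to_retweet_ratios(tweets):
    """Return (id_str, replies / retweets) pairs for tweets with retweets.

    Assumes each tweet dict has reply_count and retweet_count keys;
    missing counts are treated as zero.
    """
    ratios = []
    for tweet in tweets:
        retweets = tweet.get("retweet_count", 0)
        if not retweets:
            continue  # skip tweets with no retweets to avoid dividing by zero
        replies = tweet.get("reply_count", 0)
        ratios.append((tweet.get("id_str"), replies / retweets))
    return ratios
```

With the download finished, reply_to_retweet_ratios(load_tweets('raw/realDonaldTrump.json')) would give one pair per tweet that has at least one retweet.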

To produce a minified copy of the tweets, containing only the keys needed for the visualization, along with the same data as a CSV file, run

>>> import analysis
>>> analysis.process('realDonaldTrump')

The minified JSON is saved to minified/realDonaldTrump.json. The CSV is saved to csv/realDonaldTrump.csv.
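
To give a sense of what minifying means here, the following is a minimal sketch, not the actual code in analysis. The set of keys kept (KEEP_KEYS) is an assumption chosen for illustration; the real list lives in the repository. The functions minify and to_csv are hypothetical names.

```python
import csv
import io

# Assumed subset of keys; the keys analysis.process actually keeps may differ.
KEEP_KEYS = ("id_str", "created_at", "retweet_count", "reply_count")


def minify(tweets, keys=KEEP_KEYS):
    """Keep only the given keys from each tweet dict (missing keys become None)."""
    return [{key: tweet.get(key) for key in keys} for tweet in tweets]


def to_csv(rows, keys=KEEP_KEYS):
    """Render minified rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=keys, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Dropping unused keys keeps the files served to the browser small, which matters when several accounts are loaded at once.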


To run the visualization locally, first download and minify the data for realDonaldTrump, BernieSanders, BarackObama, HillaryClinton, GovMikeHuckabee, dril and SpeakerRyan (see above). If you'd like to plot other accounts, download and minify those as well, and change web/app.js accordingly. Then run


$ mv minified/* web/data/
$ cd web
$ python3 -m http.server

and visit localhost:8000 in your browser.

Analysis contains simple code to load the tweets as a pandas DataFrame. For example:

>>> import analysis
>>> tweets = analysis.load_df()
>>> analysis.plot_trend(tweets)

U.S. Senators demonstrates how to download the tweets for a list of users (e.g. the U.S. senators as of June 2017).
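
For a quick idea of the pattern without opening that example, here is a minimal sketch of looping over a list of accounts. It is not the code from the example itself. The helper download_all is hypothetical; it takes the download function as an argument so that one failing account does not abort a long run.

```python
def download_all(usernames, fetch):
    """Call fetch(username) for each account and collect any failures.

    fetch is expected to be a function such as tweetratio.get_user.
    Returns a dict mapping each failed username to the exception raised.
    """
    failed = {}
    for name in usernames:
        try:
            fetch(name)
        except Exception as exc:  # keep going; each account can take a long time
            failed[name] = exc
    return failed


# Intended usage (assumes tweetratio is importable and credentials are set up):
#   import tweetratio
#   failed = download_all(['BernieSanders', 'SpeakerRyan'], tweetratio.get_user)
```

Since each account can take 30-60 minutes, collecting failures and retrying only those accounts afterwards avoids repeating downloads that already succeeded.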