This is the scraping and front-end code used to acquire and visualize the data discussed in A Quick Look at the Reply-to-Retweet Ratio.
Requirements: Python 3.6+ (f-strings!)
$ git clone git@github.com:fastforwardlabs/tweetratio.git
$ cd tweetratio
$ python3 -m virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ mkdir -p raw minified csv # for output
To download realDonaldTrump's last 3200 tweets as JSON, and add a
reply_count field to each tweet, do
>>> import tweetratio
>>> tweetratio.get_user('realDonaldTrump')
This code has to scrape as well as make API calls, so it will take 30-60 minutes, depending on the speed of your internet connection.
The tweets can then be found in raw/realDonaldTrump.json.
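As a rough illustration of what the downloaded file can be used for, here is a minimal sketch that computes a reply-to-retweet ratio per tweet. It assumes the file is a JSON list of tweet objects, each carrying the reply_count field added by the scraper plus the standard Twitter API fields retweet_count and id_str; the actual layout may differ, and reply_to_retweet_ratios is a hypothetical helper, not part of this repository.

```python
import json
from pathlib import Path


def reply_to_retweet_ratios(path):
    """Return (tweet id, replies / retweets) pairs for the tweets in a JSON file.

    Assumption: the file holds a JSON list of tweet dicts with `reply_count`,
    `retweet_count` and `id_str` keys. Adjust the key names if the layout differs.
    """
    tweets = json.loads(Path(path).read_text(encoding="utf-8"))
    ratios = []
    for tweet in tweets:
        retweets = tweet.get("retweet_count") or 0
        replies = tweet.get("reply_count")
        # Skip tweets where the ratio is undefined (no reply count, or zero retweets).
        if replies is None or retweets == 0:
            continue
        ratios.append((tweet.get("id_str"), replies / retweets))
    return ratios
```

Calling reply_to_retweet_ratios('raw/realDonaldTrump.json') would then give one pair per tweet that has both counts.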
To produce a minified copy of the tweets, which contains only the keys necessary for the visualization, along with the same data as a CSV file, do
>>> import analysis
>>> analysis.process('realDonaldTrump')
The minified JSON is saved to minified/realDonaldTrump.json. The CSV is saved
to csv/realDonaldTrump.csv.
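The idea behind minifying can be sketched in a few lines. This is not the code in analysis.py: the key set in KEEP is a guess at what a visualization might need, and minify and to_csv are hypothetical helpers introduced only for illustration.

```python
import csv
import io

# Hypothetical key set; the keys the visualization actually needs may differ.
KEEP = ("id_str", "created_at", "retweet_count", "reply_count")


def minify(tweets, keys=KEEP):
    """Drop every key except those in `keys` from each tweet dict."""
    return [{k: tweet[k] for k in keys if k in tweet} for tweet in tweets]


def to_csv(rows, keys=KEEP):
    """Serialize minified rows as CSV text, one column per key."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=keys, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row are written as empty cells
    return buffer.getvalue()
```

Dropping unused keys keeps the files served to the browser small, which matters when several accounts of roughly 3200 tweets each are loaded at once.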
To run the visualization locally, download and minify the data for
realDonaldTrump, BernieSanders, BarackObama, HillaryClinton, GovMikeHuckabee,
dril and SpeakerRyan (see above). If you'd like to plot other accounts,
download those and change web/app.js.
Then
$ mv minified/* web/data/
$ cd web
$ python3 -m http.server
and visit localhost:8000
analysis.py contains simple code to load the tweets as a pandas DataFrame.
For example:
>>> import analysis
>>> tweets = analysis.load_df()
>>> analysis.plot_trend(tweets)
batch_download.py demonstrates how to download the
tweets for a list of users (e.g. the U.S. senators as of June 2017).
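For a sense of what a batch download loop might look like, here is a minimal sketch. It is not the contents of batch_download.py; download_all is a hypothetical helper that takes the per-user download function as an argument, so that one failing account does not abort a run that may take hours.

```python
def download_all(users, fetch):
    """Call `fetch(user)` for each user and return a dict of the failures.

    `fetch` is any callable that downloads one account, for example
    tweetratio.get_user (passed in by the caller, not imported here).
    """
    failed = {}
    for user in users:
        try:
            fetch(user)
        except Exception as error:
            # Record the error and keep going with the remaining accounts.
            failed[user] = error
    return failed
```

With the repository on the import path, this could be called as download_all(['BernieSanders', 'SpeakerRyan'], tweetratio.get_user), and the returned dict inspected afterwards to retry any accounts that failed.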
