This repository has been archived by the owner. It is now read-only.
Archive and analyze results from a Twitter search (**no longer maintained**)
analysis
.gitignore
AGBT14.txt
ASHG2013.csv
ASHG2013.txt
GI2013.csv
GI2013.txt
LICENSE
PAGXXII.txt
README.md
SFAF2013.csv
SFAF2013.txt
altbioinf.csv
altbioinf.txt
bioinformatics.csv
bioinformatics.txt
bog13.csv
bog13.txt
bog14.csv
bog14.txt
bosc2014.txt
cville.csv
cville.txt
genomics.csv
genomics.txt
hitseq14.txt
ismb.txt
ismb14.txt
ismb2014.txt
ismbeccb-2013.csv
ismbeccb-2013.txt
metagenomics.csv
metagenomics.txt
rna-seq.csv
rna-seq.txt
rstats.csv
rstats.txt
twitterchive.sh


twitterchive - Archive Twitter search results

This repository is no longer being maintained.

Blog post about this at http://gettinggeneticsdone.blogspot.com/2013/05/automated-analysis-tweets-bioinformatics-twitterchive.html.

twitterchive.sh

twitterchive.sh: Script to search and save results from a Twitter search.

The script uses sferik's t command-line client to search Twitter for the keywords stored in the arr variable inside the script.

You must first install the t gem and authenticate with OAuth (see the t README).
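
As a rough illustration of the approach described above, here is a minimal sketch of a keyword loop around t. It is not the contents of twitterchive.sh: the keyword list, the search_command and run_searches helpers, and the choice to append results to one .txt file per keyword are all assumptions made for the example. It also assumes t is already installed (gem install t) and authorized (t authorize).

```shell
#!/usr/bin/env bash
# Sketch only: the real twitterchive.sh may be organized differently.

# Hypothetical keyword list; the README only says keywords live in `arr`.
arr=("bioinformatics" "metagenomics" "rna-seq")

# Number of tweets requested per query.
n=200

# Hypothetical helper: print the t command for one query without running it.
# Useful for a dry run before spending any API calls.
search_command() {
  printf 't search all -l -n %s %s\n' "$n" "$1"
}

# Hypothetical helper: run one search per keyword and append the long-format
# output to <keyword>.txt. Needs an installed, authorized t client, so it is
# defined here but not called automatically.
run_searches() {
  local query
  for query in "${arr[@]}"; do
    t search all -l -n "$n" "$query" >> "${query}.txt"
  done
}
```

Calling search_command rstats prints the command that would be run for that keyword. Note that this sketch does nothing about tweets that appear in more than one run, so repeated runs would leave duplicate lines in the output files.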

Twitter enforces API limits on how many tweets you can search for in one query, and on how many queries you can execute in a given period.

I'm not sure what these limits are, but I've hit them a few times. To be safe, I would limit the number of queries to ~5 and $n to ~200, and run the script no more than a couple of times per hour.
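
One way to keep a run inside those rough bounds is to check them before any search is made. The sketch below is an assumption-laden example, not part of twitterchive.sh: check_limits, max_queries, and max_n are hypothetical names, and the ceilings of 5 and 200 come from the rule of thumb above rather than from Twitter's documented limits.

```shell
#!/usr/bin/env bash
# Hypothetical ceilings based on the README's rule of thumb,
# not on Twitter's documented rate limits.
max_queries=5
max_n=200

# Hypothetical helper. Usage: check_limits N QUERY...
# Returns 0 when the run stays within both ceilings; otherwise prints a
# message to stderr and returns 1.
check_limits() {
  local n=$1
  shift
  if (( $# > max_queries )); then
    echo "too many queries: $# > $max_queries" >&2
    return 1
  fi
  if (( n > max_n )); then
    echo "n too large: $n > $max_n" >&2
    return 1
  fi
  return 0
}
```

A script could call check_limits "$n" "${arr[@]}" || exit 1 before its search loop, so that an overly ambitious keyword list fails fast instead of tripping the API limits partway through.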

You can set this up in a cron job using something like:

# Run at the top of the hour every four hours. 
00 00,04,08,12,16,20 * * * export PATH=/usr/local/bin:$PATH && cd /path/to/twitterchive && ./twitterchive.sh > /home/user/logs/cronlog.txt 2>&1

analysis/twitterchive.r

analysis/twitterchive.r: R script containing a function that reads in and parses the fixed-width text files above and produces some plots:

  • Number of tweets per day for the last n days
  • Frequency of tweets by hour of the day
  • Barplot of the most frequently used hashtags within a query
  • Barplot of the most prolific tweeters
  • The ubiquitous wordcloud
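
For a quick look at the hashtag counts without opening R, a shell pipeline can give a rough approximation of the data behind the hashtag barplot. This is a sketch under stated assumptions and not what analysis/twitterchive.r does: top_hashtags is a hypothetical helper, it treats any # followed by letters, digits, or underscores as a hashtag, and it folds case so that #Rstats and #rstats count together.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Usage: top_hashtags FILE [COUNT]
# Prints the COUNT most frequent hashtags in FILE (default 10) as
# "hashtag<TAB>count", most frequent first, ties broken alphabetically.
top_hashtags() {
  local file=$1 count=${2:-10}
  grep -oE '#[[:alnum:]_]+' "$file" \
    | tr '[:upper:]' '[:lower:]' \
    | sort | uniq -c \
    | sort -k1,1nr -k2,2 \
    | head -n "$count" \
    | awk '{print $2 "\t" $1}'
}
```

For example, top_hashtags rstats.txt 20 would list the twenty most common hashtags in the archived rstats results.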