This repository has been archived by the owner on Dec 9, 2017. It is now read-only.

Commit

repo no longer maintained
stephenturner committed Mar 25, 2015
0 parents commit c5faeec
Showing 110 changed files with 826,928 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
.DS_Store*
._*
.Trashes
.Rhistory
4,282 changes: 4,282 additions & 0 deletions AGBT14.txt

Large diffs are not rendered by default.

4,304 changes: 4,304 additions & 0 deletions ASHG2013.csv

4,303 changes: 4,303 additions & 0 deletions ASHG2013.txt

2,114 changes: 2,114 additions & 0 deletions GI2013.csv

2,113 changes: 2,113 additions & 0 deletions GI2013.txt

343 changes: 343 additions & 0 deletions LICENSE

2,282 changes: 2,282 additions & 0 deletions PAGXXII.txt

34 changes: 34 additions & 0 deletions README.md
@@ -0,0 +1,34 @@
# twitterchive - Archive Twitter search results

**_This repository is no longer being maintained._**

Blog post about this at <http://gettinggeneticsdone.blogspot.com/2013/05/automated-analysis-tweets-bioinformatics-twitterchive.html>.

## `twitterchive.sh`

[`twitterchive.sh`](twitterchive.sh): Script to search and save results from a Twitter search.

The script uses [sferik's `t` command-line client](https://github.com/sferik/t) to search Twitter for the keywords stored in the `arr` variable inside the script.

You must first install the `t` gem and authenticate with OAuth (see the `t` README).

Twitter enforces API limits on how many tweets you can retrieve in one query and on how many queries you can run in a given period.

I'm not sure what the exact limits are, but I've hit them a few times. To be safe, I would limit the number of queries to ~5, set `$n` to ~200, and run the script no more than a couple of times per hour.
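As a rough illustration of those limits, here is a minimal sketch of a capped search loop. It is not the actual `twitterchive.sh`: the keyword list, the `max_queries` and `DRY_RUN` variables, and the `run_searches` function are hypothetical, and the `t search all -l -n` flags are an assumption that you should check against `t help search` before relying on them. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: the real twitterchive.sh may be organized differently.

arr=(bioinformatics metagenomics rstats)  # hypothetical keyword list
n=200                                     # tweets requested per query (~200, as suggested above)
max_queries=5                             # hypothetical cap on queries per run (~5, as suggested above)
DRY_RUN="${DRY_RUN:-1}"                   # 1 = print the commands instead of calling `t`

run_searches() {
  local count=0 query
  for query in "${arr[@]}"; do
    # Stop issuing searches once the per-run cap is reached.
    if [ "$count" -ge "$max_queries" ]; then
      echo "Skipping '$query': query cap of $max_queries reached" >&2
      continue
    fi
    count=$((count + 1))
    if [ "$DRY_RUN" = "1" ]; then
      echo "t search all -l -n $n \"$query\" >> $query.txt"
    else
      # Assumes `t` is installed and authenticated, and that these flags exist.
      t search all -l -n "$n" "$query" >> "$query.txt"
    fi
  done
}

run_searches
```

Running it with `DRY_RUN=0` would append each query's results to a file named after the keyword, which is the naming the analysis script appears to expect (for example `bog14.txt`).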

You can set this up in a cron job using something like:

```
# Run at the top of the hour every four hours.
00 00,04,08,12,16,20 * * * export PATH=/usr/local/bin:$PATH && cd /path/to/twitterchive && ./twitterchive.sh > /home/user/logs/cronlog.txt 2>&1
```

## `analysis/twitterchive.r`

[`analysis/twitterchive.r`](analysis/twitterchive.r): R script containing a function that reads and parses the fixed-width text files produced by `twitterchive.sh` and generates some plots:

* Number of tweets per day for the last *n* days
* Frequency of tweets by hour of the day
* Barplot of the most frequently used hashtags within a query
* Barplot of the most prolific tweeters
* The ubiquitous wordcloud
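
Before running the R script, it can help to confirm that a text file really follows the fixed-width layout. The sketch below tallies tweets per day from the shell. It assumes the column widths used by the `read.fwf` call in `analysis/twitterchive.r` (18, 14, 18, then the tweet text) and a `Mon DD HH:MM` timestamp; the `tweets_per_day` function name and the sample rows are made up for illustration.

```shell
#!/usr/bin/env bash
# Tally tweets per day from a fixed-width twitterchive text file.
# Assumed layout: id (18 chars), datetime (14 chars), user (18 chars), text.
tweets_per_day() {
  awk '{
    stamp = substr($0, 19, 14)     # datetime column: characters 19-32
    sub(/^[ \t]+/, "", stamp)      # trim leading whitespace
    day = substr(stamp, 1, 6)      # keep month and day, e.g. "Mar 25"
    if (day != "") counts[day]++
  }
  END { for (d in counts) print counts[d], d }' "$1" | sort -rn
}

# Demo on a small made-up file written with the same column widths.
sample="$(mktemp)"
printf '%-18s%-14s%-18s%s\n' \
  1 "Mar 25 14:02" "@alice" "first made-up tweet" \
  2 "Mar 25 15:10" "@bob"   "second made-up tweet" \
  3 "Mar 26 09:00" "@alice" "third made-up tweet" > "$sample"
tweets_per_day "$sample"
rm -f "$sample"
```

If the counts look wrong or the dates come out garbled, the widths probably do not match your files, which is the same failure mode the comment in the R script warns about.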
847 changes: 847 additions & 0 deletions SFAF2013.csv

846 changes: 846 additions & 0 deletions SFAF2013.txt

499 changes: 499 additions & 0 deletions altbioinf.csv

498 changes: 498 additions & 0 deletions altbioinf.txt

Binary file added analysis/ASHG2013--barplot-top-hashtags.png
Binary file added analysis/ASHG2013--barplot-top-users.png
Binary file added analysis/ASHG2013--barplot-tweets-by-date.png
Binary file added analysis/ASHG2013--barplot-tweets-by-hour.png
Binary file added analysis/ASHG2013--wordcloud.png
Binary file added analysis/GI2013--barplot-top-hashtags.png
Binary file added analysis/GI2013--barplot-top-users.png
Binary file added analysis/GI2013--barplot-tweets-by-date.png
Binary file added analysis/GI2013--barplot-tweets-by-hour.png
Binary file added analysis/GI2013--montage.png
Binary file added analysis/GI2013--wordcloud.png
Binary file added analysis/SFAF2013--barplot-top-hashtags.png
Binary file added analysis/SFAF2013--barplot-top-users.png
Binary file added analysis/SFAF2013--barplot-tweets-by-date.png
Binary file added analysis/SFAF2013--barplot-tweets-by-hour.png
Binary file added analysis/SFAF2013--wordcloud.png
Binary file added analysis/altbioinf--barplot-top-hashtags.png
Binary file added analysis/altbioinf--barplot-top-users.png
Binary file added analysis/altbioinf--barplot-tweets-by-date.png
Binary file added analysis/altbioinf--barplot-tweets-by-hour.png
Binary file added analysis/altbioinf--wordcloud.png
Binary file added analysis/bioinformatics--barplot-top-hashtags.png
Binary file added analysis/bioinformatics--barplot-top-users.png
Binary file added analysis/bioinformatics--wordcloud.png
Binary file added analysis/bog13--barplot-top-hashtags.png
Binary file added analysis/bog13--barplot-top-users.png
Binary file added analysis/bog13--barplot-tweets-by-date.png
Binary file added analysis/bog13--barplot-tweets-by-hour.png
Binary file added analysis/bog13--wordcloud.png
Binary file added analysis/bog14--barplot-top-hashtags.png
Binary file added analysis/bog14--barplot-top-users.png
Binary file added analysis/bog14--barplot-tweets-by-date.png
Binary file added analysis/bog14--barplot-tweets-by-hour.png
Binary file added analysis/bog14--montage.jpg
Binary file added analysis/bog14--wordcloud.png
Binary file added analysis/cville--barplot-top-hashtags.png
Binary file added analysis/cville--barplot-top-users.png
Binary file added analysis/cville--barplot-tweets-by-date.png
Binary file added analysis/cville--barplot-tweets-by-hour.png
Binary file added analysis/cville--wordcloud.png
Binary file added analysis/genomics--barplot-top-hashtags.png
Binary file added analysis/genomics--barplot-top-users.png
Binary file added analysis/genomics--barplot-tweets-by-date.png
Binary file added analysis/genomics--barplot-tweets-by-hour.png
Binary file added analysis/genomics--wordcloud.png
Binary file added analysis/ismbeccb--barplot-top-hashtags.png
Binary file added analysis/ismbeccb--barplot-top-users.png
Binary file added analysis/ismbeccb--barplot-tweets-by-date.png
Binary file added analysis/ismbeccb--barplot-tweets-by-hour.png
Binary file added analysis/ismbeccb--wordcloud.png
Binary file added analysis/ismbeccb-2013--barplot-top-users.png
Binary file added analysis/ismbeccb-2013--wordcloud.png
Binary file added analysis/metagenomics--barplot-top-hashtags.png
Binary file added analysis/metagenomics--barplot-top-users.png
Binary file added analysis/metagenomics--barplot-tweets-by-date.png
Binary file added analysis/metagenomics--barplot-tweets-by-hour.png
Binary file added analysis/metagenomics--wordcloud.png
Binary file added analysis/rna-seq--barplot-top-hashtags.png
Binary file added analysis/rna-seq--barplot-top-users.png
Binary file added analysis/rna-seq--barplot-tweets-by-date.png
Binary file added analysis/rna-seq--barplot-tweets-by-hour.png
Binary file added analysis/rna-seq--wordcloud.png
Binary file added analysis/rstats--barplot-top-hashtags.png
Binary file added analysis/rstats--barplot-top-users.png
Binary file added analysis/rstats--barplot-tweets-by-date.png
Binary file added analysis/rstats--barplot-tweets-by-hour.png
Binary file added analysis/rstats--wordcloud.png
119 changes: 119 additions & 0 deletions analysis/twitterchive.r
@@ -0,0 +1,119 @@
## Most of this code was adapted near-verbatim from Neil's post about ISMB 2012.
## http://nsaunders.wordpress.com/2012/08/16/twitter-coverage-of-the-ismb-2012-meeting-some-statistics/

## Modify this. This is where I keep this repo.
repoDir <- ("~/workprojects/twitterchive/")

## Go to the analysis directory
setwd(paste(repoDir, "analysis", sep=""))

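## twitterchivePlots(filename): read one fixed-width search archive (e.g. "../bog14.txt"),
## write a CSV copy next to it, and save five PNGs (tweets by date, tweets by hour,
## top hashtags, top users, wordcloud) to the working directory.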
twitterchivePlots <- function (filename=NULL) {

## Load required packages
require(tm)
require(wordcloud)
require(RColorBrewer)

if (!is.character(filename)) stop("filename must be character")
if (!file.exists(filename)) stop(paste("File does not exist:", filename))

searchTerm <- sub("\\.txt$", "", basename(filename)) # strip only a trailing ".txt"

message(paste("Filename:", filename))
message(paste("Search Term: ", searchTerm))

## Read in the data and munge around the dates.
## I can't promise the fixed widths will always work out for you.
message("Reading in data.")
trim.whitespace <- function(x) gsub("^\\s+|\\s+$", "", x) # Function to trim leading and trailing whitespace from character vectors.
d <- read.fwf(filename, widths=c(18, 14, 18, 1000), stringsAsFactors=FALSE, comment.char="")
d <- as.data.frame(sapply(d, trim.whitespace), stringsAsFactors=FALSE)
names(d) <- c("id", "datetime", "user", "text")
d$user <- sub("@", "", d$user)
## The timestamps carry no year, so as.POSIXlt() fills in the current year.
d$datetime <- as.POSIXlt(d$datetime, format="%b %d %H:%M")
d$date <- as.Date(d$datetime)
d$hour <- d$datetime$hour
d <- na.omit(d) # CRs cause a problem. explain this later.
write.csv(d, file=sub("\\.txt$", ".csv", filename)) # save a CSV copy next to the .txt file
head(d)

## Number of tweets by date for the last n days
recentDays <- 30
message(paste("Plotting number of tweets by date in the last", recentDays, "days."))
recent <- subset(d, date>=(max(date)-recentDays))
byDate <- as.data.frame(table(recent$date))
names(byDate) <- c("date", "tweets")
png(paste(searchTerm, "barplot-tweets-by-date.png", sep="--"), w=1000, h=700)
par(mar=c(8.5,4,4,1))
with(byDate, barplot(tweets, names=date, col="black", las=2, cex.names=1.2, cex.axis=1.2, main=paste("Number of Tweets by Date", paste("Term:", searchTerm), sep="\n"))) # margins are already set by par(mar=...) above
dev.off()
# ggplot(byDate) + geom_bar(aes(date, tweets), stat="identity", fill="black") + theme_bw() + ggtitle("Number of Tweets by Date") + theme(axis.text.x=element_text(angle=90, hjust=1))

## Number of tweets by hour
message("Plotting number of tweets by hour.")
byHour <- as.data.frame(table(d$hour))
names(byHour) <- c("hour", "tweets")
png(paste(searchTerm, "barplot-tweets-by-hour.png", sep="--"), w=1000, h=700)
with(byHour, barplot(tweets, names.arg=hour, col="black", las=1, cex.names=1.2, cex.axis=1.2, main=paste("Number of Tweets by Hour", paste("Term:", searchTerm), paste("Date:", Sys.Date()), sep="\n")))
dev.off()
# ggplot(byHour) + geom_bar(aes(hour, tweets), stat="identity", fill="black") + theme_bw() + ggtitle("Number of Tweets by Hour")

## Barplot of top 20 hashtags
message("Plotting top 20 hashtags.")
words <- unlist(strsplit(d$text, " "))
head(table(words))
ht <- words[grep("^#", words)]
ht <- tolower(ht)
ht <- gsub("[^A-Za-z0-9]", "", ht) # strip every character that is not a letter or digit (including the leading "#")
ht <- as.data.frame(table(ht))
ht <- subset(ht, ht!="") # remove blanks
ht <- ht[sort.list(ht$Freq, decreasing=TRUE), ]
ht <- ht[-1, ] # remove the term you're searching for? it usually dominates the results.
ht <- head(ht, 20)
head(ht)
png(paste(searchTerm, "barplot-top-hashtags.png", sep="--"), w=1000, h=700)
par(mar=c(5,10,4,2))
with(ht[order(ht$Freq), ], barplot(Freq, names=ht, horiz=T, col="black", las=1, cex.names=1.2, cex.axis=1.2, main=paste("Top Hashtags", paste("Term:", searchTerm), paste("Date:", Sys.Date()), sep="\n")))
dev.off()
# ggplot(ht) + geom_bar(aes(ht, Freq), fill = "black", stat="identity") + coord_flip() + theme_bw() + ggtitle("Top hashtags")

## Top Users
message("Plotting most prolific users.")
users <- as.data.frame(table(d$user))
colnames(users) <- c("user", "tweets")
users <- users[order(users$tweets, decreasing=T), ]
users <- subset(users, user!=searchTerm)
users <- head(users, 20)
head(users)
png(paste(searchTerm, "barplot-top-users.png", sep="--"), w=1000, h=700)
par(mar=c(5,10,4,2))
with(users[order(users$tweets), ], barplot(tweets, names=user, horiz=T, col="black", las=1, cex.names=1.2, cex.axis=1.2, main=paste("Most prolific users", paste("Term:", searchTerm), paste("Date:", Sys.Date()), sep="\n")))
dev.off()

## Word clouds
message("Plotting a wordcloud.")
words <- unlist(strsplit(d$text, " "))
words <- grep("^[A-Za-z0-9]+$", words, value=T)
words <- tolower(words)
words <- words[!grepl("^[rm]t$", words)] # remove "rt" and "mt"; grepl() avoids emptying the vector when nothing matches
words <- words[!(words %in% stopwords("en"))] # remove stop words
words <- words[!(words %in% c("mt", "rt", "via", "using", 1:9))] # remove RTs, MTs, via, and single digits.
wordstable <- as.data.frame(table(words))
wordstable <- wordstable[order(wordstable$Freq, decreasing=T), ]
wordstable <- wordstable[-1, ] # remove the hashtag you're searching for? need to functionalize this.
head(wordstable)
png(paste(searchTerm, "wordcloud.png", sep="--"), w=800, h=800)
wordcloud(wordstable$words, wordstable$Freq, scale = c(8, .2), min.freq = 3, max.words = 200, random.order = FALSE, rot.per = .15, colors = brewer.pal(8, "Dark2"))
#mtext(paste(paste("Term:", searchTerm), paste("Date:", Sys.Date()), sep=";"), cex=1.5)
dev.off()

message(paste(searchTerm, ": All done!\n"))
}

filelist <- as.list(list.files("..", pattern="bog14\\.txt$", full.names=T)) # escape the dot so only "bog14.txt" matches
#filelist <- list("../bioinformatics.txt", "../metagenomics.txt", "../rstats.txt", "../rna-seq.txt", "../cville.txt", "../SFAF2013.txt")
lapply(filelist, twitterchivePlots)

# Using imagemagick:
# system("montage bog14--barplot-tweets-by-date.png bog14--barplot-tweets-by-hour.png bog14--barplot-top-hashtags.png bog14--barplot-top-users.png -tile 2x -geometry -0-0 bog14--montage.jpg")
