IBC Crawl

Test crawling the ALS Ice Bucket Challenge phenomenon on Twitter.

Motivation

The goal is to visualize how the Ice Bucket Challenge went viral in late summer 2014. There are several lists online that try to flesh out a connected graph, but it is a difficult task to make such a list exhaustive even if the labor is crowdsourced.

This is one experiment in automating the process of gathering that information by:

Harvesting tweets about the Ice Bucket Challenge that contain information about who challenged whom and links to media of accepted challenges
Making it easier to sift through those tweets to verify the accuracy of relevant information

Development

Basics

This assumes development on Mac OS X. The tools you need are listed below, each with the easiest way to install it if you don't already have it:

Homebrew

$ ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"

Git

$ brew install git

RVM

$ curl -sSL https://get.rvm.io | bash -s stable

PostgreSQL

Download from here, drag it to the Applications folder, and double-click it to launch.

If Terminal responds to brew, git, rvm, and psql, continue on.

Setup

Clone the app and bundle:

$ git clone git@github.com:O-I/ibc_crawl.git
$ cd ibc_crawl
$ bundle install

You'll need Twitter keys. Get them here. Then create a .env file in the project root that mirrors the structure of .env_example, filled in with your development keys.
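A quick sanity check can catch a missing or half-filled .env before a crawl starts. The sketch below is only an illustration: the four variable names are hypothetical placeholders, so substitute whatever names .env_example actually uses.

```ruby
# Hypothetical variable names; copy the real ones from .env_example.
REQUIRED_KEYS = %w[
  TWITTER_CONSUMER_KEY
  TWITTER_CONSUMER_SECRET
  TWITTER_ACCESS_TOKEN
  TWITTER_ACCESS_TOKEN_SECRET
].freeze

# Return the names of required variables that are unset or blank in `env`.
# `env` defaults to the process environment but accepts any Hash-like object.
def missing_keys(env = ENV)
  REQUIRED_KEYS.select { |key| env[key].to_s.strip.empty? }
end

missing = missing_keys
warn "Missing Twitter credentials: #{missing.join(', ')}" unless missing.empty?
```

Passing the environment as an argument keeps the check easy to exercise with a plain Hash in a spec.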

Create and migrate the database:

$ rake db:create
$ rake db:migrate

Currently, the only rake task for seeding the database is rake ibc:crawl. It starts with Chris Kennedy's completed challenge (considered to be the origin of the phenomenon) and iterates outward: for each user mentioned, it looks at that user's last 200 tweets and collects at most the 3 earliest ones that reference the Ice Bucket Challenge, then repeats the process for the users mentioned in those tweets.
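The crawl described above can be sketched in plain Ruby. This is not the task's actual code: the Tweet struct, the IBC_PATTERN matching rule, the fetch_timeline lambda, and the sample users are all assumptions made for illustration, and the in-memory hash stands in for Twitter API calls.

```ruby
require 'set'

# Hypothetical stand-in for a stored tweet.
Tweet = Struct.new(:user, :text, :mentions, :created_at)

# Assumed rule for "references the Ice Bucket Challenge".
IBC_PATTERN = /ice\s*bucket/i

# From a user's timeline, keep the most recent 200 tweets, filter to
# IBC-related ones, and return at most `per_user` of the earliest.
def earliest_ibc_tweets(timeline, per_user)
  timeline
    .max_by(200, &:created_at)
    .select { |tweet| tweet.text =~ IBC_PATTERN }
    .min_by(per_user, &:created_at)
end

# Walk outward from the seed tweet, one degree of separation per pass.
def crawl(seed, fetch_timeline, depth: 7, per_user: 3)
  collected = [seed]
  seen = Set[seed.user]
  frontier = seed.mentions

  depth.times do
    next_frontier = []
    frontier.each do |user|
      next unless seen.add?(user) # skip users already visited
      # The real task would pause here to respect rate limits.
      tweets = earliest_ibc_tweets(fetch_timeline.call(user), per_user)
      collected.concat(tweets)
      next_frontier.concat(tweets.flat_map(&:mentions))
    end
    frontier = next_frontier.uniq
    break if frontier.empty?
  end

  collected
end

# Made-up sample data in place of live timelines.
timelines = {
  'alice' => [
    Tweet.new('alice', 'Ice bucket challenge done! @bob is next', ['bob'], Time.utc(2014, 8, 2)),
    Tweet.new('alice', 'Lunch time', [], Time.utc(2014, 8, 3))
  ],
  'bob' => [Tweet.new('bob', 'Ice Bucket accepted, over to @carol', ['carol'], Time.utc(2014, 8, 4))],
  'carol' => [Tweet.new('carol', '#IceBucketChallenge complete', [], Time.utc(2014, 8, 5))]
}
seed = Tweet.new('seed_user', 'My ice bucket challenge', ['alice'], Time.utc(2014, 7, 31))
fetch_timeline = ->(user) { timelines.fetch(user, []) }

result = crawl(seed, fetch_timeline, depth: 2, per_user: 1)
puts result.map(&:user).inspect # ["seed_user", "alice", "bob"]
```

With depth: 2 the walk stops before reaching carol, who is three degrees from the seed.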

The task defaults to 7 degrees of separation (about 900 people and 1700 tweets), which, with the pauses I have built in for rate limiting, runs more slowly than I would like. To experiment with a smaller initial set of tweets (say, 5 degrees of separation with only the earliest tweet per mentioned user), run rake ibc:crawl[5,1].
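For readers unfamiliar with the bracket syntax, here is a minimal sketch of how a rake task can accept positional arguments with defaults. The argument names depth and per_user are hypothetical, as is the default of 3 tweets per user (inferred from the description above), and the task body only records what it received instead of crawling.

```ruby
require 'rake'

# Records each invocation so the effect of the arguments is visible.
CRAWL_CALLS = []

namespace :ibc do
  # Hypothetical signature: rake ibc:crawl[depth,per_user]
  task :crawl, [:depth, :per_user] do |_task, args|
    args.with_defaults(depth: 7, per_user: 3)
    # Command-line arguments arrive as strings, so convert explicitly.
    depth = Integer(args[:depth])
    per_user = Integer(args[:per_user])
    CRAWL_CALLS << { depth: depth, per_user: per_user }
  end
end

crawl_task = Rake::Task['ibc:crawl']
crawl_task.invoke('5', '1') # same as: rake ibc:crawl[5,1]
crawl_task.reenable         # rake runs a task only once unless re-enabled
crawl_task.invoke           # same as: rake ibc:crawl
puts CRAWL_CALLS.inspect
```

In shells that treat square brackets as glob characters, such as zsh, quote the task name: rake 'ibc:crawl[5,1]'.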

Run rails s and point your browser to http://localhost:3000 and you should be good to go!

To do

Although it's interesting to use Chris Kennedy's tweet as the sole seed for a deep crawl, it's probably better to start with a sizable list of known Ice Bucket Challenge participants and only go a few iterations deep.

I'm working on testing a task that implements the latter, breadth-first approach for both Twitter and the Facebook Graph API. Hopefully, I can use a combination of overlapping user mention data and post dates to tease out who challenged whom automagically.
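One possible heuristic for combining mentions and post dates is sketched below. It is an assumption, not something the project implements: it guesses that a participant's challenger is whoever mentioned them most recently before their first post. The Post struct and sample data are made up for illustration.

```ruby
# Hypothetical minimal record of a challenge-related post.
Post = Struct.new(:user, :mentions, :created_at)

# Guess a challenger for each user: the author of the latest post that
# mentions the user and was published before the user's own first post.
# Returns a hash of challenged user => inferred challenger.
def infer_challenges(posts)
  first_posts = posts.group_by(&:user)
                     .transform_values { |ps| ps.min_by(&:created_at) }

  first_posts.each_with_object({}) do |(user, first_post), edges|
    candidates = posts.select do |post|
      post.user != user &&
        post.mentions.include?(user) &&
        post.created_at < first_post.created_at
    end
    challenger = candidates.max_by(&:created_at)
    edges[user] = challenger.user if challenger
  end
end

posts = [
  Post.new('a', %w[b c], Time.utc(2014, 8, 1)),
  Post.new('b', %w[d],   Time.utc(2014, 8, 2)),
  Post.new('c', [],      Time.utc(2014, 8, 3)),
  Post.new('d', [],      Time.utc(2014, 8, 4)),
  Post.new('c', %w[d],   Time.utc(2014, 8, 5)) # too late to count for d
]

puts infer_challenges(posts).inspect
```

A heuristic like this will still produce false positives, for example when a mention is unrelated to a challenge, so the manual verification step remains useful.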
