GitHub - mja/aviary: A Twitter archiver, making tweets safe for historians.

mja / aviary Public

Notifications You must be signed in to change notification settings
Fork 2
Star 11

A Twitter archiver, making tweets safe for historians.

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 61 Commits
assets		assets
.gitignore		.gitignore
CHANGELOG		CHANGELOG
MIT-LICENSE		MIT-LICENSE
MOTIVATION		MOTIVATION
README		README
TODO		TODO
aviary.rb		aviary.rb
aviary.sh		aviary.sh
cat_statuses.sh		cat_statuses.sh
config-example.yml		config-example.yml
timeline.rb		timeline.rb
twitter_archiver.rb		twitter_archiver.rb

Repository files navigation

== Aviary+, a Twitter Archiver

Aviary+ is a simple Ruby script to retrieve and archive your tweets. Twitter 
"claim no intellectual property rights over the material you provide to the 
Twitter service," so you should have a way of storing and repurposing your 
data.

The original Aviary script (http://github.com/mja/aviary) was written by
Mark James Adams. It only downloaded your tweets, from your timeline. I've
updated it to login via OAuth, download your mentions, your direct messages,
your sent direct messages, and any tweets that you have replied to.

== Requirements

Aviary+ requires the Launchy, nokogiri, and oauth gems.

== Running Aviary+

First, copy the config-example.yml to config.yml, then edit the default
path, for archiving tweets. The default is User/Documents/aviary, which
is probably fine for most people.

If you install your gems with rubygems, you'll need to execute Aviary+
like this:

  $ ruby -rubygems aviary.rb

If you're using Ruby 1.9, this step won't be necessary.

Next, add a Twitter account to aviary by running with
  $ ruby aviary.rb --add accountname

To add my jmartindf account I would execute it like this
  $ ruby aviary.rb --add jmartindf

Run Aviary+ with:

  $ ruby aviary.rb --updates [new|all] [--page XXX]

Aviary+ will create a directory called USERNAME and begin parsing your Twitter 
archive and downloading the raw XML representation of each tweet. The 
"--updates all" option will parse and save every tweet, while "--updates new" 
will will only request tweets that are newer than the most recent one already
present in the USERNAME direcotry.  already been downloaded. Optionally,
specify a page number to retrieve tweets starting from that page. This is
useful when restarting the script after a timeout.

Use the cat_statuses.sh script to combine all downloaded tweets for a user
into one statuses XML file.

  $ SCREEN_NAME=username ./cat_statuses.sh > username.xml
  
will read in all tweets in the username directory and concatenate them 
together in a file called username.xml. 

== Creating an events timeline

Assuming you have already put all of your status files into a single XML file
using cat_statuses.sh, you can transform it into a SIMILE timeline 
(http://simile.mit.edu/timeline/) using the assets/simili.xsl transformation 
stylesheet. View the data by modifying assets/events_template.html, replacing 
"screenname_events.xml" (Line 66) with the filename of your transformed 
statuses XML document.

This can also be done automatically using the timeline.rb script:

    $ ruby timeline.rb -u USERNAME
    
which will read in single tweets from the USERNAME's directory and construct
an events timeline (USERNAME_events.xml) in the timelines directory.
    

== What next?

Now your data is yours! Start thinking about your own way to utilize your
own data. Start thinking about your children's children's children. Will they
be accessing the possibly non-existent http://twitter.com or would they
prefer a paper record of "what you were doing"?