feed_archive

Last edited by Mihai Parparita on July 4, 2013.

Saves public feed data from Google Reader's feed archive.

Google Reader has (for the most part) a copy of every blog post and other feed item published since its launch in late 2005, provided that at least one Reader user was subscribed to the feed. This makes it an invaluable resource for recovering content from sites that have disappeared. It can also serve as a backup mechanism and makes it possible to build tools on top of it.

Access to this data will presumably also go away in July 2013, when Reader shuts down, so this tool offers one last chance to archive feeds you might want to refer to later.

The easiest way to use it is to get the OPML file containing all your Reader subscriptions and run the tool like so:

bin/feed_archive \
    --opml_file=~/Downloads/feeds.opml \
    --output_directory=~/Downloads/feed_archive

The directory specified by --output_directory will be populated with one file per feed, named after the feed's URL. Each file contains, in Atom format, all the items that Reader ever saw in that feed. Google Reader normally omits unknown (namespaced) elements from its API output, but the script attempts to use high-fidelity mode to reconstruct the original data as closely as possible.
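Because each output file is Atom, it can be inspected with any XML parser. The following is a minimal Python sketch, not part of feed_archive, that reports the feed title and the entry titles in one or more output files. It assumes each file is a standard Atom 1.0 <feed> document in the http://www.w3.org/2005/Atom namespace; the function name summarize_atom_file and the script name atom_summary.py are hypothetical. Any extra namespaced elements the tool preserved are simply ignored.

```python
import sys
import xml.etree.ElementTree as ET

# Atom 1.0 namespace (RFC 4287), in ElementTree's {uri}tag notation.
# Assumption: the archived files use this namespace for <feed>/<entry>.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def summarize_atom_file(path):
    """Return (feed_title, [entry titles]) for one Atom file.

    Hypothetical helper: reads only the standard Atom <title> and
    <entry> elements and ignores everything else in the file.
    """
    root = ET.parse(path).getroot()
    feed_title = root.findtext(ATOM_NS + "title", default="")
    entry_titles = [
        entry.findtext(ATOM_NS + "title", default="")
        for entry in root.iter(ATOM_NS + "entry")
    ]
    return feed_title, entry_titles


if __name__ == "__main__":
    # Usage: python3 atom_summary.py FILE [FILE ...]
    for path in sys.argv[1:]:
        title, entries = summarize_atom_file(path)
        print(f"{path}: {title!r}, {len(entries)} items")
```

Running it over every file in the output directory gives a quick check of how many items were recovered for each feed.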

If you only want to save the archives of specific feeds, you can pass their URLs as command-line arguments instead of --opml_file:

bin/feed_archive \
    --output_directory=~/Downloads/feed_archive \
    http://googlereader.blogspot.com/atom.xml \
    http://persistent.info/atom.xml \
    ...
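To pick which URLs to pass this way, it can help to list the feeds in your OPML file first. The sketch below is a hypothetical helper (opml_urls.py, with a function named feed_urls_from_opml), not part of feed_archive. It assumes subscriptions are <outline> elements carrying an xmlUrl attribute, possibly nested inside folder outlines, and it drops duplicates so a feed filed under several folders is listed once.

```python
import sys
import xml.etree.ElementTree as ET


def feed_urls_from_opml(path):
    """Return the xmlUrl of every feed <outline> in an OPML file.

    Folder outlines have no xmlUrl and are skipped; iter() walks the
    whole tree, so feeds nested inside folders are found too.
    """
    root = ET.parse(path).getroot()
    seen = set()
    urls = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url and url not in seen:  # skip folders and repeated subscriptions
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    # Usage: python3 opml_urls.py FILE.opml  (prints one URL per line)
    for path in sys.argv[1:]:
        for url in feed_urls_from_opml(path):
            print(url)
```

The printed list can then be filtered and handed to the tool, for example:

python3 opml_urls.py ~/Downloads/feeds.opml | grep persistent.info | xargs bin/feed_archive --output_directory=~/Downloads/feed_archive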

The tool supports additional arguments for controlling how many items are fetched; see bin/feed_archive --help for more information.