reader_archive Format

Mihai Parparita edited this page Jun 28, 2013 · 2 revisions

The reader_archive tool generates a directory that contains your archived Reader account.

Directory Structure

data/

Your account settings and related data, stored as JSON files. For a detailed list of all properties, see the namedtuple subclasses in base/api.py.

  • subscriptions.json: The feeds you were subscribed to.
  • tags.json: Your tags (both subscription-level and item-level).
  • bundles.json: Your subscription bundles.
  • recommendations.json: A few of your current feed recommendations.
  • preferences.json: Your account settings (expanded vs. list view, all vs. unread-only items, etc.).
  • stream-preferences.json: Your per-subscription settings (e.g. sort order).
  • friends.json: The people you were following (and who was following you) before the sharepocalypse.
  • sharing-acl.json: Your sharing settings.
  • sharing-groups.json: The sharing groups you had set up.
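As a rough illustration of working with these files, the sketch below loads every JSON file found under data/ into a dictionary keyed by file name (without the extension). The helper name `load_account_data` is hypothetical and not part of reader_archive. The sketch makes no assumption about the shape of each file beyond it being valid JSON; see base/api.py for the actual properties.

```python
import json
from pathlib import Path


def load_account_data(archive_dir):
    """Return {file stem: parsed JSON} for every *.json file under data/.

    Hypothetical helper, not part of reader_archive. It assumes only that
    the files under data/ are valid UTF-8 encoded JSON.
    """
    data_dir = Path(archive_dir) / "data"
    account_data = {}
    for json_path in sorted(data_dir.glob("*.json")):
        with json_path.open(encoding="utf-8") as json_file:
            # e.g. "subscriptions.json" is stored under "subscriptions"
            account_data[json_path.stem] = json.load(json_file)
    return account_data
```

Calling `load_account_data("path/to/archive")` and printing `sorted(result)` should list whichever of the files above exist in your archive, for example `subscriptions` and `tags`.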

streams/

The streams (feeds, shared items, tags) that you had in your account.

For each stream, the archive stores the list of items (represented as IDs), along with the time (in microseconds since the epoch) at which each item appeared in that stream. For example, suppose an item appeared in XKCD's feed on May 30, 2010, you read it a day later, and you shared it a day after that. That item's ID would then be in the feed/http://xkcd.com/rss.xml stream with a timestamp of 1275220800123456 (May 30), in the user/-/state/com.google/read stream with a timestamp of 1275307200123456 (May 31), and in the user/-/state/com.google/broadcast stream with a timestamp of 1275393600234567 (June 1).
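The timestamps in the example can be turned into readable dates with a few lines of Python. This is a minimal sketch, assuming "the epoch" means the Unix epoch (1970-01-01 UTC); the function name `stream_timestamp_to_datetime` is made up for illustration and is not part of reader_archive.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps count microseconds from the Unix epoch, in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def stream_timestamp_to_datetime(timestamp_usec):
    """Convert microseconds since the epoch to a timezone-aware UTC datetime."""
    # Integer timedelta arithmetic avoids float rounding on 16-digit values.
    return EPOCH + timedelta(microseconds=timestamp_usec)


if __name__ == "__main__":
    example = [
        ("feed/http://xkcd.com/rss.xml", 1275220800123456),
        ("user/-/state/com.google/read", 1275307200123456),
        ("user/-/state/com.google/broadcast", 1275393600234567),
    ]
    for stream_id, timestamp_usec in example:
        print(stream_id, stream_timestamp_to_datetime(timestamp_usec).isoformat())
```

The three values resolve to noon UTC on May 30, May 31 and June 1, 2010, respectively, matching the dates given in the example.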

Each stream is stored in its own file. See base.paths.stream_id_to_file_name for how to go from a stream ID to its path.

items/

The bodies of the items, keyed by item ID. Items are stored as Atom data, including most XML namespaced elements that were in the feed at the time that Reader crawled it. Items are grouped by directory and file to avoid having one file per item, which would be expensive for very large accounts. See base.paths.item_id_to_file_path for how to get a file path for an item ID.

comments/

Comments on shared items (both your own and those of people you followed). Comments are grouped by item (using the same path scheme), one file per item. If multiple people shared the same item and those shares had comments, then the comments will be in the same file. You can use the venue_stream_id to separate the conversations.
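Separating the conversations by venue could look roughly like the sketch below. It assumes the comments for one item have already been parsed into dictionaries that each carry a `venue_stream_id` key; the on-disk comment format is not described here, so the parsing step is left out. The function name `group_by_venue`, the `text` key and the sample stream IDs are all hypothetical.

```python
from collections import defaultdict


def group_by_venue(comments):
    """Group comment records by the stream (venue) they were posted in.

    Assumes each comment is a mapping with a "venue_stream_id" key.
    """
    conversations = defaultdict(list)
    for comment in comments:
        conversations[comment["venue_stream_id"]].append(comment)
    return dict(conversations)


if __name__ == "__main__":
    # Hypothetical records; the stream IDs and "text" key are illustrative.
    sample = [
        {"venue_stream_id": "user/1/state/com.google/broadcast", "text": "Nice!"},
        {"venue_stream_id": "user/2/state/com.google/broadcast", "text": "Agreed."},
        {"venue_stream_id": "user/1/state/com.google/broadcast", "text": "Thanks."},
    ]
    for venue, venue_comments in group_by_venue(sample).items():
        print(venue, len(venue_comments))
```

Each key in the result then corresponds to one conversation, that is, one person's share of the item.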

_raw_data/

The cached API responses from Google Reader that were used to construct the above files. If there is something that was not extracted correctly, you may wish to dig around in the API response for the data.

More Reading

For more information on the stream IDs and item IDs that Google Reader used, see this collection of pages.