A library and REST API that converts Facebook, Google+, Instagram, and Twitter data to ActivityStreams format.

branch: master
beautifulsoup @ f29f8d2 add beautifulsoup submodule
mf2py @ a76437d update mf2py
oauth_dropins @ b60023e update oauth_dropins
static add format query param to demo
templates switch facebook likes to use objectType='activity' and verb='like'
testdata Added new test cases for p-repost-of style repost-context (previously
.gitignore add google_client_{id,secret}
.gitmodules add submodule of @KartikPrabhu's mf2py fork that uses BeautifulSoup
README.md update future work section, activitystreams.render_html() => microfor…
__init__.py add __init__.py to make this a package
activitystreams.py add <meta charset="utf-8"> to html output. thanks @kevinmarks!
activitystreams_test.py drop module symlinks; instead, fully qualify all submodule imports
alltests.py add back symlinks
app.py importing yet again: drop symlinks and sys.modules munging, add sys.p…
app.yaml add ssl library to app.yamls
app.yaml.facebook add ssl library to app.yamls
app.yaml.instagram add ssl library to app.yamls
app.yaml.twitter add ssl library to app.yamls
appengine_config.py add submodule of @KartikPrabhu's mf2py fork that uses BeautifulSoup
facebook.py set fb_object_id field for Facebook posts
facebook_test.py set fb_object_id field for Facebook posts
googleplus.py fix cache gating in G+ and Twitter get_activities()
googleplus_api_discovery.json use a canned G+ API discovery doc in tests instead of fetching over t…
googleplus_test.py optimize cache usage in get_activities() to one get_multi() and one s…
instagram.py add activity_author_id kwarg to get_comment(), implement it for Facebook
instagram_test.py first pass at Instagram.create(). also make user_to_actor() accept ob…
microformats2.py get_string_urls before putting urls into a set to avoid unhashable 'd…
microformats2_test.py microformats2.json_to_object: priorities rsvp, repost, like above not…
source.py fix character encoding bug introduced in bf81214
source_test.py fix character encoding bug introduced in bf81214
testdata_test.py Added new test cases for p-repost-of style repost-context (previously
twitter.py get larger twitter user avatars using a trick from Bridgy source (rem…
twitter_test.py set fake TWITTER_APP_KEY and TWITTER_APP_SECRET so tests can run with…
README.md

ActivityStreams activitystreams-unofficial

About

This is a library and REST API that converts Facebook, Google+, Instagram, and Twitter data to ActivityStreams format. You can try it out with these interactive demos:

http://facebook-activitystreams.appspot.com/
http://twitter-activitystreams.appspot.com/
http://instagram-activitystreams.appspot.com/

It's part of a suite of projects that implement the OStatus federation protocols for the major social networks. The other projects include portablecontacts-, salmon-, webfinger-, and ostatus-unofficial.

License: This project is placed in the public domain.

Using

The library and REST API are both based on the OpenSocial Activity Streams service.

Let's start with an example. This code using the library:

from activitystreams_unofficial import twitter
...
tw = twitter.Twitter(ACCESS_TOKEN_KEY, ACCESS_TOKEN_SECRET)
tw.get_activities(group_id='@friends')

is equivalent to this HTTP GET request:

https://twitter-activitystreams.appspot.com/@me/@friends/@app/
  ?access_token_key=ACCESS_TOKEN_KEY&access_token_secret=ACCESS_TOKEN_SECRET

Both return the authenticated user's Twitter stream, i.e. tweets from the people they follow. Here's the JSON output:

{
  "itemsPerPage": 10,
  "startIndex": 0,
  "totalResults": 12,
  "items": [{
      "verb": "post",
      "id": "tag:twitter.com,2013:374272979578150912",
      "url": "http://twitter.com/evanpro/status/374272979578150912",
      "content": "Getting stuff for barbecue tomorrow. No ribs left! Got some nice tenderloin though. (@ Metro Plus Famille Lemay) http://t.co/b2PLgiLJwP",
      "actor": {
        "username": "evanpro",
        "displayName": "Evan Prodromou",
        "description": "Prospector.",
        "url": "http://twitter.com/evanpro"
      },
      "object": {
        "tags": [{
            "url": "http://4sq.com/1cw5vf6",
            "startIndex": 113,
            "length": 22,
            "objectType": "article"
          }, ...]
      }
    }, ...]
  ...
}

The request parameters are the same for both, all optional: USER_ID is a source-specific id or @me for the authenticated user. GROUP_ID may be @all, @friends (currently identical to @all), or @self. APP_ID is currently ignored; best practice is to use @app as a placeholder.

Paging is supported via the startIndex and count parameters. They're self-explanatory, and described in detail in the OpenSearch spec and OpenSocial spec.
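As a rough illustration of paging, the sketch below walks a result set one page at a time. `fetch_page` is a hypothetical stand-in for whatever makes the request (a library call or an HTTP GET); it is assumed to accept `startIndex` and `count` and to return an envelope dict like the JSON output above. `fake_fetch_page` is an in-memory placeholder, not part of this project.

```python
def iter_activities(fetch_page, count=10):
    """Yield activities page by page until a page comes back short."""
    start_index = 0
    while True:
        page = fetch_page(startIndex=start_index, count=count)
        items = page.get("items", [])
        for item in items:
            yield item
        if len(items) < count:
            break  # a short (or empty) page means nothing is left
        start_index += len(items)


# Hypothetical in-memory source standing in for a real request.
_FAKE = [{"id": "tag:example.com,2013:%d" % i} for i in range(25)]


def fake_fetch_page(startIndex=0, count=10):
    chunk = _FAKE[startIndex:startIndex + count]
    return {
        "startIndex": startIndex,
        "itemsPerPage": len(chunk),
        "totalResults": len(_FAKE),
        "items": chunk,
    }


if __name__ == "__main__":
    for activity in iter_activities(fake_fetch_page, count=10):
        print(activity["id"])
```

Stopping on a short page avoids relying on totalResults, which a source may only be able to estimate.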

Output data is JSON Activity Streams 1.0 objects wrapped in the OpenSocial envelope, which puts the activities in the top-level items field as a list and adds the itemsPerPage, totalResults, etc. fields.
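Here is a minimal sketch of unwrapping that envelope on the client side. The `unwrap` helper and the `SAMPLE` string are illustrative only; the field names are the ones shown in the JSON output above.

```python
import json

# Trimmed-down envelope modeled on the example output above.
SAMPLE = """{
  "itemsPerPage": 1,
  "startIndex": 0,
  "totalResults": 12,
  "items": [{
    "verb": "post",
    "url": "http://twitter.com/evanpro/status/374272979578150912"
  }]
}"""


def unwrap(envelope_json):
    """Split an envelope into (activities, paging metadata)."""
    envelope = json.loads(envelope_json)
    paging = {
        key: envelope.get(key)
        for key in ("startIndex", "itemsPerPage", "totalResults")
    }
    return envelope.get("items", []), paging


if __name__ == "__main__":
    activities, paging = unwrap(SAMPLE)
    for activity in activities:
        print(activity["verb"], activity["url"])
    print(paging)
```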

Most Facebook requests and all Twitter, Google+, and Instagram requests will need OAuth access tokens. If you're using Python on Google App Engine, oauth-dropins is an easy way to add OAuth client flows for these sites. Otherwise, here are the sites' authentication docs: Facebook, Google+, Instagram, Twitter.

If you get an access token and pass it along, it will be used to sign and authorize the underlying requests to the source providers. See the demos on the REST API endpoints above for examples.

Using the REST API

The endpoints above all serve the OpenSocial Activity Streams REST API. Request paths are of the form:

/USER_ID/GROUP_ID/APP_ID/ACTIVITY_ID?startIndex=...&count=...&format=FORMAT&access_token=...

All query parameters are optional. FORMAT may be json (the default), xml, or atom; xml and atom both return Atom. The rest of the path elements and query params are described above.
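For illustration, a request URL of that form could be assembled as below. `build_url` is a hypothetical helper, not part of this library; the defaults (`@me`, `@all`, `@app`) are just the placeholder values described above.

```python
from urllib.parse import quote, urlencode


def build_url(base, user_id="@me", group_id="@all", app_id="@app",
              activity_id=None, **params):
    """Build /USER_ID/GROUP_ID/APP_ID/[ACTIVITY_ID] plus a query string."""
    parts = [user_id, group_id, app_id]
    if activity_id is not None:
        parts.append(activity_id)
    # Keep "@" readable in the path; escape everything else that needs it.
    path = "/".join(quote(str(part), safe="@") for part in parts)
    if activity_id is None:
        path += "/"
    # Drop parameters that were not supplied.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = "%s/%s" % (base.rstrip("/"), path)
    return url + ("?" + query if query else "")


if __name__ == "__main__":
    print(build_url("https://twitter-activitystreams.appspot.com",
                    group_id="@friends", count=5, format="json"))
```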

Errors are returned with the appropriate HTTP response code, e.g. 403 when a request isn't authorized, with details in the response body.
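A client might surface those details along these lines. This is only a sketch using the Python standard library; `describe_failure` and `fetch` are hypothetical names, and the exact wording of the response body is whatever the endpoint happens to return.

```python
import urllib.error
import urllib.request


def describe_failure(status, body):
    """Turn an HTTP status and response body into a one-line message."""
    detail = body.strip() or "(no details in response body)"
    return "HTTP %d: %s" % (status, detail)


def fetch(url):
    """GET url and return the body text, or raise RuntimeError with details."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object, so its body can be read.
        body = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(describe_failure(err.code, body)) from err
```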

To use the REST API in an existing ActivityStreams client, you'll need to hard-code exceptions for the domains you want to use e.g. facebook.com, and redirect HTTP requests to the corresponding endpoint above.
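One way to hard-code those exceptions is a small lookup table from source domain to endpoint, sketched below. The endpoint URLs are the demo apps listed in the About section; the `ENDPOINTS` table and `endpoint_for` helper are illustrative assumptions, not something this project ships.

```python
from urllib.parse import urlsplit

# Source domain -> REST endpoint (demo apps from the About section).
ENDPOINTS = {
    "facebook.com": "http://facebook-activitystreams.appspot.com/",
    "twitter.com": "http://twitter-activitystreams.appspot.com/",
    "instagram.com": "http://instagram-activitystreams.appspot.com/",
}


def endpoint_for(url):
    """Return the endpoint to use for url, or None if it needs no redirect."""
    host = (urlsplit(url).hostname or "").lower()
    for domain, endpoint in ENDPOINTS.items():
        # Match the bare domain and any subdomain such as www.
        if host == domain or host.endswith("." + domain):
            return endpoint
    return None


if __name__ == "__main__":
    print(endpoint_for("https://www.facebook.com/some.user"))
```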

Using the library

See the example above for a quick start guide.

Clone or download this repo into a directory named activitystreams_unofficial (note the underscore instead of dash). Each source works the same way. Import the module for the source you want to use, then instantiate its class by passing the HTTP handler object. The handler should have a request attribute for the current HTTP request.

The useful methods are get_activities() and get_actor(), which returns the current authenticated user (if any). See the individual method docstrings for details. All return values are Python dicts of decoded ActivityStreams JSON.

The microformats2.*_to_html() functions are also useful for rendering ActivityStreams objects as nicely formatted HTML.

Future work

We'd love to add more sites! Off the top of my head, YouTube, Tumblr, WordPress.com, Sina Weibo, Qzone, and RenRen would be good candidates. If you're looking to contribute, implementing a new site is a good place to start. It's pretty self-contained and the existing sites are good examples to follow, but it's a decent amount of work, so you'll be familiar with the whole project by the end.

Development

Pull requests are welcome! Feel free to ping me with any questions.

Most dependencies are included as git submodules. Be sure to run git submodule update --init --recursive after cloning this repo.

This ActivityStreams validator is useful for manual testing.

Development requires the App Engine SDK, which is expected to be in ~/google_appengine. A symlink is fine. Sorry about the hard-coded path; if it annoys you, feel free to send a pull request that makes it configurable!

You can run the unit tests with alltests.py. If you send a pull request, please include (or update) a test for the new functionality if possible!

Note the app.yaml.* files, one for each App Engine app id. To work on or deploy a specific app id, symlink app.yaml to its app.yaml.xxx file. Likewise, if you add a new site, you'll need to add a corresponding app.yaml.xxx file.

To deploy:

rm -f app.yaml && ln -s app.yaml.twitter app.yaml && \
  ~/google_appengine/appcfg.py --oauth2 update . && \
rm -f app.yaml && ln -s app.yaml.facebook app.yaml && \
  ~/google_appengine/appcfg.py --oauth2 update . && \
rm -f app.yaml && ln -s app.yaml.instagram app.yaml && \
  ~/google_appengine/appcfg.py --oauth2 update .
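
If you'd rather script the symlink swap than type it, something like the following could work. It's a sketch only: `select_app_yaml` is a hypothetical helper, it assumes a POSIX filesystem with symlink support, and it doesn't run appcfg.py for you.

```python
import os


def select_app_yaml(site, directory="."):
    """Point app.yaml at app.yaml.<site>, like `rm -f` followed by `ln -s`."""
    target = "app.yaml." + site
    if not os.path.exists(os.path.join(directory, target)):
        raise FileNotFoundError("no such config: %s" % target)
    link = os.path.join(directory, "app.yaml")
    if os.path.lexists(link):
        os.remove(link)  # remove the old symlink (or file) first
    os.symlink(target, link)  # relative link, same as the shell commands
    return link


if __name__ == "__main__":
    select_app_yaml("twitter")
```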

Related work

Gnip is by far the most complete project in this vein. It similarly converts social network data to ActivityStreams and supports many more source networks. Unfortunately, it's commercial, there's no free trial or self-serve signup, and plans start at $500.

DataSift looks like broadly the same thing, except they offer self-serve, pay as you go billing, and they use their own proprietary output format instead of ActivityStreams. They're also aimed more at data mining as opposed to individual user access.

Cliqset's FeedProxy used to do this kind of format translation, but unfortunately it and Cliqset died.

Facebook used to officially support ActivityStreams, but that's also dead.

There are a number of products that download your social network data, normalize it, and let you query and visualize it. SocialSafe and ThinkUp are two of the most mature. There's also the lifelogging/lifestream aggregator vein of projects that pull data from multiple source sites. Storytlr is a good example. It doesn't include Facebook, Google+, or Instagram, but does include a number of smaller source sites. There are lots of others, e.g. the Lifestream WordPress plugin. Unfortunately, these are generally aimed at end users, not developers, and don't usually expose libraries or REST APIs.

On the open source side, there are many related projects. php-mf2-shim adds microformats2 to Facebook and Twitter's raw HTML. sockethub is a similar "polyglot" approach, but more focused on writing than reading.

TODO

  • https kwarg to get_activities() etc that converts all http links to https
  • convert most of the per-site tests to testdata tests
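
For the first item, one possible shape is a recursive pass over the decoded ActivityStreams dict, sketched here. `force_https` is a hypothetical name and this is not how the library does it (the feature doesn't exist yet). It only rewrites string values that are themselves http URLs, so links embedded inside content text are left alone.

```python
def force_https(value):
    """Return a copy of value with every http:// URL string made https://."""
    if isinstance(value, dict):
        return {key: force_https(item) for key, item in value.items()}
    if isinstance(value, list):
        return [force_https(item) for item in value]
    if isinstance(value, str) and value.startswith("http://"):
        return "https://" + value[len("http://"):]
    return value  # numbers, None, non-URL strings, etc. pass through


if __name__ == "__main__":
    activity = {
        "url": "http://twitter.com/evanpro/status/374272979578150912",
        "object": {"tags": [{"url": "http://4sq.com/1cw5vf6", "length": 22}]},
    }
    print(force_https(activity))
```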