Skip to content

FloodMonkey - Manage Firehose-level Activity Streams from the MySpace Flood API

Notifications You must be signed in to change notification settings

mrflip/flood_monkey

Repository files navigation

FloodMonkey – Supervise your stream

FloodMonkey helps manage subscriptions to the MySpace Flood, their real-time activity stream API.

FloodMonkey is designed for people looking to capture firehose-level (100%) data streams. We use it to haul in 25GB+ of raw data every day with a single small EC2 instance and room to spare.

  • A simple Sinatra frontend to manage and monitor subscriptions
  • An even yet still more lightweight rack responder to stream callbacks directly to disk.
  • Scripts to deploy, configure and monitor the various moving parts.

FloodMonkey is not designed for anything real-time. The endpoint streams everything to disk for bulk processing later — we use wukong and wuclan.

How to

  • Copy config/settings.example.yml to config/settings.yml and enter your info everywhere you see “CHANGEME
  • Create a MySpaceId app. Take it out of sandbox mode.
  • Click on the page for the app and set up the OAuth urls. Be careful about the validation URL: it has to omit the http://, any third-level domain in the host, and any trailing “/” or query params.
   External URL                  http://www.domain.com/endpoint/myspace
   External Callback Validation: domain.com/endpoint/myspace/cb
  • While you’re there, copy out the consumer key and secret. Paste them into the settings.yml
  • create a directory on a large local disk and replace the ‘work’ symlink to point to it. A full stream will fill 25GB a day or so, so be prepared.
  • While developing, I run shotgun --server=thin --host=0.0.0.0 --port=8080 config.ru (you’ll have to run on port 80 or use unicorn in development mode to use the publishing callbacks and some oauth calls).
  • For production, I use the nginx+unicorn setup contained in config/production. You can start unicorn by hand with unicorn -E production -c unicorn-conf.rb config.ru and then manipulate it with signals.
  • try clicking on the ‘test post’ link. It should make a bogus post (stored in a file labelled ‘test’) that you can watch from each end.

Caveats

  • You actually don’t need to authenticate as a user to myspace, just as an app. There are facilities for three-legged oauth in the app – making stream API requests on behalf of a user – but they’re not hooked to anything interesting at the moment.
  • The app streams directly to disk, it doesn’t parse the received data or load it into a database. This is best done by separate worker processes.
  • Make sure you use the vendor’ed OAuth gem, or pull from the current (12/2009) http://github.com/mojodna/oauth edge version. MySpace is overly finicky about the structure of OAuth requests, insisting that the query params land in sort order. The latest versions of the gem tolerate this peccadillo.

Credits

  • Big ups to the very patient Jamie and Joel at MySpace for including us in the beta program, and working through a series of mysterious bugs.
  • The frontend app is based on Monk / Cartilage, a skeleton for building Sinatra apps.
  • The OAuth gem makes this thing work

Setup

Ruby Enterprise Edition

    sudo apt-get install libssl-dev libreadline5-dev;
    latest_ree=ruby-enterprise-1.8.7-2009.10.tar.gz;
    wget "http://rubyforge.org/frs/download.php/66162/${latest_ree}.tar.gz" ;
    tar xvzf ${latest_ree}.tar.gz;
    cd ${latest_ree_gz};
    sudo ./installer --auto /usr/local/ree;

Nginx

    latest_nginx=nginx-0.7.64;
    wget "http://sysoev.ru/nginx/${latest_nginx}.tar.gz" ;
    tar xvzf ${latest_nginx}.tar.gz ;
    sudo /usr/local/ree/bin/passenger-install-nginx-module --prefix=/usr/local/nginx --nginx-source-dir=/usr/local/src/$latest_nginx --extra-configure-flags=' --with-http_ssl_module --with-http_gzip_static_module --add-module=/usr/local/src/ngx-fancyindex-0.2 --add-module=/usr/local/src/nginx_upload_module-2.0.10 --add-module=/usr/local/src/nginx_uploadprogress_module --with-pcre --with-ld-opt=-lssl --with-sha1-asm --with-md5=/usr/include --with-md5-asm --http-proxy-temp-path="/tmp/nginx-proxy_temp" --http-fastcgi-temp-path="/tmp/nginx-fastcgi_temp" --http-client-body-temp-path="/tmp/nginx-client_body_temp" --pid-path="/var/run/nginx/nginx.pid" --conf-path="/slice/etc/nginx/nginx.conf" --lock-path="/var/run/nginx/nginx.lock" --http-log-path="/var/log/nginx/access.log" --error-log-path="/var/log/nginx/error.log" '

Add the deploy user

    sudo adduser deploy
    sudo usermod -a  -G admin,sudo deploy

Deploy

    cap deploy:setup
    scp -p config/settings.yml remotehost.com:/var/www/cartilage/shared/system/settings.yml
    cap deploy:cold

About

FloodMonkey - Manage Firehose-level Activity Streams from the MySpace Flood API

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages