Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
FloodMonkey - Manage Firehose-level Activity Streams from the MySpace Flood API
Ruby
branch: master

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
app
config
db
lib
log
notes
public
test
tmp
vendor
.gitignore
Capfile
README.textile
Rakefile
Thorfile
config.ru
dependencies
init.rb
unicorn-conf.rb
work

README.textile

FloodMonkey – Supervise your stream

FloodMonkey helps manage subscriptions to the MySpace Flood, their real-time activity stream API.

FloodMonkey is designed for people looking to capture firehose-level (100%) data streams. We use it to haul in 25GB+ of raw data every day with a single small EC2 instance and room to spare.

  • A simple Sinatra frontend to manage and monitor subscriptions
  • An even yet still more lightweight rack responder to stream callbacks directly to disk.
  • Scripts to deploy, configure and monitor the various moving parts.

FloodMonkey is not designed for anything real-time. The endpoint streams everything to disk for bulk processing later — we use wukong and wuclan.

How to

  • Copy config/settings.example.yml to config/settings.yml and enter your info everywhere you see “CHANGEME”
  • Create a MySpaceId app. Take it out of sandbox mode.
  • Click on the page for the app and set up the OAuth urls. Be careful about the validation URL: it has to omit the http://, any third-level domain in the host, and any trailing “/” or query params.
   External URL                  http://www.domain.com/endpoint/myspace
   External Callback Validation: domain.com/endpoint/myspace/cb
  • While you’re there, copy out the consumer key and secret. Paste them into the settings.yml
  • create a directory on a large local disk and replace the ‘work’ symlink to point to it. A full stream will fill 25GB a day or so, so be prepared.
  • While developing, I run shotgun --server=thin --host=0.0.0.0 --port=8080 config.ru (you’ll have to run on port 80 or use unicorn in development mode to use the publishing callbacks and some oauth calls).
  • For production, I use the nginx+unicorn setup contained in config/production. You can start unicorn by hand with unicorn -E production -c unicorn-conf.rb config.ru and then manipulate it with signals.
  • try clicking on the ‘test post’ link. It should make a bogus post (stored in a file labelled ‘test’) that you can watch from each end.

Caveats

  • You actually don’t need to authenticate as a user to myspace, just as an app. There are facilities for three-legged oauth in the app – making stream API requests on behalf of a user – but they’re not hooked to anything interesting at the moment.
  • The app streams directly to disk, it doesn’t parse the received data or load it into a database. This is best done by separate worker processes.
  • Make sure you use the vendor’ed OAuth gem, or pull from the current (12/2009) http://github.com/mojodna/oauth edge version. MySpace is overly finicky about the structure of OAuth requests, insisting that the query params land in sort order. The latest versions of the gem tolerate this peccadillo.

Credits

  • Big ups to the very patient Jamie and Joel at MySpace for including us in the beta program, and working through a series of mysterious bugs.
  • The frontend app is based on Monk / Cartilage, a skeleton for building Sinatra apps.
  • The OAuth gem makes this thing work

Setup

Ruby Enterprise Edition

    sudo apt-get install libssl-dev libreadline5-dev;
    latest_ree=ruby-enterprise-1.8.7-2009.10.tar.gz;
    wget "http://rubyforge.org/frs/download.php/66162/${latest_ree}.tar.gz" ;
    tar xvzf ${latest_ree}.tar.gz;
    cd ${latest_ree_gz};
    sudo ./installer --auto /usr/local/ree;

Nginx

    latest_nginx=nginx-0.7.64;
    wget "http://sysoev.ru/nginx/${latest_nginx}.tar.gz" ;
    tar xvzf ${latest_nginx}.tar.gz ;
    sudo /usr/local/ree/bin/passenger-install-nginx-module --prefix=/usr/local/nginx --nginx-source-dir=/usr/local/src/$latest_nginx --extra-configure-flags=' --with-http_ssl_module --with-http_gzip_static_module --add-module=/usr/local/src/ngx-fancyindex-0.2 --add-module=/usr/local/src/nginx_upload_module-2.0.10 --add-module=/usr/local/src/nginx_uploadprogress_module --with-pcre --with-ld-opt=-lssl --with-sha1-asm --with-md5=/usr/include --with-md5-asm --http-proxy-temp-path="/tmp/nginx-proxy_temp" --http-fastcgi-temp-path="/tmp/nginx-fastcgi_temp" --http-client-body-temp-path="/tmp/nginx-client_body_temp" --pid-path="/var/run/nginx/nginx.pid" --conf-path="/slice/etc/nginx/nginx.conf" --lock-path="/var/run/nginx/nginx.lock" --http-log-path="/var/log/nginx/access.log" --error-log-path="/var/log/nginx/error.log" '

Add the deploy user

    sudo adduser deploy
    sudo usermod -a  -G admin,sudo deploy

Deploy

    cap deploy:setup
    scp -p config/settings.yml remotehost.com:/var/www/cartilage/shared/system/settings.yml
    cap deploy:cold
Something went wrong with that request. Please try again.