FloodMonkey helps manage subscriptions to the MySpace Flood, their real-time activity stream API.
- Real-time Stream: Functional Overview
- Real-time Stream API
- Stream Subscription API
- Subscription Query Spec
- Stream Subscription Spec
- MySpace Blog Post announcing the API launch
FloodMonkey is designed for people looking to capture firehose-level (100%) data streams. We use it to haul in 25GB+ of raw data every day with a single small EC2 instance and room to spare.
- A simple Sinatra frontend to manage and monitor subscriptions
- An even yet still more lightweight rack responder to stream callbacks directly to disk.
- Scripts to deploy, configure and monitor the various moving parts.
FloodMonkey is not designed for anything real-time. The endpoint streams everything to disk for bulk processing later — we use wukong and wuclan.
- Copy config/settings.example.yml to config/settings.yml and enter your info everywhere you see “CHANGEME”
- Create a MySpaceId app. Take it out of sandbox mode.
- Click on the page for the app and set up the OAuth urls. Be careful about the validation URL: it has to omit the http://, any third-level domain in the host, and any trailing “/” or query params.
External URL http://www.domain.com/endpoint/myspace External Callback Validation: domain.com/endpoint/myspace/cb
- While you’re there, copy out the consumer key and secret. Paste them into the settings.yml
- create a directory on a large local disk and replace the ‘work’ symlink to point to it. A full stream will fill 25GB a day or so, so be prepared.
- While developing, I run
shotgun --server=thin --host=0.0.0.0 --port=8080 config.ru
(you’ll have to run on port 80 or use unicorn in development mode to use the publishing callbacks and some oauth calls). - For production, I use the nginx+unicorn setup contained in config/production. You can start unicorn by hand with
unicorn -E production -c unicorn-conf.rb config.ru
and then manipulate it with signals.
- try clicking on the ‘test post’ link. It should make a bogus post (stored in a file labelled ‘test’) that you can watch from each end.
- You actually don’t need to authenticate as a user to myspace, just as an app. There are facilities for three-legged oauth in the app – making stream API requests on behalf of a user – but they’re not hooked to anything interesting at the moment.
- The app streams directly to disk, it doesn’t parse the received data or load it into a database. This is best done by separate worker processes.
- Make sure you use the vendor’ed OAuth gem, or pull from the current (12/2009) http://github.com/mojodna/oauth edge version. MySpace is overly finicky about the structure of OAuth requests, insisting that the query params land in sort order. The latest versions of the gem tolerate this peccadillo.
- Big ups to the very patient Jamie and Joel at MySpace for including us in the beta program, and working through a series of mysterious bugs.
- The frontend app is based on Monk / Cartilage, a skeleton for building Sinatra apps.
- The OAuth gem makes this thing work
Ruby Enterprise Edition
sudo apt-get install libssl-dev libreadline5-dev; latest_ree=ruby-enterprise-1.8.7-2009.10.tar.gz; wget "http://rubyforge.org/frs/download.php/66162/${latest_ree}.tar.gz" ; tar xvzf ${latest_ree}.tar.gz; cd ${latest_ree_gz}; sudo ./installer --auto /usr/local/ree;
Nginx
latest_nginx=nginx-0.7.64; wget "http://sysoev.ru/nginx/${latest_nginx}.tar.gz" ; tar xvzf ${latest_nginx}.tar.gz ; sudo /usr/local/ree/bin/passenger-install-nginx-module --prefix=/usr/local/nginx --nginx-source-dir=/usr/local/src/$latest_nginx --extra-configure-flags=' --with-http_ssl_module --with-http_gzip_static_module --add-module=/usr/local/src/ngx-fancyindex-0.2 --add-module=/usr/local/src/nginx_upload_module-2.0.10 --add-module=/usr/local/src/nginx_uploadprogress_module --with-pcre --with-ld-opt=-lssl --with-sha1-asm --with-md5=/usr/include --with-md5-asm --http-proxy-temp-path="/tmp/nginx-proxy_temp" --http-fastcgi-temp-path="/tmp/nginx-fastcgi_temp" --http-client-body-temp-path="/tmp/nginx-client_body_temp" --pid-path="/var/run/nginx/nginx.pid" --conf-path="/slice/etc/nginx/nginx.conf" --lock-path="/var/run/nginx/nginx.lock" --http-log-path="/var/log/nginx/access.log" --error-log-path="/var/log/nginx/error.log" '
Add the deploy user
sudo adduser deploy sudo usermod -a -G admin,sudo deploy
Deploy
cap deploy:setup scp -p config/settings.yml remotehost.com:/var/www/cartilage/shared/system/settings.yml cap deploy:cold