son_of_a_batch

Smelt from a plentiferous gallimaufry of requests an agglomerated bale of responses. With, y’know, concurrency and all that.

Install this as a shim to allow batch requests against a remote API.
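A minimal setup sketch, assuming Bundler and a local checkout of this repo (the project ships a Gemfile):

  $ bundle install    # installs goliath, gorillib, and the other dependencies listed in the Gemfile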

Results vs Errors

Note that a request may do any of the following:

  • The request might complete and be successful (eg 200 OK). Each of these will be in the “results” block, with its status code set appropriately.
  • The request might complete but be unsuccessful (eg 404 Not Found). These are no different to son_of_a_batch, so each of them will also be in the “results” block with its status code set appropriately.
  • The request might not complete. These will be in the “errors” block.
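For illustration only, a batch response might be shaped roughly like the sketch below. The "results"/"errors" split and the per-result status codes follow the description above, but the key names ("delay_1", "status", "body") and the exact nesting here are assumptions, not the documented format:

  {
    "results": {
      "delay_1": { "status": 200, "body": "..." },
      "delay_2": { "status": 404, "body": "..." }
    },
    "errors": {
      "delay_3": "request did not complete before the timeout"
    }
  }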

Usage

Start a test server:

$ ruby app/endpoints/sleepy.rb -sv -p 9002 # an example endpoint that sleeps a given amount of time before responding.
[54941:INFO] 2011-04-23 17:22:37 :: Starting server on 0.0.0.0:9002 in development mode. Watch out for stones.

$ curl 'http://localhost:9002?delay=2.2'
{"start":1303597564.663358,"delay":2.2,"actual":2.2001771926879883}

Start the son_of_a_batch server:

$ ruby ./app/endpoints/sob.rb -p 9004 -sv --config $PWD/config/app.rb
[6774:INFO] 2011-04-25 00:49:00 :: Starting server on 0.0.0.0:9004 in development mode. Watch out for stones.

Using the simple get batch (many params against one host)

curl -v 'http://localhost:9004/batch.json?url_base=http%3A%2F%2Flocalhost%3A9002%2F%3Fdelay%3D&url_vals=1.0,2.3,1.3,4.0&_pretty=true&_show_stats=true&_timeout=2.0'
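URL-decoded, the interesting parameters in that request are shown below; presumably son_of_a_batch fans out one GET per comma-separated value in url_vals, appended to url_base:

  url_base = http://localhost:9002/?delay=
  url_vals = 1.0,2.3,1.3,4.0
  # i.e. four concurrent requests:
  #   http://localhost:9002/?delay=1.0
  #   http://localhost:9002/?delay=2.3
  #   http://localhost:9002/?delay=1.3
  #   http://localhost:9002/?delay=4.0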

The JSON form of that:

curl -v -H "Content-Type: application/json" --data-ascii '{ "_pretty":true, "_show_stats":true, "_timeout":2.0, "url_base":"http://localhost:9002/?delay=", "url_vals":"1.0,2.3,1.3,4.0" }' 'http://localhost:9004/batch.json'

Arbitrary collection of URLs. Put your valid Infochimps API key on the APIKEY=XXXXX line…

APIKEY=XXXXX
curl -v -H "Content-Type: application/json" --data-ascii '{"_pretty":true,"_show_stats":true,"_timeout":2.0,"urls":{
    "food":"http://api.infochimps.com/social/network/tw/search/people_search?_apikey='$APIKEY'&q=food",
    "drink":"http://api.infochimps.com/social/network/tw/search/people_search?_apikey='$APIKEY'&q=drink",
    "sex":"http://api.infochimps.com/social/network/tw/search/people_search?_apikey='$APIKEY'&q=sex",
    "bieber":"http://api.infochimps.com/social/network/tw/search/people_search?_apikey='$APIKEY'&q=bieber",
    "mrflip":"http://api.infochimps.com/social/network/tw/influence/trstrank.json?_apikey='$APIKEY'&screen_name=mrflip"
  }' 'http://localhost:9004/batch.json'

Commandline as an IDE FTW. The first request gets the user IDs for the 100 people ‘closest’ to the given screen_name (infochimps). The next line parses the JSON and writes just the IDs, comma-separated, into /tmp/strong_links_ids.txt. Finally, the last line uses son_of_a_batch to fetch the trstrank of all 100 in a single request; typical total response time is less than 1.2 seconds.

APIKEY=XXXXX
screen_name=infochimps
curl 'http://api.infochimps.com/social/network/tw/graph/strong_links?_apikey='$APIKEY'&screen_name='$screen_name > /tmp/strong_links_raw.txt
cat /tmp/strong_links_raw.txt | ruby -rjson -e 'res = JSON.parse($stdin.read); puts res["strong_links"].map{|id,sc| id }.join(",") ' > /tmp/strong_links_ids.txt
curl -H "Content-Type: application/json" --data-ascii '{ "_pretty":true, "_show_stats":true, "_timeout":2.0, "url_base":"http://api.infochimps.com/social/network/tw/influence/trstrank.json?_apikey='$APIKEY'&user_id=", "url_vals":"'`cat /tmp/strong_links_ids.txt`'" }' -v 'http://localhost:9004/batch.json'

Here I’ll make 30 simultaneous requests against son_of_a_batch, each of them making 30 concurrent requests against our sleepy server:

ab -n30 -c30 'http://127.0.0.1:9004/batch.json?url_base=http%3A%2F%2Flocalhost%3A9002%2F%3Fdelay%3D&url_vals=1.0,2.3,1.3,4.0,1.0,2.3,1.3,4.0,1.0,2.3,1.3,4.0,1.0,2.3,1.3,4.0,1.0,2.3,1.3,4.0,1.0,2.3,1.3,4.0,1.0,2.3,1.3,4.0,3.2,2.8&_pretty=true&_show_stats=true&_timeout=2.0'

The par response time should be around 3000ms; running on my laptop, son_of_a_batch served 30 inbound / 900 outbound simultaneous consumers with 4200ms response time (and minimal worst-case degradation):

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.2      1       1
Processing:  4185 4194   4.6   4197    4197
Waiting:      233  480 205.5    478     728
Total:       4185 4194   4.8   4198    4198

Colophon

Uses

  • Goliath — fast asynchronous API library
  • Gorillib — only the lightest of weight helpers plucked from active_support and friends, with nothing obtrusive required by default.

Credits

Built using goliath, gorillib, and the infochimps API

Contributing to son_of_a_batch

  • Check out the latest master to make sure the feature hasn’t been implemented or the bug hasn’t been fixed yet
  • Check out the issue tracker to make sure someone hasn’t already requested it and/or contributed it
  • Fork the project
  • Start a feature/bugfix branch
  • Commit and push until you are happy with your contribution
  • Make sure to add tests for it. This is important so I don’t break it in a future version unintentionally.
  • Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or it is otherwise necessary, that is fine, but please isolate that change to its own commit so I can cherry-pick around it.

Copyright

Copyright © 2011 Philip (flip) Kromer for Infochimps. See LICENSE.txt for
further details.
