<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title><![CDATA[ql.io blog]]></title>
<link href="http://ql-io.github.com/atom.xml" rel="self"/>
<link href="http://ql-io.github.com/"/>
<updated>2012-07-24T09:09:14-07:00</updated>
<id>http://ql-io.github.com/</id>
<author>
<name><![CDATA[ql.io]]></name>
</author>
<generator uri="http://octopress.org/">Octopress</generator>
<entry>
<title type="html"><![CDATA[ql.io XML/Protobuf Performance]]></title>
<link href="http://ql-io.github.com/2012/07/20/xml-protobuf-performance.html"/>
<updated>2012-07-20T00:00:00-07:00</updated>
<id>http://ql-io.github.com/2012/07/20/xml-protobuf-performance</id>
<content type="html"><![CDATA[<p>Should backend servers send XML or Protobuf responses to ql.io? That is the question this post addresses.</p>
<p>I set up a ql.io application hitting a mock server and ran JMeter to generate load and collect results.</p>
<p>Source code for the app is on <a href="https://github.com/idralyuk/ql.io-protobuf-test">GitHub</a>.
Mock server is <a href="https://github.com/idralyuk/ql.io-protobuf-test/tree/master/mock_server">here</a> and JMeter script is <a href="https://github.com/idralyuk/ql.io-protobuf-test/tree/master/jmeter">here</a>.</p>
<p>ql.io tables for eBay&#8217;s Marketplaces APIs are <a href="https://github.com/ql-io/ql.io-ebay-mp-apis">here</a>.</p>
<p>The table used for this test is <a href="https://github.com/ql-io/ql.io-ebay-mp-apis/blob/master/tables/finding/findItemsByKeywords.ql">findItemsByKeywords.ql</a>.</p>
<h3>Test Setup</h3>
<p><strong>Server</strong>: Dev workstation ca. 2011 (Dell Precision T5500 with Intel Xeon E5630 2.53GHz, 24GB RAM), Linux 3.0.0-20-generic #34-Ubuntu SMP Tue May 1 17:24:39 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux</p>
<p><strong>Client</strong>: Dev workstation ca. 2007 (Dell Precision 690 Intel Xeon DP 5060 3.2GHz, 8GB RAM), SunOS 5.11 joyent_20120517T192048Z i86pc i386 i86pc</p>
<p>The server ran node v0.6.18a and ql.io app 0.7.4. On the same box another node process was serving canned XML and Protobuf responses from eBay&#8217;s <a href="http://developer.ebay.com/devzone/finding/callref/findItemsByKeywords.html">FindingService</a> on port 6000 (it ran on the same box in order to take the network out of the equation).</p>
<p>The test app was running on port 3000, with two route/table/patch combinations (one for XML, one for Protobuf) pointing to the above response server on localhost:6000.</p>
<p>The client was JMeter 2.7, running in server mode (jmeter-server), with HEAP=&#8221;-Xms1024m -Xmx1024m&#8221;.</p>
<p>See Appendix for implementation details.</p>
<h3>Test 1: 200 users, running for 1 hour, hitting the XML path</h3>
<p><strong>Rate: 203.43 trans/sec. Average response time: 72 ms.</strong></p>
<pre><code>Number of Samples: 734,339
Average Response Time: 72 ms
Minimum Response Time: 25 ms
Maximum Response Time: 1,002 ms
Standard Deviation: 49 ms
Error Percentage: 0.00 %
Transaction rate: 203.43 trans/sec
Throughput: 12,248 KB/sec
</code></pre>
<h3>Test 2: 200 users, running for 1 hour, hitting the Protobuf path</h3>
<p><strong>Rate: 213.91 trans/sec. Average response time: 24 ms.</strong></p>
<pre><code>Number of Samples: 772,200
Average Response Time: 24 ms
Minimum Response Time: 8 ms
Maximum Response Time: 198 ms
Standard Deviation: 26 ms
Error Percentage: 0.00 %
Transaction rate: 213.91 trans/sec
Throughput: 12,388 KB/sec
</code></pre>
<h3>Discussion</h3>
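<p>The results show that the average response time with Protobuf is <strong><em>three</em></strong> times lower than with XML (24 ms vs. 72 ms)! The transaction rate stayed roughly the same (213.91 vs. 203.43 trans/sec).</p>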
<p>Is it surprising? Yes. Even with the optimize_for = SPEED option turned on in the .proto file, this is a bit extreme.</p>
<p>The exercise of comparing Protobuf to JSON is left to the reader. It is reasonable to assume that they should be on par, as any conversion step is going to be slower than the native format.</p>
<p>This test doesn&#8217;t take into account network latency. Payload sizes play an important role when the network is involved; for the result set used in the test (50 items) the sizes were as follows:</p>
<pre><code>mock.xml - 83 KB
mock.protobuf - 27 KB
</code></pre>
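<p>The two headline ratios can be checked directly from the figures reported in this post. The snippet below is only a back-of-the-envelope calculation; the variable names are illustrative.</p>

```javascript
// Figures copied from the JMeter summaries and file sizes reported in this post.
var xmlAvgMs = 72, protobufAvgMs = 24;
var xmlSizeKb = 83, protobufSizeKb = 27;

// Ratio of average response times (XML / Protobuf).
var latencyRatio = xmlAvgMs / protobufAvgMs;
// Ratio of payload sizes, rounded to two decimals.
var sizeRatio = Math.round((xmlSizeKb / protobufSizeKb) * 100) / 100;

console.log('latency ratio:', latencyRatio);
console.log('payload size ratio:', sizeRatio);
```

<p>Both ratios come out close to three, so the payload is roughly a third of the size as well as a third of the average response time in this particular setup.</p>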
<h3>Conclusion</h3>
<p>According to the results of the test, it certainly makes sense to use Protobuf instead of XML.</p>
<p>Further testing (involving the network) needs to be done in order to determine whether Protobuf is a better solution than JSON, but given that there is a significant reduction in payload size, Protobuf is likely to come out a winner in that test as well.</p>
<h2>Appendix</h2>
<h3>URLs hit by JMeter (client)</h3>
<pre><code>http://10.xx.xx.xx:3000/ebay/finding/keywords/xml/ipad
http://10.xx.xx.xx:3000/ebay/finding/keywords/protobuf/ipad
</code></pre>
<h3>Routes (server)</h3>
<pre><code>return select searchResult.item, errorMessage
from ebay.finding.findItemsByKeywordsXML where keywords = '{keywords}'
via route '/ebay/finding/keywords/xml/{keywords}' using method get;
return select searchResult.item, errorMessage
from ebay.finding.findItemsByKeywordsProtobuf where keywords = '{keywords}'
via route '/ebay/finding/keywords/protobuf/{keywords}' using method get;
</code></pre>
<h3>Tables (server)</h3>
<pre><code>create table ebay.finding.findItemsByKeywordsXML
on select post to 'http://localhost:6000/mock.xml'
using headers 'X-EBAY-SOA-SECURITY-APPNAME'='{config.tables.ebay.finding.appname}',
'X-EBAY-SOA-OPERATION-NAME'='findItemsByKeywords'
using defaults format = "JSON", limit = 5, offset = 0
using patch 'findItemsByKeywordsXML.js'
using bodyTemplate "findItemsByKeywords.ejs" type 'application/xml'
resultset 'soapenv:Envelope.soapenv:Body.findItemsByKeywordsResponse'
create table ebay.finding.findItemsByKeywordsProtobuf
on select post to 'http://localhost:6000/mock.protobuf'
using headers 'X-EBAY-SOA-SECURITY-APPNAME'='{config.tables.ebay.finding.appname}',
'X-EBAY-SOA-OPERATION-NAME'='findItemsByKeywords'
using defaults format = "JSON", limit = 5, offset = 0
using patch 'findItemsByKeywordsProtobuf.js'
using bodyTemplate "findItemsByKeywords.ejs" type 'application/xml'
resultset 'findItemsByKeywordsResponse'
</code></pre>
<h3>Patches (server)</h3>
<p>The following patch was used in the XML path: <a href="https://github.com/idralyuk/ql.io-protobuf-test/blob/master/tables/finding/findItemsByKeywordsXML.js">findItemsByKeywordsXML.js</a>.</p>
<p>This additional code was inserted into the above patch to decode Protobuf responses:</p>
<pre><code>var fs = require('fs'),
_ = require('underscore'),
Schema = require('protobuf').Schema,
fis_schema = new Schema(fs.readFileSync(__dirname + '/util/FindItemsByKeywords.desc')),
FindItemsByKeywordsResponse = fis_schema['com.ebay.marketplace.search.v1.services.finditemservice.FindItemsByKeywordsResponse'];
exports['parse response'] = function(args) {
var length = 0, idx = 0;
_.each(args.body, function(b) {
length += b.length;
});
var buf = new Buffer(length);
_.each(args.body, function(b) {
idx = idx + b.copy(buf, idx);
});
var fir = { 'findItemsByKeywordsResponse' : FindItemsByKeywordsResponse.parse(buf) };
return {
type: 'application/json',
content: JSON.stringify(fir)
};
}
</code></pre>
<p><strong>Note: two optimizations can be made to the above code: (a) allow the patch to return the JSON structure itself instead of a string that has to be parsed again, and (b) receive the buffer length as an argument in order to avoid looping through the data buffers twice.</strong></p>
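<p>As a rough illustration of the second optimization, newer versions of node provide <code>Buffer.concat</code>, which joins an array of buffers in one call and accepts an optional total length. The sketch below assumes the chunks have the same shape as <code>args.body</code> in the patch above (an array of buffers); <code>joinChunks</code> is a hypothetical helper and not part of ql.io.</p>

```javascript
// Hypothetical helper: join an array of Buffer chunks into one Buffer.
// When the total length is already known, passing it lets Buffer.concat
// skip its own pass over the chunks to compute the length.
function joinChunks(chunks, totalLength) {
  if (typeof totalLength === 'number') {
    return Buffer.concat(chunks, totalLength);
  }
  return Buffer.concat(chunks);
}

var joined = joinChunks([Buffer.from('abc'), Buffer.from('def')], 6);
console.log(joined.toString());
```

<p>The result could then be handed to the Protobuf parser in place of the manually assembled buffer.</p>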
<h3>Mock Server</h3>
<pre><code>var _ = require('underscore'),
fs = require('fs'),
url = require('url'),
util = require('util'),
http = require('http');
var port = 6000;
function endsWith(str, suffix) {
return str.indexOf(suffix, str.length - suffix.length) !== -1;
}
var server = http.createServer(function(req, res) {
var file = __dirname + '/data/' + req.url;
var cType;
if (endsWith(req.url, '.xml')) {
cType = 'text/xml;charset=UTF-8';
} else if (endsWith(req.url, '.json')) {
cType = 'application/json;charset=UTF-8';
} else if (endsWith(req.url, '.protobuf')) {
cType = 'application/octet-stream;charset=UTF-8';
}
var stat = fs.statSync(file);
res.writeHead(200, {
'Content-Type' : cType,
'Content-Length' : stat.size
});
var readStream = fs.createReadStream(file);
util.pump(readStream, res, function(e) {
if (e) {
console.log(e.stack || e);
}
res.end();
});
});
server.listen(port, function() {
console.log('\nmock server listening on ' + port);
});
</code></pre>
<p>Please send comments/suggestions to <a href="http://groups.google.com/group/qlio">ql.io Google Group</a>.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[ql.io 0.7]]></title>
<link href="http://ql-io.github.com/2012/06/29/ql.io-0.7.html"/>
<updated>2012-06-29T00:00:00-07:00</updated>
<id>http://ql-io.github.com/2012/06/29/ql.io-0.7</id>
<content type="html"><![CDATA[<p>Today&#8217;s release of ql.io 0.7 includes the following changes:</p>
<h3>Features</h3>
<ul>
<li>Fallback syntax to the language - see https://github.com/ql-io/ql.io/wiki/%5BProposal%5D-Optional-Inputs-and-Errors</li>
<li>Compiler rewritten to output the DAG with dependencies</li>
<li>Explicit dependencies between modules</li>
<li>Support for pre-requisite params - see https://github.com/ql-io/ql.io/wiki/%5BProposal%5D-Optional-Inputs-and-Errors</li>
<li>Retry once for idempotent requests on timeouts</li>
<li>Update the context with the udf filtered data</li>
<li>Support C style block comments</li>
<li>No Compression if the CPU load is > 50%</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li>Http client Agent maxSockets increased to 1000 to avoid request backlog on any given socket - https://github.com/ql-io/ql.io/issues/512</li>
<li>Fix https://github.com/ql-io/ql.io/issues/478</li>
<li>Fix https://github.com/ql-io/ql.io/issues/13 Disable autorun in the console.</li>
<li>Add file path/name to compile errors</li>
</ul>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Evented Orchestration]]></title>
<link href="http://ql-io.github.com/2012/06/08/plan.html"/>
<updated>2012-06-08T00:00:00-07:00</updated>
<id>http://ql-io.github.com/2012/06/08/plan</id>
<content type="html"><![CDATA[<p>One of the core strengths of ql.io is evented orchestration of reads and writes to HTTP APIs using a
declarative language. In recent weeks, the core processing algorithm used to process ql.io scripts
went through an overhaul to easily infer what goes on when you submit a script for execution. The
outcome of this exercise is a rewrite of the compiler which now takes a given script and outputs an
execution plan. This helped us achieve two things - further simplification of the orchestration
algorithm (which is now just about 80 lines long), and visualization to identify potential latency
bottlenecks.</p>
<p>Read on to find out how to generate and visualize execution plans.</p>
<p>For instance, consider the script</p>
<pre><code>prodid = select ProductID[0].Value from eBay.FindProducts where
QueryKeywords = 'macbook pro';
details = select * from eBay.ProductDetails where
ProductID in ('{prodid}') and ProductType = 'Reference';
reviews = select * from eBay.ProductReviews where
ProductID in ('{prodid}') and ProductType = 'Reference';
return select d.ProductID[0].Value as id, d.Title as title,
d.ReviewCount as reviewCount, r.ReviewDetails.AverageRating as rating
from details as d, reviews as r
where d.ProductID[0].Value = r.ProductID.Value
via route '/myapi' using method get;
</code></pre>
<p>A visualization of the execution plan of this script is below.</p>
<div style="max-width: 100%;overflow:auto">
<a href="http://ql-io.github.com/images/2012-06-08-plan-0.svg"><img src="http://ql-io.github.com/images/2012-06-08-plan-0.svg" style="max-width: 1000%" alt="A visualization of a script with one fork and one join"></a>
</div>
<p>By looking at this execution plan we can infer the following:</p>
<ul>
<li>The select statement on line 8 depends on the statements on lines 3 and 5.</li>
<li>The overall latency of this script depends on the slowest of the statements on lines 3 and 5.</li>
</ul>
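<p>The second point can be sketched in a few lines. The plan below is a hypothetical, hand-written structure with made-up latencies in milliseconds; it is not the format the ql.io compiler emits, and <code>finishTime</code> is an illustrative helper.</p>

```javascript
// Hypothetical plan: each node has its own latency (ms) and a list of
// nodes it depends on. The numbers are invented for illustration.
var plan = {
  prodid:  { latency: 120, deps: [] },
  details: { latency: 200, deps: ['prodid'] },
  reviews: { latency: 350, deps: ['prodid'] },
  result:  { latency: 5,   deps: ['details', 'reviews'] }
};

// A node can finish only after its slowest dependency has finished.
function finishTime(plan, name, memo) {
  memo = memo || {};
  if (memo[name] !== undefined) {
    return memo[name];
  }
  var node = plan[name];
  var waitFor = node.deps.map(function (dep) {
    return finishTime(plan, dep, memo);
  });
  // Seeding with 0 covers nodes that have no dependencies.
  memo[name] = Math.max.apply(null, [0].concat(waitFor)) + node.latency;
  return memo[name];
}

console.log(finishTime(plan, 'result'));
```

<p>With these numbers the join waits for <code>reviews</code>, the slower of the two parallel statements, so speeding up <code>details</code> alone would not change the overall latency.</p>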
<p>Here is the execution plan of another script. This script takes two inputs - a user&#8217;s identity and a
set of IDs of some items, and gets some details from two different APIs (the bottom two nodes). The
responses from those APIs trigger some in-process data extractions and transformations which join
on the node below the node at the top.</p>
<div style="max-width: 100%;overflow:auto">
<a href="http://ql-io.github.com/images/2012-06-08-plan-1.svg"><img src="http://ql-io.github.com/images/2012-06-08-plan-1.svg" style="max-width: 1000%" alt="Another visualization"></a>
</div>
<p>Again, the overall latency depends on the bottom two nodes.</p>
<p>Here is the execution plan of another script which shows one node ([5]) blocking on another ([1]).</p>
<div style="max-width: 100%;overflow:auto">
<a href="http://ql-io.github.com/images/2012-06-08-plan-3.svg"><img src="http://ql-io.github.com/images/2012-06-08-plan-3.svg" style="max-width: 1000%" alt="Another visualization"></a>
</div>
<h2>Generating Execution Plan</h2>
<p>Generating the execution plan is easy. Here is a node.js script.</p>
<script src="https://gist.github.com/2898580.js"> </script>
<p>You can use the compiler in the browser too. Here is a script that works in any modern browser.</p>
<script src="https://gist.github.com/2898590.js"> </script>
<h2>Visualization</h2>
<p>I wrote a small tool to compile a script and generate a .dot file, and feed the output to
<a href="http://www.graphviz.org/">Graphviz</a>.</p>
<p><a href="http://bl.ocks.org/d/2898080/">Dot file generator</a> - to generate .dot files for ql.io scripts.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[ql.io 0.6]]></title>
<link href="http://ql-io.github.com/2012/05/21/ql.io-0.6.html"/>
<updated>2012-05-21T00:00:00-07:00</updated>
<id>http://ql-io.github.com/2012/05/21/ql.io-0.6</id>
<content type="html"><![CDATA[<p>Today&#8217;s release of ql.io 0.6 includes the following changes:</p>
<ul>
<li>Support array style reference in columns clause, such as <code>select 'b-1', 'b-3'['c-1'] from a</code>.</li>
<li>Disable ability to enable/disable ecv checks by default. You can turn it on by adding arg
<code>--ecvControl true</code> to the start script.</li>
<li>Add optional parameters in route. Including &#8220;with optional params&#8221; in route would make params
without <code>^</code> prefix optional. When this clause is present, only required tokens are used for
matching a request to a route.</li>
<li>Be able to start the server on multiple ports</li>
<li>Added support for multiple attachments. See docs on insert http://ql.io/docs/insert</li>
<li>End pending connections on close after responses are written.</li>
<li>Support cache events (hit, miss, new, error, info, heartbeat)</li>
<li>Switch to new cluster2</li>
<li>Added new syntaxes &#8220;with part&#8221; and opaque insert param.</li>
<li>Fix expression parsing in string template so that a token like <code>"{obj.prop[?(@.price &gt; 2)]}"</code> is
valid</li>
<li>Add support for escaped quotes in string values</li>
<li>Update PEG.js to 0.7.</li>
<li>Remove duplicates from in clause.</li>
<li>Use <code>hasOwnProperty</code> in place of prop lookup while joining</li>
<li>Deal with non UTF-8 encodings from upstream resources</li>
<li>When joining, use &#8216;==&#8217; to maintain backwards compat</li>
<li>Refactor logging to error, access, proxy and default logs. The proxy log file contains outgoing
req/resp, access log contains incoming requests, error log contains all errors and warnings,
and the rest go to ql.io.log. All these files are rotated.</li>
<li>Include a payload with begin events</li>
<li>Support local offset and limit</li>
<li>Fix the case of alias names with joins and UDFs.</li>
<li>Add UDFs in where clause to post process rows. You can either tweak or remove a row. See
https://gist.github.com/2334012 for semantics of UDFs. UDF support for the where clause is coming
in version 0.7.</li>
</ul>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[cluster2]]></title>
<link href="http://ql-io.github.com/2012/04/28/cluster2.html"/>
<updated>2012-04-28T00:00:00-07:00</updated>
<id>http://ql-io.github.com/2012/04/28/cluster2</id>
<content type="html"><![CDATA[<p>cluster2 is a node.js (>= 0.6.x) compatible multi-process management module. This module grew out of
our needs in operationalizing node.js for <a href="https://github.com/ql-io/ql.io">ql.io</a> at eBay. Built on
node&#8217;s <code>cluster</code>, cluster2 adds several safeguards and utility functions to help support real-world
production scenarios:</p>
<ul>
<li>Scriptable start, shutdown and stop flows</li>
<li>Worker monitoring for process deaths</li>
<li>Worker recycling</li>
<li>Graceful shutdown</li>
<li>Idle timeouts</li>
<li>Validation hooks (for other tools to monitor cluster2 apps)</li>
<li>Events for logging cluster activities</li>
</ul>
<p>See <a href="http://ql-io.github.com/cluster2/">cluster2</a> for more info.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[ql.io 0.5]]></title>
<link href="http://ql-io.github.com/2012/03/30/ql.io-0.5.html"/>
<updated>2012-03-30T00:00:00-07:00</updated>
<id>http://ql-io.github.com/2012/03/30/ql.io-0.5</id>
<content type="html"><![CDATA[<p>Today&#8217;s release of ql.io 0.5 includes the following changes:</p>
<!-- more -->
<ul>
<li>Enable caching on ql.io response. You can add <code>using headers</code> clause to routes to add arbitrary
response headers, as in
<code>return 'hello' via route '/hello' using method get using headers 'Cache-Control' = 'max-age=3600';</code></li>
<li>[Experimental] Enable caching of responses from APIs for select statements. <code>create table</code>
statements can now include an <code>expires &lt;interval&gt;</code> clause to specify an interval for caching
responses of <code>select</code> statements, as in <code>create table auto.compute.key on select get from 'http://a.uri.net' ... expires 10;</code></li>
<li>Process gzip/deflate encoded responses from upstream servers.</li>
<li>Support <code>Content-Encoding</code> for incoming requests and gzip/deflate encode responses.</li>
<li>Several bug fixes to maintain hierarchical logging of script execution. You can handle events
emitted by the engine to log script execution flow. Watch this blog for an example soon.</li>
<li>Factor out cluster management to a new module <a href="https://github.com/ql-io/cluster2">cluster2</a> to
support deployments on <a href="https://github.com/ql-io/ql.io-cloudfoundry">CloudFoundry</a> and
<a href="https://github.com/mulder/ql.io-heroku">Heroku</a>.</li>
<li>New URI <code>/api</code> to navigate tables and routes - see
<a href="http://ql-io.github.com/2012/03/12/en-route.html">En Route</a> for details.</li>
<li>Support <code>delete</code> statements.</li>
<li>Upgrade CodeMirror to 2.22 to resolve some quirks in the console UI.</li>
<li>Add a page to show all installed npm packages. Try http://&lt;host&gt;:&lt;monport&gt;/deps
(or http://localhost:3001/deps).</li>
<li>Support custom xml2json converters to enable client interop with legacy XML APIs.</li>
<li>Recover shutdown/stop from extraneous pid files.</li>
<li>Simplify response decoding. Instead of setting encoding on the response, collect buffers into
array, and then decode in the default impl of &#8216;parse response&#8217;.</li>
<li>Removed /in-flight requests api.</li>
<li>Enable numbers in <code>in</code> clause and args of udfs</li>
<li>Export version from each module. You can find version of a module using
<code>require('ql.io-&lt;somemodule&gt;').version</code>.</li>
<li>Include version number in <code>User-Agent</code> and <code>Server</code> headers.</li>
<li>Support scatter-gather for requests with bodies by adding a <code>foreach 'param'</code> for the
<code>using bodyTemplate</code> clause. This allows scripts to batch up POST and PUT requests.</li>
<li>Support ejs body templates.</li>
<li>Skip files that don&#8217;t end with <code>.ql</code>.</li>
<li>&#8220;/ecv&#8221; check returns network ip instead of loopback address</li>
<li>Let the engine allow a monkey patch to parse the response. Useful to process binary formats like avro.</li>
</ul>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[ql.io on Cloud Foundry]]></title>
<link href="http://ql-io.github.com/2012/03/21/ql.io-on-cloudfoundry.html"/>
<updated>2012-03-21T00:00:00-07:00</updated>
<id>http://ql-io.github.com/2012/03/21/ql.io-on-cloudfoundry</id>
<content type="html"><![CDATA[<p>Here are the steps to deploy ql.io on Cloud Foundry with node 0.6.x. This setup automatically spawns
a cluster of node processes using node&#8217;s native cluster. A sample app to demonstrate these steps is
on <a href="http://ql-io.github.com/2012/03/21/ql.io-on-cloudfoundry.html">GitHub</a> - this includes an
<a href="https://github.com/ql-io/ql.io-cloudfoundry/blob/master/app.js">app.js</a> and a
<a href="https://github.com/ql-io/ql.io-cloudfoundry/blob/master/package.json">package.json</a> with all the
dependencies.</p>
<!-- more -->
<h2>Setup Cloud Foundry</h2>
<p>Setup a working Cloud Foundry environment with matching architecture for development and deployment.
ql.io relies on <code>node-expat</code> and hence npm modules installed on the development machine must be
portable to your Cloud Foundry VM. This rules out using a Mac or Windows for development and
<a href="https://my.cloudfoundry.com/micro">Micro Cloud Foundry</a> with Ubuntu for testing the deployment.</p>
<p>If you want to try this on a Mac with Micro Cloud Foundry, use the Micro Cloud Foundry VM as the
development machine. Here are the steps.</p>
<ul>
<li><code>ssh vcap@&lt;your Micro Cloud Foundry IP&gt;</code></li>
<li>Install <a href="http://start.cloudfoundry.com/tools/vmc/installing-vmc.html">vmc</a>.</li>
<li>Download and build node (0.6.x)</li>
<li><code>sudo apt-get install expat libexpat-dev</code></li>
</ul>
<h2>Upgrade nginx</h2>
<p>Upgrade nginx on the Cloud Foundry target machine. This step is required to let nginx correctly
serve chunked encoded responses without <code>Content-Length</code> headers.</p>
<ul>
<li><code>ssh vcap@&lt;your Micro Cloud Foundry IP&gt;</code></li>
<li><code>cd /tmp</code></li>
<li><code>su -</code></li>
<li><code>curl https://raw.github.com/gist/2187930/e081d925a881e51ef44f19bc5649b2918a1e86d4/mcf_nginx_11.sh | bash</code></li>
</ul>
<h2>Setup an App</h2>
<pre><code>git clone git@github.com:ql-io/ql.io-cloudfoundry.git cfapp
</code></pre>
<p><code>vmc</code> failed for me when the directory name of the app had numbers and dots, so use a name
without such characters for the app.</p>
<pre><code>cd cfapp
npm install
</code></pre>
<p>Add tables and routes as usual.</p>
<h2>Push the App</h2>
<pre><code>~/cfapp/vmc push --runtime=node06
Would you like to deploy from the current directory? [Yn]: y
Application Name: cfapp
Application Deployed URL [cfapp.xxx.cloudfoundry.me]: &lt;use your deployment URL&gt;
Detected a Node.js Application, is this correct? [Yn]:
Memory Reservation (64M, 128M, 256M, 512M, 1G) [64M]: 512M
Creating Application: OK
Would you like to bind any services to 'cfapp'? [yN]: n
Uploading Application:
Checking for available resources: OK
Processing resources: OK
Packing application: OK
Uploading (160K): OK
Push Status: OK
Staging Application: OK
Starting Application: OK
</code></pre>
<p>Note that using 512M or more is important - the default 64M is not sufficient for running a cluster
of node instances with all ql.io modules, and <code>vmc</code> fails silently with the default.</p>
<h2>Known Issues</h2>
<p>Here is a summary of issues that I ran into while trying this out.</p>
<ol>
<li>Don&#8217;t forget to add <code>--runtime=node06</code> to <code>vmc push</code>. By default, Cloud Foundry uses node 0.4.x.</li>
<li>Reserve appropriate memory for the app. <code>app.js</code> spawns <code>n+1</code> node processes where <code>n</code> equals the
number of the CPU threads on the VM.</li>
<li>Don&#8217;t use characters like <code>.</code> and <code>-</code> in the name of the app.</li>
<li>Cloud Foundry uses an L7 router and hence breaks ql.io&#8217;s console as it uses WebSocket API. If
you would like to use the console on Cloud Foundry, use Safari.</li>
<li>The <code>app.js</code> bundled with <a href="https://github.com/ql-io/ql.io-cloudfoundry">ql.io-cloudfoundry</a> leaves
the paths <code>/console</code>, and <code>/q</code> open for traffic. I don&#8217;t recommend this. In production, consider
disabling these two and using routes instead from client apps by changing the <code>options</code> in
<code>app.js</code> to the following.</li>
</ol>
<script src="https://gist.github.com/2147268.js"> </script>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[En Route]]></title>
<link href="http://ql-io.github.com/2012/03/12/en-route.html"/>
<updated>2012-03-12T00:00:00-07:00</updated>
<id>http://ql-io.github.com/2012/03/12/en-route</id>
<content type="html"><![CDATA[<p>A route in ql.io is a new consumer-optimized HTTP interface. Routes superimpose a simple and familiar HTTP interface on ql.io scripts without needing to specify an elaborate script in the request. In other words, routes make ql.io a platform to &#8220;build your own APIs&#8221;.</p>
<p>Having built this capability, in this post I want to highlight some potential ways to take advantage of routes.</p>
<!-- more -->
<h2>Discovery</h2>
<p>Let&#8217;s imagine you built some new APIs using ql.io. How can your users find more about those APIs? Here is how.</p>
<p>Every ql.io instance now includes a special URI <code>http://{host}:{port}/api</code> that lets you browse through the following:</p>
<ul>
<li>List of routes</li>
<li>For each route, a URI template (or a URI), and HTTP method supported.</li>
<li>For each route, the tables used</li>
<li>For each table, the original HTTP resource that it maps to.</li>
</ul>
<p>Here is an example.</p>
<p><a href="http://ql.io/api"><img src="http://ql-io.github.com/images/2012-03-12-en-route-1.png" alt="API browsing" /></a></p>
<p>Go to <a href="http://ql.io/api">http://ql.io/api</a> to try it out. Though this capability is automatic, it needs a couple of actions as you build tables and routes.</p>
<h2>Parametrizing</h2>
<p>Let us consider the following script.</p>
<pre><code>keyword = "ql.io";
web = select web:Title, web:Url, web:Description from bing.search where q = "{keyword}";
tweets = select id as id, from_user_name as user_name, text as text
from twitter.search where q = "{keyword}";
return {
"keyword": "{keyword}",
"web": "{web}",
"tweets": "{tweets}"
}
</code></pre>
<p>Since <code>keyword</code> is hardcoded in this script, it is only capable of finding &#8220;ql.io&#8221; from twitter and bing. You can parameterize this script by defining it as a route.</p>
<pre><code>web = select web:Title, web:Url, web:Description from bing.search where q = "{keyword}";
tweets = select id as id, from_user_name as user_name, text as text
from twitter.search where q = "{keyword}";
return {
"keyword": "{keyword}",
"web": "{web}",
"tweets": "{tweets}"
} via route '/search?q={keyword}' using method get;
</code></pre>
<p>With this route, you can use <code>http://{host}:{port}/search?q={your keyword here}</code> to run the script. Try this route at <a href="http://ql.io/search?q=ql.io">http://ql.io/search?q=ql.io</a>.</p>
<h2>Similar but Different</h2>
<p>There are other ways to parameterize routes. Let&#8217;s say, you would like to provide the following resources to client apps.</p>
<ol>
<li><code>http://{host}:{port}/item/location?itemid={itemid}</code> with method <code>GET</code>: Retrieve geo-location for a given item id.</li>
<li><code>http://{host}:{port}/item/location?keyword={keyword}</code> with method <code>GET</code>: Given a keyword, find matching items, and then find and return their geo-locations.</li>
<li><code>http://{host}:{port}/item/location?itemid={itemid}&amp;keyword={keyword}</code> with method <code>GET</code>: Given a keyword and item ID, find geolocations of all items matching the keyword, and also for the item ID. Such a response may be useful when you want to show the location of a given item, but also of other items that match the keyword.</li>
</ol>
<p>Notice all the routes above have the same HTTP method <code>GET</code> and the same path <code>/item/location</code> but differ in query parameters. But as the aggregation logic may be different for each of these scripts, you can define three different scripts for the same path.</p>
<p>Here is a route for <code>/item/location?itemid={itemid}</code></p>
<pre><code>-- Matches request /item/location?itemid=140716431558
--
return select e.ItemID as id, e.Title as title, e.ViewItemURLForNaturalSearch as url,
g.geometry.location as latlng
from details as e, google.geocode as g
where e.itemId = "{itemid}"
and g.address = e.Location
via route '/item/location?itemid={itemid}' using method get;
</code></pre>
<p>Here is the route for <code>/item/location?keyword={keyword}</code>.</p>
<pre><code>-- Matches request /item/location?keyword=ipad
--
return select e.ItemID as id, e.Title as title, e.ViewItemURLForNaturalSearch as url,
g.geometry.location as latlng
from details as e, google.geocode as g
where e.itemId in (select itemId from finditems where keywords = "{keyword}")
and g.address = e.Location
via route '/item/location?keyword={keyword}' using method get;
</code></pre>
<p>Finally, here is the route for <code>/item/location?itemid={itemid}&amp;keyword={keyword}</code></p>
<pre><code>-- Matches request /item/location?keyword=ipad&amp;itemid=140716431558
--
keywordResult = select e.ItemID as id, e.Title as title, e.ViewItemURLForNaturalSearch as url,
g.geometry.location as latlng
from details as e, google.geocode as g
where e.itemId in (select itemId from finditems where keywords = "{keyword}")
and g.address = e.Location;
itemidResult = select e.ItemID as id, e.Title as title, e.ViewItemURLForNaturalSearch as url,
g.geometry.location as latlng
from details as e, google.geocode as g
where e.itemId = "{itemid}"
and g.address = e.Location;
return {
"keywordResult": "{keywordResult}",
"itemidResult" : "{itemidResult}"
} via route '/item/location?itemid={itemid}&amp;keyword={keyword}' using method get;
</code></pre>
<p>Given a request URI, ql.io&#8217;s routing engine is capable of matching the request to one of these scripts.</p>
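<p>One way to picture this matching is to compare the set of query parameter names in the request with the set declared by each route. The sketch below is only an assumption about how such matching could work, not ql.io&#8217;s actual routing engine; the route table and <code>matchRoute</code> are hypothetical.</p>

```javascript
// Hypothetical route table: same path and method, different query parameters.
var routes = [
  { name: 'byItem',    path: '/item/location', params: ['itemid'] },
  { name: 'byKeyword', path: '/item/location', params: ['keyword'] },
  { name: 'byBoth',    path: '/item/location', params: ['itemid', 'keyword'] }
];

// Pick the route whose path matches and whose parameter names are exactly
// the ones present in the request URI. Returns null when nothing matches.
function matchRoute(routes, requestUri) {
  // The base URL is only needed so that a relative URI can be parsed.
  var parsed = new URL(requestUri, 'http://localhost');
  var present = Array.from(new Set(parsed.searchParams.keys())).sort().join(',');
  var found = routes.filter(function (route) {
    if (route.path !== parsed.pathname) {
      return false;
    }
    return route.params.slice().sort().join(',') === present;
  });
  return found.length === 1 ? found[0].name : null;
}

console.log(matchRoute(routes, '/item/location?keyword=ipad'));
```

<p>Exact matching on parameter names keeps the three routes unambiguous; a real engine may relax this, for example to allow optional parameters.</p>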
<h2>Non-Idempotent and Unsafe</h2>
<p>Route parameterization is not limited to HTTP <code>GET</code> alone. You can allow clients to supply bodies with <code>POST</code>, <code>PUT</code>, <code>PATCH</code> and <code>DELETE</code> requests.</p>
<p>Consider the following table for bitly APIs.</p>
<pre><code>create table bitly.shorten
on insert get from "http://api.bitly.com/v3/shorten?login={^login}&amp;apiKey={^apikey}&amp;longUrl={^longUrl}&amp;format={format}"
using defaults apikey = "{config.bitly.apikey}", login = "{config.bitly.login}", format = "json"
using patch 'bitly.js'
resultset 'data.url'
on select get from "http://api.bitly.com/v3/expand?login={^login}&amp;apiKey={^apikey}&amp;shortUrl={^shortUrl}&amp;format={format}"
using defaults apikey = "{config.bitly.apikey}", login = "{config.bitly.login}", format = "json"
using patch 'bitly.js'
resultset 'data.expand'
</code></pre>
<p>To shorten a URI, you may want to add a route that uses HTTP method <code>POST</code>.</p>
<pre><code>return insert into bitly.shorten (longUrl) values ('{uri}')
via route '/bitly/shorten' using method post;
</code></pre>
<p>You can then use one of the following requests to shorten a URI.</p>
<pre><code>curl http://localhost:3000/bitly/shorten -X POST \
-H 'Content-Type: application/json' \
-d '{"uri": "http://www.ebay.com"}'
curl http://localhost:3000/bitly/shorten -X POST \
-H 'Content-Type: application/xml' \
-d '&lt;uri&gt;http://www.ebay.com&lt;/uri&gt;'
curl http://localhost:3000/bitly/shorten -X POST \
-H 'Content-Type: application/x-www-form-urlencoded' \
-d 'uri=http://www.ebay.com'
</code></pre>
<p>In each case, ql.io&#8217;s routing engine coerces the body into an associative array and substitutes its values for the matching tokens in the routing script or in the tables used by the routing script.</p>
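<p>To make the coercion step concrete, here is a minimal Python sketch of the idea. It is not ql.io&#8217;s implementation (ql.io runs on node.js); the function names <code>coerce_body</code> and <code>fill_tokens</code> are hypothetical, and the sketch assumes a flat body with one value per name.</p>

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs


def coerce_body(content_type, body):
    """Turn a request body into a flat dict of token names to values."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body)
    if media_type == "application/xml":
        root = ET.fromstring(body)
        # A lone element, like the uri element in the curl example, maps to one key.
        if len(root) == 0:
            return {root.tag: root.text}
        return {child.tag: child.text for child in root}
    if media_type == "application/x-www-form-urlencoded":
        # parse_qs returns a list per name; keep the first value only.
        return {name: values[0] for name, values in parse_qs(body).items()}
    raise ValueError("unsupported content type: " + media_type)


def fill_tokens(template, values):
    """Substitute {name} tokens in a script fragment with body values."""
    for name, value in values.items():
        template = template.replace("{" + name + "}", str(value))
    return template
```

<p>With any of the three request bodies above, <code>coerce_body</code> yields the same dictionary with a single <code>uri</code> key, which <code>fill_tokens</code> can then place into the <code>{uri}</code> token of the routing script.</p>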
<h2>Markdown</h2>
<p>ql.io scripts can include <a href="http://daringfireball.net/projects/markdown/">Markdown</a>-based line comments that begin with <code>--</code>. ql.io treats comments preceding <code>return</code> statements in routing scripts, and comments preceding <code>create table</code> statements, as documentation.</p>
<pre><code>-- Use this resource to shorten a URI using bitly.
return insert into bitly.shorten (longUrl) values ('{uri}')
via route '/bitly/shorten' using method post;
</code></pre>
<p>We&#8217;ve some work to do to support <a href="https://github.com/ql-io/ql.io/issues/340">multi-line comments</a>, but you get the idea!</p>
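<p>As a rough illustration of the convention, the following Python sketch pairs a statement with the single comment line directly above it. The function name <code>extract_docs</code> is hypothetical and this is only an assumption about how such a scan could work, not a description of ql.io&#8217;s parser.</p>

```python
def extract_docs(script):
    """Pair each return / create table statement with the comment line above it."""
    docs = []
    pending = None
    for line in script.splitlines():
        stripped = line.strip()
        if stripped.startswith("--"):
            # Remember only the most recent comment line.
            pending = stripped[2:].strip()
        elif stripped.startswith(("return", "create table")):
            if pending is not None:
                docs.append((stripped, pending))
            pending = None
        elif stripped:
            # Any other statement line breaks the comment-to-statement link.
            pending = None
    return docs
```

<p>Run against the routing script above, the sketch returns one pair: the <code>return insert into bitly.shorten ...</code> line together with the text of its comment.</p>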
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Making Peace with HTTP APIs]]></title>
<link href="http://ql-io.github.com/2012/02/22/making-peace.html"/>
<updated>2012-02-22T00:00:00-08:00</updated>
<id>http://ql-io.github.com/2012/02/22/making-peace</id>
<content type="html"><![CDATA[<p>Once in a while you come across an HTTP API that uses HTTP in complicated and incorrect ways. There
are many examples of this on the Web today including those from
<a href="http://developer.ebay.com/devzone/xml/docs/reference/ebay/GetMyeBayBuying.html">eBay</a>,
<a href="http://docs.amazonwebservices.com/amazonswf/latest/developerguide/UsingJSON-swf.html">Amazon</a>,
<a href="http://code.google.com/apis/friendconnect/docs/opensocial_rest_rpc.html">Google</a>,
<a href="http://www.bing.com/toolbox/bingdeveloper/">Microsoft</a> and many others. These can be hard to
use as they require you to follow proprietary styles for constructing requests and parsing
responses. Some of those also don&#8217;t work well with common HTTP infrastructure like caches.</p>
<p>In this post, I would like to show how you can, in four simple steps, use ql.io to hide the
complexity of such APIs.</p>
<!-- more -->
<p>To illustrate, let me take eBay&#8217;s <a href="http://developer.ebay.com/DevZone/XML/docs/Reference/eBay/PlaceOffer.html">PlaceOffer API</a>
that lets an eBay buyer place an offer for an item listed on eBay. This example may be more complex than other similar
APIs that you have encountered, but it helps me drive the point home.</p>
<p>This API requires you to send a POST request with some custom headers and an XML document in the
body.</p>
<pre><code>POST /ws/api.dll HTTP/1.1
Host: api.ebay.com
Content-Type: application/xml; charset=UTF-8
X-EBAY-API-DEV-NAME: developer ID
X-EBAY-API-APP-NAME: app ID
X-EBAY-API-CERT-NAME: cert ID
X-EBAY-API-CALL-NAME: PlaceOffer
X-EBAY-API-COMPATIBILITY-LEVEL: version
X-EBAY-API-SITEID: site ID
</code></pre>
<p>See <a href="http://tinyurl.com/76q6e7q">developer docs</a> for more details of these headers.</p>
<p>The body of the request is an XML document. An example is below.</p>
<pre><code>&lt;PlaceOfferRequest xmlns="urn:ebay:apis:eBLBaseComponents"&gt;
&lt;ErrorLanguage&gt;en_US&lt;/ErrorLanguage&gt;
&lt;EndUserIP&gt;192.168.255.255&lt;/EndUserIP&gt;
&lt;ItemID&gt;110096039601&lt;/ItemID&gt;
&lt;Offer&gt;
&lt;Action&gt;Bid&lt;/Action&gt;
&lt;MaxBid currencyID="USD"&gt;20.00&lt;/MaxBid&gt;
&lt;Quantity&gt;1&lt;/Quantity&gt;
&lt;/Offer&gt;
&lt;RequesterCredentials&gt;
&lt;eBayAuthToken&gt;ABC...123&lt;/eBayAuthToken&gt;
&lt;/RequesterCredentials&gt;
&lt;WarningLevel&gt;High&lt;/WarningLevel&gt;
&lt;/PlaceOfferRequest&gt;
</code></pre>
<p>A response from this API looks like the following:</p>
<pre><code>&lt;PlaceOfferResponse xmlns="urn:ebay:apis:eBLBaseComponents"&gt;
&lt;Timestamp&gt;2012-02-03T18:06:51.230Z&lt;/Timestamp&gt;
&lt;Ack&gt;Success&lt;/Ack&gt;
&lt;Version&gt;757&lt;/Version&gt;
&lt;Build&gt;E757_CORE_BUNDLED_14364711_R1&lt;/Build&gt;
&lt;UsageData&gt;MTMyOTUyMTQ2LzE1MzczOw**&lt;/UsageData&gt;
&lt;SellingStatus&gt;
&lt;ConvertedCurrentPrice currencyID="USD"&gt;1.0&lt;/ConvertedCurrentPrice&gt;
&lt;CurrentPrice currencyID="USD"&gt;1.0&lt;/CurrentPrice&gt;
&lt;HighBidder&gt;
&lt;UserID&gt;testuser_bountifulbuyer&lt;/UserID&gt;
&lt;/HighBidder&gt;
&lt;MinimumToBid currencyID="USD"&gt;1.25&lt;/MinimumToBid&gt;
&lt;/SellingStatus&gt;
&lt;/PlaceOfferResponse&gt;
</code></pre>
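<p>For comparison, consuming such a response by hand means parsing namespaced XML yourself. The Python sketch below shows one way to flatten it into nested dictionaries. The helper names are hypothetical, attributes such as <code>currencyID</code> are dropped for brevity, and repeated sibling elements would overwrite each other.</p>

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def element_to_dict(element):
    """Convert an element to nested dicts; leaf elements become their text."""
    children = list(element)
    if not children:
        return element.text
    # Repeated sibling names would overwrite each other in this simple sketch.
    return {local_name(child.tag): element_to_dict(child) for child in children}


def parse_response(xml_text):
    """Parse an XML document into a dict keyed by the root element name."""
    root = ET.fromstring(xml_text)
    return {local_name(root.tag): element_to_dict(root)}
```

<p>Applied to the response above, <code>parse_response</code> gives a dictionary in which <code>Ack</code> and the fields under <code>SellingStatus</code> can be read with plain key lookups.</p>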
<p>Here is how you can use ql.io to simplify this.</p>
<h3>Step 0: Create an App</h3>
<p>Create a ql.io app.</p>
<pre><code>mkdir myapp
cd myapp
curl https://raw.github.com/ql-io/ql.io/master/modules/template/init.sh | bash
bin/start.sh
</code></pre>
<p>See <a href="http://ql.io/docs">docs</a> for more details.</p>
<h3>Step 1: Create a Table</h3>
<p>Place the following in <code>tables/placeoffer.ql</code>.</p>
<script src="https://gist.github.com/1886983.js?file=gistfile1.sql"></script>
<p>This step binds the API into the ql.io runtime so that you can use ql.io&#8217;s DSL to send requests
and process responses.</p>
<h3>Step 2: Describe the Shape of the Request Body</h3>
<p>Place the following in <code>tables/placeoffer.xml.mu</code>.</p>
<script src="https://gist.github.com/1886988.js?file=gistfile1.xml"></script>
<p>This is just a mustache template. You can use <a href="http://embeddedjs.com/">EJS</a> too if you like.</p>
<h3>Step 3: Create a Route</h3>
<p>Place the following in <code>routes/placeoffer.ql</code>.</p>
<script src="https://gist.github.com/1886996.js?file=gistfile1.sql"></script>
<h3>Step 4: Use the API</h3>
<pre><code>POST /offers?siteId=0&amp;itemId=your-item-id&amp;offer=your-offer&amp;action=your-action&amp;quantity=your-quantity
Host: your ql.io app host
Authorization: your auth token
</code></pre>
<p>This request returns JSON.</p>
<h3>Step 5: Enjoy</h3>
<p>No XML, no schemas, no SDKs. As an added benefit, you can combine this API with other APIs as you
like using ql.io&#8217;s DSL.</p>
<p>Why does this matter? If you have a legacy API that you can not afford to rewrite to use HTTP
sanely, ql.io can help you hide it behind a saner interface.</p>
<p>Thanks to <a href="https://github.com/jmrodriguez">Juan Rodriguez</a> for showing me this example.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[ql.io 0.4.0]]></title>
<link href="http://ql-io.github.com/2012/02/13/v0.4.html"/>
<updated>2012-02-13T00:00:00-08:00</updated>
<id>http://ql-io.github.com/2012/02/13/v0.4</id>
<content type="html"><![CDATA[<p>Version 0.4 of ql.io is out on npm today. Here is a quick summary of changes.</p>
<ul>
<li>Use native cluster module to start the app</li>
<li>Upgrade all dependencies to the latest</li>
<li>Limit response size to 10000000 bytes from upstream sources. You can change this with
<code>maxResponseLength</code> in the config.</li>
<li>Limit outgoing requests per statement to 50. You can change this with <code>maxRequests</code> in the config.</li>
<li>Chain events for logging done with log-emitter.</li>
<li>Add a new JSON based interface to browse tables and routes. Try <code>/routes</code> to start browsing.</li>
<li>Integrate <a href="https://github.com/s3u/har-view">har-view</a></li>
</ul>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Slides from NodePDX]]></title>
<link href="http://ql-io.github.com/2012/02/12/nodepdx-slides.html"/>
<updated>2012-02-12T00:00:00-08:00</updated>
<id>http://ql-io.github.com/2012/02/12/nodepdx-slides</id>
<content type="html"><![CDATA[<p>Here are the slides from my talk/demo on ql.io at <a href="http://nodepdx.github.com/">NodePDX</a>.</p>
<div style="width:425px" id="__ss_11530669"> <strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/sallamar/qlio-at-nodepdx" title="ql.io at NodePDX" target="_blank">ql.io at NodePDX</a></strong> <iframe src="http://www.slideshare.net/slideshow/embed_code/11530669" width="425" height="355" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe> <div style="padding:5px 0 12px"> View more <a href="http://www.slideshare.net/thecroaker/death-by-powerpoint" target="_blank">PowerPoint</a> from <a href="http://www.slideshare.net/sallamar" target="_blank">Subbu Allamaraju</a> </div> </div>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Slides from Node Summit]]></title>
<link href="http://ql-io.github.com/2012/01/24/node-summit-slides.html"/>
<updated>2012-01-24T00:00:00-08:00</updated>
<id>http://ql-io.github.com/2012/01/24/node-summit-slides</id>
<content type="html"><![CDATA[<p>Here are the slides from the workshop on ql.io at the Node Summit titled
&#8220;ql.io: Consuming HTTP at Scale&#8221;.</p>
<div style="width:425px" id="__ss_11244206"> <strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/sallamar/qlio-consuming-http-at-scale" title="ql.io: Consuming HTTP at Scale " target="_blank">ql.io: Consuming HTTP at Scale </a></strong> <iframe src="http://www.slideshare.net/slideshow/embed_code/11244206" width="425" height="355" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe> <div style="padding:5px 0 12px"> View more <a href="http://www.slideshare.net/thecroaker/death-by-powerpoint" target="_blank">PowerPoint</a> from <a href="http://www.slideshare.net/sallamar" target="_blank">Subbu Allamaraju</a> </div> </div>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[ql.io 0.4.0-beta]]></title>
<link href="http://ql-io.github.com/2012/01/17/v0.4-beta.html"/>
<updated>2012-01-17T00:00:00-08:00</updated>
<id>http://ql-io.github.com/2012/01/17/v0.4-beta</id>
<content type="html"><![CDATA[<p>This is a beta release of ql.io on node.js 0.6.x.</p>
<ul>
<li>Use native cluster module to start the app</li>
<li>Upgrade all dependencies to the latest</li>
<li>Limit response size to 10000000 bytes from upstream sources. You can change this with
<code>maxResponseLength</code> in the config.</li>
<li>Limit outgoing requests per statement to 50. You can change this with <code>maxRequests</code> in the config.</li>
<li>Chain events for logging done with log-emitter.</li>
<li>Add a new JSON based interface to browse tables and routes. Try <code>/routes</code> to start browsing.</li>
</ul>
<p>If you are interested in running ql.io on node.js 0.4.x, use the
<a href="https://github.com/ql-io/ql.io/tree/0.3">0.3 branch</a>.</p>
<p>To create an app using ql.io 0.4 modules, follow the usual steps:</p>
<pre><code> mkdir myapp
cd myapp
curl https://raw.github.com/ql-io/ql.io/master/modules/template/init.sh | bash
bin/start.sh
</code></pre>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[ql.io Baseline Benchmarks]]></title>
<link href="http://ql-io.github.com/2012/01/09/benchmarks.html"/>
<updated>2012-01-09T00:00:00-08:00</updated>
<id>http://ql-io.github.com/2012/01/09/benchmarks</id>
<content type="html"><![CDATA[<p>One of the key visible benefits of <a href="http://ql.io">ql.io</a> is that it eliminates the code noise that
is common in writing HTTP client apps. As a DSL for writing HTTP client code, it focuses on
automating the task of making multiple HTTP requests and processing responses in the best order
possible, taking care of parallelization, orchestration, projections and normalizations behind the
scenes.</p>
<p>In this post, I would like to present the baseline performance benchmarks of ql.io running on
node.js 0.4.12. Though I have done some ad hoc tests in the last 2-3 months for hardware sizing
purposes, this is my first systematic attempt.</p>
<!-- more -->
<p>Since this post is long, here is a quick summary for the &#8220;TL;DR&#8221;.</p>
<ul>
<li>For simple scripts, such as a query using a <code>select</code> statement to get data from an HTTP API, ql.io
can handle 2400+ requests/sec at concurrency levels ranging from 100 to 500.</li>
<li>For scripts involving dependencies between statements (such as statement B needing input from
the results of statement A), throughput drops almost linearly and proportionately. For instance, for
scenario B below, ql.io can handle nearly 1000 requests/sec.</li>
<li>The conventional wisdom of using <code>n</code> worker processes, where <code>n</code> is the number of CPU threads,
provides a reasonable default for all practical purposes, but tuning the number of worker processes
is a good exercise to do. All the test scenarios below yielded better numbers with <code>5*n</code>
workers. Scenario D, which involves a non-trivial amount of CPU-bound work, benefited the most
from the increased number of worker processes.</li>
</ul>
<p>You can find the raw output files of test runs on <a href="https://github.com/ql-io/ql.io-perf">github</a>.
The application used for these tests is on <a href="https://github.com/ql-io/ql.io-site">github</a>. All the
ql.io modules used by the app are on npmjs.org.</p>
<h2>Test Environment</h2>
<p>The test environment consists of the following machines, each running Ubuntu and sitting under my desk at
work.</p>
<ul>
<li>An Intel Xeon E5645 workstation with 6 cores (12 CPU threads) and 24GB RAM running the
<a href="https://github.com/ql-io/ql.io-site">ql.io-site</a> app on node.js 0.4.12 with 12 worker processes.</li>
<li>An Intel Xeon E5507 workstation with 4 cores (8 CPU threads) with 12GB RAM running apachebench.</li>
<li>An Intel Xeon E5630 workstation with 4 cores (8 CPU threads) with 24GB RAM running Apache
Traffic Server (ATS) 3.0.1 as a forward proxy for all outgoing HTTP requests. The cache is primed
before running benchmarks to avoid making requests to any other machines.</li>
</ul>
<p>All these are running Ubuntu 11.04.</p>
<h2>Test Scenarios</h2>
<p>These tests cover a range of aggregation and orchestration scenarios possible with ql.io and show
how ql.io behaves under varying loads.</p>
<h3>Scenario A</h3>
<pre class="brush: sql toolbar: false;">
select * from twitter.search where q = "ql.io"
</pre>
<p>This scenario involves sending an HTTP GET request to <code>http://search.twitter.com/search.json</code>,
parsing the JSON response, and writing it back to the client&#8217;s response.</p>
<h3>Scenario B</h3>
<pre class="brush: sql toolbar: false;">
select id as id, from_user_name as user_name, text as text from twitter.search where q = "ql.io";
</pre>
<p>This scenario is similar to scenario A, except that it also does the following:</p>
<ul>
<li>Extract the <code>results</code> array from the response and extract <code>id</code>, <code>from_user_name</code>, and <code>text</code> for
each result.</li>
<li>Assemble the projected fields into an object.</li>
<li>Write all the objects as an array into the client&#8217;s response.</li>
</ul>
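<p>The projection described above can be sketched in a few lines of Python. This is an illustration under assumptions, not ql.io&#8217;s code: the <code>project</code> function and the <code>SCENARIO_B_COLUMNS</code> list are hypothetical names, and the response is assumed to be an already parsed JSON object with a <code>results</code> array.</p>

```python
# (source field, alias) pairs mirroring the select clause of scenario B.
SCENARIO_B_COLUMNS = [
    ("id", "id"),
    ("from_user_name", "user_name"),
    ("text", "text"),
]


def project(response, columns):
    """Pick the named fields from each entry of the response's results array."""
    rows = []
    for result in response["results"]:
        # Build one object per result, keyed by the alias of each column.
        rows.append({alias: result.get(field) for field, alias in columns})
    return rows
```

<p>Fields that are not listed in the column pairs are simply left out of each projected object.</p>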
<h3>Scenario C</h3>
<pre class="brush: sql toolbar: false;">
select ItemID, ViewItemURLForNaturalSearch, Location from details where itemId in
(select itemId from finditems where keywords='mini cooper');
</pre>
<p>This scenario involves finding IDs of items from one API and sending those IDs to another API to
get details as follows:</p>
<ul>
<li>Send an HTTP request to <code>http://svcs.ebay.com/services/search/FindingService/</code>, parse the JSON
response, and extract the array of items by selecting the
<code>findItemsByKeywordsResponse.searchResult.item</code> field of the response.</li>
<li>For each item in the array, project the item&#8217;s ID. Collect the IDs into an array.</li>
<li>Then send an HTTP request to <code>http://open.api.ebay.com/shopping</code> with all the item IDs.</li>
<li>Parse the JSON response, select the <code>Item</code> array from the response, and project each <code>Item</code> to
extract <code>ItemID</code>, <code>ViewItemURLForNaturalSearch</code>, and <code>Location</code> fields. Assemble the projected
fields into an array.</li>
<li>Write all the arrays to the client&#8217;s response as an array of arrays.</li>
</ul>
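<p>The steps above form a two-stage pipeline in which the second request depends on the first. A minimal Python sketch follows. The two fetch functions are passed in as parameters and stand in for the HTTP calls, all function names are hypothetical, and the response shapes are simplified assumptions based on the field paths named above.</p>

```python
DETAIL_FIELDS = ("ItemID", "ViewItemURLForNaturalSearch", "Location")


def find_item_ids(search_response):
    """Select the item array from the search response and project each item's ID."""
    items = search_response["findItemsByKeywordsResponse"]["searchResult"]["item"]
    return [item["itemId"] for item in items]


def project_details(details_response):
    """Project each Item to an array of the selected fields."""
    return [
        [item.get(field) for field in DETAIL_FIELDS]
        for item in details_response["Item"]
    ]


def run_scenario_c(fetch_search, fetch_details, keywords):
    """Run the inner select first, then feed its IDs to the outer select."""
    item_ids = find_item_ids(fetch_search(keywords))
    # The second request cannot start until the first one has completed.
    return project_details(fetch_details(item_ids))
```

<p>Because <code>fetch_details</code> needs the IDs returned by <code>fetch_search</code>, the two calls cannot overlap, which is why this scenario pays for two round trips in sequence.</p>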
<h3>Scenario D</h3>
<pre class="brush: sql toolbar: false;">
prodid = select ProductID[0].Value from eBay.FindProducts where
QueryKeywords = 'macbook pro';
details = select * from eBay.ProductDetails where
ProductID in ('{prodid}') and ProductType = 'Reference';
reviews = select * from eBay.ProductReviews where
ProductID in ('{prodid}') and ProductType = 'Reference';
return select d.ProductID[0].Value as id, d.Title as title,
d.ReviewCount as reviewCount, r.ReviewDetails.AverageRating as rating
from details as d, reviews as r
where d.ProductID[0].Value = r.ProductID.Value
via route '/myapi' using method get;
</pre>
<p>The implementation details for this script are a bit more involved, but at a high level, here is
what happens under the hood:</p>
<ul>
<li>Find the script when the client submits a request to the script through a route <code>/myapi</code>.</li>
<li>Send an HTTP request to <code>http://open.api.ebay.com/shopping?callname=FindProducts</code> with a keyword
and extract product IDs from the response.</li>
<li>Send <code>5</code> HTTP requests to <code>http://open.api.ebay.com/shopping?callname=FindProducts</code> with the
product IDs found and extract the details.</li>
<li>Send <code>5</code> HTTP requests to <code>http://open.api.ebay.com/shopping?callname=FindReviewsAndGuides</code> with
the product IDs found and extract the reviews.</li>
<li>Once the <code>10</code> requests complete, join details and reviews by matching responses by IDs, and
extract the selected fields into an object.</li>
<li>Return an array of objects with each object containing the selected fields.</li>
</ul>
<p>This script covers most of the code paths of ql.io. See <a href="http://ql.io/docs/build-an-app">Build an
App</a> for a step by step description of this scenario.</p>
<h3>Differences Between Scenarios</h3>
<ul>
<li>Both scenario A and B are mostly IO bound.</li>
<li>Scenario C is also mostly IO bound, but it makes two HTTP requests in sequence as the outer
<code>select</code> depends on the results of the <code>inner</code> select. The second request is made after the first
one completes.</li>
<li>Scenario D involves making <code>11</code> HTTP requests, parsing and projecting response fields, and joining
members of responses of the second and third statements. These responses are unsorted, and joining
them by a matching product ID takes O(n<sup>2</sup>) steps - in this case 25. Yes - this can be improved -
but <a href="http://calendar.perfplanet.com/2011/measure-twice-cut-once/">let&#8217;s measure twice before cutting
once</a>.</li>
</ul>
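<p>To see where the figure of 25 comes from, here is a small Python sketch of a nested-loop join over two unsorted lists, next to an indexed variant that avoids the quadratic scan. It illustrates the cost argument only and is not ql.io&#8217;s join code; the record fields <code>id</code>, <code>title</code> and <code>rating</code> and both function names are hypothetical.</p>

```python
def nested_loop_join(details, reviews):
    """Join by comparing every detail with every review; count the comparisons."""
    joined = []
    comparisons = 0
    for detail in details:
        for review in reviews:
            comparisons += 1
            if detail["id"] == review["id"]:
                joined.append({
                    "id": detail["id"],
                    "title": detail["title"],
                    "rating": review["rating"],
                })
    return joined, comparisons


def indexed_join(details, reviews):
    """Same join, but look reviews up in a dict built in one pass."""
    reviews_by_id = {review["id"]: review for review in reviews}
    joined = []
    for detail in details:
        review = reviews_by_id.get(detail["id"])
        if review is not None:
            joined.append({
                "id": detail["id"],
                "title": detail["title"],
                "rating": review["rating"],
            })
    return joined
```

<p>With five details and five reviews, the comparison counter of the nested-loop version ends at 5 x 5 = 25, while the indexed version touches each record once.</p>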
<h2>Test Settings</h2>
<p>All tests are done using <code>ab -k</code> to maintain persistent connections from the client to the server.</p>
<p>The ql.io app is run with <code>12</code> node.js worker processes managed by
<a href="http://learnboost.github.com/cluster/">cluster</a>.</p>
<h2>First Round Results</h2>
<h3>Throughput</h3>
<p>Here are the throughput results for concurrency ranging from 100 to 500.</p>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/static/modules/gviz/1.0/chart.js"> {"dataSourceUrl":"//docs.google.com/a/subbu.org/spreadsheet/tq?key=0ApntcBgpHeZldFREOFlGeXVWdzZZc3lNSld1aWZqdVE&transpose=0&headers=1&range=A3%3AE8&gid=0&pub=1","options":{"reverseCategories":false,"curveType":"","titleX":"Concurrency","backgroundColor":"#FFFFFF","pointSize":0,"width":600,"lineWidth":2,"logScale":false,"hAxis":{"maxAlternations":1},"hasLabelsColumn":true,"vAxes":[{"title":"Requests/sec","minValue":null,"viewWindowMode":"pretty","viewWindow":{"min":null,"max":null},"maxValue":null},{"viewWindowMode":"pretty","viewWindow":{}}],"title":"Throughput (12 workers)","height":371,"interpolateNulls":false,"legend":"right","reverseAxis":false},"state":{},"view":"{\"columns\":[0,1,2,3,4]}","chartType":"LineChart","chartName":"Throughput"} </script>
<h3>Mean Response Times</h3>
<p>The corresponding chart showing the mean response time for the same range of concurrency is below.</p>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/static/modules/gviz/1.0/chart.js"> {"dataSourceUrl":"//docs.google.com/a/subbu.org/spreadsheet/tq?key=0ApntcBgpHeZldFREOFlGeXVWdzZZc3lNSld1aWZqdVE&transpose=0&headers=1&range=A27%3AE32&gid=0&pub=1","options":{"reverseCategories":false,"curveType":"","titleX":"Concurrency","pointSize":0,"backgroundColor":"#FFFFFF","lineWidth":2,"logScale":false,"hAxis":{"maxAlternations":1},"hasLabelsColumn":true,"vAxes":[{"title":"Time for 80% requests to complete","minValue":null,"viewWindowMode":"pretty","viewWindow":{"min":null,"max":null},"maxValue":null},{"viewWindowMode":"pretty","viewWindow":{}}],"title":"Mean response time","interpolateNulls":false,"legend":"right","reverseAxis":false,"width":600,"height":371},"state":{},"view":"{\"columns\":[0,1,2,3,4]}","chartType":"LineChart","chartName":"Chart 2"} </script>
<h2>Effect of Number of Workers</h2>
<p>In these tests, scenario D fared badly as it includes a mixture of IO and CPU workloads. The CPU
workload is not predominant but is not insignificant either. Here is a chart of the CPU data
captured using <code>dstat</code> at a concurrency level of 200 for scenario D.</p>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/static/modules/gviz/1.0/chart.js"> {"dataSourceUrl":"//docs.google.com/a/subbu.org/spreadsheet/tq?key=0ApntcBgpHeZldFREOFlGeXVWdzZZc3lNSld1aWZqdVE&transpose=0&headers=1&range=A7%3AG443&gid=5&pub=1","options":{"vAxes":[{"viewWindowMode":"pretty","viewWindow":{}},{"viewWindowMode":"pretty","viewWindow":{}}],"displayAnnotations":true,"height":371,"width":709,"displayRangeSelector":true,"displayZoomButtons":true,"hAxis":{"maxAlternations":1},"hasLabelsColumn":true,"wmode":"opaque"},"state":{},"view":"{\"columns\":[0,1,2,3,4,5,6]}","chartType":"AnnotatedTimeLine","chartName":"CPU Load for Scenario A with 12 workers"} </script>
<p>This confirms that there is a fair bit of CPU-bound work going on. How does the number of
workers influence such a scenario? I repeated the tests varying the number of worker processes.</p>
<p>The chart below shows the number of requests per second for Scenario D as I changed the number of
workers from 12 to 96 in increments of 12. All the test runs were done at a concurrency level of
100.</p>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/static/modules/gviz/1.0/chart.js"> {"dataSourceUrl":"//docs.google.com/a/subbu.org/spreadsheet/tq?key=0ApntcBgpHeZldFREOFlGeXVWdzZZc3lNSld1aWZqdVE&transpose=0&headers=1&range=A4%3AB12&gid=2&pub=1","options":{"reverseCategories":false,"curveType":"","titleX":"Number of workers","pointSize":0,"backgroundColor":"#FFFFFF","lineWidth":2,"logScale":false,"hasLabelsColumn":true,"hAxis":{"maxAlternations":1},"vAxes":[{"title":"Req/sec","minValue":null,"viewWindowMode":"pretty","viewWindow":{"min":null,"max":null},"maxValue":null},{"viewWindowMode":"pretty","viewWindow":{}}],"title":"","interpolateNulls":false,"legend":"right","reverseAxis":false,"width":600,"height":371},"state":{},"view":"{\"columns\":[0,1]}","chartType":"LineChart","chartName":"Chart 3"} </script>
<p>The number of requests per sec increases from 192 to 384 as I increased the number of workers from
12 to 96. The improvement is less significant after 60 workers.</p>
<p>Here is a chart for the mean response time, which shows a similar improvement.</p>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/static/modules/gviz/1.0/chart.js"> {"dataSourceUrl":"//docs.google.com/a/subbu.org/spreadsheet/tq?key=0ApntcBgpHeZldFREOFlGeXVWdzZZc3lNSld1aWZqdVE&transpose=0&headers=1&range=D4%3AE12&gid=2&pub=1","options":{"series":{"0":{"color":"#6aa84f"}},"reverseCategories":false,"curveType":"","titleX":"Number of workers","pointSize":0,"backgroundColor":"#FFFFFF","lineWidth":2,"logScale":false,"hAxis":{"maxAlternations":1},"hasLabelsColumn":true,"vAxes":[{"title":"Mean response time (msec)","minValue":null,"viewWindowMode":"pretty","viewWindow":{"min":null,"max":null},"maxValue":null},{"viewWindowMode":"pretty","viewWindow":{}}],"title":"","interpolateNulls":false,"legend":"none","reverseAxis":false,"width":600,"height":371},"state":{},"view":"{\"columns\":[0,1]}","chartType":"LineChart","chartName":"Chart 4"} </script>
<p>The flatness of these charts with increased worker count can easily be explained by looking at the
CPU again. The chart below shows the CPU data at a concurrency level of 200 for scenario D with a
worker count of 96.</p>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/static/modules/gviz/1.0/chart.js"> {"dataSourceUrl":"//docs.google.com/a/subbu.org/spreadsheet/tq?key=0ApntcBgpHeZldFREOFlGeXVWdzZZc3lNSld1aWZqdVE&transpose=0&headers=1&range=A7%3AG288&gid=6&pub=1","options":{"displayAnnotations":true,"vAxes":[{"viewWindowMode":"pretty","viewWindow":{}},{"viewWindowMode":"pretty","viewWindow":{}}],"wmode":"opaque","hasLabelsColumn":true,"hAxis":{"maxAlternations":1},"width":742,"height":371},"state":{},"view":"{\"columns\":[0,1,2,3,4,5,6]}","chartType":"AnnotatedTimeLine","chartName":"Chart 9"} </script>
<p>The chart below shows the effect of increasing the worker count from 12 to 96 across all test
scenarios.</p>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/static/modules/gviz/1.0/chart.js"> {"dataSourceUrl":"//docs.google.com/a/subbu.org/spreadsheet/tq?key=0ApntcBgpHeZldFREOFlGeXVWdzZZc3lNSld1aWZqdVE&transpose=0&headers=1&range=A2%3AI7&gid=1&pub=1","options":{"vAxes":[{"title":"Requests/sec","minValue":null,"viewWindowMode":"pretty","viewWindow":{"min":null,"max":null},"maxValue":null},{"viewWindowMode":"pretty","viewWindow":{}}],"reverseCategories":false,"title":"","titleX":"Concurrency","backgroundColor":"#FFFFFF","legend":"right","logScale":false,"reverseAxis":false,"hasLabelsColumn":true,"hAxis":{"maxAlternations":1},"isStacked":false,"width":1074,"height":340},"state":{},"view":"{\"columns\":[0,1,2,3,4,5,6,7,8]}","chartType":"ColumnChart","chartName":"Chart 5"} </script>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/static/modules/gviz/1.0/chart.js"> {"dataSourceUrl":"//docs.google.com/a/subbu.org/spreadsheet/tq?key=0ApntcBgpHeZldFREOFlGeXVWdzZZc3lNSld1aWZqdVE&transpose=0&headers=1&range=A34%3AI39&gid=1&pub=1","options":{"vAxes":[{"title":"Mean response time (msec)","minValue":null,"viewWindowMode":"pretty","viewWindow":{"min":null,"max":null},"maxValue":null},{"viewWindowMode":"pretty","viewWindow":{}}],"reverseCategories":false,"title":"","titleX":"Concurrency","backgroundColor":"#FFFFFF","legend":"right","logScale":false,"reverseAxis":false,"hAxis":{"maxAlternations":1},"hasLabelsColumn":false,"isStacked":false,"width":972,"height":402},"state":{},"view":"{\"columns\":[0,1,2,3,4,5,6,7,8]}","chartType":"ColumnChart","chartName":"Chart 6"} </script>
<h2>What About Memory</h2>
<p>Below is a chart of the memory usage with 96 workers for scenario D at a concurrency level of 200.</p>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/static/modules/gviz/1.0/chart.js"> {"dataSourceUrl":"//docs.google.com/a/subbu.org/spreadsheet/tq?key=0ApntcBgpHeZldFREOFlGeXVWdzZZc3lNSld1aWZqdVE&transpose=0&headers=1&range=A299%3AE581&gid=6&pub=1","options":{"displayAnnotations":true,"vAxes":[{"viewWindowMode":"pretty","viewWindow":{}},{"viewWindowMode":"pretty","viewWindow":{}}],"wmode":"opaque","hasLabelsColumn":true,"hAxis":{"maxAlternations":1},"width":708,"height":371},"state":{},"view":"{\"columns\":[0,1,2,3,4]}","chartType":"AnnotatedTimeLine","chartName":"Chart 12"} </script>
<p>The lines remained nearly flat for the duration of the test.</p>
<h2>Summary</h2>
<p>The goal of this exercise is to set a baseline for future work. The scenarios I used show a range of
scripts that cover most of the current capabilities of ql.io.</p>
<p>Here are a few key takeaways:</p>
<ul>
<li>ql.io is designed for IO-bound workloads. However, data aggregation and orchestration often
involve some CPU-bound work such as projections and joins. This is unavoidable. I suspect that
the same is the case with many typical uses of node.js.</li>
<li>On commodity hardware with a commodity network layer, my tests show that ql.io can do 400-2400
requests/sec depending on the nature of the work involved. Your mileage may vary.</li>
<li>Using as many workers as there are CPU threads available is a good starting point, but tuning
the number based on the characteristics of the app may yield better results.</li>
</ul>
<p>We&#8217;re currently working on upgrading ql.io to node.js 0.6.x. See the <a href="https://github.com/ql-io/ql.io/tree/0.4">0.4 branch on
github</a>. Watch out for a repeat of these tests on node.js
0.6.x.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[ql.io on Joyent's no.de]]></title>
<link href="http://ql-io.github.com/2011/12/17/no.de.html"/>
<updated>2011-12-17T00:00:00-08:00</updated>
<id>http://ql-io.github.com/2011/12/17/no.de</id>
<content type="html"><![CDATA[<p>If you&#8217;re interested in setting up a ql.io app instance on Joyent&#8217;s <a href="https://no.de/">no.de</a>, here is
how.</p>
<ul>
<li>Clone the template app</li>
</ul>
<pre class="brush: bash toolbar: false;">
git clone git@github.com:ql-io/ql-io.no.de.git
</pre>
<p>This template includes a server.js and some sample ql.io scripts.</p>
<ul>
<li><p>Create a SmartMachine instance on no.de.</p></li>
<li><p>Push the app to no.de</p></li>
</ul>
<pre class="brush: bash toolbar: false;">
# Assuming your smart machine name is "foo"
cd ql-io.no.de
git remote add foo.no.de foo.no.de:repo
git push foo.no.de master
</pre>
<p>This will push the app to your SmartMachine, and bring it up.</p>
<ul>
<li>Try a sample route.</li>
</ul>
<pre class="brush: bash toolbar: false;">
curl http://foo.no.de/myapi
</pre>
<p>Use <code>http://foo.no.de/console</code> to view the console.</p>
<p>Visit <a href="http://ql.no.de/myapi">ql.no.de/myapi</a> to see this example in action.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Milestone 6]]></title>
<link href="http://ql-io.github.com/2011/12/16/milestone.html"/>
<updated>2011-12-16T00:00:00-08:00</updated>
<id>http://ql-io.github.com/2011/12/16/milestone</id>
<content type="html"><![CDATA[<ul>
<li><p>Clients can occasionally get socket hangup errors when origin servers close connections without
sending a <code>Connection: close</code> header. See
<a href="https://github.com/joyent/node/issues/1135">https://github.com/joyent/node/issues/1135</a> for some
background. To avoid such errors, http.request.js now automatically retries the request once
provided the statement that caused the HTTP request is a <code>select</code>.</p></li>
<li><p>The engine can now consume CSV responses in addition to XML and JSON.</p></li>
<li><p>Fixed request body processing for routes (see <a href="https://github.com/ql-io/ql.io/pull/161">issue 161</a>).</p></li>
</ul>
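<p>The retry rule from the first item can be sketched as follows in Python. This is only an illustration of the policy (retry once, and only for <code>select</code> statements), not the code in http.request.js; the function name <code>send_with_retry</code> and its parameters are hypothetical.</p>

```python
def send_with_retry(send, statement_type):
    """Call send(); on a connection error, retry once if the statement is a select."""
    try:
        return send()
    except ConnectionError:
        # Only select statements are retried, since repeating them is safe.
        if statement_type != "select":
            raise
        # A second failure propagates to the caller.
        return send()
```

<p>Restricting the retry to <code>select</code> keeps statements such as <code>insert</code> from being sent upstream twice.</p>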
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Milestone 5]]></title>
<link href="http://ql-io.github.com/2011/12/08/milestone.html"/>
<updated>2011-12-08T00:00:00-08:00</updated>
<id>http://ql-io.github.com/2011/12/08/milestone</id>
<content type="html"><![CDATA[<ul>
<li><p><a href="https://github.com/ql-io/ql.io/issues/121">OAuth example</a> - OAuth2 is trivial as ql.io proxies
headers from clients to servers. OAuth1 requires glue code to compute the Authorization header.
See <a href="http://ql.io/docs/oauth">http://ql.io/docs/oauth</a> for an example.</p></li>
<li><p>Use npm installed modules for ql.io-site (<a href="https://github.com/ql-io/ql.io/issues/116">see issue 116</a>).</p></li>
<li><p>Handle empty response bodies gracefully (<a href="https://github.com/ql-io/ql.io/issues/98">see issue 98</a>).</p></li>
<li><p>Recover from partial failures in case of scatter-gather calls
(<a href="https://github.com/ql-io/ql.io/issues/90">see issue 90</a>) - some statements can result in multiple
HTTP requests. When this happens, the engine used to fail the entire statement if any of those
requests failed. The engine now looks for successful responses and aggregates them.</p></li>
<li><p>Update CodeMirror to support line-wrapping (<a href="https://github.com/ql-io/ql.io/issues/11">See issue
11</a>) - no need to split lines manually anymore.</p></li>
</ul>
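<p>The partial-failure handling for scatter-gather calls can be pictured with the Python sketch below. It is a simplified assumption of the behavior, not the engine&#8217;s code: the function name <code>aggregate</code> is hypothetical, each response is modeled as a status code plus a list of rows, and raising an error when every request fails is a choice made for this sketch.</p>

```python
def aggregate(responses):
    """Merge the rows of all 2xx responses; fail only if none succeeded."""
    successes = [rows for status, rows in responses if status // 100 == 2]
    if not successes:
        raise RuntimeError("all upstream requests failed")
    merged = []
    for rows in successes:
        merged.extend(rows)
    return merged
```

<p>One failed request out of several no longer discards the rows that the other requests returned.</p>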
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[ql.io Launch]]></title>
<link href="http://ql-io.github.com/2011/11/29/launch.html"/>
<updated>2011-11-29T00:00:00-08:00</updated>
<id>http://ql-io.github.com/2011/11/29/launch</id>
<content type="html"><![CDATA[<p>See <a href="http://www.ebaytechblog.com/2011/11/30/announcing-ql-io/">Announcing ql.io</a>.</p>
]]></content>
</entry>
</feed>