Document batch configuration options in run #248
@mlucy I believe you left it undocumented for a reason. Is the code foolproof enough? Does it error correctly if zero or negative values are passed?
I honestly have no idea. It wasn't originally written for external use. We should write some tests for error cases before we document it for people.
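The error cases mentioned above could be exercised with something like the following sketch. This is a hypothetical helper for illustration only, not RethinkDB's actual validation code; it just assumes non-positive values should be rejected:

```python
def validate_batch_conf(conf):
    """Reject obviously invalid batch settings.

    Hypothetical check, not RethinkDB's actual validation logic.
    """
    for key in ("max_els", "max_dur", "max_size"):
        if key in conf and conf[key] <= 0:
            raise ValueError("%s must be positive, got %r" % (key, conf[key]))
    return conf

# Valid settings pass through unchanged.
assert validate_batch_conf({"max_els": 5, "max_dur": 50 * 1000})["max_els"] == 5

# Zero or negative values should raise rather than silently misbehave.
try:
    validate_batch_conf({"max_els": 0})
    raised = False
except ValueError:
    raised = True
assert raised
```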
Does this batching option also control the internal batch size?
Yeah; it affects the intracluster batch sizes as well.
This has been fixed (see rethinkdb/rethinkdb#2185), so we should document it for 1.16.
The changes landed in 1.15. |
Yep, we missed this. /cc @chipotle
What are the defaults for the various options?
/cc @mlucy
@neumino pointed me at the source code with the defaults. One other question: I'm assuming that in the JS driver these "sub-optargs" are not converted from camelCase, so you would write the snake_case names rather than the camelCase ones. Is that right?
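For context, JS drivers typically convert top-level optarg names from camelCase to snake_case before sending them over the wire; the open question above is whether keys nested inside a sub-optarg get the same treatment. A minimal sketch of that conversion, for illustration only (not the driver's actual code):

```python
import re

def camel_to_snake(name):
    """Convert a camelCase optarg name to its snake_case wire name."""
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()

# Top-level optargs are converted...
assert camel_to_snake("maxBatchRows") == "max_batch_rows"
# ...but if nested keys are NOT converted, a name like "max_els" must be
# written in snake_case already; it passes through unchanged.
assert camel_to_snake("max_els") == "max_els"
```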
I just looked at the code; there's currently a bug in the JS driver. We accept only
Err, sorry. I just read it. The available options seem to be here: I don't see nested options, so I'm a bit confused about what the new syntax is.
Do we need to create an issue to keep track of the JS driver bug?
I'm not sure if it's a bug in the JS driver, or the actual spec.
I've looked at `batching.cc` lines 173-199; I can see what effect changing
@chipotle -- it's perceived latency. What happens is, people type a query in the repl, run it, and measure how long it takes to get the first batch. They then treat it as an indication of RethinkDB performance. Of course what really happens is that there is a tradeoff between latency and throughput, but that's not generally how people think. So the scaledown factor gives people the first batch quickly to improve perceived latency in repl interactions, and then starts optimizing for throughput for future batches. This turns out to hit a great balance between perceived vs. real latency/throughput performance.
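The scaledown behavior described above can be modeled roughly as follows. The numbers and the function are hypothetical (not RethinkDB's actual defaults or code); the point is just that the batch-size floor is divided by the scaledown factor for the first batch only:

```python
def batch_row_target(min_els, scaledown_factor, batch_index):
    """Rows to aim for in a given batch (0-based index).

    Hypothetical model of the first-batch scaledown described above,
    not the server's actual sizing logic.
    """
    if batch_index == 0:
        # First batch is shrunk to cut perceived latency in a repl.
        return max(1, min_els // scaledown_factor)
    return min_els

# With illustrative values min_els=8 and a scaledown factor of 8,
# the first batch targets a single row; later batches target 8.
assert batch_row_target(8, 8, 0) == 1
assert batch_row_target(8, 8, 1) == 8
```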
It looks like the default values ensure that the initial batch will be one row --
Argh.
Code review 2181 open. (@AtnNn, I've set you as the reviewer for this one.) |
@wojons asked how to configure batching. @mlucy pointed out the following. We should document it properly.
> So, this isn't documented anywhere, but there's a `batch_conf` optarg which lets you set these things. If you write `query.run(batch_conf: {max_els: 5, max_dur: 50*1000})`, you'll get a batch back as soon as 5 rows are available or 50*1000 microseconds pass, whichever happens first. (You can also use `max_size` to configure a maximum serialized size in bytes, which is what we use internally for most batch sizing.)
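Based on that description, the cutoff rule can be sketched like this. It is a toy model of the stated semantics (`max_els` rows or `max_dur` microseconds, whichever comes first), not the server's actual `batching.cc` implementation:

```python
def split_into_batches(rows_with_times, max_els, max_dur):
    """Group (row, arrival_time_us) pairs into batches.

    A batch is emitted once it holds max_els rows or max_dur
    microseconds have passed since the batch started, whichever
    happens first. Sketch only, not RethinkDB's batching logic.
    """
    batches, current, start = [], [], None
    for row, t in rows_with_times:
        if start is None:
            start = t
        current.append(row)
        if len(current) >= max_els or t - start >= max_dur:
            batches.append(current)
            current, start = [], None
    if current:
        batches.append(current)
    return batches

# Seven rows arrive one millisecond apart: the row cap (max_els=5)
# triggers before the 50*1000 us timeout does.
rows = [(i, i * 1000) for i in range(7)]
batches = split_into_batches(rows, max_els=5, max_dur=50 * 1000)
assert batches == [[0, 1, 2, 3, 4], [5, 6]]
```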