run(stream=true) and custom batch size #2342

Open
wojons opened this issue May 2, 2014 · 20 comments

@wojons
Contributor

wojons commented May 2, 2014

Info

After talking with @danielmewes for a little bit, I learned how RethinkDB handles batches: there is an internal batch size which correlates closely with the batch size that is sent to the client. I know this problem is not as bad in the official drivers since, from what I was told, they request the next batch while processing the batch they just received.
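
(For context, this is roughly how a cursor is consumed with the official JavaScript driver; a minimal sketch, with 'metrics' and handleRow() standing in for the real table and per-document work. The prefetching described above happens inside the driver, not in this code.)

var r = require('rethinkdb');

r.connect({host: 'localhost', port: 28015}, function(err, conn) {
  if (err) throw err;
  r.table('metrics').run(conn, function(err, cursor) {
    if (err) throw err;
    // each() hands over one document at a time; behind the scenes the driver
    // is said to request the next batch while this callback is still working
    // through the current one
    cursor.each(function(err, row) {
      if (err) throw err;
      handleRow(row); // placeholder per-document handler
    });
  });
});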

Use case

In my use case I have 1-minute data points, meaning 1440 points a day, and the objects are all 5 KB or more, sometimes larger than 10 KB. When I query over a few hours of data and then start looking at a day, a week, or a month, I start to see weird behavior with the CPU: it wakes up, goes to sleep, wakes up, goes to sleep, and keeps doing that. At some point, shortly after the last time it wakes up, my data has finally all been displayed in the browser.

With really basic queries like r.table().filter() this is not too bad, since the time it takes to process each batch is very short. But when you are running large GMR (group-map-reduce) jobs that consist of lots of loops, operations, and so on (ask @danielmewes or @AtnNn about my GMR jobs), this becomes a really big problem.

Proposal

I would like a stream flag added to the protocol so the pipeline keeps flowing when there is a lot of work to be done between reading the data from the table and sending it to the client. This could be done at different levels: an internal buffer on the database side, where the server processes everything and still batches results to the client, or keeping batches running at each step of the pipeline and handing them to the client as they complete. Another important option is letting users specify a batch size: if you have production servers with some power behind them, a 1 MB batch is very, very small, and it should be configurable.
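
Something along these lines is what I have in mind. This is purely hypothetical syntax; neither stream nor batchSize exists as a run() option today:

r.table('metrics')
  .filter(r.row('ts').gt(someTime))   // 'metrics', 'ts', and someTime are placeholders
  .run(conn, {stream: true, batchSize: 8 * 1024 * 1024}, function(err, cursor) {
    // stream: true would keep batches flowing through the server-side pipeline
    // instead of waiting for the client to request each one; batchSize would
    // raise the ~1 MB default to something sized for beefier servers
  });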

@coffeemug I know your first thought is to put this into 2.x, but to me this is very important and relates closely to RethinkDB's performance from the user's point of view. The database may be very fast, but the batching undermines that impression.

@wojons
Contributor Author

wojons commented May 2, 2014

One more thing, to limit confusion: instead of sending just one document down the pipeline at a time, there should be concurrent batches moving through the pipeline. I am pretty sure a batch should block another batch from overtaking it, or at least from overtaking it before being sent to the user. What I mean by this is:

r.table().orderBy().filter().filter().filter()

Three or more batches can be in flight at once. If for some reason the first batch is very slow going through the filters, the other batches can proceed to the last stage and wait there until the first batch has been sent to the user; then the others can be sent.
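
A rough sketch of the idea in plain JavaScript (not RethinkDB internals): every batch runs through the stages concurrently, but delivery order to the user is preserved.

// batches: array of input batches; stages: array of functions batch -> batch
// (or promise of batch); send: delivers a finished batch to the user.
function pipeline(batches, stages, send) {
  var inFlight = batches.map(function(batch) {
    // each batch moves through the stages on its own...
    return stages.reduce(function(p, stage) {
      return p.then(stage);
    }, Promise.resolve(batch));
  });
  // ...but a batch is only sent after every earlier batch has been sent
  return inFlight.reduce(function(prev, done) {
    return Promise.all([prev, done]).then(function(results) {
      return send(results[1]);
    });
  }, Promise.resolve());
}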

@danielmewes
Member

Hi @wojons, thanks for the nice issue report.

I think the crux of this is that the cluster-internal batching should also request the next batch while it is still processing the current one.
@mlucy We are not already doing that, are we?

For making the batch size configurable by the user, see rethinkdb/docs#248 and #2185.

@mlucy
Member

mlucy commented May 2, 2014

@danielmewes -- we aren't, but we probably should be.

@wojons
Contributor Author

wojons commented May 2, 2014

@mlucy how hard would something like this be?

@mlucy
Member

mlucy commented May 2, 2014

Probably not that hard? There's nothing technically difficult about it, anyway; someone just has to sit down and change all the places where we get batches to make them prefetch. (We'd also have to think a little bit about the performance implications for memory-hungry workloads.)
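
A minimal sketch of what that prefetching could look like, with a hypothetical fetchBatch() that returns a promise for the next batch (or null once the stream is exhausted); processBatch() is assumed to return a promise as well:

function consumeWithPrefetch(fetchBatch, processBatch) {
  function step(pending) {
    return pending.then(function(batch) {
      if (batch === null) return;        // no more batches
      var next = fetchBatch();           // request the next batch now...
      return processBatch(batch)         // ...while this one is being processed
        .then(function() { return step(next); });
    });
  }
  return step(fetchBatch());
}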

@srh
Contributor

srh commented May 2, 2014

@wojons: if we implemented pipelining within the cluster like that, it wouldn't be a flag, it would be the default.

anyway; someone just has to sit down and change all the places where we get batches to make them prefetch.

If we wanted to not suck, we would just stream things instead of requesting batches back and forth like that. The parser end would only send traffic-control messages back. (Prefetching batches is just a dumb form of traffic control; you can obviously do better than that.) (Also, the problem with prefetching is that you still have to wait for the previous batch before fetching the next one.)
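
For what it's worth, a bare-bones sketch of credit-based flow control, which is one way such traffic-control messages could work (names are illustrative, not an existing API): the consumer grants credits, and the producer streams freely until it runs out of them.

function makeFlowControl(initialCredits) {
  var credits = initialCredits;
  var waiting = [];
  return {
    // producer side: resolves once we are allowed to send one more batch
    acquire: function() {
      if (credits > 0) { credits--; return Promise.resolve(); }
      return new Promise(function(resolve) { waiting.push(resolve); });
    },
    // consumer side: the "speed up" message, granting n more credits
    grant: function(n) {
      credits += n;
      while (waiting.length > 0 && credits > 0) { credits--; waiting.shift()(); }
    }
  };
}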

@mlucy
Member

mlucy commented May 2, 2014

Streaming would be cool, but is further away than prefetching, and prefetching will probably be close to as good (except for latency). Also, we'd probably want some batching even with streaming just because there's presumably per-message overhead and such.

@srh
Contributor

srh commented May 2, 2014

Streaming would be cool, but is further away than prefetching, and prefetching will probably be close to as good (except for latency).

Latency is the problem here, so how would prefetching be good "except for latency"?

@danielmewes
Member

What do you guys mean when you talk of streaming? Obviously we wouldn't want to send one message per document. The overhead would be horrible.

I think there's also some confusion about what latency means.
Prefetching improves how long it takes for a single client to retrieve the full result, which one could call "latency" (in contrast to having multiple clients). It does not improve how long it takes until the first result is received.

@wojons
Contributor Author

wojons commented May 2, 2014

In my current example, the reason this is a problem is how long my pipeline is; if there were lots of batches ready and waiting at the end, it would not really be a problem. Control messages are all cool and advanced, but I have a feeling that if we moved to that style it would be a while before it's done, and this will become more and more of an issue as more people start using RethinkDB. At least with prefetching, pre-batching, or whatever we want to call it, there is something workable until the full fix lands several versions from now. I feel like prefetching could make it into 1.14 if it wanted to, but streaming would not make it until 2.x.

@wojons
Contributor Author

wojons commented May 2, 2014

@danielmewes what I think they mean is the following:

Prefetching: the server fetches batches before the client needs or requests them, and queues them.

Streaming: the server starts getting batches and sending them to the client as fast as it can; the client then sends messages to tell the server to speed up or slow down.

@srh correct me if I am wrong.

@srh
Contributor

srh commented May 2, 2014

Prefetching improves how long it takes for a single client to retrieve the full result, which one could call "latency" (in contrast to having multiple clients). It does not improve how long it takes until the first result is received.

Prefetching reduces the problem of latency slowing things down due to communication delays within the cluster, because there's less time wasted waiting for responses. If the slow end is on the parser or client side, it could eliminate the problem, but if it's on the store side, the improvement could range from small to negligible.

(@wojons: The parser means the machine talking to the client, which also tends to do a lot of ReQL query evaluation, and it could be slow when evaluating a large query.)

@srh
Contributor

srh commented May 2, 2014

Streaming: the server starts getting batches and sending them to the client as fast as it can; the client then sends messages to tell the server to speed up or slow down.

I was talking about intra-cluster streaming. As far as the client protocol goes, streaming there would be nice too, but it would require the client driver to spawn its own threads in languages that aren't Javascript, or could require a more advanced interface for the client driver. (Or we could not do that, and stuff could sit in the network buffer.)

@wojons
Contributor Author

wojons commented May 2, 2014

@srh I am guessing that the parser redoes its job every time the client requests the next batch, and the parser then sends the full query, but with some sort of skip, back to the nodes?

@wojons
Contributor Author

wojons commented May 2, 2014

I was talking about intra-cluster streaming. As far as the client protocol goes, streaming there would be nice too, but it would require the client driver to spawn its own threads in languages that aren't Javascript, or could require a more advanced interface for the client driver. (Or we could not do that, and stuff could sit in the network buffer.)

That is true, but many languages have some form of instant timeout these days, so they can do a level of non-blocking I/O that would allow for some sort of streaming, as long as they're not still processing something they received before. When I said client I should have said nodes; client streaming by way of control messages would be killer, but I feel like that is far away.

@srh
Contributor

srh commented May 2, 2014

@srh I am guessing that the parser redoes its job every time the client requests the next batch, and the parser then sends the full query, but with some sort of skip, back to the nodes?

No -- datum streams are cached as long as the client connection is open (unless they time out?) and they send a request to stores asking only for the next data. They generally store the greatest key or least key that they received from a store and send the next query asking for keys greater than said key. That's more efficient than doing a skip query (because skip queries are not efficient).
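
For illustration only (not the server's actual code path), the difference looks roughly like this in ReQL. 'metrics', offset, batchSize, and lastKey are placeholders, and r.maxval simply stands in for "no upper bound" (it may not exist in older drivers):

// skip-based paging: the server has to walk past `offset` rows every time
r.table('metrics').orderBy({index: 'id'}).skip(offset).limit(batchSize)

// key-based continuation: remember the last key returned and ask only for
// keys greater than it
r.table('metrics')
  .between(lastKey, r.maxval, {index: 'id', leftBound: 'open'})
  .orderBy({index: 'id'})
  .limit(batchSize)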

@srh
Contributor

srh commented May 2, 2014

So at least in terms of how the code is written, the query is evaluated and returns a 'datum stream', and we just ask it for another batch when another request comes in.

@danielmewes
Member

@srh: It's not just about communication delays. Let's say you run r.table().delete().
I believe what happens now is that the parser will load a batch of documents from the shards, and then issue a series of deletions back to the shards (@mlucy: please correct me if I'm wrong).
So for some period, the shards will only be reading, followed by a period where they are only writing. Reading and writing data utilize non-identical subsets of resources (it might be more extreme in other examples). If both were happening in parallel, the overall speed of the query might improve. And that's what prefetching would do.

@danielmewes
Member

A better example would be r.table('a').insert(r.table('b'))

@wojons
Contributor Author

wojons commented May 2, 2014

@srh Since I lack knowledge of how some things are done internally, please excuse me when I make mistakes in trying to understand.

@coffeemug coffeemug added this to the subsequent milestone May 3, 2014
@danielmewes danielmewes modified the milestones: backlog, subsequent Aug 27, 2015