
feat(executor): accumulate short messages if message queue not empty #2755

Open · jon-chuang opened this issue May 24, 2022 · 3 comments
Labels: no-issue-activity, type/enhancement (Improvements to existing implementation.)

jon-chuang commented May 24, 2022

Certain executors may drastically reduce the number of rows in a given message. Although this is probably not a cause for worry, in the interest of efficient message sizes and execution batches, it may be better to accumulate such small messages.

To do so, we could wrap executors that potentially reduce row counts (filter, join) in "accumulators" which buffer messages below a certain threshold (e.g. number of rows or number of bytes), forwarding only once the threshold is reached, the queue is empty, or a timeout elapses.

The logic could be as follows:

```rust
if current_time() - msg.start_time < FORWARD_MSG_TIMEOUT
    && !self.msg_queue.is_empty()
    && msg.num_rows() < ROW_THRESHOLD
{
    // Keep accumulating: process the next queued message and append its output.
    msg.append(self.execute(self.msg_queue.pop()).await);
} else {
    // Threshold reached, queue empty, or timeout elapsed: forward and reset.
    self.forward(msg);
    msg = Message::new();
}
```
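The row-count half of this idea can be sketched standalone. This is a minimal illustration, not the proposed implementation: `Chunk`, `Accumulator`, `push`, and `flush` are hypothetical names, and the timeout branch is omitted for brevity.

```rust
// Hypothetical sketch of the accumulator wrapper described above.
// `Chunk` stands in for a stream message; only a row-count threshold
// is modeled, with `flush` covering the timeout / empty-queue case.

#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    rows: Vec<i64>, // stand-in for real column data
}

struct Accumulator {
    threshold: usize, // forward once this many rows are buffered
    buffer: Vec<i64>,
}

impl Accumulator {
    fn new(threshold: usize) -> Self {
        Self { threshold, buffer: Vec::new() }
    }

    /// Feed one (possibly small) chunk; returns a merged chunk once the
    /// buffered row count reaches the threshold, otherwise buffers it.
    fn push(&mut self, chunk: Chunk) -> Option<Chunk> {
        self.buffer.extend(chunk.rows);
        if self.buffer.len() >= self.threshold {
            Some(Chunk { rows: std::mem::take(&mut self.buffer) })
        } else {
            None
        }
    }

    /// Flush whatever is buffered (e.g. on timeout or an empty queue).
    fn flush(&mut self) -> Option<Chunk> {
        if self.buffer.is_empty() {
            None
        } else {
            Some(Chunk { rows: std::mem::take(&mut self.buffer) })
        }
    }
}

fn main() {
    let mut acc = Accumulator::new(4);
    assert_eq!(acc.push(Chunk { rows: vec![1] }), None);        // too small, buffered
    assert_eq!(acc.push(Chunk { rows: vec![2, 3] }), None);     // still below threshold
    let merged = acc.push(Chunk { rows: vec![4, 5] }).unwrap(); // threshold reached
    assert_eq!(merged.rows, vec![1, 2, 3, 4, 5]);
    assert_eq!(acc.flush(), None); // buffer is empty after forwarding
}
```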
jon-chuang added the type/enhancement label on May 24, 2022
liurenjie1024 commented
Executors not only reduce chunk sizes, but can also increase them (e.g. join, exchange). So I think a more general approach is to rechunk data chunks to a fixed size, which would also benefit vectorized execution.
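The rechunking idea can be illustrated as follows. This is a hedged sketch over plain row vectors (the `rechunk` name and signature are hypothetical, not RisingWave's API): arbitrary-sized input chunks are split and merged so every output chunk has exactly `target` rows, except possibly the last.

```rust
// Hypothetical sketch of "rechunk to fixed size": incoming chunks of
// arbitrary size are re-emitted as chunks of exactly `target` rows
// (the final chunk may be smaller).

fn rechunk(chunks: &[Vec<i64>], target: usize) -> Vec<Vec<i64>> {
    let mut out = Vec::new();
    let mut cur: Vec<i64> = Vec::with_capacity(target);
    for chunk in chunks {
        for &row in chunk {
            cur.push(row);
            if cur.len() == target {
                out.push(std::mem::take(&mut cur));
            }
        }
    }
    if !cur.is_empty() {
        out.push(cur); // trailing partial chunk
    }
    out
}

fn main() {
    // One oversized chunk and two tiny ones become uniform 3-row chunks.
    let input = vec![vec![1, 2, 3, 4], vec![5], vec![6, 7]];
    assert_eq!(rechunk(&input, 3), vec![vec![1, 2, 3], vec![4, 5, 6], vec![7]]);
}
```

A fixed output size helps vectorized operators because every batch fills the same number of SIMD lanes, regardless of how selective the upstream operator was.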

skyzh commented May 24, 2022

From my perspective, it is better to batch requests on executors that are affected by the number of chunks (e.g., joins). It is always beneficial to get messages instantly delivered.

jon-chuang commented May 24, 2022

> Executors not only reduce chunk sizes, but also increase chunk sizes

Yes, the current approach has been to change the executor itself to yield chunks so that chunk sizes stay bounded. That required refactoring straight-line code into code that yields intermittently. I suppose in that case we cared about both latency and message size, since bounding message size alone could be solved by simply breaking up the monolithic message.

I think yes, the two approaches are complementary.

> It is always beneficial to get messages instantly delivered.

I think this is generally true within the same node. However, if the queue is not empty and the messages are small, each message accomplishes less work while still incurring per-message overhead. Hence, I believe accumulating them into larger messages would offset the latency cost if tuned properly.

Perhaps we could focus on accumulating batches in dispatch operators, or in operators that know they are sending over the network, so that any "slack" introduced within a fragment or pipeline is straightened out at the network boundary.
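The forwarding decision at such a network-boundary operator can be sketched as a pure predicate. All names here (`should_forward`, `row_threshold`, the parameter set) are hypothetical illustrations of the trade-off discussed above, not an existing API.

```rust
// Hypothetical sketch of the forwarding decision at a dispatch operator:
// forward immediately unless there is a backlog, the batch is still
// small, and the timeout has not elapsed.

use std::time::Duration;

fn should_forward(
    buffered_rows: usize,
    row_threshold: usize,
    backlog: usize,
    elapsed: Duration,
    timeout: Duration,
) -> bool {
    buffered_rows >= row_threshold // batch is large enough to amortize overhead
        || backlog == 0            // no queued work: deliver instantly
        || elapsed >= timeout      // bound the latency cost of accumulation
}

fn main() {
    let timeout = Duration::from_millis(10);
    // Small batch, backlog present, timeout not hit: keep accumulating.
    assert!(!should_forward(16, 1024, 3, Duration::from_millis(2), timeout));
    // Empty queue: forward instantly even though the batch is small.
    assert!(should_forward(16, 1024, 0, Duration::from_millis(2), timeout));
    // Timeout elapsed: forward anyway to bound latency.
    assert!(should_forward(16, 1024, 3, Duration::from_millis(12), timeout));
}
```

The `backlog == 0` arm captures skyzh's point that instant delivery should be the default; accumulation only kicks in when there is already queued work to absorb the wait.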

> it is better to batch requests on executors that will be affected by chunk number

But yes, I guess a complementary approach may be to do the accumulation at the executor itself...
