
Introduce backpressure to webserver #3108

Merged (3 commits) on Jun 22, 2021

Conversation

@danielkec (Contributor) commented Jun 14, 2021

There are basically two ways to solve the problem of the upstream being faster than the Netty event loop.
Both solutions introduce back-pressure.

  • Request one by one - a dead-simple solution that can be less performant with publishers designed to produce data in batches (file reading, DB access); this can be compensated with buffering on the publisher side on an as-needed basis (a reactive operator for buffering).
  • Request in batches sized by the state of a newly introduced webserver buffer - this solution would make the webserver responsible for additional buffering and would have to be highly configurable for use cases where buffering is not desirable.

This PR fixes #3092 by requesting items one by one, as sketched below.
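A minimal sketch of the one-by-one pattern (java.util.concurrent.Flow is used for illustration; the names are not the actual Helidon internals):

```java
import java.util.concurrent.Flow;

// Sketch only: demand is capped at a single outstanding item, so the
// upstream can never outrun the event loop by more than one chunk.
final class OneByOneSubscriber<T> implements Flow.Subscriber<T> {

    private Flow.Subscription subscription;

    @Override
    public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        s.request(1);             // prime the stream with a single item
    }

    @Override
    public void onNext(T chunk) {
        write(chunk);             // hand the chunk over to the channel
        subscription.request(1);  // only then ask for the next one
    }

    @Override
    public void onError(Throwable t) { /* fail the response */ }

    @Override
    public void onComplete() { /* complete the response */ }

    private void write(T chunk) {
        // channel.write(chunk) in the real code
    }
}
```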

Introducing the symmetrical back-pressure is expected to bring a small performance penalty:
JMH comparison

# 1M100x means a stream of 100 DataChunks backed by 1MB ByteBuffer

Benchmark                Mode  Cnt    Score    Error  Units
OneByOne1M100x.test     thrpt    5    0.069 ±  0.002  ops/s
OneByOne1k10000x.test   thrpt    5    5.179 ±  0.224  ops/s
OneByOne1k100x.test     thrpt    5  502.839 ± 19.513  ops/s
Unbounded1M100x.test    thrpt    5    0.066 ±  0.002  ops/s
Unbounded1k10000x.test  thrpt    5    5.137 ±  0.346  ops/s
Unbounded1k100x.test    thrpt    5  494.400 ± 57.133  ops/s

  Unbounded1M100x.test  0.066
   OneByOne1M100x.test  0.069
Unbounded1k10000x.test █ 5.137
 OneByOne1k10000x.test █ 5.179
  Unbounded1k100x.test ██████████████████████████████████████████████████████████ 494.400
   OneByOne1k100x.test ████████████████████████████████████████████████████████████ 502.839

Compensation is possible by buffering on the publisher's side; this could be achieved by introducing a buffering reactive operator along these lines:
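A rough, single-threaded sketch of such an operator (the BufferingProcessor name and half-batch replenishing are illustrative assumptions; a production version would need proper error, cancellation, and reentrancy handling):

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Flow;

// Sketch only: translates one-by-one downstream requests into batched
// upstream requests. Assumes all signals arrive on a single thread
// (e.g. the event loop); no reentrancy protection.
final class BufferingProcessor<T> implements Flow.Processor<T, T> {

    private final int prefetch;
    private final int replenish;
    private final Queue<T> buffer = new ArrayDeque<>();
    private Flow.Subscription upstream;
    private Flow.Subscriber<? super T> downstream;
    private long demand;    // outstanding downstream requests
    private int consumed;   // items drained since the last upstream request
    private boolean done;

    BufferingProcessor(int prefetch) {
        this.prefetch = prefetch;
        this.replenish = Math.max(1, prefetch / 2);
    }

    @Override
    public void subscribe(Flow.Subscriber<? super T> s) {
        downstream = s;
        s.onSubscribe(new Flow.Subscription() {
            @Override
            public void request(long n) {
                demand += n;
                drain();
            }

            @Override
            public void cancel() {
                upstream.cancel();
            }
        });
    }

    @Override
    public void onSubscribe(Flow.Subscription s) {
        upstream = s;
        s.request(prefetch);   // one batch request instead of N single ones
    }

    @Override
    public void onNext(T item) {
        buffer.add(item);
        drain();
    }

    @Override
    public void onError(Throwable t) {
        downstream.onError(t);
    }

    @Override
    public void onComplete() {
        done = true;
        drain();
    }

    private void drain() {
        while (demand > 0 && !buffer.isEmpty()) {
            demand--;
            downstream.onNext(buffer.poll());
            if (++consumed == replenish) {
                consumed = 0;
                upstream.request(replenish);   // replenish in half-batch steps
            }
        }
        if (done && buffer.isEmpty()) {
            downstream.onComplete();
        }
    }
}
```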

Signed-off-by: Daniel Kec <daniel.kec@oracle.com>

@danielkec self-assigned this Jun 14, 2021
@danielkec force-pushed the 3092-webserver-backpressure branch from 78d31ee to ae0c91f on June 14, 2021 09:11
@barchetta added this to the 2.3.1 milestone Jun 14, 2021
@danielkec requested a review from olotenko June 14, 2021 15:29
@spericas (Member) commented:

@danielkec We should probably create another issue to run the perf tests after this change.

Signed-off-by: Daniel Kec <daniel.kec@oracle.com>
@danielkec force-pushed the 3092-webserver-backpressure branch from 0e2b684 to 4a62ab4 on June 14, 2021 21:22
Signed-off-by: Daniel Kec <daniel.kec@oracle.com>
@danielkec (Contributor, Author) commented Jun 15, 2021

I ran the TechEmpower benchmark locally to compare the snapshot version with this change against 2.2.1:

2.2.1
(TechEmpower results screenshot)

2.3.1-SNAPSHOT - with this PR
(TechEmpower results screenshot)

@spericas (Member) previously approved these changes Jun 15, 2021 and left a comment:

LGTM

@olotenko left a comment:

I am not too sure what this change achieves.

I.e., how is the backpressure introduced here? The old version effectively said: "push as much data as you want, we'll push it to the channel as soon as we can". The proposed version effectively says: "push at most two items, and we'll tell you when to push more, which we do as soon as we have pushed the previous item to the channel, as soon as we could".

I would understand if subscription.request(1) were invoked after channel.write completed - that definitely allows more data to arrive only after the previous data has been committed to the channel buffers; presumably, Netty won't complete the related Future if the write buffers are full.

Another thing that makes sense: count the bytes that go through here, and allow up to a given amount to be buffered. E.g. onNext counts how many bytes were in the buffers it has seen, and requests one more chunk if it has seen fewer than the watermark. Then the channel.write completion Future subtracts the bytes it has just written. If this results in crossing the watermark, it requests 1 (because onNext stopped requesting more).

The watermarking approach is more predictable. Just deferring to channel.write completion would work poorly if, in the common case, socket write completion does not happen during onNext, but it will work better than watermarking if there is such a guarantee (i.e. that onNext attempts to call channel.write, through a few calls of intermediate routines, and that Netty completes the future before onNext returns if the write fully fits in the socket send buffers).
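A sketch of the watermark idea described in this comment (the class name and the 64 KiB threshold are illustrative assumptions, and signals are assumed to be confined to the event loop):

```java
import io.netty.buffer.Unpooled;
import io.netty.channel.Channel;

import java.nio.ByteBuffer;
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: onNext keeps requesting while the in-flight byte count is
// below the watermark; the write-completion future subtracts what was
// written and requests again only if the subtraction crosses back under
// the watermark (because onNext had stopped requesting).
final class WatermarkSubscriber implements Flow.Subscriber<ByteBuffer> {

    private static final long WATERMARK = 64 * 1024;  // illustrative threshold

    private final Channel channel;
    private final AtomicLong buffered = new AtomicLong();
    private Flow.Subscription subscription;

    WatermarkSubscriber(Channel channel) {
        this.channel = channel;
    }

    @Override
    public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        s.request(1);
    }

    @Override
    public void onNext(ByteBuffer data) {
        int bytes = data.remaining();
        long inFlight = buffered.addAndGet(bytes);

        channel.writeAndFlush(Unpooled.wrappedBuffer(data)).addListener(f -> {
            long after = buffered.addAndGet(-bytes);
            // crossed from at-or-above the watermark back under it: resume demand
            if (after < WATERMARK && after + bytes >= WATERMARK) {
                subscription.request(1);
            }
        });

        if (inFlight < WATERMARK) {
            subscription.request(1);   // still under the watermark: keep going
        }
    }

    @Override
    public void onError(Throwable t) {
        channel.close();
    }

    @Override
    public void onComplete() {
        channel.flush();
    }
}
```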

Signed-off-by: Daniel Kec <daniel.kec@oracle.com>
@danielkec (Contributor, Author) commented:

We are trying to solve issue #3092 with this PR, e.g. sending long streams of huge, immediately flushed chunks.

Watermarking for not-immediately-flushed chunks seems pretty cool and would make the webserver truly backpressure-aware; let's move it to #3136.

@olotenko left a comment:

try...finally is a safer pattern for "always request in onNext, unless explicitly delegated to orderedWrite" (see the sketch below), but OK, we can live with this implementation.
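A sketch of that try...finally shape (orderedWrite and delegated are hypothetical stand-ins for the actual methods, shown as a fragment of an onNext implementation):

```java
// Sketch only: guarantees request(1) even if the write path throws,
// unless orderedWrite (hypothetical) took over responsibility for requesting.
@Override
public void onNext(DataChunk chunk) {
    boolean delegated = false;
    try {
        delegated = orderedWrite(chunk);  // true if orderedWrite will request(1) itself
    } finally {
        if (!delegated) {
            subscription.request(1);
        }
    }
}
```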

Thank you for taking care of my comments. Let's implement better backpressure in the separate ticket that you created.

@danielkec merged commit 4ffadeb into helidon-io:master Jun 22, 2021
Development

Successfully merging this pull request may close these issues:

Memory leak in the webserver