Skip to content

Commit

Permalink
split docs
Browse files Browse the repository at this point in the history
  • Loading branch information
yawkat committed May 23, 2024
1 parent 720705a commit f6645bc
Show file tree
Hide file tree
Showing 5 changed files with 122 additions and 127 deletions.
126 changes: 0 additions & 126 deletions src/main/docs/guide/httpServer/byteBody.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -6,129 +6,3 @@ When an HTTP request comes in, it starts out with an unparsed and unclaimed stre
all request filters have run, typically an argument binder matching the `@Body` parameter of the controller will
"claim" the `byteBody()` and e.g. parse the JSON. Finally, the body is closed at the end of the request lifecycle,
discarding any data if it has not been claimed by the argument binder.

=== Primary operations

`ByteBody` itself does not offer direct access to the data. To begin processing, there must be a _primary
operation_ that converts the body into another form that can be used in the application programming model.

A normal `ByteBody` has two groups of streaming primary operations. `toInputStream()` gives access to the body
as a regular `InputStream`. The `toByteArrayPublisher()` and `toByteBufferPublisher()` methods return a reactive stream
of byte arrays or `ByteBuffer`s.

WARNING: `InputStream` is blocking API, and the netty event loop must never be blocked. If you wish to read from the
body using an `InputStream`, take care to do so only on another thread, or to annotate your filter with
`@ExecuteOn(TaskExecutors.BLOCKING)`.

If you need full access to the body, the `buffer()` method returns a `CompletableFuture` that completes with an
`AvailableByteBody` when the full body has been received. `AvailableByteBody` has a few more convenient
primary operations: `toByteArray()`, `toByteBuffer()` and `toString(Charset)`.

Buffering is limited to the number of bytes configured in the `micronaut.server.max-request-buffer-size` property.

=== Splitting

Because the framework will not buffer the whole body in memory by default, after an ByteBody has been claimed (a
primary operation has been performed), the data is "gone", and the same ByteBody cannot be claimed again. That
means that if a filter were to claim the `ServerHttpRequest.byteBody()` directly (e.g. to print it to a log),
controllers could not access it anymore. The argument binder for the `@Body` argument would throw an exception.

To resolve this exclusivity problem, an `ByteBody` can be _split_ before it is claimed. The split operation
essentially duplicates the body stream so that the two consumers (logging and argument binding) can process it
independently. A body can be `split` any number of times, but only before the primary operation.

While `ServerHttpRequest.byteBody()` returns a normal `ByteBody` -- cleanup is done by the HTTP server if the
body is not consumed--the body returned by `split` is a `CloseableByteBody`. The caller *must* ensure that the
new instance is closed, otherwise there can be resource and memory leaks, stalled connections, or other issues.

==== Backpressure

When there are two consumers of the same stream of input data, the problem of backpressure coordination necessarily
comes up.

Backpressure in an HTTP server describes the behavior when the "downstream" consumers cannot consume data as fast as
the "upstream" supplier (i.e. the HTTP client sending the request) is sending it. To avoid having to buffer large
amounts of incoming data, the server will apply backpressure (make the client send its data more slowly) when
downstream consumers cannot keep up.

A `split` operation now introduces two consumers. Depending on use case, different approaches of dealing with the
backpressure of each downstream consumer may be appropriate. For example, if the two consumers write the body data to
two separate files at the same time, it's best to use the backpressure of the slowest consumer to avoid buffering data.
But in another example, when one consumer is a filter that needs access to all the body data, and the other consumer
is the controller, the filter needs to complete before the controller even reads any data, so we should instead be
guided by the fastest of the two consumers.

These two approaches are already the two most important
api:io.micronaut.http.body.ByteBody.SplitBackpressureMode[]s. The full list of options is as follows:

* `SplitBackpressureMode.SLOWEST` uses the backpressure of the _slowest_ of the two consumers (first example)
* `SplitBackpressureMode.FASTEST` uses the backpressure of the _fastest_ of the two consumers (second example)
* `SplitBackpressureMode.ORIGINAL` uses the backpressure of the original consumer (the one `split()` was called on)
* `SplitBackpressureMode.NEW` uses the backpressure of the new consumer (the one `split()` returns)

The argument-less `split()` method uses `SLOWEST`, but you should pick the mode that is most appropriate for your use
case.

==== Discarding

Some consumers end up not needing the body after all. For example, if a `POST` request cannot be matched to a
controller route, the body is not needed and can be discarded. How discarding is implemented in the server depends on
HTTP version. For HTTP/1, the server might close the connection or simply drop the data (which can still save some
decompression overhead). For HTTP/2 the server can close the input of the request stream, instructing the client to
send no more data.

When there are multiple consumers, discard behavior is dependent on use case. In the above scenario of an unmatched
request, when there is a filter that is also subscribed to the body data, it may be appropriate to drop the request in
some cases (e.g. for logging) but may be necessary to still receive all the data in others.

To signal that the upstream may discard the body, you can call `ByteBody.allowDiscard()`. Only if all consumers
call `allowDiscard()` (or `close()` without a primary operation) may the remaining data actually be discarded. Before
that, _all_ consumers, even those that called `allowDiscard()`, will still receive all data. For the logging use case,
you can call `allowDiscard()` and be assured that you will still log the full body if the controller needs it.

=== Example

This example adds a filter that will log the body bytes as they come in.

snippet::io.micronaut.docs.server.body.BodyLogController[tags="imports,clazz", indent=0, title="A simple controller"]

snippet::io.micronaut.docs.server.body.BodyLogFilter[tags="imports,clazz", indent=0, title="Logging filter"]

<1> The `@Body Person person` parameter is the final consumer of the `ServerHttpRequest.byteBody()`. The argument
binder will internally perform a primary operation on the body, parse the JSON, and convert it to the `Person` object.
<2> The `logBody` filter will be called before the controller. However, it is programmed asynchronously, so the actual
logging may happen later as data is received.
<3> `split` the body so that we can work with it without interfering with the argument binder in <1>. We use `SLOWEST`
mode to prevent buffering: We don't want to overwhelm the controller with data because the logging is usually very
fast, but at the same time we don't want to overwhelm the logging if it is unexpectedly slower than the controller.
<4> The newly split body is in a try-with-resources statement to ensure that it is properly closed and there is no data
leak.
<5> We call `allowDiscard()` to signal that if the controller does not need the body after all, the logging filter is
fine with dropping it entirely. Without this call, the full body would always be logged, even if the body is discarded.
<6> Convert our copy of the body to a project reactor stream of `byte[]`.
<7> Since we called `allowDiscard()`, there may be a `BodyDiscardedException` if the upstream decides that the body can
be dropped. We ignore that exception.
<8> Finally, subscribe to the reactive stream, and log any incoming data. Note that `subscribe` is asynchronous: It
will return immediately and then call the lambda with the log statement as data comes in.

If you run this example, you should see log output like this:

[source]
----
16:29:30.562 [default-nioEventLoopGroup-1-3] INFO i.m.docs.server.body.BodyLogFilter - Received body: eyJmaXJzdE5hbWUiOiAiSm9uYXMiLCAibGFzdE5hbWUiOiAiS29ucmFkIn0=
16:29:30.604 [default-nioEventLoopGroup-1-3] INFO i.m.d.server.body.BodyLogController - Creating person Person[firstName=Jonas, lastName=Konrad]
----

With a short body like this, the log will only show one "packet". With more packets, the log statement will be called
multiple times:

[source]
----
16:29:30.562 [default-nioEventLoopGroup-1-3] INFO i.m.docs.server.body.BodyLogFilter - Received body: ...
16:29:30.584 [default-nioEventLoopGroup-1-3] INFO i.m.docs.server.body.BodyLogFilter - Received body: ...
16:29:30.642 [default-nioEventLoopGroup-1-3] INFO i.m.docs.server.body.BodyLogFilter - Received body: ...
16:29:30.773 [default-nioEventLoopGroup-1-3] INFO i.m.d.server.body.BodyLogController - Creating person Person[firstName=..., lastName=...]
16:29:30.708 [default-nioEventLoopGroup-1-3] INFO i.m.docs.server.body.BodyLogFilter - Received body: ...
----

Note that the logging in the above example is asynchronous, so the log statements may be interleaved as shown.
44 changes: 44 additions & 0 deletions src/main/docs/guide/httpServer/byteBody/example.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
This example adds a filter that will log the body bytes as they come in.

snippet::io.micronaut.docs.server.body.BodyLogController[tags="imports,clazz", indent=0, title="A simple controller"]

snippet::io.micronaut.docs.server.body.BodyLogFilter[tags="imports,clazz", indent=0, title="Logging filter"]

<1> The `@Body Person person` parameter is the final consumer of the `ServerHttpRequest.byteBody()`. The argument
binder will internally perform a primary operation on the body, parse the JSON, and convert it to the `Person` object.
<2> The `logBody` filter will be called before the controller. However, it is programmed asynchronously, so the actual
logging may happen later as data is received.
<3> `split` the body so that we can work with it without interfering with the argument binder in <1>. We use `SLOWEST`
mode to prevent buffering: We don't want to overwhelm the controller with data because the logging is usually very
fast, but at the same time we don't want to overwhelm the logging if it is unexpectedly slower than the controller.
<4> The newly split body is in a try-with-resources statement to ensure that it is properly closed and there is no data
leak.
<5> We call `allowDiscard()` to signal that if the controller does not need the body after all, the logging filter is
fine with dropping it entirely. Without this call, the full body would always be logged, even if the body is discarded.
<6> Convert our copy of the body to a project reactor stream of `byte[]`.
<7> Since we called `allowDiscard()`, there may be a `BodyDiscardedException` if the upstream decides that the body can
be dropped. We ignore that exception.
<8> Finally, subscribe to the reactive stream, and log any incoming data. Note that `subscribe` is asynchronous: It
will return immediately and then call the lambda with the log statement as data comes in.

If you run this example, you should see log output like this:

[source]
----
16:29:30.562 [default-nioEventLoopGroup-1-3] INFO i.m.docs.server.body.BodyLogFilter - Received body: eyJmaXJzdE5hbWUiOiAiSm9uYXMiLCAibGFzdE5hbWUiOiAiS29ucmFkIn0=
16:29:30.604 [default-nioEventLoopGroup-1-3] INFO i.m.d.server.body.BodyLogController - Creating person Person[firstName=Jonas, lastName=Konrad]
----

With a short body like this, the log will only show one "packet". With more packets, the log statement will be called
multiple times:

[source]
----
16:29:30.562 [default-nioEventLoopGroup-1-3] INFO i.m.docs.server.body.BodyLogFilter - Received body: ...
16:29:30.584 [default-nioEventLoopGroup-1-3] INFO i.m.docs.server.body.BodyLogFilter - Received body: ...
16:29:30.642 [default-nioEventLoopGroup-1-3] INFO i.m.docs.server.body.BodyLogFilter - Received body: ...
16:29:30.773 [default-nioEventLoopGroup-1-3] INFO i.m.d.server.body.BodyLogController - Creating person Person[firstName=..., lastName=...]
16:29:30.708 [default-nioEventLoopGroup-1-3] INFO i.m.docs.server.body.BodyLogFilter - Received body: ...
----

Note that the logging in the above example is asynchronous, so the log statements may be interleaved as shown.
16 changes: 16 additions & 0 deletions src/main/docs/guide/httpServer/byteBody/primary.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
`ByteBody` itself does not offer direct access to the data. To begin processing, there must be a _primary
operation_ that converts the body into another form that can be used in the application programming model.

A normal `ByteBody` has two groups of streaming primary operations. `toInputStream()` gives access to the body
as a regular `InputStream`. The `toByteArrayPublisher()` and `toByteBufferPublisher()` methods return a reactive stream
of byte arrays or `ByteBuffer`s.

WARNING: `InputStream` is blocking API, and the netty event loop must never be blocked. If you wish to read from the
body using an `InputStream`, take care to do so only on another thread, or to annotate your filter with
`@ExecuteOn(TaskExecutors.BLOCKING)`.

If you need full access to the body, the `buffer()` method returns a `CompletableFuture` that completes with an
`AvailableByteBody` when the full body has been received. `AvailableByteBody` has a few more convenient
primary operations: `toByteArray()`, `toByteBuffer()` and `toString(Charset)`.

Buffering is limited to the number of bytes configured in the `micronaut.server.max-request-buffer-size` property.
57 changes: 57 additions & 0 deletions src/main/docs/guide/httpServer/byteBody/splitting.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
Because the framework will not buffer the whole body in memory by default, after an ByteBody has been claimed (a
primary operation has been performed), the data is "gone", and the same ByteBody cannot be claimed again. That
means that if a filter were to claim the `ServerHttpRequest.byteBody()` directly (e.g. to print it to a log),
controllers could not access it anymore. The argument binder for the `@Body` argument would throw an exception.

To resolve this exclusivity problem, an `ByteBody` can be _split_ before it is claimed. The split operation
essentially duplicates the body stream so that the two consumers (logging and argument binding) can process it
independently. A body can be `split` any number of times, but only before the primary operation.

While `ServerHttpRequest.byteBody()` returns a normal `ByteBody` -- cleanup is done by the HTTP server if the
body is not consumed--the body returned by `split` is a `CloseableByteBody`. The caller *must* ensure that the
new instance is closed, otherwise there can be resource and memory leaks, stalled connections, or other issues.

==== Backpressure

When there are two consumers of the same stream of input data, the problem of backpressure coordination necessarily
comes up.

Backpressure in an HTTP server describes the behavior when the "downstream" consumers cannot consume data as fast as
the "upstream" supplier (i.e. the HTTP client sending the request) is sending it. To avoid having to buffer large
amounts of incoming data, the server will apply backpressure (make the client send its data more slowly) when
downstream consumers cannot keep up.

A `split` operation now introduces two consumers. Depending on use case, different approaches of dealing with the
backpressure of each downstream consumer may be appropriate. For example, if the two consumers write the body data to
two separate files at the same time, it's best to use the backpressure of the slowest consumer to avoid buffering data.
But in another example, when one consumer is a filter that needs access to all the body data, and the other consumer
is the controller, the filter needs to complete before the controller even reads any data, so we should instead be
guided by the fastest of the two consumers.

These two approaches are already the two most important
api:io.micronaut.http.body.ByteBody.SplitBackpressureMode[]s. The full list of options is as follows:

* `SplitBackpressureMode.SLOWEST` uses the backpressure of the _slowest_ of the two consumers (first example)
* `SplitBackpressureMode.FASTEST` uses the backpressure of the _fastest_ of the two consumers (second example)
* `SplitBackpressureMode.ORIGINAL` uses the backpressure of the original consumer (the one `split()` was called on)
* `SplitBackpressureMode.NEW` uses the backpressure of the new consumer (the one `split()` returns)

The argument-less `split()` method uses `SLOWEST`, but you should pick the mode that is most appropriate for your use
case.

==== Discarding

Some consumers end up not needing the body after all. For example, if a `POST` request cannot be matched to a
controller route, the body is not needed and can be discarded. How discarding is implemented in the server depends on
HTTP version. For HTTP/1, the server might close the connection or simply drop the data (which can still save some
decompression overhead). For HTTP/2 the server can close the input of the request stream, instructing the client to
send no more data.

When there are multiple consumers, discard behavior is dependent on use case. In the above scenario of an unmatched
request, when there is a filter that is also subscribed to the body data, it may be appropriate to drop the request in
some cases (e.g. for logging) but may be necessary to still receive all the data in others.

To signal that the upstream may discard the body, you can call `ByteBody.allowDiscard()`. Only if all consumers
call `allowDiscard()` (or `close()` without a primary operation) may the remaining data actually be discarded. Before
that, _all_ consumers, even those that called `allowDiscard()`, will still receive all data. For the logging use case,
you can call `allowDiscard()` and be assured that you will still log the full body if the controller needs it.
6 changes: 5 additions & 1 deletion src/main/docs/guide/toc.yml
Original file line number Diff line number Diff line change
Expand Up @@ -141,7 +141,11 @@ httpServer:
title: HttpServerFilter
httpServerFilterExample: HttpServerFilter Example
httpServerFilterErrorStates: HttpServerFilter Error States
byteBody: Advanced body access
byteBody:
title: Advanced body access
primary: Primary operations
splitting: Splitting
example: Example
sessions: HTTP Sessions
sse: Server Sent Events
websocket:
Expand Down

0 comments on commit f6645bc

Please sign in to comment.