Do not send empty iodata chunks #535

myronmarston · 2017-03-31T16:42:35Z

This is a followup to #396, which stopped sending blank string
chunks. However, body can be any type of iodata, including an
empty list, a list containing a blank string, etc. As such we have
to look at the length of the iodata to determine if it is non-empty.

This is a followup to elixir-plug#396, which stopped sending blank string chunks. However, `body` can be any type of iodata, including an empty list, a list containing a blank string, etc. As such we have to look at the length of the iodata to determine if it is non-empty.

sourcelevel-bot · 2017-03-31T16:42:40Z

Hello, @myronmarston! This is your first Pull Request that will be reviewed by Ebert, an automatic Code Review service. It will leave comments on this diff with potential issues and style violations found in the code as you push new commits. You can also see all the issues found on this Pull Request on its review page. Please check our documentation for more information.

myronmarston · 2017-03-31T16:43:51Z

lib/plug/conn.ex

-      :ok                  -> {:ok, conn}
-      {:ok, body, payload} -> {:ok, %{conn | resp_body: body, adapter: {adapter, payload}}}
-      {:error, _} = error  -> error
+    case IO.iodata_length(chunk) do


@josevalim I assume IO.iodata_length/1 is a linear operation, which is kinda unfortunate here--we don't really need to know the length of the chunk, just whether or not it's empty or not. Do you know of a way to check that without traversing the entire iodata structure to get the length?

I suppose I could add a empty_iodata?/1 function that recursively walks the chunk, aborting as soon as it figures out if its empty or not...but that seems like overkill for this one use and I'm hoping there's a better way.

I did some pretty rudimentary benchmarks but I think :erlang.iolist_size is probably optimal.

It looks like iolist_size (the function behind iodata_length) is written in C https://github.com/erlang/otp/blob/b4ac8b2b32f094217d0533ee139273923c3a8af7/erts/preloaded/src/erlang.erl#L1070 it converts the iolist into a binary and then grabs its bytesize.

And my benchmarks of various ways were not faster

## BasicBench benchmark name iterations average time IO.iodata_length 10000000 0.36 µs/op IO.iodata_to_binary 10000000 0.49 µs/op empty_iodata? 10000000 0.52 µs/op

Thanks for doing that, @jeregrine! I'm curious what kind of iodata you used in your benchmark? For small, simple examples, I'd absolutely expect IO.iodata_length to be optimal given it's written in C. My concern is for large chunks. For example, consider an app that is streaming a CSV as it is generated , sending chunks of 100K rows at a time, where it is emitting iodata chunks as nested lists (one list entry per row where each row is a list containing one entry per column). For such a case, iodata_length is going to have to traverse the nested lists to build the binary. Plus it's generating a lot of garbage if it has to generate the entire binary just to get the length.

An empty_iodata? function--whether pre-existing or hand-written--that short-circuited as soon as it found a non-empty binary at some level of nesting in the iodata, would be noticeably faster, I expect.

@myronmarston IO.iodata_length/1 is quite fast operation: 876 microseconds to get size of List.duplicate([List.duplicate([String.duplicate("1", 5), 1, 2], 100), 3, String.duplicate("2", 50)], 1000) (which is 751_000 bytes).
Not sure about the garbage, but I assume it's handled in C-land. 🤔

@myronmarston I had a couple string sizes, but included @lexmag's large string to the list and got:

## BasicBench benchmark name iterations average time empty_iodata? 10000000 0.76 µs/op IO.iodata_length 1000 1284.00 µs/op IO.iodata_to_binary 500 4616.14 µs/op

Which is better of course since it short circuits, but not really an issue when it runs so quickly anyways.

My empty_iodata? looks like, (which may not be optimal OR correct since I eye balled it over lunch :P)

def empty_iodata?(""), do: true def empty_iodata?([""]), do: true def empty_iodata?([]), do: true def empty_iodata?([head | tail]) do empty_iodata?(head) && empty_iodata?(tail) end def empty_iodata?(_), do: false

Tested using benchfella and https://gist.github.com/jeregrine/64fd4b985a31b1ec381a63a08a62a521

Thanks for the detail! I think I agree that the perf of iodata_length isn't slow enough to warrant us implementing empty_iodata? here (and perhaps getting it wrong; for example, I'm not quite sure if that handles improper lists correctly...). However, given that the short-circuit version is orders of magnitude faster than iodata_length for large iodatas, it seems like Elixir's IO module could benefit from an empty_iodata?/1 function. If/when that is added to Elixir, plug could be updated in the future to use it.

Let's see what others think.

I think empty_iodata? was that fast in @jeregrine's example as it received non-empty iodata.
It starts to be slower with the following (I'd say common) shape:

seed = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789" Enum.reduce(String.split(seed, ""), "", fn entry, acc -> [acc, " " | entry] end)

Gives empty_iodata?: 4, iodata_length: 2.

And List.duplicate(List.duplicate(["", []], 2), 5) gives empty_iodata?: 2, iodata_length: 1.

lexmag · 2017-03-31T18:37:28Z

lib/plug/conn.ex

+      0 -> {:ok, conn}
+      _ ->
+        case adapter.chunk(payload, chunk) do
+          :ok                  -> {:ok, conn}


Since we touching these lines, let's get rig of -> aligning.

lexmag · 2017-03-31T18:38:07Z

lib/plug/conn.ex

-      :ok                  -> {:ok, conn}
-      {:ok, body, payload} -> {:ok, %{conn | resp_body: body, adapter: {adapter, payload}}}
-      {:error, _} = error  -> error
+    case IO.iodata_length(chunk) do


I think if would read better.

lexmag · 2017-03-31T18:38:40Z

test/plug/conn_test.exs

-      def chunk(_payload, ""), do: raise "the empty chunk was unexpectedly sent"
-      def chunk(payload, chunk), do: Plug.Adapters.Test.Conn.chunk(payload, chunk)
+      def chunk(payload, chunk) do
+        case IO.iodata_length(chunk) do


if would read better here as well.

josevalim · 2017-03-31T23:18:08Z

Unfortunately this is very expensive, as we will traverse the whole chunk when its size is not zero. However, I think we should revert the original change, as sometimes it is useful to emit empty chunks to ensure the connection won't be closed. The blank check can always exist in the user app level.

josevalim · 2017-03-31T23:20:39Z

Actually, disregard my comment, as an empty chunk often causes the connection to be closed instead.

@myronmarston can you please change the PR to a private function that traverses the list and returns true if the io list is made of empty lists or empty binaries only? Thank you.

myronmarston · 2017-03-31T23:48:00Z

Actually, disregard my comment, as an empty chunk often causes the connection to be closed instead.

Yep. This has happened to us and is what prompted me to dig into this.

@myronmarston can you please change the PR to a private function that traverses the list and returns true if the io list is made of empty lists or empty binaries only? Thank you.

Done. One thing I'm unsure of: does it handle improper lists correctly? I haven't worked with them much but I know iodata supports them.

Also, do you think it would make sense for Elixir to provide an IO.iodata_empty?/1 function, @josevalim? Seems useful for this kind of thing...

lexmag · 2017-04-01T00:21:01Z

lib/plug/conn.ex

+  defp iodata_empty?(""), do: true
+  defp iodata_empty?(binary) when is_binary(binary), do: false
+  defp iodata_empty?([]), do: true
+  defp iodata_empty?([hd | tl]), do: iodata_empty?(hd) && iodata_empty?(tl)


For strictly boolean checks we should use and instead of &&.
Also let's use head and tail.

lexmag · 2017-04-01T00:21:40Z

lib/plug/conn.ex

@@ -474,6 +477,11 @@ defmodule Plug.Conn do
                         "you have called send_chunked/2 before you send a chunk"
  end

+  defp iodata_empty?(""), do: true


Implementation is incomplete, we should handle integers as well.

josevalim · 2017-04-01T00:47:59Z

No need to check for integers or binaries, just say it is non empty for any of the values we don't match. -- *José Valimwww.plataformatec.com.br <http://www.plataformatec.com.br/>Founder and Director of R&D*

myronmarston · 2017-04-01T01:18:37Z

I've addressed the latest round of feedback.

josevalim · 2017-04-01T01:45:08Z

❤️ 💚 💙 💛 💜

mtrudel · 2019-11-20T00:39:12Z

@josevalim, per your comment above about empty chunks causing the connection to close, that's by design. Per https://tools.ietf.org/html/rfc7230#section-4.1, a zero-sized chunk is used to indicate the end of a response. And in fact, being able to send an empty chunk is a necessary part of the chunked response lifecycle that's not possible to accomplish currently via the Plug API.

Since the Plug.Conn.chunk/2 special cases zero length chunks internally (and in the absence of a Plug.Conn.send_resp/1 analog for chunked responses), it is impossible to explicitly complete the body solely via the Plug.Conn interface. One cannot use Plug.Conn.chunk/2 with an empty chunk (since such a chunk is handled specially in that function), nor can one call Plug.Conn.send_resp/1 (since it require the plug to be in a :set state).

Note that this isn't an issue in common practice, since at the completion of a connection Cowboy will implicitly send a zero length chunk to close the connection (I do the same in bandit), but in both cases this is an implicit completion that only takes effect after a plug has completed execution. There remains no way to explicitly complete the body transmission without reaching down into the underlying adapter.

I see three possible solutions to this:

Allow sending empty chunks via the Plug.Conn.chunk/2 function, making it clear that such chunks close off the response body (this closely mirror the HTTP spec and is likely the least surprising thing for people coming from an HTTP background, though it may be surprising to consumers of the Plug API, and also breaks compatibility with the current implementation)
Allow the Plug.Conn.send_resp/1 function to also complete chunked responses
Define a new Plug.Conn.complete_chunked/1 (or similarly named) function, which closes off a chunked response. This is essentiallly a chunked sibling to the Plug.Conn.send_resp/1 call with explicit chunked response semantics.

I'm happy to work up any of those (or another!) solution, however I think it's worth a discussion in advance to ensure that it aligns with the API design goals of Plug.

ericmj · 2019-11-20T02:14:39Z

Can you explain the use case for explicitly completing the response before returning the conn back to plug?

josevalim · 2019-11-20T09:02:11Z

Hi @mtrudel! Thanks for the input.

I like that we don't close on the empty chunk. It is surprising behaviour and it may not be true in other implementations of chunking (for example, chunking on HTTP 3 can be completely different).

For now, it is totally OK to only complete the chunk when the request is over. We could introduce a function to do it explicitly but as @ericmj mentioned it would be nice to hear some use cases for such.

mtrudel · 2019-11-20T15:31:12Z

@josevalim I agree as well that passing an empty payload to chunk/2 is the wrong solution.

To clarify, I'm not actually in need of such functionality from a user perspective; I came across this in the process of implementing Plug.Conn.Adapter conformance for Bandit (a Plug-centric minimal HTTP server I'm building) and noting that the Plug.Conn API for managing the lifecycle of content-length delimited responses was inconsistent with its API for chunked encoding responses.

Specifically regarding end-of-request handling, content-length delimited responses are explicitly handled in the Plug.Cowboy adapter here, however the same is not done for chunked responses (in the case of Cowboy, it implicitly closes off the response with a zero length chunk after the handler returns).

Moreover, Cowboy.Handler is able to close off content-length delimited responses entirely via the Plug.Conn API since Plug.Conn.send_resp/1 is defined. While a handler can certainly accomplish this by appealing to its internal API (Bandit does this in its conn pipeline, for example), it seems inconsistent to me that closing off content-length responses is a Handler concern, whereas closing off chunked responses is left to the server to manage.

Essentially, I think the argument here is less 'is there a valid user case for this to exist' as it is to provide a consistent set of lifecycle functions for both content-length and chunked responses.

josevalim · 2019-11-20T16:02:26Z

Thanks for clarifying @mtrudel. I think your approach in Bandit is correct. We would have done the same for Cowboy if it automatically didn't close it for us. We could also do it automatically for Cowboy too and I think that wouldn't hurt. Does this make sense?

mtrudel · 2019-11-20T17:11:19Z

I agree that having the handler be explicit about the close at the end of the response is probably a good thing (it's already done for content-length responses, and what's good for the goose is good for the gander, as they say). In the spirit of being consistent, I think it's probably cleanest to do so via a Plug.Conn.complete_chunked/1 function (proposal 3 I outlined above) to provide API parity.

@josevalim if this sounds reasonable I'll happily work up a pair of PRs for the plug and plug_cowboy repos to implement explicit closing as you mention. If nothing else, it'll help the next person trying to make their own alternative to Plug.Cowboy.

josevalim · 2019-11-20T17:17:30Z

@mtrudel per the above, I don't think we should push this concern to the user API unless we have use cases. But we could add the explicit close to the cowboy handler for consistency. Thanks!

mtrudel · 2019-11-20T17:29:13Z

Sounds good; I'll get to work on that. Just so I've got clarity on the intent of the Plug.Conn.send_resp/1 function, is there a use case for it which doesn't also apply to chunked responses (ie: is its presence in the Plug.Conn API required or justified for non-chunked responses in a way that similar functionality for chunked responses is not, or is it more of a 'because it's already in there' case?) I'm not trying to be obstinate, rather I just want to ensure I have a solid mental model of the respective lifecycles from Plug's perspective to ensure I design the complementary parts of Bandit correctly.

josevalim · 2019-11-20T17:34:05Z

send_resp is meant to be used for content-length responses. It is meant to send the response through the socket immediately and it should not hold it in memory. This is a user facing API. It is a contrast to resp/3, which keeps things in memory.

The chunked workflow is a separate workflow which should not use send_resp (IIRC).

mtrudel · 2019-11-20T18:09:55Z

That makes a a lot of sense, thank you! I'd been thinking of send_resp/1 as having a 'send the body AND close off the request at an adapter level' semantic whereas it really just sends the body and doesn't necessarily close the underlying response at all. That's a most useful clarification.

I just realized that there may well be a reasonable justification for a Plug.Conn.complete_chunked/2 function after all, namely the ability to set trailer headers (hence the /2 on that signature). Currently there is no way to send trailer headers via the Plug API, however adding them would be a slightly larger undertaking, since it requires amending the Plug.Conn.Adapter behaviour to provide a mechanism to specify the trailers. If it helps to clarify the intent & semantics of such a call, Cowboy's cowboy_req:stream_trailers documentation has a solid description of how this affects the connection (note specifically that it implicitly ends the connection).

myronmarston commented Mar 31, 2017

View reviewed changes

lexmag reviewed Mar 31, 2017

View reviewed changes

Address code review feedback.

25db9ed

Extract optimized helper for checking if an iodata is empty.

d4cd194

lexmag requested changes Apr 1, 2017

View reviewed changes

Address code review feedback.

36c9ec4

josevalim changed the title ~~Do not send empty iodata chunks.~~ Do not send empty iodata chunks Apr 1, 2017

josevalim merged commit 91b8f57 into elixir-plug:master Apr 1, 2017

myronmarston deleted the myron/fix-chunk-bug branch June 19, 2017 20:31

mtrudel mentioned this pull request Oct 11, 2021

Implement support for HTTP trailers, upstream to Plug mtrudel/bandit#5

Closed

Gazler mentioned this pull request Nov 8, 2022

Need of a Plug.Conn.send_trailers/2 function #1121

Closed

Do not send empty iodata chunks #535

Do not send empty iodata chunks #535

Conversation

myronmarston commented Mar 31, 2017

sourcelevel-bot bot commented Mar 31, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jeregrine Mar 31, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jeregrine Mar 31, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lexmag Mar 31, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

josevalim commented Mar 31, 2017

josevalim commented Mar 31, 2017

myronmarston commented Mar 31, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

josevalim commented Apr 1, 2017 via email

myronmarston commented Apr 1, 2017

josevalim commented Apr 1, 2017

mtrudel commented Nov 20, 2019 • edited Loading

ericmj commented Nov 20, 2019 • edited Loading

josevalim commented Nov 20, 2019

mtrudel commented Nov 20, 2019 • edited Loading

josevalim commented Nov 20, 2019

mtrudel commented Nov 20, 2019

josevalim commented Nov 20, 2019

mtrudel commented Nov 20, 2019

josevalim commented Nov 20, 2019

mtrudel commented Nov 20, 2019

jeregrine Mar 31, 2017 •

edited

Loading

jeregrine Mar 31, 2017 •

edited

Loading

lexmag Mar 31, 2017 •

edited

Loading

mtrudel commented Nov 20, 2019 •

edited

Loading

ericmj commented Nov 20, 2019 •

edited

Loading

mtrudel commented Nov 20, 2019 •

edited

Loading