can: add "auto-chunking" and client-side "chunkless-streaming" #281

Closed
jrudolph opened this Issue May 3, 2013 · 26 comments

6 participants

@jrudolph
spray member
jrudolph commented May 3, 2013

Let's extend this ticket to add "auto chunking", so we end up with these new spray-can settings:

  • incoming-auto-chunking-threshold-size (on server- and client-side):
    infinite (default): Incoming unchunked messages and chunks are delivered as is, when they have been received completely
    <value>: if the Content-Length of an incoming unchunked message is > value the server delivers incoming packets as MessageChunk instances right when they arrive

  • outgoing-auto-chunking-threshold-size (on server- and client-side):
    infinite (default): All messages and chunks are sent as is
    <value>: if the Content-Length of an outgoing unchunked message is > value the server automatically splits it into chunks before sending it out. This is especially useful when sending big files (see #365).

  • chunkless-streaming (on server- and client-side):
    off (default): All messages and chunks are sent as is
    on: The individual MessageChunk instances coming in from the application are sent as parts of the actual response entity rather than as true HTTP chunks.
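Sketched as an application.conf fragment, the proposal would look something like this (HOCON; the spray.can.server / spray.can.client config paths and the defaults shown are assumptions for illustration, not confirmed names):

```
spray.can {
  server {
    # deliver unchunked requests above this size as MessageChunks
    incoming-auto-chunking-threshold-size = infinite

    # split unchunked responses above this size into chunks before sending
    outgoing-auto-chunking-threshold-size = infinite

    # send application MessageChunks as plain entity bytes, not HTTP chunks
    chunkless-streaming = off
  }
  client {
    # the same three settings are proposed for the client side
    incoming-auto-chunking-threshold-size = infinite
    outgoing-auto-chunking-threshold-size = infinite
    chunkless-streaming = off
  }
}
```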

The current state is this:

  • incoming-auto-chunking-threshold-size
    • server-side
    • client-side
  • outgoing-auto-chunking-threshold-size
    • server-side
    • client-side
  • chunkless-streaming
    • server-side
    • client-side
@sirthias
spray member

[issue description replaced with this comment]

@acruise
acruise commented Jul 18, 2013

"receive big (non-chunked) requests as chunks" is exactly what I need right now. :) I don't think I care about distinguishing between requests that are literally "Transfer-Encoding: chunked" and those that are merely large. :)

@mpilquist

+1 for server-side auto-chunking. I need to support large file uploads and stream the data to disk. If the client doesn't send a chunked request, I want to auto-chunk to avoid heap-allocating the entire uploaded file.

@whataboutbob

+1. @jrudolph has done a great job adding support for server-side request auto-chunking and removing the 2GB maxContentLength limitation. After using it for a bit, I am wondering if it might be more helpful to enable auto-chunking per path/route instead of by threshold. The reason is that I would otherwise need to implement 2 entry points doing the same thing, except that one is in auto-chunking mode and the other handles a regular upload. Does this make sense?

@mpilquist

Follow up question - will incoming-auto-chunking-threshold-size be supported by spray-servlet?

@sirthias
spray member

I would need to implement 2 entry points doing the same thing except that one is auto-chunking mode and the other regular upload.

What would you do in "regular mode" when a client sends a chunked request?
Auto-chunking simplifies the application side logic because you only have to deal with one kind of request: chunked ones. So, I'm not sure that enabling auto-chunking per path would really buy you anything...

@sirthias
spray member

will incoming-auto-chunking-threshold-size be supported by spray-servlet?

I suspect not.
The only way for us to get to the request entity is via the httpServletRequest.getInputStream, which is a blocking abstraction. So I think it's hard for us to send a "virtual chunk" up to the application whenever a bunch of new bytes have arrived on the connection because we simply cannot know if and when this is the case.
All we see on the inputStream is EOF, which signals that we have read the complete request.

@whataboutbob

What would you do in "regular mode" when a client sends a chunked request?
Auto-chunking simplifies the application side logic because you only have to deal with one kind of request:
chunked ones. So, I'm not sure that enabling auto-chunking per path would really buy you anything...

I see your point, but perhaps I misunderstood the current implementation. If I understand it correctly, assuming incoming-auto-chunking-threshold-size = 45K, uploading a file < 45K triggers a regular HttpRequest (path 1); only when the file size is > 45K is auto-chunking (path 2) triggered. Can you confirm?

@sirthias
spray member

Yes.
In that regard my previous comment only applies to "large" requests, with "large" being defined by your application config. You could, however, configure incoming-auto-chunking-threshold-size to 0 to make it apply to all non-empty requests.

@whataboutbob

Correct, that's what I have already done, but I would also like the option to handle a different path's POST request in regular mode, and that's no longer possible, I think. E.g., something like this:

case HttpRequest(POST, Uri.Path("/user/avatar/small-upload"), _, _, _) =>
  ...

case s @ ChunkedRequestStart(HttpRequest(POST, Uri.Path("/user/attachments/big-upload"), _, _, _)) =>
  require(!chunkHandlers.contains(sender))
  val client = sender
  // spawn a dedicated handler for this upload and register it for this connection
  val handler = context.actorOf(Props(new FileUploadHandler(client, s)))
  chunkHandlers += (client -> handler)
  handler.tell(s, client)

case c: MessageChunk =>
  // forward the chunk to the handler registered for the sending connection, if any
  chunkHandlers.get(sender).foreach(_.tell(c, sender))

case e: ChunkedMessageEnd =>
  // forward the final message and deregister the handler
  chunkHandlers.get(sender).foreach { handler =>
    handler.tell(e, sender)
    chunkHandlers -= sender
  }
@mpilquist

Re: spray-servlet, if auto-chunking will never be supported, then I suppose the simplest alternative is to fall back to a manually written servlet for the resources that require auto-chunking. That's certainly possible but not particularly elegant.

The only way for us to get to the request entity is via the httpServletRequest.getInputStream, which is a blocking abstraction. So I think it's hard for us to send a "virtual chunk" up to the application whenever a bunch of new bytes have arrived on the connection because we simply cannot know if and when this is the case.
All we see on the inputStream is EOF, which signals that we have read the complete request.

I was thinking that if the Content-Length was over the auto-chunking threshold, the connector servlet could spawn a request-specific actor or future that pulls from the InputStream and generates chunks. It's somewhat risky, given that blocking on the input stream could lead to thread starvation, but perhaps better than no support at all?
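Stripped of the actor plumbing, the idea above boils down to a blocking read loop that slices the servlet InputStream into fixed-size "virtual chunks". This is only a sketch: streamInChunks and the onChunk callback are hypothetical names, and in a real connector this loop would have to run on a dedicated dispatcher to contain the blocking.

```scala
import java.io.{ByteArrayInputStream, InputStream}

// Hypothetical helper: repeatedly read from the (blocking) InputStream and
// hand each block of bytes to `onChunk`, the way a servlet connector might
// turn a large request body into MessageChunk-style "virtual chunks".
// Returns the total number of bytes read.
def streamInChunks(in: InputStream, chunkSize: Int)(onChunk: Array[Byte] => Unit): Long = {
  val buf = new Array[Byte](chunkSize)
  var total = 0L
  var read = in.read(buf)            // blocks until bytes arrive or EOF
  while (read != -1) {
    onChunk(java.util.Arrays.copyOf(buf, read)) // copy: the buffer is reused
    total += read
    read = in.read(buf)
  }
  total                              // EOF reached: the full entity was read
}
```

With chunkSize derived from the auto-chunking threshold, each callback invocation would be wrapped in a MessageChunk and forwarded to the application actor.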

@sirthias
spray member

I was thinking that if the Content-Length was over the auto-chunking threshold ...

Yes, but what about the cases where the incoming request is already a chunked one and there is no Content-Length header?

@sirthias
spray member

Correct, that's what I have already done but I would also like the option to handle a different path's POST request in regular mode and that's no longer possible ...

Note that the client might decide to send even "small" requests with Transfer-Encoding: chunked, so you really cannot make the distinction in the way you'd like to.
Can you give us some more insight into why you'd like to handle small uploads differently from large ones?

@whataboutbob

Can you give us some more insight into why you'd like to handle small uploads differently from large ones?

Based on my experience with web development, it is rare to use Transfer-Encoding: chunked mode; in fact, I had never needed it until this project, which allows uploading anything from very small files to huge files of up to 10GB. Hence my desire to handle files of all sizes for that one path using auto-chunking only. Other POST requests, however, work just fine without auto-chunking and are actually simpler to implement via the higher-level Spray routing. In essence, the auto-chunking need is a one-off rather than the norm when processing POST requests, at least in my experience. I would imagine that if the client enables Transfer-Encoding: chunked mode, it knows that that particular endpoint is capable of handling chunks; otherwise it would send a regular POST request.

@sirthias
spray member

I would imagine that if the client enables Transfer-Encoding: chunked mode that it knows that that particular endpoint is capable of handling chunks, otherwise it would be a regular POST request.

Unfortunately this is not the case. HTTP/1.1 requires all servers to be able to handle chunked requests and all clients to accept chunked responses. Otherwise they cannot call themselves HTTP/1.1 compliant.
So, as a server you will have to be able to deal with incoming chunked requests no matter what.

We do realize that the current support for non-aggregated chunked requests is suboptimal in spray-routing. We are working on getting this improved. Ideally you should be able to use spray-routing on non-aggregated chunked requests pretty much as you would otherwise.

@whataboutbob

Unfortunately this is not the case. HTTP/1.1 requires all servers to be able to handle chunked requests and all
clients to accept chunked responses. Otherwise they cannot call themselves HTTP/1.1 compliant.

Thanks for clarifying. Given my situation, should I just wait for the upcoming non-aggregated chunked requests feature or try to make the current solution work?

@jrudolph
spray member

Going forward, I'd say outgoing-auto-chunking-threshold-size is the least important setting, because it is nice-to-have functionality that can also be implemented at the higher levels.

chunkless-streaming on the client side wouldn't be necessary for connecting to HTTP/1.1 servers, as those servers are required to understand chunked requests; however, for compatibility reasons it may still make sense to provide it as a fallback.

So in essence, I'd consider client-side chunkless-streaming for 1.0/1.1/1.2 and maybe postpone outgoing-auto-chunking-threshold-size to the next version.

WDYT?

@sirthias
spray member

Agreed.

@sirthias
spray member

@acruise server-side chunkless streaming has already been available for some time

@acruise
acruise commented Sep 28, 2013

Cool, is it in M8? :)

@sirthias
spray member

Yes, use this setting to enable it.

@jrudolph jrudolph added a commit to jrudolph/spray that referenced this issue Oct 14, 2013
@jrudolph jrudolph + can: implement chunkless request streaming, refs #281 2d446b2
@jrudolph jrudolph added a commit to jrudolph/spray that referenced this issue Oct 15, 2013
@jrudolph jrudolph + can: implement chunkless request streaming, refs #281 67f953d
@jrudolph
spray member

When #592 is merged, the only thing missing is outgoing-auto-chunking-threshold-size, which we deemed less important. So I'm moving the remainder of the ticket to the next milestone.

@RichardBradley

I'm trying to use incoming-auto-chunking-threshold-size to stream in large requests, while using only bounded local memory, similar to mpilquist's comment.

My backing store is slow (a downstream SOAP service).

I am finding that Spray is slurping in the whole request as fast as it can, and holding the whole thing in local memory as MessageChunks. I've put some details on StackOverflow.

Am I right in thinking that this feature only actually satisfies the stated use case if you can process the chunks faster than the client can send them? If so, should we add that to the docs for this setting, or is it simply implied?

Is there anything I can do to limit memory usage if I have a fast connection to the client but a slow backing store?

@jrudolph
spray member

@RichardBradley this is a question that came up before. You are quite right with your observations. See this ML message and jrudolph@e707a2d for a hint at a solution with the current version of spray.

tl;dr: You can use SuspendReading/ResumeReading to stop the inflow of data and it somewhat works but isn't optimal. Upcoming akka-http is going to have a much better solution (but won't be immediately available).
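A minimal model of that SuspendReading/ResumeReading pattern, with the watermark logic made explicit. The class and watermark names are illustrative; in spray you would send akka.io.Tcp's SuspendReading / ResumeReading commands to the connection actor instead of flipping a flag.

```scala
import scala.collection.mutable

// Toy model of back-pressuring a fast client against a slow backing store:
// buffer incoming chunks, "suspend reading" when the buffer grows past a
// high watermark, and "resume reading" once draining brings it back down.
class ChunkBuffer(highWatermark: Int, lowWatermark: Int) {
  private val queue = mutable.Queue.empty[Array[Byte]]
  var readingSuspended = false // stands in for having sent SuspendReading

  def offer(chunk: Array[Byte]): Unit = {
    queue.enqueue(chunk)
    if (queue.size >= highWatermark) readingSuspended = true   // SuspendReading
  }

  def drainOne(): Option[Array[Byte]] =
    if (queue.isEmpty) None
    else {
      val c = queue.dequeue()
      if (queue.size <= lowWatermark) readingSuspended = false // ResumeReading
      Some(c)
    }
}
```

The drain side would be driven by completions from the slow backing store (the SOAP service in this case), so memory stays bounded by roughly highWatermark chunks.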

@jrudolph
spray member
jrudolph commented Feb 6, 2015

We won't implement outgoing auto-chunking in spray anymore. In akka-http it is naturally supported through the new streaming HttpEntity model.

@jrudolph jrudolph closed this Feb 6, 2015