Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streaming and chunked responses #350

Closed
zimbatm opened this issue Dec 27, 2013 · 7 comments
Closed

Streaming and chunked responses #350

zimbatm opened this issue Dec 27, 2013 · 7 comments

Comments

@zimbatm
Copy link
Contributor

zimbatm commented Dec 27, 2013

Right now the response_stream only gets full chunks of size :chunk_size. This is an issue when trying to follow a continuous stream (like a stream of log lines) because in that case you want to get the latest chunk of data that arrived to forward it an not buffer it.

When going trough the code to find out the behavior I was also mislead by this argument parameter: https://github.com/geemus/excon/blob/master/lib/excon/socket.rb#L31 . max_length really should be called length if it's never returning more or less than the given size.

Also what's the point of using non-blocking IO if the control is blocked by the while loop ? In a single thread I don't see what advantage it's going to offer.

Ideally the response body would be a fake IO stream that can be read until the end of the response body (the connection can still be kept for keep-alive/pipelining) instead of using a block. That way the use can choose how to read the data and if he wants to block or not on it.

It would also solve another issue where the response_block is not called when the stream ends, giving no opportunity for the block to know when he can stop forwarding the data: https://github.com/geemus/excon/blob/master/lib/excon/response.rb#L73

@ghost
Copy link

ghost commented Dec 27, 2013

Right now the response_stream only gets full chunks of size :chunk_size. This is an issue when trying to follow a continuous stream (like a stream of log lines) because in that case you want to get the latest chunk of data that arrived to forward it an not buffer it.

In this case, you would want to set a smaller :chunk_size.

When going trough the code to find out the behavior I was also mislead by this argument parameter: https://github.com/geemus/excon/blob/master/lib/excon/socket.rb#L31 . max_length really should be called length if it's never returning more or less than the given size.

max_length could be changed to length. The method has the same behavior as IO#read. The last read could return less than the requested length.

Also what's the point of using non-blocking IO if the control is blocked by the while loop ? In a single thread I don't see what advantage it's going to offer.

True. But it's not uncommon for applications to use Excon in threads.

Ideally the response body would be a fake IO stream that can be read until the end of the response body (the connection can still be kept for keep-alive/pipelining) instead of using a block. That way the use can choose how to read the data and if he wants to block or not on it.

To do this Excon would have to buffer the entire response, which is what you're trying to avoid.

It would also solve another issue where the response_block is not called when the stream ends, giving no opportunity for the block to know when he can stop forwarding the data: https://github.com/geemus/excon/blob/master/lib/excon/response.rb#L73

The :response_block would not know, but your application will when the call returns. For example:

file = File.open('log', 'a')
block = lambda {|chunk, remaining, total| file.write chunk }
connection.get(:response_block => block)
file.close

@ghost
Copy link

ghost commented Dec 27, 2013

@zimbatm I'm not familiar with docker-api, but I took a quick look. Container#attach doesn't give you a way to set the chunk_size, but you could set Excon.defaults[:chunk_size] if that helps.

@tlunter
Copy link
Contributor

tlunter commented Dec 27, 2013

@burns Docker.options = {:chunk_size => AMOUNT} would pass this to Excon without overriding any other Excon defaults in case other systems are using Excon.

@ghost
Copy link

ghost commented Dec 27, 2013

@tlunter Thanks!

@zimbatm
Copy link
Contributor Author

zimbatm commented Dec 27, 2013

Maybe Excon::Socket should be using #sysread or #readpartial instead (which are more like the underlying posix read()). Unless :chunk_size is set to 1 there is always going to be a bit of buffered data and chunks of 1 byte aren't very practical either.

Ideally, docker would send a line of varying length and I would be able to read it at that time. Otherwise it's not really possible to have an interactive output. There's always going to be lines that are going to be cut in half or come together even if the emission time is separated by multiple seconds.

@phemmer
Copy link

phemmer commented Feb 14, 2014

I agree with the #readpartial idea. A streaming response should be able to process data as it comes in, not having to wait for the entire :chunk_size. And setting a :chunk_size of 1 is just horribly inefficient.

@phemmer
Copy link

phemmer commented Feb 14, 2014

Also what's the point of using non-blocking IO if the control is blocked by the while loop ? In a single thread I don't see what advantage it's going to offer.

True. But it's not uncommon for applications to use Excon in threads

I still don't follow on this. If @nonblock is true, it blocks in IO.select. If @nonblock is false, it blocks in @socket.read.

Single, or multi-threaded, the thread that calls this method is going to block no matter what (if @read_buffer is less than max_length), so the @nonblock option doesn't change the behavior and basically has no purpose.

@geemus
Copy link
Contributor

geemus commented Feb 17, 2014

read_nonblock has similar behavior to read partial (it shouldn't buffer) see: http://www.ruby-doc.org/core-2.1.0/IO.html#method-i-read_nonblock. In particular this section:

If the read byte buffer is not empty, #read_nonblock reads from the buffer like readpartial.

I guess I don't remember enough context around the rest to comment though.

@zimbatm
Copy link
Contributor Author

zimbatm commented Feb 17, 2014

With threads, control is released as soon as you make the syscall so it's not an issue but with nonblock control has to be released explicitly. The function has to return so that a central reactor like EventMachine can invoke other pending IO. Making an API that works with both blocking and non-blocking is tricky IMO.

@geemus
Copy link
Contributor

geemus commented May 29, 2014

I believe this is now fixed on master and in 0.34.0

@geemus geemus closed this as completed May 29, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants