Fixes issue where forced retry would cause timeout. #241
Conversation
Side note: This also fixes TravisCI builds that have been failing on Python 3.3.

This fixes https://issuetracker.google.com/issues/111486730 -- if an expired access token is used from a cache, a forced retry will occur and attempt to re-read bytes from the compression stream. This fails because the bytes have already been consumed, and the server hangs waiting for us to send the bytes intended from the first attempt. In gsutil, this manifested itself as an SSLError with the message 'The read operation timed out'.

While I'd normally avoid reading an entire stream into memory at once, this edge case should be fine because:

- We start with the entire (uncompressed) byte sequence to begin with and only create the stream because that's the format needed for compression.
- This is only used for uploads using the SIMPLE strategy.

Note that this isn't an issue for resumable uploads, since the first request in that flow creates the resumable upload session (which isn't a media transfer request, so it doesn't try to consume any bytes from the stream), so the credential refresh (if needed) would happen on that request.
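(For illustration of the failure mode -- a minimal standard-library sketch, not the apitools code itself: once a stream has been consumed, a retry that re-reads it gets nothing back, which is what leaves the server waiting.)

import io

# Hypothetical stand-in for the compressed request body stream.
body = io.BytesIO(b"compressed request payload")

first_attempt = body.read()   # consumed by the first (failed) request
retry_attempt = body.read()   # forced retry re-reads the same stream

assert first_attempt == b"compressed request payload"
assert retry_attempt == b""   # nothing left to send, so the server just waits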
@thobrla Adding you since you oversaw the original implementation of the gzip stuff. PTAL and sanity check :)
# bytes container.
http_request.body = (
    compression.CompressStream(
        six.BytesIO(http_request.body))[0].read())
This reads the entire contents of the stream into memory, whereas the CompressStream/StreamingBuffer classes were designed specifically to avoid that. Should we instead fix this at the forced retry level (i.e., the retry should not call read() without rewinding the stream or buffering the bytes that it reads)?
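(A rough sketch of that alternative, assuming a seekable body stream; the helper and its names are hypothetical, not the actual apitools retry path:)

def send_with_rewind(stream, send_fn, max_attempts=2):
    # Hypothetical retry wrapper: remember where the body starts so a forced
    # retry re-reads the same bytes instead of an already-drained stream.
    start = stream.tell()
    for _ in range(max_attempts):
        stream.seek(start)
        response = send_fn(stream.read())
        if response.status_code < 500:   # hypothetical success/retry check
            return response
    return response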
The comment on line 736 mentions that "Both multipart and media request uploads will read the entire stream into memory" -- so this doesn't really change the behavior for the simple upload case; it just does the whole read now rather than later... right? Or am I reading this incorrectly?
If we were calling CompressStream with some arguments that limited the length (like we do elsewhere, but not for simple media and multipart requests here), I'd agree with your point... but calling it with no argument for the length parameter will just read in the whole stream -- it writes the entire contents of the input stream to the output "stream" (the underlying StreamingBuffer), so the whole thing is just sitting there in memory waiting for someone to read from the underlying StreamingBuffer. Those reads may be performed incrementally, but they'll just knock bytes off of the underlying buffer with each read.

The only benefit we lose by converting this directly to a string is not being able to do that incremental reading that slowly reduces the memory used as the bytes are read off the underlying StreamingBuffer... and I'd argue that for large files where this actually matters, users should be utilizing the RESUMABLE strategy rather than SIMPLE.
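(A rough standard-library stand-in for the trade-off described above -- the whole input gets compressed up front either way, and the only difference is whether the result is drained incrementally or read in one go. DrainingBuffer here is hypothetical, not the real StreamingBuffer:)

import gzip
import io

class DrainingBuffer(object):
    # Hypothetical stand-in for StreamingBuffer: each read removes bytes from
    # the front of the buffer, so memory use shrinks as data is consumed.
    def __init__(self, data):
        self._data = bytearray(data)

    def read(self, size=-1):
        if size < 0:
            size = len(self._data)
        chunk = bytes(self._data[:size])
        del self._data[:size]
        return chunk

source = io.BytesIO(b"x" * (1 << 20))

# With no length limit, the entire input is compressed immediately, so all of
# the compressed bytes are already sitting in memory before any read happens.
out = DrainingBuffer(gzip.compress(source.read()))

# Incremental reads only release that memory gradually; reading it all at once
# into a single bytes object (as this change does) gives up just that.
body = out.read()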
Thoughts?
Thanks - I agree; for SIMPLE this is fine.
LGTM