Fixes issue where forced retry would cause timeout. #241

Merged
merged 2 commits into google:master from houglum:fix-b-111486730
Aug 9, 2018

Conversation

@houglum (Contributor) commented Aug 9, 2018

Side note: This also fixes TravisCI builds that have been failing on
Python 3.3.

This fixes https://issuetracker.google.com/issues/111486730 -- if
using an expired access token from a cache, a forced retry will occur
and attempt to re-read bytes from the compression stream. This fails
because the bytes have already been consumed, and the server hangs
waiting for us to send the intended bytes from the first attempt. In
gsutil, this manifested itself as an SSLError with the message 'The read
operation timed out'.
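
To illustrate the failure mode, here's a minimal sketch with plain io.BytesIO standing in for the consumed compression stream (variable names are illustrative only):

    import io

    stream = io.BytesIO(b'example upload payload')

    first_attempt = stream.read()   # first request consumes everything
    forced_retry = stream.read()    # retry after the credential refresh

    print(first_attempt)  # b'example upload payload'
    print(forced_retry)   # b'' -- no bytes left, so the server hangs waiting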

While I'd normally avoid reading in an entire stream all at once, this
edge case should be fine because:

  • We start off with the entire (uncompressed) byte sequence anyway and
    only create the stream because that's the format needed for
    compression.
  • This is only used for uploads using the SIMPLE strategy. Note that
    this isn't an issue for resumable uploads, since the first request in
    that flow is to create the resumable upload session (which isn't a
    media transfer request, thus it doesn't try to consume any bytes from
    the stream), so the credential refresh (if needed) would happen on that
    request.

@houglum (Contributor, Author) commented Aug 9, 2018

@thobrla Adding you since you oversaw the original implementation of the gzip stuff. PTAL and sanity check :)

@coveralls

Coverage Status

Coverage increased (+0.009%) to 91.695% when pulling 261f66e on houglum:fix-b-111486730 into 72130a5 on google:master.

# bytes container.
http_request.body = (
    compression.CompressStream(
        six.BytesIO(http_request.body))[0].read())
Contributor
This reads the entire contents of the stream into memory, whereas the CompressStream/StreamingBuffer classes were designed specifically to avoid that. Should we instead fix this at the forced retry level? (i.e., that should not be calling read() without rewinding the stream or buffering the bytes that it reads)?
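
For reference, one shape the retry-level "buffer the bytes it reads" alternative could take; everything here is a hypothetical sketch, not apitools code:

    import io

    class ReplayableStream(object):
        """Hypothetical wrapper: remember bytes as they're read so a
        forced retry can replay them without a seekable source."""

        def __init__(self, stream):
            self._stream = stream          # underlying (non-seekable) stream
            self._consumed = io.BytesIO()  # everything handed out so far

        def read(self, size=-1):
            data = self._stream.read(size)
            self._consumed.write(data)     # save for a possible retry
            return data

        def rewind(self):
            # Replay the consumed bytes, then fall through to the
            # unread remainder of the original stream.
            self._stream = io.BytesIO(
                self._consumed.getvalue() + self._stream.read())
            self._consumed = io.BytesIO()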

Contributor Author

The comment on L736 mentions that "Both multipart and media request uploads will read the entire stream into memory" -- so this doesn't really change the behavior for the simple upload case, it just does the whole read now rather than later... right? Or am I reading this incorrectly?

@houglum (Contributor, Author) commented Aug 9, 2018

If we were calling CompressStream with some arguments that limited the length (like we do elsewhere, but not for simple media and multipart requests here), I'd agree with your point... but calling it with no argument for the length parameter will just read in the whole stream -- it writes the entire contents of the input stream to the output "stream" (the underlying StreamingBuffer), so the whole thing is just sitting there in memory waiting for someone to read from the underlying StreamingBuffer. Those reads may be performed incrementally, but they'll just knock bytes off of the underlying buffer with each read.

The only benefit we lose by converting this directly to a string is not being able to do that incremental reading that slowly reduces the memory used as the bytes are read off the underlying StreamingBuffer... and I'd argue that for large files where this actually matters, users should be utilizing the RESUMABLE strategy, rather than SIMPLE.

Thoughts?
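
To make the memory argument concrete, a rough stand-in using plain gzip/io (not the actual CompressStream/StreamingBuffer implementation) for the no-length-argument case:

    import gzip
    import io

    # Rough analogue of CompressStream with no length limit: the entire
    # input is compressed into an in-memory buffer up front.
    def compress_all(in_stream):
        out = io.BytesIO()
        with gzip.GzipFile(fileobj=out, mode='wb') as gz:
            gz.write(in_stream.read())  # whole input consumed right here
        out.seek(0)
        return out

    buf = compress_all(io.BytesIO(b'x' * 1024))
    # Later reads can be incremental, but the full compressed payload is
    # already sitting in memory; reading just drains it after the fact.
    chunk = buf.read(256)
    rest = buf.read()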

Contributor

Thanks - I agree; for SIMPLE this is fine.

@thobrla (Contributor) commented Aug 9, 2018

LGTM

@houglum merged commit d0bfc7a into google:master on Aug 9, 2018