Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storage: error on HTTP 206 Partial Content discards message leading to confusing errors #4222

Closed
evanj opened this issue Oct 18, 2017 · 3 comments
Assignees
Labels
api: storage Issues related to the Cloud Storage API. priority: p2 Moderately-important priority. Fix may not be included in next release. 🚨 This issue needs some love. status: awaiting information type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@evanj
Copy link
Contributor

evanj commented Oct 18, 2017

In download_to_file, if the download code raises resumable_media.InvalidResponse [1], it calls _raise_from_invalid_response, which calls exceptions.from_http_response, which creates a new exception using the HTTP response inside the resumable_media.InvalidResponse, which discards any exception message that was put there. This leads to confusing exceptions like the following (full stack trace below):

google.api.core.exceptions.GoogleAPICallError
206 GET https://www.googleapis.com/download/storage/v1/b/bucket_name/o/object_name?alt=media: ����eC�uXS@���..."

This shows a 206 SUCCESS code, but apparently it still failed for some reason. Digging through the code shows a number of reasons this can happen, such as if the response body length does not match the Content-Length header.

I'm not entirely sure how to fix this, but I would be willing to contribute a PR if someone can suggest a solution. My suggestion would be to change _raise_from_invalid_response to replace the message on the exception that is raised with the message from the InvalidResponse object itself (maybe formatted with ''.join(str(s for s in e.args))).

Details
OS type and version: Linux
Python version: Python 2.7.13
google-cloud-python version: 1.4.0 (but code inspection shows this still exists):

google-cloud-core==0.27.1
google-cloud-datastore==1.3.0
google-cloud-storage==1.4.0

Stacktrace:

File "app_code.py", line 123, in some_function
  src_blob.download_to_file(src_temp)
File "/usr/local/lib/python2.7/site-packages/google/cloud/storage/blob.py", line 466, in download_to_file
  _raise_from_invalid_response(exc)
File "/usr/local/lib/python2.7/site-packages/google/cloud/storage/blob.py", line 1605, in _raise_from_invalid_response
  raise exceptions.from_http_response(error.response)

Steps to reproduce

  • Somehow get a GCS download to return a 206 code then truncate the response. I don't actually know how to reproduce this.
@dhermes
Copy link
Contributor

dhermes commented Oct 18, 2017

A 206 just means it was a partial response (i.e. if you are downloading in chunks).

This isreally tough to debug / code for without being able to reproduce. If you at least see it again, the stacktrace in Python 3 would be more useful (due to raise from semantics), e.g.

In [1]: def f():
   ...:     raise ValueError
   ...:

In [2]: def g():
   ...:     try:
   ...:         f()
   ...:     except:
   ...:         raise RuntimeError
   ...:

In [3]: g()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-b4762a3d7a10> in g()
      2     try:
----> 3         f()
      4     except:

<ipython-input-1-1e378d1a95b8> in f()
      1 def f():
----> 2     raise ValueError

ValueError:

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-3-5fd69ddb5074> in <module>()
----> 1 g()

<ipython-input-2-b4762a3d7a10> in g()
      3         f()
      4     except:
----> 5         raise RuntimeError

RuntimeError:

In [4]:

Do you have a requirement to use 2.7?

It seems we should be doing at least two things here:

  • Keeping around the caught exception (rather than just it's status)
  • Using six.raise_from to give context

@dhermes dhermes added the api: storage Issues related to the Cloud Storage API. label Oct 18, 2017
@evanj
Copy link
Contributor Author

evanj commented Oct 18, 2017

Python 2.7: We have a large legacy App Engine Standard code base unfortunately. This particular thing does not run on App Engine, but it interacts with libraries that were written for 2.7, so unfortunately we need to use it.

However: Yes, I think using six.raise_from would effectively solve this problem. (not for me due to 2.7, but I can accept that Python 3 users will see the correct thing).

As you might expect: I've basically seen this once, but this mystery did kill a couple hours today, since I was trying to make sure we didn't have a thread-safety issue (due to the weirdness of throwing an exception for a success HTTP status code), and because the binary data in the exception caused a unicode handling error.

This code is downloading in chunks because I set the chunk size to 32 MiB at some point, maybe due to #2222. (Now that I look at this, I know the storage library has changed and its probably hurting us to do this, since it will now do it in a single request if I don't specify the chunk size ...)

The log for this error contained the follow extra information:

Starting new HTTPS connection (1): www.googleapis.com
https://www.googleapis.com:443 "GET /download/storage/v1/b/bucket/o/object?alt=media HTTP/1.1" 206 268412
... (many lines from chardet since the handler tries to access response.txt and the exception includes binary) ...

This particular object is 268412 bytes, so I suspect length wasn't the problem. Unfortunately, due to the way the exception is handled, I have no way to tell what actually caused it :(

@tseaver tseaver added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed priority: p2 Moderately-important priority. Fix may not be included in next release. labels Nov 2, 2017
@chemelnucfin chemelnucfin changed the title storage: error on HTTP 206 Partial Content discards message leading to confusing errors Storage: error on HTTP 206 Partial Content discards message leading to confusing errors Nov 3, 2017
@chemelnucfin chemelnucfin added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Jan 9, 2018
@JustinBeckwith JustinBeckwith added the 🚨 This issue needs some love. label Jun 8, 2018
@tseaver
Copy link
Contributor

tseaver commented Jun 14, 2018

It seems we should be doing at least two things here:

  • Keeping around the caught exception (rather than just it's status)
  • Using six.raise_from to give context

google.api_core.exceptions._from_http_response tries hard to preserve any error message or errors from the response's payload. The issue here is that there are no errors: a 206 response isn't an error.

If google.resumable_media is raising an InvalidResponse exception, ISTM that we should just let it propagate: there is no point in rewrapping it. @dhermes is the person who knew enough to argue otherwise, but he's on leave at the moment.

tseaver added a commit that referenced this issue Jun 14, 2018
When raised by 'google.resumable_media' for responses without an error
payload, the message and args are the only clue to the cause.

Closes #4222.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API. priority: p2 Moderately-important priority. Fix may not be included in next release. 🚨 This issue needs some love. status: awaiting information type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

5 participants