Started getting random ValueError: Bad response: not multi-part exceptions #404

Closed
fcollman opened this issue Apr 2, 2021 · 9 comments · Fixed by #609
Labels
  • api: storage - Issues related to the googleapis/python-storage API.
  • priority: p2 - Moderately-important priority. Fix may not be included in next release.
  • type: bug - Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments


fcollman commented Apr 2, 2021


Environment details

  • OS type and version: Debian GNU/Linux 10 (buster)
  • Python version: Python 3.7.7
  • pip version: pip 21.0.1
  • google-cloud-storage version: 1.37.0

Steps to reproduce

  1. I cannot reproduce this because it happens periodically, and only in my production Flask app; it does not occur on the same Docker image running in staging with less traffic. This makes me think it's related to a threading issue, but I can't isolate the problem.
  2. The problem started without any code changes on our side and without changing the Docker image, so something in the Google Cloud Storage server response behavior changed and started triggering this around March 30, ramping up in frequency on April 1st and 2nd.

Code example

I don't have a concise code example, but it's being triggered by get_blob(key); a hypothetical reconstruction of the call pattern is sketched below.
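
A hypothetical reconstruction of the call pattern, based on the stack trace below (the bucket name, key list, and batch size are placeholders, not taken from the report):

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("example-bucket")  # placeholder bucket name

# Illustrative key list; in our case cloudvolume's files_exist() checks many
# mesh fragment paths at once.
keys = ["meshes/%d.frag" % i for i in range(10000)]

results = {}
with client.batch():
    # Each get_blob() call is deferred and sent as one multipart batch
    # request when the context exits; Batch.__exit__ -> finish() is where
    # the "Bad response: not multi-part" ValueError surfaces.
    for key in keys:
        results[key] = bucket.get_blob(key)
```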

Stack trace

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.7/site-packages/middle_auth_client/decorators.py", line 171, in decorated_function
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/middle_auth_client/decorators.py", line 253, in decorated_function
    return f(*args, **kwargs)
  File "./pychunkedgraph/app/meshing/v1/routes.py", line 75, in handle_get_manifest
    return common.handle_get_manifest(table_id, node_id)
  File "./pychunkedgraph/app/meshing/common.py", line 204, in handle_get_manifest
    flexible_start_layer=flexible_start_layer
  File "./pychunkedgraph/meshing/meshgen_utils.py", line 152, in get_highest_child_nodes_with_meshes
    existence_dict = stor.files_exist(filenames)
  File "/usr/local/lib/python3.7/site-packages/cloudvolume/storage/storage.py", line 584, in files_exist
    self.wait(desc)
  File "/usr/local/lib/python3.7/site-packages/cloudvolume/threaded_queue.py", line 234, in wait
    self._check_errors()
  File "/usr/local/lib/python3.7/site-packages/cloudvolume/threaded_queue.py", line 191, in _check_errors
    raise err
  File "/usr/local/lib/python3.7/site-packages/cloudvolume/threaded_queue.py", line 153, in _consume_queue
    self._consume_queue_execution(fn)
  File "/usr/local/lib/python3.7/site-packages/cloudvolume/threaded_queue.py", line 180, in _consume_queue_execution
    fn()
  File "/usr/local/lib/python3.7/site-packages/cloudvolume/storage/storage.py", line 575, in exist_thunk
    results.update(interface.files_exist(paths))
  File "/usr/local/lib/python3.7/site-packages/cloudvolume/storage/storage_interfaces.py", line 240, in files_exist
    result[file_path] = self._bucket.get_blob(key)
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/batch.py", line 296, in __exit__
    self.finish()
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/batch.py", line 281, in finish
    responses = list(_unpack_batch_response(response))
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/batch.py", line 334, in _unpack_batch_response
    raise ValueError("Bad response:  not multi-part")
ValueError: Bad response:  not multi-part
```


product-auto-label bot added the api: storage label Apr 2, 2021

fcollman commented Apr 2, 2021

We hacked a more informative debug statement into the cloud-storage library in production and got this error:

```html
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 413 (Request Entity Too Large)!!1</title>
  <!-- inline stylesheet and minified Closure script trimmed -->
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>413.</b> <ins>That’s an error.</ins>
  <p>Your client issued a request that was too large.
  <ins>That’s all we know.</ins>
```
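
The debug statement presumably dumped the raw response before the library's multipart check; a minimal sketch of that kind of hook (the function name and logging details are illustrative, not the actual patch):

```python
import logging

def log_non_multipart_response(response):
    """Illustrative debug hook: log the raw batch response when it is not
    multipart, so the real server error (here, a 413 HTML page) is visible
    instead of only "Bad response: not multi-part"."""
    content_type = response.headers.get("Content-Type", "")
    if not content_type.startswith("multipart/"):
        logging.error(
            "Batch response is not multipart: status=%s, body=%r",
            response.status_code,
            response.text[:2000],  # truncate large HTML error pages
        )
```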


fcollman commented Apr 2, 2021

This was not happening a month ago, and the error returned could be much more informative. It seems the server is enforcing limits it wasn't enforcing in the recent past, and the library isn't helping the user understand the source of the problem.

shaffeeullah added the priority: p2 and type: bug labels Apr 2, 2021
@danielduhh

Thanks for the details @fcollman. We've seen this intermittent 413 error pop up recently in another case. I'm checking in with the internal API team to see if there's a service change that could be causing this and will get back to you as soon as I have something.

danielduhh self-assigned this Apr 9, 2021
@danielduhh

Sorry for the late response. There are no changes from the API side that should cause this. @fcollman, can you confirm that 1) you're still intermittently running into this error, and 2) your batch request content size is not greater than the 10 MB maximum?
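
A hedged mitigation sketch for staying under that limit (the chunk size and helper name are illustrative, and per-blob error handling is omitted): split a large existence check into several smaller batches instead of deferring every get_blob() call into a single multipart request.

```python
from google.cloud import storage

def get_blobs_in_chunks(client, bucket, keys, chunk_size=100):
    """Issue get_blob() calls in several small batches instead of one huge one."""
    results = {}
    for start in range(0, len(keys), chunk_size):
        chunk = keys[start:start + chunk_size]
        # Each `with` block sends one multipart request whose payload stays
        # well below the 10 MB batch limit mentioned above.
        with client.batch():
            for key in chunk:
                results[key] = bucket.get_blob(key)
    return results

client = storage.Client()
existence = get_blobs_in_chunks(client, client.bucket("example-bucket"), ["a.frag", "b.frag"])
```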

tseaver added the type: question, priority: p2, and type: bug labels and removed the priority: p2, type: bug, and type: question labels Jul 13, 2021

tseaver commented Jul 13, 2021

I started down the path of concluding that this isn't a bug. However, the Batch.finish method makes a low-level API call and does not check the top-level response code, which is a bug: it shouldn't try to unpack the sub-responses if the overall batch failed.
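
A minimal sketch of the kind of check described above, written as a subclass for illustration only; aside from finish() and _unpack_batch_response(), which appear in the stack trace, the private helpers used here are assumptions about the library internals, not the actual patch.

```python
from google.api_core import exceptions
from google.cloud.storage.batch import Batch, _unpack_batch_response

class CheckedBatch(Batch):
    """Sketch: surface the top-level HTTP error before unpacking sub-responses."""

    def finish(self):
        # How the multipart body is built and sent mirrors the stock finish();
        # these private helpers are assumed, not verified against a specific
        # library version.
        headers, body = self._prepare_batch_request()
        url = "%s/batch/storage/v1" % self.API_BASE_URL
        response = self._client._base_connection._make_request(
            "POST", url, data=body, headers=headers
        )
        # The missing check: if the overall batch failed (e.g. with a 413),
        # raise a meaningful HTTP error instead of trying to parse an HTML
        # error page as multipart content.
        if response.status_code >= 300:
            raise exceptions.from_http_response(response)
        responses = list(_unpack_batch_response(response))
        self._finish_futures(responses)
        return responses
```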

@fcollman

Ah, sorry. We did track down a root cause: an expired service account credential. It wasn't clear from the error that this was the case, nor why the failure was so unreliable in nature; perhaps it has something to do with caching of credentials across GCS.
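
One way to make that failure mode show up earlier (a hedged sketch, independent of the storage library) is to refresh the service-account credentials explicitly at startup, so an expired or revoked key raises a clear auth error instead of surfacing later as a batch-parsing failure:

```python
import google.auth
from google.auth.exceptions import RefreshError
from google.auth.transport.requests import Request

# Eagerly refresh the application-default credentials so an expired or revoked
# service account key fails loudly here rather than as an opaque
# "Bad response: not multi-part" error deep inside a batch request.
credentials, project_id = google.auth.default()
try:
    credentials.refresh(Request())
except RefreshError as exc:
    raise RuntimeError("Service account credentials are invalid or expired: %s" % exc) from exc
```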

danielduhh removed their assignment Aug 25, 2021
cojenco self-assigned this Sep 30, 2021

cojenco commented Sep 30, 2021

@fcollman Are you still running into this issue after refreshing the service account credentials?

There's also a fix in v1.42.1 addressing intermittent 413 errors when using batch requests.

@fcollman

When we fix the service account credentials, the error goes away. I'd still argue that a JWT signature error or this ValueError isn't the clearest indication of the underlying issue.


cojenco commented Sep 30, 2021

Agreed. I've opened a tracking issue and will submit a fix shortly. Thanks for bringing this to our attention!
