Unable to write into GCS bucket with papermill[gcs] #312

Closed
gogasca opened this issue Feb 14, 2019 · 22 comments

@gogasca
Contributor

gogasca commented Feb 14, 2019

When running papermill with GCSFS via papermill[gcs]:

papermill gs://my-bucket/test.ipynb gs://my-bucket/output/test.ipynb

I get an HTTP 429 error (rate limit exceeded).

Works if output notebook is written locally:

papermill gs://my-bucket/test.ipynb /tmp/test.ipynb

Local file size is: 57K

ls -alh /tmp/test.ipynb 
-rw-r--r--  1 gogasca  wheel    57K Feb 14 10:37 /tmp/test.ipynb

GCSFS reference fsspec/gcsfs#130

How to reproduce?

pip install papermill[gcs]
papermill gs://cloud-samples-data/papermill/samples/test.ipynb gs://<your bucket>/test.ipynb

Logs:

10
Ending Cell 6------------------------------------------
Exception gcsfs.utils.HtmlError: HtmlError(u'The total number of changes to the object dpe-sandbox/test.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.',) in <bound method GCSFile.__del__ of <GCSFile dpe-sandbox/test.ipynb>> ignored
Traceback (most recent call last):
  File "/usr/local/bin/papermill", line 11, in <module>
    sys.exit(papermill())
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/papermill/cli.py", line 165, in papermill
    cwd=cwd,
  File "/usr/local/lib/python2.7/dist-packages/papermill/execute.py", line 90, in execute_notebook
    start_timeout=start_timeout,
  File "/usr/local/lib/python2.7/dist-packages/papermill/engines.py", line 56, in execute_notebook_with_engine
    return self.get_engine(engine_name).execute_notebook(nb, kernel_name, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/papermill/engines.py", line 296, in execute_notebook
    nb = cls.execute_managed_notebook(nb_man, kernel_name, log_output=log_output, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/papermill/engines.py", line 352, in execute_managed_notebook
    preprocessor.preprocess(nb_man, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/papermill/preprocess.py", line 27, in preprocess
    nb, resources = self.papermill_process(nb_man, resources)
  File "/usr/local/lib/python2.7/dist-packages/papermill/preprocess.py", line 81, in papermill_process
    nb_man.cell_complete(nb.cells[index])
  File "/usr/local/lib/python2.7/dist-packages/papermill/engines.py", line 76, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/papermill/engines.py", line 219, in cell_complete
    self.save()
  File "/usr/local/lib/python2.7/dist-packages/papermill/engines.py", line 76, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/papermill/engines.py", line 138, in save
    write_ipynb(self.nb, self.output_path)
  File "/usr/local/lib/python2.7/dist-packages/papermill/iorw.py", line 280, in write_ipynb
    papermill_io.write(nbformat.writes(nb), path)
  File "/usr/local/lib/python2.7/dist-packages/papermill/iorw.py", line 82, in write
    return self.get_handler(path).write(buf, path)
  File "/usr/local/lib/python2.7/dist-packages/papermill/iorw.py", line 251, in write
    return f.write(buf)
  File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-150>", line 2, in close
  File "/usr/local/lib/python2.7/dist-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/gcsfs/core.py", line 1548, in close
    self.flush(force=True)
  File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-145>", line 2, in flush
  File "/usr/local/lib/python2.7/dist-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/gcsfs/core.py", line 1367, in flush
    self._simple_upload()
  File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-148>", line 2, in _simple_upload
  File "/usr/local/lib/python2.7/dist-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/gcsfs/core.py", line 1465, in _simple_upload
    validate_response(r, path)
  File "/usr/local/lib/python2.7/dist-packages/gcsfs/core.py", line 162, in validate_response
    raise HtmlError(error)
gcsfs.utils.HtmlError: The total number of changes to the object dpe-sandbox/test.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
+ err 'Processing notebook failed'
++ date +%Y-%m-%dT%H:%M:%S%z
+ echo '[2019-02-14T18:32:58+0000]: Processing notebook failed'
[2019-02-14T18:32:58+0000]: Processing notebook failed
+ exit 1

I already defined:

export GOOGLE_APPLICATION_CREDENTIALS=/keys/my-project.json
gcloud config set account XXXXXXX-compute@developer.gserviceaccount.com
gcloud auth activate-service-account --key-file=/keys/my-project.json

In a macOS environment I get similar errors (with added debugging output):

papermill gs://cloud-samples-data/papermill/samples/test.ipynb gs://dpe-sandbox/test.ipynb
Input Notebook:  gs://cloud-samples-data/papermill/samples/test.ipynb
Output Notebook: gs://dpe-sandbox/test.ipynb
('dpe-cloud-mle', 'full_control', None, None, 'none', None)
  0%|                                                                                                   | 0/28 [00:00<?, ?it/s]uploading
11317
uploading
11342
uploading
11372
  4%|███▎                                                                                       | 1/28 [00:01<00:50,  1.88s/it]uploading
11397
uploading
11427
  7%|██████▌                                                                                    | 2/28 [00:02<00:41,  1.59s/it]uploading
11452
uploading
11482
 11%|█████████▊                                                                                 | 3/28 [00:03<00:33,  1.32s/it]uploading
11507
uploading
11537
 14%|█████████████                                                                              | 4/28 [00:04<00:26,  1.10s/it]uploading
11562
429

429
Traceback (most recent call last):
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/engines.py", line 296, in execute_notebook
    nb = cls.execute_managed_notebook(nb_man, kernel_name, log_output=log_output, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/engines.py", line 352, in execute_managed_notebook
    preprocessor.preprocess(nb_man, kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/preprocess.py", line 27, in preprocess
    nb, resources = self.papermill_process(nb_man, resources)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/preprocess.py", line 81, in papermill_process
    nb_man.cell_complete(nb.cells[index])
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/engines.py", line 76, in wrapper
    return func(self, *args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/engines.py", line 219, in cell_complete
    self.save()
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/engines.py", line 76, in wrapper
    return func(self, *args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/engines.py", line 138, in save
    write_ipynb(self.nb, self.output_path)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/iorw.py", line 280, in write_ipynb
    papermill_io.write(nbformat.writes(nb), path)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/iorw.py", line 82, in write
    return self.get_handler(path).write(buf, path)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/iorw.py", line 251, in write
    return f.write(buf)
  File "</Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/decorator.py:decorator-gen-152>", line 2, in close
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 1552, in close
    self.flush(force=True)
  File "</Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/decorator.py:decorator-gen-147>", line 2, in flush
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 1369, in flush
    self._simple_upload()
  File "</Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/decorator.py:decorator-gen-150>", line 2, in _simple_upload
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 1467, in _simple_upload
    validate_response(r, path)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 163, in validate_response
    raise HtmlError(error)
gcsfs.utils.HtmlError: The total number of changes to the object dpe-sandbox/test.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/bin/papermill", line 10, in <module>
    sys.exit(papermill())
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/cli.py", line 165, in papermill
    cwd=cwd,
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/execute.py", line 90, in execute_notebook
    start_timeout=start_timeout,
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/engines.py", line 56, in execute_notebook_with_engine
    return self.get_engine(engine_name).execute_notebook(nb, kernel_name, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/engines.py", line 302, in execute_notebook
    nb_man.notebook_complete()
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/engines.py", line 76, in wrapper
    return func(self, *args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/engines.py", line 249, in notebook_complete
    self.save()
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/engines.py", line 76, in wrapper
    return func(self, *args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/engines.py", line 138, in save
    write_ipynb(self.nb, self.output_path)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/iorw.py", line 280, in write_ipynb
    papermill_io.write(nbformat.writes(nb), path)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/iorw.py", line 82, in write
    return self.get_handler(path).write(buf, path)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/papermill/iorw.py", line 251, in write
    return f.write(buf)
  File "</Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/decorator.py:decorator-gen-152>", line 2, in close
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 1552, in close
    self.flush(force=True)
  File "</Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/decorator.py:decorator-gen-147>", line 2, in flush
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 1369, in flush
    self._simple_upload()
  File "</Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/decorator.py:decorator-gen-150>", line 2, in _simple_upload
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 1467, in _simple_upload
    validate_response(r, path)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 163, in validate_response
    raise HtmlError(error)
gcsfs.utils.HtmlError: The total number of changes to the object dpe-sandbox/test.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
429
429
Exception ignored in: <bound method GCSFile.__del__ of <GCSFile dpe-sandbox/test.ipynb>>
Traceback (most recent call last):
  File "</Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/decorator.py:decorator-gen-153>", line 2, in __del__
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 1575, in __del__
    self.close()
  File "</Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/decorator.py:decorator-gen-152>", line 2, in close
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 1552, in close
    self.flush(force=True)
  File "</Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/decorator.py:decorator-gen-147>", line 2, in flush
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 1369, in flush
    self._simple_upload()
  File "</Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/decorator.py:decorator-gen-150>", line 2, in _simple_upload
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 1467, in _simple_upload
    validate_response(r, path)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 163, in validate_response
    raise HtmlError(error)
gcsfs.utils.HtmlError: The total number of changes to the object dpe-sandbox/test.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
429
429
Exception ignored in: <bound method GCSFile.__del__ of <GCSFile dpe-sandbox/test.ipynb>>
Traceback (most recent call last):
  File "</Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/decorator.py:decorator-gen-153>", line 2, in __del__
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 1575, in __del__
    self.close()
  File "</Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/decorator.py:decorator-gen-152>", line 2, in close
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 1552, in close
    self.flush(force=True)
  File "</Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/decorator.py:decorator-gen-147>", line 2, in flush
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 1369, in flush
    self._simple_upload()
  File "</Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/decorator.py:decorator-gen-150>", line 2, in _simple_upload
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 51, in _tracemethod
    return f(self, *args, **kwargs)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 1467, in _simple_upload
    validate_response(r, path)
  File "/Users/gogasca/Documents/Development/dpe/venv/papermill/lib/python3.6/site-packages/gcsfs/core.py", line 163, in validate_response
    raise HtmlError(error)
gcsfs.utils.HtmlError: The total number of changes to the object dpe-sandbox/test.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
@MSeal
Member

MSeal commented Feb 14, 2019

Do you know how the gcs rate limiting system works? We're emitting a save after each cell executes today. We could capture rate limiting requests and try to respect them but the number of saves here should be #cells + 2 which seems reasonable for most interfaces

@gogasca
Contributor Author

gogasca commented Feb 14, 2019

Looks like we are experiencing this: "There is no limit to how quickly you can create or update different objects in a bucket. However, a single particular object can only be updated or overwritten up to once per second. For example, if you have an object bar in bucket foo, then you should only upload a new copy of foo/bar about once per second. Updating the same object faster than once per second may result in 429 Too Many Requests errors"
https://cloud.google.com/storage/docs/key-terms#immutability Checking with Cloud Storage team
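
The once-per-second limit quoted above suggests one client-side mitigation: spacing writes to the same object at least one second apart. The following is only a minimal sketch of that idea, not what papermill or gcsfs actually do. The class name `MinIntervalThrottle` and its parameters are hypothetical, and the clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time


class MinIntervalThrottle:
    """Delay calls so that they happen at least `min_interval` seconds apart.

    Hypothetical helper for illustration; not part of papermill or gcsfs.
    """

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` before each save of the output notebook would keep writes to a single object under the documented rate, at the cost of slowing down notebooks whose cells finish in well under a second.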

@gogasca
Contributor Author

gogasca commented Feb 15, 2019

@frankyn

@MSeal
Member

MSeal commented Feb 15, 2019

We can modify the client wrapper to retry with a backoff on 429 in papermill. It sounds like that would resolve this issue?
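
For readers unfamiliar with the pattern, retrying with a backoff on 429 could look roughly like the sketch below. This is an illustration under stated assumptions, not the code that shipped in papermill: `RateLimitError` is a hypothetical stand-in for whatever exception the storage client raises on HTTP 429, and the retry count and delays are arbitrary.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 error from a storage client."""


def retry_with_backoff(func, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on RateLimitError, wait and retry with doubling delays.

    Delays are base_delay, 2 * base_delay, 4 * base_delay, ...
    After `retries` failed retries the last error is re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

With a one-second base delay, the first retry already waits as long as the documented per-object update interval, so a single retry is usually enough when only one writer touches the object.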

@frankyn

frankyn commented Feb 15, 2019

+1 @MSeal. After speaking with @gogasca, I think that makes the most sense in this case. It avoids modifying gcsfs to add a local cache, which may not work for everyone.

@MSeal
Member

MSeal commented Feb 16, 2019

Going to release 0.18.1 with the fix. Thanks for getting the issue resolved.

@MSeal MSeal closed this as completed Feb 16, 2019
@PaulSchnau

This issue seems to be happening again with gcsfs==0.3.0.

gcsfs==0.2.3 works fine though.

@MSeal
Member

MSeal commented Sep 4, 2019

Is this with the latest papermill release (1.1.0) or an earlier one?

@MSeal MSeal reopened this Sep 4, 2019
@PaulSchnau

Yes with papermill==1.1.0. I haven't tried other papermill versions.
On macOS with the same error message as above:

pip3 install gcsfs==0.3.0 papermill==1.1.0
papermill gs://cloud-samples-data/papermill/samples/test.ipynb gs://redacted/test.ipynb

@MSeal
Member

MSeal commented Sep 4, 2019

Ok thanks for the heads up. If no one else gets to it I can look at it this weekend. 1.1.0 has another minor bug that also needs addressing anyway.

@MSeal MSeal added the bug label Sep 4, 2019
@gogasca
Contributor Author

gogasca commented Sep 5, 2019

I tried to reproduce with the same versions and I see that in 0.3.0, Google Cloud is responding with a 429 first then a 410 error:

In 0.2.3 I see GCSFS sending:

https://www.googleapis.com:443 "POST /upload/storage/v1/b/dpe-sandbox/o?uploadType=multipart HTTP/1.1" 429 463

In 0.3.0:
https://www.googleapis.com:443 "POST /upload/storage/v1/b/dpe-sandbox/o?uploadType=resumable&upload_id=AEnB2Uo0y3-rNF5CNZ-nXPfhZRxnxrA1hw2Gb6Wl79eD2J7cMqH-4I-8wdr7pEIiUqK8n-GIdJuUMBDDJq_R84MpzpimRhtZuQ&uploadType=resumable HTTP/1.1" 429 463

  1. On the gcsfs side: checking why the behavior changed so that uploads are now sent as resumable, and how to avoid or fix this.
  2. On the Google side: the 410 error. References:

https://b.corp.google.com/issues/137168102
https://stackoverflow.com/questions/56907896/gcs-retry-on-410-gone-errors-in-blobwritechannel-flushbuffer

13c13
< gcsfs==0.2.3
---
> gcsfs==0.3.0
https://www.googleapis.com:443 "POST /upload/storage/v1/b/dpe-sandbox/o?uploadType=resumable&upload_id=AEnB2Uo0y3-rNF5CNZ-nXPfhZRxnxrA1hw2Gb6Wl79eD2J7cMqH-4I-8wdr7pEIiUqK8n-GIdJuUMBDDJq_R84MpzpimRhtZuQ&uploadType=resumable HTTP/1.1" 429 463
_call retrying after exception: The total number of changes to the object dpe-sandbox/test.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
https://www.googleapis.com:443 "POST /upload/storage/v1/b/dpe-sandbox/o?uploadType=resumable&upload_id=AEnB2Uo0y3-rNF5CNZ-nXPfhZRxnxrA1hw2Gb6Wl79eD2J7cMqH-4I-8wdr7pEIiUqK8n-GIdJuUMBDDJq_R84MpzpimRhtZuQ&uploadType=resumable HTTP/1.1" 410 463
_call retrying after exception: The total number of changes to the object dpe-sandbox/test.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
https://www.googleapis.com:443 "POST /upload/storage/v1/b/dpe-sandbox/o?uploadType=resumable&upload_id=AEnB2Uo0y3-rNF5CNZ-nXPfhZRxnxrA1hw2Gb6Wl79eD2J7cMqH-4I-8wdr7pEIiUqK8n-GIdJuUMBDDJq_R84MpzpimRhtZuQ&uploadType=resumable HTTP/1.1" 410 463
_call retrying after exception: The total number of changes to the object dpe-sandbox/test.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
https://www.googleapis.com:443 "POST /upload/storage/v1/b/dpe-sandbox/o?uploadType=resumable&upload_id=AEnB2Uo0y3-rNF5CNZ-nXPfhZRxnxrA1hw2Gb6Wl79eD2J7cMqH-4I-8wdr7pEIiUqK8n-GIdJuUMBDDJq_R84MpzpimRhtZuQ&uploadType=resumable HTTP/1.1" 410 463
_call retrying after exception: The total number of changes to the object dpe-sandbox/test.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
https://www.googleapis.com:443 "POST /upload/storage/v1/b/dpe-sandbox/o?uploadType=resumable&upload_id=AEnB2Uo0y3-rNF5CNZ-nXPfhZRxnxrA1hw2Gb6Wl79eD2J7cMqH-4I-8wdr7pEIiUqK8n-GIdJuUMBDDJq_R84MpzpimRhtZuQ&uploadType=resumable HTTP/1.1" 410 463
_call retrying after exception: The total number of changes to the object dpe-sandbox/test.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
https://www.googleapis.com:443 "POST /upload/storage/v1/b/dpe-sandbox/o?uploadType=resumable&upload_id=AEnB2Uo0y3-rNF5CNZ-nXPfhZRxnxrA1hw2Gb6Wl79eD2J7cMqH-4I-8wdr7pEIiUqK8n-GIdJuUMBDDJq_R84MpzpimRhtZuQ&uploadType=resumable HTTP/1.1" 410 463
_call out of retries on exception: The total number of changes to the object dpe-sandbox/test.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
Traceback (most recent call last):
  File "/home/gogasca/papermill_venv/lib/python3.7/site-packages/gcsfs/core.py", line 462, in _call
    validate_response(r, path)
  File "/home/gogasca/papermill_venv/lib/python3.7/site-packages/gcsfs/core.py", line 165, in validate_response
    raise HttpError(error)
gcsfs.utils.HttpError: The total number of changes to the object dpe-sandbox/test.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
_initiate_upload(args=(), kwargs={})
_call(args=('POST', 'https://www.googleapis.com/upload/storage/v1/b/dpe-sandbox/o'), kwargs={'uploadType': 'resumable', 'json': {'name': 'test.ipynb', 'metadata': None}})

In gcsfs 0.2.3:

\n   "output_path": "gs://dpe-sandbox/test.ipynb",\n   "parameters": {},\n   "start_time": "2019-09-05T04:10:29.358843",\n   "version": "1.1.0"\n  }\n },\n "nbformat": 4,\n "nbformat_minor": 0\n}\n--==0==--'})
https://www.googleapis.com:443 "POST /upload/storage/v1/b/dpe-sandbox/o?uploadType=multipart HTTP/1.1" 429 463
_call retrying after exception: The total number of changes to the object dpe-sandbox/test.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
https://www.googleapis.com:443 "POST /upload/storage/v1/b/dpe-sandbox/o?uploadType=multipart HTTP/1.1" 429 463
_call retrying after exception: The total number of changes to the object dpe-sandbox/test.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
https://www.googleapis.com:443 "POST /upload/storage/v1/b/dpe-sandbox/o?uploadType=multipart HTTP/1.1" 200 721
invalidate_cache(args=('dpe-sandbox',), kwargs={})

@MSeal
Member

MSeal commented Sep 5, 2019

Thanks for helping to look into it @gogasca !

@MSeal
Member

MSeal commented Sep 10, 2019

FYI @MichelleUfford was taking a look at this one. I got my gcsfs setup running on this computer to test once there's a fix.

@MSeal
Member

MSeal commented Sep 16, 2019

Neither @MichelleUfford nor I can reproduce the issue. Based on the changes in gcsfs (https://github.com/dask/gcsfs/pull/177/files), we're going to change the library to instead use https://github.com/dask/gcsfs/blob/master/gcsfs/utils.py#L124 on line https://github.com/nteract/papermill/blob/master/papermill/iorw.py#L320, so the upstream library can define retry conditions without us having to touch papermill when these change.
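
The design idea here, letting the upstream library decide which errors are worth retrying, can be sketched as a write wrapper that takes the decision as a predicate. The names below (`write_with_retries`, the `is_retriable` parameter) are hypothetical and chosen for illustration; the actual papermill and gcsfs code at the linked lines may differ.

```python
import time


def write_with_retries(write, is_retriable, max_attempts=4,
                       base_delay=0.5, sleep=time.sleep):
    """Call write(); retry only if is_retriable(exc) reports a transient error.

    `is_retriable` is supplied by the caller (for example, by the storage
    library), so the wrapper itself holds no list of status codes.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return write()
        except Exception as exc:
            # Give up on the last attempt or on errors the predicate rejects.
            if attempt == max_attempts or not is_retriable(exc):
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Because the predicate is passed in, a change in which responses the storage library treats as transient (such as the 410 seen with resumable uploads) would not require touching the wrapper.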

@MSeal
Member

MSeal commented Sep 21, 2019

I believe this is now fixed in 1.2.0, but I was unable to reproduce the issue to prove it. Can one of the reporters of the problem test with the latest papermill version and confirm if this issue can be closed again?

@MSeal MSeal closed this as completed Sep 30, 2019
@abdsamad1

abdsamad1 commented Sep 30, 2019

I am facing this issue with version 1.2.0. papermill is unable to write the output to gcs

@MSeal
Member

MSeal commented Sep 30, 2019

@abdsamad1 Could you open a new issue with details for your failed request (as much as you can share)? Details like the notebook, the rate of cell execution, the stack trace, consistency of failure (happens sometimes, every time, on Tuesdays), whether the failure occurs across buckets or only on a specific key, etc.

@j256

j256 commented Feb 4, 2020

For the record, I've heard from Google support about this. To quote:


As of right now, the issue is a bug and not a customer issue, and while a fix is on the way, there is a workaround that can be done on the customer’s side. The official workaround to circumvent 5xx and 410 errors is to implement retries, as was indicated in this comment from a Issue Tracker entry you have commented yourself (see https://issuetracker.google.com/issues/137168102#comment2). The retry method recommendation can also be seen here (https://issuetracker.google.com/issues/35903805#comment2).

To retry successfully, catching 500 and 410 errors is required and, as the official documentation recommends (https://cloud.google.com/storage/docs/json_api/v1/status-codes#410_Gone), implementing a retry by starting a new session for the upload that received an unsuccessful status code but still needs uploading. The new session creation may be what was missing on your end, causing retries to be unsuccessful as you have mentioned previously. Additionally, exponential backoffs recommended in comments (see https://issuetracker.google.com/35903805#comment2) are the way to go to mitigate the issue (see https://cloud.google.com/storage/docs/exponential-backoff ).
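
The advice quoted above (retry on 410 and 5xx, start a new upload session for each retry, back off exponentially) can be sketched as follows. This is a rough illustration, not Google's or gcsfs's implementation: `start_session` and `send` are hypothetical callables standing in for "initiate a resumable upload" and "upload the data, returning the HTTP status", and the delay values are assumptions.

```python
import time


def upload_with_fresh_sessions(start_session, send, max_attempts=5,
                               base_delay=1.0, max_delay=32.0,
                               sleep=time.sleep):
    """Upload via send(session), opening a new session on every attempt.

    Retries on HTTP 410 and 5xx with capped exponential backoff.
    Returns the session that succeeded; raises RuntimeError otherwise.
    """
    for attempt in range(max_attempts):
        # A failed resumable session is never reused: each attempt starts over.
        session = start_session()
        status = send(session)
        if status < 400:
            return session
        retriable = status == 410 or 500 <= status < 600
        if not retriable or attempt == max_attempts - 1:
            raise RuntimeError(f"upload failed with HTTP {status}")
        sleep(min(max_delay, base_delay * 2 ** attempt))
```

The point of the sketch is the placement of `start_session()` inside the loop: retrying against the same `upload_id`, as in the gcsfs 0.3.0 log earlier in this thread, keeps hitting the same dead session.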

@MSeal
Member

MSeal commented Feb 4, 2020

Thanks for the link @j256 ! We do have retries and exponential backoff on writes, but it sounds like that's not always sufficient either. Looking forward to the API finally getting fixed.

@informatica92

Hi all,
I am trying to execute a notebook and save the result to Google Cloud Storage. I found this issue, so perhaps someone here can explain what's happening.

In [2]: import papermill as pm
   ...:
   ...: pm_out = pm.execute_notebook(
   ...:    'covid-19.ipynb',
   ...:    'gs://customer-acquisition-bucket/training_outputs/output.ipynb',
   ...:    parameters=dict()
   ...: )
Executing:   0%|                                                                              | 0/29 [00:00<?, ?cell/s]C:\Users\dev999\AppData\Roaming\Python\Python37\site-packages\google\auth\_default.py:69: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a "quota exceeded" or "API not enabled" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/
  warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
Executing:  14%|█████████▋                                                            | 4/29 [00:07<00:57,  2.28s/cell]_call out of retries on exception: The rate of change requests to the object customer-acquisition-bucket/training_outputs/output.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\gcsfs\core.py", line 470, in _call
    validate_response(r, path)
  File "C:\ProgramData\Anaconda3\lib\site-packages\gcsfs\core.py", line 120, in validate_response
    raise HttpError(error)
gcsfs.utils.HttpError: The rate of change requests to the object customer-acquisition-bucket/training_outputs/output.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
Executing:  34%|███████████████████████▊                                             | 10/29 [00:46<00:50,  2.64s/cell]_call out of retries on exception: The rate of change requests to the object customer-acquisition-bucket/training_outputs/output.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\gcsfs\core.py", line 470, in _call
    validate_response(r, path)
  File "C:\ProgramData\Anaconda3\lib\site-packages\gcsfs\core.py", line 120, in validate_response
    raise HttpError(error)
gcsfs.utils.HttpError: The rate of change requests to the object customer-acquisition-bucket/training_outputs/output.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
Executing:  62%|██████████████████████████████████████████▊                          | 18/29 [01:28<00:22,  2.00s/cell]_call out of retries on exception: The rate of change requests to the object customer-acquisition-bucket/training_outputs/output.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\gcsfs\core.py", line 470, in _call
    validate_response(r, path)
  File "C:\ProgramData\Anaconda3\lib\site-packages\gcsfs\core.py", line 120, in validate_response
    raise HttpError(error)
gcsfs.utils.HttpError: The rate of change requests to the object customer-acquisition-bucket/training_outputs/output.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
Executing:  97%|██████████████████████████████████████████████████████████████████▌  | 28/29 [02:18<00:01,  1.90s/cell]_call out of retries on exception: The rate of change requests to the object customer-acquisition-bucket/training_outputs/output.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\gcsfs\core.py", line 470, in _call
    validate_response(r, path)
  File "C:\ProgramData\Anaconda3\lib\site-packages\gcsfs\core.py", line 120, in validate_response
    raise HttpError(error)
gcsfs.utils.HttpError: The rate of change requests to the object customer-acquisition-bucket/training_outputs/output.ipynb exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
Executing: 100%|█████████████████████████████████████████████████████████████████████| 29/29 [02:57<00:00,  6.12s/cell]

According to the output, it seems that papermill retries when the "rate of change requests exceeds the rate limit" error occurs. However, when I download the notebook from the bucket and try to open it locally in Jupyter, I am NOT ABLE to open it (so I think the notebook in Cloud Storage was not saved correctly by papermill).
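One possible way to avoid repeated writes to the same object, consistent with the earlier observation in this thread that writing the output locally works, is to execute into a temporary local file and upload it once at the end. The sketch below is an assumption-laden illustration: `execute_then_upload`, `execute`, and `upload` are hypothetical names, and the two callables are passed in rather than hard-coded.

```python
import os
import tempfile


def execute_then_upload(input_path, remote_path, execute, upload,
                        parameters=None):
    """Run a notebook into a local temp file, then upload it a single time.

    execute: hypothetical callable (input_path, local_output_path, parameters).
    upload: hypothetical callable (local_path, remote_path).
    """
    with tempfile.TemporaryDirectory() as tmp:
        local_out = os.path.join(tmp, os.path.basename(remote_path))
        # All intermediate saves go to the local file.
        execute(input_path, local_out, parameters or {})
        # The remote object is written only once, after execution finishes.
        upload(local_out, remote_path)
    return remote_path
```

In practice `execute` could wrap `papermill.execute_notebook` and `upload` could wrap the `put` method of a `gcsfs.GCSFileSystem` instance, but that wiring is untested here and left as an assumption.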

@informatica92

informatica92 commented Mar 31, 2020

The error I get is:

**Unreadable Notebook**: C:\Users\dev999\Jupyter notebooks\training_outputs_output (1).ipynb **UnicodeDecodeError**('utf-8', b'{\r\n "cells": [\r\n {\r\n "cell_type": "code",\r\n "execution_count": 1,\r\n "metadata": {\r\n "papermill": {\r\n "duration": 1.344472,\r\n "end_time":
...
acquisition-bucket/training_outputs/output.ipynb",\r\n "parameters": {},\r\n "start_time": "2020-03-31T10:30:55.461330",\r\n "version": "2.0.0"\r\n }\r\n },\r\n "nbformat": 4,\r\n "nbformat_minor": 4\r\n}', 8614, 8615, '**invalid continuation byte**')

@MSeal
Member

MSeal commented Mar 31, 2020

I have not hit such an error, but I don't use gcsfs consistently. You may need to open an issue on the gcsfs project.

That being said, some things to check are:

  • What versions of papermill and the Jupyter libraries are you using (conda list)?
  • Does the notebook save and load correctly when using the local filesystem as the write location?
  • Are there Unicode characters in the output that are causing an issue? Maybe saving the file through gcsfs is not persisting non-ASCII characters correctly (which would be weird).
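To help with the last check, a small sketch like the one below could locate the first byte that is not valid UTF-8 in the downloaded file. The helper name `find_invalid_utf8` and the `context` parameter are made up for this example and are not part of papermill or gcsfs.

```python
def find_invalid_utf8(data: bytes, context: int = 20):
    """Return (offset, surrounding_bytes) for the first invalid UTF-8
    sequence in data, or None if the data decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        lo = max(0, err.start - context)
        hi = min(len(data), err.end + context)
        return err.start, data[lo:hi]
    return None
```

Read the downloaded notebook in binary mode (`open(path, "rb").read()`) and pass the bytes in. Comparing the reported offset and surrounding bytes with a copy saved to the local filesystem should show whether the content was altered on the way to the bucket.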
