credentials error with distributed #90
Comments
How is the authorization happening in this case?
I originally called I am also trying to use a service account JSON token, but that doesn't work (see #89). These two issues are coupled.
OK, makes sense. Yes, the browser method makes a token which is cached in the file
There are no nodes. It's a local, threaded cluster.
Can you explicitly try
It works with
Excellent! So it seems that in the default case (
Isn't that what the
That check has higher demands: it lists the buckets to ensure the given bucket exists and tries to write to it too.
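The stricter check described above (list the buckets to confirm the target exists, then try a write) can be sketched roughly as follows. `check_bucket_access` is a hypothetical helper, not a gcsfs function; it assumes an fsspec-style filesystem object such as `gcsfs.GCSFileSystem` whose `ls("")` returns bucket names, as in the listing quoted in the issue.

```python
import uuid


def check_bucket_access(fs, bucket):
    """Hypothetical stricter credentials check (not a gcsfs API).

    1. List the buckets visible to these credentials and confirm
       `bucket` is among them.
    2. Write a small probe object into the bucket, then delete it.

    `fs` is assumed to be an fsspec-style filesystem (for example
    gcsfs.GCSFileSystem) exposing ls(), open() and rm().
    """
    # Normalise entries such as "pangeo", "pangeo/" or "gs://pangeo".
    visible = {entry.rstrip("/").split("/")[-1] for entry in fs.ls("")}
    if bucket not in visible:
        raise FileNotFoundError(f"bucket {bucket!r} is not visible with these credentials")

    # A unique name avoids clobbering real data or a concurrent probe.
    probe = f"{bucket}/.access-probe-{uuid.uuid4().hex}"
    with fs.open(probe, "wb") as f:
        f.write(b"ok")  # fails here if the credentials are read-only
    fs.rm(probe)
    return True
```

Listing alone only proves the credentials can read bucket metadata; the write-then-delete step is what catches read-only tokens.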
For example
@martindurant Can you provide some references on the best way to distribute Is there something like fsspec/s3fs#28 for gcsfs?
The general assumption was that each worker would perform its own authentication, so every worker would need one of the following: the google-default login correctly set up (via the CLI or Google config files in special locations), a metadata service all workers can reach, a local token file, or the special gcs cache file. The As you can see in #91, many workers attempting to authenticate at the same time appears to cause problems sometimes. In such a case, you may wish to distribute the token directly, assuming the network is secure. This is not documented, but #91 (comment) should work.
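One possible way to "distribute the token directly" is to read a service-account key once on the client and pass its contents to each task as an ordinary argument, so the filesystem is built on the worker from that dict instead of from a local token cache. This is a sketch, not necessarily what the linked #91 comment does: it assumes a gcsfs version whose `token=` parameter accepts a dict of service-account info, and `load_service_account`, `list_on_worker` and the key-file path are hypothetical names.

```python
import json

# Fields present in every Google service-account key file.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def load_service_account(path):
    """Read a service-account JSON key file into a plain, picklable dict."""
    with open(path) as f:
        info = json.load(f)
    missing = REQUIRED_KEYS - set(info)
    if missing:
        raise ValueError(f"not a service-account key; missing {sorted(missing)}")
    if info["type"] != "service_account":
        raise ValueError(f"unexpected credential type {info['type']!r}")
    return info


def list_on_worker(creds, path):
    """Runs inside a task: build the filesystem from the shipped credentials.

    Assumes GCSFileSystem(token=<dict>) is supported by the installed gcsfs.
    """
    import gcsfs  # imported lazily; only needed where the task actually runs

    fs = gcsfs.GCSFileSystem(project=creds["project_id"], token=creds)
    return fs.ls(path)


# Usage (hypothetical key path; anyone who can see task arguments can see the key):
#   from distributed import Client
#   client = Client(processes=False)  # local, threaded cluster
#   creds = load_service_account("service-account.json")
#   client.submit(list_on_worker, creds, "").result()
```

Passing the credentials explicitly avoids every worker racing to authenticate on its own, at the cost of sending the key over the scheduler/worker connections, hence the "secure network" caveat.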
@martindurant Thank you! If/when I get this working, would you accept a PR with documentation?
Of course!
@martindurant Per-worker browser-based authentication is a tough route, especially when used with dask-kubernetes or a Pangeo-style setup. I've been experimenting with JSON-based authentication using some combination of
If distributed + gcsfs were able to recognize the setup of the FUSE system as a required worker dependency, that would do the trick. Is there some hook we could customize to do this manually?
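For a per-worker setup hook, dask.distributed's worker plugins are one candidate: an object with a `setup(worker)` method, registered on the client, runs on every current worker and on workers that join later. The sketch below (a hypothetical `CredentialsPlugin` with a hypothetical default path) writes service-account credentials to a file on each worker and points `GOOGLE_APPLICATION_CREDENTIALS` at it, which is the first place `google.auth.default()`, and hence gcsfs's `token="google_default"`, looks. It only handles credentials, not a FUSE mount, and the credentials travel to the workers inside the pickled plugin, so the "secure network" caveat still applies.

```python
import json
import os

try:
    # Real base class in dask.distributed.
    from distributed.diagnostics.plugin import WorkerPlugin
except ImportError:  # lets the sketch be exercised without distributed installed
    WorkerPlugin = object


class CredentialsPlugin(WorkerPlugin):
    """Hypothetical plugin: install service-account credentials on each worker."""

    def __init__(self, creds, path="~/.config/gcsfs-worker/sa.json"):
        self.creds = creds  # a dict, pickled and shipped to every worker
        self.path = path    # hypothetical location on the worker's filesystem

    def setup(self, worker):
        path = os.path.expanduser(self.path)
        directory = os.path.dirname(path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        # Create the key file readable/writable by its owner only.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "w") as f:
            json.dump(self.creds, f)
        # google.auth.default() checks this variable first, so
        # GCSFileSystem(token="google_default") on this worker will use the key.
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path
```

Registration is `client.register_worker_plugin(CredentialsPlugin(creds))` on older distributed releases and `client.register_plugin(...)` on newer ones. On a local threaded cluster (`processes=False`) all workers share the client's process, so the environment variable is set process-wide.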
Once you have done browser authentication, you will have a
OK, thanks.
I am trying to use gcsfs via distributed in pangeo-data/pangeo#150. I have uncovered what seems like a serialization bug.
This works from my notebook (the token appears to be cached):
It returns the four buckets:
['pangeo', 'pangeo-data', 'pangeo-data-private', 'zarr_store_test']
Now I create a distributed cluster and client, and use them to run the same command:
I get the following error: