signed urls #283
Comments
cc @danielballan @CJ-Wright. From the fsspec/gcsfs view, the implementation should certainly support producing signed URLs from current credentials, which is the PR above. You could also write an implementation which interfaces with some broker to do file listings and get signed HTTP URLs, and then use these. This is not proxying; it is a redirect (automatic in HTTP-land, or explicit in code). intake/intake#524 demonstrates a prototype of how you would rewrite the "urlpath" data source arg in the intake server to give signed URLs back.
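The "explicit redirect in code" option could be sketched roughly as below. Everything here is illustrative: `broker_sign` is a hypothetical stand-in for a service holding privileged credentials (in practice it might call gcsfs's signing support or the GCS signing API), and the signature value is a placeholder.

```python
# Sketch of the "explicit redirect" pattern: a broker translates bucket paths
# into signed HTTP URLs, which an unauthenticated client then fetches directly.
from urllib.parse import urlencode

def broker_sign(path: str, expires: int = 3600) -> str:
    # Hypothetical broker: in a real deployment this runs server-side with
    # privileged credentials and returns a genuine V4-signed URL for `path`.
    query = urlencode({"X-Goog-Expires": expires, "X-Goog-Signature": "..."})
    return f"https://storage.googleapis.com/{path}?{query}"

def rewrite_urlpath(gcs_path: str) -> str:
    # Explicit redirect in code: turn a gs:// urlpath into a plain HTTP URL,
    # much as the intake prototype rewrites "urlpath" before handing it back.
    assert gcs_path.startswith("gs://")
    return broker_sign(gcs_path[len("gs://"):])
```

The same rewrite could happen automatically via an HTTP 302 from the broker instead of in client code.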
If anyone wants to take over those PRs, please feel free. I'm not abandoning them, but they might not fit into my current time constraints.
Sorry to poke at this old issue, but I have a use case that seems similar to what was discussed during that Pangeo meeting. I am working on a backend that will allow users to download/upload files with a custom permission system that can't rely on GCP auth (custom logic in the backend). I haven't tried it, but signed URLs seem to be the perfect candidate for single files; I am not sure I can see that working for folders. One type of file we want users to download/upload is Zarr stores (also partitioned Parquet folders, for example). My understanding is that we would need some kind of logic within the Zarr reader that asks the server to provide a signed URL for any file it wants to access within the Zarr folder. Is that currently possible with zarr/parquet/fsspec/gcsfs? If not, where do you think would be the best place to contribute and add that logic (fsspec, it seems, since zarr and pandas support fsspec filesystems)?
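As a thought experiment, the per-file idea described here could look like a read-only key-value store whose every lookup asks the backend for a signed URL for that one key. This is a hypothetical sketch, not an existing zarr/fsspec API; the signing and fetching steps are injected as callables so nothing here touches the network.

```python
# Hypothetical zarr-style store: each key is resolved to its own signed URL.
from collections.abc import Mapping

class SignedURLStore(Mapping):
    """Read-only store where every key is fetched via a per-key signed URL."""

    def __init__(self, sign, fetch):
        self.sign = sign    # key -> signed HTTP URL (the custom backend's job)
        self.fetch = fetch  # url -> bytes (e.g. urllib.request.urlopen(url).read())

    def __getitem__(self, key):
        # One round-trip to the backend per object, then a plain GET.
        return self.fetch(self.sign(key))

    def __iter__(self):
        # Signed URLs do not grant listing permission, so the store cannot
        # enumerate its keys -- the "folders" problem mentioned above.
        raise NotImplementedError("no listing via signed URLs")

    def __len__(self):
        raise NotImplementedError
```

A reader that already knows which keys it needs can use such a store; anything that must enumerate children cannot.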
Firstly: URL signing is indeed implemented in gcsfs (method …). To upload, you would need to generate a POST signed URL, but it would still need some HTTP payload formatting like in gcsfs.core.simple_upload. There would be one URL per file, so maybe many for a zarr dataset. Having said all that, it should not be too complicated to make a subclass of GCSFileSystem which, upon write, defers to some other system to get a signed URL and then uses that - all the calls in gcsfs are HTTP. You might do the same for read. Zarr only accesses …
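A minimal sketch of the "one signed URL per file" write path, independent of gcsfs internals: walk a local directory (for instance a zarr store on disk) and PUT each file to its own signed URL. `get_signed_put_url` is an assumed broker call, and the actual network request is left commented out so the sketch stays self-contained.

```python
import os
import urllib.request

def get_signed_put_url(key: str) -> str:
    # Hypothetical broker call: a real one would hit an authenticated service
    # that returns a genuine signed PUT URL for this single object.
    return f"https://storage.googleapis.com/my-bucket/{key}?X-Goog-Signature=..."

def upload_tree(local_root: str, prefix: str) -> list[str]:
    # One signed URL per file: a zarr store is a directory of many small
    # objects, so a whole dataset means many signed URLs.
    uploaded = []
    for dirpath, _, files in os.walk(local_root):
        for name in files:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, local_root).replace(os.sep, "/")
            key = f"{prefix}/{rel}"
            with open(path, "rb") as f:
                req = urllib.request.Request(
                    get_signed_put_url(key), data=f.read(), method="PUT"
                )
            # urllib.request.urlopen(req)  # the actual upload, disabled here
            uploaded.append(key)
    return uploaded
```

A GCSFileSystem subclass as suggested above would do essentially this inside its write path, swapping the broker call in where gcsfs normally signs requests with its own credentials.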
Thanks @martindurant for the input! I did a POC and indeed, at least for …, it works. One thing is that signed URLs do not support listing objects (at least from what I saw). My understanding is that without a "listing objects" feature, zarr will always need to have a … Am I correct about that requirement?
For example, this zarr HTTP URL https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.3/9836842.zarr/ taken from https://github.com/ome/napari-ome-zarr does not seem to work when using … Maybe the broader question is whether HTTP (without listing capability) can work with zarr, given that this protocol does not really support a …
You are quite right: zarr (v2) has no way to know the children of a group except by listing, in the absence of consolidated metadata. In some situations, the expected arrays can be described elsewhere (such as OME), or you could do something with kerchunk to establish the layout and save it to another place. I haven't read the upstream documentation on URL signing recently, but it's possible that listing can be signed too. S3 does allow this, but using a different call and permissions than GET/PUT/POST for a single file; it is not currently implemented in s3fs.
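To make the listing point concrete, here is a plain-Python illustration (dicts and json only, no zarr dependency) of what consolidated metadata buys you: every per-node metadata document is copied under one well-known key, so a reader can discover the whole hierarchy with a single GET and no listing. The key names follow zarr v2 conventions; the helper functions are illustrative, not zarr's actual implementation.

```python
import json

def consolidate(store: dict) -> None:
    # Mimics the spirit of zarr v2's consolidate_metadata: gather every
    # metadata document under the single well-known key ".zmetadata".
    meta = {k: json.loads(store[k]) for k in store
            if k.rsplit("/", 1)[-1] in (".zgroup", ".zarray", ".zattrs")}
    store[".zmetadata"] = json.dumps(
        {"zarr_consolidated_format": 1, "metadata": meta}
    )

def children(store: dict) -> list:
    # A listing-free reader: one GET on ".zmetadata" reveals all arrays,
    # which is exactly what signed URLs without list permission require.
    meta = json.loads(store[".zmetadata"])["metadata"]
    return sorted({k.rsplit("/", 1)[0] for k in meta if k.endswith(".zarray")})

store = {
    ".zgroup": json.dumps({"zarr_format": 2}),
    "temp/.zarray": json.dumps({"zarr_format": 2, "shape": [4], "chunks": [2]}),
    "salt/.zarray": json.dumps({"zarr_format": 2, "shape": [4], "chunks": [2]}),
}
consolidate(store)
print(children(store))  # -> ['salt', 'temp']
```

With `.zmetadata` present, every subsequent access is a GET of a known key, so per-key signed URLs suffice for reading.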
At today's Pangeo meeting, we discussed the idea of using signed URLs to provide unauthenticated users with access to Google Cloud Storage, specifically for reading zarr stores. I'm creating this issue to track that idea.
I found this page in the gcs docs with some helpful advice.
https://cloud.google.com/storage/docs/access-control/signing-urls-manually
How would we go about "proxying" access to a restricted bucket via signed URLs? I can't quite wrap my head around it. Where would the signing happen? It couldn't happen from the user's notebook pod - it would have to happen within some service running with enhanced credentials. Could we connect it to JupyterHub's auth? Or to our Auth0 account?