Description
The following code has an error in it:
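import s3fs
import xarray as xr
S3_DIR = 's3://my_bucket'  # note the "s3://" prefix
# storage_options (credentials etc.) is defined elsewhere
s3 = s3fs.S3FileSystem(**storage_options)
store = s3fs.S3Map(root=f'{S3_DIR}/my_zarr_store', s3=s3)
array = xr.open_zarr(store)['data']  # very slow, then raises KeyError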
The presence of "s3://" at the beginning of the root string causes the call to take a very long time (I did not time it precisely, but it was over 10 minutes) before failing with a KeyError saying that there is nothing at 'data'. That kind of error is often a clue of a permissions problem.
Without the "s3://" prefix, the same code returns quickly with my data.
I ran into this because I was also opening other files with dask, reusing the same prefixed S3_DIR, with code such as
import dask.dataframe as dd
df = dd.read_parquet(f'{S3_DIR}/my_data.parquet', storage_options=storage_options)
I know that this is not technically an xarray issue. However, it is the xarray call that bears the poor user experience, because s3fs returns the mapping without checking the root.
I was wondering whether the open_zarr function could be generous and inspect the root of the store in the case of s3fs access, and warn if 's3://' is detected.
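To make the request concrete, here is a minimal sketch of the kind of check I have in mind. The helper name `check_s3_root` is hypothetical and is not part of xarray or s3fs. It only inspects the root string, warns if the prefix is there, and returns the root without it.

```python
import warnings


def check_s3_root(root: str) -> str:
    """Hypothetical helper: warn on an 's3://' prefix and return root without it."""
    prefix = "s3://"
    if root.startswith(prefix):
        warnings.warn(
            f"root {root!r} starts with {prefix!r}; "
            "this may make opening the store very slow or fail with a KeyError",
            UserWarning,
            stacklevel=2,
        )
        # Drop the protocol so only 'bucket/path' remains.
        return root[len(prefix):]
    return root
```

In my example above, the result of `check_s3_root(f'{S3_DIR}/my_zarr_store')` could be passed as `root` to `s3fs.S3Map`. Whether such a check belongs in xarray or in s3fs is an open question.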
I am also wondering what the interaction issue is that causes it to take so long for the permission type error to be returned.
ping @martindurant in case you have thoughts from the s3fs side.