open_zarr hangs if 's3://' at front of root s3fs string #2740

@birdsarah

The following code has an error in it:

import s3fs
import xarray as xr

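S3_DIR = 's3://my_bucket'  # the leading 's3://' here is the mistake

# storage_options is a dict of S3 credentials/settings defined elsewhere
s3 = s3fs.S3FileSystem(**storage_options)
store = s3fs.S3Map(root=f'{S3_DIR}/my_zarr_store', s3=s3)
array = xr.open_zarr(store)['data']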

The presence of "s3://" at the beginning of the string causes the open_zarr line to take a very long time (I don't have the exact time to hand, but over 10 minutes) before it returns with a KeyError saying that there is nothing at 'data', which is often a clue of a permissions error.

Without the "s3://" this returns quickly with my data.
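As a workaround on the user side, the prefix can be stripped before the string reaches S3Map. This is only a minimal sketch: `strip_s3_scheme` is a hypothetical helper written for illustration, not part of s3fs or xarray.

```python
def strip_s3_scheme(path: str) -> str:
    """Return `path` without a leading 's3://'.

    Hypothetical helper for illustration; not part of s3fs or xarray.
    """
    prefix = "s3://"
    # Only a leading scheme is removed; anything else is returned unchanged.
    return path[len(prefix):] if path.startswith(prefix) else path


S3_DIR = "s3://my_bucket"
root = strip_s3_scheme(f"{S3_DIR}/my_zarr_store")
print(root)  # my_bucket/my_zarr_store
```

The resulting `root` can then be passed to `s3fs.S3Map(root=root, s3=s3)`, while `S3_DIR` keeps its prefix for the dask calls.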

I made this mistake because I was also opening other files with dask, using code such as:

import dask.dataframe as dd

df = dd.read_parquet(f'{S3_DIR}/my_data.parquet', storage_options=storage_options)

I know that this is not technically an xarray issue. However, it is on the xarray line that the user experience suffers, because s3fs just returns without doing any checking.

I was wondering whether the open_zarr function could be generous and inspect the root argument in the case of s3fs access and warn if 's3://' is detected.
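To make the suggestion concrete, a check along these lines is what I have in mind. This is a rough sketch, not xarray code: `warn_if_s3_scheme` is a hypothetical name, and it assumes the store object exposes its root path as a `root` attribute (read via `getattr`, so objects without one are skipped).

```python
import warnings


def warn_if_s3_scheme(store) -> bool:
    """Warn if a mapping-style store was created with an 's3://' root.

    Hypothetical helper for illustration. Assumes the store keeps its
    root path in a `root` attribute; stores without one are ignored.
    """
    root = getattr(store, "root", None)
    if isinstance(root, str) and root.startswith("s3://"):
        warnings.warn(
            f"store root {root!r} starts with 's3://'; "
            "try passing the path without the scheme prefix",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

Something like this could run before the store is read, so the user gets an immediate hint instead of waiting for the eventual KeyError.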

I am also wondering what interaction causes the permission-type error to take so long to be returned.

ping @martindurant in case you have thoughts from the s3fs side.