Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix for zarr.ZipStore #66

Closed
wants to merge 1 commit into from

Conversation

raphaeldussin
Copy link

intake-xarray is presently not able to work with zarr.ZipStore. The issue seems to be coming from the fsspec mapping. From what I understood of fsspec mappings, I am not sure if it is able to handle local zip files or if it's desirable. The proposed fix adds an option to bypass the mapper and use the local path to the file.

problem and solution can be tested with:

cat catalog.yml

plugins:
  source:
    - module: intake_xarray
sources:
  temp_directory:
    description: zarr directory
    driver: zarr
    args:
      urlpath: 'temp_dir'
      consolidated: False
    metadata:
      origin_url: ''
  temp_directory_consolidated:
    description: zarr directory consolidated
    driver: zarr
    args:
      urlpath: 'temp_dirc'
      consolidated: True
    metadata:
      origin_url: ''
  temp_zip:
    description: zarr zipstore
    driver: zarr
    args:
      urlpath: 'temp_zip.zip'
      consolidated: False
    metadata:
      origin_url: ''
  temp_zip_consolidated:
    description: zarr zipstore consolidated
    driver: zarr
    args:
      urlpath: 'temp_zipc.zip'
      consolidated: True
    metadata:
      origin_url: ''
  temp_zip_fix:
    description: zarr zipstore
    driver: zarr
    args:
      urlpath: 'temp_zip.zip'
      consolidated: False
      force_local: True
    metadata:
      origin_url: ''
  temp_zip_consolidated_fix:
    description: zarr zipstore consolidated
    driver: zarr
    args:
      urlpath: 'temp_zipc.zip'
      force_local: True
      consolidated: True
    metadata:
      origin_url: ''
import xarray as xr
import numpy as np
import zarr

ds = xr.Dataset()
ds['temp'] = xr.DataArray(data=np.random.rand(180, 360), dims=('lat', 'lon'))
ds['lon'] = xr.DataArray(data=np.arange(360), dims=('lon'))
ds['lat'] = xr.DataArray(data=np.arange(-90,90), dims=('lat'))

store = zarr.DirectoryStore('temp_dirc')
ds.to_zarr(store, consolidated=True, mode='w')

store = zarr.DirectoryStore('temp_dir')
ds.to_zarr(store, consolidated=False, mode='w')

store = zarr.ZipStore('temp_zipc.zip', mode='w')
ds.to_zarr(store, consolidated=True, mode='w')
store.close()

store = zarr.ZipStore('temp_zip.zip', mode='w')
ds.to_zarr(store, consolidated=False, mode='w')
store.close()

import intake

cat = intake.open_catalog('catalog.yml')

ds1 = cat.temp_directory.to_dask()
print(ds1)

ds2 = cat.temp_directory_consolidated.to_dask()
print(ds2)

try:
    ds3 = cat.temp_zip.to_dask()
    print(ds3)
except:
    print('failed to open entry temp_zip')

try:
    ds4 = cat.temp_zip_consolidated.to_dask()
    print(ds4)
except:
    print('failed to open entry temp_zip_consolidated')

ds5 = cat.temp_zip_fix.to_dask()
print(ds5)

ds6 = cat.temp_zip_consolidated_fix.to_dask()
print(ds6)

@martindurant
Copy link
Member

Using fsspec directly does work, although the incantation s a little strange:

  temp_zip:
    description: zarr zipstore
    driver: zarr
    args:
      urlpath: 'zip://'
      storage_options:
        fo: 'temp_zip.zip'
      consolidated: False

Just like with fsspec.open, fsspec.get_mapper ought to support compound URLs, in this case "zip::temp_zip.zip" (and no extra storage options), but this is not yet done.

@raphaeldussin
Copy link
Author

Hi @martindurant and thanks for the help. I'm trying this out rn and I can make it work for consolidated ZipStore but not un-consolidated. Any idea about what the problem might be:

  temp_zip_martin:
    description: zarr zipstore
    driver: zarr
    args:
      urlpath: 'zip://'
      storage_options:
        fo: 'temp_zip.zip'
      consolidated: False
    metadata:
      origin_url: ''

the error is:

KeyError: "There is no item named '.zattrs/.zarray' in the archive"

my hunch is that it is blind to the difference between directories and files and expect to find a .zarray under .zattrs which is just a file.

what do you think?

@martindurant
Copy link
Member

The exact same thing did work for me, unconsolidated. Are you on windows by any chance?

@raphaeldussin
Copy link
Author

my config is:

OSX Catalina
anaconda python 3.6.7
intake v 0.5.5
fsspec v 0.6.2

@martindurant
Copy link
Member

Would you mind trying with fsspec master (or indeed the chained_mapper branch)?

@raphaeldussin
Copy link
Author

raphaeldussin commented Apr 21, 2020

I've tried both and can't get it to work either. weird...

can you post your test?

@martindurant
Copy link
Member

!cat catalog.yml
sources:
  temp_zip:
    description: zarr zipstore
    driver: zarr
    args:
      urlpath: 'zip://'
      storage_options:
        fo: temp_zip.zip
      consolidated: False
    metadata:
      origin_url: ''

ds = xr.Dataset()
ds['temp'] = xr.DataArray(data=np.random.rand(180, 360), dims=('lat', 'lon'))
ds['lon'] = xr.DataArray(data=np.arange(360), dims=('lon'))
ds['lat'] = xr.DataArray(data=np.arange(-90,90), dims=('lat'))

store = zarr.ZipStore('temp_zip.zip', mode='w')
ds.to_zarr(store, consolidated=False, mode='w')
store.close().  # seems to require kernel restart here, write is incomplete

cat = intake.open_catalog('catalog.yml')
cat.temp_zip.to_dask()

@raphaeldussin
Copy link
Author

great! I'm gonna call docker to the rescue to make sure it's not due to something in my config.
will get back to you soon

@martindurant
Copy link
Member

Perhaps if you show me how you are calling docker, I can follow it. Obviously, it would be good to test this kind of thing here, but fsspec should be released first. Are you aware of #66 ? That might make things more consistent.

@raphaeldussin
Copy link
Author

@martindurant I'm going to create and point you to a github repo with my dockerfiles once I'm done with afternoon meetings. So far fsspec master works with the storage_option but your new branch (fsspec/filesystem_spec#282) does not work with the new syntax.

@martindurant
Copy link
Member

fsspec master works with the storage_option

This is progress!

@raphaeldussin
Copy link
Author

@martindurant
Copy link
Member

The chained version doesn't seem to install the right version

git+https://github.com/martindurant/filesystem_spec.git#chained_mapper
  • should that be a "@"?

In my case, to test, I did extra steps to be certain

git clone https://github.com/martindurant/filesystem_spec
cd filesystem_spec
git checkout chained_mapper
pip3 install -e
cd ..

@martindurant
Copy link
Member

(and it worked)

@raphaeldussin
Copy link
Author

@ and # keys are too close for my own good...
this works for me as well now.

the change to fsspec is enough for my application. This PR has no reason to be anymore.
many thanks @martindurant

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants