Zarr plugin #7

Merged: 5 commits on Apr 27, 2018

martindurant commented Apr 26, 2018

Example

In [1]: %cat cat.yaml
sources:
  newman:
    description: Newman Ensemble
    driver: zarr
    args:
      urlpath: "gcs://pangeo-data/newman-met-ensemble/"

In [2]: import intake

In [3]: cat = intake.Catalog('cat.yaml')

In [4]: cat.newman.read_chunked()
Out[4]:
<xarray.Dataset>
Dimensions:    (ensemble: 9, lat: 224, lon: 464, time: 12054)
Coordinates:
  * lat        (lat) float64 25.06 25.19 25.31 25.44 25.56 25.69 25.81 25.94 ...
  * lon        (lon) float64 -124.9 -124.8 -124.7 -124.6 -124.4 -124.3 ...
  * time       (time) datetime64[ns] 1980-01-01 1980-01-02 1980-01-03 ...
Dimensions without coordinates: ensemble
Data variables:
    elevation  (ensemble, lat, lon) float64 dask.array<shape=(9, 224, 464), chunksize=(1, 224, 464)>
    mask       (ensemble, lat, lon) int32 dask.array<shape=(9, 224, 464), chunksize=(1, 224, 464)>
    pcp        (ensemble, time, lat, lon) float64 dask.array<shape=(9, 12054, 224, 464), chunksize=(1, 287, 224, 464)>
    t_max      (ensemble, time, lat, lon) float64 dask.array<shape=(9, 12054, 224, 464), chunksize=(1, 287, 224, 464)>
    t_mean     (ensemble, time, lat, lon) float64 dask.array<shape=(9, 12054, 224, 464), chunksize=(1, 287, 224, 464)>
    t_min      (ensemble, time, lat, lon) float64 dask.array<shape=(9, 12054, 224, 464), chunksize=(1, 287, 224, 464)>
    t_range    (ensemble, time, lat, lon) float64 dask.array<shape=(9, 12054, 224, 464), chunksize=(1, 287, 224, 464)>
Attributes:
    _ARRAY_DIMENSIONS:         {'ensemble': 9, 'lat': 224, 'lon': 464, 'time'...
    history:                   Version 1.0 of ensemble dataset, created Decem...
    institution:               National Center for Atmospheric Research (NCAR...
    nco_openmp_thread_number:  1
    references:                Newman et al. 2015: Gridded Ensemble Precipita...
    source:                    Generated using version 1.0 of CONUS ensemble ...
    title:                     CONUS daily 12-km gridded ensemble precipitati...

martindurant commented Apr 26, 2018

(Branched from #6; the commit log will look more reasonable after that is merged.)

    kwargs:
        Further parameters are passed to xr.open_zarr
    """
    from intake_xarray.xzarr import ZarrSource

Member

Can we move this import to the beginning of the file?

Member Author

No! We don't want to import until necessary, because otherwise import intake would also import this module and its dependencies, and so take much longer.

Member

Okay. I figured that might be why.
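
The deferred import discussed in this thread can be illustrated with a minimal sketch. The class name LazySource, its methods, and the URL are hypothetical, and the standard-library module colorsys is only a stand-in for a slow dependency such as xarray; none of this is code from the PR.

```python
import sys


class LazySource:
    """Hypothetical source that imports its backend only on first use."""

    def __init__(self, urlpath):
        # Cheap constructor: no heavy import happens here.
        self.urlpath = urlpath
        self._backend = None

    def _open(self):
        if self._backend is None:
            # Deferred import; colorsys stands in for a slow dependency.
            import colorsys
            self._backend = colorsys
        return self._backend

    def backend_name(self):
        # Any method that needs the backend goes through _open().
        return self._open().__name__


src = LazySource("gcs://bucket/path")  # hypothetical URL
print(src._backend is None)            # constructor imported nothing
print(src.backend_name())              # first use triggers the import
print("colorsys" in sys.modules)       # the module is now loaded
```

The trade-off is that an import error surfaces at first use rather than at import time, which seems acceptable here given that the plugin's dependencies are optional.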

        self._mapper = get_mapper(protocol, self._fs, urlpath)
        self._ds = xr.open_zarr(self._mapper, **self.kwargs)

    def _get_schema(self):

Member

Can we take the methods from here down and combine them with netcdf?

Member Author

Makes sense.
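
One way the suggested consolidation could look is a shared base class in which only the "open" step differs per format. This is a rough sketch under assumptions: every class name below is hypothetical, and plain dicts stand in for xarray datasets so that the example runs without xarray installed.

```python
class XarrayLikeBase:
    """Hypothetical base class holding the methods both sources share."""

    def __init__(self, urlpath, **kwargs):
        self.urlpath = urlpath
        self.kwargs = kwargs
        self._ds = None

    def _open_dataset(self):
        # The only format-specific step; each subclass overrides it.
        raise NotImplementedError

    def _get_schema(self):
        # Shared: open on first call, then describe what was found.
        if self._ds is None:
            self._open_dataset()
        return {"variables": sorted(self._ds)}

    def read(self):
        self._get_schema()
        return self._ds

    def close(self):
        self._ds = None


class DemoZarrSource(XarrayLikeBase):
    def _open_dataset(self):
        # Real code would call xr.open_zarr(...); a dict stands in here.
        self._ds = {"pcp": [0.0], "t_max": [1.0]}


class DemoNetCDFSource(XarrayLikeBase):
    def _open_dataset(self):
        # Stand-in for opening a netCDF file with xarray.
        self._ds = {"elevation": [10.0]}


print(DemoZarrSource("a")._get_schema())
print(DemoNetCDFSource("b").read())
```

With this layout, adding another xarray-backed format would mean writing one small subclass rather than repeating the schema and read logic.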



def get_mapper(protocol, fs, path):
    if protocol == 's3':

Member

How many more protocols do you think there will be?

Member Author

hdfs3 has a mapper, and I am not aware of any others. You may have noticed that https://github.com/martindurant/filesystem_spec contains a mapper, so with any luck it should "just work" for any file system meeting the spec, but that's a long-term goal.
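
If the list of protocols does grow, the if/elif chain could be replaced by a small registry. The sketch below is an assumption about how that might look, not the PR's implementation: PrefixMapper and MAPPERS are hypothetical names, and a plain dict stands in for a file-system object so that the example is self-contained. Real entries would point at the mapper classes shipped by each file-system library.

```python
from collections.abc import MutableMapping


class PrefixMapper(MutableMapping):
    """Hypothetical key-value view of a 'file system' (here a plain dict)."""

    def __init__(self, fs, root):
        self.fs = fs
        self.root = root.rstrip("/")

    def _full(self, key):
        return f"{self.root}/{key}"

    def __getitem__(self, key):
        return self.fs[self._full(key)]

    def __setitem__(self, key, value):
        self.fs[self._full(key)] = value

    def __delitem__(self, key):
        del self.fs[self._full(key)]

    def __iter__(self):
        prefix = self.root + "/"
        return (k[len(prefix):] for k in self.fs if k.startswith(prefix))

    def __len__(self):
        return sum(1 for _ in self)


# Registry of mapper factories keyed by protocol (stand-ins only).
MAPPERS = {"s3": PrefixMapper, "gcs": PrefixMapper}


def get_mapper(protocol, fs, path):
    try:
        factory = MAPPERS[protocol]
    except KeyError:
        raise ValueError(f"No mapper registered for {protocol!r}") from None
    return factory(fs, path)


fake_fs = {"bucket/data/.zgroup": b"{}", "bucket/other": b"x"}
mapper = get_mapper("gcs", fake_fs, "bucket/data/")
print(list(mapper))
```

Supporting a new protocol then means adding one registry entry, and an unknown protocol fails with an explicit error instead of falling through silently.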

mmccarty commented Apr 27, 2018

Tested locally and it works so far! Had to install a few dependencies.

conda install gcsfs zarr -c conda-forge

martindurant merged commit 2768941 into master Apr 27, 2018
martindurant changed the title from "WIP: Zarr plugin" to "Zarr plugin" Apr 27, 2018
martindurant deleted the zarr branch April 27, 2018 23:15