Zarr plugin #7
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,3 +3,5 @@ | |
.coverage* | ||
.idea/ | ||
__pycache__/ | ||
.cache/ | ||
*egg-info/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
import xarray as xr | ||
from intake.source import base | ||
from dask.bytes.core import get_fs, infer_options, update_storage_options | ||
from . import DataSourceMixin | ||
|
||
|
||
class ZarrSource(DataSourceMixin, base.DataSource): | ||
"""Open an xarray dataset. | ||
|
||
Parameters | ||
---------- | ||
urlpath: str | ||
Path to source. This can be a local directory or a remote data | ||
service (e.g., with a protocol specifier like ``'s3://'``). | ||
storage_options: dict | ||
Parameters passed to the backend file-system | ||
kwargs: | ||
Further parameters are passed to xr.open_zarr | ||
""" | ||
|
||
def __init__(self, urlpath, storage_options=None, metadata=None, **kwargs): | ||
super(ZarrSource, self).__init__( | ||
container='xarray', metadata=metadata) | ||
self.urlpath = urlpath | ||
self.storage_options = storage_options | ||
self.kwargs = kwargs | ||
self._ds = None | ||
|
||
def _open_dataset(self): | ||
urlpath, protocol, options = infer_options(self.urlpath) | ||
update_storage_options(options, self.storage_options) | ||
|
||
self._fs, _ = get_fs(protocol, options) | ||
if protocol != 'file': | ||
self._mapper = get_mapper(protocol, self._fs, urlpath) | ||
self._ds = xr.open_zarr(self._mapper, **self.kwargs) | ||
else: | ||
self._ds = xr.open_zarr(self.urlpath, **self.kwargs) | ||
|
||
def close(self): | ||
super(ZarrSource, self).close() | ||
self._fs = None | ||
self._mapper = None | ||
|
||
|
||
def get_mapper(protocol, fs, path): | ||
if protocol == 's3': | ||
How many more protocols do you think there will be?
hdfs3 has a mapper, and I am not aware of any others. You may have noticed that https://github.com/martindurant/filesystem_spec contains a mapper, so with any luck it should "just work" for any file system meeting the spec, but that's a long-term goal.
||
from s3fs.mapping import S3Map | ||
return S3Map(path, fs) | ||
elif protocol == 'gcs': | ||
from gcsfs.mapping import GCSMap | ||
return GCSMap(path, fs) | ||
else: | ||
raise NotImplementedError |
Can we move this import to the beginning of the file?
No! We don't want to import until necessary, because `import intake` would also import it, and so take much longer.
Okay. I figured that might be why.
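For anyone reading this thread later, here is a minimal sketch of the deferred-import idea discussed above, combined with a lookup table in case more protocols turn up. It is not what the PR does: `MAPPER_CLASSES`, `resolve_mapper_class`, and the `registry` argument are hypothetical names made up for illustration. Only the `s3fs.mapping.S3Map` and `gcsfs.mapping.GCSMap` import paths and the `(path, fs)` call order are taken from the diff.

```python
import importlib

# Hypothetical registry: protocol -> (module path, class name).
# The two entries mirror the imports in the diff. Nothing is imported here;
# the module is only loaded when a mapper for that protocol is requested.
MAPPER_CLASSES = {
    "s3": ("s3fs.mapping", "S3Map"),
    "gcs": ("gcsfs.mapping", "GCSMap"),
}


def resolve_mapper_class(protocol, registry=MAPPER_CLASSES):
    """Import and return the mapper class for ``protocol`` on first use."""
    try:
        module_name, class_name = registry[protocol]
    except KeyError:
        raise NotImplementedError(
            "no mapper registered for protocol %r" % protocol) from None
    # The deferred import happens here, not at module import time.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def get_mapper(protocol, fs, path, registry=MAPPER_CLASSES):
    """Same call shape as get_mapper in the diff: mapper_class(path, fs)."""
    return resolve_mapper_class(protocol, registry)(path, fs)
```

With this shape, importing the plugin module costs nothing beyond `importlib`, and supporting another protocol would mean adding one registry entry rather than another `elif` branch.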