-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Addressed feedback. #1
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,18 +12,22 @@ def __init__(self): | |
container='python', | ||
partition_access=True) | ||
|
||
def open(self, urlpath, **kwargs): | ||
def open(self, urlpath, chunks, **kwargs): | ||
""" | ||
Create NetCDFSource instance | ||
|
||
Parameters | ||
---------- | ||
table, connection, qargs, partitions | ||
See ``NetCDFSource``. | ||
urlpath: str | ||
Path to source file. | ||
chunks: int or dict | ||
Chunks is used to load the new dataset into dask | ||
arrays. ``chunks={}`` loads the dataset with dask using a single | ||
chunk for all arrays. | ||
""" | ||
base_kwargs, source_kwargs = self.separate_base_kwargs(kwargs) | ||
qargs = source_kwargs.pop('qargs', {}) | ||
return NetCDFSource(urlpath=urlpath, | ||
chunks=chunks, | ||
xarray_kwargs = source_kwargs, | ||
metadata=base_kwargs['metadata']) | ||
|
||
|
@@ -33,18 +37,21 @@ class NetCDFSource(base.DataSource): | |
|
||
Parameters | ||
---------- | ||
param: type | ||
description | ||
urlpath: str | ||
Path to source file. | ||
chunks: int or dict | ||
Chunks is used to load the new dataset into dask | ||
arrays. ``chunks={}`` loads the dataset with dask using a single | ||
chunk for all arrays. | ||
""" | ||
container = 'python' | ||
|
||
def __init__(self, urlpath, xarray_kwargs=None, metadata=None): | ||
def __init__(self, urlpath, chunks, xarray_kwargs=None, metadata=None): | ||
self.urlpath = urlpath | ||
self.chunks = chunks | ||
self._kwargs = xarray_kwargs or {} | ||
self.chunks = self._kwargs.get('chunks') | ||
self._ds = None | ||
super(NetCDFSource, self).__init__( | ||
container=self.container, | ||
container=None, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not xarray? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The container is not used since we are overloading the |
||
metadata=metadata) | ||
|
||
def _open_dataset(self): | ||
|
@@ -70,17 +77,17 @@ def _get_schema(self): | |
|
||
def read(self): | ||
self._load_metadata() | ||
return self._ds | ||
return self._ds.load() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Loads all the data. |
||
|
||
def read_chunked(self): | ||
raise Exception('read_chunked not supported for xarray containers.') | ||
self._load_metadata() | ||
return self._ds | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, so this is approximately correct, but it doesn't give something you can iterate over. I'm not sure whether this should be an exception just like get_partition. |
||
|
||
def read_partition(self, i): | ||
raise Exception('read_partition not supported for xarray containers.') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @martindurant read_partition is not implemented yet. I'm wondering if we need to support it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right. To my mind, it should be implemented at some point, but the input is a tuple. I would keep an exception for now (but should be NotImplemented, because we may do it later). |
||
|
||
def to_dask(self): | ||
self._load_metadata() | ||
return self._ds.to_dask_dataframe() | ||
return self.read_chunked() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree, this should just return the xarray directly (which points to dask arrays); this would be |
||
|
||
def close(self): | ||
self._ds.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added chunks to signature and doc string.