Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow concat_dim_kwargs in THREDDSMergedSource #34

Merged
merged 10 commits into from
Jun 1, 2021
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
### Features

- Allow `xarray_kwargs` for :py:class:`~intake_thredds.THREDDSMergedSource` [#32](https://github.com/NCAR/intake-thredds/pull/32) ([@aaronspring](https://github.com/aaronspring))
- Allow `concat_dim` for :py:class:`~intake_thredds.THREDDSMergedSource` [#34](https://github.com/NCAR/intake-thredds/pull/34) ([@aaronspring](https://github.com/aaronspring))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do I get the xr docs linked here?


## Intake-thredds v2021.2.17

Expand Down
1 change: 1 addition & 0 deletions ci/environment-upstream-dev.yml
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ dependencies:
- pytest-xdist
- tqdm
- xarray
- cfgrib
- pip:
- git+https://github.com/intake/intake.git
- git+https://github.com/Unidata/siphon.git
1 change: 1 addition & 0 deletions ci/environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -19,3 +19,4 @@ dependencies:
- siphon
- tqdm
- xarray
- cfgrib
9 changes: 8 additions & 1 deletion intake_thredds/source.py
Original file line number Diff line number Diff line change
Expand Up @@ -89,14 +89,21 @@ def _open_dataset(self):
else:
break
path = self.path[i:]
if 'concat_dim' in self.xarray_kwargs:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of doing dataset concatenation. I think it would be beneficial not to expose the concat_dim option as a standalone argument. 👍🏽 for expanding the api to support other concat options. How about exposing concatenation options as concat_kwargs in __init__? I am imagining having concat_kwargs=None to be the default (which means the user doesn't want us to do any concatenation). With concat_kwargs={....}, we then go ahead and run xr.concat(data, **concat_kwargs).

What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two ways here:
1.) try to implement xr.open_mfdataset API: http://xarray.pydata.org/en/stable/generated/xarray.open_mfdataset.html
2.) implement xr.concat(list, args) API: http://xarray.pydata.org/en/stable/generated/xarray.concat.html
but as I compare all concat_kwargs appear in open_mfdataset.

I like your implementation suggestion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so concat_dim=... and xarray_kwargs for all other kwargs of open_mfdataset

concat_dim = self.xarray_kwargs.pop('concat_dim')
else:
concat_dim = None
if self.progressbar:
data = [
ds(xarray_kwargs=self.xarray_kwargs).to_dask()
for ds in tqdm(_match(cat, path), desc='Dataset(s)', ncols=79)
]
else:
data = [ds(xarray_kwargs=self.xarray_kwargs).to_dask() for ds in _match(cat, path)]
self._ds = xr.combine_by_coords(data, combine_attrs='override')
if concat_dim:
self._ds = xr.concat(data, concat_dim)
else:
self._ds = xr.combine_by_coords(data, combine_attrs='override')


def _match(cat, patterns):
Expand Down
24 changes: 23 additions & 1 deletion tests/test_source.py
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@ def test_THREDDSMergedSource_simplecache_fails_opendap(THREDDSMergedSource_cat_s
@pytest.mark.parametrize('driver', ['netcdf', 'opendap'])
@pytest.mark.parametrize('decode_times', [True, False])
def test_THREDDSMergedSource_xarray_kwargs(THREDDSMergedSource_cat_short_url, driver, decode_times):
"""Test xarray_kwargs."""
"""Test THREDDSMergedSource with xarray_kwargs."""
ds = intake.open_thredds_merged(
'https://psl.noaa.gov/thredds/catalog.xml',
[
Expand All @@ -119,3 +119,25 @@ def test_THREDDSMergedSource_xarray_kwargs(THREDDSMergedSource_cat_short_url, dr
assert 'units' not in ds.time.attrs
else:
assert 'units' in ds.time.attrs


def test_concat_dim():
"""Test THREDDSMergedSource with concat_dim. Requires multiple files with same
other coords to be concatinated along new dimension specified by concat_dim.
Here get two ensemble members initialized 20200831 00:00 at 15.5 days = 372h"""
url = 'simplecache::https://www.ncei.noaa.gov/thredds/catalog/model-gefs-003/202008/20200831/catalog.xml'
ds = intake.open_thredds_merged(
url,
['NCEP gens-a Grid 3 Member-Forecast 1[1-2]*-372 for 2020-08-31 00:00*'],
driver='netcdf',
xarray_kwargs=dict(
engine='cfgrib',
concat_dim='number',
backend_kwargs=dict(
filter_by_keys={'typeOfLevel': 'heightAboveGround', 'shortName': '2t'}
),
),
).to_dask()
assert 'number' in ds.dims
assert 11 in ds.number
assert 12 in ds.number