Improvements to existing functionality #45

andersy005 · 2019-03-15T20:54:02Z

Update the to_xarray() method to allow users to pass keyword arguments to it
Refactor cmip.py in preparation for CMIP6 integration, as proposed in "CMIP6 version of cmip.py" #43
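The first item comes down to letting caller-supplied keyword arguments override packaged defaults. A later commit in this PR moves those defaults into config.yaml. Below is a minimal sketch, assuming the defaults have already been loaded into a plain dict. `DEFAULT_XARRAY_KWARGS`, its values, and `build_open_kwargs` are hypothetical and are not intake-esm's actual API. `decode_times` and `chunks` are real `xarray.open_dataset` parameters, used here only for illustration.

```python
from typing import Any, Dict

# Hypothetical defaults; in intake-esm these would be read from config.yaml.
DEFAULT_XARRAY_KWARGS: Dict[str, Any] = {
    "decode_times": False,
    "chunks": {"time": 1},
}


def build_open_kwargs(**kwargs: Any) -> Dict[str, Any]:
    """Merge user kwargs over the defaults; user values win on conflicts."""
    # Copy the defaults so updating keys never mutates the shared dict.
    merged = dict(DEFAULT_XARRAY_KWARGS)
    merged.update(kwargs)
    return merged


# Usage: a caller overriding one default while keeping the others.
open_kwargs = build_open_kwargs(decode_times=True)
```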

andersy005 · 2019-03-17T23:49:33Z

@matt-long, I've experienced difficulties while trying to merge/concat CMIP data from different institutions. It appears that there are discrepancies among coordinates of different models' outputs. For queries in which there's data from more than one institution, xarray is unable to align the coordinates.

For instance, for one variable I saw the coordinates time, lat, lon, bnds in one model but time, rlat, rlon, bnds in a model from a different institution. In addition to the existing check at

https://github.com/NCAR/intake-esm/blob/67fb828a202a7c7fc15953b63436075554e3ae1c/intake_esm/cmip.py#L240-L244

should we add a check that forbids merging datasets from different institutions? Let me know what would be the right approach.
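One way to detect this mismatch before attempting a concat is to compare coordinate names across the datasets. For an xarray Dataset, `set(ds.coords)` or `set(ds.dims)` gives such names. The following is a minimal sketch over plain name lists that uses the two coordinate sets described above. `coords_compatible` and the model keys are hypothetical. Matching names are necessary but not sufficient for alignment, because the coordinate values can still differ.

```python
from typing import Dict, Iterable


def coords_compatible(coord_names: Dict[str, Iterable[str]]) -> bool:
    """Return True if every dataset has exactly the same set of coordinate names."""
    signatures = {frozenset(names) for names in coord_names.values()}
    # Zero or one distinct signature means the datasets at least agree on names.
    return len(signatures) <= 1


# Usage with the two coordinate layouts from the comment above.
models = {
    "model_a": ["time", "lat", "lon", "bnds"],
    "model_b": ["time", "rlat", "rlon", "bnds"],
}
```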

matt-long · 2019-03-18T15:01:55Z

@andersy005, I would not expect models from different institutions to be concatenable; the different models do indeed have different coordinates. I think we should add a check to prevent this. Another approach would be to return a data structure that holds the datasets from the different models. A dictionary keyed by institution ID, for instance, could work. This would entail an outer loop, probably best implemented as a method that calls this _open_dataset method.
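The outer-loop idea can be sketched as follows. The PR itself groups the pandas query subset with `groupby('institution')`. This version uses plain dicts so that it runs standalone. `open_by_institution` is hypothetical, and the lambda stands in for `_open_dataset`, which in practice would concatenate each group's files into one dataset.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def open_by_institution(
    records: List[Record], open_dataset: Callable[[List[Record]], Any]
) -> Dict[str, Any]:
    """Group query results by institution and open each group separately."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        groups[rec["institution"]].append(rec)
    # Outer loop: one call to the concatenating open function per institution.
    return {inst: open_dataset(recs) for inst, recs in groups.items()}


query_results = [
    {"institution": "NCAR", "file": "a.nc"},
    {"institution": "NCAR", "file": "b.nc"},
    {"institution": "IPSL", "file": "c.nc"},
]
# Stand-in opener that just lists each group's files.
dsets = open_by_institution(query_results, lambda recs: [r["file"] for r in recs])
```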

andersy005 · 2019-03-18T18:13:56Z

@matt-long, this is ready for another look


matt-long · 2019-03-18T18:42:15Z

intake_esm/cmip.py

+        _ds_dict = {}
+        grouped = get_subset(self.collection_name, self.collection_type, query).groupby(
+            'institution'
+        )


Does the code above belong in a _validate_concat_open_dataset method?

Could you expand on this?

I think it's probably fine for now. I was wondering whether you could write a general method for validation. I'm not sure of the API; something like:

    def _validate_concat(self, check_max_instance=None):
        # Reject queries whose results span too many values of a field.
        if check_max_instance is not None:
            for fld, max_inst in check_max_instance.items():
                fld_list = self.query_results[fld].unique()
                if len(fld_list) > max_inst:
                    raise ValueError(f'{fld}: {len(fld_list)} values, max {max_inst}')

I see! Now that we handle an additional case of datasets that cannot be concatenated, by returning a dictionary of dsets in:

https://github.com/NCAR/intake-esm/blob/eb8fdafa0d7b22770bffde968bf36453d9cd5b9a/intake_esm/cmip.py#L254-L264

Can you think of other cases that wouldn't be handled by the above code?

I probably need to work thru some use cases to really develop my intuition here.

Sounds good! Let me know when you get to play with this.

matt-long · 2019-03-18T18:43:42Z

intake_esm/cmip.py

+        grouped = get_subset(self.collection_name, self.collection_type, query).groupby(
+            'institution'
+        )
+        for name, group in grouped:


Should we have an _open_dataset_groups method that calls the _open_dataset method? Would this allow us to reuse more code between cesm.py and cmip.py?

I see the advantage of generalizing this. What would be the equivalent of _open_dataset_groups() for CESM data? In other words, what is the CESM equivalent of CMIP institutions?

We could think about this as moving toward any arbitrary collection of datasets that should not be concatenated along a dimension: observations, or integrations from different model versions or resolutions, for instance. We might want a more general dataset_id or something, rather than insisting on "institution."

I will take a stab at generalizing it (though I am afraid it may not be a trivial task).
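The generalization discussed in this thread can be sketched as a shared base class. `_open_dataset_groups` groups on a configurable column, and each source overrides only that column. All names here are hypothetical, including `BaseCollectionSource`, `CMIPSource`, `group_key`, and `dataset_id`, and `_open_dataset` is a stand-in that returns file names rather than an xarray Dataset. Following the commit "Return dict of dsets when query has more than 1 institution", a single group is returned directly and several groups come back as a dict.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]


class BaseCollectionSource:
    """Hypothetical shared base for the CESM and CMIP sources."""

    # Column whose distinct values mark datasets that must not be concatenated.
    group_key = "dataset_id"

    def __init__(self, records: List[Record]):
        self.records = records

    def _open_dataset(self, records: List[Record]) -> Any:
        # Stand-in for the real concatenating open; returns file names.
        return [rec["file"] for rec in records]

    def _open_dataset_groups(self) -> Any:
        groups: Dict[Any, List[Record]] = defaultdict(list)
        for rec in self.records:
            groups[rec[self.group_key]].append(rec)
        results = {key: self._open_dataset(recs) for key, recs in groups.items()}
        # One group: return its result directly; otherwise return the dict.
        if len(results) == 1:
            return next(iter(results.values()))
        return results


class CMIPSource(BaseCollectionSource):
    group_key = "institution"


records = [
    {"institution": "NCAR", "dataset_id": "x", "file": "a.nc"},
    {"institution": "IPSL", "dataset_id": "x", "file": "b.nc"},
]
```

With this layout, a CESM source would only need to choose its own `group_key`, such as a model version or resolution column.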


Add kwargs to to_xarray() method

4493c8c

andersy005 added this to In progress in Backlog via automation Mar 15, 2019

andersy005 added this to the sprint-mar04-mar17 milestone Mar 15, 2019

andersy005 added the usage question label Mar 15, 2019

andersy005 added 4 commits March 15, 2019 15:25

Move default args to config.yaml

8faf182

Update config.yaml

031578d

Remove parameters that were removed from config.yaml

09d1924

Allow users to pass kwargs to to_xarray()

52db367

andersy005 changed the title ~~Allow users to pass kwargs to to_xarray() method~~ Update to_xarray() method Mar 17, 2019

andersy005 added 2 commits March 17, 2019 16:02

add sample datasets

64f9dfb

Add test for to_xarray() with empty results

5dff21b

andersy005 modified the milestones: sprint-mar04-mar17, sprint-mar18-mar31 Mar 17, 2019

Use aggregate.py methods to load data

67fb828

andersy005 marked this pull request as ready for review March 17, 2019 23:38

andersy005 requested a review from matt-long March 17, 2019 23:38

Update cesm column names to match cmip column names

57b3b57

Return dict of dsets when query has more than 1 institution

938d556

andersy005 changed the title ~~Update to_xarray() method~~ Improvements to existing functionality Mar 18, 2019

This was referenced Mar 18, 2019

CMIP6 integration #46

Merged

CMIP6 version of cmip.py #43

Closed

matt-long reviewed Mar 18, 2019


andersy005 added 4 commits March 18, 2019 13:29

Avoid code duplication

4944728

Update configurations

01bbadd

disable dask.delayed

0bdf307

move chunk_size inside the function

ef62f70

matt-long reviewed Mar 18, 2019


intake_esm/cesm.py (outdated, resolved)

andersy005 added 4 commits March 18, 2019 16:41

Avoid re-chunking along time

b48a740

Give users control over join

cf82e16

rename infer_time_coord_name to ensure_time_coord_name

eb8fdaf

Remove unnecessary print statement

1a88bea

Backlog automation moved this from In progress to Reviewer approved Mar 19, 2019

matt-long approved these changes Mar 19, 2019


Update documentation

e7310e4

andersy005 merged commit 19b8d18 into intake:master Mar 19, 2019

Backlog automation moved this from Reviewer approved to Done Mar 19, 2019

andersy005 deleted the update-to_xarray branch March 19, 2019 13:54

andersy005 mentioned this pull request Apr 10, 2019

CMIP collection input file structure #39

Closed

