Improvements to existing functionality #45
kwargs to to_xarray() method
to_xarray() method
@matt-long, I've had difficulty merging/concatenating CMIP data from different institutions. The coordinates of different models' outputs appear to be inconsistent, so for queries that return data from more than one institution, xarray is unable to align the coordinates. For instance, I saw one variable whose coordinates differ from one model to another. Should we add a check that forbids merging datasets from different institutions? Let me know what the right approach would be.
@andersy005, I would not expect models from different institutions to be concatenatable; the different models do indeed have different coordinates. I think we should add a check that prevents this. Another approach would be to return a data structure that includes the datasets from the different models. A dictionary with the institution ID as the key, for instance, could work. This would entail an outer loop, probably best implemented as a method that calls this
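The dictionary approach suggested above could look roughly like the following. This is a dependency-free sketch, assuming the query results are a list of records with an `institution` field; `group_by_institution` and the sample records are hypothetical, and the actual code would group a pandas DataFrame instead.

```python
from collections import defaultdict


def group_by_institution(records):
    """Group query records by institution ID (hypothetical helper)."""
    groups = defaultdict(list)
    for record in records:
        groups[record["institution"]].append(record)
    return dict(groups)


# Made-up records, for illustration only
records = [
    {"institution": "inst_a", "file": "a_1.nc"},
    {"institution": "inst_b", "file": "b_1.nc"},
    {"institution": "inst_a", "file": "a_2.nc"},
]

# One entry per institution; each entry would be opened separately
by_institution = group_by_institution(records)
```

Each value could then be opened and concatenated on its own, so coordinates from different models never need to be aligned.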
@matt-long, this is ready for another look
to_xarray() method

    _ds_dict = {}
    grouped = get_subset(self.collection_name, self.collection_type, query).groupby(
        'institution'
    )
Does the code above belong in a _validate_concat_open_dataset method?
Could you expand on this?
I think it's probably fine for now. I was wondering whether you could write a general method for validation. Not sure of the API....
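    def _validate_concat(self, check_max_instance=None):
        # check_max_instance maps a field name to the maximum number of
        # unique values that field may take in the query results
        if check_max_instance is not None:
            for fld, max_inst in check_max_instance.items():
                fld_list = self.query_results[fld].unique()
                if len(fld_list) > max_inst:
                    raise ValueError(f'message about {fld}')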
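To make the proposed validation concrete, here is a standalone sketch of the same idea. It assumes the query results are a plain list of records rather than a DataFrame; `validate_concat`, the error message, and the sample records are illustrative, not part of the package.

```python
def validate_concat(records, check_max_instance=None):
    """Raise ValueError if a field has more unique values than allowed."""
    if check_max_instance is None:
        return
    for fld, max_inst in check_max_instance.items():
        unique_values = {record[fld] for record in records}
        if len(unique_values) > max_inst:
            raise ValueError(
                f"Found {len(unique_values)} unique values for {fld!r}; "
                f"at most {max_inst} can be concatenated"
            )


# Made-up records, for illustration only
records = [
    {"institution": "inst_a", "variable": "tas"},
    {"institution": "inst_b", "variable": "tas"},
]

validate_concat(records, {"variable": 1})  # passes: a single variable

try:
    validate_concat(records, {"institution": 1})  # two institutions
except ValueError as err:
    message = str(err)
```

Passing the limits as a dictionary keeps the check general: any field can be capped without writing a new method per field.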
I see! Now that we are handling an additional case of datasets that cannot be concatenated by returning a dictionary of dsets, can you think of other cases that wouldn't be handled by the above code?
I probably need to work thru some use cases to really develop my intuition here.
Sounds good! Let me know when you get to play with this.
intake_esm/cmip.py
Outdated
    grouped = get_subset(self.collection_name, self.collection_type, query).groupby(
        'institution'
    )
    for name, group in grouped:
Should we have an _open_dataset_groups method that calls the _open_dataset method? Would this allow us to reuse more code between cesm.py and cmip.py?
I see the advantage of generalizing this. What would be the equivalent of _open_dataset_groups() for CESM data? In other words, what is the equivalent of CMIP institutions in CESM?
We could think about this as moving toward any arbitrary collection of datasets that should not be concatenated along a dimension: observations, or integrations from different model versions or resolutions, for instance. We might want to have a more general dataset_id or something, rather than insisting on "institution."
I will take a stab at generalizing it (though I am afraid it may not be a trivial task).
to_xarray() method to allow users to pass keyword arguments to this method
cmip.py in preparation for cmip6 integration as proposed in CMIP6 version of cmip.py #43
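Passing keyword arguments through to_xarray() might look like the sketch below. The `Source` class is a toy stand-in, not the intake-esm class, and `_open_dataset` here only records the options it receives; a real implementation would presumably forward them to the underlying file opener.

```python
class Source:
    """Toy stand-in for a collection source (hypothetical)."""

    def __init__(self, paths):
        self.paths = paths

    def _open_dataset(self, path, **kwargs):
        # Record the options instead of opening a file
        return {"path": path, "options": kwargs}

    def to_xarray(self, **kwargs):
        # Forward every keyword argument unchanged to the opener
        return [self._open_dataset(path, **kwargs) for path in self.paths]


opened = Source(["a.nc"]).to_xarray(decode_times=False, chunks={"time": 12})
```

Forwarding `**kwargs` unchanged means users can pass new opener options without the method signature having to change.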