failed to join/concatenate datasets for model CESM2-FV2 #66

Closed
sckw opened this issue Dec 8, 2020 · 7 comments · Fixed by #79
Labels: bug (Something isn't working)

sckw commented Dec 8, 2020

I use the following code to load CESM2-FV2:

import intake
from cmip6_preprocessing.preprocessing import combined_preprocessing

url = "https://raw.githubusercontent.com/NCAR/intake-esm-datastore/master/catalogs/pangeo-cmip6.json"
col = intake.open_esm_datastore(url)
model = 'CESM2-FV2'

query = dict(experiment_id=['historical'], table_id='Omon', 
             variable_id='tos', grid_label=['gn'], source_id=model)
cat = col.search(**query)
print(cat.df['source_id'].unique())
z_kwargs = {'consolidated': True, 'decode_times': False}
tos_dict = cat.to_dataset_dict(zarr_kwargs=z_kwargs, preprocess=combined_preprocessing)

and I get the following error:

AggregationError: 
        Failed to join/concatenate datasets in group with key=CMIP.NCAR.CESM2-FV2.historical.Omon.gn along a new dimension `member_id`.
        *** Arguments passed to xarray.concat() ***:
        - objs: a list of 3 datasets
        - dim: <xarray.DataArray 'member_id' (member_id: 3)>
array(['r1i1p1f1', 'r2i1p1f1', 'r3i1p1f1'], dtype='<U8')
Dimensions without coordinates: member_id
        - data_vars: ['tos']
        - and kwargs: {'coords': 'minimal', 'compat': 'override'}
        ********************************************
jbusecke (Owner) commented Dec 9, 2020

Thank you for using cmip6_preprocessing and bringing up this issue.

I can reproduce this but am not sure yet why it is happening. I am currently having some issues with the Pangeo cloud, but I will follow up later.

jbusecke (Owner) commented Dec 9, 2020

OK, I think I have a better idea of what is happening. If I skip the concatenation along the member_id dimension:

tos_dict_pp = cat.to_dataset_dict(
    zarr_kwargs=z_kwargs,
    preprocess=combined_preprocessing,
    aggregate=False
)

and then try to combine the resulting datasets manually:

# try to manually concatenate the preprocessed datasets
import xarray as xr

xr.concat(list(tos_dict_pp.values()), dim='member_id')

The error is more informative:

ValueError: cannot reindex or align along dimension 'y' because the index has duplicate values
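
For context, this error comes from xarray's alignment step: if a dataset's index along a shared dimension contains duplicate values, reindexing it onto a joined index is ambiguous, so xarray refuses. A minimal sketch with made-up data (not from the catalog) that reproduces the same ValueError:

import numpy as np
import xarray as xr

# The 'y' index of the first dataset contains a duplicate value, mimicking
# what happens when unmasked fill values end up in a dimension coordinate.
ds_a = xr.Dataset({'tos': ('y', np.zeros(3))}, coords={'y': [0, 1, 1]})
ds_b = xr.Dataset({'tos': ('y', np.ones(3))}, coords={'y': [0, 1, 2]})

# Concatenating along a new dimension forces an (outer) alignment on 'y';
# reindexing ds_a onto the joined index raises the duplicate-values error.
xr.concat([ds_a, ds_b], dim='member_id')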

@andersy005 I think in general it would be nice to have a way to display these errors from within intake-esm. Is that possible currently?

The specific problem here is that the lon/lat values in this particular model contain very high values (which should be masked out)!

If we load the model without preprocessing and look at the latitude field

tos_dict = cat.to_dataset_dict(
    zarr_kwargs=z_kwargs,
    preprocess=None,
    aggregate=False
)
list(tos_dict.values())[0].lat.plot(vmax=200)

we get this:
[figure: plot of the raw lat field with vmax=200; land points saturate at the fill value]

The land points have values of 1e37. In the newest version this should be fixed here.
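
As a stopgap until a fix lands, one possible direction is to mask the unphysical coordinate values before the preprocessing derives index coordinates from them. A rough, untested sketch; the helper name and the 1e30 threshold are my assumptions, not part of cmip6_preprocessing:

import xarray as xr

def mask_coord_fill_values(ds, threshold=1e30):
    # Physical latitudes are bounded by +/-90 and longitudes by 360,
    # so any magnitude near 1e37 must be a fill value; replace it with NaN.
    for name in ['lon', 'lat']:
        if name in ds.coords:
            ds = ds.assign_coords({name: ds[name].where(abs(ds[name]) < threshold)})
    return ds

masked = {key: mask_coord_fill_values(ds) for key, ds in tos_dict.items()}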

@sckw could you confirm which version you are using?
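
You can presumably do that with the standard __version__ attribute (this exact snippet is an assumption, not verbatim from the thread):

import cmip6_preprocessing
print(cmip6_preprocessing.__version__)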

EDIT: I actually tried it with the newest version and it does not work. So this is a bug.

jbusecke added the bug label Dec 9, 2020
jbusecke self-assigned this Dec 9, 2020
sckw (Author) commented Dec 9, 2020

@jbusecke I get 'earthcube_commit+58.g1d57184'

jbusecke (Owner) commented Dec 9, 2020

That is an older version, but I just tested with the current version from GitHub and the bug is still there. I'll see how I can fix this, and once it is resolved I'll release a new version, which you can then upgrade to with:

conda install -c conda-forge cmip6_preprocessing

sckw (Author) commented Dec 9, 2020

Thank you!

andersy005 commented

> @andersy005 I think in general it would be nice to have a way to display these errors from within intake-esm. Is that possible currently?

We are already doing this. It's just that @sckw didn't post the first two-thirds of the traceback, which includes the ValueError: cannot reindex or align along dimension 'y' error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/devel/intake/intake-esm/intake_esm/merge_util.py in join_new(dsets, dim_name, coord_value, varname, options, group_key)
     55         concat_dim = xr.DataArray(coord_value, dims=(dim_name), name=dim_name)
---> 56         return xr.concat(dsets, dim=concat_dim, data_vars=varname, **options)
     57     except Exception as exc:

~/opt/miniconda3/envs/intake-esm-dev/lib/python3.8/site-packages/xarray/core/concat.py in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    190         )
--> 191     return f(
    192         objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs

~/opt/miniconda3/envs/intake-esm-dev/lib/python3.8/site-packages/xarray/core/concat.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    382     datasets = [ds.copy() for ds in datasets]
--> 383     datasets = align(
    384         *datasets, join=join, copy=False, exclude=[dim], fill_value=fill_value

~/opt/miniconda3/envs/intake-esm-dev/lib/python3.8/site-packages/xarray/core/alignment.py in align(join, copy, indexes, exclude, fill_value, *objects)
    339         else:
--> 340             new_obj = obj.reindex(copy=copy, fill_value=fill_value, **valid_indexers)
    341         new_obj.encoding = obj.encoding

~/opt/miniconda3/envs/intake-esm-dev/lib/python3.8/site-packages/xarray/core/dataset.py in reindex(self, indexers, method, tolerance, copy, fill_value, **indexers_kwargs)
   2545         """
-> 2546         return self._reindex(
   2547             indexers,

~/opt/miniconda3/envs/intake-esm-dev/lib/python3.8/site-packages/xarray/core/dataset.py in _reindex(self, indexers, method, tolerance, copy, fill_value, sparse, **indexers_kwargs)
   2574 
-> 2575         variables, indexes = alignment.reindex_variables(
   2576             self.variables,

~/opt/miniconda3/envs/intake-esm-dev/lib/python3.8/site-packages/xarray/core/alignment.py in reindex_variables(variables, sizes, indexes, indexers, method, tolerance, copy, fill_value, sparse)
    549             if not index.is_unique:
--> 550                 raise ValueError(
    551                     "cannot reindex or align along dimension %r because the "

ValueError: cannot reindex or align along dimension 'y' because the index has duplicate values

The above exception was the direct cause of the following exception:

AggregationError                          Traceback (most recent call last)
<ipython-input-4-84f1708f03d1> in <module>
      4 print(cat.df['source_id'].unique())
      5 z_kwargs = {'consolidated': True, 'decode_times':False}
----> 6 tos_dict = cat.to_dataset_dict(zarr_kwargs=z_kwargs, preprocess=combined_preprocessing)

~/devel/intake/intake-esm/intake_esm/core.py in to_dataset_dict(self, zarr_kwargs, cdf_kwargs, preprocess, storage_options, progressbar, aggregate)
    925             ]
    926             for i, task in enumerate(concurrent.futures.as_completed(future_tasks)):
--> 927                 key, ds = task.result()
    928                 self._datasets[key] = ds
    929                 if self.progressbar:

~/opt/miniconda3/envs/intake-esm-dev/lib/python3.8/concurrent/futures/_base.py in result(self, timeout)
    430                 raise CancelledError()
    431             elif self._state == FINISHED:
--> 432                 return self.__get_result()
    433 
    434             self._condition.wait(timeout)

~/opt/miniconda3/envs/intake-esm-dev/lib/python3.8/concurrent/futures/_base.py in __get_result(self)
    386     def __get_result(self):
    387         if self._exception:
--> 388             raise self._exception
    389         else:
    390             return self._result

~/opt/miniconda3/envs/intake-esm-dev/lib/python3.8/concurrent/futures/thread.py in run(self)
     55 
     56         try:
---> 57             result = self.fn(*self.args, **self.kwargs)
     58         except BaseException as exc:
     59             self.future.set_exception(exc)

~/devel/intake/intake-esm/intake_esm/core.py in _load_source(key, source)
    911 
    912         def _load_source(key, source):
--> 913             return key, source.to_dask()
    914 
    915         sources = {key: source(**source_kwargs) for key, source in self.items()}

~/devel/intake/intake-esm/intake_esm/source.py in to_dask(self)
    244     def to_dask(self):
    245         """Return xarray object (which will have chunks)"""
--> 246         self._load_metadata()
    247         return self._ds
    248 

~/opt/miniconda3/envs/intake-esm-dev/lib/python3.8/site-packages/intake/source/base.py in _load_metadata(self)
    124         """load metadata only if needed"""
    125         if self._schema is None:
--> 126             self._schema = self._get_schema()
    127             self.datashape = self._schema.datashape
    128             self.dtype = self._schema.dtype

~/devel/intake/intake-esm/intake_esm/source.py in _get_schema(self)
    173 
    174         if self._ds is None:
--> 175             self._open_dataset()
    176 
    177             metadata = {

~/devel/intake/intake-esm/intake_esm/source.py in _open_dataset(self)
    230         n_agg = len(self.aggregation_columns)
    231 
--> 232         ds = _aggregate(
    233             self.aggregation_dict,
    234             self.aggregation_columns,

~/devel/intake/intake-esm/intake_esm/merge_util.py in _aggregate(aggregation_dict, agg_columns, n_agg, nd, mapper_dict, group_key)
    238         return ds
    239 
--> 240     return apply_aggregation(nd)
    241 
    242 

~/devel/intake/intake-esm/intake_esm/merge_util.py in apply_aggregation(nd, agg_column, key, level)
    194             agg_options = {}
    195 
--> 196         dsets = [
    197             apply_aggregation(value, agg_column, key=key, level=level + 1)
    198             for key, value in nd.items()

~/devel/intake/intake-esm/intake_esm/merge_util.py in <listcomp>(.0)
    195 
    196         dsets = [
--> 197             apply_aggregation(value, agg_column, key=key, level=level + 1)
    198             for key, value in nd.items()
    199         ]

~/devel/intake/intake-esm/intake_esm/merge_util.py in apply_aggregation(nd, agg_column, key, level)
    216         if agg_type == 'join_new':
    217             varname = dsets[0].attrs['intake_esm_varname']
--> 218             ds = join_new(
    219                 dsets,
    220                 dim_name=agg_column,

~/devel/intake/intake-esm/intake_esm/merge_util.py in join_new(dsets, dim_name, coord_value, varname, options, group_key)
     69         """
     70 
---> 71         raise AggregationError(message) from exc
     72 
     73 

AggregationError: 
        Failed to join/concatenate datasets in group with key=CMIP.NCAR.CESM2-FV2.historical.Omon.gn along a new dimension `member_id`.

        *** Arguments passed to xarray.concat() ***:

        - objs: a list of 3 datasets
        - dim: <xarray.DataArray 'member_id' (member_id: 3)>
array(['r1i1p1f1', 'r2i1p1f1', 'r3i1p1f1'], dtype='<U8')
Dimensions without coordinates: member_id
        - data_vars: ['tos']
        - and kwargs: {'coords': 'minimal', 'compat': 'override'}

        ********************************************

jbusecke (Owner) commented

Ah, OK, thanks for following up. I have to admit I also did not see it in there. I'll look closer next time.

@sckw I am almost ready with #62 (which also confirms this issue for other variables of the same model), and will work on a fix as soon as that is done.
