This issue was moved to a discussion.

Issue with CanESM.ssp585.r9i1p1f1.Omon.tos CMIP6 zarr store and intake-esm #331

Closed
jbusecke opened this issue Apr 9, 2021 · 2 comments

Comments

@jbusecke
Contributor

jbusecke commented Apr 9, 2021

Description

I am running into an obscure problem with a single zarr store in the Pangeo CMIP6 archive.

The data in question is CanESM5.ssp585.r9i1p1f1.Omon.tos.

I am basically following the standard instructions for loading the data with intake-esm:

import intake
col = intake.open_esm_datastore("https://storage.googleapis.com/cmip6/pangeo-cmip6.json")

kwargs = {'zarr_kwargs':{'consolidated':True, 'use_cftime':True}, 'aggregate':False}
experiment_id = 'ssp585'
cat_data = col.search(source_id='CanESM5',variable_id='tos', experiment_id=experiment_id, grid_label='gn', table_id='Omon')
ddict = cat_data.to_dataset_dict(**kwargs)

Most of the datasets (there are several members for this model and experiment) load as expected:

ds = ddict['ScenarioMIP.CCCma.CanESM5.ssp585.r10i1p1f1.Omon.tos.gn.gs://cmip6/CMIP6/ScenarioMIP/CCCma/CanESM5/ssp585/r10i1p1f1/Omon/tos/gn/v20190429/.nan.20190429']
ds.load()

But a single one fails to load:

ds_fail = ddict['ScenarioMIP.CCCma.CanESM5.ssp585.r9i1p1f1.Omon.tos.gn.gs://cmip6/CMIP6/ScenarioMIP/CCCma/CanESM5/ssp585/r9i1p1f1/Omon/tos/gn/v20190429/.nan.20190429']
ds_fail.load()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-3d52dfac9758> in <module>
      1 ds_fail = ddict['ScenarioMIP.CCCma.CanESM5.ssp585.r9i1p1f1.Omon.tos.gn.gs://cmip6/CMIP6/ScenarioMIP/CCCma/CanESM5/ssp585/r9i1p1f1/Omon/tos/gn/v20190429/.nan.20190429']
      2 
----> 3 ds_fail.load()

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/dataset.py in load(self, **kwargs)
    739 
    740             # evaluate all the dask arrays simultaneously
--> 741             evaluated_data = da.compute(*lazy_data.values(), **kwargs)
    742 
    743             for k, data in zip(lazy_data, evaluated_data):

/srv/conda/envs/notebook/lib/python3.8/site-packages/dask/base.py in compute(*args, **kwargs)
    559         postcomputes.append(x.__dask_postcompute__())
    560 
--> 561     results = schedule(dsk, keys, **kwargs)
    562     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    563 

/srv/conda/envs/notebook/lib/python3.8/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, pool, **kwargs)
     74                 pools[thread][num_workers] = pool
     75 
---> 76     results = get_async(
     77         pool.apply_async,
     78         len(pool._pool),

/srv/conda/envs/notebook/lib/python3.8/site-packages/dask/local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs)
    485                         _execute_task(task, data)  # Re-execute locally
    486                     else:
--> 487                         raise_exception(exc, tb)
    488                 res, worker_id = loads(res_info)
    489                 state["cache"][key] = res

/srv/conda/envs/notebook/lib/python3.8/site-packages/dask/local.py in reraise(exc, tb)
    315     if exc.__traceback__ is not tb:
    316         raise exc.with_traceback(tb)
--> 317     raise exc
    318 
    319 

/srv/conda/envs/notebook/lib/python3.8/site-packages/dask/local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    220     try:
    221         task, data = loads(task_info)
--> 222         result = _execute_task(task, data)
    223         id = get_id()
    224         result = dumps((result, id))

/srv/conda/envs/notebook/lib/python3.8/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
    119         # temporaries by their reference count and can execute certain
    120         # operations in-place.
--> 121         return func(*(_execute_task(a, cache) for a in args))
    122     elif not ishashable(arg):
    123         return arg

/srv/conda/envs/notebook/lib/python3.8/site-packages/dask/array/core.py in getter(a, b, asarray, lock)
    104         c = a[b]
    105         if asarray:
--> 106             c = np.asarray(c)
    107     finally:
    108         if lock:

/srv/conda/envs/notebook/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order, like)
    100         return _asarray_with_like(a, dtype=dtype, order=order, like=like)
    101 
--> 102     return array(a, dtype, copy=False, order=order)
    103 
    104 

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    501 
    502     def __array__(self, dtype=None):
--> 503         return np.asarray(self.array, dtype=dtype)
    504 
    505     def __getitem__(self, key):

/srv/conda/envs/notebook/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order, like)
    100         return _asarray_with_like(a, dtype=dtype, order=order, like=like)
    101 
--> 102     return array(a, dtype, copy=False, order=order)
    103 
    104 

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    661 
    662     def __array__(self, dtype=None):
--> 663         return np.asarray(self.array, dtype=dtype)
    664 
    665     def __getitem__(self, key):

/srv/conda/envs/notebook/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order, like)
    100         return _asarray_with_like(a, dtype=dtype, order=order, like=like)
    101 
--> 102     return array(a, dtype, copy=False, order=order)
    103 
    104 

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    566     def __array__(self, dtype=None):
    567         array = as_indexable(self.array)
--> 568         return np.asarray(array[self.key], dtype=None)
    569 
    570     def transpose(self, order):

/srv/conda/envs/notebook/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order, like)
    100         return _asarray_with_like(a, dtype=dtype, order=order, like=like)
    101 
--> 102     return array(a, dtype, copy=False, order=order)
    103 
    104 

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/coding/variables.py in __array__(self, dtype)
     68 
     69     def __array__(self, dtype=None):
---> 70         return self.func(self.array)
     71 
     72     def __repr__(self):

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/coding/variables.py in _apply_mask(data, encoded_fill_values, decoded_fill_value, dtype)
    136 ) -> np.ndarray:
    137     """Mask all matching values in a NumPy arrays."""
--> 138     data = np.asarray(data, dtype=dtype)
    139     condition = False
    140     for fv in encoded_fill_values:

/srv/conda/envs/notebook/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order, like)
    100         return _asarray_with_like(a, dtype=dtype, order=order, like=like)
    101 
--> 102     return array(a, dtype, copy=False, order=order)
    103 
    104 

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    566     def __array__(self, dtype=None):
    567         array = as_indexable(self.array)
--> 568         return np.asarray(array[self.key], dtype=None)
    569 
    570     def transpose(self, order):

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/backends/zarr.py in __getitem__(self, key)
     55         array = self.get_array()
     56         if isinstance(key, indexing.BasicIndexer):
---> 57             return array[key.tuple]
     58         elif isinstance(key, indexing.VectorizedIndexer):
     59             return array.vindex[

/srv/conda/envs/notebook/lib/python3.8/site-packages/zarr/core.py in __getitem__(self, selection)
    569 
    570         fields, selection = pop_fields(selection)
--> 571         return self.get_basic_selection(selection, fields=fields)
    572 
    573     def get_basic_selection(self, selection=Ellipsis, out=None, fields=None):

/srv/conda/envs/notebook/lib/python3.8/site-packages/zarr/core.py in get_basic_selection(self, selection, out, fields)
    694                                                 fields=fields)
    695         else:
--> 696             return self._get_basic_selection_nd(selection=selection, out=out,
    697                                                 fields=fields)
    698 

/srv/conda/envs/notebook/lib/python3.8/site-packages/zarr/core.py in _get_basic_selection_nd(self, selection, out, fields)
    737         indexer = BasicIndexer(selection, self)
    738 
--> 739         return self._get_selection(indexer=indexer, out=out, fields=fields)
    740 
    741     def get_orthogonal_selection(self, selection, out=None, fields=None):

/srv/conda/envs/notebook/lib/python3.8/site-packages/zarr/core.py in _get_selection(self, indexer, out, fields)
   1032             # allow storage to get multiple items at once
   1033             lchunk_coords, lchunk_selection, lout_selection = zip(*indexer)
-> 1034             self._chunk_getitems(lchunk_coords, lchunk_selection, out, lout_selection,
   1035                                  drop_axes=indexer.drop_axes, fields=fields)
   1036 

/srv/conda/envs/notebook/lib/python3.8/site-packages/zarr/core.py in _chunk_getitems(self, lchunk_coords, lchunk_selection, out, lout_selection, drop_axes, fields)
   1692         for ckey, chunk_select, out_select in zip(ckeys, lchunk_selection, lout_selection):
   1693             if ckey in cdatas:
-> 1694                 self._process_chunk(out, cdatas[ckey], chunk_select, drop_axes,
   1695                                     out_is_ndarray, fields, out_select)
   1696             else:

/srv/conda/envs/notebook/lib/python3.8/site-packages/zarr/core.py in _process_chunk(self, out, cdata, chunk_selection, drop_axes, out_is_ndarray, fields, out_selection)
   1615 
   1616         # decode chunk
-> 1617         chunk = self._decode_chunk(cdata)
   1618 
   1619         # select data from chunk

/srv/conda/envs/notebook/lib/python3.8/site-packages/zarr/core.py in _decode_chunk(self, cdata)
   1830         # ensure correct chunk shape
   1831         chunk = chunk.reshape(-1, order='A')
-> 1832         chunk = chunk.reshape(self._chunks, order=self._order)
   1833 
   1834         return chunk

ValueError: cannot reshape array of size 21999600 into shape (214,291,360)
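
A quick sanity check on the numbers in this error, as a rough sketch. The target shape comes from the array's chunk metadata, while the size is what the decoded chunk actually contained. Dividing the size by the two spatial dimensions gives a whole number, which suggests (but does not prove) that the chunk holds fewer time steps than the metadata expects. Only the numbers from the traceback are used; the variable names are my own, and treating the first dimension as time is an assumption.

```python
# Numbers copied from the ValueError above.
decoded_size = 21999600
expected_chunk_shape = (214, 291, 360)

# Assumption: the dimensions are ordered (time, y, x).
n_time, n_y, n_x = expected_chunk_shape
expected_size = n_time * n_y * n_x

# How many full (y, x) slices does the decoded chunk actually contain?
actual_time_steps, remainder = divmod(decoded_size, n_y * n_x)

print(expected_size)      # 22418640
print(actual_time_steps)  # 210
print(remainder)          # 0
```

So the decoded chunk is exactly 210 slices of 291 x 360 values, four fewer than the 214 in the chunk shape used for the reshape.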

Weirdly enough, I can load the data without problems when I do not use intake-esm:

import gcsfs
import xarray as xr

# Connect to Google Cloud Storage
fs = gcsfs.GCSFileSystem(token='anon', access='read_only')

# create a MutableMapping from a store URL
mapper = fs.get_mapper('gs://cmip6/CMIP6/ScenarioMIP/CCCma/CanESM5/ssp585/r9i1p1f1/Omon/tos/gn/v20190429/')

# loading the raw zarr does not add `member_id`
ds_raw = xr.open_zarr(mapper)
ds_raw.load()

I am not really sure how to diagnose this further. Any thoughts?

Also cc'ing @naomi-henderson (Is there a way to interrogate this single store further?)
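
One difference between the two code paths above is that the intake-esm call passes `consolidated=True`, while the direct `xr.open_zarr(mapper)` call does not. A possible way to interrogate the store is therefore to check whether the consolidated metadata in `.zmetadata` agrees with the per-array metadata keys. The sketch below assumes the zarr v2 layout, where `.zmetadata` is a JSON document with a `"metadata"` mapping. The helper name `consolidated_mismatches` and the demo values are hypothetical (the numbers are made up to echo the error message), and I have not confirmed that this is the cause here.

```python
import json
from typing import Mapping


def consolidated_mismatches(store: Mapping[str, bytes]) -> dict:
    """Compare each entry in `.zmetadata` with the standalone metadata key.

    Returns {key: (consolidated_value, standalone_value_or_None)} for every
    key that differs or is missing from the store.
    """
    consolidated = json.loads(store[".zmetadata"])["metadata"]
    mismatches = {}
    for key, expected in consolidated.items():
        if key not in store:
            mismatches[key] = (expected, None)
            continue
        actual = json.loads(store[key])
        if actual != expected:
            mismatches[key] = (expected, actual)
    return mismatches


# Tiny in-memory stand-in for a store; all values here are made up.
demo_store = {
    ".zmetadata": json.dumps(
        {
            "zarr_consolidated_format": 1,
            "metadata": {"tos/.zarray": {"chunks": [214, 291, 360]}},
        }
    ).encode(),
    "tos/.zarray": json.dumps({"chunks": [210, 291, 360]}).encode(),
}

print(consolidated_mismatches(demo_store))
```

With the real data, the `mapper` from `fs.get_mapper(...)` above could be passed in place of `demo_store`, since it behaves like a mapping from keys to bytes.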

Version information: output of intake_esm.show_versions()

'2021.1.15'

@andersy005
Member

I'm going to look into this and will let you know what I find.

@naomi-henderson

@jbusecke and @andersy005 - sorry for not getting to this sooner. The r9i1p1f1 version has crazy latitude values:

import fsspec
import xarray as xr

ds9 = xr.open_zarr(fsspec.get_mapper('gs://cmip6/CMIP6/ScenarioMIP/CCCma/CanESM5/ssp585/r9i1p1f1/Omon/tos/gn/v20190429/'), consolidated=True)
ds10 = xr.open_zarr(fsspec.get_mapper('gs://cmip6/CMIP6/ScenarioMIP/CCCma/CanESM5/ssp585/r10i1p1f1/Omon/tos/gn/v20190429/'), consolidated=True)

then plot

ds9.latitude.plot()

and

ds10.latitude.plot()
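
Besides plotting, the latitude values could also be compared numerically. This is a minimal sketch, assuming that "crazy" means values outside the physical range of -90 to 90 degrees. The helper `summarize_latitude` is hypothetical, and the arrays below are synthetic stand-ins, not values read from either store.

```python
import numpy as np


def summarize_latitude(lat):
    """Return min, max, and counts of out-of-range and non-finite values."""
    values = np.asarray(lat, dtype="float64")
    finite = values[np.isfinite(values)]
    out_of_range = int(np.count_nonzero((finite < -90.0) | (finite > 90.0)))
    return {
        "min": float(finite.min()),
        "max": float(finite.max()),
        "n_out_of_range": out_of_range,
        "n_non_finite": int(values.size - finite.size),
    }


# Synthetic 2D latitude fields: one plausible, one with a single bad value.
good = np.linspace(-78.0, 89.0, 12).reshape(3, 4)
bad = good.copy()
bad[0, 0] = 1.0e20

print(summarize_latitude(good))
print(summarize_latitude(bad))
```

With the datasets opened above, this would be called as `summarize_latitude(ds9.latitude.values)` and `summarize_latitude(ds10.latitude.values)`.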

intake locked and limited conversation to collaborators on Dec 21, 2021
andersy005 converted this issue into discussion #423 on Dec 21, 2021
