Intermittent blosc decompression errors #58
Have you seen the blosc error locally? I vaguely recall seeing something like that when I was converting Daymet to Zarr for Azure's Open Datasets.
Not sure it's related, but we have this warning:
Yes, happens constantly on my MacBook during testing.
One hint is that I think it only happens inside the Dask executor, so it probably has something to do with threads.
Perhaps we could fix the blosc issue with dask/distributed#1054?
These seem to have gone away in testing.
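The threads hypothesis above matches a commonly suggested workaround: disabling Blosc's internal thread pool when decompression happens under a multithreaded scheduler. A minimal sketch of applying that setting (the helper name and the import guard are illustrative; this assumes `numcodecs` is installed):

```python
def disable_blosc_threads():
    """Switch numcodecs' Blosc codec to its thread-safe single-threaded mode.

    Blosc's global thread pool is not safe to drive from multiple Python
    threads at once, which fits the symptom of errors appearing only under
    the threaded Dask executor.
    """
    try:
        import numcodecs
        # With use_threads = False, numcodecs uses Blosc's context-based
        # (_ctx) calls, which are safe to invoke concurrently.
        numcodecs.blosc.use_threads = False
        return True
    except ImportError:
        # numcodecs not installed in this environment; nothing to configure.
        return False
```

This has to run in every process that actually performs (de)compression, which matters for distributed executors.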
I've been frequently seeing these blosc decompression errors locally, with the following package versions:
These are occurring when executing an … I'd been successfully running the recipe in the Pangeo Forge Sandbox. However, it also happened there once this week:

---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Input In [1], in <cell line: 165>()
163 from dask.diagnostics import ProgressBar
165 with ProgressBar():
--> 166 delayed.compute()
168 ds = xr.open_zarr(recipe.target_mapper)
169 ds
File /srv/conda/envs/notebook/lib/python3.9/site-packages/dask/base.py:312, in DaskMethodsMixin.compute(self, **kwargs)
288 def compute(self, **kwargs):
289 """Compute this dask collection
290
291 This turns a lazy Dask collection into its in-memory equivalent.
(...)
310 dask.base.compute
311 """
--> 312 (result,) = compute(self, traverse=False, **kwargs)
313 return result
File /srv/conda/envs/notebook/lib/python3.9/site-packages/dask/base.py:600, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
597 keys.append(x.__dask_keys__())
598 postcomputes.append(x.__dask_postcompute__())
--> 600 results = schedule(dsk, keys, **kwargs)
601 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
File /srv/conda/envs/notebook/lib/python3.9/site-packages/dask/threaded.py:81, in get(dsk, result, cache, num_workers, pool, **kwargs)
78 elif isinstance(pool, multiprocessing.pool.Pool):
79 pool = MultiprocessingPoolExecutor(pool)
---> 81 results = get_async(
82 pool.submit,
83 pool._max_workers,
84 dsk,
85 result,
86 cache=cache,
87 get_id=_thread_get_id,
88 pack_exception=pack_exception,
89 **kwargs,
90 )
92 # Cleanup pools associated to dead threads
93 with pools_lock:
File /srv/conda/envs/notebook/lib/python3.9/site-packages/dask/local.py:508, in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
506 _execute_task(task, data) # Re-execute locally
507 else:
--> 508 raise_exception(exc, tb)
509 res, worker_id = loads(res_info)
510 state["cache"][key] = res
File /srv/conda/envs/notebook/lib/python3.9/site-packages/dask/local.py:316, in reraise(exc, tb)
314 if exc.__traceback__ is not tb:
315 raise exc.with_traceback(tb)
--> 316 raise exc
File /srv/conda/envs/notebook/lib/python3.9/site-packages/dask/local.py:221, in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
219 try:
220 task, data = loads(task_info)
--> 221 result = _execute_task(task, data)
222 id = get_id()
223 result = dumps((result, id))
File /srv/conda/envs/notebook/lib/python3.9/site-packages/dask/core.py:119, in _execute_task(arg, cache, dsk)
115 func, args = arg[0], arg[1:]
116 # Note: Don't assign the subtask results to a variable. numpy detects
117 # temporaries by their reference count and can execute certain
118 # operations in-place.
--> 119 return func(*(_execute_task(a, cache) for a in args))
120 elif not ishashable(arg):
121 return arg
File /srv/conda/envs/notebook/lib/python3.9/site-packages/dask/optimization.py:990, in SubgraphCallable.__call__(self, *args)
988 if not len(args) == len(self.inkeys):
989 raise ValueError("Expected %d args, got %d" % (len(self.inkeys), len(args)))
--> 990 return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File /srv/conda/envs/notebook/lib/python3.9/site-packages/dask/core.py:149, in get(dsk, out, cache)
147 for key in toposort(dsk):
148 task = dsk[key]
--> 149 result = _execute_task(task, cache)
150 cache[key] = result
151 result = _execute_task(out, cache)
File /srv/conda/envs/notebook/lib/python3.9/site-packages/dask/core.py:119, in _execute_task(arg, cache, dsk)
115 func, args = arg[0], arg[1:]
116 # Note: Don't assign the subtask results to a variable. numpy detects
117 # temporaries by their reference count and can execute certain
118 # operations in-place.
--> 119 return func(*(_execute_task(a, cache) for a in args))
120 elif not ishashable(arg):
121 return arg
File /srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/executors/dask.py:16, in wrap_map_task.<locals>.wrapped(map_arg, config, *dependencies)
15 def wrapped(map_arg, config, *dependencies):
---> 16 return function(map_arg, config=config)
File /srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py:635, in store_chunk(chunk_key, config)
631 with lock_for_conflicts(lock_keys, timeout=config.lock_timeout):
632 logger.info(
633 f"Storing variable {vname} chunk {chunk_key!s} " f"to Zarr region {zarr_region}"
634 )
--> 635 zarr_array[zarr_region] = data
File /srv/conda/envs/notebook/lib/python3.9/site-packages/zarr/core.py:1285, in Array.__setitem__(self, selection, value)
1283 self.vindex[selection] = value
1284 else:
-> 1285 self.set_basic_selection(pure_selection, value, fields=fields)
File /srv/conda/envs/notebook/lib/python3.9/site-packages/zarr/core.py:1380, in Array.set_basic_selection(self, selection, value, fields)
1378 return self._set_basic_selection_zd(selection, value, fields=fields)
1379 else:
-> 1380 return self._set_basic_selection_nd(selection, value, fields=fields)
File /srv/conda/envs/notebook/lib/python3.9/site-packages/zarr/core.py:1680, in Array._set_basic_selection_nd(self, selection, value, fields)
1674 def _set_basic_selection_nd(self, selection, value, fields=None):
1675 # implementation of __setitem__ for array with at least one dimension
1676
1677 # setup indexer
1678 indexer = BasicIndexer(selection, self)
-> 1680 self._set_selection(indexer, value, fields=fields)
File /srv/conda/envs/notebook/lib/python3.9/site-packages/zarr/core.py:1732, in Array._set_selection(self, indexer, value, fields)
1729 chunk_value = chunk_value[item]
1731 # put data
-> 1732 self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
1733 else:
1734 lchunk_coords, lchunk_selection, lout_selection = zip(*indexer)
File /srv/conda/envs/notebook/lib/python3.9/site-packages/zarr/core.py:1994, in Array._chunk_setitem(self, chunk_coords, chunk_selection, value, fields)
1991 lock = self._synchronizer[ckey]
1993 with lock:
-> 1994 self._chunk_setitem_nosync(chunk_coords, chunk_selection, value,
1995 fields=fields)
File /srv/conda/envs/notebook/lib/python3.9/site-packages/zarr/core.py:1999, in Array._chunk_setitem_nosync(self, chunk_coords, chunk_selection, value, fields)
1997 def _chunk_setitem_nosync(self, chunk_coords, chunk_selection, value, fields=None):
1998 ckey = self._chunk_key(chunk_coords)
-> 1999 cdata = self._process_for_setitem(ckey, chunk_selection, value, fields=fields)
2001 # attempt to delete chunk if it only contains the fill value
2002 if (not self.write_empty_chunks) and all_equal(self.fill_value, cdata):
File /srv/conda/envs/notebook/lib/python3.9/site-packages/zarr/core.py:2049, in Array._process_for_setitem(self, ckey, chunk_selection, value, fields)
2044 chunk = np.zeros(self._chunks, dtype=self._dtype, order=self._order)
2046 else:
2047
2048 # decode chunk
-> 2049 chunk = self._decode_chunk(cdata)
2050 if not chunk.flags.writeable:
2051 chunk = chunk.copy(order='K')
File /srv/conda/envs/notebook/lib/python3.9/site-packages/zarr/core.py:2076, in Array._decode_chunk(self, cdata, start, nitems, expected_shape)
2074 chunk = self._compressor.decode_partial(cdata, start, nitems)
2075 else:
-> 2076 chunk = self._compressor.decode(cdata)
2077 else:
2078 chunk = cdata
File numcodecs/blosc.pyx:564, in numcodecs.blosc.Blosc.decode()
File numcodecs/blosc.pyx:394, in numcodecs.blosc.decompress()
RuntimeError: error during blosc decompression: -1

Debugging this locally, I got as far as … No problem if you'd prefer this to be recorded in a new issue instead of adding it to this currently closed issue. Cheers,
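The traceback above shows zarr taking a per-chunk lock (`self._synchronizer[ckey]`) and then decoding the existing chunk before applying a partial write, i.e. a read-modify-write. If two writers touch the same chunk without sharing such a lock, a reader can see a half-written value and decompression fails. A stdlib-only sketch of that pattern, with `zlib` standing in for Blosc and a plain dict standing in for the chunk store (all names here are illustrative, not zarr's API):

```python
import threading
import zlib

# Toy chunk store: key -> compressed bytes. Mirrors the path in the
# traceback: the existing chunk must be decoded (_decode_chunk) before a
# partial update can be applied (_chunk_setitem).
store = {"chunk/0.0": zlib.compress(bytes(16))}

# One lock per chunk key, playing the role of zarr's per-chunk
# synchronizer. Without a lock shared by ALL writers, a thread can read
# the compressed bytes mid-write -- the reported symptom.
locks = {"chunk/0.0": threading.Lock()}

def update_chunk(key, offset, payload):
    with locks[key]:                                   # serialize read-modify-write
        data = bytearray(zlib.decompress(store[key]))  # decode existing chunk
        data[offset:offset + len(payload)] = payload   # partial update
        store[key] = zlib.compress(bytes(data))        # re-encode and store

threads = [
    threading.Thread(target=update_chunk, args=("chunk/0.0", i, b"\xff"))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With the shared lock, all eight single-byte writes survive intact.
print(zlib.decompress(store["chunk/0.0"])[:8])
```

The zarr-level analogue is making sure every concurrent writer shares the same synchronizer; a lock held only inside one process does not protect writers in another.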
I just ran into this issue using the new Beam pipeline approach; the pipeline errors out when I'm baking the recipe with:

```
File "numcodecs/blosc.pyx", line 564, in numcodecs.blosc.Blosc.decode
File "numcodecs/blosc.pyx", line 394, in numcodecs.blosc.decompress
RuntimeError: error during blosc decompression: -1 [while running 'Create|OpenURLWithFSSpec|OpenWithXarray|StoreToZarr/StoreToZarr/StoreDatasetFragments/Map(store_dataset_fragment)']
```

I then tried adding the following lines to the recipe, since this has worked in other cases where we've had this problem:

```python
import numcodecs
numcodecs.blosc.use_threads = False
from numcodecs import Zstd
import zarr
zarr.storage.default_compressor = Zstd(level=7)
```

before the recipe:

```python
recipe = (
    beam.Create(pattern.items())
    | OpenURLWithFSSpec()
    | OpenWithXarray(file_type=pattern.file_type, xarray_open_kwargs={"decode_coords": "all"})
    | StoreToZarr(
        store_name="gene",
        combine_dims=pattern.combine_dim_keys,
        target_chunks=chunk_plan
    )
)
```

but it didn't seem to do anything -- same error occurred. Do we have to do something different to inject the numcodecs/blosc stuff into the pipeline? Or am I heading down the wrong path?
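One plausible explanation (an assumption, not confirmed in this thread) is that those module-level assignments run only in the process that launches the pipeline, while Beam executes the transforms in separate worker processes where `numcodecs.blosc.use_threads` still has its default value. A sketch of applying the setting on the worker side instead; `configure_worker` and the pass-through step are hypothetical, and this assumes `numcodecs` is importable on the workers:

```python
def configure_worker(element):
    # Runs on the Beam worker for each element, so the setting is applied
    # in the process that actually performs Blosc (de)compression.
    # (A DoFn.setup() hook would avoid re-running this per element; a
    # plain pass-through Map is just the simplest sketch.)
    try:
        import numcodecs
        numcodecs.blosc.use_threads = False
    except ImportError:
        pass  # numcodecs unavailable here; leave the element untouched
    return element

# In the pipeline this would sit as an extra pass-through step, e.g.:
#   beam.Create(pattern.items()) | beam.Map(configure_worker) | OpenURLWithFSSpec() | ...
```

Whether this reaches the specific process doing the decompression depends on the runner, so treat it as a direction to test rather than a known fix.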
@rsignell-usgs, thanks for reporting. I've transferred this to a new issue #560. Let's discuss there!
Executor tests occasionally fail like this
For example: https://github.com/pangeo-forge/pangeo-forge/runs/1754269723?check_suite_focus=true
Same as pangeo-data/pangeo#196