-
Hi @benjeffery, following up on the call, I am starting a discussion about how to use sgkit on an HPC cluster. Everything works fine for the small HAPNEST data set (600 samples), but I am having issues getting the full Chr 20 dataset to run.

**Errors**

With more or less the same setup, just different SLURM job configurations (CPUs, memory per CPU), I got various errors.

**OOM killed**

I am running the job on a node where I requested one CPU with 320 GB of memory.
**ValueError**

I am not sure about the exact setup that produced this error, but I saw it frequently.
**What I am trying**

I have converted the data set, and I am chunking in the variant and sample dimensions. I did a quick estimate of the memory requirements for the full data set before selecting variables to write, by computing the chunk sizes.
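A minimal sketch of that estimate, with illustrative chunk sizes and assuming `call_genotype` is stored as int8:

```python
import numpy as np

# Illustrative chunk sizes in the variant and sample dimensions.
variant_chunk = 10_000
sample_chunk = 50_000
ploidy = 2

# call_genotype holds one int8 value (1 byte) per allele call, so a
# single chunk needs variant_chunk * sample_chunk * ploidy bytes.
chunk_bytes = variant_chunk * sample_chunk * ploidy * np.dtype("int8").itemsize
print(f"{chunk_bytes / 1e9:.1f} GB per chunk")  # 1.0 GB with these sizes
```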
The result of this calculation is on the order of 1 GB per chunk.

**Questions I have**
**Available Hardware**

Brown University HPC Cluster (https://docs.ccv.brown.edu/oscar/system-overview)
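Given the OOM behaviour above, one knob that may matter is the per-worker memory limit. A minimal sketch, assuming a local Dask cluster on a single large node (worker counts and limits are placeholder values):

```python
from dask.distributed import Client, LocalCluster

# Cap each worker well below the node's 320 GB so workers spill to disk
# before the SLURM OOM killer intervenes; values are illustrative.
cluster = LocalCluster(n_workers=4, threads_per_worker=1, memory_limit="64GB")
client = Client(cluster)
```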
-
Here’s a useful blog post with links to videos for running Dask on HPC clusters: https://blog.dask.org/2019/08/28/dask-on-summit. They are using a different scheduler, but SLURM should be similar.
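For SLURM specifically, dask-jobqueue can submit the workers as batch jobs. A minimal sketch (queue name, resources, and job count are placeholder values to adapt to the cluster):

```python
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Each SLURM job runs one Dask worker with these resources.
cluster = SLURMCluster(
    queue="batch",        # placeholder partition name
    cores=4,
    memory="32GB",
    walltime="02:00:00",
)
cluster.scale(jobs=8)     # request 8 worker jobs
client = Client(cluster)
```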
-
This video could be helpful too: https://m.youtube.com/watch?v=wJHosuzqLaU
-
You might also try the Dask discourse; there’s a discussion of SLURM at https://dask.discourse.group/t/some-questions-about-slurmcluster/1553/5, for example.
-
I am able to work with Dask clusters on the HPC now. However, sgkit functions don't seem to be stable when running on a Dask cluster.

**What I do**

`ds = sg.count_variant_genotypes(ds)`

NOTE:
**Error**

---------------------------------------------------------------------------
KilledWorker Traceback (most recent call last)
Cell In[30], line 1
----> 1 ds = sg.count_variant_genotypes(ds)
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/sgkit/stats/aggregation.py:350, in count_variant_genotypes(ds, call_genotype, genotype_id, assign_coords, merge)
281 """Count the number of calls of each possible genotype, at each variant.
282
283 The "possible genotypes" at a given variant locus include all possible
(...)
342 [2, 0, 0]], dtype=uint64)
343 """
344 from .conversion_numba_fns import (
345 _comb_with_replacement,
346 _count_biallelic_genotypes,
347 _count_sorted_genotypes,
348 )
--> 350 ds = define_variable_if_absent(
351 ds,
352 variables.genotype_id,
353 genotype_id,
354 genotype_coords,
355 assign_coords=assign_coords,
356 )
357 variables.validate(
358 ds,
359 {
(...)
362 },
363 )
364 mixed_ploidy = ds[call_genotype].attrs.get("mixed_ploidy", False)
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/sgkit/utils.py:282, in define_variable_if_absent(ds, default_variable_name, variable_name, func, **kwargs)
278 if variable_name != default_variable_name:
279 raise ValueError(
280 f"Variable '{variable_name}' with non-default name is missing and will not be automatically defined."
281 )
--> 282 return func(ds, **kwargs)
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/sgkit/stats/aggregation.py:452, in genotype_coords(ds, chunks, assign_coords, merge)
450 ds = conditional_merge_datasets(ds, new_ds, merge)
451 if assign_coords:
--> 452 ds = ds.assign_coords({"genotypes": S})
453 return ds
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/xarray/core/common.py:621, in DataWithCoords.assign_coords(self, coords, **coords_kwargs)
618 else:
619 results = self._calc_assign_results(coords_combined)
--> 621 data.coords.update(results)
622 return data
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/xarray/core/coordinates.py:542, in Coordinates.update(self, other)
540 other_coords = other
541 else:
--> 542 other_coords = create_coords_with_default_indexes(
543 getattr(other, "variables", other)
544 )
546 # Discard original indexed coordinates prior to merge allows to:
547 # - fail early if the new coordinates don't preserve the integrity of existing
548 # multi-coordinate indexes
549 # - drop & replace coordinates without alignment (note: we must keep indexed
550 # coordinates extracted from the DataArray objects passed as values to
551 # `other` - if any - as those are still used for aligning the old/new coordinates)
552 coords_to_align = drop_indexed_coords(set(other_coords) & set(other), self)
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/xarray/core/coordinates.py:1001, in create_coords_with_default_indexes(coords, data_vars)
998 if isinstance(obj, DataArray):
999 dataarray_coords.append(obj.coords)
-> 1001 variable = as_variable(obj, name=name)
1003 if variable.dims == (name,):
1004 idx, idx_vars = create_default_index_implicit(variable, all_variables)
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/xarray/core/variable.py:159, in as_variable(obj, name)
152 raise TypeError(
153 f"Variable {name!r}: unable to convert object into a variable without an "
154 f"explicit list of dimensions: {obj!r}"
155 )
157 if name is not None and name in obj.dims and obj.ndim == 1:
158 # automatically convert the Variable into an Index
--> 159 obj = obj.to_index_variable()
161 return obj
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/xarray/core/variable.py:572, in Variable.to_index_variable(self)
570 def to_index_variable(self) -> IndexVariable:
571 """Return this variable as an xarray.IndexVariable"""
--> 572 return IndexVariable(
573 self._dims, self._data, self._attrs, encoding=self._encoding, fastpath=True
574 )
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/xarray/core/variable.py:2641, in IndexVariable.__init__(self, dims, data, attrs, encoding, fastpath)
2639 # Unlike in Variable, always eagerly load values into memory
2640 if not isinstance(self._data, PandasIndexingAdapter):
-> 2641 self._data = PandasIndexingAdapter(self._data)
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/xarray/core/indexing.py:1481, in PandasIndexingAdapter.__init__(self, array, dtype)
1478 def __init__(self, array: pd.Index, dtype: DTypeLike = None):
1479 from xarray.core.indexes import safe_cast_to_index
-> 1481 self.array = safe_cast_to_index(array)
1483 if dtype is None:
1484 self._dtype = get_valid_numpy_dtype(array)
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/xarray/core/indexes.py:469, in safe_cast_to_index(array)
459 emit_user_level_warning(
460 (
461 "`pandas.Index` does not support the `float16` dtype."
(...)
465 category=DeprecationWarning,
466 )
467 kwargs["dtype"] = "float64"
--> 469 index = pd.Index(np.asarray(array), **kwargs)
471 return _maybe_cast_to_cftimeindex(index)
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/dask/array/core.py:1700, in Array.__array__(self, dtype, **kwargs)
1699 def __array__(self, dtype=None, **kwargs):
-> 1700 x = self.compute()
1701 if dtype and x.dtype != dtype:
1702 x = x.astype(dtype)
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/dask/base.py:342, in DaskMethodsMixin.compute(self, **kwargs)
318 def compute(self, **kwargs):
319 """Compute this dask collection
320
321 This turns a lazy Dask collection into its in-memory equivalent.
(...)
340 dask.compute
341 """
--> 342 (result,) = compute(self, traverse=False, **kwargs)
343 return result
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/dask/base.py:628, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
625 postcomputes.append(x.__dask_postcompute__())
627 with shorten_traceback():
--> 628 results = schedule(dsk, keys, **kwargs)
630 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/distributed/client.py:2243, in Client._gather(self, futures, errors, direct, local_worker)
2241 exc = CancelledError(key)
2242 else:
-> 2243 raise exception.with_traceback(traceback)
2244 raise exc
2245 if errors == "skip":
KilledWorker: Attempted to run task ('arange-genotype_as_bytes-index_as_genotype-astype-f71e7c6938594f8f781dbe2ba3bed20c', 0) on 4 different workers, but all those workers died while running it. The last worker that attempt to run the task was tcp://172.20.220.11:36577. Inspecting worker logs is often a good next step to diagnose what went wrong. For more information see https://distributed.dask.org/en/stable/killed.html.
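As the error message suggests, inspecting the worker logs is a good next step. A minimal sketch for pulling them through the connected `dask.distributed` client (assuming it is bound to `client`):

```python
# Fetch recent log lines from each worker; a KilledWorker often means a
# worker exceeded its memory limit and was restarted by its nanny.
logs = client.get_worker_logs()
for worker, lines in logs.items():
    print(worker)
    for entry in lines:
        print(" ", entry)
```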
-
Hi @jdstamp,