-
Hi @benjeffery, following up on the call, I am starting a discussion about how to use sgkit on an HPC cluster. Everything works fine for the small HAPNEST data set (600 samples), but I am having issues getting the full Chr 20 dataset to run.

**Errors**

With more or less the same setup, just different SLURM job configurations (CPUs, memory per CPU), I got various errors.

**OOM killed**

I am running the job on a node where I requested one CPU with 320 GB of memory.
**ValueError**

I am not sure about the exact setup that produced this error, but I saw it frequently.
**What I am trying**

I have converted the data set, and I am chunking in the variant and sample dimensions. I did a quick estimate of the memory requirements for the full data set before selecting variables to write, by computing the chunk sizes.
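A minimal sketch of that estimate, with illustrative chunk sizes and assuming `call_genotype` is stored as int8:

```python
import numpy as np

# Illustrative chunk sizes in the variant and sample dimensions.
variant_chunk = 10_000
sample_chunk = 50_000
ploidy = 2

# call_genotype holds one int8 value (1 byte) per allele call, so a
# single chunk needs variant_chunk * sample_chunk * ploidy bytes.
chunk_bytes = variant_chunk * sample_chunk * ploidy * np.dtype("int8").itemsize
print(f"{chunk_bytes / 1e9:.1f} GB per chunk")  # 1.0 GB with these sizes
```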
The result of this calculation is on the order of 1 GB per chunk.

**Questions I have**
**Available Hardware**

Brown University HPC Cluster (https://docs.ccv.brown.edu/oscar/system-overview)
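Given the OOM behaviour above, one knob that may matter is the per-worker memory limit. A minimal sketch, assuming a local Dask cluster on a single large node (worker counts and limits are placeholder values):

```python
from dask.distributed import Client, LocalCluster

# Cap each worker well below the node's 320 GB so workers spill to disk
# before the SLURM OOM killer intervenes; values are illustrative.
cluster = LocalCluster(n_workers=4, threads_per_worker=1, memory_limit="64GB")
client = Client(cluster)
```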
-
Here’s a useful blog post with links to videos for running Dask on HPC clusters: https://blog.dask.org/2019/08/28/dask-on-summit. They are using a different scheduler, but SLURM should be similar.
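For SLURM specifically, dask-jobqueue can submit the workers as batch jobs. A minimal sketch (queue name, resources, and job count are placeholder values to adapt to the cluster):

```python
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Each SLURM job runs one Dask worker with these resources.
cluster = SLURMCluster(
    queue="batch",        # placeholder partition name
    cores=4,
    memory="32GB",
    walltime="02:00:00",
)
cluster.scale(jobs=8)     # request 8 worker jobs
client = Client(cluster)
```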
-
This video could be helpful too: https://m.youtube.com/watch?v=wJHosuzqLaU
-
You might also try the Dask discourse; there’s a discussion of SLURM at https://dask.discourse.group/t/some-questions-about-slurmcluster/1553/5, for example.
-
I am able to work with Dask clusters on the HPC now. However, sgkit functions don't seem to be stable when running on a Dask cluster.

**What I do**

`ds = sg.count_variant_genotypes(ds)`

NOTE:
**Error**

---------------------------------------------------------------------------
KilledWorker Traceback (most recent call last)
Cell In[30], line 1
----> 1 ds = sg.count_variant_genotypes(ds)
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/sgkit/stats/aggregation.py:350, in count_variant_genotypes(ds, call_genotype, genotype_id, assign_coords, merge)
281 """Count the number of calls of each possible genotype, at each variant.
282
283 The "possible genotypes" at a given variant locus include all possible
(...)
342 [2, 0, 0]], dtype=uint64)
343 """
344 from .conversion_numba_fns import (
345 _comb_with_replacement,
346 _count_biallelic_genotypes,
347 _count_sorted_genotypes,
348 )
--> 350 ds = define_variable_if_absent(
351 ds,
352 variables.genotype_id,
353 genotype_id,
354 genotype_coords,
355 assign_coords=assign_coords,
356 )
357 variables.validate(
358 ds,
359 {
(...)
362 },
363 )
364 mixed_ploidy = ds[call_genotype].attrs.get("mixed_ploidy", False)
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/sgkit/utils.py:282, in define_variable_if_absent(ds, default_variable_name, variable_name, func, **kwargs)
278 if variable_name != default_variable_name:
279 raise ValueError(
280 f"Variable '{variable_name}' with non-default name is missing and will not be automatically defined."
281 )
--> 282 return func(ds, **kwargs)
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/sgkit/stats/aggregation.py:452, in genotype_coords(ds, chunks, assign_coords, merge)
450 ds = conditional_merge_datasets(ds, new_ds, merge)
451 if assign_coords:
--> 452 ds = ds.assign_coords({"genotypes": S})
453 return ds
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/xarray/core/common.py:621, in DataWithCoords.assign_coords(self, coords, **coords_kwargs)
618 else:
619 results = self._calc_assign_results(coords_combined)
--> 621 data.coords.update(results)
622 return data
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/xarray/core/coordinates.py:542, in Coordinates.update(self, other)
540 other_coords = other
541 else:
--> 542 other_coords = create_coords_with_default_indexes(
543 getattr(other, "variables", other)
544 )
546 # Discard original indexed coordinates prior to merge allows to:
547 # - fail early if the new coordinates don't preserve the integrity of existing
548 # multi-coordinate indexes
549 # - drop & replace coordinates without alignment (note: we must keep indexed
550 # coordinates extracted from the DataArray objects passed as values to
551 # `other` - if any - as those are still used for aligning the old/new coordinates)
552 coords_to_align = drop_indexed_coords(set(other_coords) & set(other), self)
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/xarray/core/coordinates.py:1001, in create_coords_with_default_indexes(coords, data_vars)
998 if isinstance(obj, DataArray):
999 dataarray_coords.append(obj.coords)
-> 1001 variable = as_variable(obj, name=name)
1003 if variable.dims == (name,):
1004 idx, idx_vars = create_default_index_implicit(variable, all_variables)
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/xarray/core/variable.py:159, in as_variable(obj, name)
152 raise TypeError(
153 f"Variable {name!r}: unable to convert object into a variable without an "
154 f"explicit list of dimensions: {obj!r}"
155 )
157 if name is not None and name in obj.dims and obj.ndim == 1:
158 # automatically convert the Variable into an Index
--> 159 obj = obj.to_index_variable()
161 return obj
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/xarray/core/variable.py:572, in Variable.to_index_variable(self)
570 def to_index_variable(self) -> IndexVariable:
571 """Return this variable as an xarray.IndexVariable"""
--> 572 return IndexVariable(
573 self._dims, self._data, self._attrs, encoding=self._encoding, fastpath=True
574 )
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/xarray/core/variable.py:2641, in IndexVariable.__init__(self, dims, data, attrs, encoding, fastpath)
2639 # Unlike in Variable, always eagerly load values into memory
2640 if not isinstance(self._data, PandasIndexingAdapter):
-> 2641 self._data = PandasIndexingAdapter(self._data)
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/xarray/core/indexing.py:1481, in PandasIndexingAdapter.__init__(self, array, dtype)
1478 def __init__(self, array: pd.Index, dtype: DTypeLike = None):
1479 from xarray.core.indexes import safe_cast_to_index
-> 1481 self.array = safe_cast_to_index(array)
1483 if dtype is None:
1484 self._dtype = get_valid_numpy_dtype(array)
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/xarray/core/indexes.py:469, in safe_cast_to_index(array)
459 emit_user_level_warning(
460 (
461 "`pandas.Index` does not support the `float16` dtype."
(...)
465 category=DeprecationWarning,
466 )
467 kwargs["dtype"] = "float64"
--> 469 index = pd.Index(np.asarray(array), **kwargs)
471 return _maybe_cast_to_cftimeindex(index)
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/dask/array/core.py:1700, in Array.__array__(self, dtype, **kwargs)
1699 def __array__(self, dtype=None, **kwargs):
-> 1700 x = self.compute()
1701 if dtype and x.dtype != dtype:
1702 x = x.astype(dtype)
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/dask/base.py:342, in DaskMethodsMixin.compute(self, **kwargs)
318 def compute(self, **kwargs):
319 """Compute this dask collection
320
321 This turns a lazy Dask collection into its in-memory equivalent.
(...)
340 dask.compute
341 """
--> 342 (result,) = compute(self, traverse=False, **kwargs)
343 return result
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/dask/base.py:628, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
625 postcomputes.append(x.__dask_postcompute__())
627 with shorten_traceback():
--> 628 results = schedule(dsk, keys, **kwargs)
630 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
File /oscar/home/jstamp1/.cache/pypoetry/virtualenvs/sgkit-hapnest-80xaQxl9-py3.11/lib/python3.11/site-packages/distributed/client.py:2243, in Client._gather(self, futures, errors, direct, local_worker)
2241 exc = CancelledError(key)
2242 else:
-> 2243 raise exception.with_traceback(traceback)
2244 raise exc
2245 if errors == "skip":
KilledWorker: Attempted to run task ('arange-genotype_as_bytes-index_as_genotype-astype-f71e7c6938594f8f781dbe2ba3bed20c', 0) on 4 different workers, but all those workers died while running it. The last worker that attempt to run the task was tcp://172.20.220.11:36577. Inspecting worker logs is often a good next step to diagnose what went wrong. For more information see https://distributed.dask.org/en/stable/killed.html.
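As the error message suggests, inspecting the worker logs is a good next step. A minimal sketch for pulling them through the connected `dask.distributed` client (assuming it is bound to `client`):

```python
# Fetch recent log lines from each worker; a KilledWorker often means a
# worker exceeded its memory limit and was restarted by its nanny.
logs = client.get_worker_logs()
for worker, lines in logs.items():
    print(worker)
    for entry in lines:
        print(" ", entry)
```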
-
Hi @jdstamp,