
[FEA] On demand memory spilling #755

Closed
madsbk opened this issue Oct 14, 2021 · 1 comment · Fixed by #756
madsbk commented Oct 14, 2021

Currently, Dask and Dask-CUDA have no way of handling OOM errors other than restarting tasks or workers. Instead, they spill preemptively based on very conservative memory thresholds; for instance, most Dask-CUDA workflows start spilling when half the GPU memory is in use.

By using a new RMM resource adaptor such as rapidsai/rmm#892, we should be able to implement on-demand memory spilling.
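
For illustration, a minimal sketch of the idea, assuming the Python binding `rmm.mr.FailureCallbackResourceAdaptor` added by rapidsai/rmm#892; `spill_device_memory` is a hypothetical placeholder for the actual spilling machinery (in Dask-CUDA, JIT-unspill's proxify host file would play this role):

```python
# Sketch only: wrap the current RMM device resource so that allocation
# failures trigger spilling and a retry, instead of raising immediately.
import rmm


def spill_device_memory(nbytes: int) -> int:
    # Hypothetical placeholder: spill device buffers to host until at
    # least `nbytes` are freed; return the number of bytes actually freed.
    return 0


def oom_callback(nbytes: int) -> bool:
    # Invoked by RMM when an allocation of `nbytes` fails. Returning True
    # asks RMM to retry the allocation; False propagates the OOM error.
    return spill_device_memory(nbytes) > 0


mr = rmm.mr.FailureCallbackResourceAdaptor(
    rmm.mr.get_current_device_resource(), oom_callback
)
rmm.mr.set_current_device_resource(mr)
```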

madsbk self-assigned this Oct 14, 2021
jakirkham commented:
There's also a related Dask issue about spilling when `MemoryError`s are thrown (dask/distributed#3612). IIRC, RMM throws a `MemoryError` when it runs out of memory.

rapids-bot closed this as completed in #756 on Oct 29, 2021.
rapids-bot pushed a commit referencing this issue on Oct 29, 2021:
Use rapidsai/rmm#892 to implement spilling on demand. Requires the use of [RMM](https://github.com/rapidsai/rmm) with JIT-unspill enabled.

The `device_memory_limit` still works as usual: when known allocations reach `device_memory_limit`, Dask-CUDA starts spilling preemptively. However, with this PR it should be possible to increase `device_memory_limit` significantly, since memory spikes will be handled by spilling on demand.
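
For reference, a minimal sketch of enabling this from Python, assuming `LocalCUDACluster`'s `jit_unspill` and `device_memory_limit` arguments; the 30GB value is illustrative only, not a recommendation:

```python
# Sketch only: start a local cluster with JIT-unspill enabled so that
# on-demand spilling can handle memory spikes beyond the preemptive limit.
from dask_cuda import LocalCUDACluster
from distributed import Client

cluster = LocalCUDACluster(
    jit_unspill=True,            # JIT-unspill is required for on-demand spilling
    device_memory_limit="30GB",  # preemptive threshold; can now be set higher,
                                 # since spikes are handled by spilling on demand
)
client = Client(cluster)
```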

Closes #755

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #756