
[FEA] On demand memory spilling #755

Closed
madsbk opened this issue Oct 14, 2021 · 1 comment · Fixed by #756
madsbk commented Oct 14, 2021

Currently, Dask and Dask-CUDA have no way of handling OOM errors other than restarting tasks or workers. Instead, they spill preemptively based on very conservative memory thresholds; for instance, most Dask-CUDA workflows start spilling when half the GPU memory is in use.

By using a new RMM resource adaptor such as rapidsai/rmm#892, we should be able to implement on-demand memory spilling.
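
For illustration, a minimal sketch of the idea, assuming the Python binding `rmm.mr.FailureCallbackResourceAdaptor` added by rapidsai/rmm#892; `spill_device_memory` is a hypothetical placeholder for the actual spilling machinery (in Dask-CUDA, JIT-unspill's proxify host file would play this role):

```python
# Sketch only: wrap the current RMM device resource so that allocation
# failures trigger spilling and a retry, instead of raising immediately.
import rmm


def spill_device_memory(nbytes: int) -> int:
    # Hypothetical placeholder: spill device buffers to host until at
    # least `nbytes` are freed; return the number of bytes actually freed.
    return 0


def oom_callback(nbytes: int) -> bool:
    # Invoked by RMM when an allocation of `nbytes` fails. Returning True
    # asks RMM to retry the allocation; False propagates the OOM error.
    return spill_device_memory(nbytes) > 0


mr = rmm.mr.FailureCallbackResourceAdaptor(
    rmm.mr.get_current_device_resource(), oom_callback
)
rmm.mr.set_current_device_resource(mr)
```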

madsbk self-assigned this Oct 14, 2021
jakirkham commented:
There's also a related Dask issue about spilling when `MemoryError`s are thrown (dask/distributed#3612). IIRC, RMM throws a `MemoryError` when it runs out of memory.

rapids-bot closed this as completed in #756 on Oct 29, 2021.
rapids-bot pushed a commit referencing this issue on Oct 29, 2021:
Use rapidsai/rmm#892 to implement spilling on demand. Requires the use of [RMM](https://github.com/rapidsai/rmm) with JIT-unspill enabled.

The `device_memory_limit` still works as usual: when known allocations reach `device_memory_limit`, Dask-CUDA starts spilling preemptively. However, with this PR it should be possible to increase `device_memory_limit` significantly, since memory spikes will be handled by spilling on demand.
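
For reference, a minimal sketch of enabling this from Python, assuming `LocalCUDACluster`'s `jit_unspill` and `device_memory_limit` arguments; the 30GB value is illustrative only, not a recommendation:

```python
# Sketch only: start a local cluster with JIT-unspill enabled so that
# on-demand spilling can handle memory spikes beyond the preemptive limit.
from dask_cuda import LocalCUDACluster
from distributed import Client

cluster = LocalCUDACluster(
    jit_unspill=True,            # JIT-unspill is required for on-demand spilling
    device_memory_limit="30GB",  # preemptive threshold; can now be set higher,
                                 # since spikes are handled by spilling on demand
)
client = Client(cluster)
```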

Closes #755

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #756