
[FEA] spill data from main memory to disk memory #3740

Closed
jangorecki opened this issue Jan 9, 2020 · 6 comments
Labels
feature request New feature or request

Comments

@jangorecki

I have a dataset stored in CSV at 3 different sizes:

  • 0.5 GB
  • 5 GB
  • 50 GB

My machine has 120 GB of main memory and 11 GB of GPU memory.
cudf is able to run computations on the 0.5 GB data using GPU memory alone.
If I want to run computations on the 5 GB data I can set cudf.set_allocator("managed"). It works really nicely and fast, and more importantly it allows me to run computations on my medium data.
The problem is that when I attempt to run a computation on the 50 GB data I get the following error:

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: an illegal memory access was encountered

I assume there is not enough main memory for this computation. This is a known Python issue for pandas and (in-memory) dask when attempting to run the same computation on data of that size.
My feature request is to extend the spilling of GPU memory to main memory so that it can spill to disk as well.
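For context, the managed-memory setup described above might look something like this (a sketch only: it requires cudf and a CUDA-capable GPU, and the file name and column names are hypothetical):

```python
import cudf

# Back GPU allocations with CUDA unified ("managed") memory, so that
# allocations can oversubscribe the 11 GB of device memory into host RAM.
cudf.set_allocator("managed")

# Hypothetical file and column names -- the point is only that the same
# code now works on data larger than device memory, at some speed cost.
df = cudf.read_csv("data_5GB.csv")
result = df.groupby("id").agg({"v": "sum"})
```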

@pentschev
Member

I guess this is more of a question for RMM, but we had a long discussion about this some time ago. @jrhemstad, do you have any idea of how managed memory could eventually allow spilling to disk? I don't think this is supported currently, given it's a driver-level feature.

@kkraus14
Collaborator

This is out of scope for cudf. If you need the ability to spill to disk you should use dask-cudf with dask-cuda workers, even with a single GPU. That will handle chunking, spilling from device --> host, and host --> disk.
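A minimal sketch of that suggestion, assuming dask-cuda and dask_cudf are installed and a single GPU is available (the memory limits, spill directory, file path, and column names are illustrative, not recommendations):

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import dask_cudf

# device_memory_limit: spill device -> host once GPU usage passes it;
# memory_limit: host memory per worker, beyond which data spills to
# disk under local_directory. Values here are illustrative.
cluster = LocalCUDACluster(
    n_workers=1,
    device_memory_limit="8GB",
    memory_limit="100GB",
    local_directory="/tmp/dask-spill",
)
client = Client(cluster)

# dask_cudf chunks the CSV so each partition fits in GPU memory.
ddf = dask_cudf.read_csv("data_50GB.csv")  # illustrative path
out = ddf.groupby("id").v.sum().compute()  # illustrative columns
```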

@jangorecki
Author

Thank you for clarifying, that is good enough for my case.

@jangorecki
Author

I am looking for a feature to spill computations vmem -> mem -> disk using dask_cudf. Is there a better place for this FR?
@kkraus14 could we re-open this issue?

@quasiben
Member

We have spilling configurations defined in dask-cuda. I would suggest reading over the linked docs, and if you have questions, post them in the dask-cuda GitHub repo.
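For future readers, the same spilling knobs are also exposed on the dask-cuda worker command line; a sketch (the scheduler address, limits, and spill directory are illustrative):

```shell
# Start a scheduler, then a CUDA worker that spills device -> host once
# GPU usage passes --device-memory-limit, and host -> disk (under
# --local-directory) as host memory fills. Values are illustrative.
dask-scheduler &
dask-cuda-worker tcp://127.0.0.1:8786 \
    --device-memory-limit 8GB \
    --memory-limit 100GB \
    --local-directory /tmp/dask-spill
```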

@jangorecki
Author

@quasiben Thank you, I found existing issue for that there already, linking for future readers: rapidsai/dask-cuda#37
