
[FEA] spill data from main memory to disk memory #3740

Closed
jangorecki opened this issue Jan 9, 2020 · 6 comments
Labels
feature request New feature or request

Comments

@jangorecki

I have a dataset stored in CSV at 3 different sizes:

  • 0.5 GB
  • 5 GB
  • 50 GB

My machine has 120 GB of main memory and 11 GB of GPU memory.
cudf is able to run computations on the 0.5 GB data using GPU memory alone.
If I want to run computations on the 5 GB data I can set cudf.set_allocator("managed"). It works really nicely and fast, and more importantly it allows me to run computations on my medium data.
The problem is that when I attempt to run a computation on the 50 GB data I get the following error:

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: an illegal memory access was encountered

I assume there is not enough main memory for this computation. This is a known Python issue for pandas and (in-memory) dask when attempting to run the same computation on data of that size.
My feature request is to extend the spilling of GPU memory to main memory so that it can spill to disk as well.
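For context, the managed-memory setup described above might look something like this (a sketch only: it requires cudf and a CUDA-capable GPU, and the file name and column names are hypothetical):

```python
import cudf

# Back GPU allocations with CUDA unified ("managed") memory, so that
# allocations can oversubscribe the 11 GB of device memory into host RAM.
cudf.set_allocator("managed")

# Hypothetical file and column names -- the point is only that the same
# code now works on data larger than device memory, at some speed cost.
df = cudf.read_csv("data_5GB.csv")
result = df.groupby("id").agg({"v": "sum"})
```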

@pentschev
Member

I guess this is more of a question for RMM, but we had a long discussion about this some time ago. @jrhemstad, do you have any idea of how managed memory could eventually allow spilling to disk? I don't think this is supported currently, given it's a driver-level feature.

@kkraus14
Collaborator

This is out of scope for cudf. If you need the ability to spill to disk you should use dask-cudf with dask-cuda workers, even with a single GPU. That will handle chunking, spilling from device --> host, and host --> disk.
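A minimal sketch of that suggestion, assuming dask-cuda and dask_cudf are installed and a single GPU is available (the memory limits, spill directory, file path, and column names are illustrative, not recommendations):

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import dask_cudf

# device_memory_limit: spill device -> host once GPU usage passes it;
# memory_limit: host memory per worker, beyond which data spills to
# disk under local_directory. Values here are illustrative.
cluster = LocalCUDACluster(
    n_workers=1,
    device_memory_limit="8GB",
    memory_limit="100GB",
    local_directory="/tmp/dask-spill",
)
client = Client(cluster)

# dask_cudf chunks the CSV so each partition fits in GPU memory.
ddf = dask_cudf.read_csv("data_50GB.csv")  # illustrative path
out = ddf.groupby("id").v.sum().compute()  # illustrative columns
```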

@jangorecki
Author

Thank you for clarifying, that is good enough for my case.

@jangorecki
Author

I am looking for a feature to spill computations vmem -> mem -> disk using dask_cudf. Is there a better place for this FR?
@kkraus14 could we re-open this issue?

@quasiben
Member

We have spilling configurations defined in dask-cuda. I would suggest reading over the linked docs, and if you have questions, post them in the dask-cuda GitHub repo.
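For future readers, the same spilling knobs are also exposed on the dask-cuda worker command line; a sketch (the scheduler address, limits, and spill directory are illustrative):

```shell
# Start a scheduler, then a CUDA worker that spills device -> host once
# GPU usage passes --device-memory-limit, and host -> disk (under
# --local-directory) as host memory fills. Values are illustrative.
dask-scheduler &
dask-cuda-worker tcp://127.0.0.1:8786 \
    --device-memory-limit 8GB \
    --memory-limit 100GB \
    --local-directory /tmp/dask-spill
```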

@jangorecki
Author

@quasiben Thank you, I found existing issue for that there already, linking for future readers: rapidsai/dask-cuda#37
