BUG: Sometimes the code works fine, sometimes the same code with the same data runs forever and eats up unlimited RAM, then crashes #5260
Comments
Hi @stromal! Thank you so much for opening this issue! I'm attempting to reproduce it locally - although it seems I may not be able to get the exact same error as you since I'm working off of a 2019 MBP (only 16GB RAM), although I did find that the call (with a smaller I believe the reason you are facing this error is because ray finds it difficult to serialize the list of values to replace + the values to replace from when running For context, it seems that the script fails on the I recommend trying the Alternatively, the query could potentially be rewritten - perhaps using a |
@RehanSD I have come across this a lot previously, not just via |
Hi @stromal! I see - would you be able to share some other examples where the code crashes, so we could try and determine what the similarity is across? That way we can see if this is one bug affecting multiple workloads, or multiple bugs, and then determine a workaround accordingly! |
@stromal some responses: First, do you know how much object store memory is available in your ray cluster? How are you initializing the cluster? What do you get if you run the shell command It's hard to tell what's going on without a reproducible example or looking at the ray worker logs, but it's likely that your ray workers are running out of memory. As @RehanSD points out, broadcasting all of the Does the replace work in pandas? e.g. what if you run
You could try posting here the exact operations you're trying. |
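For anyone landing here with the same workload (swapping emails for integer IDs from a lookup table), here is a minimal sketch of one possible rewrite that builds the lookup once and applies it with `Series.map` instead of passing long value lists to `replace`. The column names (`email`, `id`, `sender`) and the tiny frames are hypothetical stand-ins, not taken from the original report, and this is not necessarily the rewrite suggested above. It is written against plain pandas so it runs anywhere; with Modin the import would be `import modin.pandas as pd`, and whether it avoids the memory blow-up there is an assumption that would need testing.

```python
import pandas as pd

# Hypothetical lookup table: one row per email with its integer ID
# (the issue describes a 2-column frame of about 250k rows).
df_email = pd.DataFrame(
    {"email": ["a@x.com", "b@x.com", "c@x.com"], "id": [0, 1, 2]}
)

# Hypothetical main frame containing the emails to be replaced.
df = pd.DataFrame(
    {"sender": ["b@x.com", "a@x.com", "c@x.com", "b@x.com"],
     "value": [10, 20, 30, 40]}
)

# Build an email -> id Series once; emails must be unique in the lookup.
email_to_id = df_email.set_index("email")["id"]

# Map each email to its ID. Emails missing from the lookup become NaN.
df["sender"] = df["sender"].map(email_to_id)

print(df)
```

Unlike `replace`, `map` turns unmatched values into NaN rather than leaving them as they were, so it is worth checking `df["sender"].isna().sum()` afterwards.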
Modin version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest released version of Modin.
I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)
Reproducible Example
python
df_email -> 2 columns: 250k rows
in the main df replacing emails with IDs
then it goes crazy
Issue Description
but this still runs ok
(206529864, 4)
INPUT
OUTPUT (it uses up all the RAM just to export a 6 GB file. The imported file was 4 GB, then I replaced the emails with 200k IDs (0 to 200,000), so it should be even smaller than before. I have 256 GB RAM, so it should be able to export it, but it always crashes.)
Expected Behavior
Do this in a few seconds like before
also should run in a few seconds as well.
Error Logs
Installed Versions
0.12.1
installed it via #4719 (comment)