Fixes bug in temporary decompression space estimation before calling nvcomp #11879
Conversation
@upsj FYI
batched_decompress_get_temp_size_ex(
  compression, num_chunks, max_uncomp_chunk_size, &temp_size, max_total_uncomp_size)
  .value_or(batched_decompress_get_temp_size(
Are we inadvertently changing the semantics here?
The intention of the original code seemed to be to evaluate the "else" path only for `std::nullopt`. I.e., if `batched_decompress_get_temp_size_ex()` returned `nvcompErrorInternal`, it was not to call `batched_decompress_get_temp_size()`.
In the new version, `batched_decompress_get_temp_size()` is called in both cases. @abellina, @vuule, is that ok?
It seems like what we should be checking is whether nvcomp_status is simply nullopt, and only then doing the second call.
Yeah - I'm not sure how much value there is in calling the second API if the first one failed inside the library. It will likely also fail.
I don't see a reason against trying the old API if the new one failed. Agreed that it's unlikely to help.
Looks good.
@mythrocks sent me a patch (61e2499) that fixed my code styling; many thanks! I am having conda issues.
LGTM!
I am running some end-to-end tests with this patch; I'll update later tonight, as I am having some unrelated local issues. I'll leave it in draft until that testing is complete, but I don't foresee this going past today.
Codecov Report
Base: 87.51% // Head: 87.51% // No change to project coverage 👍

@@           Coverage Diff            @@
##        branch-22.10  #11879  +/- ##
======================================
  Coverage      87.51%  87.51%
======================================
  Files            133     133
  Lines          21826   21826
======================================
  Hits           19100   19100
  Misses          2726    2726

☔ View full report at Codecov.
+1 lgtm
Thanks all for the reviews. I am removing the draft status, as this passes my test locally (the parquet file I was trying to read no longer OOMs).
CC @GregoryKimball: this is another late fix for 22.10.
Closes #11878
This PR fixes an issue we noticed while trying to read a zstd parquet file, where the cuDF code was attempting a very large allocation, far larger than GPU memory (on the order of 50 or 60 GB).
We bisected the issue to this PR: #11652.
The fix has been verified with the original file and Spark.
Thanks to @nvdbaranec, @jbrennan333, @mythrocks and @vuule for help looking into this!