Fixes bug in temporary decompression space estimation before calling nvcomp #11879

Merged

Conversation

@abellina (Contributor) commented on Oct 7, 2022

Closes #11878

This PR fixes an issue we noticed while trying to read a zstd Parquet file, where the cuDF code requested a very large allocation (far larger than GPU memory, on the order of 50 or 60 GB).

We bisected the issue to this PR: #11652.

The fix has been verified with the original file and Spark.

Thanks to @nvdbaranec, @jbrennan333, @mythrocks and @vuule for help looking into this!

@abellina changed the base branch from branch-22.12 to branch-22.10 on October 7, 2022 at 21:31
The github-actions bot added the libcudf (Affects libcudf C++/CUDA code) label on Oct 7, 2022
@abellina added the bug (Something isn't working) and non-breaking (Non-breaking change) labels on Oct 7, 2022
@abellina (Contributor, Author) commented on Oct 7, 2022

@upsj FYI

Review comments on this snippet from the diff:

batched_decompress_get_temp_size_ex(
  compression, num_chunks, max_uncomp_chunk_size, &temp_size, max_total_uncomp_size)
  .value_or(batched_decompress_get_temp_size(
Contributor:

Are we inadvertently changing the semantics here?

The intention of the original code seemed to be to evaluate the "else" path only on std::nullopt; i.e. if batched_decompress_get_temp_size_ex() returned nvcompErrorInternal, batched_decompress_get_temp_size() was not to be called.

In the new version, batched_decompress_get_temp_size() is called in both cases. @abellina, @vuule, is that ok?

Contributor:

It seems like what we should be checking is whether nvcomp_status is simply nullopt, and only then make the second call.

Collaborator:

Yeah - I'm not sure how much value there is in calling the second if the first failed in the library. It will likely also fail.

Contributor:

I don't see a reason against trying the old API if the new one failed. Agreed that it's unlikely to help.
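
As context for the thread above, here is a minimal, self-contained C++ sketch (the function names are hypothetical stand-ins, not the cuDF or nvcomp API) of the point being discussed: std::optional::value_or always evaluates its argument, so the fallback estimator runs even when the first call already produced a value, whereas an explicit check on the optional only invokes the fallback on std::nullopt.

#include <cstddef>
#include <iostream>
#include <optional>

// Stand-in for the newer API: returns a value on success, std::nullopt on failure.
std::optional<std::size_t> temp_size_ex()
{
  std::cout << "new API called\n";
  return 1024;
}

// Stand-in for the older fallback API.
std::size_t temp_size_fallback()
{
  std::cout << "fallback called\n";
  return 2048;
}

int main()
{
  // Eager: value_or evaluates its argument up front, so the fallback
  // runs even though the new API succeeded.
  auto eager = temp_size_ex().value_or(temp_size_fallback());

  // Lazy: the fallback is called only when the optional is empty.
  auto maybe = temp_size_ex();
  auto lazy  = maybe ? *maybe : temp_size_fallback();

  std::cout << eager << ' ' << lazy << '\n';  // both are 1024 here
  return 0;
}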

@vuule (Contributor) left a comment:

Looks good.

@abellina (Contributor, Author) commented on Oct 7, 2022

@mythrocks sent me a patch (61e2499) that fixed my code styling, many thanks! I am having conda issues.

@mythrocks (Contributor) left a comment:

LGTM!

@abellina (Contributor, Author) commented on Oct 7, 2022

I am running some end-to-end tests with this patch and will post an update later tonight, as I am having some unrelated local issues. I'll leave this in draft until that test completes, but I don't foresee it going beyond today.

The codecov bot commented on Oct 8, 2022

Codecov Report

Base: 87.51% // Head: 87.51% // No change to project coverage 👍

Coverage data is based on head (61e2499) compared to base (4c4bce9).
Patch has no changes to coverable lines.

Additional details and impacted files
@@              Coverage Diff              @@
##           branch-22.10   #11879   +/-   ##
=============================================
  Coverage         87.51%   87.51%           
=============================================
  Files               133      133           
  Lines             21826    21826           
=============================================
  Hits              19100    19100           
  Misses             2726     2726           


@jbrennan333 (Collaborator) left a comment:

+1 lgtm

@abellina marked this pull request as ready for review on October 8, 2022 at 00:58
@abellina requested a review from a team as a code owner on October 8, 2022 at 00:58
@abellina requested reviews from cwharris and davidwendt and removed the request for a team on October 8, 2022 at 00:58
@abellina (Contributor, Author) commented on Oct 8, 2022

Thanks all for the reviews. I am taking this out of draft, as it passes my test locally (the Parquet file I was trying to read no longer OOMs).

@vuule (Contributor) commented on Oct 8, 2022

CC @GregoryKimball: this is another late fix for 22.10.

@jolorunyomi merged commit 17868b7 into rapidsai:branch-22.10 on Oct 8, 2022.
Labels
bug (Something isn't working), libcudf (Affects libcudf C++/CUDA code), non-breaking (Non-breaking change)
Development

Successfully merging this pull request may close these issues.

[BUG] OOM in readParquet due to too large temp memory estimation in cuDF
7 participants