Fix ORC writer OOM issue #7605

Merged: 1 commit merged on Mar 17, 2021

Conversation

Contributor

@vuule vuule commented Mar 16, 2021

Closes #7588

The stream size was previously calculated incorrectly, leading to a huge allocation for the encoded data buffer.

This PR fixes the stream size computation to count each row group only once.
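
To make the failure mode concrete, here is a minimal sketch of how counting a row group more than once can inflate a size estimate. It is a hypothetical illustration only, not the libcudf code changed in this PR: the function names, the `(start, end)` stripe bounds, and the specific slicing mistake are all assumptions chosen to show the general pattern.

```python
from typing import List, Tuple


def stream_size_overcounted(rowgroup_sizes: List[int],
                            stripe_bounds: List[Tuple[int, int]]) -> int:
    """Hypothetical buggy estimate: every stripe re-adds all row groups
    from index 0 up to its end, so earlier row groups are counted again."""
    total = 0
    for _start, end in stripe_bounds:
        total += sum(rowgroup_sizes[:end])  # ignores the stripe's start
    return total


def stream_size_once(rowgroup_sizes: List[int],
                     stripe_bounds: List[Tuple[int, int]]) -> int:
    """Estimate in which each row group contributes exactly once,
    through the single stripe that owns it."""
    total = 0
    for start, end in stripe_bounds:
        total += sum(rowgroup_sizes[start:end])
    return total


# Four row groups of 100 bytes each, split into two stripes (assumed layout).
rowgroup_sizes = [100, 100, 100, 100]
stripe_bounds = [(0, 2), (2, 4)]

overcounted = stream_size_overcounted(rowgroup_sizes, stripe_bounds)
counted_once = stream_size_once(rowgroup_sizes, stripe_bounds)
print(overcounted, counted_once)
```

In this toy layout the overcounted estimate grows with the number of stripes, while the corrected one always equals the plain sum of the row group sizes. An oversized estimate of this kind would translate directly into an oversized buffer allocation.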

@vuule vuule added the bug, libcudf, cuIO, and non-breaking labels Mar 16, 2021
@vuule vuule self-assigned this Mar 16, 2021
@vuule vuule added this to PR-WIP in v0.19 Release via automation Mar 16, 2021
@vuule vuule marked this pull request as ready for review March 16, 2021 01:13
@vuule vuule requested a review from a team as a code owner March 16, 2021 01:13
@vuule vuule requested review from trxcllnt, jrhemstad, rgsl888prabhu, kaatish and devavret and removed request for trxcllnt and jrhemstad March 16, 2021 01:13
v0.19 Release automation moved this from PR-WIP to PR-Reviewer approved Mar 16, 2021
@codecov

codecov bot commented Mar 16, 2021

Codecov Report

Merging #7605 (8b01212) into branch-0.19 (7871e7a) will increase coverage by 0.52%.
The diff coverage is 92.89%.

@@               Coverage Diff               @@
##           branch-0.19    #7605      +/-   ##
===============================================
+ Coverage        81.86%   82.39%   +0.52%     
===============================================
  Files              101      101              
  Lines            16884    17350     +466     
===============================================
+ Hits             13822    14295     +473     
+ Misses            3062     3055       -7     
| Impacted Files | Coverage | Δ |
|---|---|---|
| python/cudf/cudf/core/index.py | 93.34% <ø> | (+0.48%) ⬆️ |
| python/cudf/cudf/core/column/column.py | 87.86% <60.00%> | (+0.10%) ⬆️ |
| python/cudf/cudf/core/column/numerical.py | 94.83% <85.71%> | (-0.20%) ⬇️ |
| python/cudf/cudf/core/frame.py | 89.09% <89.18%> | (+0.08%) ⬆️ |
| python/cudf/cudf/core/column/decimal.py | 92.75% <90.32%> | (-2.12%) ⬇️ |
| python/cudf/cudf/core/dataframe.py | 90.58% <95.00%> | (+0.11%) ⬆️ |
| python/cudf/cudf/core/series.py | 91.57% <96.22%> | (+0.79%) ⬆️ |
| python/cudf/cudf/core/column/string.py | 86.76% <100.00%> | (+0.26%) ⬆️ |
| python/cudf/cudf/core/column_accessor.py | 95.45% <100.00%> | (+0.14%) ⬆️ |
| python/cudf/cudf/core/dtypes.py | 91.13% <100.00%> | (+1.40%) ⬆️ |

... and 56 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 36f18c8...8b01212.

@vuule
Contributor Author

vuule commented Mar 16, 2021

rerun tests

@vuule vuule added the 5 - Ready to Merge label Mar 16, 2021
@vuule
Contributor Author

vuule commented Mar 16, 2021

rerun tests

@vuule
Contributor Author

vuule commented Mar 17, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 34cccfe into rapidsai:branch-0.19 Mar 17, 2021
v0.19 Release automation moved this from PR-Reviewer approved to Done Mar 17, 2021
@vuule vuule deleted the bug-orc-writer-oom branch March 17, 2021 04:08
Labels
5 - Ready to Merge, bug, cuIO, libcudf, non-breaking
Projects
No open projects
v0.19 Release
  
Done
Development

Successfully merging this pull request may close these issues.

[BUG] Unexpected OOM writing a DataFrame w/ strings to ORC files
4 participants