Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a row index entry error in ORC writer issue #10989

Merged
merged 4 commits into from
May 27, 2022

Conversation

vuule
Copy link
Contributor

@vuule vuule commented May 27, 2022

Issue #10755

Fixes an issue in protobuf writer where the length on the row index entry was being written into a single byte. This would cause errors when the size is larger than 127.
The issue was uncovered when row group statistics were added. String statistics contain copies to min/max strings, so the size is unbounded.
This PR changes the protobuf writer to write the entry size as a generic uint, allowing larger entries.
Also fixed start_row in row group info array in the reader (unrelated).

@vuule vuule self-assigned this May 27, 2022
@github-actions github-actions bot added Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels May 27, 2022
@vuule vuule added bug Something isn't working cuIO cuIO issue non-breaking Non-breaking change and removed libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. labels May 27, 2022
@codecov
Copy link

codecov bot commented May 27, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.08@fd5724f). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-22.08   #10989   +/-   ##
===============================================
  Coverage                ?   86.32%           
===============================================
  Files                   ?      144           
  Lines                   ?    22696           
  Branches                ?        0           
===============================================
  Hits                    ?    19593           
  Misses                  ?     3103           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fd5724f...bf841ff. Read the comment docs.

@vuule vuule marked this pull request as ready for review May 27, 2022 19:41
@vuule vuule requested review from a team as code owners May 27, 2022 19:41
@github-actions github-actions bot added Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels May 27, 2022
@vuule vuule added the 5 - Ready to Merge Testing and reviews complete, ready to merge label May 27, 2022
@vuule
Copy link
Contributor Author

vuule commented May 27, 2022

rerun tests

@vuule
Copy link
Contributor Author

vuule commented May 27, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 6e65fa1 into rapidsai:branch-22.08 May 27, 2022
@vuule vuule deleted the bug-orc-index-entry-size branch May 27, 2022 23:33
vuule added a commit to vuule/cudf that referenced this pull request May 31, 2022
Issue rapidsai#10755

Fixes an issue in protobuf writer where the length on the row index entry was being written into a single byte. This would cause errors when the size is larger than 127.
The issue was uncovered when row group statistics were added. String statistics contain copies to min/max strings, so the size is unbounded.
This PR changes the protobuf writer to write the entry size as a generic uint, allowing larger entries.
Also fixed `start_row` in row group info array in the reader (unrelated).

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)
  - David Wendt (https://github.com/davidwendt)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: rapidsai#10989
ajschmidt8 pushed a commit that referenced this pull request May 31, 2022
Issue #10755

Backporting the fix to 22.06

Fixes an issue in protobuf writer where the length on the row index entry was being written into a single byte. This would cause errors when the size is larger than 127.
The issue was uncovered when row group statistics were added. String statistics contain copies to min/max strings, so the size is unbounded.
This PR changes the protobuf writer to write the entry size as a generic uint, allowing larger entries.
Also fixed `start_row` in row group info array in the reader (unrelated).

Authors:
   - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
   - GALI PREM SAGAR (https://github.com/galipremsagar)
   - AJ Schmidt (https://github.com/ajschmidt8)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants