Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for ZLIB compression in ORC writer #11036

Merged
merged 19 commits into from
Jun 15, 2022

Conversation

vuule
Copy link
Contributor

@vuule vuule commented Jun 3, 2022

Closes #11023

Expands the nvcomp adapter to cover compression. Supports SNAPPY and ZLIB.
Moves the ORC writer nvcomp compression from stripe_enc.cu to the adapter and add ZLIB support.
Adds padding to compressed blocks in writer to ensure alignment of output pointer alignment that is required for nvcomp ZLIB compression.

Minor changes:

  • Make page/block/chunk naming consistent in the nvcomp adapter -> always use chunk (to match nvcomp).
  • Rename members of batched_args so it also makes sense for compression.

@vuule vuule added feature request New feature or request cuIO cuIO issue non-breaking Non-breaking change labels Jun 3, 2022
@vuule vuule self-assigned this Jun 3, 2022
@github-actions github-actions bot added Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels Jun 3, 2022
@codecov
Copy link

codecov bot commented Jun 3, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.08@b7c7adc). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-22.08   #11036   +/-   ##
===============================================
  Coverage                ?   86.34%           
===============================================
  Files                   ?      144           
  Lines                   ?    22738           
  Branches                ?        0           
===============================================
  Hits                    ?    19632           
  Misses                  ?     3106           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b7c7adc...6945584. Read the comment docs.

@@ -1233,7 +1232,7 @@ __global__ void __launch_bounds__(1024)
? statuses[ss.first_block + b].bytes_written
: src_len;
uint32_t blk_size24{};
if (dst_len >= src_len) {
if (statuses[ss.first_block + b].status == 0) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous condition was implicit, led to wrong output when the compressed data is larger.

@vuule vuule marked this pull request as ready for review June 6, 2022 18:22
@vuule vuule requested review from a team as code owners June 6, 2022 18:22
@karthikeyann karthikeyann requested review from a team as code owners June 15, 2022 04:08
@karthikeyann karthikeyann changed the base branch from branch-22.06 to branch-22.08 June 15, 2022 04:08
@karthikeyann
Copy link
Contributor

karthikeyann commented Jun 15, 2022

I see 97 modified files that includes already merged changes as well. is this a merge issue or a github issue?

Fixed this issue when I switched to 22.06 and back to 22.08 as base. Also updated reviewers.

@karthikeyann karthikeyann requested review from a team and removed request for a team June 15, 2022 04:12
python/cudf/cudf/tests/test_orc.py Show resolved Hide resolved
cpp/src/io/orc/writer_impl.cu Outdated Show resolved Hide resolved
cpp/src/io/comp/nvcomp_adapter.cu Outdated Show resolved Hide resolved
cpp/src/io/comp/nvcomp_adapter.cpp Show resolved Hide resolved
Co-authored-by: Karthikeyan <6488848+karthikeyann@users.noreply.github.com>
@github-actions github-actions bot removed gpuCI CMake CMake build issue Java Affects Java cuDF API. labels Jun 15, 2022
@vuule vuule requested a review from karthikeyann June 15, 2022 12:38
Copy link
Contributor

@karthikeyann karthikeyann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@vuule vuule added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Jun 15, 2022
@isVoid isVoid removed their request for review June 15, 2022 17:16
@vuule
Copy link
Contributor Author

vuule commented Jun 15, 2022

rerun tests

1 similar comment
@galipremsagar
Copy link
Contributor

rerun tests

@vuule
Copy link
Contributor Author

vuule commented Jun 15, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 10cf648 into rapidsai:branch-22.08 Jun 15, 2022
@vuule vuule deleted the fea-orc-writer-zlib branch June 15, 2022 23:16
rapids-bot bot pushed a commit that referenced this pull request Jun 20, 2022
Some recently merged PRs (#11018 + #11036) do not include enough header which may cause compile error in some systems (in particular, CUDA 11.7 + gcc-11.2). This PR adds the missing header (`<optional>`) to fix the compile issue.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Karthikeyan (https://github.com/karthikeyann)
  - Yunsong Wang (https://github.com/PointKernel)

URL: #11126
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge cuIO cuIO issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Use nvCOMP DEFLATE compression/decompression in cuIO
4 participants