Fix offset of the string dictionary length stream #8538

Merged


vuule (Contributor) commented Jun 16, 2021

Fixes #8514

The string dictionary length stream is RLE encoded, and rle_data_size and non_rle_data_size take this into account. However, when computing chunk stream offsets, these streams were treated as non-RLE, so non_rle_data_size was not added to their offset. This caused a discrepancy between the non-RLE stream sizes and the available space, leading to overlap between chunk streams.

Applied non_rle_data_size to the offset to correct the discrepancy, and added a test that uses decimal columns to increase the size of the non-RLE encoded data and trigger the overflow.
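A simplified, hypothetical model of the chunk layout may help illustrate why the missing term causes overlap. The sketch assumes that each chunk buffer holds all non-RLE streams first, followed by all RLE streams starting at offset non_rle_data_size. The names stream_desc, stream_range, layout_chunk and any_overlap are illustrative only and are not the cuDF ORC writer's actual code. Passing apply_non_rle_offset = false mimics the effect of the bug: the RLE region's base omits non_rle_data_size, so RLE streams land on top of the non-RLE data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one encoded stream within a chunk.
struct stream_desc {
  std::size_t size;  // bytes reserved for the stream
  bool is_rle;       // true if the stream is RLE encoded
};

// Half-open byte range [begin, end) that a stream occupies in the chunk buffer.
struct stream_range {
  std::size_t begin;
  std::size_t end;
};

// Simplified layout: non-RLE streams are packed from offset 0, and RLE streams
// are packed after them, starting at non_rle_data_size. With
// apply_non_rle_offset == false, the RLE region starts at 0 instead,
// mimicking a missing non_rle_data_size term in the offset computation.
std::vector<stream_range> layout_chunk(std::vector<stream_desc> const& streams,
                                       bool apply_non_rle_offset) {
  std::size_t non_rle_data_size = 0;
  for (auto const& s : streams) {
    if (!s.is_rle) non_rle_data_size += s.size;
  }

  std::size_t non_rle_cursor = 0;
  std::size_t rle_cursor = apply_non_rle_offset ? non_rle_data_size : 0;

  std::vector<stream_range> ranges;
  ranges.reserve(streams.size());
  for (auto const& s : streams) {
    // Each stream advances the cursor of its own region.
    std::size_t& cursor = s.is_rle ? rle_cursor : non_rle_cursor;
    ranges.push_back(stream_range{cursor, cursor + s.size});
    cursor += s.size;
  }
  // All non-RLE streams fit exactly in the first non_rle_data_size bytes.
  assert(non_rle_cursor == non_rle_data_size);
  return ranges;
}

// Returns true if any two non-empty stream ranges share at least one byte.
bool any_overlap(std::vector<stream_range> const& ranges) {
  for (std::size_t i = 0; i < ranges.size(); ++i) {
    for (std::size_t j = i + 1; j < ranges.size(); ++j) {
      auto const& a = ranges[i];
      auto const& b = ranges[j];
      bool const non_empty = a.begin < a.end && b.begin < b.end;
      if (non_empty && a.begin < b.end && b.begin < a.end) return true;
    }
  }
  return false;
}
```

For example, with two non-RLE streams of 100 and 60 bytes and two RLE streams of 40 and 20 bytes, the corrected layout places the RLE streams at offsets 160 and 200, while the buggy layout starts the first RLE stream at offset 0, inside the first non-RLE stream. Growing the non-RLE data (as the decimal-column test does) makes such overlap more likely to corrupt real output.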

vuule self-assigned this Jun 16, 2021
vuule requested review from a team as code owners June 16, 2021 19:08
vuule requested review from codereport, davidwendt, charlesbluca and brandon-b-miller and removed request for a team June 16, 2021 19:08
github-actions bot added the Python and libcudf labels Jun 16, 2021
codereport (Contributor) left a comment

lgtm

raydouglass added the ! - Hotfix label Jun 16, 2021
JohnZed added the bug and non-breaking labels Jun 16, 2021
codecov bot commented Jun 16, 2021

Codecov Report

Merging #8538 (6b5a6ea) into branch-21.06 (2e780d0) will decrease coverage by 0.38%.
The diff coverage is n/a.

❗ Current head 6b5a6ea differs from pull request most recent head 4d31f6a. Consider uploading reports for the commit 4d31f6a to get more accurate results.

@@               Coverage Diff                @@
##           branch-21.06    #8538      +/-   ##
================================================
- Coverage         82.83%   82.44%   -0.39%     
================================================
  Files               109      109              
  Lines             17896    17542     -354     
================================================
- Hits              14824    14463     -361     
- Misses             3072     3079       +7     
Impacted Files Coverage Δ
python/custreamz/custreamz/tests/conftest.py 71.42% <0.00%> (-7.15%) ⬇️
python/cudf/cudf/io/hdf.py 50.00% <0.00%> (-7.15%) ⬇️
python/custreamz/custreamz/tests/test_kafka.py 35.71% <0.00%> (-7.15%) ⬇️
python/cudf/cudf/core/abc.py 87.80% <0.00%> (-3.11%) ⬇️
python/cudf/cudf/comm/gpuarrow.py 76.71% <0.00%> (-3.05%) ⬇️
python/cudf/cudf/benchmarks/bench_cudf_io.py 27.65% <0.00%> (-2.96%) ⬇️
python/cudf/cudf/utils/cudautils.py 55.04% <0.00%> (-2.72%) ⬇️
...ython/custreamz/custreamz/tests/test_dataframes.py 97.58% <0.00%> (-2.42%) ⬇️
python/cudf/cudf/api/extensions/accessor.py 92.59% <0.00%> (-0.75%) ⬇️
python/dask_cudf/dask_cudf/accessors.py 88.23% <0.00%> (-0.66%) ⬇️
... and 41 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 65d8f48...4d31f6a.

ajschmidt8 merged commit b9a1165 into rapidsai:branch-21.06 Jun 17, 2021
vuule deleted the bug-orc-writer-dict_len-offset-backport branch April 20, 2022 23:47
Labels
! - Hotfix: Hotfix is a bug that affects the majority of users for which there is no reasonable workaround
bug: Something isn't working
libcudf: Affects libcudf (C++/CUDA) code.
non-breaking: Non-breaking change
Python: Affects Python cuDF API.
8 participants