Fix decoding of dictionary encoded FIXED_LEN_BYTE_ARRAY data in Parquet reader #15601

Merged: 10 commits into rapidsai:branch-24.06 on May 7, 2024

Conversation

etseidl (Contributor) commented Apr 26, 2024

Description

Reading Parquet files with dictionary-encoded FIXED_LEN_BYTE_ARRAY data fails because the dictionary page is never parsed, which leads to out-of-bounds memory accesses.
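
To illustrate why an unparsed dictionary page is a problem, here is a minimal host-side sketch of dictionary decoding for fixed-width values. It is not the cudf implementation. It assumes only what the Parquet format specifies: a dictionary page stores its entries PLAIN-encoded, and for FIXED_LEN_BYTE_ARRAY that means `type_length` bytes per entry with no length prefix. The function names are hypothetical, and the indices are assumed to be already unpacked from the data page's RLE/bit-packed encoding.

```python
from typing import List, Sequence


def parse_flba_dictionary(page: bytes, type_length: int) -> List[bytes]:
    """Split a PLAIN-encoded dictionary page into fixed-width entries.

    FIXED_LEN_BYTE_ARRAY values carry no length prefix, so entry i
    occupies bytes [i * type_length, (i + 1) * type_length).
    """
    if type_length <= 0:
        raise ValueError("type_length must be positive")
    if len(page) % type_length != 0:
        raise ValueError("dictionary page size is not a multiple of type_length")
    return [page[i:i + type_length] for i in range(0, len(page), type_length)]


def decode_flba_indices(dictionary: Sequence[bytes],
                        indices: Sequence[int]) -> List[bytes]:
    """Map dictionary indices from a data page to their values.

    If the dictionary was never parsed (empty here), every index is out
    of range. This sketch raises; unchecked device code would instead
    read out of bounds.
    """
    values = []
    for idx in indices:
        if not 0 <= idx < len(dictionary):
            raise IndexError(
                f"dictionary index {idx} out of range for {len(dictionary)} entries")
        values.append(dictionary[idx])
    return values


if __name__ == "__main__":
    # Three 4-byte entries stored back to back, as in a dictionary page.
    page = b"aaaa" + b"bbbb" + b"cccc"
    dictionary = parse_flba_dictionary(page, type_length=4)
    print(decode_flba_indices(dictionary, [2, 0, 0, 1]))
```

Passing an empty dictionary to `decode_flba_indices` models the failure described above: the indices in the data page have nothing valid to refer to.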

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

copy-pr-bot bot commented Apr 26, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions bot added the labels "libcudf" (Affects libcudf (C++/CUDA) code.) and "Python" (Affects Python cuDF API.) on Apr 26, 2024

etseidl (Contributor, Author) commented Apr 26, 2024

Testing with Python because cudf cannot currently dictionary-encode FIXED_LEN_BYTE_ARRAY data.

etseidl (Contributor, Author) commented Apr 26, 2024

CC @vuule

vuule added the labels "bug" (Something isn't working), "cuIO" (cuIO issue), and "non-breaking" (Non-breaking change) on Apr 26, 2024

vuule (Contributor) commented Apr 26, 2024

/ok to test

vuule (Contributor) commented May 6, 2024

/ok to test

vuule (Contributor) commented May 7, 2024

/merge

rapids-bot bot merged commit 0cfdbc1 into rapidsai:branch-24.06 on May 7, 2024
76 checks passed
etseidl deleted the flba_dict_decode branch on May 7, 2024 18:41
This pull request was closed.
Labels: bug, cuIO, libcudf, non-breaking, Python