This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

fix invalid parquet read #1330

Merged 1 commit on Dec 13, 2022

Conversation

ritchie46
Collaborator

fixes #1329

The length of the offsets container returned n - 1 for n stored offsets. This is unintuitive and led to a bug where `extend_constant` added one value too few. I made the distinction between `len` (the real number of indexes) and `len_proxy` (the number of values an array with this many offsets would have).
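
To make the distinction concrete, here is a minimal sketch. It uses a hypothetical, simplified `Offsets` stand-in rather than the actual arrow2 type; only the names `len`, `len_proxy` and `extend_constant` come from the description above. `push_len`, `pad_to` and the `i64` element type are assumptions for illustration.

```rust
/// Hypothetical, simplified stand-in for an offsets container.
/// This is NOT the arrow2 type; it only illustrates the two notions of length.
struct Offsets(Vec<i64>);

impl Offsets {
    /// A new container always holds the initial offset 0.
    fn new() -> Self {
        Offsets(vec![0])
    }

    /// The real number of stored offsets (indexes).
    fn len(&self) -> usize {
        self.0.len()
    }

    /// The number of values an array with these offsets would have,
    /// which is always one less than the number of offsets.
    fn len_proxy(&self) -> usize {
        self.0.len() - 1
    }

    /// Append one value that is `length` bytes long (assumed helper).
    fn push_len(&mut self, length: i64) {
        let last = *self.0.last().unwrap();
        self.0.push(last + length);
    }

    /// Append `additional` empty values by repeating the last offset.
    fn extend_constant(&mut self, additional: usize) {
        let last = *self.0.last().unwrap();
        self.0.extend(std::iter::repeat(last).take(additional));
    }
}

/// Assumed helper: pad with empty values until the array would hold
/// `target_values` values. The count is derived from `len_proxy`, so both
/// sides of the subtraction are measured in values, not in offsets.
fn pad_to(offsets: &mut Offsets, target_values: usize) {
    let missing = target_values.saturating_sub(offsets.len_proxy());
    offsets.extend_constant(missing);
}
```

With two pushed values the sketch stores three offsets, so `len()` is 3 and `len_proxy()` is 2. Mixing the two units when computing how many values to pad is the kind of off-by-one the description refers to.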

codecov bot commented Dec 13, 2022

Codecov Report

Base: 83.12% // Head: 83.11% // Decreases project coverage by -0.01% ⚠️

Coverage data is based on head (ba2c79b) compared to base (1fcfd7c).
Patch coverage: 70.58% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1330      +/-   ##
==========================================
- Coverage   83.12%   83.11%   -0.02%     
==========================================
  Files         370      370              
  Lines       40169    40172       +3     
==========================================
- Hits        33391    33388       -3     
- Misses       6778     6784       +6     
Impacted Files Coverage Δ
src/io/avro/read/nested.rs 61.21% <50.00%> (ø)
src/offset.rs 84.36% <50.00%> (-0.94%) ⬇️
src/io/parquet/read/deserialize/binary/utils.rs 65.30% <66.66%> (ø)
src/array/binary/mutable_values.rs 76.47% <100.00%> (ø)
src/array/list/mutable.rs 79.89% <100.00%> (ø)
src/array/physical_binary.rs 94.67% <100.00%> (ø)
src/array/utf8/mutable_values.rs 83.06% <100.00%> (ø)
src/array/binary/mod.rs 90.07% <0.00%> (-1.15%) ⬇️

@jorgecarleitao jorgecarleitao merged commit a4173af into jorgecarleitao:main Dec 13, 2022
@jorgecarleitao jorgecarleitao added the no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog label Dec 13, 2022
@jorgecarleitao
Owner

Agree that it is a bit confusing - I was also not sure about what to use. This addresses it indeed ^^

ritchie46 added a commit to ritchie46/arrow2 that referenced this pull request Mar 29, 2023
ritchie46 added a commit to ritchie46/arrow2 that referenced this pull request Apr 5, 2023
Development

Successfully merging this pull request may close these issues.

Regression parquet read Utf8 column.