This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Added support to read dict-encoded required primitive types from parquet #402

Merged

Dandandan
Collaborator

Closes #400

@codecov

codecov bot commented Sep 13, 2021

Codecov Report

Merging #402 (a9d980b) into main (227ab3b) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #402      +/-   ##
==========================================
+ Coverage   80.90%   80.91%   +0.01%     
==========================================
  Files         347      347              
  Lines       22086    22098      +12     
==========================================
+ Hits        17869    17881      +12     
  Misses       4217     4217              
| Impacted Files | Coverage Δ |
|---|---|
| src/io/parquet/read/primitive/basic.rs | 85.00% <100.00%> (+2.14%) ⬆️ |
| tests/it/io/parquet/read.rs | 98.50% <100.00%> (+0.04%) ⬆️ |

@jorgecarleitao
Owner

Awesome!

I think we are just missing a small test here. I have been using pyarrow for this. Specifically:

  1. write a parquet file (arrow-parquet-integration-testing) with dict-encoded values
  2. have the expected values in fn pyarrow_required in tests/it/io/parquet/mod.rs
  3. perform a read test for the corresponding column and file at tests/it/io/parquet/read.rs

I think 1. and 2. are done; we just need to add something like

#[test]
fn v2_int64_required_dict() -> Result<()> {
    test_pyarrow_integration(0, 2, "basic", true, true)
}

#[test]
fn v1_int64_required_dict() -> Result<()> {
    test_pyarrow_integration(0, 1, "basic", true, true)
}

to check that we can read parquet files written by pyarrow in both versions (v1 and v2).
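
For readers unfamiliar with what such a read test exercises, here is a minimal sketch of the idea behind decoding a dict-encoded required column: with no nulls to account for, every stored index is looked up in the dictionary. The helper name `decode_required_dict` and its signature are hypothetical and are not the code added in this PR. The sketch also assumes the indices have already been unpacked into plain integers, which skips the RLE/bit-packed decoding step a real reader has to perform.

```rust
/// Hypothetical helper (not part of arrow2): materializes a required
/// (non-null) dictionary-encoded column by looking each index up in the
/// dictionary values.
///
/// Assumption: `indices` have already been decoded into plain `u32`s.
/// Returns `None` if any index points outside the dictionary.
fn decode_required_dict<T: Copy>(dictionary: &[T], indices: &[u32]) -> Option<Vec<T>> {
    indices
        .iter()
        // `get` is bounds-checked, so a corrupt index yields `None`
        // instead of panicking.
        .map(|&index| dictionary.get(index as usize).copied())
        // Collecting into `Option<Vec<T>>` stops at the first `None`.
        .collect()
}
```

Returning `Option` rather than indexing directly is a design choice of this sketch: an out-of-range index in a malformed file becomes a recoverable failure rather than a panic.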

@jorgecarleitao added the enhancement label on Sep 13, 2021
@jorgecarleitao changed the title from "Implement read_dict_buffer_required" to "Added support to read dict-encoded non-null primitive types" on Sep 13, 2021
@jorgecarleitao changed the title from "Added support to read dict-encoded non-null primitive types" to "Added support to read dict-encoded non-null primitive types from parquet" on Sep 13, 2021
@jorgecarleitao changed the title from "Added support to read dict-encoded non-null primitive types from parquet" to "Added support to read dict-encoded required primitive types from parquet" on Sep 13, 2021
@jorgecarleitao merged commit ce3b0e9 into jorgecarleitao:main on Sep 13, 2021

@jorgecarleitao
Owner

Released in v0.5.3.

@Dandandan
Collaborator Author

Thanks!

Successfully merging this pull request may close these issues.

Implement full RleDictionary decoding