This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Added support to read and write nested dictionaries to parquet #1175

Merged: 4 commits into main from dict_parquet_nested on Jul 23, 2022

@jorgecarleitao (Owner) commented Jul 22, 2022

Closes #1091
Closes #1174

Note that this preserves the dictionary encoding of the array, ensuring that:

  • serializing to parquet does not unpack the dictionary
  • deserializing from parquet does not unpack the dictionary
  • the data stored in parquet is dictionary-encoded
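
For readers unfamiliar with the terminology, here is a minimal, std-only sketch of what "dictionary-encoded" and "unpack" mean. The `DictColumn` type and its `encode` and `unpack` methods are hypothetical names invented for illustration; they are not the types or functions touched by this PR, and the real arrays carry more structure (nesting, validity bitmaps, typed keys).

```rust
use std::collections::HashMap;

/// A toy dictionary-encoded column: each key is an index into `values`.
/// Hypothetical type for illustration only; not this crate's API.
#[derive(Debug, Clone, PartialEq)]
struct DictColumn {
    keys: Vec<Option<u32>>, // one entry per row; `None` marks a null row
    values: Vec<String>,    // each distinct value is stored exactly once
}

impl DictColumn {
    /// Dictionary-encode rows, assigning keys in first-seen order.
    fn encode(rows: &[Option<&str>]) -> Self {
        let mut index: HashMap<&str, u32> = HashMap::new();
        let mut values: Vec<String> = Vec::new();
        let mut keys = Vec::with_capacity(rows.len());
        for row in rows {
            let key = match row {
                Some(s) => {
                    // `next` is the key a previously unseen value would get.
                    let next = values.len() as u32;
                    let k = *index.entry(*s).or_insert(next);
                    if k == next {
                        values.push(s.to_string());
                    }
                    Some(k)
                }
                None => None,
            };
            keys.push(key);
        }
        DictColumn { keys, values }
    }

    /// "Unpacking": materialize one full value per row, discarding the
    /// keys/values split. This is the step the PR description says is avoided.
    fn unpack(&self) -> Vec<Option<String>> {
        self.keys
            .iter()
            .map(|&k| k.map(|i| self.values[i as usize].clone()))
            .collect()
    }
}

fn main() {
    let rows = [Some("a"), Some("b"), None, Some("a"), Some("a")];
    let column = DictColumn::encode(&rows);
    println!("keys   = {:?}", column.keys);
    println!("values = {:?}", column.values);
    println!("unpacked = {:?}", column.unpack());
}
```

In this sketch, a round trip that preserves the encoding would hand back the same `keys` and `values` rather than the output of `unpack`, so repeated values are never duplicated in memory or on disk.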

@jorgecarleitao added the enhancement label (An improvement to an existing feature) on Jul 22, 2022

codecov bot commented Jul 22, 2022

Codecov Report

Merging #1175 (eae9244) into main (dc77578) will decrease coverage by 0.24%.
The diff coverage is 67.05%.

@@            Coverage Diff             @@
##             main    #1175      +/-   ##
==========================================
- Coverage   83.69%   83.44%   -0.25%     
==========================================
  Files         365      363       -2     
  Lines       35873    36246     +373     
==========================================
+ Hits        30023    30247     +224     
- Misses       5850     5999     +149     
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...t/read/deserialize/fixed_size_binary/dictionary.rs | 39.56% <5.55%> (-48.54%) | ⬇️ |
| src/io/parquet/read/deserialize/mod.rs | 72.27% <27.77%> (-13.54%) | ⬇️ |
| ...c/io/parquet/read/deserialize/dictionary/nested.rs | 60.18% <60.18%> (ø) | |
| src/io/parquet/write/mod.rs | 86.50% <75.00%> (+5.82%) | ⬆️ |
| ...c/io/parquet/read/deserialize/binary/dictionary.rs | 90.47% <91.07%> (+1.58%) | ⬆️ |
| ...o/parquet/read/deserialize/primitive/dictionary.rs | 93.06% <91.80%> (-0.27%) | ⬇️ |
| src/io/parquet/write/dictionary.rs | 86.33% <98.66%> (+1.77%) | ⬆️ |
| src/array/primitive/mod.rs | 81.04% <100.00%> (ø) | |
| src/io/parquet/read/deserialize/binary/nested.rs | 79.41% <100.00%> (+3.54%) | ⬆️ |
| src/io/parquet/read/deserialize/boolean/nested.rs | 75.82% <100.00%> (+4.39%) | ⬆️ |
| ... and 13 more | | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jorgecarleitao merged commit c720eb2 into main on Jul 23, 2022
@jorgecarleitao deleted the dict_parquet_nested branch on July 23, 2022 14:23