Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Added support dictionaries in nested types over IPC #587

Merged
merged 7 commits into from
Nov 8, 2021
Merged

Conversation

jorgecarleitao
Copy link
Owner

@jorgecarleitao jorgecarleitao commented Nov 7, 2021

This PR:

  • adds support to read dictionaries in nested types from IPC
  • adds support to write dictionaries in nested types to IPC
  • adds corresponding integration test

With this change, we support all but two integration tests: decimal256 and non-canonical maps (which covers a technicality of the spec) and we become the most complete implementation of the arrow specification after the official C++ implementantion! Specifically, after this PR, the compatibility matrix becomes:

file_objs = [
        generate_primitive_case([], name='primitive_no_batches'),
        generate_primitive_case([17, 20], name='primitive'),
        generate_primitive_case([0, 0, 0], name='primitive_zerolength'),

        generate_primitive_large_offsets_case([17, 20])
        .skip_category('C#')
        .skip_category('Go')
        .skip_category('JS'),

        generate_null_case([10, 0])
        .skip_category('C#')
        .skip_category('JS'),   # TODO(ARROW-7900)

        generate_null_trivial_case([0, 0])
        .skip_category('C#')
        .skip_category('JS'),   # TODO(ARROW-7900)

        generate_decimal128_case(),

        generate_decimal256_case()
        .skip_category('Go')  # TODO(ARROW-7948): Decimal + Go
        .skip_category('JS')
        .skip_category('Rust'),

        generate_datetime_case()
        .skip_category('C#'),

        generate_interval_case()
        .skip_category('C#')
        .skip_category('JS'),  # TODO(ARROW-5239): Intervals + JS

        generate_month_day_nano_interval_case()
        .skip_category('C#')
        .skip_category('JS'),


        generate_map_case()
        .skip_category('C#'),

        generate_non_canonical_map_case()
        .skip_category('C#')
        .skip_category('Java')   # TODO(ARROW-8715)
        .skip_category('JS')     # TODO(ARROW-8716)
        .skip_category('Rust'),

        generate_nested_case()
        .skip_category('C#'),

        generate_recursive_nested_case()
        .skip_category('C#'),

        generate_nested_large_offsets_case()
        .skip_category('C#')
        .skip_category('Go')
        .skip_category('JS'),

        generate_unions_case()
        .skip_category('C#')
        .skip_category('Go')
        .skip_category('JS'),

        generate_custom_metadata_case()
        .skip_category('C#')
        .skip_category('JS'),

        generate_duplicate_fieldnames_case()
        .skip_category('C#')
        .skip_category('Go')
        .skip_category('JS'),

        # TODO(ARROW-3039, ARROW-5267): Dictionaries in GO
        generate_dictionary_case()
        .skip_category('C#')
        .skip_category('Go'),

        generate_dictionary_unsigned_case()
        .skip_category('C#')
        .skip_category('Go')     # TODO(ARROW-9378)
        .skip_category('Java'),  # TODO(ARROW-9377)

        generate_nested_dictionary_case()
        .skip_category('C#')
        .skip_category('Go')
        .skip_category('Java')  # TODO(ARROW-7779)
        .skip_category('JS'),

        generate_extension_case()
        .skip_category('C#')
        .skip_category('Go')  # TODO(ARROW-3039): requires dictionaries
        .skip_category('JS'),
    ]

In particular, we support more cases of the IPC spec than Java, one of the two reference implementations of the specification (the other being C++).

@jorgecarleitao jorgecarleitao added the feature A new feature label Nov 7, 2021
@codecov
Copy link

codecov bot commented Nov 8, 2021

Codecov Report

Merging #587 (d699e0f) into main (6e9ea35) will decrease coverage by 0.07%.
The diff coverage is 59.62%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #587      +/-   ##
==========================================
- Coverage   79.04%   78.97%   -0.08%     
==========================================
  Files         399      400       +1     
  Lines       24743    24822      +79     
==========================================
+ Hits        19559    19603      +44     
- Misses       5184     5219      +35     
Impacted Files Coverage Δ
src/array/dictionary/mod.rs 63.33% <0.00%> (+5.70%) ⬆️
src/datatypes/schema.rs 37.93% <ø> (-5.82%) ⬇️
src/io/flight/mod.rs 0.00% <0.00%> (ø)
src/io/ipc/write/stream.rs 72.00% <0.00%> (ø)
src/io/json_integration/mod.rs 0.00% <0.00%> (ø)
src/io/ipc/read/deserialize.rs 35.89% <11.11%> (-1.52%) ⬇️
src/io/ipc/read/array/dictionary.rs 42.10% <44.44%> (-7.90%) ⬇️
src/io/ipc/read/array/fixed_size_list.rs 25.00% <50.00%> (-5.44%) ⬇️
src/io/ipc/read/array/list.rs 43.75% <50.00%> (-4.64%) ⬇️
src/io/ipc/read/array/map.rs 25.00% <50.00%> (-4.04%) ⬇️
... and 28 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6e9ea35...d699e0f. Read the comment docs.

@jorgecarleitao jorgecarleitao marked this pull request as ready for review November 8, 2021 22:56
@jorgecarleitao jorgecarleitao changed the title Support to read dictionaries from nested types Support dictionaries from nested types over IPC Nov 8, 2021
@jorgecarleitao jorgecarleitao changed the title Support dictionaries from nested types over IPC Support dictionaries in nested types over IPC Nov 8, 2021
@jorgecarleitao jorgecarleitao merged commit 37a9c75 into main Nov 8, 2021
@jorgecarleitao jorgecarleitao deleted the ipc_aa branch November 8, 2021 23:19
@jorgecarleitao jorgecarleitao changed the title Support dictionaries in nested types over IPC Added support dictionaries in nested types over IPC Nov 26, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature A new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant