Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Prepare APIs for extension types #357

Merged
merged 12 commits into from Aug 31, 2021
Merged

Prepare APIs for extension types #357

merged 12 commits into from Aug 31, 2021

Conversation

jorgecarleitao
Copy link
Owner

@jorgecarleitao jorgecarleitao commented Aug 30, 2021

This PR is a refactor to enable (logical) extension types.

It is motivated by the design described by @sundy-li in #350.

backward incompatible changes:

  • all arrays (e.g. Utf8Array) now require DataType as argument of from_data, new_null and new_empty

@jorgecarleitao
Copy link
Owner Author

@sundy-li , I kept the DataType on all arrays for now since it is a non-trivial change to decouple arrays from DataType because the logical information is currently necessary in many compute, io and display kernels.

@codecov
Copy link

codecov bot commented Aug 30, 2021

Codecov Report

Merging #357 (082e615) into main (640ec4a) will increase coverage by 0.35%.
The diff coverage is 78.96%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #357      +/-   ##
==========================================
+ Coverage   80.88%   81.23%   +0.35%     
==========================================
  Files         326      326              
  Lines       21043    20969      -74     
==========================================
+ Hits        17021    17035      +14     
+ Misses       4022     3934      -88     
Impacted Files Coverage Δ
src/array/boolean/ffi.rs 0.00% <0.00%> (ø)
src/compute/like.rs 0.00% <0.00%> (ø)
src/io/ipc/read/array/list.rs 100.00% <ø> (ø)
src/io/ipc/read/array/union.rs 75.00% <ø> (ø)
src/io/ipc/read/common.rs 92.47% <ø> (ø)
src/io/parquet/read/primitive/dictionary.rs 78.84% <ø> (ø)
src/record_batch.rs 44.44% <0.00%> (ø)
src/compute/filter.rs 75.37% <10.52%> (+12.18%) ⬆️
src/array/fixed_size_list/mod.rs 54.00% <25.00%> (-4.34%) ⬇️
src/array/fixed_size_binary/mod.rs 44.82% <33.33%> (-0.94%) ⬇️
... and 75 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 640ec4a...082e615. Read the comment docs.

@sundy-li
Copy link
Collaborator

Ok, that's reasonable.

@jorgecarleitao
Copy link
Owner Author

@sundy-li @ritchie46 , I found a beautiful representation of the primitive types that simplifies a lot of code (-400 LOC). I pushed to this PR and updated the user guide accordingly.

Something like

enum PrimitiveType {
    Int8,
    Int16,
    Int32,
    Int64,
    Int128,
    ...
    DaysMs,
}

enum PhysicalType {
    /// A Null with no allocation.
    Null,
    /// A boolean represented as a single bit.
    Boolean,
    /// An array where each slot has a known compile-time size.
    Primitive(PrimitiveType),
    ...
}

this allow us to write things like

    match values.data_type().to_physical_type() {
        Primitive(primitive) => with_match_primitive_type!(primitive, |$T| {
            let values = values.as_any().downcast_ref().unwrap();
            Ok(Box::new(primitive::take::<$T, _>(&values, indices)))
        }),
        Utf8 => {
            let values = values.as_any().downcast_ref().unwrap();
            Ok(Box::new(utf8::take::<i32, _>(values, indices)))
        }
        ...

which shaved LOC via macros (and will likely do the same for dependents).

Basically, instead of matching PhysicalType::Int32, match PhysicalType::Primitive(PrimitiveType::Int32).

@sundy-li
Copy link
Collaborator

sundy-li commented Aug 31, 2021

Great, we have a similar macro like with_match_primitive_type in https://github.com/datafuselabs/datafuse/blob/7a3bfc5620e545333ca4e35c71ac849b421b74fc/common/functions/src/aggregates/macros.rs#L15-L28

But your implementation is more graceful because the __with_ty__ is just a private macro.

@jorgecarleitao jorgecarleitao merged commit 145b7c8 into main Aug 31, 2021
@jorgecarleitao jorgecarleitao deleted the prep_struct branch August 31, 2021 16:59
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants