Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error on writing MutableDictionaryArray with u16 keys #14590

Open
2 tasks done
aryanshomray opened this issue Feb 19, 2024 · 4 comments
Open
2 tasks done

Error on writing MutableDictionaryArray with u16 keys #14590

aryanshomray opened this issue Feb 19, 2024 · 4 comments
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer rust Related to Rust Polars

Comments

@aryanshomray
Copy link

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

use std::fs::File;
use std::sync::Arc;
use polars_arrow::io::ipc;
use polars_arrow::io::ipc::write::WriteOptions;
use polars_arrow::chunk::Chunk;
use polars_arrow::array::{MutableArray, MutableDictionaryArray, MutablePrimitiveArray};
use polars_arrow::datatypes::{Field, ArrowDataType, ArrowSchema, IntegerType};


fn main() {

    // =====================================================
    // Create new ipc file
    // =====================================================
    let filepath = "./test_polars_arrow.ipc";
    let fields = vec![
        Field::new("dict", ArrowDataType::Dictionary(IntegerType::UInt16, Box::new(ArrowDataType::Float32), false), false),
    ];
    let schema = Arc::new(ArrowSchema::from(fields));

    let file = File::create(&filepath).unwrap();
    let options = WriteOptions {
        compression: None
    };
    let mut writer = ipc::write::FileWriter::new(&file, schema, None, options);

    let dict = MutableDictionaryArray::<u16, MutablePrimitiveArray<f32>>::new().as_box();

    writer.start().unwrap();
    writer.write(&Chunk::new(vec![dict]), None).unwrap();
    writer.finish().unwrap();
}

Log output

thread 'main' panicked at /Users/aryanshomray/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-arrow-0.37.0/src/io/ipc/write/common.rs:223:79:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Issue description

fn set_variadic_buffer_counts(counts: &mut Vec<i64>, array: &dyn Array) {
    match array.data_type() {
        ArrowDataType::Utf8View => {
            let array = array.as_any().downcast_ref::<Utf8ViewArray>().unwrap();
            counts.push(array.data_buffers().len() as i64);
        }
        ArrowDataType::BinaryView => {
            let array = array.as_any().downcast_ref::<BinaryViewArray>().unwrap();
            counts.push(array.data_buffers().len() as i64);
        }
        ArrowDataType::Struct(_) => {
            let array = array.as_any().downcast_ref::<StructArray>().unwrap();
            for array in array.values() {
                set_variadic_buffer_counts(counts, array.as_ref())
            }
        }
        ArrowDataType::LargeList(_) => {
            let array = array.as_any().downcast_ref::<LargeListArray>().unwrap();
            set_variadic_buffer_counts(counts, array.values().as_ref())
        }
        ArrowDataType::FixedSizeList(_, _) => {
            let array = array.as_any().downcast_ref::<FixedSizeListArray>().unwrap();
            set_variadic_buffer_counts(counts, array.values().as_ref())
        }
        ArrowDataType::Dictionary(_, _, _) => {
            let array = array.as_any().downcast_ref::<DictionaryArray<u32>>().unwrap();
            set_variadic_buffer_counts(counts, array.values().as_ref())
        }
        _ => (),
    }
}

File: src/io/ipc/write/common.rs(line 198)
The error is due to this function which expects u32 keys & gives error if the keys are u16.

Expected behavior

An arrow file to be generated after running the program.

Installed versions

polars-arrow = { version = "0.37.0", features = ["io_ipc_compression", "io_ipc"] }
polars = { version = "0.37.0" , features = ["lazy", "ipc", "dtype-categorical"]}

@aryanshomray aryanshomray added bug Something isn't working needs triage Awaiting prioritization by a maintainer rust Related to Rust Polars labels Feb 19, 2024
@utkarshgupta137
Copy link
Contributor

I believe this is caused by this change.
cc: @ritchie46

@ritchie46
Copy link
Member

We forked arrow for our needs. We don't want to support mutable dictionary keys we don't use. We want to keep our binary size in check.

@utkarshgupta137
Copy link
Contributor

We forked arrow for our needs. We don't want to support mutable dictionary keys we don't use. We want to keep our binary size in check.

Ah, fair enough. We use polars most of the time, but we migrated from arrow2 to polars-arrow for converting/normalizing some JSON to arrow files.
Would you accept a PR which addresses our use case, but doesn't affect polars' performance/binary size or is the plan to eventually get rid of MutableDictionaryArray entirely?

@ritchie46
Copy link
Member

But it likely will affect the binary size as it needs to compile the u16 branch in that case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer rust Related to Rust Polars
Projects
None yet
Development

No branches or pull requests

3 participants