This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Filter operation on sliced utf8 arrays are incorrect #233

Closed
ritchie46 opened this issue Jul 29, 2021 · 1 comment · Fixed by #237

@ritchie46 (Collaborator)

This test fails because the Utf8 column has a different length than the primitive values.

I haven't been able to isolate it yet, but what is different about the utf8 column is that it is sliced into subslices that are sent to different threads. There the filter is applied (with the mask sliced the same way), and then everything is concatenated on the main thread.
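The pipeline described above can be modelled without arrow2 as a minimal sketch, assuming plain `Vec<String>` values and a `&[bool]` mask in place of the Arrow arrays; the function name `parallel_filter` is hypothetical and not part of arrow2. Each thread filters one sub-slice of the values together with the matching sub-slice of the mask, and the calling thread concatenates the pieces. However the rows are chunked, the output length must equal the number of `true` entries in the mask, which is the invariant the Utf8 column breaks in the failing test.

```rust
use std::thread;

/// Hypothetical model of the workflow: split `values` and `mask` into
/// aligned sub-slices, filter each pair on its own thread, then
/// concatenate the filtered pieces on the calling thread.
fn parallel_filter(values: &[String], mask: &[bool], n_chunks: usize) -> Vec<String> {
    assert_eq!(values.len(), mask.len(), "values and mask must be aligned");
    assert!(n_chunks > 0, "need at least one chunk");
    // Ceiling division so every row lands in some chunk; `chunks` rejects 0.
    let chunk_len = ((values.len() + n_chunks - 1) / n_chunks).max(1);

    thread::scope(|s| {
        // Value and mask chunks share the same boundaries, so row i of a
        // value chunk lines up with bit i of the matching mask chunk.
        let handles: Vec<_> = values
            .chunks(chunk_len)
            .zip(mask.chunks(chunk_len))
            .map(|(vals, bits)| {
                s.spawn(move || {
                    vals.iter()
                        .zip(bits.iter())
                        .filter_map(|(v, &keep)| if keep { Some(v.clone()) } else { None })
                        .collect::<Vec<String>>()
                })
            })
            .collect();

        // Join in spawn order so the concatenated output keeps the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    let values: Vec<String> = ["seafood", "fruit", "meat", "vegetables"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let mask = [true, true, false, true];
    let out = parallel_filter(&values, &mask, 2);
    // The output length must match the number of `true` mask entries.
    assert_eq!(out.len(), mask.iter().filter(|b| **b).count());
    println!("{:?}", out); // prints ["seafood", "fruit", "vegetables"]
}
```

`std::thread::scope` (Rust 1.63+) lets the workers borrow the chunks directly, so no copies of the inputs are needed before spawning.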

@ritchie46 (Collaborator, Author)

And I've got an MWE:

```rust
use arrow2::array::{Utf8Array, BooleanArray};
use std::iter::FromIterator;
use arrow2::compute::filter::filter;
use arrow2::compute::concat::concatenate; // not needed for this MWE; the full pipeline concatenates the filtered chunks



fn main() {

    let a = vec![
        "vegetables",
        "seafood",
        "meat",
        "fruit",
        "seafood",
        "meat",
        "vegetables",
        "fruit",
        "seafood",
        "fruit",
        "meat",
        "vegetables",
        "fruit",
        "vegetables",
        "vegetables",
        "seafood",
        "seafood",
        "seafood",
        "fruit",
        "meat",
        "vegetables",
        "seafood",
        "seafood",
        "fruit",
        "meat",
        "vegetables",
        "fruit",
    ];


    let arr = Utf8Array::<i64>::from_iter_values(a.iter()); // 27 rows, no nulls
    let mask = BooleanArray::from_slice(&[
        false,
        true,
        true,
        false,
        true,
        true,
        false,
        false,
        true,
        true,
        true,
        false,
        false,
        false,
        false,
        true,
        true,
        true,
        false,
        true,
        false,
        true,
        false,
        false,
        true,
        false,
        false,
    ]);

    // Slice both inputs so that they carry a non-zero offset
    let arr = arr.slice(8, 2);  // seafood, fruit
    let mask = mask.slice(8, 2);  // true, true

    let v: Vec<_> = mask.iter().collect(); // the sliced mask: both entries are true
    dbg!(&v);


    let out = filter(&arr, &mask).unwrap(); // expected to keep both rows
    let arr = out.as_any().downcast_ref::<Utf8Array<i64>>().unwrap(); // downcast the type-erased result

    let v: Vec<_> = arr.iter().collect();
    dbg!(&v);
}
```

This outputs:

```
[src/main.rs:77] &v = [
    Some(
        true,
    ),
    Some(
        true,
    ),
]
[src/main.rs:84] &v = [
    Some(
        "fruit",
    ),
]
```

Expected output

Because the sliced boolean mask is all true, I'd expect the filter to return both values ("seafood" and "fruit"), not just "fruit".
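For reference, here is a minimal model of what a correct filter on a sliced Utf8 array has to do, assuming a simplified layout: an `i64` offsets buffer and a string values buffer shared between slices, plus a logical `offset`/`len` window recorded by `slice`. `SlicedUtf8`, `SlicedMask`, `from_strs`, and `filter` here are hypothetical names, not arrow2's API. Both the array's slice offset and the mask's slice offset have to be applied before reading either buffer; if either one is ignored, the filter reads rows or bits from the wrong position. The sketch does not claim which offset the arrow2 kernel mishandled.

```rust
use std::rc::Rc;

/// Hypothetical, simplified Utf8 array: all strings concatenated in `values`,
/// with `offsets[i]..offsets[i + 1]` delimiting row `i` of the unsliced array.
#[derive(Clone)]
struct SlicedUtf8 {
    offsets: Rc<Vec<i64>>, // shared between slices, never copied
    values: Rc<String>,    // shared between slices, never copied
    offset: usize,         // first physical row of this slice
    len: usize,            // number of logical rows in this slice
}

impl SlicedUtf8 {
    fn from_strs(items: &[&str]) -> Self {
        let mut offsets = vec![0i64];
        let mut values = String::new();
        for s in items {
            values.push_str(s);
            offsets.push(values.len() as i64);
        }
        SlicedUtf8 { offsets: Rc::new(offsets), values: Rc::new(values), offset: 0, len: items.len() }
    }

    /// Only the logical window changes; the buffers stay shared via `Rc`.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        SlicedUtf8 { offset: self.offset + offset, len, ..self.clone() }
    }

    /// Logical row `i` of the slice lives at physical row `self.offset + i`.
    fn value(&self, i: usize) -> &str {
        let row = self.offset + i;
        let start = self.offsets[row] as usize;
        let end = self.offsets[row + 1] as usize;
        &self.values[start..end]
    }
}

/// Hypothetical sliced mask: `bits` is the full buffer, `offset`/`len` the window.
struct SlicedMask {
    bits: Vec<bool>,
    offset: usize,
    len: usize,
}

/// Filter a sliced array by a sliced mask, applying BOTH offsets.
fn filter(array: &SlicedUtf8, mask: &SlicedMask) -> Vec<String> {
    assert_eq!(array.len, mask.len, "array and mask must have the same length");
    (0..array.len)
        .filter(|&i| mask.bits[mask.offset + i])
        .map(|i| array.value(i).to_string())
        .collect()
}

fn main() {
    // First ten rows and mask bits of the MWE; rows 8 and 9 are "seafood", "fruit".
    let arr = SlicedUtf8::from_strs(&[
        "vegetables", "seafood", "meat", "fruit", "seafood",
        "meat", "vegetables", "fruit", "seafood", "fruit",
    ]);
    let mask = SlicedMask {
        bits: vec![false, true, true, false, true, true, false, false, true, true],
        offset: 8,
        len: 2,
    };
    let out = filter(&arr.slice(8, 2), &mask);
    assert_eq!(out, vec!["seafood".to_string(), "fruit".to_string()]);
    println!("{:?}", out); // prints ["seafood", "fruit"]
}
```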

@jorgecarleitao added the bug label on Jul 29, 2021
@jorgecarleitao self-assigned this on Jul 29, 2021