feat(rust, python): Impl any/all for array type #13250
@@ -16,20 +16,19 @@ where
let validity = arr.validity().cloned();

// Fast path where all values set (all is free).
let all_set = arrow::compute::boolean::all(values);
In the case of `any`, this calculation should be avoided.
// TODO!
// We can speed this up if the boolean array doesn't have nulls
// Then we can work directly on the byte slice.
IIUC, this `TODO` means we can slice the `Values`/`Bitmap` of the `BooleanArray` directly, to avoid iterating over the array in the no-null case.
But looking at `arrow::compute::boolean::any/all` a bit more, it seems this branch has already been optimized:
    pub fn any(array: &BooleanArray) -> bool {
        if array.is_empty() {
            false
        } else if array.null_count() > 0 {
            array.into_iter().any(|v| v == Some(true))
        } else {
            let vals = array.values();
            vals.unset_bits() != vals.len()
        }
    }
If I missed something else, feel free to point it out.
Slicing an Arrow `Array` involves `Boxing`, so it is rather expensive (compared to slicing a slice, which is free). If we don't have nulls, we can keep the `&[u8]` and use the `offset` + `slice_offset` + `slice_len` to do a bitcount and determine the `all`/`any` result.
This is something we can leave as todo, and maybe do in another PR.
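For illustration only, the bitcount idea could look roughly like the sketch below. It is not the code from this PR or from Polars/arrow: the helper names `count_set_bits`, `any_in_window`, and `all_in_window` are hypothetical, and it assumes the values bitmap is a plain `&[u8]` with least-significant-bit-first ordering (as in the Arrow bitmap layout) and no nulls in the window.

```rust
/// Count the set bits among `len` bits of `bytes`, starting at bit index `offset`.
/// Assumes LSB-first bit order within each byte (hypothetical helper).
fn count_set_bits(bytes: &[u8], offset: usize, len: usize) -> usize {
    if len == 0 {
        return 0;
    }
    let end = offset + len; // exclusive bit index
    assert!(end <= bytes.len() * 8, "bit range out of bounds");

    let first_byte = offset / 8;
    let last_byte = (end - 1) / 8;
    // Drop the bits below the start position in the first byte.
    let head_mask: u8 = 0xFF << (offset % 8);
    // Keep only the bits below the end position in the last byte.
    let tail_mask: u8 = if end % 8 == 0 {
        0xFF
    } else {
        (1u8 << (end % 8)) - 1
    };

    if first_byte == last_byte {
        // The whole window lives inside a single byte.
        return (bytes[first_byte] & head_mask & tail_mask).count_ones() as usize;
    }

    let head = (bytes[first_byte] & head_mask).count_ones() as usize;
    let tail = (bytes[last_byte] & tail_mask).count_ones() as usize;
    // Full bytes strictly between the first and last byte.
    let middle: usize = bytes[first_byte + 1..last_byte]
        .iter()
        .map(|b| b.count_ones() as usize)
        .sum();
    head + middle + tail
}

/// `any` over a sub-window, without slicing (and boxing) an array.
/// Returns false for an empty window.
fn any_in_window(bytes: &[u8], offset: usize, slice_offset: usize, slice_len: usize) -> bool {
    count_set_bits(bytes, offset + slice_offset, slice_len) > 0
}

/// `all` over a sub-window; an empty window is vacuously true.
fn all_in_window(bytes: &[u8], offset: usize, slice_offset: usize, slice_len: usize) -> bool {
    count_set_bits(bytes, offset + slice_offset, slice_len) == slice_len
}
```

Each sub-array then only costs a popcount over a few bytes, with no allocation per window. With nulls present, the validity bitmap would also have to be taken into account, which this sketch does not attempt.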
Thanks for the detailed explanation, that makes sense! I will find the time to do this optimization. :)