-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ByteBoolArray
type and fixe a bug in BoolArray
#383
Conversation
ByteBoolArray
type and fixes a bug in BoolArray
slicingByteBoolArray
type and fixes a bug in BoolArray
slicing
ByteBoolArray
type and fixes a bug in BoolArray
slicingByteBoolArray
type and fixe a bug in BoolArray
slicing
ByteBoolArray
type and fixe a bug in BoolArray
slicingByteBoolArray
type and fixe a bug in BoolArray
@@ -9,7 +9,12 @@ use crate::ArrayDType; | |||
impl ScalarAtFn for BoolArray { | |||
fn scalar_at(&self, index: usize) -> VortexResult<Scalar> { | |||
if self.is_valid(index) { | |||
Ok(self.boolean_buffer().value(index).into()) | |||
let value = self.boolean_buffer().value(index); | |||
let s = Scalar::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should wrap the native type -> scalar conversions in #cfg(test), they're just too dangerous
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly already pretty hard to work with scalars IMO, but I don't have a much better idea here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe an improved API is Scalar::bool(...)
and Scalar::nullable_bool(...)
and we just explicitly write out the constructors.
vortex-array/src/array/bool/mod.rs
Outdated
let buffer_len = buffer.len(); | ||
let buffer_offset = buffer.offset(); | ||
|
||
let inner = buffer.into_inner().bit_slice(buffer_offset, buffer_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think this only partially fixes the bug. The problem is that the BoolArray itself doesn't hold an offset, meaning the first bit is always considered to be byte aligned.
Maybe in another PR can fix for real by adding a bit_offset (from 0..8) field to BoolMetadata. The SliceFn compute function and maybe others must also respect it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bit_slice
actually copies the buffer if the offset isn't byte aligned so this operation isn't as cheap as it used to be. Whether that's a reasonable tradeoff is up to you.
I do agree with the second point, I'm obviously not familiar with every part of the codebase here but taking the offset logic management into vortex will probably be valuable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the general idea is that construction and SliceFn should be constant time and not re-allocate. Which is why lots of arrays hold offsets.
Reminds me, we should have some logic to slice out those offsets during serialization!
}; | ||
|
||
impl ArrayCompute for ByteBoolArray { | ||
fn as_arrow(&self) -> Option<&dyn crate::compute::as_arrow::AsArrowArray> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: move the fully qualified names to imports
|
||
let scalar = match self.is_valid(index).then(|| self.buffer()[index] == 1) { | ||
Some(b) => Scalar::new( | ||
vortex_dtype::DType::Bool(self.validity().nullability()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can also be self.dtype().clone()
right?
let scalar = match self.is_valid(index).then(|| self.buffer()[index] == 1) { | ||
Some(b) => Scalar::new( | ||
vortex_dtype::DType::Bool(self.validity().nullability()), | ||
vortex_scalar::ScalarValue::Bool(b), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove fully qualified
length, | ||
}, | ||
Some(buffer), | ||
validity.into_array_data().into_iter().collect_vec().into(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should just be vec![validity.into_array_data()]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, took this snippet from BoolArray
, it can probably be simplified but I suspect that's going to take a bigger change
impl From<Vec<bool>> for ByteBoolArray { | ||
fn from(value: Vec<bool>) -> Self { | ||
let value_len = value.len(); | ||
Self::try_with_validity(value, NullBuffer::new_valid(value_len)).unwrap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note, this will create a nullable bool array I think? Not sure if that was intended?
} | ||
} | ||
|
||
impl AsRef<[bool]> for ByteBoolArray { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is the one I don't like. These damn nulls!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
|
||
impl ArrayTrait for ByteBoolArray { | ||
fn len(&self) -> usize { | ||
self.metadata().length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.buffer().len()
} | ||
|
||
match self.logical_validity() { | ||
LogicalValidity::AllValid(len) => Ok(all_true_bool_stats(len)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this works? Just because all values are non-null, doesn't mean all values are true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100%
The other high-level comment, we should move this encoding into a crate. The main vortex-array encodings are (mostly) only those that convert 1:1 to Arrow. Anything else is an "extension" |
Pushed fixes for most comment, I think stats are the last thing I need to fix here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only note about changing constructor though could be convinced that we should keep it as is
validity: validity.to_metadata(length)?, | ||
}); | ||
|
||
ArrayData::try_new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You generally want to go through the typed constructor where possible. So you want a
try_new(buffer: Buffer, validity: Validity) -> Self
and then rename existing try_with_validity
to from_vec
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, introduced try_new
and renamed to try_from_vec
} | ||
|
||
let bools = self.as_array_ref().clone().into_canonical()?.into_bool()?; | ||
bools.compute_statistics(stat) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for future this could be optimized to avoid copy but for now this is enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a TODO comment
|
ByteBoolArray
type and fixe a bug in BoolArray
ByteBoolArray
type and fixe a bug in BoolArray
The main two contributions in this PR are:
ByteBoolArray
type (closes Add a ByteBoolArray #335)BoolArray