Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ByteBoolArray type and fixe a bug in BoolArray #383

Merged
merged 30 commits into from
Jun 24, 2024

Conversation

AdamGS
Copy link
Contributor

@AdamGS AdamGS commented Jun 17, 2024

The main two contributions in this PR are:

  1. A new ByteBoolArray type (closes Add a ByteBoolArray #335)
  2. An important fix for slicing and initialization of BoolArray

@AdamGS AdamGS changed the title [WI{Add ByteBoolArray type and fixes a bug in BoolArray slicing [WIP] Add ByteBoolArray type and fixes a bug in BoolArray slicing Jun 17, 2024
@AdamGS AdamGS changed the title [WIP] Add ByteBoolArray type and fixes a bug in BoolArray slicing [WIP] Add ByteBoolArray type and fixe a bug in BoolArray slicing Jun 17, 2024
@AdamGS AdamGS changed the title [WIP] Add ByteBoolArray type and fixe a bug in BoolArray slicing [WIP] Add ByteBoolArray type and fixe a bug in BoolArray Jun 17, 2024
@@ -9,7 +9,12 @@ use crate::ArrayDType;
impl ScalarAtFn for BoolArray {
fn scalar_at(&self, index: usize) -> VortexResult<Scalar> {
if self.is_valid(index) {
Ok(self.boolean_buffer().value(index).into())
let value = self.boolean_buffer().value(index);
let s = Scalar::new(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should wrap the native type -> scalar conversions in #cfg(test), they're just too dangerous

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly already pretty hard to work with scalars IMO, but I don't have a much better idea here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an improved API is Scalar::bool(...) and Scalar::nullable_bool(...) and we just explicitly write out the constructors.

let buffer_len = buffer.len();
let buffer_offset = buffer.offset();

let inner = buffer.into_inner().bit_slice(buffer_offset, buffer_len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think this only partially fixes the bug. The problem is that the BoolArray itself doesn't hold an offset, meaning the first bit is always considered to be byte aligned.

Maybe in another PR can fix for real by adding a bit_offset (from 0..8) field to BoolMetadata. The SliceFn compute function and maybe others must also respect it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bit_slice actually copies the buffer if the offset isn't byte aligned so this operation isn't as cheap as it used to be. Whether that's a reasonable tradeoff is up to you.
I do agree with the second point, I'm obviously not familiar with every part of the codebase here but taking the offset logic management into vortex will probably be valuable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the general idea is that construction and SliceFn should be constant time and not re-allocate. Which is why lots of arrays hold offsets.

Reminds me, we should have some logic to slice out those offsets during serialization!

};

impl ArrayCompute for ByteBoolArray {
fn as_arrow(&self) -> Option<&dyn crate::compute::as_arrow::AsArrowArray> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: move the fully qualified names to imports


let scalar = match self.is_valid(index).then(|| self.buffer()[index] == 1) {
Some(b) => Scalar::new(
vortex_dtype::DType::Bool(self.validity().nullability()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also be self.dtype().clone() right?

let scalar = match self.is_valid(index).then(|| self.buffer()[index] == 1) {
Some(b) => Scalar::new(
vortex_dtype::DType::Bool(self.validity().nullability()),
vortex_scalar::ScalarValue::Bool(b),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove fully qualified

length,
},
Some(buffer),
validity.into_array_data().into_iter().collect_vec().into(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should just be vec![validity.into_array_data()]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, took this snippet from BoolArray, it can probably be simplified but I suspect that's going to take a bigger change

impl From<Vec<bool>> for ByteBoolArray {
fn from(value: Vec<bool>) -> Self {
let value_len = value.len();
Self::try_with_validity(value, NullBuffer::new_valid(value_len)).unwrap()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note, this will create a nullable bool array I think? Not sure if that was intended?

}
}

impl AsRef<[bool]> for ByteBoolArray {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is the one I don't like. These damn nulls!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed


impl ArrayTrait for ByteBoolArray {
fn len(&self) -> usize {
self.metadata().length
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.buffer().len()

}

match self.logical_validity() {
LogicalValidity::AllValid(len) => Ok(all_true_bool_stats(len)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this works? Just because all values are non-null, doesn't mean all values are true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100%

@gatesn
Copy link
Contributor

gatesn commented Jun 18, 2024

The other high-level comment, we should move this encoding into a crate. The main vortex-array encodings are (mostly) only those that convert 1:1 to Arrow. Anything else is an "extension"

@AdamGS
Copy link
Contributor Author

AdamGS commented Jun 24, 2024

Pushed fixes for most comment, I think stats are the last thing I need to fix here

@AdamGS AdamGS requested a review from gatesn June 24, 2024 15:46
Copy link
Contributor

@robert3005 robert3005 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only note about changing constructor though could be convinced that we should keep it as is

validity: validity.to_metadata(length)?,
});

ArrayData::try_new(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You generally want to go through the typed constructor where possible. So you want a
try_new(buffer: Buffer, validity: Validity) -> Self and then rename existing try_with_validity to from_vec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, introduced try_new and renamed to try_from_vec

}

let bools = self.as_array_ref().clone().into_canonical()?.into_bool()?;
bools.compute_statistics(stat)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for future this could be optimized to avoid copy but for now this is enough

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a TODO comment

@AdamGS
Copy link
Contributor Author

AdamGS commented Jun 24, 2024

Seems like one of my recent changes broke tests, looking into Fixed

@AdamGS AdamGS changed the title [WIP] Add ByteBoolArray type and fixe a bug in BoolArray Add ByteBoolArray type and fixe a bug in BoolArray Jun 24, 2024
@AdamGS AdamGS marked this pull request as ready for review June 24, 2024 19:41
@robert3005 robert3005 merged commit e43ec57 into spiraldb:develop Jun 24, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a ByteBoolArray
3 participants