Improved performance in cast Primitive to Binary/String again (4x) #651
Codecov Report
@@            Coverage Diff             @@
##             main     #651      +/-   ##
==========================================
- Coverage   69.89%   69.59%   -0.31%
==========================================
  Files         299      299
  Lines       16634    16746     +112
==========================================
+ Hits        11626    11654      +28
- Misses       5008     5092      +84
Why does this PR make the MIRI tests fail?
It is unrelated. Something is going on with the Miri dependencies that is causing some CI jobs to fail. Could you change the "key" parameter in the |
0cabcc6
to
219fa2c
Compare
Mergify is a good bot to have.
I will take a bit more to review this since it uses |
src/compute/cast/primitive_to.rs
Outdated
let mut buffer = vec![];
let builder = from.iter().fold(
    MutableBinaryArray::<O>::with_capacity(from.len()),
    |mut builder, x| {
        match x {
            Some(x) => {
                lexical_to_bytes_mut(*x, &mut buffer);
                builder.push(Some(buffer.as_slice()));
            }
            Some(x) => unsafe {
                builder.reserve(1, T::FORMATTED_SIZE_DECIMAL);
                builder.write_values(|bytes| lexical_core::write(*x, bytes).len());
            },
            None => builder.push_null(),
This is a really cool idea!
I think we may go a step further, though: since the size is constant, the offsets will be [0, N, 2N, ..., M*N] and the values can be constructed directly from lexical_core::write, e.g. via extend. We also do not need to check for utf8 below because lexical_core guarantees this. We can even ignore the validity of the primitive array and continue writing whatever is in the null slot, and clone the validity.
I think this implementation is best done without a MutableBinaryArray, though: we benefit from operating on the buffers directly in this case.
> since the size is constant, the offsets will be [0, N, 2N, ..., M*N]
It's not constant: T::FORMATTED_SIZE_DECIMAL is the maximum size to reserve.
src/array/utf8/mutable.rs
Outdated
{
    // ensure values has enough capacity and size to write
    self.values.set_len(self.values.capacity());
    let buffer = &mut self.values.as_mut_slice()[self.offsets.last().unwrap().to_usize()..];
I think that this is unsound, even in unsafe code: a slice must always have initialized data on it. I propose a different implementation below that avoids introducing another API to the MutableBuffer. LMK what you think.
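One way to avoid handing out a slice over uninitialized memory, as a minimal sketch: zero-fill the scratch region before the writer sees it, then truncate to the number of bytes actually written. The free function `write_values` below is hypothetical and works on a plain `Vec<u8>`; it is not the arrow2 `MutableBuffer` API or the implementation proposed in this PR.

```rust
/// Appends at most `max_len` bytes produced by `write` to `values`.
///
/// `write` receives a zero-initialized slice of exactly `max_len` bytes and
/// returns how many of them it filled. Hypothetical helper for illustration.
fn write_values<F>(values: &mut Vec<u8>, max_len: usize, write: F) -> usize
where
    F: FnOnce(&mut [u8]) -> usize,
{
    let start = values.len();
    // Zero-fill the scratch region so every byte of the slice is initialized.
    values.resize(start + max_len, 0);
    let written = write(&mut values[start..]);
    assert!(written <= max_len, "writer reported more bytes than it was given");
    // Drop the unused tail; truncating never reallocates.
    values.truncate(start + written);
    written
}
```

The zero-fill costs an extra pass over `max_len` bytes per call. When that matters, `Vec::spare_capacity_mut` exposes the unused capacity as `MaybeUninit<u8>` without claiming it is initialized.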
Yes, I agree with you. But since MutableBuffer is currently the only way to construct a BinaryArray, maybe we should expose values and offsets to the outside.
We can keep temporary values and offsets vectors in the cast kernel and then construct the MutableBuffer from these two vectors.
Refer to ClickHouse's style:
https://github.com/ClickHouse/ClickHouse/blob/515cc74530d11e1b2b18a63141b66a15b94748ba/src/Columns/ColumnString.h
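The "temporary values and offsets vectors" idea can be sketched roughly as follows. This is an illustration under assumptions, not the code merged in this PR: `primitive_to_binary` and `value` are hypothetical names, `std::io::Write` stands in for `lexical_core::write`, the input is fixed to `i64`, and null handling (cloning the validity bitmap) is left out.

```rust
use std::io::Write;

/// Formats each integer as decimal text into one contiguous `values` buffer
/// and records the running end position of each entry in `offsets`.
/// `offsets` starts at 0 and has `from.len() + 1` entries.
fn primitive_to_binary(from: &[i64]) -> (Vec<u8>, Vec<i32>) {
    let mut values: Vec<u8> = Vec::new();
    let mut offsets: Vec<i32> = Vec::with_capacity(from.len() + 1);
    offsets.push(0);
    for x in from {
        // Stand-in for `lexical_core::write`: append the digits directly to
        // `values`, with no intermediate per-element buffer.
        write!(values, "{}", x).expect("writing to a Vec<u8> cannot fail");
        offsets.push(values.len() as i32);
    }
    (values, offsets)
}

/// Returns the bytes of the `i`-th entry, delimited by two adjacent offsets.
fn value<'a>(values: &'a [u8], offsets: &[i32], i: usize) -> &'a [u8] {
    &values[offsets[i] as usize..offsets[i + 1] as usize]
}
```

Because the formatted length varies per element, the offsets are not multiples of a fixed N; each offset is simply the length of `values` after the corresponding write.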
Performance improved again:
|
Looks great! I left some minor comments, but overall this is ready to merge.
I think that Windows does not support some of the dev dependencies, unfortunately :(
After using
|
Some thoughts: during every In |
Yes, that would be ideal, so that we only have to write (and test) the |
The PR description is outdated; it is more like -75% now ^_^
Memcpy-style write, no extra copy.