Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Improved performance in cast Primitive to Binary/String again (4x) #651

Merged
merged 7 commits into from
Dec 5, 2021

Conversation

sundy-li
Copy link
Collaborator

@sundy-li sundy-li commented Dec 1, 2021

Memcpy style write, no extra copy.

cast int32 to binary 512                                                                             
                        time:   [2.9503 us 2.9581 us 2.9664 us]
                        change: [-70.283% -70.127% -69.980%] (p = 0.00 < 0.05)
                        Performance has improved.

@codecov
Copy link

codecov bot commented Dec 1, 2021

Codecov Report

Merging #651 (52be4d7) into main (8300684) will decrease coverage by 0.30%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #651      +/-   ##
==========================================
- Coverage   69.89%   69.59%   -0.31%     
==========================================
  Files         299      299              
  Lines       16634    16746     +112     
==========================================
+ Hits        11626    11654      +28     
- Misses       5008     5092      +84     
Impacted Files Coverage Δ
src/array/binary/mod.rs 81.48% <100.00%> (ø)
src/compute/cast/primitive_to.rs 79.68% <100.00%> (+3.44%) ⬆️
src/compute/arithmetics/mod.rs 69.04% <0.00%> (-27.62%) ⬇️
src/compute/arithmetics/time.rs 26.60% <0.00%> (-17.63%) ⬇️
src/compute/arithmetics/decimal/mul.rs 75.00% <0.00%> (-17.31%) ⬇️
src/compute/arithmetics/decimal/div.rs 75.94% <0.00%> (-16.36%) ⬇️
src/array/binary/mutable.rs 79.35% <0.00%> (-0.81%) ⬇️
src/array/utf8/mutable.rs 85.09% <0.00%> (-0.68%) ⬇️
src/bitmap/immutable.rs 86.48% <0.00%> (ø)
src/io/csv/read/deserialize.rs 100.00% <0.00%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8300684...52be4d7. Read the comment docs.

@sundy-li
Copy link
Collaborator Author

sundy-li commented Dec 1, 2021

Why this pr makes MIRI tests fail?

@jorgecarleitao
Copy link
Owner

it is unrelated. There is something going on on miri dependencies that are causing some CIs to fail. Could you change the "key" parameter in the cache action on the .github/test.yaml for a cache miss? I need to investigate what is causing this.

@sundy-li
Copy link
Collaborator Author

sundy-li commented Dec 2, 2021

mergifiy is a good bot to have.

@jorgecarleitao
Copy link
Owner

I will take a bit more to review this since it uses unsafe.

Comment on lines 20 to 27
let mut buffer = vec![];
let builder = from.iter().fold(
MutableBinaryArray::<O>::with_capacity(from.len()),
|mut builder, x| {
match x {
Some(x) => {
lexical_to_bytes_mut(*x, &mut buffer);
builder.push(Some(buffer.as_slice()));
}
Some(x) => unsafe {
builder.reserve(1, T::FORMATTED_SIZE_DECIMAL);
builder.write_values(|bytes| lexical_core::write(*x, bytes).len());
},
None => builder.push_null(),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a really cool idea!

I think we may go a step further, though: since the size is constant, the offsets will be [0, N, 2N, ..., M*N] and the values can be constructed directly from lexical_core::write, e.g. via extend. We also do not need to check for utf8 below because lexical_core guarantees this. We can even ignore the validity of the primitive array and continue writing whatever is in the null slot, and clone the validity.

I think this implementation is best done without a MutableBinaryArray, though: we benefit from operating on the buffers directly in this case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the size is constant, the offsets will be [0, N, 2N, ..., M*N]

It's not constant, T::FORMATTED_SIZE_DECIMAL is the maximum size to reverse.

{
// ensure values has enough capacity and size to write
self.values.set_len(self.values.capacity());
let buffer = &mut self.values.as_mut_slice()[self.offsets.last().unwrap().to_usize()..];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is unsound, even in unsafe code: a slice must always have initialized data on it. I propose a different implementation below that avoids introducing another API to the MutableBuffer. LMK what you think.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I do agree with you. But now MutableBuffer is the only way to construct a BinaryArray, maybe we should expose values, offsets to outside.

We can have temp values, offset vectors in the cast kernel and then construct the MutableBuffer by these two vectors.

Refer to clickhouse's style:
https://github.com/ClickHouse/ClickHouse/blob/515cc74530d11e1b2b18a63141b66a15b94748ba/src/Columns/ColumnString.h

@sundy-li
Copy link
Collaborator Author

sundy-li commented Dec 5, 2021

Performance improved again:

cast int32 to binary 512                                                                             
        time:   [3.4333 us 3.4434 us 3.4536 us]
        change: [-52.301% -52.104% -51.908%] (p = 0.00 < 0.05)
        Performance has improved.

Copy link
Owner

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Left some minor comments, but overall is ready to merge.

I think that the windows does not support some of the dev dependencies, unfortunately :(

src/compute/cast/primitive_to.rs Outdated Show resolved Hide resolved
src/compute/cast/primitive_to.rs Outdated Show resolved Hide resolved
src/array/binary/mutable.rs Outdated Show resolved Hide resolved
@sundy-li
Copy link
Collaborator Author

sundy-li commented Dec 5, 2021

After using from_data_unchecked

cast int32 to binary 512                                                                             
                        time:   [2.9751 us 2.9841 us 2.9944 us]
                        change: [-12.521% -11.660% -10.883%] (p = 0.00 < 0.05)
                        Performance has improved.

@sundy-li sundy-li changed the title Improved performance in cast Primitive to Binary/String again (2x) Improved performance in cast Primitive to Binary/String again (4x) Dec 5, 2021
@sundy-li
Copy link
Collaborator Author

sundy-li commented Dec 5, 2021

Some thoughts: during every XXXToBinary or XXXToUtf8, we must write these unsafe codes? :(

In databend, there are many functions to be implemented to work with strings, I think it better to have a wrap function to simplify the logic.

@jorgecarleitao
Copy link
Owner

Yes, that would be ideal, so that we only have to write (and test) the unsafe bit once. An inline function should be enough, though.

@jorgecarleitao jorgecarleitao merged commit a18555c into jorgecarleitao:main Dec 5, 2021
@jorgecarleitao
Copy link
Owner

The PR description is outdated, it is more like -75% now ^_^

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants