feat(storage): support run-length encoding #507
It's called run-length encoding (RLE), not running-length encoding.
Thanks a lot for this feature! But before proceeding to review, I'd like to know why we need 2 separate RLEBuilder / RLEIterator for variable / fixed-length types? If we write bounds from the Array:
pub struct RleBlockIterator<A, B>
where
A: Array,
B: BlockIterator<A>,
This should cover all cases, including both primitive types and Bytes types. Did you meet any difficulty when implementing in this way?
(... in #255, I only considered the case for primitive types. But as long as you have implemented RLE for all types, I think it's reasonable to de-dup code.)
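To illustrate the suggestion above, here is a minimal sketch of a single RLE iterator that is generic over the array type. The `Array` and `BlockIterator` traits below are simplified, hypothetical stand-ins and not RisingLight's actual definitions; `VecBlockIterator`, `next_item`, and the `u16` run-length representation are also assumptions made only for this example.

```rust
/// Hypothetical, simplified stand-in for an array type (not the real trait).
trait Array {
    /// Borrowed item type, e.g. `i32` or `str`.
    type Item: ?Sized + ToOwned;
}

/// Hypothetical, simplified block iterator that yields owned values one by one.
trait BlockIterator<A: Array> {
    fn next_item(&mut self) -> Option<<A::Item as ToOwned>::Owned>;
}

struct I32Array;
impl Array for I32Array {
    type Item = i32;
}

struct Utf8Array;
impl Array for Utf8Array {
    type Item = str;
}

/// Illustrative inner iterator over the distinct run values, backed by a `Vec`.
struct VecBlockIterator<A: Array> {
    values: std::vec::IntoIter<<A::Item as ToOwned>::Owned>,
}

impl<A: Array> BlockIterator<A> for VecBlockIterator<A> {
    fn next_item(&mut self) -> Option<<A::Item as ToOwned>::Owned> {
        self.values.next()
    }
}

/// One RLE iterator for every array type: it only relies on the `Array` bounds.
struct RleBlockIterator<A, B>
where
    A: Array,
    B: BlockIterator<A>,
{
    inner: B,
    run_lengths: std::vec::IntoIter<u16>,
    current: Option<<A::Item as ToOwned>::Owned>,
    remaining: u16,
}

impl<A, B> RleBlockIterator<A, B>
where
    A: Array,
    B: BlockIterator<A>,
{
    fn new(inner: B, run_lengths: Vec<u16>) -> Self {
        Self {
            inner,
            run_lengths: run_lengths.into_iter(),
            current: None,
            remaining: 0,
        }
    }
}

impl<A, B> Iterator for RleBlockIterator<A, B>
where
    A: Array,
    B: BlockIterator<A>,
    <A::Item as ToOwned>::Owned: Clone,
{
    type Item = <A::Item as ToOwned>::Owned;

    fn next(&mut self) -> Option<Self::Item> {
        // When the current run is exhausted, load the next value and its run
        // length; zero-length runs are skipped.
        while self.remaining == 0 {
            self.remaining = self.run_lengths.next()?;
            self.current = self.inner.next_item();
        }
        self.remaining -= 1;
        self.current.clone()
    }
}
```

With these stand-ins, the same `RleBlockIterator` decodes both an `i32` block and a string block, which is the de-duplication the comment asks about.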
By the way, I've force-enabled is_rle and run the TPC-H tests on this branch. It seems that...
Some queries are producing wrong results. See https://github.com/risinglightdb/risinglight/blob/main/docs/01-tpch.md#developers-add-new-tpc-h-tests on how to run TPC-H tests with RisingLight. Don't worry about that, I think I might find the root cause when I review.
If the answer is yes, I think we'd better not implement it, to keep our codebase simple, because RLE may not be very useful for non-primitive types.
There's a one-char column
Signed-off-by: ludics <leonludics@gmail.com>
In commit f163ed, I added a new trait RLETypeEncode and implemented only one builder/iterator for both primitive types and Bytes types. However, the
Maybe we could store the previous value as <A::Item as ToOwned>::Owned? This resolves to i32, String, etc.
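As a rough sketch of that idea, the builder below keeps the previous value as `T::Owned` and compares each incoming borrowed item against it through `Borrow`. The `RleBuilder` type, its methods, and the `(value, run length)` output are hypothetical and exist only for illustration; they are not the builder from this PR.

```rust
use std::borrow::Borrow;

/// Hypothetical RLE builder: `T` is the borrowed item type (e.g. `i32` or `str`),
/// and the previous value is stored as `T::Owned` (e.g. `i32` or `String`).
struct RleBuilder<T: ?Sized + ToOwned> {
    previous: Option<T::Owned>,
    run_length: u16,
    runs: Vec<(T::Owned, u16)>,
}

impl<T: ?Sized + ToOwned + PartialEq> RleBuilder<T> {
    fn new() -> Self {
        Self {
            previous: None,
            run_length: 0,
            runs: Vec::new(),
        }
    }

    /// Append one item, extending the current run if it equals the previous value.
    fn append(&mut self, item: &T) {
        let same = match &self.previous {
            // Borrow the owned previous value back as `&T` to compare it.
            Some(prev) => <T::Owned as Borrow<T>>::borrow(prev) == item,
            None => false,
        };
        if same && self.run_length < u16::MAX {
            self.run_length += 1;
        } else {
            // Value changed (or the counter is full): close the run, start a new one.
            self.flush();
            self.previous = Some(item.to_owned());
            self.run_length = 1;
        }
    }

    /// Move the pending run, if any, into the output.
    fn flush(&mut self) {
        if let Some(value) = self.previous.take() {
            self.runs.push((value, self.run_length));
        }
    }

    /// Finish building and return the `(value, run length)` pairs.
    fn finish(mut self) -> Vec<(T::Owned, u16)> {
        self.flush();
        self.runs
    }
}
```

Because only `ToOwned` and `PartialEq` are required, the same code path handles fixed-length items such as `i32` and variable-length items such as `str`.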
…ator Signed-off-by: ludics <leonludics@gmail.com>
Rest LGTM, excellent work!
b394b96 to f4d06e4
Rest LGTM! You may get it merged and fix the remaining small problems later. Thanks for your contribution.
I'm still investigating why enabling RLE produces wrong results in TPC-H. I'll create an issue about this problem later.
Well, I've finally found the issue with
storage: support run-length encoding #255
Rle
rle_block_builder, rle_block_iterator
ColumnBuilder and ColumnIterator
close #255
Signed-off-by: Lu Di <leonludics@gmail.com>