-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
coprocessor/chunk: support encode/decode #3332
Conversation
cool, how the performance increased with this new codec? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
friendly ping @breeswish |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest LGTM
@@ -75,8 +77,30 @@ impl Chunk { | |||
pub fn iter(&self) -> RowIterator { | |||
RowIterator::new(self) | |||
} | |||
|
|||
#[cfg(test)] | |||
pub fn decode(buf: &mut BytesSlice, tps: &[FieldType]) -> Result<Chunk> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when will tp be a u8
and when will it be a FieldType
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ported from tidb and now only the u8
is used. I think we use the FieldType
just in case one day we may need some other attributes in FieldType
@zz-jason do you think so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, Agree with you.
@@ -75,8 +77,30 @@ impl Chunk { | |||
pub fn iter(&self) -> RowIterator { | |||
RowIterator::new(self) | |||
} | |||
|
|||
#[cfg(test)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it #[cfg(test)]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the decode function will only be used in test.
BTW, I recommended to add tests about the data after encoding, so that we need to make sure that it is EXACTLY THE SAME as TiDB's. Seems that TiDB does not have such tests as well. @zz-jason |
Just like we do not have the similar test for |
I would still like to see a test to ensure that our output exactly meets expectations / spec. A byte-to-byte test would be more persuasive than a test of encode + decode. 🤣@zz-jason What do you think? |
Maybe we can do this with the end-to-end integration tests once we are ready to use |
@zz-jason It's an integration level test instead of unit test. I think they cannot replace each other. |
@breeswish Agree. Integration test is more convenient and can cover more scenarios. For unit test, I think we can dump the correct encoded content to a file, this file can be used both by TiKV and TiDB during the unit test. |
Yes, that's exactly what I would like to see! We just need to dump a output in TiDB, and put it in tests in both TiKV and TiDB @zz-jason |
What have you changed? (mandatory)
This PR implements encode and decode functions for arrow chunk.
What are the type of the changes? (mandatory)
How has this PR been tested? (mandatory)
unit tests
Does this PR affect documentation (docs/docs-cn) update? (mandatory)
no
Does this PR affect tidb-ansible update? (mandatory)
no
Refer to a related PR or issue link (optional)
pingcap/tidb#7006
Benchmark result if necessary (optional)
test coprocessor::codec::chunk::bench_encode_chunk ... bench: 12,058 ns/iter (+/- 3,636)
@breeswish @zz-jason PTAL