Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coprocessor/chunk: support encode/decode #3332

Merged
merged 4 commits into from
Jul 25, 2018

Conversation

AndreMouche
Copy link
Member

What have you changed? (mandatory)

This PR implements encode and decode functions for arrow chunk.

What are the type of the changes? (mandatory)

  • New feature (non-breaking change which adds functionality)

How has this PR been tested? (mandatory)

unit tests

Does this PR affect documentation (docs/docs-cn) update? (mandatory)

no

Does this PR affect tidb-ansible update? (mandatory)

no

Refer to a related PR or issue link (optional)

pingcap/tidb#7006

Benchmark result if necessary (optional)

test coprocessor::codec::chunk::bench_encode_chunk ... bench: 12,058 ns/iter (+/- 3,636)

@breeswish @zz-jason PTAL

@zz-jason zz-jason self-assigned this Jul 20, 2018
@siddontang
Copy link
Contributor

cool, how the performance increased with this new codec?

Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AndreMouche
Copy link
Member Author

friendly ping @breeswish

Copy link
Member

@breezewish breezewish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

@@ -75,8 +77,30 @@ impl Chunk {
pub fn iter(&self) -> RowIterator {
RowIterator::new(self)
}

#[cfg(test)]
pub fn decode(buf: &mut BytesSlice, tps: &[FieldType]) -> Result<Chunk> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when will tp be a u8 and when will it be a FieldType?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ported from tidb and now only the u8 is used. I think we use the FieldType just in case one day we may need some other attributes in FieldType @zz-jason do you think so?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, Agree with you.

@@ -75,8 +77,30 @@ impl Chunk {
pub fn iter(&self) -> RowIterator {
RowIterator::new(self)
}

#[cfg(test)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it #[cfg(test)]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the decode function will only be used in test.

@breezewish
Copy link
Member

BTW, I recommended to add tests about the data after encoding, so that we need to make sure that it is EXACTLY THE SAME as TiDB's. Seems that TiDB does not have such tests as well. @zz-jason

@AndreMouche
Copy link
Member Author

Just like we do not have the similar test for Datum and Table codec, I think it's a little strange to add tests about the data after encoding. And also I've made some comparison for some types between the result in tikv and tidb.

@breezewish
Copy link
Member

I would still like to see a test to ensure that our output exactly meets expectations / spec. A byte-to-byte test would be more persuasive than a test of encode + decode. 🤣@zz-jason What do you think?

@zz-jason
Copy link
Member

Maybe we can do this with the end-to-end integration tests once we are ready to use Chunk to do the IPC between TiDB and TiKV?

@breezewish
Copy link
Member

@zz-jason It's an integration level test instead of unit test. I think they cannot replace each other.

@zz-jason
Copy link
Member

@breeswish Agree. Integration test is more convenient and can cover more scenarios. For unit test, I think we can dump the correct encoded content to a file, this file can be used both by TiKV and TiDB during the unit test.

@breezewish
Copy link
Member

Yes, that's exactly what I would like to see! We just need to dump a output in TiDB, and put it in tests in both TiKV and TiDB @zz-jason

@AndreMouche AndreMouche merged commit cfca603 into tikv:master Jul 25, 2018
@AndreMouche AndreMouche deleted the chunk/codec branch July 25, 2018 02:29
sticnarf pushed a commit to sticnarf/tikv that referenced this pull request Oct 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants