
*: parse the data source directly into data and skip the KV encoder #145

Merged
merged 17 commits into master on Apr 3, 2019

Conversation

@kennytm (Member) commented Mar 14, 2019

What problem does this PR solve?

In our testing with a 4.1 TB workload, we found that parsing SQL takes almost half of the time needed to encode a row. Since we have already used a parser to extract each row, parsing it again wastes computing resources. Additionally, for CSV we must perform the complex and unnecessary pipeline Parse CSV → Reconstruct SQL → Parse SQL.

What is changed and how it works?

We change the Lightning parsers to directly produce an array of types.Datum for both CSV and SQL. We also get rid of the abstraction layer KvEncoder (since it only accepts SQL statements), and directly use (*table.Table).AddRecord to convert the []types.Datum into KV pairs.

This halves the encoding time in our experiments.
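As a rough illustration of the new data path (this is not the actual Lightning code; `Datum` and `parseRow` here are simplified stand-ins for TiDB's `types.Datum` and the real parsers), a CSV record can be converted straight into a datum slice with no intermediate SQL text to reconstruct and re-parse:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// Datum is a stand-in for TiDB's types.Datum: one parsed column value.
type Datum struct {
	IsNull bool
	Value  interface{}
}

// parseRow converts one CSV record directly into a datum slice, skipping
// the old "reconstruct INSERT statement → re-parse SQL" round trip.
func parseRow(record []string) []Datum {
	row := make([]Datum, 0, len(record))
	for _, field := range record {
		if field == `\N` { // mydumper-style NULL marker
			row = append(row, Datum{IsNull: true})
		} else if n, err := strconv.ParseInt(field, 10, 64); err == nil {
			row = append(row, Datum{Value: n})
		} else {
			row = append(row, Datum{Value: field})
		}
	}
	return row
}

func main() {
	r := csv.NewReader(strings.NewReader("1,alice\n2,\\N\n"))
	for {
		record, err := r.Read()
		if err != nil {
			break
		}
		fmt.Printf("%+v\n", parseRow(record))
	}
}
```

In the real PR the resulting `[]types.Datum` is then fed to `(*table.Table).AddRecord` to produce the KV pairs.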

Check List

Tests

  • Unit test
  • Integration test

Side effects

Related changes

@kennytm kennytm force-pushed the kennytm/parse-faster branch from ce043f9 to f0dadce Mar 14, 2019

@kennytm kennytm force-pushed the kennytm/parse-faster branch 2 times, most recently from f81aead to 44db439 Mar 15, 2019

@kennytm kennytm removed the status/WIP label Mar 15, 2019

@kennytm kennytm changed the title [WIP] *: parse the data source directly into data and skip the KV encoder *: parse the data source directly into data and skip the KV encoder Mar 15, 2019

@kennytm (Member, Author) commented Mar 15, 2019

@lonng Some metrics are temporarily removed; we need to decide whether to tweak the metrics or the process. The old process:

  • Read a block (64 KB) of SQL values → Reconstruct INSERT statement → Execute INSERT statement to get KV pairs → Deliver block

New process:

  • Read a single row and copy to datum array → Encode to get KV pairs of this row → Buffer until 64 KB → Deliver buffer
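The "buffer until 64 KB → deliver buffer" step of the new process can be sketched roughly as follows (a simplified illustration, not the actual Lightning code; `kvPair` and `deliverLoop` are hypothetical names):

```go
package main

import "fmt"

// kvPair is a stand-in for an encoded key-value pair.
type kvPair struct{ key, val []byte }

// deliverLoop accumulates per-row KV pairs until the buffered size crosses
// the threshold (64 KB in the description above), then sends the whole
// batch downstream and starts a fresh buffer.
func deliverLoop(rows <-chan []kvPair, out chan<- []kvPair, threshold int) {
	var buf []kvPair
	size := 0
	for kvs := range rows {
		for _, kv := range kvs {
			buf = append(buf, kv)
			size += len(kv.key) + len(kv.val)
		}
		if size >= threshold {
			out <- buf
			buf, size = nil, 0
		}
	}
	if len(buf) > 0 {
		out <- buf // flush the final partial batch
	}
	close(out)
}

func main() {
	rows := make(chan []kvPair)
	out := make(chan []kvPair)
	go func() {
		rows <- []kvPair{{[]byte("k1"), []byte("v1")}}
		rows <- []kvPair{{[]byte("k2"), []byte("v2")}}
		close(rows)
	}()
	go deliverLoop(rows, out, 4) // tiny threshold just for the demo
	for batch := range out {
		fmt.Println("delivered batch of", len(batch), "pairs")
	}
}
```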

The following metrics are affected by this change and may need to be repurposed:

  • ChunkParserReadRowSecondsHistogram
  • BlockReadSecondsHistogram (concept of "block" no longer applies)
  • BlockReadBytesHistogram
  • BlockEncodeSecondsHistogram
@lonng (Member) commented Mar 16, 2019

@kennytm I think we could add some new metrics for:

  • Encode to get KV pairs duration → data part + index part
  • Data KV size
  • Index KV size

@kennytm kennytm force-pushed the kennytm/parse-faster branch 2 times, most recently from b83efb5 to 5624ac0 Mar 19, 2019

kennytm added some commits Mar 14, 2019

*: parse the data source directly into data and skip the KV encoder
This skips the more complex pingcap/parser and speeds up parsing
by 50%.

We have also refactored the KV delivery mechanism to use channels
directly, and revamped metrics:

- Moved the engine metrics into their own `engines` counter. The
  `tables` counter is now exclusively about tables.
- Removed `block_read_seconds`, `block_read_bytes`, and `block_encode_seconds`,
  since the concept of a "block" no longer applies. Replaced by
  equivalents named `row_***`.
- Removed `chunk_parser_read_row_seconds` since it overlapped with
  `row_read_seconds`.
- Changed `block_deliver_bytes` into a histogram vec with kind=index
  or kind=data. Introduced `block_deliver_kv_pairs`.
tests,restore: prevent spurious error in checkpoint_chunks test
Only kill Lightning if the whole chunk has been imported exactly. The chunk
checkpoint may be recorded before a chunk is fully written, which would
hit the failpoint more than 5 times.
common: disable IsContextCanceledError() when log level = debug
This helps debug some mysterious cancellations where the log is
suppressed.

Added IsReallyContextCanceledError() for code logic affected by error
type.

@kennytm kennytm force-pushed the kennytm/parse-faster branch from 5624ac0 to 216c812 Mar 23, 2019

@kennytm kennytm added status/PTAL and removed status/WIP labels Mar 23, 2019

kennytm added some commits Mar 23, 2019

kennytm added some commits Apr 2, 2019

@kennytm (Member, Author) commented Apr 2, 2019

/run-all-tests

@GregoryIan (Collaborator) left a comment

I feel that Lightning lacks unit tests.

_, err = chunkStmt.ExecContext(
	c, tableName, engineID,
	value.Key.Path, value.Key.Offset, value.Columns, value.ShouldIncludeRowID,
	value.Key.Path, value.Key.Offset, colPerm,

@GregoryIan (Collaborator) commented Apr 2, 2019

As far as I can tell, we only call InsertEngineCheckpoints after calling populateChunks. If so, the columns column is always empty.

@kennytm (Author, Member) replied Apr 2, 2019

Removed this, and the checksum (which was always 0 too).

func (cr *chunkRestore) saveCheckpoint(t *TableRestore, engineID int32, rc *RestoreController) {
	rc.saveCpCh <- saveCp{
		tableName: t.tableName,
		merger: &RebaseCheckpointMerger{

@GregoryIan (Collaborator) commented Apr 2, 2019

Is it enough to only save it at L529? It seems AllocBase wouldn't change.

@kennytm (Author, Member) replied Apr 2, 2019

Unfortunately no: AllocBase needs to be larger than every value of _tidb_rowid (or the integer primary key), which cannot be determined until we've read all the data.

Added a comment for this.
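The rebase semantics discussed here can be sketched as a monotone max-merge (a simplified stand-in for Lightning's checkpoint types; the `checkpoint` struct and this `MergeInto` body are illustrative, not the real implementation):

```go
package main

import "fmt"

// checkpoint holds the table's row-ID allocator base, which must end up
// greater than every _tidb_rowid (or integer primary key) written.
type checkpoint struct {
	AllocBase int64
}

// RebaseCheckpointMerger raises AllocBase monotonically: merging never
// lowers it, so the final value dominates every row ID seen, no matter
// in which order chunks finish.
type RebaseCheckpointMerger struct {
	AllocBase int64
}

func (m *RebaseCheckpointMerger) MergeInto(cp *checkpoint) {
	if m.AllocBase > cp.AllocBase {
		cp.AllocBase = m.AllocBase
	}
}

func main() {
	cp := &checkpoint{}
	for _, base := range []int64{120, 80, 300} { // chunks finish out of order
		(&RebaseCheckpointMerger{AllocBase: base}).MergeInto(cp)
	}
	fmt.Println(cp.AllocBase) // 300: larger than every row ID encountered
}
```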

{"t2", "CREATE TABLE `t2` (`c1` varchar(30000) NOT NULL)", "failed to ExecDDLSQL `mockdb`.`t2`:.*"},
{"t3", "CREATE TABLE `t3-a` (`c1-a` varchar(5) NOT NULL)", ""},
{"t1", "CREATE TABLE `t1` (`c1` varchar(5) NOT NULL)"},
// {"t2", "CREATE TABLE `t2` (`c1` varchar(30000) NOT NULL)"}, // no longer able to create this kind of table.

@GregoryIan (Collaborator) commented Apr 2, 2019

Is this case now meaningless? Or should we add some error cases?

@kennytm (Author, Member) replied Apr 2, 2019

In this PR we no longer parse the CREATE TABLE DDL, and instead directly unmarshal the JSON result from TiDB (calling tables.TableFromMeta). So yes, this case becomes meaningless: either you can't create a VARCHAR(30000) column in TiDB, or you can, and TiDB produces proper JSON which Lightning accepts without error.

Anyway, I added a separate unit test to ensure malformed table info produces an error.
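The shape of this approach can be sketched as follows (a toy stand-in: `tableInfo` and `tableFromMeta` here only mimic TiDB's `model.TableInfo` and `tables.TableFromMeta`, which are far richer):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tableInfo is a tiny stand-in for TiDB's model.TableInfo: Lightning now
// consumes the table schema as JSON emitted by TiDB instead of re-parsing
// the CREATE TABLE DDL itself.
type tableInfo struct {
	Name    string `json:"name"`
	Columns []struct {
		Name string `json:"name"`
		Type string `json:"type"`
	} `json:"cols"`
}

// tableFromMeta mimics the contract of tables.TableFromMeta: malformed
// table info yields an error rather than a silently broken table.
func tableFromMeta(raw []byte) (*tableInfo, error) {
	var ti tableInfo
	if err := json.Unmarshal(raw, &ti); err != nil {
		return nil, fmt.Errorf("invalid table info: %w", err)
	}
	return &ti, nil
}

func main() {
	good := []byte(`{"name":"t1","cols":[{"name":"c1","type":"varchar(5)"}]}`)
	ti, err := tableFromMeta(good)
	fmt.Println(ti.Name, err)

	_, err = tableFromMeta([]byte(`{not json`))
	fmt.Println(err != nil)
}
```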

Resolved (outdated): tests/checkpoint_chunks/run.sh
Resolved (outdated): lightning/restore/restore.go
Resolved (outdated): lightning/mydump/parser.go
@@ -19,11 +19,12 @@ import (
"strings"

. "github.com/pingcap/check"
"github.com/pingcap/errors"

@GregoryIan (Collaborator) commented Apr 2, 2019

We don't add any new tests for this, only keep the old cases?

@kennytm (Author, Member) replied Apr 2, 2019

The old unit tests didn't compile since the interface changed. There are no new or deleted tests in this file.

@GregoryIan (Collaborator) commented Apr 2, 2019

Rest LGTM

kennytm added some commits Apr 2, 2019

@kennytm (Member, Author) commented Apr 2, 2019

/run-all-tests

Resolved: tests/sqlmode/run.sh
@lonng (Member) commented Apr 3, 2019

Rest LGTM

@kennytm kennytm added status/LGT1 and removed status/PTAL labels Apr 3, 2019

@kennytm (Member, Author) commented Apr 3, 2019

@GregoryIan PTAL again

@GregoryIan (Collaborator) commented Apr 3, 2019

LGTM

@GregoryIan GregoryIan added status/LGT2 and removed status/LGT1 labels Apr 3, 2019

@kennytm (Member, Author) commented Apr 3, 2019

/run-all-tests

@kennytm kennytm merged commit 7bae12e into master Apr 3, 2019

3 checks passed:

  • idc-jenkins-ci-lightning/build — Jenkins job succeeded.
  • idc-jenkins-ci-tidb-lightning/test — Jenkins job succeeded.
  • license/cla — Contributor License Agreement is signed.

kennytm added a commit that referenced this pull request Apr 3, 2019

kennytm added a commit that referenced this pull request Apr 4, 2019

tests: fix a test failure due to conflict between #145 and #158 (#159)
* tests: fix a test failure due to conflict between #145 and #158

* restore: apply the row count limit to failpoint KillIfImportedChunk too