This repository has been archived by the owner on Dec 8, 2021. It is now read-only.

config: allow four byte-size config to be specified using human-readable units ("100 GiB") #471

Merged
merged 4 commits into master from kennytm/human-readable-size-config Nov 13, 2020


kennytm
Collaborator

@kennytm kennytm commented Nov 11, 2020

What problem does this PR solve?

Several config items specify large byte sizes, which are not very readable (and some DBAs are still unfamiliar with TOML's underscore-as-digit-separator feature).

What is changed and how it works?

Using docker/go-units (already used by Dumpling and PD), we support parsing human-readable byte sizes for the following config items:

  • tikv-importer.region-split-size (100_663_296 → '96 MiB')
  • mydumper.read-block-size (65536 → '64 KiB')
  • mydumper.batch-size (107_374_182_400 → '100 GiB')
  • mydumper.max-region-size (268_435_456 → '256 MiB')

Byte sizes can be specified as integers (65536 or 65_536, backward-compatible with existing config), floating-point numbers (6.5536e+4), or strings with byte units ('64k', '64 K', '64KiB', '64 kb', '0.0625 MB', all equivalent, since every unit is interpreted as a power of 1024).
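The string forms above can be sketched with the Go standard library alone. This is not the PR's code: the PR delegates to docker/go-units, whereas the hypothetical `parseByteSize` below only loosely imitates that style of pattern (a decimal number, an optional space, an optional unit letter, an optional `i`, an optional `b`) and treats every unit as binary, which is what makes '64 kb' and '64KiB' equivalent.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// sizeRe accepts a decimal number, an optional space, an optional unit
// letter, an optional "i" and an optional "b", e.g. "64k", "64 KiB", "0.0625 MB".
var sizeRe = regexp.MustCompile(`^(\d+(?:\.\d+)?) ?([kKmMgGtTpP])?[iI]?[bB]?$`)

// binaryMultipliers maps a lower-cased unit letter to a power of 1024.
var binaryMultipliers = map[string]float64{
	"":  1,
	"k": 1 << 10,
	"m": 1 << 20,
	"g": 1 << 30,
	"t": 1 << 40,
	"p": 1 << 50,
}

// parseByteSize is a hypothetical stand-in for the go-units parser: it turns
// a human-readable size into a byte count, treating every unit as binary.
func parseByteSize(s string) (int64, error) {
	m := sizeRe.FindStringSubmatch(strings.TrimSpace(s))
	if m == nil {
		return 0, fmt.Errorf("invalid byte size %q", s)
	}
	num, err := strconv.ParseFloat(m[1], 64)
	if err != nil {
		return 0, err
	}
	// An absent unit group is "", which maps to a multiplier of 1.
	return int64(num * binaryMultipliers[strings.ToLower(m[2])]), nil
}

func main() {
	for _, s := range []string{"64k", "64 K", "64KiB", "64 kb", "0.0625 MB", "96 MiB", "100 GiB"} {
		n, err := parseByteSize(s)
		fmt.Println(s, "->", n, err)
	}
}
```

The first five inputs all yield 65536, and "96 MiB" and "100 GiB" reproduce the integers listed above for region-split-size and batch-size.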

(This PR is submitted mainly because we expect that disk-quota will also need some huge numbers in custom configs, similar to TiKV's ReadableSize struct. Interestingly, no such thing exists in TiDB except when dealing with INSPECTION_RULES.)

Check List

Tests

  • Unit test

Side effects

Related changes

  • Need to update the documentation
  • Need to be included in the release note

Release notes

  • The four configuration items listed above now accept human-readable values such as "2.5 GiB".

@kennytm kennytm added the status/PTAL label Nov 11, 2020
e.g. allow `region-split-size = '96M'` in addition to `= 100663296`

(known issue: these values' precisions will be truncated to 53 bits
instead of supporting all 63 bits)
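The 53-bit limit matches the 53-bit significand of an IEEE 754 double: every integer up to 2^53 is exactly representable, but above that some integers round to a neighbour. Assuming the size passes through a `float64` while being parsed (the diff itself is not shown here), a minimal demonstration uses the hypothetical helper `roundTrip`:

```go
package main

import "fmt"

// roundTrip stores an int64 in a float64 and converts it back, as a parser
// built on strconv.ParseFloat would do. This helper is hypothetical.
func roundTrip(n int64) int64 {
	return int64(float64(n))
}

func main() {
	limit := int64(1) << 53 // 9007199254740992 bytes, i.e. 8 PiB

	// Every integer up to 2^53 survives the float64 round trip.
	fmt.Println(roundTrip(limit) == limit)

	// 2^53 + 1 is not representable as a float64 and rounds to 2^53.
	fmt.Println(roundTrip(limit+1) == limit+1)

	// Realistic sizes such as 100 GiB are far below the limit.
	fmt.Println(roundTrip(107374182400))
}
```

In practice the truncation only matters for sizes above 8 PiB, far beyond any of the four config items above.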
@kennytm kennytm force-pushed the kennytm/human-readable-size-config branch from 3698a5e to ac12525 November 12, 2020 00:35
lightning/restore/checksum_test.go (outdated, resolved)
}

// UnmarshalJSON implements json.Unmarshaler (for testing)
func (size *ByteSize) UnmarshalJSON(b []byte) error {
Contributor


Shall we also implement MarshalText and MarshalJSON?

Collaborator Author


There seems to be no need to implement the Marshal methods, as the unit tests pass without them.

@kennytm kennytm force-pushed the kennytm/human-readable-size-config branch from ac12525 to c91399e November 12, 2020 03:24
Contributor

@glorv glorv left a comment


LGTM

@glorv glorv added status/LGT1 and removed status/PTAL labels Nov 12, 2020
@overvenus
Member

LGTM

@overvenus overvenus added status/LGT2 and removed status/LGT1 labels Nov 13, 2020
@kennytm kennytm merged commit ccf7f3c into master Nov 13, 2020
@kennytm kennytm deleted the kennytm/human-readable-size-config branch November 13, 2020 07:27
lance6716 added a commit to lance6716/tidb-lightning that referenced this pull request Nov 17, 2020
add notes

save work

save work

fix unit test

remove tidbMgr in RestoreController

remove some comments

remove some comments

change logger in SQLWithRetry

revert replace log.Logger to *zap.Logger

dep: update uuid dependency to latest google/uuid (pingcap#452)

* dep: update satori/go.uuid to latest

* fix tests

* change to google/uuid

* fix build

* try fix test

* get familiar with google/uuid

* address comment

tidb-lightning-ctl: change default of -d to 'noop://' (pingcap#453)

also add noop:// to supported storage types (to represent an empty store)

replace tab to space

try another port to fix CI

remove some comment

*: more glue

restore: fix the bug that gc life time ttl does not take effect (pingcap#448)

* fix gc ttl loop

* resolve comment and add tests

fix CI

report info to host TiDB

config: filter out all system schemas by default (pingcap#459)

backend: fix auto random default value for primary key (pingcap#457)

* fix auto generate auto random primary key column

* fix default for auto random primary key

* fix test

* use prev row id for auto random and add a test

* replace chunk with session opt

* fix

* fix

mydumper: fix parquet data parser (pingcap#435)

* fix parquet

* reorder imports

* fix test

* use empty collation

* fix an error and add more test cases

* add pointer type tests

* resolve comments

Co-authored-by: kennytm <kennytm@gmail.com>

address comment

backend/local: use range properties to optimize region range estimate (pingcap#422)

* use range properties to estimate region range

* post-restore: add optional level for post-restore operations (pingcap#421)

* add optional level for post-restore operations

* trim leading and suffix '"

* use UnmarshalTOML to unmarshal post restore op level

* resolve comments and fix unit test

* backend/local: do not retry epochNotMatch error when ingest sst (pingcap#419)

* do not retry epochNotMatch error when ingest sst

* add retry ingest for 'Raft raft: proposal dropped' error in ingest

* change some retryable error log level from Error to Warn

* fix nextKey

* add a comment for nextKey

* fix comment and add a unit test

* wrap time.Sleep in select

Co-authored-by: kennytm <kennytm@gmail.com>

* update

* use range properties to optimize region range estimate

* update pebble

* change the default value for batch-size

* add unit tests and resolve comments

* add a comment to range properties test

* add a comment

* add a test for range property with pebble

* rename const variable

Co-authored-by: kennytm <kennytm@gmail.com>

fix pd service id is empty (pingcap#460)

fix s3 parquet reader (pingcap#461)

Co-authored-by: Neil Shen <overvenus@gmail.com>

fix service gc ttl again (pingcap#465)

address comment

mydumper: verify file routing config (pingcap#470)

* fix file routing

* remove useless line

* remove redundant if check

rename a method in interface

save work

try fix CI

could work

change ctx usage

try fix CI

try fix CI

refine function interface

refine some function interface

debug CI

address comment

config: allow four byte-size config to be specified using human-readable units ("100 GiB") (pingcap#471)

* Makefile: add `make finish-prepare` action

* config: accept human-readable size for most byte-related config

e.g. allow `region-split-size = '96M'` in addition to `= 100663296`

(known issue: these values' precisions will be truncated to 53 bits
instead of supporting all 63 bits)

* restore: reduce chance of spurious errors from TestGcTTLManagerSingle

Co-authored-by: glorv <glorvs@163.com>

remove debug log

test: change double type syntax (pingcap#474)

address comment

checkpoint: add glue checkpoint

resolve cycle import

expose Retry

refine

change interface to cope with TiDB

fix SQL string

fix SQL

adjust interface to be embedded in TiDB

could import now

reduce TLS

restore: add `glue.Glue` interface and other function (pingcap#456)

* save my work

* add notes

* save work

* save work

* fix unit test

* remove tidbMgr in RestoreController

* remove some comments

* remove some comments

* change logger in SQLWithRetry

* revert replace log.Logger to *zap.Logger

* replace tab to space

* try another port to fix CI

* remove some comment

* *: more glue

* report info to host TiDB

* fix CI

* address comment

* address comment

* rename a method in interface

* save work

* try fix CI

* could work

* change ctx usage

* try fix CI

* try fix CI

* refine function interface

* refine some function interface

* debug CI

* address comment

* remove debug log

* address comment

modify code

add comment

refine some code
glorv pushed a commit that referenced this pull request Nov 23, 2020
* save my work

add notes

save work

save work

fix unit test

remove tidbMgr in RestoreController

remove some comments

remove some comments

change logger in SQLWithRetry

revert replace log.Logger to *zap.Logger

dep: update uuid dependency to latest google/uuid (#452)

* dep: update satori/go.uuid to latest

* fix tests

* change to google/uuid

* fix build

* try fix test

* get familiar with google/uuid

* address comment

tidb-lightning-ctl: change default of -d to 'noop://' (#453)

also add noop:// to supported storage types (to represent an empty store)

replace tab to space

try another port to fix CI

remove some comment

*: more glue

restore: fix the bug that gc life time ttl does not take effect (#448)

* fix gc ttl loop

* resolve comment and add tests

fix CI

report info to host TiDB

config: filter out all system schemas by default (#459)

backend: fix auto random default value for primary key (#457)

* fix auto generate auto random primary key column

* fix default for auto random primary key

* fix test

* use prev row id for auto random and add a test

* replace chunk with session opt

* fix

* fix

mydumper: fix parquet data parser (#435)

* fix parquet

* reorder imports

* fix test

* use empty collation

* fix an error and add more test cases

* add pointer type tests

* resolve comments

Co-authored-by: kennytm <kennytm@gmail.com>

address comment

backend/local: use range properties to optimize region range estimate (#422)

* use range properties to estimate region range

* post-restore: add optional level for post-restore operations (#421)

* add optional level for post-restore operations

* trim leading and suffix '"

* use UnmarshalTOML to unmarshal post restore op level

* resolve comments and fix unit test

* backend/local: do not retry epochNotMatch error when ingest sst (#419)

* do not retry epochNotMatch error when ingest sst

* add retry ingest for 'Raft raft: proposal dropped' error in ingest

* change some retryable error log level from Error to Warn

* fix nextKey

* add a comment for nextKey

* fix comment and add a unit test

* wrap time.Sleep in select

Co-authored-by: kennytm <kennytm@gmail.com>

* update

* use range properties to optimize region range estimate

* update pebble

* change the default value for batch-size

* add unit tests and resolve comments

* add a comment to range properties test

* add a comment

* add a test for range property with pebble

* rename const variable

Co-authored-by: kennytm <kennytm@gmail.com>

fix pd service id is empty (#460)

fix s3 parquet reader (#461)

Co-authored-by: Neil Shen <overvenus@gmail.com>

fix service gc ttl again (#465)

address comment

mydumper: verify file routing config (#470)

* fix file routing

* remove useless line

* remove redundant if check

rename a method in interface

save work

try fix CI

could work

change ctx usage

try fix CI

try fix CI

refine function interface

refine some function interface

debug CI

address comment

config: allow four byte-size config to be specified using human-readable units ("100 GiB") (#471)

* Makefile: add `make finish-prepare` action

* config: accept human-readable size for most byte-related config

e.g. allow `region-split-size = '96M'` in addition to `= 100663296`

(known issue: these values' precisions will be truncated to 53 bits
instead of supporting all 63 bits)

* restore: reduce chance of spurious errors from TestGcTTLManagerSingle

Co-authored-by: glorv <glorvs@163.com>

remove debug log

test: change double type syntax (#474)

address comment

checkpoint: add glue checkpoint

resolve cycle import

expose Retry

refine

change interface to cope with TiDB

fix SQL string

fix SQL

adjust interface to be embedded in TiDB

could import now

reduce TLS

restore: add `glue.Glue` interface and other function (#456)

* save my work

* add notes

* save work

* save work

* fix unit test

* remove tidbMgr in RestoreController

* remove some comments

* remove some comments

* change logger in SQLWithRetry

* revert replace log.Logger to *zap.Logger

* replace tab to space

* try another port to fix CI

* remove some comment

* *: more glue

* report info to host TiDB

* fix CI

* address comment

* address comment

* rename a method in interface

* save work

* try fix CI

* could work

* change ctx usage

* try fix CI

* try fix CI

* refine function interface

* refine some function interface

* debug CI

* address comment

* remove debug log

* address comment

modify code

add comment

refine some code

* address comment

* add some comments

* fix CI and change CREATE TABLE
Labels
status/LGT2 Two reviewers already commented LGTM, ready for merge (LGTM2)

3 participants