Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config: Stop server startup if an unrecognized option is found in a config file #9855

Merged
merged 17 commits into from Apr 8, 2019

Conversation

@kolbe
Copy link
Contributor

kolbe commented Mar 21, 2019

What problem does this PR solve?

#7103

What is changed and how it works?

Previously, tidb-server would simply ignore invalid config options encountered in a config.toml file. This can cause big problems if a user puts an option in the wrong section of the config file, or if they mis-spell an option name. This could conceivably lead to data loss (if, for example, data is not written to the correct device), security issues (if listeners are not bound to the correct interface), or administration headaches (if someone puts an option in the wrong section and cannot figure out why it isn't taking effect).

This change uses the metaData returned by toml.DecodeFile to identify any items not decoded. A list of them is made and is returned as an error from Config.Load, which causes server startup to abort.

Check List

Tests

  • Unit test
    Yes, check added to config/config_test.go to make sure that an unrecognized option in a config file throws the correct error.

Code changes

  • Has exported function/method change
    No
  • Has exported variable/fields change
    No
  • Has interface methods change
    No
  • Has persistent data change
    No

Side effects

  • Possible performance regression
    No
  • Increased code complexity
    No
  • Breaking backward compatibility
    Yes, if someone is using a config file in product that has unrecognized options.

Related changes

  • Need to cherry-pick to the release branch
    No
  • Need to update the documentation
    Yes
  • Need to update the tidb-ansible repository
    Maybe, if it has a strange config file with invalid options!
  • Need to be included in the release note
    Yes

@kolbe kolbe requested review from shenli and GregoryIan Mar 21, 2019

kolbe added some commits Mar 21, 2019

@codecov

This comment has been minimized.

Copy link

codecov bot commented Mar 21, 2019

Codecov Report

Merging #9855 into master will decrease coverage by 0.3011%.
The diff coverage is 100%.

@@               Coverage Diff                @@
##             master      #9855        +/-   ##
================================================
- Coverage   77.5172%   77.2161%   -0.3011%     
================================================
  Files           403        405         +2     
  Lines         81827      81641       -186     
================================================
- Hits          63430      63040       -390     
- Misses        13689      13932       +243     
+ Partials       4708       4669        -39
@morgo

This comment has been minimized.

Copy link
Member

morgo commented Mar 21, 2019

For context: this is the mysql server behavior with parsing config files. If an item is discovered that is unknown, an error is returned. As Kolbe pointed out, the logic is:

This could conceivably lead to data loss (if, for example, data is not written to the correct device), security issues (if listeners are not bound to the correct interface), or administration headaches (if someone puts an option in the wrong section and cannot figure out why it isn't taking effect).

@morgo morgo changed the title Stop server startup if an unrecognized option is found in a config file config: Stop server startup if an unrecognized option is found in a config file Mar 21, 2019

@kolbe kolbe changed the base branch from master to rc4 Mar 22, 2019

@kolbe kolbe changed the base branch from rc4 to master Mar 22, 2019

@winkyao winkyao added the status/DNM label Mar 23, 2019

@winkyao
Copy link
Member

winkyao left a comment

mind compatibility

@winkyao winkyao added component/server and removed status/DNM labels Mar 25, 2019

@winkyao

This comment has been minimized.

Copy link
Member

winkyao commented Mar 25, 2019

@jackysp PTAL

@jackysp

This comment has been minimized.

Copy link
Member

jackysp commented Mar 25, 2019

/run-all-tests

@jackysp

This comment has been minimized.

Copy link
Member

jackysp commented Mar 25, 2019

It seems that there are some illegal configurations in the test, which causes TiDB to fail to start.

@xiekeyi98

This comment has been minimized.

Copy link
Contributor

xiekeyi98 commented Mar 25, 2019

/run-all-tests

@xiekeyi98

This comment has been minimized.

Copy link
Contributor

xiekeyi98 commented Mar 25, 2019

/run-all-tests tidb-test=pr/764

@xiekeyi98

This comment has been minimized.

Copy link
Contributor

xiekeyi98 commented Mar 25, 2019

/run-unit-tests tidb-test=pr/764

@xiekeyi98

This comment has been minimized.

Copy link
Contributor

xiekeyi98 commented Mar 25, 2019

/run-mybatis-tests tidb-test=pr/764

@xiekeyi98 xiekeyi98 closed this Mar 25, 2019

@xiekeyi98 xiekeyi98 reopened this Mar 25, 2019

for _, item := range undecoded {
undecodedItems += fmt.Sprintf(" %s \n", item.String())
}
err = errors.Errorf("config file %s contained unknown configuration options:\n%s", confFile, undecodedItems)

This comment has been minimized.

Copy link
@zz-jason

zz-jason Mar 25, 2019

Member

error generated by errors.Errorf() is attached with a call stack, we don't need to wrap a errors.Trace() in line 391.

This comment has been minimized.

Copy link
@kolbe

kolbe Mar 25, 2019

Author Contributor

That's fine with me, but the trace is not my code (it was there before, so that a non-existent config file for example would be traced for some reason).

Show resolved Hide resolved config/config.go Outdated
@@ -371,10 +372,22 @@ func GetGlobalConfig() *Config {

// Load loads config options from a toml file.
func (c *Config) Load(confFile string) error {

This comment has been minimized.

Copy link
@zz-jason

zz-jason Mar 25, 2019

Member

When upgrade from old version tidb to a newer version, some old config item maybe is not used in the newer version tidb. The rolling upgrade may meet error in this case, then the rolling upgrade failed.

How about only check the config items when -config-check is provided?

This comment has been minimized.

Copy link
@kolbe

kolbe Mar 25, 2019

Author Contributor

I don't like this suggestion. I think the behavior should be that tidb-server simply refuses to start if there are unknown configuration options, for the reasons I provided above.

A couple points:

  1. Now that there's a --config-check option, it can be a part of the recommended procedure for a rolling upgrade, so that the only people running into the issue you describe are those not following the recommended procedure.

  2. The benefit of a rolling upgrade is that you never have to take the entire cluster offline. When upgrading the first node, it will refuse to start with this error. That's a fine time to address the problem, apply the change to the config files, get that first node upgraded, and then proceed with the rolling upgrade.

@morgo, your thoughts?

This comment has been minimized.

Copy link
@morgo

morgo Mar 25, 2019

Member

I think because our own CI broke, we may need to ease this change in slowly. Here would be my suggestion:

  1. Keep --config-check as is, it is independently very useful.
  2. Change from refuse to start to print a WARN about each incorrect config item.
  3. Add a new flag for --config-strict, which defaults to FALSE. We can document it that enabling it is a best practice, and we may change the default to TRUE in the future. This will result in errors for invalid options.

It is not bulletproof of course, because the default log level is INFO it is possible that the warnings may not be seen.

This comment has been minimized.

Copy link
@kolbe

kolbe Mar 25, 2019

Author Contributor

I think the very fact our own CI broke is evidence of the value of this behavior change. Even we weren't keeping track of whether our config files were well-formed!

  1. Your suggestion above is difficult, because the logging system isn't set up yet when the config file is parsed. Either we have to keep around the list of undecoded items and check them later after logging is possible, or we have to pass some other string/structure/flag around, or we have to simply print something to stderr, or ...? @coocood Morgan mentioned that you may have some thoughts on this?

  2. I dislike this. Adding this new flag means that people may start relying on it, and then we have to keep it around as a no-op if we later make that behavior the default.

This comment has been minimized.

Copy link
@winkyao

winkyao Mar 26, 2019

Member

I do agree with Morgan's first and second suggestions. Our users may use old version ansible to upgrade, that doesn't pre-check the config using tidb-server --config-check, we need to keep the cluster stable when rolling upgrade.

This comment has been minimized.

Copy link
@kolbe

kolbe Mar 26, 2019

Author Contributor

Alright, it sounds like there's a consensus. Can someone offer a suggestion of the best way to technically implement a "warning" before the logging system is set up?

This comment has been minimized.

Copy link
@winkyao

winkyao Apr 2, 2019

Member

@kolbe I will create a pull request to your branch later.

This comment has been minimized.

Copy link
@winkyao

This comment has been minimized.

Copy link
@kolbe

kolbe Apr 2, 2019

Author Contributor

@winkyao I added some comments on your PR. I re-wrote my own interpretation of how I see this working and pushed it to my branch of my fork, please take a look.

Basically I added a kind of unsettling amount of additional stuff to create a custom error in config.go, then use that in the case of failed config validation. That enables checking for the specific error type in main.go, and if configStrict is not enabled, it'll get the string from the warning and keep that until ater logging has been set up.

This is kind of a lot of extra work and adds some ugly things, but the whole idea behind strict config checking (and the --config-strict option) is that this is a temporary situation until we make --config-strict behavior the default (and hopefully only!) behavior of TiDB Server in the future. I think at that point we'd simply remove all this extra stuff with the custom error and strange handling in the server.

Thoughts?

@xiekeyi98

This comment has been minimized.

Copy link
Contributor

xiekeyi98 commented Mar 25, 2019

/run-all-tests tidb-test=pr/764

kolbe added some commits Apr 2, 2019

Added new error type and some kind of circuitous handling to save and…
… defer a warning from loadConfig until after logging has been set up. Virtually all of this should be removed after configStrict is made the default behavior of TiDB Server!
@winkyao

This comment has been minimized.

Copy link
Member

winkyao commented Apr 2, 2019

Ci failed by

config_test.go:71:
    c.Assert(conf.Load(configFile), IsNil)
... value *config.ErrConfigValidationFailed = &config.ErrConfigValidationFailed{err:"config file /go/src/github.com/pingcap/tidb/config/config.toml contained unknown configuration options: tikv-client.token-limit"} ("config file /go/src/github.com/pingcap/tidb/config/config.toml contained unknown configuration options: tikv-client.token-limit"

The test case is wrong, you should replace (https://github.com/pingcap/tidb/blob/master/config/config_test.go#L48)

_, err = f.WriteString(`[performance]
[tikv-client]
commit-timeout="41s"
max-batch-size=128

token-limit = -1
`)

with

_, err = f.WriteString(`
token-limit = 0
[performance]
[tikv-client]
commit-timeout="41s"
max-batch-size=128
`)

kolbe and others added some commits Apr 2, 2019

@winkyao
Copy link
Member

winkyao left a comment

LGTM, @zz-jason PTAL

@winkyao

This comment has been minimized.

Copy link
Member

winkyao commented Apr 3, 2019

/run-all-tests tidb-test=pr/764

@morgo morgo added the status/LGT1 label Apr 3, 2019

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Apr 8, 2019

Codecov Report

Merging #9855 into master will decrease coverage by 0.5034%.
The diff coverage is 100%.

@@               Coverage Diff                @@
##             master      #9855        +/-   ##
================================================
- Coverage   78.0225%   77.5191%   -0.5035%     
================================================
  Files           404        403         -1     
  Lines         82016      81932        -84     
================================================
- Hits          63991      63513       -478     
- Misses        13324      13711       +387     
- Partials       4701       4708         +7
@xiekeyi98

This comment has been minimized.

Copy link
Contributor

xiekeyi98 commented Apr 8, 2019

/run-all-tests tidb-test=pr/764

@xiekeyi98

This comment has been minimized.

Copy link
Contributor

xiekeyi98 commented Apr 8, 2019

/run-common-test tidb-test=pr/764

@xiekeyi98
Copy link
Contributor

xiekeyi98 left a comment

LGTM

@winkyao winkyao merged commit f514a32 into pingcap:master Apr 8, 2019

14 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
idc-jenkins-ci-tidb/build Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/check_dev Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/code_coverage Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/common-test job succeeded
Details
idc-jenkins-ci-tidb/integration-common-test Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/integration-compatibility-test Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/integration-ddl-test Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/mybatis-test job succeeded
Details
idc-jenkins-ci-tidb/sqllogic-test-1 Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/sqllogic-test-2 Jenkins job succeeded.
Details
idc-jenkins-ci-tidb/unit-test Jenkins job succeeded.
Details
license/cla Contributor License Agreement is signed.
Details

xiekeyi98 added a commit to xiekeyi98/tidb that referenced this pull request Apr 10, 2019

config: Stop server startup if an unrecognized option is found in a c…
…onfig file (pingcap#9855)

 Conflicts:
	config/config_test.go

ngaut added a commit that referenced this pull request Apr 10, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.