Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server, sessionctx: support token-based authentication #36152

Merged
merged 10 commits into from
Jul 14, 2022

Conversation

djshow832
Copy link
Contributor

What problem does this PR solve?

Issue Number: close #35913

Problem Summary:
Support authentication using a session token. The token serves as the password. This is used for session manager, which migrates sessions.

What is changed and how it works?

  • Add a new auth-plugin tidb_session_token. If the plugin is tidb_session_token, then it will bypass the normal authentication process and only validate the session token.
  • Output the session token in show session_states statement. It requires a TLS connection because the token is confidential.
  • Add 2 global variables tidb_auth_signing_cert and tidb_auth_signing_key, which are used to configure the path of the signing certificate.
  • Start a goroutine to reload the certificate periodically.

Other considerations:

  • For security considerations, there can be more restrictions to showing session tokens and authenticating through tokens.
    • Verify the common name (CN) of the client certificate to ensure that only the session manager can query and use tokens. It's used by other cluster components through gRPC or HTTP interfaces, but not through the MySQL protocol interface. Besides, cluster-verify-cn is not documented, so I assume it's not commonly used.
    • Require a TLS connection to use tokens to authenticate. This avoids the leak of tokens. Nevertheless, when the server reads the plugin tidb_session_token, the token is sent already. So it may require another switch-plugin request to pass the token, which is more complicated. For now, we can only turn on the require-secure-transport.
  • If a network partition happens among TiDB instances for minutes, session migration may fail because the path of certificates is not updated. Similarly, if there's a time bias of more than 10 minutes, the validation of tokens will fail.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@djshow832 djshow832 requested a review from a team as a code owner July 12, 2022 14:01
@ti-chi-bot
Copy link
Member

ti-chi-bot commented Jul 12, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • bb7133
  • xhebox

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 12, 2022
@sre-bot
Copy link
Contributor

sre-bot commented Jul 12, 2022

executor/show.go Outdated Show resolved Hide resolved
executor/show.go Outdated
// The token may be leaked without TLS, so we enforce a TLS connection.
if e.ctx.GetSessionVars().TLSConnectionState == nil {
//return ErrCannotMigrateSession.GenWithStackByArgs("the token must be queried in a TLS connection")
return errors.New("the token must be queried in a TLS connection")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could use ErrSecureTransportRequired for this error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but only if we require secure transport, not if we require TLS.

secure transport is either TLS or a UNIX socket.

I think allowing a UNIX socket or TLS is the right option here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message of ErrSecureTransportRequired is Connections using insecure transport are prohibited while --require_secure_transport=ON., but the --require_secure_transport is OFF when it comes here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks!

Please make the first letter upper-cased: "The token must be queried in a TLS connection"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but only if we require secure transport, not if we require TLS.

secure transport is either TLS or a UNIX socket.

I think allowing a UNIX socket or TLS is the right option here.

To judge whether it's a Unix socket, I made use of SessionVars.ConnectionInfo, which is only used for audit logs before. Thus, every connection has a ConnectionInfo now, which occupies a little more memory. The commit: 40735d3

return nil
}},
{Scope: ScopeGlobal, Name: TiDBAuthSigningKey, Value: "", Type: TypeStr, SetGlobal: func(s *SessionVars, val string) error {
sessionstates.SetKeyPath(val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one concern: every tidb may not be deployed identically and may have different filesystems, i.e. different paths for the cert.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least for now, we could accept the identical cert path for all instances. TiDB instances with different paths like this would be hard to maintain and those paths are generally configured as command-line arguments.

BTW, this concern reminds me to confirm that: if we have multiple TiDB servers deployed on the same node, are they supposed to use the same certs and keys? @djshow832

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In MySQL paths are often relative to the data directory, which allows deployments on different paths.
Maybe there should be a CertDir for this as well as other SSL/TLS related files?

It is also possible to store this in either a table in TiDB or on PD/TiKV and use that to distribute this. The benefit for this is that scaling out is easy. The drawback is that doing this in a secure way is more difficult.

Could/should we use something like AWS KMS for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked about this with the PM. He suggests making it a global variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least for now, we could accept the identical cert path for all instances. TiDB instances with different paths like this would be hard to maintain and those paths are generally configured as command-line arguments.

BTW, this concern reminds me to confirm that: if we have multiple TiDB servers deployed on the same node, are they supposed to use the same certs and keys? @djshow832

They could use the same certs, as long as they can read them concurrently. The rationale is that the certs are supposed to be the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In MySQL paths are often relative to the data directory, which allows deployments on different paths. Maybe there should be a CertDir for this as well as other SSL/TLS related files?

I looked up the doc of SSL-related system variables in MySQL: https://dev.mysql.com/doc/refman/8.0/en/using-encrypted-connections.html#using-encrypted-connections-server-side-runtime-configuration. I saw some examples from somewhere else and found that ssl_ca, ssl_cert, and ssl_key are not relative paths. Is that so?
A CertDir is a good idea, although it adds one more variable. The most important thing of updating the configuration of other TLS related files is backward compatibility. I think it's feasible.
I surveyed another DBMS as a reference: it has a CertsDir configuration. The file names under this directory should be organized as client.<username>.crt, client-tenant.<tenant-id>.crt, etc.

Could/should we use something like AWS KMS for this?

I'm not familiar with KMS. I'll take a look. I'll appreciate much if you can summarize how to use it.

Copy link
Contributor

@dveeden dveeden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description said something about validating the CN. If/how is the CN validated? is this going to be the hostname of the client? if so, then probably using the SubjectAlternativeName is what should be done instead.

I haven't actually run the code yet and I don't have a full understanding about how this is going to work yet. However I also don't see any major issues with this, at least not yet, which is good.

domain/domain.go Outdated Show resolved Hide resolved
executor/show.go Outdated
// The token may be leaked without TLS, so we enforce a TLS connection.
if e.ctx.GetSessionVars().TLSConnectionState == nil {
//return ErrCannotMigrateSession.GenWithStackByArgs("the token must be queried in a TLS connection")
return errors.New("the token must be queried in a TLS connection")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but only if we require secure transport, not if we require TLS.

secure transport is either TLS or a UNIX socket.

I think allowing a UNIX socket or TLS is the right option here.

@@ -167,6 +167,7 @@ const (
AuthNativePassword = "mysql_native_password" // #nosec G101
AuthCachingSha2Password = "caching_sha2_password" // #nosec G101
AuthSocket = "auth_socket"
AuthTiDBSessionToken = "tidb_session_token"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the tidb_ prefix here. That helps to avoids name collisions from future MySQL authentication methods.

return nil
}},
{Scope: ScopeGlobal, Name: TiDBAuthSigningKey, Value: "", Type: TypeStr, SetGlobal: func(s *SessionVars, val string) error {
sessionstates.SetKeyPath(val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In MySQL paths are often relative to the data directory, which allows deployments on different paths.
Maybe there should be a CertDir for this as well as other SSL/TLS related files?

It is also possible to store this in either a table in TiDB or on PD/TiKV and use that to distribute this. The benefit for this is that scaling out is easy. The drawback is that doing this in a secure way is more difficult.

Could/should we use something like AWS KMS for this?

@dveeden
Copy link
Contributor

dveeden commented Jul 13, 2022

And for show session_states: could this end up on a dashboard or information_schema table with statement history etc? What about logging?

@xhebox
Copy link
Contributor

xhebox commented Jul 13, 2022

And for show session_states: could this end up on a dashboard or information_schema table with statement history etc? What about logging?

As far as I know, it is safely skipped, for now.

@djshow832
Copy link
Contributor Author

And for show session_states: could this end up on a dashboard or information_schema table with statement history etc? What about logging?

As far as I know, it is safely skipped, for now.

I think it's fine to add it to the statement history and it will, because the result of the query is not recorded. The audit log will also log it.

@djshow832
Copy link
Contributor Author

djshow832 commented Jul 13, 2022

The description said something about validating the CN. If/how is the CN validated? is this going to be the hostname of the client? if so, then probably using the SubjectAlternativeName is what should be done instead.

You can check this: https://docs.pingcap.com/tidb/v5.0/enable-tls-between-components#verify-component-callers-identity. I can configure the common name of the proxy as Ti....

I haven't actually run the code yet and I don't have a full understanding about how this is going to work yet. However I also don't see any major issues with this, at least not yet, which is good.

I'm just notifying you that I'm adding a new auth-plugin, since this module is what you're concerned about. If you like, you can give me some advice, that will be great. @dveeden

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 13, 2022
Copy link
Member

@bb7133 bb7133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jul 14, 2022
@bb7133
Copy link
Member

bb7133 commented Jul 14, 2022

PTAL(again) @dveeden @xhebox

@xhebox
Copy link
Contributor

xhebox commented Jul 14, 2022

PTAL(again) @dveeden @xhebox

No objections on new changes.

@djshow832
Copy link
Contributor Author

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: 40735d3

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Jul 14, 2022
@djshow832
Copy link
Contributor Author

/run-unit-test

@djshow832
Copy link
Contributor Author

/run-all-tests

@djshow832
Copy link
Contributor Author

/run-check_dev_2

@djshow832
Copy link
Contributor Author

/run-mysql-test

1 similar comment
@djshow832
Copy link
Contributor Author

/run-mysql-test

@ti-chi-bot ti-chi-bot merged commit 9a2ed52 into pingcap:master Jul 14, 2022
@djshow832 djshow832 deleted the token_auth2 branch July 14, 2022 10:29
@sre-bot
Copy link
Contributor

sre-bot commented Jul 14, 2022

TiDB MergeCI notify

🔴 Bad News! New failing [1] after this pr merged.
These new failed integration tests seem to be caused by the current PR, please try to fix these new failed integration tests, thanks!

CI Name Result Duration Compare with Parent commit
idc-jenkins-ci-tidb/integration-common-test 🟥 failed 1, success 10, total 11 49 min New failing
idc-jenkins-ci/integration-cdc-test 🟢 all 36 tests passed 30 min Existing passed
idc-jenkins-ci-tidb/common-test 🟢 all 12 tests passed 15 min Existing passed
idc-jenkins-ci-tidb/sqllogic-test-1 🟢 all 26 tests passed 14 min Existing passed
idc-jenkins-ci-tidb/sqllogic-test-2 🟢 all 28 tests passed 14 min Existing passed
idc-jenkins-ci-tidb/integration-ddl-test 🟢 all 6 tests passed 12 min Existing passed
idc-jenkins-ci-tidb/tics-test 🟢 all 1 tests passed 8 min 19 sec Existing passed
idc-jenkins-ci-tidb/integration-compatibility-test 🟢 all 1 tests passed 4 min 6 sec Existing passed
idc-jenkins-ci-tidb/mybatis-test 🟢 all 1 tests passed 3 min 44 sec Existing passed
idc-jenkins-ci-tidb/plugin-test 🟢 build success, plugin test success 4min Existing passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support token-based authentication
7 participants