importinto: require S3-like auth for nextgen import (#68231)#68233
Conversation
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
|
@D3Hunter This PR has conflicts, I have hold it. |
|
@ti-chi-bot: ## If you want to know how to resolve it, please read the guide in TiDB Dev Guide. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
📝 WalkthroughWalkthroughThis PR enforces explicit authentication for S3-like object storage in NextGen IMPORT INTO operations. Query parameter normalization is extracted as a reusable helper, S3 authentication constants are defined, and validation logic is added to reject implicit node-role fallback while accepting only explicit AK/SK or role ARN credentials. Comprehensive test coverage spans executor, planner, and integration tests with NextGen-specific branching. ChangesS3-like Storage Authentication Enforcement
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.1)Command failed Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/planner/core/planbuilder_test.go (1)
1113-1165:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winResolve this cherry-pick conflict before merge.
The raw conflict markers make the file fail to parse, and the
HEADside still asserts that bares3://bucketis allowed. That would lock in the pre-fix behavior instead of the new “explicit auth required for S3-like imports” contract.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/planner/core/planbuilder_test.go` around lines 1113 - 1165, This file has unresolved git conflict markers around the test cases: remove the conflict markers and the old HEAD block and keep the updated test matrix that exercises checkNextGenS3PathWithSem (the three groups: explicit external-id variants that should return plannererrors.ErrNotSupportedWithSem and message "IMPORT INTO with explicit external ID"; valid credentialed variants that should succeed; and S3-like URIs without valid auth that should return plannererrors.ErrNotSupportedWithSem and message "IMPORT INTO from S3-like storage without access key/secret access key or role ARN"). Ensure only the new for-range loops remain and that checkNextGenS3PathWithSem and plannererrors.ErrNotSupportedWithSem are referenced exactly as in the diff.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/executor/import_into_test.go`:
- Around line 165-184: Resolve the merge conflict by removing the conflict
markers and restoring both SEM test cases as separate t.Run blocks: keep the
"SEM enabled, require explicit auth for S3 like store" test that iterates over
semTestPatternFns and asserts tk.MustMatchErrMsg rejects credential-less
imports, and keep the "SEM enabled, set external ID to keyspace name" test that
verifies external-id injection when explicit auth is present; ensure both tests
call semTestPatternFns setup/cleanup where required and preserve the inner loops
over schemas and the error/message assertions (references: semTestPatternFns, tk
:= testkit.NewTestKit, tk.MustMatchErrMsg, and the two t.Run names) and apply
the same fix for the other conflicting block mentioned.
In `@pkg/objstore/s3like/store.go`:
- Around line 210-218: The function useVirtualHostStyleForAWSS3 currently
detects an explicit ForcePathStyle by searching rawURL for lowercase substrings,
which misses mixed-case query keys; update it to parse rawURL's query (via
url.Parse / ParseQuery), normalize all query keys to lowercase, and treat
presence of "force-path-style" or "force_path_style" (case-insensitively) as
explicitly set (so return false) before falling back to the existing
provider/endpoint/RoleARN logic; reference the existing function name
useVirtualHostStyleForAWSS3 and variables opts, rawURL, domainAWS, and RoleARN
when locating where to implement the parsing and normalization.
In `@pkg/planner/core/planbuilder.go`:
- Around line 4723-4753: The file contains unresolved merge markers and mixed
usage of storage.* and objstore.* which breaks compilation; remove the conflict
markers and standardize on objstore predicates: replace storage.IsLocal(u) with
objstore.IsLocal(u) and storage.IsS3(u) with objstore.IsS3Like(u), keep the
semv1.IsEnabled() check that returns plannererrors.ErrNotSupportedWithSem when
importFromServer is true, and ensure checkNextGenS3PathWithSem(u) is invoked
exactly once in the SEM path (when kerneltype.IsNextGen() && sem.IsEnabled() &&
objstore.IsS3Like(u)); also remove leftover storage imports and tidy imports to
match objstore usage.
In `@pkg/util/sem/compat/BUILD.bazel`:
- Around line 30-35: The BUILD.bazel deps list contains unresolved git conflict
markers (<<<<<<<, =======, >>>>>>>) that break parsing; open the compat_test
rule's deps block and remove the conflict markers, choosing the intended
dependency set (either keep "//br/pkg/storage" or replace it with
"//pkg/config/kerneltype" and "//pkg/objstore/s3like" per the cherry-pick) so
the deps list is a valid Bazel list, then save the file without any conflict
markers.
In `@pkg/util/sem/compat/sem_integration_test.go`:
- Around line 21-26: The file pkg/util/sem/compat/sem_integration_test.go still
contains unresolved git conflict markers and mixed imports; remove the conflict
markers and adopt the NextGen branch changes: keep imports for
github.com/pingcap/tidb/pkg/config/kerneltype and
github.com/pingcap/tidb/pkg/objstore/s3like (drop the old
github.com/pingcap/tidb/br/pkg/storage import), then update the failpoint
assertion that uses the external-ID constant to use the NextGen-compatible
identifier (check usages around kerneltype.IsNextGen() and the failpoint
assertion lines previously referencing external-ID) so the code typechecks and
the conditional path for kerneltype.IsNextGen() uses the correct constant/name
from the new imports.
---
Outside diff comments:
In `@pkg/planner/core/planbuilder_test.go`:
- Around line 1113-1165: This file has unresolved git conflict markers around
the test cases: remove the conflict markers and the old HEAD block and keep the
updated test matrix that exercises checkNextGenS3PathWithSem (the three groups:
explicit external-id variants that should return
plannererrors.ErrNotSupportedWithSem and message "IMPORT INTO with explicit
external ID"; valid credentialed variants that should succeed; and S3-like URIs
without valid auth that should return plannererrors.ErrNotSupportedWithSem and
message "IMPORT INTO from S3-like storage without access key/secret access key
or role ARN"). Ensure only the new for-range loops remain and that
checkNextGenS3PathWithSem and plannererrors.ErrNotSupportedWithSem are
referenced exactly as in the diff.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 62b5ad25-cbb2-4f4f-9fdb-bd3f4e87eea2
📒 Files selected for processing (7)
br/pkg/storage/parse.gopkg/executor/import_into_test.gopkg/objstore/s3like/store.gopkg/planner/core/planbuilder.gopkg/planner/core/planbuilder_test.gopkg/util/sem/compat/BUILD.bazelpkg/util/sem/compat/sem_integration_test.go
| <<<<<<< HEAD | ||
| t.Run("SEM enabled, set S3 external ID to keyspace name", func(t *testing.T) { | ||
| ======= | ||
| t.Run("SEM enabled, require explicit auth for S3 like store", func(t *testing.T) { | ||
| for i, fns := range semTestPatternFns { | ||
| t.Run(fmt.Sprint(i), func(t *testing.T) { | ||
| tk := testkit.NewTestKit(t, store) | ||
| fns[0](t, tk) | ||
| t.Cleanup(func() { | ||
| fns[1](t, tk) | ||
| }) | ||
| for _, schema := range []string{"s3", "oss"} { | ||
| tk.MustMatchErrMsg(fmt.Sprintf("IMPORT INTO test.t FROM '%s://bucket'", schema), `(?i).*Feature 'IMPORT INTO .*without access key/secret access key or role ARN' is not supported when security enhanced mode is enabled`) | ||
| } | ||
| }) | ||
| } | ||
| }) | ||
|
|
||
| t.Run("SEM enabled, set external ID to keyspace name", func(t *testing.T) { | ||
| >>>>>>> 84548dbcc17 (importinto: require S3-like auth for nextgen import (#68231)) |
There was a problem hiding this comment.
Keep both SEM scenarios when you resolve this conflict.
The conflict markers block compilation, and the two sides are not interchangeable: one adds coverage for rejecting credential-less S3/OSS imports, while the other verifies external-id injection when explicit auth is present. The resolved test should preserve both behaviors.
Also applies to: 208-216
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/executor/import_into_test.go` around lines 165 - 184, Resolve the merge
conflict by removing the conflict markers and restoring both SEM test cases as
separate t.Run blocks: keep the "SEM enabled, require explicit auth for S3 like
store" test that iterates over semTestPatternFns and asserts tk.MustMatchErrMsg
rejects credential-less imports, and keep the "SEM enabled, set external ID to
keyspace name" test that verifies external-id injection when explicit auth is
present; ensure both tests call semTestPatternFns setup/cleanup where required
and preserve the inner loops over schemas and the error/message assertions
(references: semTestPatternFns, tk := testkit.NewTestKit, tk.MustMatchErrMsg,
and the two t.Run names) and apply the same fix for the other conflicting block
mentioned.
| func useVirtualHostStyleForAWSS3(opts *S3BackendOptions, rawURL string) bool { | ||
| // If user has explicitly specified ForcePathStyle, use the specified value | ||
| if rawURL == "" || | ||
| strings.Contains(rawURL, "force-path-style") || | ||
| strings.Contains(rawURL, "force_path_style") { | ||
| return false | ||
| } | ||
|
|
||
| return opts.Provider == "aws" || strings.Contains(opts.Endpoint, domainAWS) || opts.RoleARN != "" |
There was a problem hiding this comment.
Honor explicit force-path-style after normalization.
This guard only looks for lowercase force-path-style / force_path_style substrings in rawURL. With the new normalized query-key handling, a user can still provide a mixed-case variant that sets ForcePathStyle, and this branch will incorrectly flip it back to false. Please parse the query here and normalize keys before treating the option as “not explicitly set.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/objstore/s3like/store.go` around lines 210 - 218, The function
useVirtualHostStyleForAWSS3 currently detects an explicit ForcePathStyle by
searching rawURL for lowercase substrings, which misses mixed-case query keys;
update it to parse rawURL's query (via url.Parse / ParseQuery), normalize all
query keys to lowercase, and treat presence of "force-path-style" or
"force_path_style" (case-insensitively) as explicitly set (so return false)
before falling back to the existing provider/endpoint/RoleARN logic; reference
the existing function name useVirtualHostStyleForAWSS3 and variables opts,
rawURL, domainAWS, and RoleARN when locating where to implement the parsing and
normalization.
| <<<<<<< HEAD | ||
| importFromServer = storage.IsLocal(u) | ||
| // for SEM v2, they are checked by configured rules. | ||
| ======= | ||
| importFromServer = objstore.IsLocal(u) | ||
| >>>>>>> 84548dbcc17 (importinto: require S3-like auth for nextgen import (#68231)) | ||
| if semv1.IsEnabled() { | ||
| if importFromServer { | ||
| return nil, plannererrors.ErrNotSupportedWithSem.GenWithStackByArgs("IMPORT INTO from server disk") | ||
| } | ||
| <<<<<<< HEAD | ||
| if kerneltype.IsNextGen() && storage.IsS3(u) { | ||
| if err := checkNextGenS3PathWithSem(u); err != nil { | ||
| return nil, err | ||
| } | ||
| } | ||
| ======= | ||
| >>>>>>> 84548dbcc17 (importinto: require S3-like auth for nextgen import (#68231)) | ||
| } | ||
| // a nextgen cluster might be shared by multiple tenants, and they might | ||
| // share the same AWS role to access import-into source data bucket, this | ||
| // external ID can be used to restrict the access only to the current tenant. | ||
| // when SEM enabled, we need set it. | ||
| <<<<<<< HEAD | ||
| if kerneltype.IsNextGen() && sem.IsEnabled() && storage.IsS3(u) { | ||
| ======= | ||
| if kerneltype.IsNextGen() && sem.IsEnabled() && objstore.IsS3Like(u) { | ||
| if err := checkNextGenS3PathWithSem(u); err != nil { | ||
| return nil, err | ||
| } | ||
| >>>>>>> 84548dbcc17 (importinto: require S3-like auth for nextgen import (#68231)) |
There was a problem hiding this comment.
Finish resolving the cherry-pick conflict in the IMPORT INTO S3 auth path.
Both changed hunks still contain merge markers, so this file will not compile. It also leaves the old storage.IsS3 branch mixed with the new normalized S3-like credential validation, which makes the intended SEM behavior impossible to reason about until one side is selected and the imports are cleaned up.
Also applies to: 6406-6421
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/planner/core/planbuilder.go` around lines 4723 - 4753, The file contains
unresolved merge markers and mixed usage of storage.* and objstore.* which
breaks compilation; remove the conflict markers and standardize on objstore
predicates: replace storage.IsLocal(u) with objstore.IsLocal(u) and
storage.IsS3(u) with objstore.IsS3Like(u), keep the semv1.IsEnabled() check that
returns plannererrors.ErrNotSupportedWithSem when importFromServer is true, and
ensure checkNextGenS3PathWithSem(u) is invoked exactly once in the SEM path
(when kerneltype.IsNextGen() && sem.IsEnabled() && objstore.IsS3Like(u)); also
remove leftover storage imports and tidy imports to match objstore usage.
| <<<<<<< HEAD | ||
| "github.com/pingcap/tidb/br/pkg/storage" | ||
| ======= | ||
| "github.com/pingcap/tidb/pkg/config/kerneltype" | ||
| "github.com/pingcap/tidb/pkg/objstore/s3like" | ||
| >>>>>>> 84548dbcc17 (importinto: require S3-like auth for nextgen import (#68231)) |
There was a problem hiding this comment.
Resolve the import conflict consistently with the new NextGen branch.
This file still has conflict markers, so it already fails typecheck. Also, the new kerneltype.IsNextGen() path means you can't keep the old import side as-is, and if you switch to the new imports you also need to reconcile the downstream external-ID constant usage in the failpoint assertion.
Also applies to: 59-63
🧰 Tools
🪛 golangci-lint (2.12.1)
[error] 21-21: : # github.com/pingcap/tidb/pkg/util/sem/compat_test [github.com/pingcap/tidb/pkg/util/sem/compat.test]
pkg/util/sem/compat/sem_integration_test.go:21:1: syntax error: missing import path
pkg/util/sem/compat/sem_integration_test.go:23:1: syntax error: missing import path
pkg/util/sem/compat/sem_integration_test.go:26:1: syntax error: missing import path
pkg/util/sem/compat/sem_integration_test.go:26:75: invalid character U+0023 '#'
(typecheck)
[error] 21-21: missing import path
(typecheck)
[error] 23-23: missing import path
(typecheck)
[error] 26-26: missing import path
(typecheck)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/util/sem/compat/sem_integration_test.go` around lines 21 - 26, The file
pkg/util/sem/compat/sem_integration_test.go still contains unresolved git
conflict markers and mixed imports; remove the conflict markers and adopt the
NextGen branch changes: keep imports for
github.com/pingcap/tidb/pkg/config/kerneltype and
github.com/pingcap/tidb/pkg/objstore/s3like (drop the old
github.com/pingcap/tidb/br/pkg/storage import), then update the failpoint
assertion that uses the external-ID constant to use the NextGen-compatible
identifier (check usages around kerneltype.IsNextGen() and the failpoint
assertion lines previously referencing external-ID) so the code typechecks and
the conditional path for kerneltype.IsNextGen() uses the correct constant/name
from the new imports.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/planner/core/planbuilder.go (1)
4734-4741:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard all S3-like schemes here, not just
s3://.
checkNextGenS3PathWithSemenforces the new explicit-auth contract, but this branch only runs whenstorage.IsS3(u)is true. That leaves other S3-like URIs outside the SEM path, so they can still skip both the explicit-auth check and keyspaceexternal-idinjection. Please switch this guard to the shared S3-like predicate so OSS/S3-compatible backends get the same restriction.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/planner/core/planbuilder.go` around lines 4734 - 4741, The branch currently gated by storage.IsS3(u) should instead use the shared S3-like predicate so all S3-compatible URIs follow the SEM flow: update the conditional that includes kerneltype.IsNextGen() && sem.IsEnabled() && storage.IsS3(u) to use the S3-like check (the shared predicate used elsewhere) so checkNextGenS3PathWithSem(u) runs for all S3-like schemes and the subsequent values.Set(storage.S3ExternalID, config.GetGlobalKeyspaceName()) and ld.Path = u.String() logic also executes for those URIs; keep the calls to checkNextGenS3PathWithSem, values.Set, and ld.Path assignment unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@pkg/planner/core/planbuilder.go`:
- Around line 4734-4741: The branch currently gated by storage.IsS3(u) should
instead use the shared S3-like predicate so all S3-compatible URIs follow the
SEM flow: update the conditional that includes kerneltype.IsNextGen() &&
sem.IsEnabled() && storage.IsS3(u) to use the S3-like check (the shared
predicate used elsewhere) so checkNextGenS3PathWithSem(u) runs for all S3-like
schemes and the subsequent values.Set(storage.S3ExternalID,
config.GetGlobalKeyspaceName()) and ld.Path = u.String() logic also executes for
those URIs; keep the calls to checkNextGenS3PathWithSem, values.Set, and ld.Path
assignment unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 8a80f91f-2dc2-4fa5-b865-78708ba14fb3
📒 Files selected for processing (7)
br/pkg/storage/s3.gopkg/executor/import_into_test.gopkg/planner/core/issuetest/BUILD.bazelpkg/planner/core/planbuilder.gopkg/planner/core/planbuilder_test.gopkg/util/sem/compat/BUILD.bazelpkg/util/sem/compat/sem_integration_test.go
💤 Files with no reviewable changes (1)
- pkg/planner/core/issuetest/BUILD.bazel
✅ Files skipped from review due to trivial changes (1)
- pkg/util/sem/compat/BUILD.bazel
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/planner/core/planbuilder_test.go
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## release-nextgen-20251011 #68233 +/- ##
=============================================================
Coverage ? 71.8982%
=============================================================
Files ? 1835
Lines ? 493674
Branches ? 0
=============================================================
Hits ? 354943
Misses ? 115383
Partials ? 23348
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: D3Hunter, hawkingrei The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
|
/unhold |
eb0ce80
into
pingcap:release-nextgen-20251011
This is an automated cherry-pick of #68231
What problem does this PR solve?
Issue Number: close #68226
Problem Summary:
In NextGen security enhanced mode,
IMPORT INTOaccepted S3-like storage URIs without explicit user-provided credentials. That allowed the object-store client to fall back to TiDB node-role credentials, which weakens the expected boundary for user-specified import sources.What changed and how does it work?
This PR requires explicit authentication for S3-like
IMPORT INTOsources when NextGen and SEM are enabled.Check List
Tests
Unit tests:
./tools/check/failpoint-go-test.sh pkg/planner/core -tags=intest,deadlock,nextgen -run TestProcessNextGenS3Path -count=1./tools/check/failpoint-go-test.sh pkg/executor -tags=intest,deadlock,nextgen -run TestNextGenS3ExternalID -count=1make lintSide effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit