br: upgrade AWS SDK from v1 to v2 `#66634` | tidb-test=release-8.5-20260116-v8.5.5 tikv=release-8.5-20260423-v8.5.5 pd=v8.5.5 tiflash=v8.5.5 by YuJuncen · Pull Request #68216 · pingcap/tidb

YuJuncen · 2026-05-08T02:36:13Z

successor of #62488

Issue Number: close #62541, ref #61547

Problem Summary:

Migration guide
https://docs.aws.amazon.com/sdk-for-go/v2/developer-guide/migrate-gosdk.html

What changed and how does it work?

This also picks:

Confliction Resolving(By GPT):

DEPS.bzl: kept the target commit side for AWS SDK v2 Bazel deps.
- Reason: required to build new v2 imports.
br/pkg/storage/BUILD.bazel: kept target commit side for v2 deps.
- Reason: aligns BUILD deps with migrated S3 implementation.
go.mod / go.sum: kept target commit side in conflict hunks.
- Reason: introduce AWS SDK v2 modules and checksums used by migrated code.
pkg/lightning/log/log.go: kept target commit side in conflict hunk.
- Reason: use smithy/AWS v2 cancellation error handling.
br/cmd/br/operator.go: kept newTestStorageCommand() from target commit, dropped newPitrChecksumCommand() and newUpstreamChecksumCommand().
- Reason: those two symbols are not available in release-8.5 branch, would break compilation.
br/pkg/storage/s3.go: mostly adopted target commit side; removed recording/AccessRecording integration points.
- Reason: release-8.5 branch lacks br/pkg/storage/recording prerequisite commit; preserving those references would not compile.
br/pkg/storage/s3_test.go: changed conflict-hunk tests back to createS3Suite(...) form (without recording collector).
- Reason: same prerequisite gap as above.

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
A new subcommand br operator test-storage was added.
No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

New Features
- Added a hidden operator command "test-storage" to validate and report on external storage backends.
Improvements
- Upgraded AWS SDK integration to v2 across storage, KMS, EC2, and related features.
- Improved S3-compatible provider detection and checksum handling; configurable retry/error handling.
- Safer storage URI validation with adjusted HTTP client behavior.
Bug Fixes
- Treat certain HTTP 404 storage responses as "object not found" to avoid false errors.
Tests
- Updated storage tests, mocks, and compatibility helpers to align with SDK v2 and broaden coverage.

ref pingcap#61547, close pingcap#62541

pantheon-ai · 2026-05-08T02:36:19Z

@YuJuncen I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

_{ℹ️ Learn more details on Pantheon AI.}

coderabbitai · 2026-05-08T02:36:26Z

📝 Walkthrough

Walkthrough

Migrates the codebase from AWS SDK Go v1 to v2: updates dependency pins and Bazel/go.mod, rewrites S3/KMS/EBS/storage logic to v2 APIs, adds S3API abstraction and v2 mocks, updates tests and scripts, and introduces a storage test harness CLI subcommand.

Changes

AWS SDK v1 to v2 Migration

Layer / File(s)	Summary
Dependency Updates `DEPS.bzl`, `go.mod`	Upgrade makezero v1.1.1→v1.2.0; add aws-sdk-go-v2 modules (config, s3, kms, ec2, sts, cloudwatch, etc.); keep aws-sdk-go v1 as indirect pin.
Build Configuration `br/pkg/aws/BUILD.bazel`, `br/pkg/kms/BUILD.bazel`, `br/pkg/mock/BUILD.bazel`, `br/pkg/storage/BUILD.bazel`, `br/pkg/task/operator/BUILD.bazel`, `pkg/lightning/backend/external/BUILD.bazel`, `pkg/lightning/log/BUILD.bazel`, `Makefile`	Update Bazel targets to reference v2 SDK repos; replace awserr/request deps with smithy-go; remove mockgen s3iface phony target; add nogo exclusion.
S3API Interface `br/pkg/storage/s3_interface.go`	New exported interface defining v2-based S3 contract (HeadBucket, ListObjects, GetObject, PutObject, DeleteObject(s), multipart ops, GetObjectLockConfiguration).
Core S3 Storage `br/pkg/storage/s3.go`	Migrate from s3iface.S3API to internal S3API; add s3Compatible flag; rewrite NewS3Storage using config.LoadDefaultConfig, region detection, retryer/credential/endpoint wiring; migrate S3 calls to v2 shapes; add withContentMD5 middleware and tidbRetryer implementing aws.Retryer.
EBS Orchestration `br/pkg/aws/ebs.go`	EC2Session holds v2 ec2/cloudwatch clients; snapshot/volume APIs migrated to v2; public signatures changed from pointer slices to string slices; error detection switched to smithy.APIError.
KMS Decrypt `br/pkg/kms/aws.go`	AwsKms uses v2 kms.Client via config.LoadDefaultConfig; Decrypt uses client.Decrypt; classifyDecryptError uses smithy.APIError and v2 typed exceptions.
Logging `pkg/lightning/log/log.go`	IsContextCanceledError updated to detect smithy.CanceledError and smithy.OperationError wrappers; removed awserr-specific recursion.
Storage Helpers & Azure Blob `br/pkg/storage/helper.go`, `br/pkg/storage/azblob.go`, `br/pkg/storage/ks3.go`, `br/pkg/storage/storage.go`	ValidateCloudStorageURI now passes a custom http.Client with DisableKeepAlives; azblob copy polling uses aws.ToString; KS3.FileExists treats HTTP 404 as not found; ExternalStorageOptions.S3Retryer now uses v2 aws.Retryer.
Lightning External Backend `pkg/lightning/backend/external/byte_reader.go`, `pkg/lightning/backend/external/byte_reader_test.go`	Switch imports to aws-sdk-go-v2; test S3 client init rewritten to use config.LoadDefaultConfig with BaseEndpoint and UsePathStyle.
S3 Mocks `br/pkg/mock/s3_v2.go`, `br/pkg/mock/BUILD.bazel`	Replace v1 s3iface mock with GoMock-generated MockS3API using v2 s3 types; BUILD updated to include s3_v2.go.
Tests `br/pkg/aws/ebs_test.go`, `br/pkg/storage/s3_test.go`, `pkg/lightning/log/log_test.go`, `pkg/lightning/backend/external/byte_reader_test.go`, `br/pkg/task/restore_ebs_meta.go`, `pkg/sessionctx/variable/sysvar_test.go`	Update imports and mocks to v2 method signatures; construct smithy.GenericAPIError for error cases; add IMDS disable in sysvar test; update snapshot ID map types to []string.
Compatibility Utilities & Storage Test Harness `br/pkg/storage/compatibility_for_test.go`, `br/pkg/task/operator/test_storage.go`, `br/cmd/br/operator.go`	Add BucketPrefix, expose WithContentMD5, add MultipartWriter helper; add RunTestStorage harness (TestReport/TestResult/TestContext) and wire operator `test-storage` subcommand.
Linter Configuration `build/nogo_config.json`	Exclude br/pkg/storage from toomanytests linter check due to large test count.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as operator/test-storage
  participant Harness as RunTestStorage
  participant Storage as ExternalStorage (S3/other)
  CLI->>Harness: parse flags, RunTestStorage(ctx,cfg)
  Harness->>Storage: create client (config.LoadDefaultConfig / credentials / endpoint)
  Harness->>Storage: write/read/list/delete operations
  Storage-->>Harness: responses/errors
  Harness-->>CLI: TestReport (results)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

pingcap/tidb#66634: The main PR and retrieved PR both perform the same large, code-level migration from AWS SDK for Go v1 to v2 (matching changes in DEPS.bzl, go.mod, numerous br/pkg/storage and br/pkg/aws files, new S3API interface and gomock s3_v2, etc.), so they are directly related.
pingcap/tidb#67645: Both PRs modify the br_gcs test scripts (br/tests/br_gcs/run.sh, docker_compatible_gcs/_run.sh, and prepare.sh) to add missing service-account fields and update GCS endpoints — directly related.
pingcap/tidb#68233: Related changes to br/pkg/storage/s3.go and S3 handling; likely overlaps.

Suggested labels

ok-to-test

Suggested reviewers

Leavrth
wjhuang2016
BornChanger

"🐰 Hopping through code with a carrot in hand,
I swapped old SDKs for a newer band.
S3, KMS, and EBS now sing in v2,
Mocks and tests follow — tidy and true,
CI hums softly — hop, patch, and stand!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 38.27% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title specifically mentions the AWS SDK upgrade from v1 to v2, which is the main focus of the changeset, directly summarizing the primary change.
Description check	✅ Passed	The description includes issue references, problem summary, detailed change explanation with migration guide link, conflict resolution documentation, manual test confirmation, and release notes section following the template structure.
Linked Issues check	✅ Passed	The PR fully addresses issue `#62541` by migrating BR's AWS SDK from v1 to v2, updating dependencies, imports, error handling, and type signatures as required for the upgrade.
Out of Scope Changes check	✅ Passed	All changes are directly related to the AWS SDK v1→v2 migration: dependency updates, import changes, error handling migration, type signature updates, test data structure conversions, and build configuration adjustments are all in-scope for this upgrade effort.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

tiprow · 2026-05-08T02:36:36Z

Hi @YuJuncen. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

codecov · 2026-05-08T02:50:37Z

Codecov Report

❌ Patch coverage is 21.34146% with 1032 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release-8.5-20260116-v8.5.5@34227a6). Learn more about missing BASE report.

Additional details and impacted files

@@                       Coverage Diff                        @@
##             release-8.5-20260116-v8.5.5     #68216   +/-   ##
================================================================
  Coverage                               ?   55.4695%           
================================================================
  Files                                  ?       1822           
  Lines                                  ?     653906           
  Branches                               ?          0           
================================================================
  Hits                                   ?     362719           
  Misses                                 ?     264227           
  Partials                               ?      26960

Flag	Coverage Δ
integration	`39.1601% <13.0335%> (?)`
unit	`64.7744% <21.3414%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
dumpling	`52.9278% <0.0000%> (?)`
parser	`∅ <0.0000%> (?)`
br	`62.8468% <0.0000%> (?)`

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

YuJuncen · 2026-05-08T02:55:09Z

/retest

tiprow · 2026-05-08T02:55:34Z

@YuJuncen: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

YuJuncen · 2026-05-08T06:13:36Z

/retest

tiprow · 2026-05-08T06:13:59Z

@YuJuncen: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

wuhuizuo · 2026-05-08T06:27:32Z

/retest

tiprow · 2026-05-08T06:27:58Z

@wuhuizuo: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

wuhuizuo · 2026-05-08T06:29:42Z

/override pull-build-next-gen pull-error-log-review pull-integration-realcluster-test-next-gen pull-mysql-client-test-next-gen pull-unit-test-next-gen

ti-chi-bot · 2026-05-08T06:29:50Z

@wuhuizuo: Overrode contexts on behalf of wuhuizuo: pull-build-next-gen, pull-error-log-review, pull-integration-realcluster-test-next-gen, pull-mysql-client-test-next-gen, pull-unit-test-next-gen

Details

In response to this:

/override pull-build-next-gen pull-error-log-review pull-integration-realcluster-test-next-gen pull-mysql-client-test-next-gen pull-unit-test-next-gen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

tiprow · 2026-05-09T07:31:21Z

@jebter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/test pull-br-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

)

wuhuizuo · 2026-05-09T07:31:51Z

/test pull-br-integration-test

tiprow · 2026-05-09T07:32:15Z

@wuhuizuo: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/test pull-br-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

br/pkg/aws/ebs.go (2)

530-533: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Same nil-deref risk as in EnableDataFSR.

Apply the same aws.ToString fix to *resp.Unsuccessful[0].SnapshotId in the disable path so a nil pointer doesn't turn an FSR error into a panic.

🛡️ Proposed fix

-					return errors.Errorf("Some snapshot fails to disable FSR for available zone %s, such as %s, error code is %v", targetAZ, *resp.Unsuccessful[0].SnapshotId, resp.Unsuccessful[0].FastSnapshotRestoreStateErrors)
+					return errors.Errorf("Some snapshot fails to disable FSR for available zone %s, such as %s, error code is %v", targetAZ, aws.ToString(resp.Unsuccessful[0].SnapshotId), resp.Unsuccessful[0].FastSnapshotRestoreStateErrors)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/aws/ebs.go` around lines 530 - 533, The disable-FSR error return may
nil-deref *resp.Unsuccessful[0].SnapshotId like the EnableDataFSR path; change
the interpolation to use aws.ToString(resp.Unsuccessful[0].SnapshotId) (same
pattern used in EnableDataFSR) so nil snapshot IDs become empty strings instead
of panics, keeping the descriptive message and FastSnapshotRestoreStateErrors
as-is.

359-362: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid raw pointer deref on Unsuccessful[0].SnapshotId.

resp.Unsuccessful[0].SnapshotId is *string and dereferencing it directly will panic if the field is nil — and this is the failure path where the original FSR error gets reported, so a panic here would mask the real underlying error. The rest of the file consistently uses aws.ToString for v2 pointer fields; please do the same here (and on line 532).

🛡️ Proposed nil-safe formatting

-				return errors.Errorf("Some snapshot fails to enable FSR for available zone %s, such as %s, error code is %v", targetAZ, *resp.Unsuccessful[0].SnapshotId, resp.Unsuccessful[0].FastSnapshotRestoreStateErrors)
+				return errors.Errorf("Some snapshot fails to enable FSR for available zone %s, such as %s, error code is %v", targetAZ, aws.ToString(resp.Unsuccessful[0].SnapshotId), resp.Unsuccessful[0].FastSnapshotRestoreStateErrors)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/aws/ebs.go` around lines 359 - 362, The error path dereferences
resp.Unsuccessful[0].SnapshotId directly which can be nil and cause a panic;
change the formatting to use aws.ToString(resp.Unsuccessful[0].SnapshotId)
(consistent with other uses in this file) when building the error message and
similarly replace any other raw *string derefs (e.g., the one flagged around
line 532) so the error log and the returned errors reference nil-safe string
conversions for SnapshotId while keeping FastSnapshotRestoreStateErrors
unchanged.

br/pkg/storage/s3.go (1)

631-642: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Return non-NoSuchKey failures from getObjectCheck.

If GetObject fails with a transport/context error that doesn't implement smithy.APIError, this falls through to return nil, so permission validation can pass even though the backend is unreachable. Only NoSuchKey should count as success here.

Suggested fix

 	_, err := svc.GetObject(ctx, input)
+	if err == nil {
+		return nil
+	}
 	var aerr smithy.APIError
 	if goerrors.As(err, &aerr) {
 		if aerr.ErrorCode() == noSuchKey {
 			// if key not exists and we reach this error, that
 			// means we have the correct permission to GetObject
 			// other we will get another error
 			return nil
 		}
-		return errors.Trace(err)
 	}
-	return nil
+	return errors.Trace(err)
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/storage/s3.go` around lines 631 - 642, The getObjectCheck currently
swallows non-API errors by returning nil when err != nil but not an APIError;
change the logic to treat any non-nil err as a failure unless it's an APIError
with ErrorCode() == noSuchKey: call svc.GetObject(ctx, input), if err != nil
then if goerrors.As(err, &aerr) and aerr.ErrorCode() == noSuchKey return nil;
otherwise return errors.Trace(err); only return nil when svc.GetObject succeeded
or when the error is specifically noSuchKey. Ensure you reference svc.GetObject,
smithy.APIError (aerr) and the noSuchKey constant in your change.

🧹 Nitpick comments (5)

DEPS.bzl (1)
414-422: 💤 Low value

Out-of-scope dependency bump for makezero (v1.1.1 → v1.2.0).

This update appears unrelated to the AWS SDK v1 → v2 migration that is the stated objective of this backport. It likely came along via the merge but expands the change surface of a release-branch PR. Confirm it is intentional (e.g., transitively required) or revert to keep the backport scoped to AWS SDK changes.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@DEPS.bzl` around lines 414 - 422, The DEPS.bzl change upgrades the makezero
dependency from v1.1.1 to v1.2.0 (see the makezero urls, strip_prefix and sha256
entries), which is out-of-scope for the AWS SDK v1→v2 backport; either confirm
it is intentionally required (e.g., transitively needed) or revert this block to
the prior v1.1.1 values — restore the original sha256, strip_prefix and urls for
makezero if reverting — and add a commit message explaining the decision so the
backport only includes AWS SDK-related changes unless the bump is explicitly
required.
br/pkg/aws/ebs.go (1)
372-396: 💤 Low value

Optional: simplify getFSRCreditBalance signature now that snapShotIDs is []string.

getFSRCreditBalance still takes snapshotID *string, forcing the caller at line 387 to wrap with aws.String(...) only to have it dereferenced/used as a v2 pointer field internally. Since the migration changed snapShotIDs to []string, taking string and constructing aws.String once at the v2 API boundary inside getFSRCreditBalance reads more cleanly.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/aws/ebs.go` around lines 372 - 396, The getFSRCreditBalance signature
should accept a string (snapshotID string) instead of *string so callers like
waitDataFSREnabled can pass snapShotIDs[startIdx] directly; change
getFSRCreditBalance(snapshotID *string, targetAZ string) to
getFSRCreditBalance(snapshotID string, targetAZ string), update all call sites
(e.g., waitDataFSREnabled) to pass the plain string, and move any
aws.String(...) conversion into getFSRCreditBalance at the AWS SDK boundary
(where DescribeMetrics/CloudWatch calls need *string) so pointer creation
happens once inside the helper.
pkg/lightning/log/log.go (1)
192-214: 💤 Low value

LGTM with a small note.

The smithy-based cancellation detection is correct. Note that goerrors.Is(err, context.Canceled) on line 192 already traverses through *smithy.OperationError.Unwrap(), so the explicit opErr.Unwrap() checks at lines 206 and 209 are largely defensive duplicates. The only one with marginal value is the status.Code(opErr.Unwrap()) check (since status.Code doesn't unwrap on its own). Not blocking — feel free to leave the extra checks as belt-and-suspenders.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/lightning/log/log.go` around lines 192 - 214, The code contains redundant
cancellation checks because goerrors.Is(err, context.Canceled) already unwraps
through *smithy.OperationError, so in the block that handles var opErr
*smithy.OperationError you should remove the duplicate
goerrors.Is(opErr.Unwrap(), context.Canceled) and goerrors.As(opErr.Unwrap(),
&canceledError) checks and only keep the status.Code(opErr.Unwrap()) ==
codes.Canceled check (or alternatively drop the entire opErr branch if you
prefer the single goerrors.Is + smithy.CanceledError checks to suffice); update
the block that references opErr, canceledError, goerrors.Is, goerrors.As, and
status.Code accordingly to avoid duplicate logic while preserving the
status.Code check which does not unwrap.
br/pkg/kms/aws.go (1)
86-117: ⚡ Quick win

Consolidate duplicate error classification.

The second switch (lines 107–116) over the v2 typed exceptions is unreachable: *types.NotFoundException, *types.InvalidKeyUsageException, *types.DependencyTimeoutException, and *types.KMSInternalException all implement smithy.APIError (via embedded smithy.GenericAPIError), so the first errors.As(err, &apiErr) + apiErr.ErrorCode() switch already matches them. Consolidate to a single switch with a default: arm returning "KMS error", and remove the types import (line 13), which will become unused.
♻️ Proposed simplification
-// classifyDecryptError uses v2 SDK error types
 func classifyDecryptError(err error) error {
 	var apiErr smithy.APIError
 	if errors.As(err, &apiErr) {
 		switch apiErr.ErrorCode() {
-		case "NotFoundException":
-			return pErrors.Annotate(err, "wrong master key")
-		case "InvalidKeyUsageException":
+		case "NotFoundException", "InvalidKeyUsageException":
 			return pErrors.Annotate(err, "wrong master key")
 		case "DependencyTimeoutException":
 			return pErrors.Annotate(err, "API timeout")
 		case "KMSInternalException":
 			return pErrors.Annotate(err, "API internal error")
+		default:
+			return pErrors.Annotate(err, "KMS error")
 		}
 	}
-
-	// also check for specific v2 error types
-	var notFoundErr *types.NotFoundException
-	var invalidKeyErr *types.InvalidKeyUsageException
-	var timeoutErr *types.DependencyTimeoutException
-	var internalErr *types.KMSInternalException
-
-	switch {
-	case errors.As(err, &notFoundErr), errors.As(err, &invalidKeyErr):
-		return pErrors.Annotate(err, "wrong master key")
-	case errors.As(err, &timeoutErr):
-		return pErrors.Annotate(err, "API timeout")
-	case errors.As(err, &internalErr):
-		return pErrors.Annotate(err, "API internal error")
-	default:
-		return pErrors.Annotate(err, "KMS error")
-	}
+	return pErrors.Annotate(err, "KMS error")
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/kms/aws.go` around lines 86 - 117, The classifyDecryptError function
duplicates error handling: remove the second switch that checks v2 typed
exceptions and consolidate all classification into the initial smithy.APIError
path (the errors.As(err, &apiErr) block) by adding a default: case that returns
pErrors.Annotate(err, "KMS error") when the ErrorCode() doesn't match known
codes; drop the unused types import and any now-unreferenced local variables
(notFoundErr, invalidKeyErr, timeoutErr, internalErr) so classification relies
solely on apiErr.ErrorCode().
br/pkg/aws/ebs_test.go (1)
85-127: 💤 Low value

Consider reducing test logic duplication.

The testWaitSnapshotsCreated test helper duplicates snapshot-processing logic from the production code. While this approach works for testing, it creates a maintenance burden—changes to the production logic must be manually mirrored here.

Consider whether the production WaitSnapshotsCreated could accept an injectable EC2 API interface (or use the existing S3API-style abstraction pattern) to allow direct testing with mocked DescribeSnapshots responses, eliminating the need for this parallel test-only implementation.

Note: Line 123's indefinite block (<-make(chan struct{})) is intentional to simulate timeout behavior when uncompleted snapshots remain.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/aws/ebs_test.go` around lines 85 - 127, testWaitSnapshotsCreated
duplicates snapshot-processing logic from production; refactor so production
WaitSnapshotsCreated accepts an injectable EC2 DescribeSnapshots interface (or
follow the S3API-style abstraction) and move shared processing into the real
WaitSnapshotsCreated (or a helper used by both) so tests call
WaitSnapshotsCreated with a mocked DescribeSnapshots response instead of a
test-only implementation; update EC2Session to depend on the new interface,
reuse extractSnapProgress, and remove the duplicate testWaitSnapshotsCreated
logic (keep the blocking <-make(chan struct{}) behavior in tests by returning an
uncompleted-snapshots condition from the mocked DescribeSnapshots).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@br/pkg/storage/s3_test.go`:
- Around line 552-557: The mock callback for
s.s3.EXPECT().DeleteObject(...).DoAndReturn currently declares and returns a
*s3.DeleteObjectInput but the real DeleteObject signature returns
(*s3.DeleteObjectOutput, error); update the DoAndReturn callback signature and
return value to return (*s3.DeleteObjectOutput, error) and return an instance
like &s3.DeleteObjectOutput{} (keeping the same assertions against input.Bucket
and input.Key) so the mock types match the DeleteObject method.

In `@br/pkg/storage/s3.go`:
- Around line 659-665: In PutAndDeleteObjectCheck avoid treating a nil err as an
error: only inspect err2 with goerrors.As when err2 != nil. After calling
svc.DeleteObject(ctx, input) check if err2 is non-nil, then use
goerrors.As(err2, &noSuchKey) to suppress warnings for NoSuchKey but call
log.Warn(...) with zap.Error(err2) for other errors; reference svc.DeleteObject,
err2, types.NoSuchKey, goerrors.As and log.Warn to locate where to add the nil
check.
- Around line 462-490: The S3 client setup currently applies credentials twice:
via config (config.LoadDefaultConfig / cfg.Credentials) and again by appending
an s3 option that sets o.Credentials from autoNewCred; when qs.RoleArn is set
this later overwrite breaks role assumption. Fix by preventing the redundant
s3Opts append that sets o.Credentials when qs.RoleArn is non-empty (or simply
remove the s3Opts credential-setting block that uses autoNewCred), so that when
RoleArn is provided cfg.Credentials (the stscreds.NewAssumeRoleProvider) remains
the effective credentials provider.

In `@br/pkg/task/operator/test_storage.go`:
- Around line 476-530: The test hardcodes startOffset/endOffset (used in the
storage.ReaderOption passed to tc.Store.Open) to [100,200) which fails when
expectedData is smaller (e.g. --test-data-size=128); modify the test to compute
a valid range from len(expectedData) (or validate a minimum size up front):
compute endOffset := int64(len(expectedData)) and if endOffset <= 100 set
startOffset = 0 else set startOffset = endOffset-100 (or clamp endOffset to
startOffset + desiredRange), then use these startOffset/endOffset values in the
ReaderOption so the range is always within expectedData; alternatively add an
explicit minimum-size check in ParseFromFlags and fail fast if test-data-size <
200.
- Around line 226-227: The current logger.Info call is logging
cfg.BackendOptions verbatim (via zap.Any) which can leak credentials; instead
build a redacted representation (e.g., pick only non-sensitive keys or mask
values for keys like access_key, secret_key, session_token, password) and pass
that to logger.Info; update the call that uses logger.Info("Starting external
storage test", zap.String("storage", cfg.StorageURI), zap.Any("options",
cfg.BackendOptions)) to use the redacted map/struct so only safe fields from
cfg.BackendOptions are logged and any sensitive fields are removed or masked.
- Around line 268-279: The current deferred call defer cleanup() runs even on
failure; change it so cleanup respects cfg.CleanupOnSuccess and only runs on
successful tests by adding a boolean flag (e.g., succeeded) captured by the
deferred cleanup or by removing the unconditional defer and invoking cleanup
conditionally at the end of the test; ensure the cleanup function still checks
cfg.CleanupOnSuccess and only performs deletions when both cfg.CleanupOnSuccess
is true and succeeded is true (update references to cleanup,
cfg.CleanupOnSuccess and the defer cleanup() placement accordingly).

---

Outside diff comments:
In `@br/pkg/aws/ebs.go`:
- Around line 530-533: The disable-FSR error return may nil-deref
*resp.Unsuccessful[0].SnapshotId like the EnableDataFSR path; change the
interpolation to use aws.ToString(resp.Unsuccessful[0].SnapshotId) (same pattern
used in EnableDataFSR) so nil snapshot IDs become empty strings instead of
panics, keeping the descriptive message and FastSnapshotRestoreStateErrors
as-is.
- Around line 359-362: The error path dereferences
resp.Unsuccessful[0].SnapshotId directly which can be nil and cause a panic;
change the formatting to use aws.ToString(resp.Unsuccessful[0].SnapshotId)
(consistent with other uses in this file) when building the error message and
similarly replace any other raw *string derefs (e.g., the one flagged around
line 532) so the error log and the returned errors reference nil-safe string
conversions for SnapshotId while keeping FastSnapshotRestoreStateErrors
unchanged.

In `@br/pkg/storage/s3.go`:
- Around line 631-642: The getObjectCheck currently swallows non-API errors by
returning nil when err != nil but not an APIError; change the logic to treat any
non-nil err as a failure unless it's an APIError with ErrorCode() == noSuchKey:
call svc.GetObject(ctx, input), if err != nil then if goerrors.As(err, &aerr)
and aerr.ErrorCode() == noSuchKey return nil; otherwise return
errors.Trace(err); only return nil when svc.GetObject succeeded or when the
error is specifically noSuchKey. Ensure you reference svc.GetObject,
smithy.APIError (aerr) and the noSuchKey constant in your change.

---

Nitpick comments:
In `@br/pkg/aws/ebs_test.go`:
- Around line 85-127: testWaitSnapshotsCreated duplicates snapshot-processing
logic from production; refactor so production WaitSnapshotsCreated accepts an
injectable EC2 DescribeSnapshots interface (or follow the S3API-style
abstraction) and move shared processing into the real WaitSnapshotsCreated (or a
helper used by both) so tests call WaitSnapshotsCreated with a mocked
DescribeSnapshots response instead of a test-only implementation; update
EC2Session to depend on the new interface, reuse extractSnapProgress, and remove
the duplicate testWaitSnapshotsCreated logic (keep the blocking <-make(chan
struct{}) behavior in tests by returning an uncompleted-snapshots condition from
the mocked DescribeSnapshots).

In `@br/pkg/aws/ebs.go`:
- Around line 372-396: The getFSRCreditBalance signature should accept a string
(snapshotID string) instead of *string so callers like waitDataFSREnabled can
pass snapShotIDs[startIdx] directly; change getFSRCreditBalance(snapshotID
*string, targetAZ string) to getFSRCreditBalance(snapshotID string, targetAZ
string), update all call sites (e.g., waitDataFSREnabled) to pass the plain
string, and move any aws.String(...) conversion into getFSRCreditBalance at the
AWS SDK boundary (where DescribeMetrics/CloudWatch calls need *string) so
pointer creation happens once inside the helper.

In `@br/pkg/kms/aws.go`:
- Around line 86-117: The classifyDecryptError function duplicates error
handling: remove the second switch that checks v2 typed exceptions and
consolidate all classification into the initial smithy.APIError path (the
errors.As(err, &apiErr) block) by adding a default: case that returns
pErrors.Annotate(err, "KMS error") when the ErrorCode() doesn't match known
codes; drop the unused types import and any now-unreferenced local variables
(notFoundErr, invalidKeyErr, timeoutErr, internalErr) so classification relies
solely on apiErr.ErrorCode().

In `@DEPS.bzl`:
- Around line 414-422: The DEPS.bzl change upgrades the makezero dependency from
v1.1.1 to v1.2.0 (see the makezero urls, strip_prefix and sha256 entries), which
is out-of-scope for the AWS SDK v1→v2 backport; either confirm it is
intentionally required (e.g., transitively needed) or revert this block to the
prior v1.1.1 values — restore the original sha256, strip_prefix and urls for
makezero if reverting — and add a commit message explaining the decision so the
backport only includes AWS SDK-related changes unless the bump is explicitly
required.

In `@pkg/lightning/log/log.go`:
- Around line 192-214: The code contains redundant cancellation checks because
goerrors.Is(err, context.Canceled) already unwraps through
*smithy.OperationError, so in the block that handles var opErr
*smithy.OperationError you should remove the duplicate
goerrors.Is(opErr.Unwrap(), context.Canceled) and goerrors.As(opErr.Unwrap(),
&canceledError) checks and only keep the status.Code(opErr.Unwrap()) ==
codes.Canceled check (or alternatively drop the entire opErr branch if you
prefer the single goerrors.Is + smithy.CanceledError checks to suffice); update
the block that references opErr, canceledError, goerrors.Is, goerrors.As, and
status.Code accordingly to avoid duplicate logic while preserving the
status.Code check which does not unwrap.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c1e8e88e-9b9f-46f2-bfa3-c389b546fd9e

📥 Commits

Reviewing files that changed from the base of the PR and between 34227a6 and 8cfe383.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum

📒 Files selected for processing (32)

DEPS.bzl
Makefile
br/cmd/br/operator.go
br/pkg/aws/BUILD.bazel
br/pkg/aws/ebs.go
br/pkg/aws/ebs_test.go
br/pkg/kms/BUILD.bazel
br/pkg/kms/aws.go
br/pkg/mock/BUILD.bazel
br/pkg/mock/s3_v2.go
br/pkg/mock/s3iface.go
br/pkg/storage/BUILD.bazel
br/pkg/storage/azblob.go
br/pkg/storage/compatibility_for_test.go
br/pkg/storage/helper.go
br/pkg/storage/ks3.go
br/pkg/storage/s3.go
br/pkg/storage/s3_interface.go
br/pkg/storage/s3_test.go
br/pkg/storage/storage.go
br/pkg/task/operator/BUILD.bazel
br/pkg/task/operator/test_storage.go
br/pkg/task/restore_ebs_meta.go
build/nogo_config.json
go.mod
pkg/lightning/backend/external/BUILD.bazel
pkg/lightning/backend/external/byte_reader.go
pkg/lightning/backend/external/byte_reader_test.go
pkg/lightning/log/BUILD.bazel
pkg/lightning/log/log.go
pkg/lightning/log/log_test.go
pkg/sessionctx/variable/sysvar_test.go

💤 Files with no reviewable changes (1)

Makefile

coderabbitai · 2026-05-09T07:41:32Z

 	s.s3.EXPECT().
-		DeleteObjectWithContext(ctx, gomock.Any()).
-		DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, opt ...request.Option) (*s3.DeleteObjectInput, error) {
-			require.Equal(t, "bucket", aws.StringValue(input.Bucket))
-			require.Equal(t, "prefix/file", aws.StringValue(input.Key))
+		DeleteObject(gomock.Any(), gomock.Any(), gomock.Any()).
+		DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, _ ...func(*s3.Options)) (*s3.DeleteObjectInput, error) {
+			require.Equal(t, "bucket", aws.ToString(input.Bucket))
+			require.Equal(t, "prefix/file", aws.ToString(input.Key))
 			return &s3.DeleteObjectInput{}, nil


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# Locate the files mentioned in the review find . -name "s3_test.go" -path "*/br/pkg/storage/*" 2>/dev/null find . -name "s3_v2.go" -path "*/br/pkg/mock/*" 2>/dev/null

Repository: pingcap/tidb

Length of output: 107

🏁 Script executed:

# Read the specific lines from s3_test.go if [ -f "br/pkg/storage/s3_test.go" ]; then sed -n '550,560p' "br/pkg/storage/s3_test.go" | cat -n fi

Repository: pingcap/tidb

Length of output: 566

🏁 Script executed:

# Find and examine the DeleteObject method signature in the mock if [ -f "br/pkg/mock/s3_v2.go" ]; then grep -A 5 "DeleteObject" "br/pkg/mock/s3_v2.go" | head -20 fi

Repository: pingcap/tidb

Length of output: 856

Return *s3.DeleteObjectOutput from this mock callback.

The DoAndReturn callback is typed to return (*s3.DeleteObjectInput, error) and returns &s3.DeleteObjectInput{}, but DeleteObject is mocked to return (*s3.DeleteObjectOutput, error). This type mismatch causes GoMock to reject the expectation, breaking the test.

Suggested fix

s.s3.EXPECT(). DeleteObject(gomock.Any(), gomock.Any(), gomock.Any()). - DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, _ ...func(*s3.Options)) (*s3.DeleteObjectInput, error) { + DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, _ ...func(*s3.Options)) (*s3.DeleteObjectOutput, error) { require.Equal(t, "bucket", aws.ToString(input.Bucket)) require.Equal(t, "prefix/file", aws.ToString(input.Key)) - return &s3.DeleteObjectInput{}, nil + return &s3.DeleteObjectOutput{}, nil })

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

s.s3.EXPECT().

DeleteObjectWithContext(ctx, gomock.Any()).

DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, opt ...request.Option) (*s3.DeleteObjectInput, error) {

require.Equal(t, "bucket", aws.StringValue(input.Bucket))

require.Equal(t, "prefix/file", aws.StringValue(input.Key))

DeleteObject(gomock.Any(), gomock.Any(), gomock.Any()).

DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, _ ...func(*s3.Options)) (*s3.DeleteObjectInput, error) {

require.Equal(t, "bucket", aws.ToString(input.Bucket))

require.Equal(t, "prefix/file", aws.ToString(input.Key))

return &s3.DeleteObjectInput{}, nil

s.s3.EXPECT().

DeleteObject(gomock.Any(), gomock.Any(), gomock.Any()).

DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, _ ...func(*s3.Options)) (*s3.DeleteObjectOutput, error) {

require.Equal(t, "bucket", aws.ToString(input.Bucket))

require.Equal(t, "prefix/file", aws.ToString(input.Key))

return &s3.DeleteObjectOutput{}, nil

})

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@br/pkg/storage/s3_test.go` around lines 552 - 557, The mock callback for s.s3.EXPECT().DeleteObject(...).DoAndReturn currently declares and returns a *s3.DeleteObjectInput but the real DeleteObject signature returns (*s3.DeleteObjectOutput, error); update the DoAndReturn callback signature and return value to return (*s3.DeleteObjectOutput, error) and return an instance like &s3.DeleteObjectOutput{} (keeping the same assertions against input.Bucket and input.Key) so the mock types match the DeleteObject method.

coderabbitai · 2026-05-09T07:41:32Z

+	if qs.Profile == "" {
+		cred, err := autoNewCred(&qs)
+		if err != nil {
+			return nil, errors.Trace(err)
+		}
+		if cred != nil {
+			s3Opts = append(s3Opts, func(o *s3.Options) {
+				o.Credentials = cred
+			})
+		}
+	}
+
+	// Handle role assumption if specified
+	if len(qs.RoleArn) > 0 {
+		// Create STS client for role assumption
+		stsClient := sts.NewFromConfig(cfg)
+
+		// Configure assume role provider with external ID if provided
+		var assumeRoleProvider aws.CredentialsProvider
+		if len(qs.ExternalId) > 0 {
+			assumeRoleProvider = stscreds.NewAssumeRoleProvider(stsClient, qs.RoleArn, func(o *stscreds.AssumeRoleOptions) {
+				o.ExternalID = &qs.ExternalId
+			})
+		} else {
+			assumeRoleProvider = stscreds.NewAssumeRoleProvider(stsClient, qs.RoleArn)
+		}
+
+		// Update config with assume role credentials
+		cfg.Credentials = assumeRoleProvider


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

cd br/pkg/storage && head -c 100 s3.go && wc -l s3.go

Repository: pingcap/tidb

Length of output: 167

🏁 Script executed:

# Get the specific lines mentioned in the review sed -n '462,490p' br/pkg/storage/s3.go

Repository: pingcap/tidb

Length of output: 902

🏁 Script executed:

# Get more context around the credential setup to understand the full flow sed -n '440,520p' br/pkg/storage/s3.go

Repository: pingcap/tidb

Length of output: 2911

🏁 Script executed:

# Search for NewS3Storage function definition rg -n "func.*NewS3Storage" br/pkg/storage/s3.go

Repository: pingcap/tidb

Length of output: 259

🏁 Script executed:

# Get the complete function to see the full flow sed -n '372,530p' br/pkg/storage/s3.go

Repository: pingcap/tidb

Length of output: 5337

🌐 Web query:

AWS SDK Go v2 s3.NewFromConfig options credential override behavior

💡 Result:

In AWS SDK for Go v2, s3.NewFromConfig copies credentials from the provided aws.Config, and the constructor/service Options you pass can override those credentials (including disabling signing by forcing anonymous/nil credentials). 1) Default behavior (no override): uses cfg.Credentials (which comes from config.LoadDefaultConfig) - aws.Config has a Credentials field (“The credentials object to use when signing requests”). [1] - When you create the aws.Config via config.LoadDefaultConfig, the SDK resolves credentials via its default credential chain; and if you explicitly provide credentials, “the SDK uses only those credentials.” [2] - s3.NewFromConfig builds the S3 service Options using cfg.Credentials (among other fields), then calls New(opts, optFns...). [3] Practical implication: - If you call cfg, _:= config.LoadDefaultConfig(...), then client:= s3.NewFromConfig(cfg), S3 will sign requests using whatever credentials were resolved into cfg.Credentials. [2][3] 2) Overriding credentials at the S3 client level (optFn / Options): the optFn mutates s3.Options, including Credentials - s3.NewFromConfig takes functional options that mutate the service’s Options struct. [4] - The S3 service Options struct contains Credentials aws.CredentialsProvider (“The credentials object to use when signing requests.”). [5] - s3.NewFromConfig initially sets Options.Credentials = cfg.Credentials before applying optFns, so an optFn can replace o.Credentials and thereby override the credentials used for signing. [3][5] 3) Disabling signing / forcing “anonymous” behavior - The SDK documents that using the AnonymousCredentials provider instructs signing middleware to not sign requests. [6] - It also states: “Using nil credentials when configuring an API client will achieve the same result.” and shows that when using external config loading, you can use config.WithCredentialsProvider(aws.AnonymousCredentials{}) to prevent credential resolution attempts. [6] - Additionally, the same source notes that for NewFromConfig, if the external config resolved a credential provider, you need to explicitly set the API client Options Credentials member to nil to avoid signing; otherwise it may keep the resolved provider. [6] - In the generated S3 service code, there’s special handling to “ignoreAnonymousAuth” by converting AnonymousCredentials to nil. [5] 4) Important distinction: you must use config.LoadDefaultConfig to get the default credential-chain behavior - The credentials “defaults to a chain…” behavior occurs when you use config.LoadDefaultConfig, not automatically just by having an empty aws.Config. This is clarified in an SDK issue discussion. [7] Example patterns (behaviorally accurate) A. Use the default credential chain from env/profile/roles - cfg:= config.LoadDefaultConfig(ctx,...) - client:= s3.NewFromConfig(cfg) (Uses cfg.Credentials resolved by LoadDefaultConfig.) [2][3] B. Override credentials explicitly for the S3 client - client:= s3.NewFromConfig(cfg, func(o *s3.Options) { o.Credentials = myCredProvider }) (Option mutates o.Credentials; Options.Credentials is what signing middleware uses.) [4][5][3] C. Disable signing (public S3 objects / no signature) - client:= s3.NewFromConfig(cfg, func(o *s3.Options) { o.Credentials = nil }) (or o.Credentials = aws.AnonymousCredentials{}) (This is documented as equivalent to using nil for signing middleware behavior.) [6][5] Net answer to “credential override behavior” - s3.NewFromConfig starts with cfg.Credentials. [3] - Any s3.NewFromConfig option function that sets *s3.Options.Credentials overrides cfg.Credentials for request signing. [3][5] - Setting Credentials to aws.AnonymousCredentials{} or nil disables request signing (and may also be converted to nil internally for S3). [6][5]

Citations:

1: https://github.com/aws/aws-sdk-go-v2/blob/main/aws/config.go

2: https://docs.aws.amazon.com/sdk-for-go/v2/developer-guide/configure-gosdk.html

3: https://github.com/aws/aws-sdk-go-v2/blob/e18a4ad0322327639a12fc21a0bb57924685aa47/service/s3/api_client.go

4: https://docs.aws.amazon.com/sdk-for-go/v2/developer-guide/using.html

5: https://github.com/aws/aws-sdk-go-v2/blob/service/s3/v1.99.0/service/s3/options.go

6: https://github.com/aws/aws-sdk-go-v2/blob/main/aws/credentials.go

7: aws.Config with nil Credentials does not pick up AWS_ACCESS_KEY_ID aws/aws-sdk-go-v2#1840

🏁 Script executed:

# Also check if there are any tests that exercise both RoleArn and credential scenarios find br -name "*test*.go" -type f | xargs rg -l "RoleArn\|AssumeRole" | head -5

Repository: pingcap/tidb

Length of output: 38

🏁 Script executed:

# Look for any comments or documentation in the code about credential precedence rg -n "credential|Credential" br/pkg/storage/s3.go | head -20

Repository: pingcap/tidb

Length of output: 1401

Remove duplicative credential setup at S3 options level when role assumption is used.

The code sets credentials twice: first at the config level (lines 406–416 via config.WithCredentialsProvider), then again at the S3 options level (lines 461–470 via o.Credentials = cred). When RoleArn is specified, cfg.Credentials is updated to the STS assume-role provider, but the S3 options function still applies afterward, overriding it with the static credentials. This prevents role assumption from working.

The simplest fix is to skip the redundant S3 options credential setup when RoleArn is present:

Suggested fix

- if qs.Profile == "" { + if qs.Profile == "" && len(qs.RoleArn) == 0 { cred, err := autoNewCred(&qs) if err != nil { return nil, errors.Trace(err) } if cred != nil { s3Opts = append(s3Opts, func(o *s3.Options) { o.Credentials = cred }) } }

Alternatively, remove the entire lines 461–470 block since credentials are already set during config.LoadDefaultConfig (lines 406–416), making the S3 options setup unnecessary.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if qs.Profile == "" {

cred, err := autoNewCred(&qs)

if err != nil {

return nil, errors.Trace(err)

}

if cred != nil {

s3Opts = append(s3Opts, func(o *s3.Options) {

o.Credentials = cred

})

}

}

// Handle role assumption if specified

if len(qs.RoleArn) > 0 {

// Create STS client for role assumption

stsClient := sts.NewFromConfig(cfg)

// Configure assume role provider with external ID if provided

var assumeRoleProvider aws.CredentialsProvider

if len(qs.ExternalId) > 0 {

assumeRoleProvider = stscreds.NewAssumeRoleProvider(stsClient, qs.RoleArn, func(o *stscreds.AssumeRoleOptions) {

o.ExternalID = &qs.ExternalId

})

} else {

assumeRoleProvider = stscreds.NewAssumeRoleProvider(stsClient, qs.RoleArn)

}

// Update config with assume role credentials

cfg.Credentials = assumeRoleProvider

if qs.Profile == "" && len(qs.RoleArn) == 0 {

cred, err := autoNewCred(&qs)

if err != nil {

return nil, errors.Trace(err)

}

if cred != nil {

s3Opts = append(s3Opts, func(o *s3.Options) {

o.Credentials = cred

})

}

}

// Handle role assumption if specified

if len(qs.RoleArn) > 0 {

// Create STS client for role assumption

stsClient := sts.NewFromConfig(cfg)

// Configure assume role provider with external ID if provided

var assumeRoleProvider aws.CredentialsProvider

if len(qs.ExternalId) > 0 {

assumeRoleProvider = stscreds.NewAssumeRoleProvider(stsClient, qs.RoleArn, func(o *stscreds.AssumeRoleOptions) {

o.ExternalID = &qs.ExternalId

})

} else {

assumeRoleProvider = stscreds.NewAssumeRoleProvider(stsClient, qs.RoleArn)

}

// Update config with assume role credentials

cfg.Credentials = assumeRoleProvider

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@br/pkg/storage/s3.go` around lines 462 - 490, The S3 client setup currently applies credentials twice: via config (config.LoadDefaultConfig / cfg.Credentials) and again by appending an s3 option that sets o.Credentials from autoNewCred; when qs.RoleArn is set this later overwrite breaks role assumption. Fix by preventing the redundant s3Opts append that sets o.Credentials when qs.RoleArn is non-empty (or simply remove the s3Opts credential-setting block that uses autoNewCred), so that when RoleArn is provided cfg.Credentials (the stscreds.NewAssumeRoleProvider) remains the effective credentials provider.

coderabbitai · 2026-05-09T07:41:32Z

+		_, err2 := svc.DeleteObject(ctx, input)
+		var noSuchKey *types.NoSuchKey
+		if !goerrors.As(err2, &noSuchKey) {
+			log.Warn("failed to delete object used for permission check",
+				zap.String("bucket", options.Bucket),
+				zap.String("key", *input.Key), zap.Error(err2))
 		}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid warning on successful cleanup in PutAndDeleteObjectCheck.

goerrors.As(nil, &noSuchKey) is false, so Line 662 logs "failed to delete..." even when DeleteObject succeeds. That will emit a misleading warning on every successful permission check.

Suggested fix

_, err2 := svc.DeleteObject(ctx, input) var noSuchKey *types.NoSuchKey - if !goerrors.As(err2, &noSuchKey) { + if err2 != nil && !goerrors.As(err2, &noSuchKey) { log.Warn("failed to delete object used for permission check", zap.String("bucket", options.Bucket), zap.String("key", *input.Key), zap.Error(err2)) }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

_, err2 := svc.DeleteObject(ctx, input)

var noSuchKey *types.NoSuchKey

if !goerrors.As(err2, &noSuchKey) {

log.Warn("failed to delete object used for permission check",

zap.String("bucket", options.Bucket),

zap.String("key", *input.Key), zap.Error(err2))

}

_, err2 := svc.DeleteObject(ctx, input)

var noSuchKey *types.NoSuchKey

if err2 != nil && !goerrors.As(err2, &noSuchKey) {

log.Warn("failed to delete object used for permission check",

zap.String("bucket", options.Bucket),

zap.String("key", *input.Key), zap.Error(err2))

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@br/pkg/storage/s3.go` around lines 659 - 665, In PutAndDeleteObjectCheck avoid treating a nil err as an error: only inspect err2 with goerrors.As when err2 != nil. After calling svc.DeleteObject(ctx, input) check if err2 is non-nil, then use goerrors.As(err2, &noSuchKey) to suppress warnings for NoSuchKey but call log.Warn(...) with zap.Error(err2) for other errors; reference svc.DeleteObject, err2, types.NoSuchKey, goerrors.As and log.Warn to locate where to add the nil check.

coderabbitai · 2026-05-09T07:41:32Z

+	logger := logutil.Logger(ctx)
+	logger.Info("Starting external storage test", zap.String("storage", cfg.StorageURI), zap.Any("options", cfg.BackendOptions))


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't log BackendOptions verbatim.

zap.Any("options", cfg.BackendOptions) will serialize storage credentials when the backend is configured via flags, including access keys and session tokens. Please log a redacted subset instead.

Suggested fix

- logger.Info("Starting external storage test", zap.String("storage", cfg.StorageURI), zap.Any("options", cfg.BackendOptions)) + logger.Info("Starting external storage test", zap.String("storage", cfg.StorageURI))

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

logger := logutil.Logger(ctx)

logger.Info("Starting external storage test", zap.String("storage", cfg.StorageURI), zap.Any("options", cfg.BackendOptions))

logger := logutil.Logger(ctx)

logger.Info("Starting external storage test", zap.String("storage", cfg.StorageURI))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@br/pkg/task/operator/test_storage.go` around lines 226 - 227, The current logger.Info call is logging cfg.BackendOptions verbatim (via zap.Any) which can leak credentials; instead build a redacted representation (e.g., pick only non-sensitive keys or mask values for keys like access_key, secret_key, session_token, password) and pass that to logger.Info; update the call that uses logger.Info("Starting external storage test", zap.String("storage", cfg.StorageURI), zap.Any("options", cfg.BackendOptions)) to use the redacted map/struct so only safe fields from cfg.BackendOptions are logged and any sensitive fields are removed or masked.

coderabbitai · 2026-05-09T07:41:32Z

+	cleanup := func() {
+		if !cfg.CleanupOnSuccess {
+			return
+		}
+		printStep("Cleaning up test files...")
+		_ = store.DeleteFile(ctx, testFileName1)
+		_ = store.DeleteFile(ctx, testFileName2)
+		_ = store.DeleteFile(ctx, testFileNameRenamed)
+		_ = store.DeleteFile(ctx, filepath.Join(testDirName, testFileName1))
+		_ = store.DeleteFile(ctx, filepath.Join(testDirName, testFileName2))
+	}
+	defer cleanup()


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

--cleanup currently runs on failure too.

The deferred cleanup is unconditional, so test artifacts are removed even when cases fail. That contradicts the CleanupOnSuccess name/help text and makes --pause-when-fail much less useful.

Suggested fix

- defer cleanup() + defer func() { + if cfg.CleanupOnSuccess && report.FailedTests == 0 { + cleanup() + } + }()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

cleanup := func() {

if !cfg.CleanupOnSuccess {

return

}

printStep("Cleaning up test files...")

_ = store.DeleteFile(ctx, testFileName1)

_ = store.DeleteFile(ctx, testFileName2)

_ = store.DeleteFile(ctx, testFileNameRenamed)

_ = store.DeleteFile(ctx, filepath.Join(testDirName, testFileName1))

_ = store.DeleteFile(ctx, filepath.Join(testDirName, testFileName2))

}

defer cleanup()

cleanup := func() {

if !cfg.CleanupOnSuccess {

return

}

printStep("Cleaning up test files...")

_ = store.DeleteFile(ctx, testFileName1)

_ = store.DeleteFile(ctx, testFileName2)

_ = store.DeleteFile(ctx, testFileNameRenamed)

_ = store.DeleteFile(ctx, filepath.Join(testDirName, testFileName1))

_ = store.DeleteFile(ctx, filepath.Join(testDirName, testFileName2))

}

defer func() {

if cfg.CleanupOnSuccess && report.FailedTests == 0 {

cleanup()

}

}()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@br/pkg/task/operator/test_storage.go` around lines 268 - 279, The current deferred call defer cleanup() runs even on failure; change it so cleanup respects cfg.CleanupOnSuccess and only runs on successful tests by adding a boolean flag (e.g., succeeded) captured by the deferred cleanup or by removing the unconditional defer and invoking cleanup conditionally at the end of the test; ensure the cleanup function still checks cfg.CleanupOnSuccess and only performs deletions when both cfg.CleanupOnSuccess is true and succeeded is true (update references to cleanup, cfg.CleanupOnSuccess and the defer cleanup() placement accordingly).

coderabbitai · 2026-05-09T07:41:32Z

+	startOffset := int64(100)
+	endOffset := int64(200)
+	reader, err := tc.Store.Open(tc.Ctx, name, &storage.ReaderOption{
+		StartOffset: &startOffset,
+		EndOffset:   &endOffset,
+	})
+
+	result := TestResult{
+		Name: testName,
+	}
+
+	if err != nil {
+		printError("  ❌ Failed: %v", err)
+		result.Duration = time.Since(start)
+		result.Passed = false
+		result.Error = errors.Annotate(err, "Open with range failed")
+		tc.AddResult(result)
+		return
+	}
+	defer reader.Close()
+
+	data, err := io.ReadAll(reader)
+	duration := time.Since(start)
+	result.Duration = duration
+
+	if err != nil {
+		printError("  ❌ Failed to read: %v", err)
+		result.Passed = false
+		result.Error = errors.Annotate(err, "ReadAll failed")
+		tc.AddResult(result)
+		return
+	}
+
+	expectedSize := endOffset - startOffset
+	if int64(len(data)) != expectedSize {
+		printError("  ❌ Failed: expected size %d, got %d", expectedSize, len(data))
+		result.Passed = false
+		result.Error = errors.Errorf("Range read returned wrong size: expected %d, got %d", expectedSize, len(data))
+	} else {
+		// Verify data matches the expected range
+		dataValid := true
+		for i := range data {
+			if data[i] != expectedData[startOffset+int64(i)] {
+				printError("  ❌ Failed: data mismatch at offset %d", i)
+				result.Passed = false
+				result.Error = errors.Errorf("Range read returned corrupted data at offset %d", i)
+				dataValid = false
+				break
+			}
+		}
+		if dataValid {
+			printSuccess("  ✓ Passed (read %d bytes)", len(data))
+			result.Passed = true
+			result.Details = fmt.Sprintf("Read %d bytes from range [%d, %d)", len(data), startOffset, endOffset)
+		}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Small --test-data-size values make this range test fail spuriously.

This always requests bytes [100, 200), but ParseFromFlags accepts any positive size. A user can set --test-data-size=128 and get a harness failure even when the storage backend is correct. Either validate a minimum size up front or derive the range from len(expectedData).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@br/pkg/task/operator/test_storage.go` around lines 476 - 530, The test hardcodes startOffset/endOffset (used in the storage.ReaderOption passed to tc.Store.Open) to [100,200) which fails when expectedData is smaller (e.g. --test-data-size=128); modify the test to compute a valid range from len(expectedData) (or validate a minimum size up front): compute endOffset := int64(len(expectedData)) and if endOffset <= 100 set startOffset = 0 else set startOffset = endOffset-100 (or clamp endOffset to startOffset + desiredRange), then use these startOffset/endOffset values in the ReaderOption so the range is always within expectedData; alternatively add an explicit minimum-size check in ParseFromFlags and fail fast if test-data-size < 200.

jebter · 2026-05-09T09:23:51Z

/retest

tiprow · 2026-05-09T09:24:15Z

@jebter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

YuJuncen · 2026-05-09T09:52:21Z

/retest

tiprow · 2026-05-09T09:52:45Z

@YuJuncen: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

YuJuncen · 2026-05-09T11:37:02Z

/test pull-br-integration-test

tiprow · 2026-05-09T11:37:27Z

@YuJuncen: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/test pull-br-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

jebter · 2026-05-11T02:27:21Z

/test pull-br-integration-test

tiprow · 2026-05-11T02:27:45Z

@jebter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/test pull-br-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

close pingcap#67623

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

br/tests/br_gcs/run.sh (1)

142-142: 💤 Low value

Optional: Simplify arithmetic variable reference.

In arithmetic contexts, the ${} syntax is unnecessary and can be simplified for consistency with shell best practices.

♻️ Proposed simplification

     for i in $(seq $DB_COUNT); do
-        row_count_new[${i}]=$(run_sql "SELECT COUNT(*) FROM $DB${i}.$TABLE;" | awk '/COUNT/{print $2}')
+        row_count_new[$i]=$(run_sql "SELECT COUNT(*) FROM $DB${i}.$TABLE;" | awk '/COUNT/{print $2}')
     done

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/tests/br_gcs/run.sh` at line 142, The array index and arithmetic-style
variable reference use unnecessary ${} expansion; change the assignment to use
the simpler form for the index and any pure arithmetic contexts (e.g. replace
row_count_new[${i}] with row_count_new[i]) while keeping ${DB}${i}.$TABLE or
other expansions that are needed for concatenation intact; update the assignment
that calls run_sql accordingly (referencing row_count_new, run_sql, DB, TABLE,
and i).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@br/tests/br_gcs/run.sh`:
- Line 92: The run_br invocation uses unquoted shell variables which can cause
word splitting/globbing (e.g. --pd $PD_ADDR); update the calls to quote all
variable expansions used in run_br and similar lines so each value is treated as
a single argument: wrap $PD_ADDR, $BUCKET, $DB, $GCS_HOST and $GCS_PORT in
double quotes wherever they are expanded (references: the run_br command and its
--pd and -s arguments shown in the diff; also apply the same quoting to the
other occurrences flagged at lines 101, 111, 139).

In `@br/tests/docker_compatible_gcs/_run.sh`:
- Line 45: The restore command invocation using "bin/br restore db" is passing
the PD address unquoted (PD_ADDR) which can cause word-splitting/globbing;
update the command to quote the variable (use "$PD_ADDR") in the --pd argument
so the entire address is passed as a single token. Locate the line with the
bin/br restore db command and replace --pd $PD_ADDR with --pd "$PD_ADDR".

---

Nitpick comments:
In `@br/tests/br_gcs/run.sh`:
- Line 142: The array index and arithmetic-style variable reference use
unnecessary ${} expansion; change the assignment to use the simpler form for the
index and any pure arithmetic contexts (e.g. replace row_count_new[${i}] with
row_count_new[i]) while keeping ${DB}${i}.$TABLE or other expansions that are
needed for concatenation intact; update the assignment that calls run_sql
accordingly (referencing row_count_new, run_sql, DB, TABLE, and i).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e87ecd62-248d-4df4-a6ff-b9eb65437f65

📥 Commits

Reviewing files that changed from the base of the PR and between 26a1046 and f0302db.

📒 Files selected for processing (3)

br/tests/br_gcs/run.sh
br/tests/docker_compatible_gcs/_run.sh
br/tests/docker_compatible_gcs/prepare.sh

coderabbitai · 2026-05-11T03:30:01Z

-    --cert "$TEST_DIR/certs/br.pem" \
-    --key "$TEST_DIR/certs/br.key" \
-    --pd $PD_ADDR -s "gcs://$BUCKET/${DB}_old?endpoint=http://$GCS_HOST:$GCS_PORT/storage/v1/" --check-requirements=false
+run_br --pd $PD_ADDR backup full -s "gcs://$BUCKET/$DB?endpoint=http://$GCS_HOST:$GCS_PORT"


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Quote variables to prevent word splitting.

Multiple instances of unquoted $PD_ADDR can lead to word splitting or globbing if the variable contains spaces or special characters.

🛡️ Proposed fix

# Line 92 -run_br --pd $PD_ADDR backup full -s "gcs://$BUCKET/$DB?endpoint=http://$GCS_HOST:$GCS_PORT" +run_br --pd "$PD_ADDR" backup full -s "gcs://$BUCKET/$DB?endpoint=http://$GCS_HOST:$GCS_PORT" # Line 101 --key "$TEST_DIR/certs/br.key" \ - --pd $PD_ADDR -s "gcs://$BUCKET/${DB}_old?endpoint=http://$GCS_HOST:$GCS_PORT/storage/v1/" --check-requirements=false + --pd "$PD_ADDR" -s "gcs://$BUCKET/${DB}_old?endpoint=http://$GCS_HOST:$GCS_PORT/storage/v1/" --check-requirements=false # Line 111 -run_br restore full -s "gcs://$BUCKET/$DB?" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false +run_br restore full -s "gcs://$BUCKET/$DB?" --pd "$PD_ADDR" --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false # Line 139 - run_br restore full -s "gcs://$BUCKET/${DB}_old" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false + run_br restore full -s "gcs://$BUCKET/${DB}_old" --pd "$PD_ADDR" --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false

Also applies to: 101-101, 111-111, 139-139

🧰 Tools

🪛 Shellcheck (0.11.0)

[info] 92-92: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@br/tests/br_gcs/run.sh` at line 92, The run_br invocation uses unquoted shell variables which can cause word splitting/globbing (e.g. --pd $PD_ADDR); update the calls to quote all variable expansions used in run_br and similar lines so each value is treated as a single argument: wrap $PD_ADDR, $BUCKET, $DB, $GCS_HOST and $GCS_PORT in double quotes wherever they are expanded (references: the run_br command and its --pd and -s arguments shown in the diff; also apply the same quoting to the other occurrences flagged at lines 101, 111, 139).

coderabbitai · 2026-05-11T03:30:01Z

    echo "restore ${TAG} data starts..."
    # after BR merged into TiDB we need skip version check because the build from tidb is not a release version.
-    bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT/storage/v1/" --check-requirements=false
+    bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Quote $PD_ADDR to prevent word splitting.

While unlikely to cause issues in practice, unquoted variables can lead to word splitting or globbing if they contain spaces or special characters.

🛡️ Proposed fix

- bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false + bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd "$PD_ADDR" --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false

bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd "$PD_ADDR" --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false

🧰 Tools

🪛 Shellcheck (0.11.0)

[info] 45-45: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@br/tests/docker_compatible_gcs/_run.sh` at line 45, The restore command invocation using "bin/br restore db" is passing the PD address unquoted (PD_ADDR) which can cause word-splitting/globbing; update the command to quote the variable (use "$PD_ADDR") in the --pd argument so the entire address is passed as a single token. Locate the line with the bin/br restore db command and replace --pd $PD_ADDR with --pd "$PD_ADDR".

ti-chi-bot · 2026-05-11T04:08:25Z

[LGTM Timeline notifier]

Timeline:

2026-05-08 02:50:03.824581408 +0000 UTC m=+408876.697931380: ☑️ agreed by D3Hunter.
2026-05-11 04:08:23.939353388 +0000 UTC m=+65872.472132697: ☑️ agreed by Leavrth.

jebter · 2026-05-11T05:14:09Z

/test pull-br-integration-test

tiprow · 2026-05-11T05:14:33Z

@jebter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/test pull-br-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ti-chi-bot · 2026-05-11T07:03:22Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, Leavrth

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [D3Hunter,Leavrth]
~~br/OWNERS~~ [Leavrth]
~~br/pkg/storage/OWNERS~~ [D3Hunter,Leavrth]
~~pkg/lightning/OWNERS~~ [D3Hunter]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

br: upgrade AWS sdk from v1 to v2 (pingcap#66634)

3e84632

ref pingcap#61547, close pingcap#62541

ti-chi-bot Bot added do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. labels May 8, 2026

ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 8, 2026

YuJuncen changed the base branch from master to release-8.5-20260116-v8.5.5 May 8, 2026 02:36

ti-chi-bot Bot added component/dumpling This is related to Dumpling of TiDB. component/statistics sig/planner SIG: Planner labels May 8, 2026

YuJuncen closed this May 8, 2026

YuJuncen reopened this May 8, 2026

D3Hunter previously approved these changes May 8, 2026

View reviewed changes

YuJuncen removed component/statistics component/dumpling This is related to Dumpling of TiDB. labels May 8, 2026

ti-chi-bot Bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label May 8, 2026

YuJuncen removed the sig/planner SIG: Planner label May 8, 2026

wuhuizuo changed the title ~~br: upgrade AWS sdk from v1 to v2 #66634~~ br: upgrade AWS SDK from v1 to v2 #66634 May 8, 2026

wuhuizuo removed the do-not-merge/needs-linked-issue label May 8, 2026

sessionctx: avoid IMDS leak in TestSetTiDBCloudStorageURI (pingcap#67604

8cfe383

)

YuJuncen dismissed D3Hunter’s stale review via 8cfe383 May 9, 2026 07:32

coderabbitai Bot reviewed May 9, 2026

View reviewed changes

try disable ssl for test case

26a1046

br: fix fake-gcs test (pingcap#67645)

f0302db

close pingcap#67623

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

Leavrth approved these changes May 11, 2026

View reviewed changes

ti-chi-bot Bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels May 11, 2026

jebter mentioned this pull request May 11, 2026

br: fix fake-gcs test (#67645) | tidb-test=release-8.5-20260116-v8.5.5 tikv=release-8.5-20260423-v8.5.5 #68126

Closed

13 tasks

D3Hunter approved these changes May 11, 2026

View reviewed changes

ti-chi-bot Bot added the approved label May 11, 2026

ti-chi-bot Bot merged commit 812fde7 into pingcap:release-8.5-20260116-v8.5.5 May 11, 2026
28 of 32 checks passed

		logger := logutil.Logger(ctx)
		logger.Info("Starting external storage test", zap.String("storage", cfg.StorageURI), zap.Any("options", cfg.BackendOptions))

	bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false
	bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd "$PD_ADDR" --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false

Conversation

YuJuncen commented May 8, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changed and how does it work?

Check List

Release note

Summary by CodeRabbit

Uh oh!

pantheon-ai Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

tiprow Bot commented May 8, 2026

Uh oh!

codecov Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

YuJuncen commented May 8, 2026

Uh oh!

tiprow Bot commented May 8, 2026

Uh oh!

YuJuncen commented May 8, 2026

Uh oh!

tiprow Bot commented May 8, 2026

Uh oh!

wuhuizuo commented May 8, 2026

Uh oh!

tiprow Bot commented May 8, 2026

Uh oh!

wuhuizuo commented May 8, 2026

Uh oh!

ti-chi-bot Bot commented May 8, 2026

Uh oh!

tiprow Bot commented May 9, 2026

Uh oh!

wuhuizuo commented May 9, 2026

Uh oh!

tiprow Bot commented May 9, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

jebter commented May 9, 2026

Uh oh!

tiprow Bot commented May 9, 2026

Uh oh!

YuJuncen commented May 9, 2026

Uh oh!

tiprow Bot commented May 9, 2026

Uh oh!

YuJuncen commented May 9, 2026

YuJuncen commented May 8, 2026 •

edited by coderabbitai Bot

Loading

pantheon-ai Bot commented May 8, 2026 •

edited

Loading

coderabbitai Bot commented May 8, 2026 •

edited

Loading

codecov Bot commented May 8, 2026 •

edited

Loading