Skip to content

br: upgrade AWS SDK from v1 to v2 #66634 | tidb-test=release-8.5-20260116-v8.5.5 tikv=release-8.5-20260423-v8.5.5 pd=v8.5.5 tiflash=v8.5.5#68216

Merged
ti-chi-bot[bot] merged 4 commits into
pingcap:release-8.5-20260116-v8.5.5from
YuJuncen:pick-4bd7b34-release-8.5-20260116-v8.5.5
May 11, 2026
Merged

br: upgrade AWS SDK from v1 to v2 #66634 | tidb-test=release-8.5-20260116-v8.5.5 tikv=release-8.5-20260423-v8.5.5 pd=v8.5.5 tiflash=v8.5.5#68216
ti-chi-bot[bot] merged 4 commits into
pingcap:release-8.5-20260116-v8.5.5from
YuJuncen:pick-4bd7b34-release-8.5-20260116-v8.5.5

Conversation

@YuJuncen
Copy link
Copy Markdown
Contributor

@YuJuncen YuJuncen commented May 8, 2026

successor of #62488

Issue Number: close #62541, ref #61547

Problem Summary:

Migration guide
https://docs.aws.amazon.com/sdk-for-go/v2/developer-guide/migrate-gosdk.html

What changed and how does it work?

This also picks:

Confliction Resolving(By GPT):

  • DEPS.bzl: kept the target commit side for AWS SDK v2 Bazel deps.
    • Reason: required to build new v2 imports.
  • br/pkg/storage/BUILD.bazel: kept target commit side for v2 deps.
    • Reason: aligns BUILD deps with migrated S3 implementation.
  • go.mod / go.sum: kept target commit side in conflict hunks.
    • Reason: introduce AWS SDK v2 modules and checksums used by migrated code.
  • pkg/lightning/log/log.go: kept target commit side in conflict hunk.
    • Reason: use smithy/AWS v2 cancellation error handling.
  • br/cmd/br/operator.go: kept newTestStorageCommand() from target commit, dropped newPitrChecksumCommand() and newUpstreamChecksumCommand().
    • Reason: those two symbols are not available in release-8.5 branch, would break compilation.
  • br/pkg/storage/s3.go: mostly adopted target commit side; removed recording/AccessRecording integration points.
    • Reason: release-8.5 branch lacks br/pkg/storage/recording prerequisite commit; preserving those references would not compile.
  • br/pkg/storage/s3_test.go: changed conflict-hunk tests back to createS3Suite(...) form (without recording collector).
    • Reason: same prerequisite gap as above.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
    A new subcommand br operator test-storage was added.
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • New Features

    • Added a hidden operator command "test-storage" to validate and report on external storage backends.
  • Improvements

    • Upgraded AWS SDK integration to v2 across storage, KMS, EC2, and related features.
    • Improved S3-compatible provider detection and checksum handling; configurable retry/error handling.
    • Safer storage URI validation with adjusted HTTP client behavior.
  • Bug Fixes

    • Treat certain HTTP 404 storage responses as "object not found" to avoid false errors.
  • Tests

    • Updated storage tests, mocks, and compatibility helpers to align with SDK v2 and broaden coverage.

Review Change Stack

@pantheon-ai
Copy link
Copy Markdown

pantheon-ai Bot commented May 8, 2026

@YuJuncen I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

📝 Walkthrough

Walkthrough

Migrates the codebase from AWS SDK Go v1 to v2: updates dependency pins and Bazel/go.mod, rewrites S3/KMS/EBS/storage logic to v2 APIs, adds S3API abstraction and v2 mocks, updates tests and scripts, and introduces a storage test harness CLI subcommand.

Changes

AWS SDK v1 to v2 Migration

AWS SDK v1 to v2 Migration

Layer / File(s) Summary
Dependency Updates
DEPS.bzl, go.mod
Upgrade makezero v1.1.1→v1.2.0; add aws-sdk-go-v2 modules (config, s3, kms, ec2, sts, cloudwatch, etc.); keep aws-sdk-go v1 as indirect pin.
Build Configuration
br/pkg/aws/BUILD.bazel, br/pkg/kms/BUILD.bazel, br/pkg/mock/BUILD.bazel, br/pkg/storage/BUILD.bazel, br/pkg/task/operator/BUILD.bazel, pkg/lightning/backend/external/BUILD.bazel, pkg/lightning/log/BUILD.bazel, Makefile
Update Bazel targets to reference v2 SDK repos; replace awserr/request deps with smithy-go; remove mockgen s3iface phony target; add nogo exclusion.
S3API Interface
br/pkg/storage/s3_interface.go
New exported interface defining v2-based S3 contract (HeadBucket, ListObjects, GetObject, PutObject, DeleteObject(s), multipart ops, GetObjectLockConfiguration).
Core S3 Storage
br/pkg/storage/s3.go
Migrate from s3iface.S3API to internal S3API; add s3Compatible flag; rewrite NewS3Storage using config.LoadDefaultConfig, region detection, retryer/credential/endpoint wiring; migrate S3 calls to v2 shapes; add withContentMD5 middleware and tidbRetryer implementing aws.Retryer.
EBS Orchestration
br/pkg/aws/ebs.go
EC2Session holds v2 ec2/cloudwatch clients; snapshot/volume APIs migrated to v2; public signatures changed from pointer slices to string slices; error detection switched to smithy.APIError.
KMS Decrypt
br/pkg/kms/aws.go
AwsKms uses v2 kms.Client via config.LoadDefaultConfig; Decrypt uses client.Decrypt; classifyDecryptError uses smithy.APIError and v2 typed exceptions.
Logging
pkg/lightning/log/log.go
IsContextCanceledError updated to detect smithy.CanceledError and smithy.OperationError wrappers; removed awserr-specific recursion.
Storage Helpers & Azure Blob
br/pkg/storage/helper.go, br/pkg/storage/azblob.go, br/pkg/storage/ks3.go, br/pkg/storage/storage.go
ValidateCloudStorageURI now passes a custom http.Client with DisableKeepAlives; azblob copy polling uses aws.ToString; KS3.FileExists treats HTTP 404 as not found; ExternalStorageOptions.S3Retryer now uses v2 aws.Retryer.
Lightning External Backend
pkg/lightning/backend/external/byte_reader.go, pkg/lightning/backend/external/byte_reader_test.go
Switch imports to aws-sdk-go-v2; test S3 client init rewritten to use config.LoadDefaultConfig with BaseEndpoint and UsePathStyle.
S3 Mocks
br/pkg/mock/s3_v2.go, br/pkg/mock/BUILD.bazel
Replace v1 s3iface mock with GoMock-generated MockS3API using v2 s3 types; BUILD updated to include s3_v2.go.
Tests
br/pkg/aws/ebs_test.go, br/pkg/storage/s3_test.go, pkg/lightning/log/log_test.go, pkg/lightning/backend/external/byte_reader_test.go, br/pkg/task/restore_ebs_meta.go, pkg/sessionctx/variable/sysvar_test.go
Update imports and mocks to v2 method signatures; construct smithy.GenericAPIError for error cases; add IMDS disable in sysvar test; update snapshot ID map types to []string.
Compatibility Utilities & Storage Test Harness
br/pkg/storage/compatibility_for_test.go, br/pkg/task/operator/test_storage.go, br/cmd/br/operator.go
Add BucketPrefix, expose WithContentMD5, add MultipartWriter helper; add RunTestStorage harness (TestReport/TestResult/TestContext) and wire operator test-storage subcommand.
Linter Configuration
build/nogo_config.json
Exclude br/pkg/storage from toomanytests linter check due to large test count.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as operator/test-storage
  participant Harness as RunTestStorage
  participant Storage as ExternalStorage (S3/other)
  CLI->>Harness: parse flags, RunTestStorage(ctx,cfg)
  Harness->>Storage: create client (config.LoadDefaultConfig / credentials / endpoint)
  Harness->>Storage: write/read/list/delete operations
  Storage-->>Harness: responses/errors
  Harness-->>CLI: TestReport (results)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • pingcap/tidb#66634: The main PR and retrieved PR both perform the same large, code-level migration from AWS SDK for Go v1 to v2 (matching changes in DEPS.bzl, go.mod, numerous br/pkg/storage and br/pkg/aws files, new S3API interface and gomock s3_v2, etc.), so they are directly related.
  • pingcap/tidb#67645: Both PRs modify the br_gcs test scripts (br/tests/br_gcs/run.sh, docker_compatible_gcs/_run.sh, and prepare.sh) to add missing service-account fields and update GCS endpoints — directly related.
  • pingcap/tidb#68233: Related changes to br/pkg/storage/s3.go and S3 handling; likely overlaps.

Suggested labels

ok-to-test

Suggested reviewers

  • Leavrth
  • wjhuang2016
  • BornChanger

"🐰 Hopping through code with a carrot in hand,
I swapped old SDKs for a newer band.
S3, KMS, and EBS now sing in v2,
Mocks and tests follow — tidy and true,
CI hums softly — hop, patch, and stand!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 38.27% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title specifically mentions the AWS SDK upgrade from v1 to v2, which is the main focus of the changeset, directly summarizing the primary change.
Description check ✅ Passed The description includes issue references, problem summary, detailed change explanation with migration guide link, conflict resolution documentation, manual test confirmation, and release notes section following the template structure.
Linked Issues check ✅ Passed The PR fully addresses issue #62541 by migrating BR's AWS SDK from v1 to v2, updating dependencies, imports, error handling, and type signatures as required for the upgrade.
Out of Scope Changes check ✅ Passed All changes are directly related to the AWS SDK v1→v2 migration: dependency updates, import changes, error handling migration, type signature updates, test data structure conversions, and build configuration adjustments are all in-scope for this upgrade effort.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 8, 2026
@YuJuncen YuJuncen changed the base branch from master to release-8.5-20260116-v8.5.5 May 8, 2026 02:36
@ti-chi-bot ti-chi-bot Bot added component/dumpling This is related to Dumpling of TiDB. component/statistics sig/planner SIG: Planner labels May 8, 2026
@tiprow
Copy link
Copy Markdown

tiprow Bot commented May 8, 2026

Hi @YuJuncen. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@YuJuncen YuJuncen closed this May 8, 2026
@YuJuncen YuJuncen reopened this May 8, 2026
D3Hunter
D3Hunter previously approved these changes May 8, 2026
@YuJuncen YuJuncen removed component/statistics component/dumpling This is related to Dumpling of TiDB. labels May 8, 2026
@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label May 8, 2026
@YuJuncen YuJuncen removed the sig/planner SIG: Planner label May 8, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 21.34146% with 1032 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release-8.5-20260116-v8.5.5@34227a6). Learn more about missing BASE report.

Additional details and impacted files
@@                       Coverage Diff                        @@
##             release-8.5-20260116-v8.5.5     #68216   +/-   ##
================================================================
  Coverage                               ?   55.4695%           
================================================================
  Files                                  ?       1822           
  Lines                                  ?     653906           
  Branches                               ?          0           
================================================================
  Hits                                   ?     362719           
  Misses                                 ?     264227           
  Partials                               ?      26960           
Flag Coverage Δ
integration 39.1601% <13.0335%> (?)
unit 64.7744% <21.3414%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.9278% <0.0000%> (?)
parser ∅ <0.0000%> (?)
br 62.8468% <0.0000%> (?)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@YuJuncen
Copy link
Copy Markdown
Contributor Author

YuJuncen commented May 8, 2026

/retest

@tiprow
Copy link
Copy Markdown

tiprow Bot commented May 8, 2026

@YuJuncen: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@YuJuncen
Copy link
Copy Markdown
Contributor Author

YuJuncen commented May 8, 2026

/retest

@tiprow
Copy link
Copy Markdown

tiprow Bot commented May 8, 2026

@YuJuncen: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@wuhuizuo
Copy link
Copy Markdown
Contributor

wuhuizuo commented May 8, 2026

/retest

@tiprow
Copy link
Copy Markdown

tiprow Bot commented May 8, 2026

@wuhuizuo: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@wuhuizuo
Copy link
Copy Markdown
Contributor

wuhuizuo commented May 8, 2026

/override pull-build-next-gen pull-error-log-review pull-integration-realcluster-test-next-gen pull-mysql-client-test-next-gen pull-unit-test-next-gen

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 8, 2026

@wuhuizuo: Overrode contexts on behalf of wuhuizuo: pull-build-next-gen, pull-error-log-review, pull-integration-realcluster-test-next-gen, pull-mysql-client-test-next-gen, pull-unit-test-next-gen

Details

In response to this:

/override pull-build-next-gen pull-error-log-review pull-integration-realcluster-test-next-gen pull-mysql-client-test-next-gen pull-unit-test-next-gen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@wuhuizuo wuhuizuo changed the title br: upgrade AWS sdk from v1 to v2 #66634 br: upgrade AWS SDK from v1 to v2 #66634 May 8, 2026
@tiprow
Copy link
Copy Markdown

tiprow Bot commented May 9, 2026

@jebter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/test pull-br-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@wuhuizuo
Copy link
Copy Markdown
Contributor

wuhuizuo commented May 9, 2026

/test pull-br-integration-test

@tiprow
Copy link
Copy Markdown

tiprow Bot commented May 9, 2026

@wuhuizuo: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/test pull-br-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
br/pkg/aws/ebs.go (2)

530-533: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Same nil-deref risk as in EnableDataFSR.

Apply the same aws.ToString fix to *resp.Unsuccessful[0].SnapshotId in the disable path so a nil pointer doesn't turn an FSR error into a panic.

🛡️ Proposed fix
-					return errors.Errorf("Some snapshot fails to disable FSR for available zone %s, such as %s, error code is %v", targetAZ, *resp.Unsuccessful[0].SnapshotId, resp.Unsuccessful[0].FastSnapshotRestoreStateErrors)
+					return errors.Errorf("Some snapshot fails to disable FSR for available zone %s, such as %s, error code is %v", targetAZ, aws.ToString(resp.Unsuccessful[0].SnapshotId), resp.Unsuccessful[0].FastSnapshotRestoreStateErrors)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/aws/ebs.go` around lines 530 - 533, The disable-FSR error return may
nil-deref *resp.Unsuccessful[0].SnapshotId like the EnableDataFSR path; change
the interpolation to use aws.ToString(resp.Unsuccessful[0].SnapshotId) (same
pattern used in EnableDataFSR) so nil snapshot IDs become empty strings instead
of panics, keeping the descriptive message and FastSnapshotRestoreStateErrors
as-is.

359-362: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid raw pointer deref on Unsuccessful[0].SnapshotId.

resp.Unsuccessful[0].SnapshotId is *string and dereferencing it directly will panic if the field is nil — and this is the failure path where the original FSR error gets reported, so a panic here would mask the real underlying error. The rest of the file consistently uses aws.ToString for v2 pointer fields; please do the same here (and on line 532).

🛡️ Proposed nil-safe formatting
-				return errors.Errorf("Some snapshot fails to enable FSR for available zone %s, such as %s, error code is %v", targetAZ, *resp.Unsuccessful[0].SnapshotId, resp.Unsuccessful[0].FastSnapshotRestoreStateErrors)
+				return errors.Errorf("Some snapshot fails to enable FSR for available zone %s, such as %s, error code is %v", targetAZ, aws.ToString(resp.Unsuccessful[0].SnapshotId), resp.Unsuccessful[0].FastSnapshotRestoreStateErrors)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/aws/ebs.go` around lines 359 - 362, The error path dereferences
resp.Unsuccessful[0].SnapshotId directly which can be nil and cause a panic;
change the formatting to use aws.ToString(resp.Unsuccessful[0].SnapshotId)
(consistent with other uses in this file) when building the error message and
similarly replace any other raw *string derefs (e.g., the one flagged around
line 532) so the error log and the returned errors reference nil-safe string
conversions for SnapshotId while keeping FastSnapshotRestoreStateErrors
unchanged.
br/pkg/storage/s3.go (1)

631-642: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Return non-NoSuchKey failures from getObjectCheck.

If GetObject fails with a transport/context error that doesn't implement smithy.APIError, this falls through to return nil, so permission validation can pass even though the backend is unreachable. Only NoSuchKey should count as success here.

Suggested fix
 	_, err := svc.GetObject(ctx, input)
+	if err == nil {
+		return nil
+	}
 	var aerr smithy.APIError
 	if goerrors.As(err, &aerr) {
 		if aerr.ErrorCode() == noSuchKey {
 			// if key not exists and we reach this error, that
 			// means we have the correct permission to GetObject
 			// other we will get another error
 			return nil
 		}
-		return errors.Trace(err)
 	}
-	return nil
+	return errors.Trace(err)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/storage/s3.go` around lines 631 - 642, The getObjectCheck currently
swallows non-API errors by returning nil when err != nil but not an APIError;
change the logic to treat any non-nil err as a failure unless it's an APIError
with ErrorCode() == noSuchKey: call svc.GetObject(ctx, input), if err != nil
then if goerrors.As(err, &aerr) and aerr.ErrorCode() == noSuchKey return nil;
otherwise return errors.Trace(err); only return nil when svc.GetObject succeeded
or when the error is specifically noSuchKey. Ensure you reference svc.GetObject,
smithy.APIError (aerr) and the noSuchKey constant in your change.
🧹 Nitpick comments (5)
DEPS.bzl (1)

414-422: 💤 Low value

Out-of-scope dependency bump for makezero (v1.1.1 → v1.2.0).

This update appears unrelated to the AWS SDK v1 → v2 migration that is the stated objective of this backport. It likely came along via the merge but expands the change surface of a release-branch PR. Confirm it is intentional (e.g., transitively required) or revert to keep the backport scoped to AWS SDK changes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@DEPS.bzl` around lines 414 - 422, The DEPS.bzl change upgrades the makezero
dependency from v1.1.1 to v1.2.0 (see the makezero urls, strip_prefix and sha256
entries), which is out-of-scope for the AWS SDK v1→v2 backport; either confirm
it is intentionally required (e.g., transitively needed) or revert this block to
the prior v1.1.1 values — restore the original sha256, strip_prefix and urls for
makezero if reverting — and add a commit message explaining the decision so the
backport only includes AWS SDK-related changes unless the bump is explicitly
required.
br/pkg/aws/ebs.go (1)

372-396: 💤 Low value

Optional: simplify getFSRCreditBalance signature now that snapShotIDs is []string.

getFSRCreditBalance still takes snapshotID *string, forcing the caller at line 387 to wrap with aws.String(...) only to have it dereferenced/used as a v2 pointer field internally. Since the migration changed snapShotIDs to []string, taking string and constructing aws.String once at the v2 API boundary inside getFSRCreditBalance reads more cleanly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/aws/ebs.go` around lines 372 - 396, The getFSRCreditBalance signature
should accept a string (snapshotID string) instead of *string so callers like
waitDataFSREnabled can pass snapShotIDs[startIdx] directly; change
getFSRCreditBalance(snapshotID *string, targetAZ string) to
getFSRCreditBalance(snapshotID string, targetAZ string), update all call sites
(e.g., waitDataFSREnabled) to pass the plain string, and move any
aws.String(...) conversion into getFSRCreditBalance at the AWS SDK boundary
(where DescribeMetrics/CloudWatch calls need *string) so pointer creation
happens once inside the helper.
pkg/lightning/log/log.go (1)

192-214: 💤 Low value

LGTM with a small note.

The smithy-based cancellation detection is correct. Note that goerrors.Is(err, context.Canceled) on line 192 already traverses through *smithy.OperationError.Unwrap(), so the explicit opErr.Unwrap() checks at lines 206 and 209 are largely defensive duplicates. The only one with marginal value is the status.Code(opErr.Unwrap()) check (since status.Code doesn't unwrap on its own). Not blocking — feel free to leave the extra checks as belt-and-suspenders.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/lightning/log/log.go` around lines 192 - 214, The code contains redundant
cancellation checks because goerrors.Is(err, context.Canceled) already unwraps
through *smithy.OperationError, so in the block that handles var opErr
*smithy.OperationError you should remove the duplicate
goerrors.Is(opErr.Unwrap(), context.Canceled) and goerrors.As(opErr.Unwrap(),
&canceledError) checks and only keep the status.Code(opErr.Unwrap()) ==
codes.Canceled check (or alternatively drop the entire opErr branch if you
prefer the single goerrors.Is + smithy.CanceledError checks to suffice); update
the block that references opErr, canceledError, goerrors.Is, goerrors.As, and
status.Code accordingly to avoid duplicate logic while preserving the
status.Code check which does not unwrap.
br/pkg/kms/aws.go (1)

86-117: ⚡ Quick win

Consolidate duplicate error classification.

The second switch (lines 107–116) over the v2 typed exceptions is unreachable: *types.NotFoundException, *types.InvalidKeyUsageException, *types.DependencyTimeoutException, and *types.KMSInternalException all implement smithy.APIError (via embedded smithy.GenericAPIError), so the first errors.As(err, &apiErr) + apiErr.ErrorCode() switch already matches them. Consolidate to a single switch with a default: arm returning "KMS error", and remove the types import (line 13), which will become unused.

♻️ Proposed simplification
-// classifyDecryptError uses v2 SDK error types
 func classifyDecryptError(err error) error {
 	var apiErr smithy.APIError
 	if errors.As(err, &apiErr) {
 		switch apiErr.ErrorCode() {
-		case "NotFoundException":
-			return pErrors.Annotate(err, "wrong master key")
-		case "InvalidKeyUsageException":
+		case "NotFoundException", "InvalidKeyUsageException":
 			return pErrors.Annotate(err, "wrong master key")
 		case "DependencyTimeoutException":
 			return pErrors.Annotate(err, "API timeout")
 		case "KMSInternalException":
 			return pErrors.Annotate(err, "API internal error")
+		default:
+			return pErrors.Annotate(err, "KMS error")
 		}
 	}
-
-	// also check for specific v2 error types
-	var notFoundErr *types.NotFoundException
-	var invalidKeyErr *types.InvalidKeyUsageException
-	var timeoutErr *types.DependencyTimeoutException
-	var internalErr *types.KMSInternalException
-
-	switch {
-	case errors.As(err, &notFoundErr), errors.As(err, &invalidKeyErr):
-		return pErrors.Annotate(err, "wrong master key")
-	case errors.As(err, &timeoutErr):
-		return pErrors.Annotate(err, "API timeout")
-	case errors.As(err, &internalErr):
-		return pErrors.Annotate(err, "API internal error")
-	default:
-		return pErrors.Annotate(err, "KMS error")
-	}
+	return pErrors.Annotate(err, "KMS error")
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/kms/aws.go` around lines 86 - 117, The classifyDecryptError function
duplicates error handling: remove the second switch that checks v2 typed
exceptions and consolidate all classification into the initial smithy.APIError
path (the errors.As(err, &apiErr) block) by adding a default: case that returns
pErrors.Annotate(err, "KMS error") when the ErrorCode() doesn't match known
codes; drop the unused types import and any now-unreferenced local variables
(notFoundErr, invalidKeyErr, timeoutErr, internalErr) so classification relies
solely on apiErr.ErrorCode().
br/pkg/aws/ebs_test.go (1)

85-127: 💤 Low value

Consider reducing test logic duplication.

The testWaitSnapshotsCreated test helper duplicates snapshot-processing logic from the production code. While this approach works for testing, it creates a maintenance burden—changes to the production logic must be manually mirrored here.

Consider whether the production WaitSnapshotsCreated could accept an injectable EC2 API interface (or use the existing S3API-style abstraction pattern) to allow direct testing with mocked DescribeSnapshots responses, eliminating the need for this parallel test-only implementation.

Note: Line 123's indefinite block (<-make(chan struct{})) is intentional to simulate timeout behavior when uncompleted snapshots remain.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/aws/ebs_test.go` around lines 85 - 127, testWaitSnapshotsCreated
duplicates snapshot-processing logic from production; refactor so production
WaitSnapshotsCreated accepts an injectable EC2 DescribeSnapshots interface (or
follow the S3API-style abstraction) and move shared processing into the real
WaitSnapshotsCreated (or a helper used by both) so tests call
WaitSnapshotsCreated with a mocked DescribeSnapshots response instead of a
test-only implementation; update EC2Session to depend on the new interface,
reuse extractSnapProgress, and remove the duplicate testWaitSnapshotsCreated
logic (keep the blocking <-make(chan struct{}) behavior in tests by returning an
uncompleted-snapshots condition from the mocked DescribeSnapshots).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@br/pkg/storage/s3_test.go`:
- Around line 552-557: The mock callback for
s.s3.EXPECT().DeleteObject(...).DoAndReturn currently declares and returns a
*s3.DeleteObjectInput but the real DeleteObject signature returns
(*s3.DeleteObjectOutput, error); update the DoAndReturn callback signature and
return value to return (*s3.DeleteObjectOutput, error) and return an instance
like &s3.DeleteObjectOutput{} (keeping the same assertions against input.Bucket
and input.Key) so the mock types match the DeleteObject method.

In `@br/pkg/storage/s3.go`:
- Around line 659-665: In PutAndDeleteObjectCheck avoid treating a nil err as an
error: only inspect err2 with goerrors.As when err2 != nil. After calling
svc.DeleteObject(ctx, input) check if err2 is non-nil, then use
goerrors.As(err2, &noSuchKey) to suppress warnings for NoSuchKey but call
log.Warn(...) with zap.Error(err2) for other errors; reference svc.DeleteObject,
err2, types.NoSuchKey, goerrors.As and log.Warn to locate where to add the nil
check.
- Around line 462-490: The S3 client setup currently applies credentials twice:
via config (config.LoadDefaultConfig / cfg.Credentials) and again by appending
an s3 option that sets o.Credentials from autoNewCred; when qs.RoleArn is set
this later overwrite breaks role assumption. Fix by preventing the redundant
s3Opts append that sets o.Credentials when qs.RoleArn is non-empty (or simply
remove the s3Opts credential-setting block that uses autoNewCred), so that when
RoleArn is provided cfg.Credentials (the stscreds.NewAssumeRoleProvider) remains
the effective credentials provider.

In `@br/pkg/task/operator/test_storage.go`:
- Around line 476-530: The test hardcodes startOffset/endOffset (used in the
storage.ReaderOption passed to tc.Store.Open) to [100,200) which fails when
expectedData is smaller (e.g. --test-data-size=128); modify the test to compute
a valid range from len(expectedData) (or validate a minimum size up front):
compute endOffset := int64(len(expectedData)) and if endOffset <= 100 set
startOffset = 0 else set startOffset = endOffset-100 (or clamp endOffset to
startOffset + desiredRange), then use these startOffset/endOffset values in the
ReaderOption so the range is always within expectedData; alternatively add an
explicit minimum-size check in ParseFromFlags and fail fast if test-data-size <
200.
- Around line 226-227: The current logger.Info call is logging
cfg.BackendOptions verbatim (via zap.Any) which can leak credentials; instead
build a redacted representation (e.g., pick only non-sensitive keys or mask
values for keys like access_key, secret_key, session_token, password) and pass
that to logger.Info; update the call that uses logger.Info("Starting external
storage test", zap.String("storage", cfg.StorageURI), zap.Any("options",
cfg.BackendOptions)) to use the redacted map/struct so only safe fields from
cfg.BackendOptions are logged and any sensitive fields are removed or masked.
- Around line 268-279: The current deferred call defer cleanup() runs even on
failure; change it so cleanup respects cfg.CleanupOnSuccess and only runs on
successful tests by adding a boolean flag (e.g., succeeded) captured by the
deferred cleanup or by removing the unconditional defer and invoking cleanup
conditionally at the end of the test; ensure the cleanup function still checks
cfg.CleanupOnSuccess and only performs deletions when both cfg.CleanupOnSuccess
is true and succeeded is true (update references to cleanup,
cfg.CleanupOnSuccess and the defer cleanup() placement accordingly).

---

Outside diff comments:
In `@br/pkg/aws/ebs.go`:
- Around line 530-533: The disable-FSR error return may nil-deref
*resp.Unsuccessful[0].SnapshotId like the EnableDataFSR path; change the
interpolation to use aws.ToString(resp.Unsuccessful[0].SnapshotId) (same pattern
used in EnableDataFSR) so nil snapshot IDs become empty strings instead of
panics, keeping the descriptive message and FastSnapshotRestoreStateErrors
as-is.
- Around line 359-362: The error path dereferences
resp.Unsuccessful[0].SnapshotId directly which can be nil and cause a panic;
change the formatting to use aws.ToString(resp.Unsuccessful[0].SnapshotId)
(consistent with other uses in this file) when building the error message and
similarly replace any other raw *string derefs (e.g., the one flagged around
line 532) so the error log and the returned errors reference nil-safe string
conversions for SnapshotId while keeping FastSnapshotRestoreStateErrors
unchanged.

In `@br/pkg/storage/s3.go`:
- Around line 631-642: The getObjectCheck currently swallows non-API errors by
returning nil when err != nil but not an APIError; change the logic to treat any
non-nil err as a failure unless it's an APIError with ErrorCode() == noSuchKey:
call svc.GetObject(ctx, input), if err != nil then if goerrors.As(err, &aerr)
and aerr.ErrorCode() == noSuchKey return nil; otherwise return
errors.Trace(err); only return nil when svc.GetObject succeeded or when the
error is specifically noSuchKey. Ensure you reference svc.GetObject,
smithy.APIError (aerr) and the noSuchKey constant in your change.

---

Nitpick comments:
In `@br/pkg/aws/ebs_test.go`:
- Around line 85-127: testWaitSnapshotsCreated duplicates snapshot-processing
logic from production; refactor so production WaitSnapshotsCreated accepts an
injectable EC2 DescribeSnapshots interface (or follow the S3API-style
abstraction) and move shared processing into the real WaitSnapshotsCreated (or a
helper used by both) so tests call WaitSnapshotsCreated with a mocked
DescribeSnapshots response instead of a test-only implementation; update
EC2Session to depend on the new interface, reuse extractSnapProgress, and remove
the duplicate testWaitSnapshotsCreated logic (keep the blocking <-make(chan
struct{}) behavior in tests by returning an uncompleted-snapshots condition from
the mocked DescribeSnapshots).

In `@br/pkg/aws/ebs.go`:
- Around line 372-396: The getFSRCreditBalance signature should accept a string
(snapshotID string) instead of *string so callers like waitDataFSREnabled can
pass snapShotIDs[startIdx] directly; change getFSRCreditBalance(snapshotID
*string, targetAZ string) to getFSRCreditBalance(snapshotID string, targetAZ
string), update all call sites (e.g., waitDataFSREnabled) to pass the plain
string, and move any aws.String(...) conversion into getFSRCreditBalance at the
AWS SDK boundary (where DescribeMetrics/CloudWatch calls need *string) so
pointer creation happens once inside the helper.

In `@br/pkg/kms/aws.go`:
- Around line 86-117: The classifyDecryptError function duplicates error
handling: remove the second switch that checks v2 typed exceptions and
consolidate all classification into the initial smithy.APIError path (the
errors.As(err, &apiErr) block) by adding a default: case that returns
pErrors.Annotate(err, "KMS error") when the ErrorCode() doesn't match known
codes; drop the unused types import and any now-unreferenced local variables
(notFoundErr, invalidKeyErr, timeoutErr, internalErr) so classification relies
solely on apiErr.ErrorCode().

In `@DEPS.bzl`:
- Around line 414-422: The DEPS.bzl change upgrades the makezero dependency from
v1.1.1 to v1.2.0 (see the makezero urls, strip_prefix and sha256 entries), which
is out-of-scope for the AWS SDK v1→v2 backport; either confirm it is
intentionally required (e.g., transitively needed) or revert this block to the
prior v1.1.1 values — restore the original sha256, strip_prefix and urls for
makezero if reverting — and add a commit message explaining the decision so the
backport only includes AWS SDK-related changes unless the bump is explicitly
required.

In `@pkg/lightning/log/log.go`:
- Around line 192-214: The code contains redundant cancellation checks because
goerrors.Is(err, context.Canceled) already unwraps through
*smithy.OperationError, so in the block that handles var opErr
*smithy.OperationError you should remove the duplicate
goerrors.Is(opErr.Unwrap(), context.Canceled) and goerrors.As(opErr.Unwrap(),
&canceledError) checks and only keep the status.Code(opErr.Unwrap()) ==
codes.Canceled check (or alternatively drop the entire opErr branch if you
prefer the single goerrors.Is + smithy.CanceledError checks to suffice); update
the block that references opErr, canceledError, goerrors.Is, goerrors.As, and
status.Code accordingly to avoid duplicate logic while preserving the
status.Code check which does not unwrap.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c1e8e88e-9b9f-46f2-bfa3-c389b546fd9e

📥 Commits

Reviewing files that changed from the base of the PR and between 34227a6 and 8cfe383.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (32)
  • DEPS.bzl
  • Makefile
  • br/cmd/br/operator.go
  • br/pkg/aws/BUILD.bazel
  • br/pkg/aws/ebs.go
  • br/pkg/aws/ebs_test.go
  • br/pkg/kms/BUILD.bazel
  • br/pkg/kms/aws.go
  • br/pkg/mock/BUILD.bazel
  • br/pkg/mock/s3_v2.go
  • br/pkg/mock/s3iface.go
  • br/pkg/storage/BUILD.bazel
  • br/pkg/storage/azblob.go
  • br/pkg/storage/compatibility_for_test.go
  • br/pkg/storage/helper.go
  • br/pkg/storage/ks3.go
  • br/pkg/storage/s3.go
  • br/pkg/storage/s3_interface.go
  • br/pkg/storage/s3_test.go
  • br/pkg/storage/storage.go
  • br/pkg/task/operator/BUILD.bazel
  • br/pkg/task/operator/test_storage.go
  • br/pkg/task/restore_ebs_meta.go
  • build/nogo_config.json
  • go.mod
  • pkg/lightning/backend/external/BUILD.bazel
  • pkg/lightning/backend/external/byte_reader.go
  • pkg/lightning/backend/external/byte_reader_test.go
  • pkg/lightning/log/BUILD.bazel
  • pkg/lightning/log/log.go
  • pkg/lightning/log/log_test.go
  • pkg/sessionctx/variable/sysvar_test.go
💤 Files with no reviewable changes (1)
  • Makefile

Comment thread br/pkg/storage/s3_test.go
Comment on lines 552 to 557
s.s3.EXPECT().
DeleteObjectWithContext(ctx, gomock.Any()).
DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, opt ...request.Option) (*s3.DeleteObjectInput, error) {
require.Equal(t, "bucket", aws.StringValue(input.Bucket))
require.Equal(t, "prefix/file", aws.StringValue(input.Key))
DeleteObject(gomock.Any(), gomock.Any(), gomock.Any()).
DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, _ ...func(*s3.Options)) (*s3.DeleteObjectInput, error) {
require.Equal(t, "bucket", aws.ToString(input.Bucket))
require.Equal(t, "prefix/file", aws.ToString(input.Key))
return &s3.DeleteObjectInput{}, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# Locate the files mentioned in the review
find . -name "s3_test.go" -path "*/br/pkg/storage/*" 2>/dev/null
find . -name "s3_v2.go" -path "*/br/pkg/mock/*" 2>/dev/null

Repository: pingcap/tidb

Length of output: 107


🏁 Script executed:

# Read the specific lines from s3_test.go
if [ -f "br/pkg/storage/s3_test.go" ]; then
  sed -n '550,560p' "br/pkg/storage/s3_test.go" | cat -n
fi

Repository: pingcap/tidb

Length of output: 566


🏁 Script executed:

# Find and examine the DeleteObject method signature in the mock
if [ -f "br/pkg/mock/s3_v2.go" ]; then
  grep -A 5 "DeleteObject" "br/pkg/mock/s3_v2.go" | head -20
fi

Repository: pingcap/tidb

Length of output: 856


Return *s3.DeleteObjectOutput from this mock callback.

The DoAndReturn callback is typed to return (*s3.DeleteObjectInput, error) and returns &s3.DeleteObjectInput{}, but DeleteObject is mocked to return (*s3.DeleteObjectOutput, error). This type mismatch causes GoMock to reject the expectation, breaking the test.

Suggested fix
 s.s3.EXPECT().
 	DeleteObject(gomock.Any(), gomock.Any(), gomock.Any()).
-	DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, _ ...func(*s3.Options)) (*s3.DeleteObjectInput, error) {
+	DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, _ ...func(*s3.Options)) (*s3.DeleteObjectOutput, error) {
 		require.Equal(t, "bucket", aws.ToString(input.Bucket))
 		require.Equal(t, "prefix/file", aws.ToString(input.Key))
-		return &s3.DeleteObjectInput{}, nil
+		return &s3.DeleteObjectOutput{}, nil
 	})
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
s.s3.EXPECT().
DeleteObjectWithContext(ctx, gomock.Any()).
DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, opt ...request.Option) (*s3.DeleteObjectInput, error) {
require.Equal(t, "bucket", aws.StringValue(input.Bucket))
require.Equal(t, "prefix/file", aws.StringValue(input.Key))
DeleteObject(gomock.Any(), gomock.Any(), gomock.Any()).
DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, _ ...func(*s3.Options)) (*s3.DeleteObjectInput, error) {
require.Equal(t, "bucket", aws.ToString(input.Bucket))
require.Equal(t, "prefix/file", aws.ToString(input.Key))
return &s3.DeleteObjectInput{}, nil
s.s3.EXPECT().
DeleteObject(gomock.Any(), gomock.Any(), gomock.Any()).
DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, _ ...func(*s3.Options)) (*s3.DeleteObjectOutput, error) {
require.Equal(t, "bucket", aws.ToString(input.Bucket))
require.Equal(t, "prefix/file", aws.ToString(input.Key))
return &s3.DeleteObjectOutput{}, nil
})
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/storage/s3_test.go` around lines 552 - 557, The mock callback for
s.s3.EXPECT().DeleteObject(...).DoAndReturn currently declares and returns a
*s3.DeleteObjectInput but the real DeleteObject signature returns
(*s3.DeleteObjectOutput, error); update the DoAndReturn callback signature and
return value to return (*s3.DeleteObjectOutput, error) and return an instance
like &s3.DeleteObjectOutput{} (keeping the same assertions against input.Bucket
and input.Key) so the mock types match the DeleteObject method.

Comment thread br/pkg/storage/s3.go
Comment on lines +462 to +490
if qs.Profile == "" {
cred, err := autoNewCred(&qs)
if err != nil {
return nil, errors.Trace(err)
}
if cred != nil {
s3Opts = append(s3Opts, func(o *s3.Options) {
o.Credentials = cred
})
}
}

// Handle role assumption if specified
if len(qs.RoleArn) > 0 {
// Create STS client for role assumption
stsClient := sts.NewFromConfig(cfg)

// Configure assume role provider with external ID if provided
var assumeRoleProvider aws.CredentialsProvider
if len(qs.ExternalId) > 0 {
assumeRoleProvider = stscreds.NewAssumeRoleProvider(stsClient, qs.RoleArn, func(o *stscreds.AssumeRoleOptions) {
o.ExternalID = &qs.ExternalId
})
} else {
assumeRoleProvider = stscreds.NewAssumeRoleProvider(stsClient, qs.RoleArn)
}

// Update config with assume role credentials
cfg.Credentials = assumeRoleProvider
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

cd br/pkg/storage && head -c 100 s3.go && wc -l s3.go

Repository: pingcap/tidb

Length of output: 167


🏁 Script executed:

# Get the specific lines mentioned in the review
sed -n '462,490p' br/pkg/storage/s3.go

Repository: pingcap/tidb

Length of output: 902


🏁 Script executed:

# Get more context around the credential setup to understand the full flow
sed -n '440,520p' br/pkg/storage/s3.go

Repository: pingcap/tidb

Length of output: 2911


🏁 Script executed:

# Search for NewS3Storage function definition
rg -n "func.*NewS3Storage" br/pkg/storage/s3.go

Repository: pingcap/tidb

Length of output: 259


🏁 Script executed:

# Get the complete function to see the full flow
sed -n '372,530p' br/pkg/storage/s3.go

Repository: pingcap/tidb

Length of output: 5337


🌐 Web query:

AWS SDK Go v2 s3.NewFromConfig options credential override behavior

💡 Result:

In AWS SDK for Go v2, s3.NewFromConfig copies credentials from the provided aws.Config, and the constructor/service Options you pass can override those credentials (including disabling signing by forcing anonymous/nil credentials). 1) Default behavior (no override): uses cfg.Credentials (which comes from config.LoadDefaultConfig) - aws.Config has a Credentials field (“The credentials object to use when signing requests”). [1] - When you create the aws.Config via config.LoadDefaultConfig, the SDK resolves credentials via its default credential chain; and if you explicitly provide credentials, “the SDK uses only those credentials.” [2] - s3.NewFromConfig builds the S3 service Options using cfg.Credentials (among other fields), then calls New(opts, optFns...). [3] Practical implication: - If you call cfg, _:= config.LoadDefaultConfig(...), then client:= s3.NewFromConfig(cfg), S3 will sign requests using whatever credentials were resolved into cfg.Credentials. [2][3] 2) Overriding credentials at the S3 client level (optFn / Options): the optFn mutates s3.Options, including Credentials - s3.NewFromConfig takes functional options that mutate the service’s Options struct. [4] - The S3 service Options struct contains Credentials aws.CredentialsProvider (“The credentials object to use when signing requests.”). [5] - s3.NewFromConfig initially sets Options.Credentials = cfg.Credentials before applying optFns, so an optFn can replace o.Credentials and thereby override the credentials used for signing. [3][5] 3) Disabling signing / forcing “anonymous” behavior - The SDK documents that using the AnonymousCredentials provider instructs signing middleware to not sign requests. [6] - It also states: “Using nil credentials when configuring an API client will achieve the same result.” and shows that when using external config loading, you can use config.WithCredentialsProvider(aws.AnonymousCredentials{}) to prevent credential resolution attempts. [6] - Additionally, the same source notes that for NewFromConfig, if the external config resolved a credential provider, you need to explicitly set the API client Options Credentials member to nil to avoid signing; otherwise it may keep the resolved provider. [6] - In the generated S3 service code, there’s special handling to “ignoreAnonymousAuth” by converting AnonymousCredentials to nil. [5] 4) Important distinction: you must use config.LoadDefaultConfig to get the default credential-chain behavior - The credentials “defaults to a chain…” behavior occurs when you use config.LoadDefaultConfig, not automatically just by having an empty aws.Config. This is clarified in an SDK issue discussion. [7] Example patterns (behaviorally accurate) A. Use the default credential chain from env/profile/roles - cfg:= config.LoadDefaultConfig(ctx,...) - client:= s3.NewFromConfig(cfg) (Uses cfg.Credentials resolved by LoadDefaultConfig.) [2][3] B. Override credentials explicitly for the S3 client - client:= s3.NewFromConfig(cfg, func(o *s3.Options) { o.Credentials = myCredProvider }) (Option mutates o.Credentials; Options.Credentials is what signing middleware uses.) [4][5][3] C. Disable signing (public S3 objects / no signature) - client:= s3.NewFromConfig(cfg, func(o *s3.Options) { o.Credentials = nil }) (or o.Credentials = aws.AnonymousCredentials{}) (This is documented as equivalent to using nil for signing middleware behavior.) [6][5] Net answer to “credential override behavior” - s3.NewFromConfig starts with cfg.Credentials. [3] - Any s3.NewFromConfig option function that sets *s3.Options.Credentials overrides cfg.Credentials for request signing. [3][5] - Setting Credentials to aws.AnonymousCredentials{} or nil disables request signing (and may also be converted to nil internally for S3). [6][5]

Citations:


🏁 Script executed:

# Also check if there are any tests that exercise both RoleArn and credential scenarios
find br -name "*test*.go" -type f | xargs rg -l "RoleArn\|AssumeRole" | head -5

Repository: pingcap/tidb

Length of output: 38


🏁 Script executed:

# Look for any comments or documentation in the code about credential precedence
rg -n "credential|Credential" br/pkg/storage/s3.go | head -20

Repository: pingcap/tidb

Length of output: 1401


Remove duplicative credential setup at S3 options level when role assumption is used.

The code sets credentials twice: first at the config level (lines 406–416 via config.WithCredentialsProvider), then again at the S3 options level (lines 461–470 via o.Credentials = cred). When RoleArn is specified, cfg.Credentials is updated to the STS assume-role provider, but the S3 options function still applies afterward, overriding it with the static credentials. This prevents role assumption from working.

The simplest fix is to skip the redundant S3 options credential setup when RoleArn is present:

Suggested fix
-	if qs.Profile == "" {
+	if qs.Profile == "" && len(qs.RoleArn) == 0 {
 		cred, err := autoNewCred(&qs)
 		if err != nil {
 			return nil, errors.Trace(err)
 		}
 		if cred != nil {
 			s3Opts = append(s3Opts, func(o *s3.Options) {
 				o.Credentials = cred
 			})
 		}
 	}

Alternatively, remove the entire lines 461–470 block since credentials are already set during config.LoadDefaultConfig (lines 406–416), making the S3 options setup unnecessary.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if qs.Profile == "" {
cred, err := autoNewCred(&qs)
if err != nil {
return nil, errors.Trace(err)
}
if cred != nil {
s3Opts = append(s3Opts, func(o *s3.Options) {
o.Credentials = cred
})
}
}
// Handle role assumption if specified
if len(qs.RoleArn) > 0 {
// Create STS client for role assumption
stsClient := sts.NewFromConfig(cfg)
// Configure assume role provider with external ID if provided
var assumeRoleProvider aws.CredentialsProvider
if len(qs.ExternalId) > 0 {
assumeRoleProvider = stscreds.NewAssumeRoleProvider(stsClient, qs.RoleArn, func(o *stscreds.AssumeRoleOptions) {
o.ExternalID = &qs.ExternalId
})
} else {
assumeRoleProvider = stscreds.NewAssumeRoleProvider(stsClient, qs.RoleArn)
}
// Update config with assume role credentials
cfg.Credentials = assumeRoleProvider
if qs.Profile == "" && len(qs.RoleArn) == 0 {
cred, err := autoNewCred(&qs)
if err != nil {
return nil, errors.Trace(err)
}
if cred != nil {
s3Opts = append(s3Opts, func(o *s3.Options) {
o.Credentials = cred
})
}
}
// Handle role assumption if specified
if len(qs.RoleArn) > 0 {
// Create STS client for role assumption
stsClient := sts.NewFromConfig(cfg)
// Configure assume role provider with external ID if provided
var assumeRoleProvider aws.CredentialsProvider
if len(qs.ExternalId) > 0 {
assumeRoleProvider = stscreds.NewAssumeRoleProvider(stsClient, qs.RoleArn, func(o *stscreds.AssumeRoleOptions) {
o.ExternalID = &qs.ExternalId
})
} else {
assumeRoleProvider = stscreds.NewAssumeRoleProvider(stsClient, qs.RoleArn)
}
// Update config with assume role credentials
cfg.Credentials = assumeRoleProvider
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/storage/s3.go` around lines 462 - 490, The S3 client setup currently
applies credentials twice: via config (config.LoadDefaultConfig /
cfg.Credentials) and again by appending an s3 option that sets o.Credentials
from autoNewCred; when qs.RoleArn is set this later overwrite breaks role
assumption. Fix by preventing the redundant s3Opts append that sets
o.Credentials when qs.RoleArn is non-empty (or simply remove the s3Opts
credential-setting block that uses autoNewCred), so that when RoleArn is
provided cfg.Credentials (the stscreds.NewAssumeRoleProvider) remains the
effective credentials provider.

Comment thread br/pkg/storage/s3.go
Comment on lines +659 to 665
_, err2 := svc.DeleteObject(ctx, input)
var noSuchKey *types.NoSuchKey
if !goerrors.As(err2, &noSuchKey) {
log.Warn("failed to delete object used for permission check",
zap.String("bucket", options.Bucket),
zap.String("key", *input.Key), zap.Error(err2))
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid warning on successful cleanup in PutAndDeleteObjectCheck.

goerrors.As(nil, &noSuchKey) is false, so Line 662 logs "failed to delete..." even when DeleteObject succeeds. That will emit a misleading warning on every successful permission check.

Suggested fix
 	_, err2 := svc.DeleteObject(ctx, input)
 	var noSuchKey *types.NoSuchKey
-	if !goerrors.As(err2, &noSuchKey) {
+	if err2 != nil && !goerrors.As(err2, &noSuchKey) {
 		log.Warn("failed to delete object used for permission check",
 			zap.String("bucket", options.Bucket),
 			zap.String("key", *input.Key), zap.Error(err2))
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
_, err2 := svc.DeleteObject(ctx, input)
var noSuchKey *types.NoSuchKey
if !goerrors.As(err2, &noSuchKey) {
log.Warn("failed to delete object used for permission check",
zap.String("bucket", options.Bucket),
zap.String("key", *input.Key), zap.Error(err2))
}
_, err2 := svc.DeleteObject(ctx, input)
var noSuchKey *types.NoSuchKey
if err2 != nil && !goerrors.As(err2, &noSuchKey) {
log.Warn("failed to delete object used for permission check",
zap.String("bucket", options.Bucket),
zap.String("key", *input.Key), zap.Error(err2))
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/storage/s3.go` around lines 659 - 665, In PutAndDeleteObjectCheck
avoid treating a nil err as an error: only inspect err2 with goerrors.As when
err2 != nil. After calling svc.DeleteObject(ctx, input) check if err2 is
non-nil, then use goerrors.As(err2, &noSuchKey) to suppress warnings for
NoSuchKey but call log.Warn(...) with zap.Error(err2) for other errors;
reference svc.DeleteObject, err2, types.NoSuchKey, goerrors.As and log.Warn to
locate where to add the nil check.

Comment on lines +226 to +227
logger := logutil.Logger(ctx)
logger.Info("Starting external storage test", zap.String("storage", cfg.StorageURI), zap.Any("options", cfg.BackendOptions))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't log BackendOptions verbatim.

zap.Any("options", cfg.BackendOptions) will serialize storage credentials when the backend is configured via flags, including access keys and session tokens. Please log a redacted subset instead.

Suggested fix
-	logger.Info("Starting external storage test", zap.String("storage", cfg.StorageURI), zap.Any("options", cfg.BackendOptions))
+	logger.Info("Starting external storage test", zap.String("storage", cfg.StorageURI))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
logger := logutil.Logger(ctx)
logger.Info("Starting external storage test", zap.String("storage", cfg.StorageURI), zap.Any("options", cfg.BackendOptions))
logger := logutil.Logger(ctx)
logger.Info("Starting external storage test", zap.String("storage", cfg.StorageURI))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/task/operator/test_storage.go` around lines 226 - 227, The current
logger.Info call is logging cfg.BackendOptions verbatim (via zap.Any) which can
leak credentials; instead build a redacted representation (e.g., pick only
non-sensitive keys or mask values for keys like access_key, secret_key,
session_token, password) and pass that to logger.Info; update the call that uses
logger.Info("Starting external storage test", zap.String("storage",
cfg.StorageURI), zap.Any("options", cfg.BackendOptions)) to use the redacted
map/struct so only safe fields from cfg.BackendOptions are logged and any
sensitive fields are removed or masked.

Comment on lines +268 to +279
cleanup := func() {
if !cfg.CleanupOnSuccess {
return
}
printStep("Cleaning up test files...")
_ = store.DeleteFile(ctx, testFileName1)
_ = store.DeleteFile(ctx, testFileName2)
_ = store.DeleteFile(ctx, testFileNameRenamed)
_ = store.DeleteFile(ctx, filepath.Join(testDirName, testFileName1))
_ = store.DeleteFile(ctx, filepath.Join(testDirName, testFileName2))
}
defer cleanup()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

--cleanup currently runs on failure too.

The deferred cleanup is unconditional, so test artifacts are removed even when cases fail. That contradicts the CleanupOnSuccess name/help text and makes --pause-when-fail much less useful.

Suggested fix
-	defer cleanup()
+	defer func() {
+		if cfg.CleanupOnSuccess && report.FailedTests == 0 {
+			cleanup()
+		}
+	}()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cleanup := func() {
if !cfg.CleanupOnSuccess {
return
}
printStep("Cleaning up test files...")
_ = store.DeleteFile(ctx, testFileName1)
_ = store.DeleteFile(ctx, testFileName2)
_ = store.DeleteFile(ctx, testFileNameRenamed)
_ = store.DeleteFile(ctx, filepath.Join(testDirName, testFileName1))
_ = store.DeleteFile(ctx, filepath.Join(testDirName, testFileName2))
}
defer cleanup()
cleanup := func() {
if !cfg.CleanupOnSuccess {
return
}
printStep("Cleaning up test files...")
_ = store.DeleteFile(ctx, testFileName1)
_ = store.DeleteFile(ctx, testFileName2)
_ = store.DeleteFile(ctx, testFileNameRenamed)
_ = store.DeleteFile(ctx, filepath.Join(testDirName, testFileName1))
_ = store.DeleteFile(ctx, filepath.Join(testDirName, testFileName2))
}
defer func() {
if cfg.CleanupOnSuccess && report.FailedTests == 0 {
cleanup()
}
}()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/task/operator/test_storage.go` around lines 268 - 279, The current
deferred call defer cleanup() runs even on failure; change it so cleanup
respects cfg.CleanupOnSuccess and only runs on successful tests by adding a
boolean flag (e.g., succeeded) captured by the deferred cleanup or by removing
the unconditional defer and invoking cleanup conditionally at the end of the
test; ensure the cleanup function still checks cfg.CleanupOnSuccess and only
performs deletions when both cfg.CleanupOnSuccess is true and succeeded is true
(update references to cleanup, cfg.CleanupOnSuccess and the defer cleanup()
placement accordingly).

Comment on lines +476 to +530
startOffset := int64(100)
endOffset := int64(200)
reader, err := tc.Store.Open(tc.Ctx, name, &storage.ReaderOption{
StartOffset: &startOffset,
EndOffset: &endOffset,
})

result := TestResult{
Name: testName,
}

if err != nil {
printError(" ❌ Failed: %v", err)
result.Duration = time.Since(start)
result.Passed = false
result.Error = errors.Annotate(err, "Open with range failed")
tc.AddResult(result)
return
}
defer reader.Close()

data, err := io.ReadAll(reader)
duration := time.Since(start)
result.Duration = duration

if err != nil {
printError(" ❌ Failed to read: %v", err)
result.Passed = false
result.Error = errors.Annotate(err, "ReadAll failed")
tc.AddResult(result)
return
}

expectedSize := endOffset - startOffset
if int64(len(data)) != expectedSize {
printError(" ❌ Failed: expected size %d, got %d", expectedSize, len(data))
result.Passed = false
result.Error = errors.Errorf("Range read returned wrong size: expected %d, got %d", expectedSize, len(data))
} else {
// Verify data matches the expected range
dataValid := true
for i := range data {
if data[i] != expectedData[startOffset+int64(i)] {
printError(" ❌ Failed: data mismatch at offset %d", i)
result.Passed = false
result.Error = errors.Errorf("Range read returned corrupted data at offset %d", i)
dataValid = false
break
}
}
if dataValid {
printSuccess(" ✓ Passed (read %d bytes)", len(data))
result.Passed = true
result.Details = fmt.Sprintf("Read %d bytes from range [%d, %d)", len(data), startOffset, endOffset)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Small --test-data-size values make this range test fail spuriously.

This always requests bytes [100, 200), but ParseFromFlags accepts any positive size. A user can set --test-data-size=128 and get a harness failure even when the storage backend is correct. Either validate a minimum size up front or derive the range from len(expectedData).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/task/operator/test_storage.go` around lines 476 - 530, The test
hardcodes startOffset/endOffset (used in the storage.ReaderOption passed to
tc.Store.Open) to [100,200) which fails when expectedData is smaller (e.g.
--test-data-size=128); modify the test to compute a valid range from
len(expectedData) (or validate a minimum size up front): compute endOffset :=
int64(len(expectedData)) and if endOffset <= 100 set startOffset = 0 else set
startOffset = endOffset-100 (or clamp endOffset to startOffset + desiredRange),
then use these startOffset/endOffset values in the ReaderOption so the range is
always within expectedData; alternatively add an explicit minimum-size check in
ParseFromFlags and fail fast if test-data-size < 200.

@jebter
Copy link
Copy Markdown

jebter commented May 9, 2026

/retest

@tiprow
Copy link
Copy Markdown

tiprow Bot commented May 9, 2026

@jebter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@YuJuncen
Copy link
Copy Markdown
Contributor Author

YuJuncen commented May 9, 2026

/retest

@tiprow
Copy link
Copy Markdown

tiprow Bot commented May 9, 2026

@YuJuncen: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@YuJuncen YuJuncen changed the title br: upgrade AWS SDK from v1 to v2 #66634 | tidb-test=release-8.5-20260116-v8.5.5 tikv=release-8.5-20260423-v8.5.5 pd=v8.5.5 tiflash=v8.5.5 br: upgrade AWS SDK from v1 to v2 #66634 | tidb-test=release-8.5-20260116-v8.5.5 tikv=v8.5.5 pd=v8.5.5 tiflash=v8.5.5 May 9, 2026
@YuJuncen
Copy link
Copy Markdown
Contributor Author

YuJuncen commented May 9, 2026

/test pull-br-integration-test

@tiprow
Copy link
Copy Markdown

tiprow Bot commented May 9, 2026

@YuJuncen: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/test pull-br-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jebter jebter changed the title br: upgrade AWS SDK from v1 to v2 #66634 | tidb-test=release-8.5-20260116-v8.5.5 tikv=v8.5.5 pd=v8.5.5 tiflash=v8.5.5 br: upgrade AWS SDK from v1 to v2 #66634 | tidb-test=release-8.5-20260116-v8.5.5 tikv=release-8.5-20260423-v8.5.5 pd=v8.5.5 tiflash=v8.5.5 May 11, 2026
@jebter
Copy link
Copy Markdown

jebter commented May 11, 2026

/test pull-br-integration-test

@tiprow
Copy link
Copy Markdown

tiprow Bot commented May 11, 2026

@jebter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/test pull-br-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
br/tests/br_gcs/run.sh (1)

142-142: 💤 Low value

Optional: Simplify arithmetic variable reference.

In arithmetic contexts, the ${} syntax is unnecessary and can be simplified for consistency with shell best practices.

♻️ Proposed simplification
     for i in $(seq $DB_COUNT); do
-        row_count_new[${i}]=$(run_sql "SELECT COUNT(*) FROM $DB${i}.$TABLE;" | awk '/COUNT/{print $2}')
+        row_count_new[$i]=$(run_sql "SELECT COUNT(*) FROM $DB${i}.$TABLE;" | awk '/COUNT/{print $2}')
     done
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/tests/br_gcs/run.sh` at line 142, The array index and arithmetic-style
variable reference use unnecessary ${} expansion; change the assignment to use
the simpler form for the index and any pure arithmetic contexts (e.g. replace
row_count_new[${i}] with row_count_new[i]) while keeping ${DB}${i}.$TABLE or
other expansions that are needed for concatenation intact; update the assignment
that calls run_sql accordingly (referencing row_count_new, run_sql, DB, TABLE,
and i).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@br/tests/br_gcs/run.sh`:
- Line 92: The run_br invocation uses unquoted shell variables which can cause
word splitting/globbing (e.g. --pd $PD_ADDR); update the calls to quote all
variable expansions used in run_br and similar lines so each value is treated as
a single argument: wrap $PD_ADDR, $BUCKET, $DB, $GCS_HOST and $GCS_PORT in
double quotes wherever they are expanded (references: the run_br command and its
--pd and -s arguments shown in the diff; also apply the same quoting to the
other occurrences flagged at lines 101, 111, 139).

In `@br/tests/docker_compatible_gcs/_run.sh`:
- Line 45: The restore command invocation using "bin/br restore db" is passing
the PD address unquoted (PD_ADDR) which can cause word-splitting/globbing;
update the command to quote the variable (use "$PD_ADDR") in the --pd argument
so the entire address is passed as a single token. Locate the line with the
bin/br restore db command and replace --pd $PD_ADDR with --pd "$PD_ADDR".

---

Nitpick comments:
In `@br/tests/br_gcs/run.sh`:
- Line 142: The array index and arithmetic-style variable reference use
unnecessary ${} expansion; change the assignment to use the simpler form for the
index and any pure arithmetic contexts (e.g. replace row_count_new[${i}] with
row_count_new[i]) while keeping ${DB}${i}.$TABLE or other expansions that are
needed for concatenation intact; update the assignment that calls run_sql
accordingly (referencing row_count_new, run_sql, DB, TABLE, and i).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e87ecd62-248d-4df4-a6ff-b9eb65437f65

📥 Commits

Reviewing files that changed from the base of the PR and between 26a1046 and f0302db.

📒 Files selected for processing (3)
  • br/tests/br_gcs/run.sh
  • br/tests/docker_compatible_gcs/_run.sh
  • br/tests/docker_compatible_gcs/prepare.sh

Comment thread br/tests/br_gcs/run.sh
--cert "$TEST_DIR/certs/br.pem" \
--key "$TEST_DIR/certs/br.key" \
--pd $PD_ADDR -s "gcs://$BUCKET/${DB}_old?endpoint=http://$GCS_HOST:$GCS_PORT/storage/v1/" --check-requirements=false
run_br --pd $PD_ADDR backup full -s "gcs://$BUCKET/$DB?endpoint=http://$GCS_HOST:$GCS_PORT"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Quote variables to prevent word splitting.

Multiple instances of unquoted $PD_ADDR can lead to word splitting or globbing if the variable contains spaces or special characters.

🛡️ Proposed fix
 # Line 92
-run_br --pd $PD_ADDR backup full -s "gcs://$BUCKET/$DB?endpoint=http://$GCS_HOST:$GCS_PORT"
+run_br --pd "$PD_ADDR" backup full -s "gcs://$BUCKET/$DB?endpoint=http://$GCS_HOST:$GCS_PORT"

 # Line 101
         --key "$TEST_DIR/certs/br.key" \
-        --pd $PD_ADDR -s "gcs://$BUCKET/${DB}_old?endpoint=http://$GCS_HOST:$GCS_PORT/storage/v1/" --check-requirements=false
+        --pd "$PD_ADDR" -s "gcs://$BUCKET/${DB}_old?endpoint=http://$GCS_HOST:$GCS_PORT/storage/v1/" --check-requirements=false

 # Line 111
-run_br restore full -s "gcs://$BUCKET/$DB?" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false
+run_br restore full -s "gcs://$BUCKET/$DB?" --pd "$PD_ADDR" --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false

 # Line 139
-    run_br restore full -s "gcs://$BUCKET/${DB}_old" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false
+    run_br restore full -s "gcs://$BUCKET/${DB}_old" --pd "$PD_ADDR" --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false

Also applies to: 101-101, 111-111, 139-139

🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 92-92: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/tests/br_gcs/run.sh` at line 92, The run_br invocation uses unquoted shell
variables which can cause word splitting/globbing (e.g. --pd $PD_ADDR); update
the calls to quote all variable expansions used in run_br and similar lines so
each value is treated as a single argument: wrap $PD_ADDR, $BUCKET, $DB,
$GCS_HOST and $GCS_PORT in double quotes wherever they are expanded (references:
the run_br command and its --pd and -s arguments shown in the diff; also apply
the same quoting to the other occurrences flagged at lines 101, 111, 139).

echo "restore ${TAG} data starts..."
# after BR merged into TiDB we need skip version check because the build from tidb is not a release version.
bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT/storage/v1/" --check-requirements=false
bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Quote $PD_ADDR to prevent word splitting.

While unlikely to cause issues in practice, unquoted variables can lead to word splitting or globbing if they contain spaces or special characters.

🛡️ Proposed fix
-    bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false
+    bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd "$PD_ADDR" --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false
bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd "$PD_ADDR" --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false
🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 45-45: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/tests/docker_compatible_gcs/_run.sh` at line 45, The restore command
invocation using "bin/br restore db" is passing the PD address unquoted
(PD_ADDR) which can cause word-splitting/globbing; update the command to quote
the variable (use "$PD_ADDR") in the --pd argument so the entire address is
passed as a single token. Locate the line with the bin/br restore db command and
replace --pd $PD_ADDR with --pd "$PD_ADDR".

@ti-chi-bot ti-chi-bot Bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels May 11, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 11, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-05-08 02:50:03.824581408 +0000 UTC m=+408876.697931380: ☑️ agreed by D3Hunter.
  • 2026-05-11 04:08:23.939353388 +0000 UTC m=+65872.472132697: ☑️ agreed by Leavrth.

@jebter
Copy link
Copy Markdown

jebter commented May 11, 2026

/test pull-br-integration-test

@tiprow
Copy link
Copy Markdown

tiprow Bot commented May 11, 2026

@jebter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/test pull-br-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 11, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, Leavrth

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the approved label May 11, 2026
@ti-chi-bot ti-chi-bot Bot merged commit 812fde7 into pingcap:release-8.5-20260116-v8.5.5 May 11, 2026
28 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

object storage: upgrade AWS go sdk from v1 to v2

6 participants