br: upgrade AWS SDK from v1 to v2 #66634 | tidb-test=release-8.5-20260116-v8.5.5 tikv=release-8.5-20260423-v8.5.5 pd=v8.5.5 tiflash=v8.5.5#68216
Conversation
|
@YuJuncen I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI. |
📝 WalkthroughWalkthroughMigrates the codebase from AWS SDK Go v1 to v2: updates dependency pins and Bazel/go.mod, rewrites S3/KMS/EBS/storage logic to v2 APIs, adds S3API abstraction and v2 mocks, updates tests and scripts, and introduces a storage test harness CLI subcommand. ChangesAWS SDK v1 to v2 Migration AWS SDK v1 to v2 Migration
Sequence Diagram(s)sequenceDiagram
participant CLI as operator/test-storage
participant Harness as RunTestStorage
participant Storage as ExternalStorage (S3/other)
CLI->>Harness: parse flags, RunTestStorage(ctx,cfg)
Harness->>Storage: create client (config.LoadDefaultConfig / credentials / endpoint)
Harness->>Storage: write/read/list/delete operations
Storage-->>Harness: responses/errors
Harness-->>CLI: TestReport (results)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Hi @YuJuncen. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## release-8.5-20260116-v8.5.5 #68216 +/- ##
================================================================
Coverage ? 55.4695%
================================================================
Files ? 1822
Lines ? 653906
Branches ? 0
================================================================
Hits ? 362719
Misses ? 264227
Partials ? 26960
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
/retest |
|
@YuJuncen: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@YuJuncen: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@wuhuizuo: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/override pull-build-next-gen pull-error-log-review pull-integration-realcluster-test-next-gen pull-mysql-client-test-next-gen pull-unit-test-next-gen |
|
@wuhuizuo: Overrode contexts on behalf of wuhuizuo: pull-build-next-gen, pull-error-log-review, pull-integration-realcluster-test-next-gen, pull-mysql-client-test-next-gen, pull-unit-test-next-gen DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
#66634#66634
|
@jebter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test pull-br-integration-test |
|
@wuhuizuo: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
br/pkg/aws/ebs.go (2)
530-533:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winSame nil-deref risk as in
EnableDataFSR.Apply the same
aws.ToStringfix to*resp.Unsuccessful[0].SnapshotIdin the disable path so a nil pointer doesn't turn an FSR error into a panic.🛡️ Proposed fix
- return errors.Errorf("Some snapshot fails to disable FSR for available zone %s, such as %s, error code is %v", targetAZ, *resp.Unsuccessful[0].SnapshotId, resp.Unsuccessful[0].FastSnapshotRestoreStateErrors) + return errors.Errorf("Some snapshot fails to disable FSR for available zone %s, such as %s, error code is %v", targetAZ, aws.ToString(resp.Unsuccessful[0].SnapshotId), resp.Unsuccessful[0].FastSnapshotRestoreStateErrors)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@br/pkg/aws/ebs.go` around lines 530 - 533, The disable-FSR error return may nil-deref *resp.Unsuccessful[0].SnapshotId like the EnableDataFSR path; change the interpolation to use aws.ToString(resp.Unsuccessful[0].SnapshotId) (same pattern used in EnableDataFSR) so nil snapshot IDs become empty strings instead of panics, keeping the descriptive message and FastSnapshotRestoreStateErrors as-is.
359-362:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAvoid raw pointer deref on
Unsuccessful[0].SnapshotId.
resp.Unsuccessful[0].SnapshotIdis*stringand dereferencing it directly will panic if the field is nil — and this is the failure path where the original FSR error gets reported, so a panic here would mask the real underlying error. The rest of the file consistently usesaws.ToStringfor v2 pointer fields; please do the same here (and on line 532).🛡️ Proposed nil-safe formatting
- return errors.Errorf("Some snapshot fails to enable FSR for available zone %s, such as %s, error code is %v", targetAZ, *resp.Unsuccessful[0].SnapshotId, resp.Unsuccessful[0].FastSnapshotRestoreStateErrors) + return errors.Errorf("Some snapshot fails to enable FSR for available zone %s, such as %s, error code is %v", targetAZ, aws.ToString(resp.Unsuccessful[0].SnapshotId), resp.Unsuccessful[0].FastSnapshotRestoreStateErrors)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@br/pkg/aws/ebs.go` around lines 359 - 362, The error path dereferences resp.Unsuccessful[0].SnapshotId directly which can be nil and cause a panic; change the formatting to use aws.ToString(resp.Unsuccessful[0].SnapshotId) (consistent with other uses in this file) when building the error message and similarly replace any other raw *string derefs (e.g., the one flagged around line 532) so the error log and the returned errors reference nil-safe string conversions for SnapshotId while keeping FastSnapshotRestoreStateErrors unchanged.br/pkg/storage/s3.go (1)
631-642:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winReturn non-
NoSuchKeyfailures fromgetObjectCheck.If
GetObjectfails with a transport/context error that doesn't implementsmithy.APIError, this falls through toreturn nil, so permission validation can pass even though the backend is unreachable. OnlyNoSuchKeyshould count as success here.Suggested fix
_, err := svc.GetObject(ctx, input) + if err == nil { + return nil + } var aerr smithy.APIError if goerrors.As(err, &aerr) { if aerr.ErrorCode() == noSuchKey { // if key not exists and we reach this error, that // means we have the correct permission to GetObject // other we will get another error return nil } - return errors.Trace(err) } - return nil + return errors.Trace(err) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@br/pkg/storage/s3.go` around lines 631 - 642, The getObjectCheck currently swallows non-API errors by returning nil when err != nil but not an APIError; change the logic to treat any non-nil err as a failure unless it's an APIError with ErrorCode() == noSuchKey: call svc.GetObject(ctx, input), if err != nil then if goerrors.As(err, &aerr) and aerr.ErrorCode() == noSuchKey return nil; otherwise return errors.Trace(err); only return nil when svc.GetObject succeeded or when the error is specifically noSuchKey. Ensure you reference svc.GetObject, smithy.APIError (aerr) and the noSuchKey constant in your change.
🧹 Nitpick comments (5)
DEPS.bzl (1)
414-422: 💤 Low valueOut-of-scope dependency bump for
makezero(v1.1.1 → v1.2.0).This update appears unrelated to the AWS SDK v1 → v2 migration that is the stated objective of this backport. It likely came along via the merge but expands the change surface of a release-branch PR. Confirm it is intentional (e.g., transitively required) or revert to keep the backport scoped to AWS SDK changes.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@DEPS.bzl` around lines 414 - 422, The DEPS.bzl change upgrades the makezero dependency from v1.1.1 to v1.2.0 (see the makezero urls, strip_prefix and sha256 entries), which is out-of-scope for the AWS SDK v1→v2 backport; either confirm it is intentionally required (e.g., transitively needed) or revert this block to the prior v1.1.1 values — restore the original sha256, strip_prefix and urls for makezero if reverting — and add a commit message explaining the decision so the backport only includes AWS SDK-related changes unless the bump is explicitly required.br/pkg/aws/ebs.go (1)
372-396: 💤 Low valueOptional: simplify
getFSRCreditBalancesignature now thatsnapShotIDsis[]string.
getFSRCreditBalancestill takessnapshotID *string, forcing the caller at line 387 to wrap withaws.String(...)only to have it dereferenced/used as a v2 pointer field internally. Since the migration changedsnapShotIDsto[]string, takingstringand constructingaws.Stringonce at the v2 API boundary insidegetFSRCreditBalancereads more cleanly.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@br/pkg/aws/ebs.go` around lines 372 - 396, The getFSRCreditBalance signature should accept a string (snapshotID string) instead of *string so callers like waitDataFSREnabled can pass snapShotIDs[startIdx] directly; change getFSRCreditBalance(snapshotID *string, targetAZ string) to getFSRCreditBalance(snapshotID string, targetAZ string), update all call sites (e.g., waitDataFSREnabled) to pass the plain string, and move any aws.String(...) conversion into getFSRCreditBalance at the AWS SDK boundary (where DescribeMetrics/CloudWatch calls need *string) so pointer creation happens once inside the helper.pkg/lightning/log/log.go (1)
192-214: 💤 Low valueLGTM with a small note.
The smithy-based cancellation detection is correct. Note that
goerrors.Is(err, context.Canceled)on line 192 already traverses through*smithy.OperationError.Unwrap(), so the explicitopErr.Unwrap()checks at lines 206 and 209 are largely defensive duplicates. The only one with marginal value is thestatus.Code(opErr.Unwrap())check (sincestatus.Codedoesn't unwrap on its own). Not blocking — feel free to leave the extra checks as belt-and-suspenders.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/lightning/log/log.go` around lines 192 - 214, The code contains redundant cancellation checks because goerrors.Is(err, context.Canceled) already unwraps through *smithy.OperationError, so in the block that handles var opErr *smithy.OperationError you should remove the duplicate goerrors.Is(opErr.Unwrap(), context.Canceled) and goerrors.As(opErr.Unwrap(), &canceledError) checks and only keep the status.Code(opErr.Unwrap()) == codes.Canceled check (or alternatively drop the entire opErr branch if you prefer the single goerrors.Is + smithy.CanceledError checks to suffice); update the block that references opErr, canceledError, goerrors.Is, goerrors.As, and status.Code accordingly to avoid duplicate logic while preserving the status.Code check which does not unwrap.br/pkg/kms/aws.go (1)
86-117: ⚡ Quick winConsolidate duplicate error classification.
The second
switch(lines 107–116) over the v2 typed exceptions is unreachable:*types.NotFoundException,*types.InvalidKeyUsageException,*types.DependencyTimeoutException, and*types.KMSInternalExceptionall implementsmithy.APIError(via embeddedsmithy.GenericAPIError), so the firsterrors.As(err, &apiErr)+apiErr.ErrorCode()switch already matches them. Consolidate to a single switch with adefault:arm returning "KMS error", and remove thetypesimport (line 13), which will become unused.♻️ Proposed simplification
-// classifyDecryptError uses v2 SDK error types func classifyDecryptError(err error) error { var apiErr smithy.APIError if errors.As(err, &apiErr) { switch apiErr.ErrorCode() { - case "NotFoundException": - return pErrors.Annotate(err, "wrong master key") - case "InvalidKeyUsageException": + case "NotFoundException", "InvalidKeyUsageException": return pErrors.Annotate(err, "wrong master key") case "DependencyTimeoutException": return pErrors.Annotate(err, "API timeout") case "KMSInternalException": return pErrors.Annotate(err, "API internal error") + default: + return pErrors.Annotate(err, "KMS error") } } - - // also check for specific v2 error types - var notFoundErr *types.NotFoundException - var invalidKeyErr *types.InvalidKeyUsageException - var timeoutErr *types.DependencyTimeoutException - var internalErr *types.KMSInternalException - - switch { - case errors.As(err, ¬FoundErr), errors.As(err, &invalidKeyErr): - return pErrors.Annotate(err, "wrong master key") - case errors.As(err, &timeoutErr): - return pErrors.Annotate(err, "API timeout") - case errors.As(err, &internalErr): - return pErrors.Annotate(err, "API internal error") - default: - return pErrors.Annotate(err, "KMS error") - } + return pErrors.Annotate(err, "KMS error") }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@br/pkg/kms/aws.go` around lines 86 - 117, The classifyDecryptError function duplicates error handling: remove the second switch that checks v2 typed exceptions and consolidate all classification into the initial smithy.APIError path (the errors.As(err, &apiErr) block) by adding a default: case that returns pErrors.Annotate(err, "KMS error") when the ErrorCode() doesn't match known codes; drop the unused types import and any now-unreferenced local variables (notFoundErr, invalidKeyErr, timeoutErr, internalErr) so classification relies solely on apiErr.ErrorCode().br/pkg/aws/ebs_test.go (1)
85-127: 💤 Low valueConsider reducing test logic duplication.
The
testWaitSnapshotsCreatedtest helper duplicates snapshot-processing logic from the production code. While this approach works for testing, it creates a maintenance burden—changes to the production logic must be manually mirrored here.Consider whether the production
WaitSnapshotsCreatedcould accept an injectable EC2 API interface (or use the existingS3API-style abstraction pattern) to allow direct testing with mockedDescribeSnapshotsresponses, eliminating the need for this parallel test-only implementation.Note: Line 123's indefinite block (
<-make(chan struct{})) is intentional to simulate timeout behavior when uncompleted snapshots remain.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@br/pkg/aws/ebs_test.go` around lines 85 - 127, testWaitSnapshotsCreated duplicates snapshot-processing logic from production; refactor so production WaitSnapshotsCreated accepts an injectable EC2 DescribeSnapshots interface (or follow the S3API-style abstraction) and move shared processing into the real WaitSnapshotsCreated (or a helper used by both) so tests call WaitSnapshotsCreated with a mocked DescribeSnapshots response instead of a test-only implementation; update EC2Session to depend on the new interface, reuse extractSnapProgress, and remove the duplicate testWaitSnapshotsCreated logic (keep the blocking <-make(chan struct{}) behavior in tests by returning an uncompleted-snapshots condition from the mocked DescribeSnapshots).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@br/pkg/storage/s3_test.go`:
- Around line 552-557: The mock callback for
s.s3.EXPECT().DeleteObject(...).DoAndReturn currently declares and returns a
*s3.DeleteObjectInput but the real DeleteObject signature returns
(*s3.DeleteObjectOutput, error); update the DoAndReturn callback signature and
return value to return (*s3.DeleteObjectOutput, error) and return an instance
like &s3.DeleteObjectOutput{} (keeping the same assertions against input.Bucket
and input.Key) so the mock types match the DeleteObject method.
In `@br/pkg/storage/s3.go`:
- Around line 659-665: In PutAndDeleteObjectCheck avoid treating a nil err as an
error: only inspect err2 with goerrors.As when err2 != nil. After calling
svc.DeleteObject(ctx, input) check if err2 is non-nil, then use
goerrors.As(err2, &noSuchKey) to suppress warnings for NoSuchKey but call
log.Warn(...) with zap.Error(err2) for other errors; reference svc.DeleteObject,
err2, types.NoSuchKey, goerrors.As and log.Warn to locate where to add the nil
check.
- Around line 462-490: The S3 client setup currently applies credentials twice:
via config (config.LoadDefaultConfig / cfg.Credentials) and again by appending
an s3 option that sets o.Credentials from autoNewCred; when qs.RoleArn is set
this later overwrite breaks role assumption. Fix by preventing the redundant
s3Opts append that sets o.Credentials when qs.RoleArn is non-empty (or simply
remove the s3Opts credential-setting block that uses autoNewCred), so that when
RoleArn is provided cfg.Credentials (the stscreds.NewAssumeRoleProvider) remains
the effective credentials provider.
In `@br/pkg/task/operator/test_storage.go`:
- Around line 476-530: The test hardcodes startOffset/endOffset (used in the
storage.ReaderOption passed to tc.Store.Open) to [100,200) which fails when
expectedData is smaller (e.g. --test-data-size=128); modify the test to compute
a valid range from len(expectedData) (or validate a minimum size up front):
compute endOffset := int64(len(expectedData)) and if endOffset <= 100 set
startOffset = 0 else set startOffset = endOffset-100 (or clamp endOffset to
startOffset + desiredRange), then use these startOffset/endOffset values in the
ReaderOption so the range is always within expectedData; alternatively add an
explicit minimum-size check in ParseFromFlags and fail fast if test-data-size <
200.
- Around line 226-227: The current logger.Info call is logging
cfg.BackendOptions verbatim (via zap.Any) which can leak credentials; instead
build a redacted representation (e.g., pick only non-sensitive keys or mask
values for keys like access_key, secret_key, session_token, password) and pass
that to logger.Info; update the call that uses logger.Info("Starting external
storage test", zap.String("storage", cfg.StorageURI), zap.Any("options",
cfg.BackendOptions)) to use the redacted map/struct so only safe fields from
cfg.BackendOptions are logged and any sensitive fields are removed or masked.
- Around line 268-279: The current deferred call defer cleanup() runs even on
failure; change it so cleanup respects cfg.CleanupOnSuccess and only runs on
successful tests by adding a boolean flag (e.g., succeeded) captured by the
deferred cleanup or by removing the unconditional defer and invoking cleanup
conditionally at the end of the test; ensure the cleanup function still checks
cfg.CleanupOnSuccess and only performs deletions when both cfg.CleanupOnSuccess
is true and succeeded is true (update references to cleanup,
cfg.CleanupOnSuccess and the defer cleanup() placement accordingly).
---
Outside diff comments:
In `@br/pkg/aws/ebs.go`:
- Around line 530-533: The disable-FSR error return may nil-deref
*resp.Unsuccessful[0].SnapshotId like the EnableDataFSR path; change the
interpolation to use aws.ToString(resp.Unsuccessful[0].SnapshotId) (same pattern
used in EnableDataFSR) so nil snapshot IDs become empty strings instead of
panics, keeping the descriptive message and FastSnapshotRestoreStateErrors
as-is.
- Around line 359-362: The error path dereferences
resp.Unsuccessful[0].SnapshotId directly which can be nil and cause a panic;
change the formatting to use aws.ToString(resp.Unsuccessful[0].SnapshotId)
(consistent with other uses in this file) when building the error message and
similarly replace any other raw *string derefs (e.g., the one flagged around
line 532) so the error log and the returned errors reference nil-safe string
conversions for SnapshotId while keeping FastSnapshotRestoreStateErrors
unchanged.
In `@br/pkg/storage/s3.go`:
- Around line 631-642: The getObjectCheck currently swallows non-API errors by
returning nil when err != nil but not an APIError; change the logic to treat any
non-nil err as a failure unless it's an APIError with ErrorCode() == noSuchKey:
call svc.GetObject(ctx, input), if err != nil then if goerrors.As(err, &aerr)
and aerr.ErrorCode() == noSuchKey return nil; otherwise return
errors.Trace(err); only return nil when svc.GetObject succeeded or when the
error is specifically noSuchKey. Ensure you reference svc.GetObject,
smithy.APIError (aerr) and the noSuchKey constant in your change.
---
Nitpick comments:
In `@br/pkg/aws/ebs_test.go`:
- Around line 85-127: testWaitSnapshotsCreated duplicates snapshot-processing
logic from production; refactor so production WaitSnapshotsCreated accepts an
injectable EC2 DescribeSnapshots interface (or follow the S3API-style
abstraction) and move shared processing into the real WaitSnapshotsCreated (or a
helper used by both) so tests call WaitSnapshotsCreated with a mocked
DescribeSnapshots response instead of a test-only implementation; update
EC2Session to depend on the new interface, reuse extractSnapProgress, and remove
the duplicate testWaitSnapshotsCreated logic (keep the blocking <-make(chan
struct{}) behavior in tests by returning an uncompleted-snapshots condition from
the mocked DescribeSnapshots).
In `@br/pkg/aws/ebs.go`:
- Around line 372-396: The getFSRCreditBalance signature should accept a string
(snapshotID string) instead of *string so callers like waitDataFSREnabled can
pass snapShotIDs[startIdx] directly; change getFSRCreditBalance(snapshotID
*string, targetAZ string) to getFSRCreditBalance(snapshotID string, targetAZ
string), update all call sites (e.g., waitDataFSREnabled) to pass the plain
string, and move any aws.String(...) conversion into getFSRCreditBalance at the
AWS SDK boundary (where DescribeMetrics/CloudWatch calls need *string) so
pointer creation happens once inside the helper.
In `@br/pkg/kms/aws.go`:
- Around line 86-117: The classifyDecryptError function duplicates error
handling: remove the second switch that checks v2 typed exceptions and
consolidate all classification into the initial smithy.APIError path (the
errors.As(err, &apiErr) block) by adding a default: case that returns
pErrors.Annotate(err, "KMS error") when the ErrorCode() doesn't match known
codes; drop the unused types import and any now-unreferenced local variables
(notFoundErr, invalidKeyErr, timeoutErr, internalErr) so classification relies
solely on apiErr.ErrorCode().
In `@DEPS.bzl`:
- Around line 414-422: The DEPS.bzl change upgrades the makezero dependency from
v1.1.1 to v1.2.0 (see the makezero urls, strip_prefix and sha256 entries), which
is out-of-scope for the AWS SDK v1→v2 backport; either confirm it is
intentionally required (e.g., transitively needed) or revert this block to the
prior v1.1.1 values — restore the original sha256, strip_prefix and urls for
makezero if reverting — and add a commit message explaining the decision so the
backport only includes AWS SDK-related changes unless the bump is explicitly
required.
In `@pkg/lightning/log/log.go`:
- Around line 192-214: The code contains redundant cancellation checks because
goerrors.Is(err, context.Canceled) already unwraps through
*smithy.OperationError, so in the block that handles var opErr
*smithy.OperationError you should remove the duplicate
goerrors.Is(opErr.Unwrap(), context.Canceled) and goerrors.As(opErr.Unwrap(),
&canceledError) checks and only keep the status.Code(opErr.Unwrap()) ==
codes.Canceled check (or alternatively drop the entire opErr branch if you
prefer the single goerrors.Is + smithy.CanceledError checks to suffice); update
the block that references opErr, canceledError, goerrors.Is, goerrors.As, and
status.Code accordingly to avoid duplicate logic while preserving the
status.Code check which does not unwrap.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: c1e8e88e-9b9f-46f2-bfa3-c389b546fd9e
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (32)
DEPS.bzlMakefilebr/cmd/br/operator.gobr/pkg/aws/BUILD.bazelbr/pkg/aws/ebs.gobr/pkg/aws/ebs_test.gobr/pkg/kms/BUILD.bazelbr/pkg/kms/aws.gobr/pkg/mock/BUILD.bazelbr/pkg/mock/s3_v2.gobr/pkg/mock/s3iface.gobr/pkg/storage/BUILD.bazelbr/pkg/storage/azblob.gobr/pkg/storage/compatibility_for_test.gobr/pkg/storage/helper.gobr/pkg/storage/ks3.gobr/pkg/storage/s3.gobr/pkg/storage/s3_interface.gobr/pkg/storage/s3_test.gobr/pkg/storage/storage.gobr/pkg/task/operator/BUILD.bazelbr/pkg/task/operator/test_storage.gobr/pkg/task/restore_ebs_meta.gobuild/nogo_config.jsongo.modpkg/lightning/backend/external/BUILD.bazelpkg/lightning/backend/external/byte_reader.gopkg/lightning/backend/external/byte_reader_test.gopkg/lightning/log/BUILD.bazelpkg/lightning/log/log.gopkg/lightning/log/log_test.gopkg/sessionctx/variable/sysvar_test.go
💤 Files with no reviewable changes (1)
- Makefile
| s.s3.EXPECT(). | ||
| DeleteObjectWithContext(ctx, gomock.Any()). | ||
| DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, opt ...request.Option) (*s3.DeleteObjectInput, error) { | ||
| require.Equal(t, "bucket", aws.StringValue(input.Bucket)) | ||
| require.Equal(t, "prefix/file", aws.StringValue(input.Key)) | ||
| DeleteObject(gomock.Any(), gomock.Any(), gomock.Any()). | ||
| DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, _ ...func(*s3.Options)) (*s3.DeleteObjectInput, error) { | ||
| require.Equal(t, "bucket", aws.ToString(input.Bucket)) | ||
| require.Equal(t, "prefix/file", aws.ToString(input.Key)) | ||
| return &s3.DeleteObjectInput{}, nil |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Locate the files mentioned in the review
find . -name "s3_test.go" -path "*/br/pkg/storage/*" 2>/dev/null
find . -name "s3_v2.go" -path "*/br/pkg/mock/*" 2>/dev/nullRepository: pingcap/tidb
Length of output: 107
🏁 Script executed:
# Read the specific lines from s3_test.go
if [ -f "br/pkg/storage/s3_test.go" ]; then
sed -n '550,560p' "br/pkg/storage/s3_test.go" | cat -n
fiRepository: pingcap/tidb
Length of output: 566
🏁 Script executed:
# Find and examine the DeleteObject method signature in the mock
if [ -f "br/pkg/mock/s3_v2.go" ]; then
grep -A 5 "DeleteObject" "br/pkg/mock/s3_v2.go" | head -20
fiRepository: pingcap/tidb
Length of output: 856
Return *s3.DeleteObjectOutput from this mock callback.
The DoAndReturn callback is typed to return (*s3.DeleteObjectInput, error) and returns &s3.DeleteObjectInput{}, but DeleteObject is mocked to return (*s3.DeleteObjectOutput, error). This type mismatch causes GoMock to reject the expectation, breaking the test.
Suggested fix
s.s3.EXPECT().
DeleteObject(gomock.Any(), gomock.Any(), gomock.Any()).
- DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, _ ...func(*s3.Options)) (*s3.DeleteObjectInput, error) {
+ DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, _ ...func(*s3.Options)) (*s3.DeleteObjectOutput, error) {
require.Equal(t, "bucket", aws.ToString(input.Bucket))
require.Equal(t, "prefix/file", aws.ToString(input.Key))
- return &s3.DeleteObjectInput{}, nil
+ return &s3.DeleteObjectOutput{}, nil
})📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| s.s3.EXPECT(). | |
| DeleteObjectWithContext(ctx, gomock.Any()). | |
| DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, opt ...request.Option) (*s3.DeleteObjectInput, error) { | |
| require.Equal(t, "bucket", aws.StringValue(input.Bucket)) | |
| require.Equal(t, "prefix/file", aws.StringValue(input.Key)) | |
| DeleteObject(gomock.Any(), gomock.Any(), gomock.Any()). | |
| DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, _ ...func(*s3.Options)) (*s3.DeleteObjectInput, error) { | |
| require.Equal(t, "bucket", aws.ToString(input.Bucket)) | |
| require.Equal(t, "prefix/file", aws.ToString(input.Key)) | |
| return &s3.DeleteObjectInput{}, nil | |
| s.s3.EXPECT(). | |
| DeleteObject(gomock.Any(), gomock.Any(), gomock.Any()). | |
| DoAndReturn(func(_ context.Context, input *s3.DeleteObjectInput, _ ...func(*s3.Options)) (*s3.DeleteObjectOutput, error) { | |
| require.Equal(t, "bucket", aws.ToString(input.Bucket)) | |
| require.Equal(t, "prefix/file", aws.ToString(input.Key)) | |
| return &s3.DeleteObjectOutput{}, nil | |
| }) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@br/pkg/storage/s3_test.go` around lines 552 - 557, The mock callback for
s.s3.EXPECT().DeleteObject(...).DoAndReturn currently declares and returns a
*s3.DeleteObjectInput but the real DeleteObject signature returns
(*s3.DeleteObjectOutput, error); update the DoAndReturn callback signature and
return value to return (*s3.DeleteObjectOutput, error) and return an instance
like &s3.DeleteObjectOutput{} (keeping the same assertions against input.Bucket
and input.Key) so the mock types match the DeleteObject method.
| if qs.Profile == "" { | ||
| cred, err := autoNewCred(&qs) | ||
| if err != nil { | ||
| return nil, errors.Trace(err) | ||
| } | ||
| if cred != nil { | ||
| s3Opts = append(s3Opts, func(o *s3.Options) { | ||
| o.Credentials = cred | ||
| }) | ||
| } | ||
| } | ||
|
|
||
| // Handle role assumption if specified | ||
| if len(qs.RoleArn) > 0 { | ||
| // Create STS client for role assumption | ||
| stsClient := sts.NewFromConfig(cfg) | ||
|
|
||
| // Configure assume role provider with external ID if provided | ||
| var assumeRoleProvider aws.CredentialsProvider | ||
| if len(qs.ExternalId) > 0 { | ||
| assumeRoleProvider = stscreds.NewAssumeRoleProvider(stsClient, qs.RoleArn, func(o *stscreds.AssumeRoleOptions) { | ||
| o.ExternalID = &qs.ExternalId | ||
| }) | ||
| } else { | ||
| assumeRoleProvider = stscreds.NewAssumeRoleProvider(stsClient, qs.RoleArn) | ||
| } | ||
|
|
||
| // Update config with assume role credentials | ||
| cfg.Credentials = assumeRoleProvider |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cd br/pkg/storage && head -c 100 s3.go && wc -l s3.goRepository: pingcap/tidb
Length of output: 167
🏁 Script executed:
# Get the specific lines mentioned in the review
sed -n '462,490p' br/pkg/storage/s3.goRepository: pingcap/tidb
Length of output: 902
🏁 Script executed:
# Get more context around the credential setup to understand the full flow
sed -n '440,520p' br/pkg/storage/s3.goRepository: pingcap/tidb
Length of output: 2911
🏁 Script executed:
# Search for NewS3Storage function definition
rg -n "func.*NewS3Storage" br/pkg/storage/s3.goRepository: pingcap/tidb
Length of output: 259
🏁 Script executed:
# Get the complete function to see the full flow
sed -n '372,530p' br/pkg/storage/s3.goRepository: pingcap/tidb
Length of output: 5337
🌐 Web query:
AWS SDK Go v2 s3.NewFromConfig options credential override behavior
💡 Result:
In AWS SDK for Go v2, s3.NewFromConfig copies credentials from the provided aws.Config, and the constructor/service Options you pass can override those credentials (including disabling signing by forcing anonymous/nil credentials). 1) Default behavior (no override): uses cfg.Credentials (which comes from config.LoadDefaultConfig) - aws.Config has a Credentials field (“The credentials object to use when signing requests”). [1] - When you create the aws.Config via config.LoadDefaultConfig, the SDK resolves credentials via its default credential chain; and if you explicitly provide credentials, “the SDK uses only those credentials.” [2] - s3.NewFromConfig builds the S3 service Options using cfg.Credentials (among other fields), then calls New(opts, optFns...). [3] Practical implication: - If you call cfg, _:= config.LoadDefaultConfig(...), then client:= s3.NewFromConfig(cfg), S3 will sign requests using whatever credentials were resolved into cfg.Credentials. [2][3] 2) Overriding credentials at the S3 client level (optFn / Options): the optFn mutates s3.Options, including Credentials - s3.NewFromConfig takes functional options that mutate the service’s Options struct. [4] - The S3 service Options struct contains Credentials aws.CredentialsProvider (“The credentials object to use when signing requests.”). [5] - s3.NewFromConfig initially sets Options.Credentials = cfg.Credentials before applying optFns, so an optFn can replace o.Credentials and thereby override the credentials used for signing. [3][5] 3) Disabling signing / forcing “anonymous” behavior - The SDK documents that using the AnonymousCredentials provider instructs signing middleware to not sign requests. [6] - It also states: “Using nil credentials when configuring an API client will achieve the same result.” and shows that when using external config loading, you can use config.WithCredentialsProvider(aws.AnonymousCredentials{}) to prevent credential resolution attempts. [6] - Additionally, the same source notes that for NewFromConfig, if the external config resolved a credential provider, you need to explicitly set the API client Options Credentials member to nil to avoid signing; otherwise it may keep the resolved provider. [6] - In the generated S3 service code, there’s special handling to “ignoreAnonymousAuth” by converting AnonymousCredentials to nil. [5] 4) Important distinction: you must use config.LoadDefaultConfig to get the default credential-chain behavior - The credentials “defaults to a chain…” behavior occurs when you use config.LoadDefaultConfig, not automatically just by having an empty aws.Config. This is clarified in an SDK issue discussion. [7] Example patterns (behaviorally accurate) A. Use the default credential chain from env/profile/roles - cfg:= config.LoadDefaultConfig(ctx,...) - client:= s3.NewFromConfig(cfg) (Uses cfg.Credentials resolved by LoadDefaultConfig.) [2][3] B. Override credentials explicitly for the S3 client - client:= s3.NewFromConfig(cfg, func(o *s3.Options) { o.Credentials = myCredProvider }) (Option mutates o.Credentials; Options.Credentials is what signing middleware uses.) [4][5][3] C. Disable signing (public S3 objects / no signature) - client:= s3.NewFromConfig(cfg, func(o *s3.Options) { o.Credentials = nil }) (or o.Credentials = aws.AnonymousCredentials{}) (This is documented as equivalent to using nil for signing middleware behavior.) [6][5] Net answer to “credential override behavior” - s3.NewFromConfig starts with cfg.Credentials. [3] - Any s3.NewFromConfig option function that sets *s3.Options.Credentials overrides cfg.Credentials for request signing. [3][5] - Setting Credentials to aws.AnonymousCredentials{} or nil disables request signing (and may also be converted to nil internally for S3). [6][5]
Citations:
- 1: https://github.com/aws/aws-sdk-go-v2/blob/main/aws/config.go
- 2: https://docs.aws.amazon.com/sdk-for-go/v2/developer-guide/configure-gosdk.html
- 3: https://github.com/aws/aws-sdk-go-v2/blob/e18a4ad0322327639a12fc21a0bb57924685aa47/service/s3/api_client.go
- 4: https://docs.aws.amazon.com/sdk-for-go/v2/developer-guide/using.html
- 5: https://github.com/aws/aws-sdk-go-v2/blob/service/s3/v1.99.0/service/s3/options.go
- 6: https://github.com/aws/aws-sdk-go-v2/blob/main/aws/credentials.go
- 7: aws.Config with nil Credentials does not pick up AWS_ACCESS_KEY_ID aws/aws-sdk-go-v2#1840
🏁 Script executed:
# Also check if there are any tests that exercise both RoleArn and credential scenarios
find br -name "*test*.go" -type f | xargs rg -l "RoleArn\|AssumeRole" | head -5Repository: pingcap/tidb
Length of output: 38
🏁 Script executed:
# Look for any comments or documentation in the code about credential precedence
rg -n "credential|Credential" br/pkg/storage/s3.go | head -20Repository: pingcap/tidb
Length of output: 1401
Remove duplicative credential setup at S3 options level when role assumption is used.
The code sets credentials twice: first at the config level (lines 406–416 via config.WithCredentialsProvider), then again at the S3 options level (lines 461–470 via o.Credentials = cred). When RoleArn is specified, cfg.Credentials is updated to the STS assume-role provider, but the S3 options function still applies afterward, overriding it with the static credentials. This prevents role assumption from working.
The simplest fix is to skip the redundant S3 options credential setup when RoleArn is present:
Suggested fix
- if qs.Profile == "" {
+ if qs.Profile == "" && len(qs.RoleArn) == 0 {
cred, err := autoNewCred(&qs)
if err != nil {
return nil, errors.Trace(err)
}
if cred != nil {
s3Opts = append(s3Opts, func(o *s3.Options) {
o.Credentials = cred
})
}
}Alternatively, remove the entire lines 461–470 block since credentials are already set during config.LoadDefaultConfig (lines 406–416), making the S3 options setup unnecessary.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if qs.Profile == "" { | |
| cred, err := autoNewCred(&qs) | |
| if err != nil { | |
| return nil, errors.Trace(err) | |
| } | |
| if cred != nil { | |
| s3Opts = append(s3Opts, func(o *s3.Options) { | |
| o.Credentials = cred | |
| }) | |
| } | |
| } | |
| // Handle role assumption if specified | |
| if len(qs.RoleArn) > 0 { | |
| // Create STS client for role assumption | |
| stsClient := sts.NewFromConfig(cfg) | |
| // Configure assume role provider with external ID if provided | |
| var assumeRoleProvider aws.CredentialsProvider | |
| if len(qs.ExternalId) > 0 { | |
| assumeRoleProvider = stscreds.NewAssumeRoleProvider(stsClient, qs.RoleArn, func(o *stscreds.AssumeRoleOptions) { | |
| o.ExternalID = &qs.ExternalId | |
| }) | |
| } else { | |
| assumeRoleProvider = stscreds.NewAssumeRoleProvider(stsClient, qs.RoleArn) | |
| } | |
| // Update config with assume role credentials | |
| cfg.Credentials = assumeRoleProvider | |
| if qs.Profile == "" && len(qs.RoleArn) == 0 { | |
| cred, err := autoNewCred(&qs) | |
| if err != nil { | |
| return nil, errors.Trace(err) | |
| } | |
| if cred != nil { | |
| s3Opts = append(s3Opts, func(o *s3.Options) { | |
| o.Credentials = cred | |
| }) | |
| } | |
| } | |
| // Handle role assumption if specified | |
| if len(qs.RoleArn) > 0 { | |
| // Create STS client for role assumption | |
| stsClient := sts.NewFromConfig(cfg) | |
| // Configure assume role provider with external ID if provided | |
| var assumeRoleProvider aws.CredentialsProvider | |
| if len(qs.ExternalId) > 0 { | |
| assumeRoleProvider = stscreds.NewAssumeRoleProvider(stsClient, qs.RoleArn, func(o *stscreds.AssumeRoleOptions) { | |
| o.ExternalID = &qs.ExternalId | |
| }) | |
| } else { | |
| assumeRoleProvider = stscreds.NewAssumeRoleProvider(stsClient, qs.RoleArn) | |
| } | |
| // Update config with assume role credentials | |
| cfg.Credentials = assumeRoleProvider |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@br/pkg/storage/s3.go` around lines 462 - 490, The S3 client setup currently
applies credentials twice: via config (config.LoadDefaultConfig /
cfg.Credentials) and again by appending an s3 option that sets o.Credentials
from autoNewCred; when qs.RoleArn is set this later overwrite breaks role
assumption. Fix by preventing the redundant s3Opts append that sets
o.Credentials when qs.RoleArn is non-empty (or simply remove the s3Opts
credential-setting block that uses autoNewCred), so that when RoleArn is
provided cfg.Credentials (the stscreds.NewAssumeRoleProvider) remains the
effective credentials provider.
| _, err2 := svc.DeleteObject(ctx, input) | ||
| var noSuchKey *types.NoSuchKey | ||
| if !goerrors.As(err2, &noSuchKey) { | ||
| log.Warn("failed to delete object used for permission check", | ||
| zap.String("bucket", options.Bucket), | ||
| zap.String("key", *input.Key), zap.Error(err2)) | ||
| } |
There was a problem hiding this comment.
Avoid warning on successful cleanup in PutAndDeleteObjectCheck.
goerrors.As(nil, &noSuchKey) is false, so Line 662 logs "failed to delete..." even when DeleteObject succeeds. That will emit a misleading warning on every successful permission check.
Suggested fix
_, err2 := svc.DeleteObject(ctx, input)
var noSuchKey *types.NoSuchKey
- if !goerrors.As(err2, &noSuchKey) {
+ if err2 != nil && !goerrors.As(err2, &noSuchKey) {
log.Warn("failed to delete object used for permission check",
zap.String("bucket", options.Bucket),
zap.String("key", *input.Key), zap.Error(err2))
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| _, err2 := svc.DeleteObject(ctx, input) | |
| var noSuchKey *types.NoSuchKey | |
| if !goerrors.As(err2, &noSuchKey) { | |
| log.Warn("failed to delete object used for permission check", | |
| zap.String("bucket", options.Bucket), | |
| zap.String("key", *input.Key), zap.Error(err2)) | |
| } | |
| _, err2 := svc.DeleteObject(ctx, input) | |
| var noSuchKey *types.NoSuchKey | |
| if err2 != nil && !goerrors.As(err2, &noSuchKey) { | |
| log.Warn("failed to delete object used for permission check", | |
| zap.String("bucket", options.Bucket), | |
| zap.String("key", *input.Key), zap.Error(err2)) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@br/pkg/storage/s3.go` around lines 659 - 665, In PutAndDeleteObjectCheck
avoid treating a nil err as an error: only inspect err2 with goerrors.As when
err2 != nil. After calling svc.DeleteObject(ctx, input) check if err2 is
non-nil, then use goerrors.As(err2, &noSuchKey) to suppress warnings for
NoSuchKey but call log.Warn(...) with zap.Error(err2) for other errors;
reference svc.DeleteObject, err2, types.NoSuchKey, goerrors.As and log.Warn to
locate where to add the nil check.
| logger := logutil.Logger(ctx) | ||
| logger.Info("Starting external storage test", zap.String("storage", cfg.StorageURI), zap.Any("options", cfg.BackendOptions)) |
There was a problem hiding this comment.
Don't log BackendOptions verbatim.
zap.Any("options", cfg.BackendOptions) will serialize storage credentials when the backend is configured via flags, including access keys and session tokens. Please log a redacted subset instead.
Suggested fix
- logger.Info("Starting external storage test", zap.String("storage", cfg.StorageURI), zap.Any("options", cfg.BackendOptions))
+ logger.Info("Starting external storage test", zap.String("storage", cfg.StorageURI))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| logger := logutil.Logger(ctx) | |
| logger.Info("Starting external storage test", zap.String("storage", cfg.StorageURI), zap.Any("options", cfg.BackendOptions)) | |
| logger := logutil.Logger(ctx) | |
| logger.Info("Starting external storage test", zap.String("storage", cfg.StorageURI)) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@br/pkg/task/operator/test_storage.go` around lines 226 - 227, The current
logger.Info call is logging cfg.BackendOptions verbatim (via zap.Any) which can
leak credentials; instead build a redacted representation (e.g., pick only
non-sensitive keys or mask values for keys like access_key, secret_key,
session_token, password) and pass that to logger.Info; update the call that uses
logger.Info("Starting external storage test", zap.String("storage",
cfg.StorageURI), zap.Any("options", cfg.BackendOptions)) to use the redacted
map/struct so only safe fields from cfg.BackendOptions are logged and any
sensitive fields are removed or masked.
| cleanup := func() { | ||
| if !cfg.CleanupOnSuccess { | ||
| return | ||
| } | ||
| printStep("Cleaning up test files...") | ||
| _ = store.DeleteFile(ctx, testFileName1) | ||
| _ = store.DeleteFile(ctx, testFileName2) | ||
| _ = store.DeleteFile(ctx, testFileNameRenamed) | ||
| _ = store.DeleteFile(ctx, filepath.Join(testDirName, testFileName1)) | ||
| _ = store.DeleteFile(ctx, filepath.Join(testDirName, testFileName2)) | ||
| } | ||
| defer cleanup() |
There was a problem hiding this comment.
--cleanup currently runs on failure too.
The deferred cleanup is unconditional, so test artifacts are removed even when cases fail. That contradicts the CleanupOnSuccess name/help text and makes --pause-when-fail much less useful.
Suggested fix
- defer cleanup()
+ defer func() {
+ if cfg.CleanupOnSuccess && report.FailedTests == 0 {
+ cleanup()
+ }
+ }()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| cleanup := func() { | |
| if !cfg.CleanupOnSuccess { | |
| return | |
| } | |
| printStep("Cleaning up test files...") | |
| _ = store.DeleteFile(ctx, testFileName1) | |
| _ = store.DeleteFile(ctx, testFileName2) | |
| _ = store.DeleteFile(ctx, testFileNameRenamed) | |
| _ = store.DeleteFile(ctx, filepath.Join(testDirName, testFileName1)) | |
| _ = store.DeleteFile(ctx, filepath.Join(testDirName, testFileName2)) | |
| } | |
| defer cleanup() | |
| cleanup := func() { | |
| if !cfg.CleanupOnSuccess { | |
| return | |
| } | |
| printStep("Cleaning up test files...") | |
| _ = store.DeleteFile(ctx, testFileName1) | |
| _ = store.DeleteFile(ctx, testFileName2) | |
| _ = store.DeleteFile(ctx, testFileNameRenamed) | |
| _ = store.DeleteFile(ctx, filepath.Join(testDirName, testFileName1)) | |
| _ = store.DeleteFile(ctx, filepath.Join(testDirName, testFileName2)) | |
| } | |
| defer func() { | |
| if cfg.CleanupOnSuccess && report.FailedTests == 0 { | |
| cleanup() | |
| } | |
| }() |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@br/pkg/task/operator/test_storage.go` around lines 268 - 279, The current
deferred call defer cleanup() runs even on failure; change it so cleanup
respects cfg.CleanupOnSuccess and only runs on successful tests by adding a
boolean flag (e.g., succeeded) captured by the deferred cleanup or by removing
the unconditional defer and invoking cleanup conditionally at the end of the
test; ensure the cleanup function still checks cfg.CleanupOnSuccess and only
performs deletions when both cfg.CleanupOnSuccess is true and succeeded is true
(update references to cleanup, cfg.CleanupOnSuccess and the defer cleanup()
placement accordingly).
| startOffset := int64(100) | ||
| endOffset := int64(200) | ||
| reader, err := tc.Store.Open(tc.Ctx, name, &storage.ReaderOption{ | ||
| StartOffset: &startOffset, | ||
| EndOffset: &endOffset, | ||
| }) | ||
|
|
||
| result := TestResult{ | ||
| Name: testName, | ||
| } | ||
|
|
||
| if err != nil { | ||
| printError(" ❌ Failed: %v", err) | ||
| result.Duration = time.Since(start) | ||
| result.Passed = false | ||
| result.Error = errors.Annotate(err, "Open with range failed") | ||
| tc.AddResult(result) | ||
| return | ||
| } | ||
| defer reader.Close() | ||
|
|
||
| data, err := io.ReadAll(reader) | ||
| duration := time.Since(start) | ||
| result.Duration = duration | ||
|
|
||
| if err != nil { | ||
| printError(" ❌ Failed to read: %v", err) | ||
| result.Passed = false | ||
| result.Error = errors.Annotate(err, "ReadAll failed") | ||
| tc.AddResult(result) | ||
| return | ||
| } | ||
|
|
||
| expectedSize := endOffset - startOffset | ||
| if int64(len(data)) != expectedSize { | ||
| printError(" ❌ Failed: expected size %d, got %d", expectedSize, len(data)) | ||
| result.Passed = false | ||
| result.Error = errors.Errorf("Range read returned wrong size: expected %d, got %d", expectedSize, len(data)) | ||
| } else { | ||
| // Verify data matches the expected range | ||
| dataValid := true | ||
| for i := range data { | ||
| if data[i] != expectedData[startOffset+int64(i)] { | ||
| printError(" ❌ Failed: data mismatch at offset %d", i) | ||
| result.Passed = false | ||
| result.Error = errors.Errorf("Range read returned corrupted data at offset %d", i) | ||
| dataValid = false | ||
| break | ||
| } | ||
| } | ||
| if dataValid { | ||
| printSuccess(" ✓ Passed (read %d bytes)", len(data)) | ||
| result.Passed = true | ||
| result.Details = fmt.Sprintf("Read %d bytes from range [%d, %d)", len(data), startOffset, endOffset) | ||
| } |
There was a problem hiding this comment.
Small --test-data-size values make this range test fail spuriously.
This always requests bytes [100, 200), but ParseFromFlags accepts any positive size. A user can set --test-data-size=128 and get a harness failure even when the storage backend is correct. Either validate a minimum size up front or derive the range from len(expectedData).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@br/pkg/task/operator/test_storage.go` around lines 476 - 530, The test
hardcodes startOffset/endOffset (used in the storage.ReaderOption passed to
tc.Store.Open) to [100,200) which fails when expectedData is smaller (e.g.
--test-data-size=128); modify the test to compute a valid range from
len(expectedData) (or validate a minimum size up front): compute endOffset :=
int64(len(expectedData)) and if endOffset <= 100 set startOffset = 0 else set
startOffset = endOffset-100 (or clamp endOffset to startOffset + desiredRange),
then use these startOffset/endOffset values in the ReaderOption so the range is
always within expectedData; alternatively add an explicit minimum-size check in
ParseFromFlags and fail fast if test-data-size < 200.
|
/retest |
|
@jebter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@YuJuncen: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
#66634 | tidb-test=release-8.5-20260116-v8.5.5 tikv=release-8.5-20260423-v8.5.5 pd=v8.5.5 tiflash=v8.5.5#66634 | tidb-test=release-8.5-20260116-v8.5.5 tikv=v8.5.5 pd=v8.5.5 tiflash=v8.5.5
|
/test pull-br-integration-test |
|
@YuJuncen: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
#66634 | tidb-test=release-8.5-20260116-v8.5.5 tikv=v8.5.5 pd=v8.5.5 tiflash=v8.5.5#66634 | tidb-test=release-8.5-20260116-v8.5.5 tikv=release-8.5-20260423-v8.5.5 pd=v8.5.5 tiflash=v8.5.5
|
/test pull-br-integration-test |
|
@jebter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
br/tests/br_gcs/run.sh (1)
142-142: 💤 Low valueOptional: Simplify arithmetic variable reference.
In arithmetic contexts, the
${}syntax is unnecessary and can be simplified for consistency with shell best practices.♻️ Proposed simplification
for i in $(seq $DB_COUNT); do - row_count_new[${i}]=$(run_sql "SELECT COUNT(*) FROM $DB${i}.$TABLE;" | awk '/COUNT/{print $2}') + row_count_new[$i]=$(run_sql "SELECT COUNT(*) FROM $DB${i}.$TABLE;" | awk '/COUNT/{print $2}') done🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@br/tests/br_gcs/run.sh` at line 142, The array index and arithmetic-style variable reference use unnecessary ${} expansion; change the assignment to use the simpler form for the index and any pure arithmetic contexts (e.g. replace row_count_new[${i}] with row_count_new[i]) while keeping ${DB}${i}.$TABLE or other expansions that are needed for concatenation intact; update the assignment that calls run_sql accordingly (referencing row_count_new, run_sql, DB, TABLE, and i).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@br/tests/br_gcs/run.sh`:
- Line 92: The run_br invocation uses unquoted shell variables which can cause
word splitting/globbing (e.g. --pd $PD_ADDR); update the calls to quote all
variable expansions used in run_br and similar lines so each value is treated as
a single argument: wrap $PD_ADDR, $BUCKET, $DB, $GCS_HOST and $GCS_PORT in
double quotes wherever they are expanded (references: the run_br command and its
--pd and -s arguments shown in the diff; also apply the same quoting to the
other occurrences flagged at lines 101, 111, 139).
In `@br/tests/docker_compatible_gcs/_run.sh`:
- Line 45: The restore command invocation using "bin/br restore db" is passing
the PD address unquoted (PD_ADDR) which can cause word-splitting/globbing;
update the command to quote the variable (use "$PD_ADDR") in the --pd argument
so the entire address is passed as a single token. Locate the line with the
bin/br restore db command and replace --pd $PD_ADDR with --pd "$PD_ADDR".
---
Nitpick comments:
In `@br/tests/br_gcs/run.sh`:
- Line 142: The array index and arithmetic-style variable reference use
unnecessary ${} expansion; change the assignment to use the simpler form for the
index and any pure arithmetic contexts (e.g. replace row_count_new[${i}] with
row_count_new[i]) while keeping ${DB}${i}.$TABLE or other expansions that are
needed for concatenation intact; update the assignment that calls run_sql
accordingly (referencing row_count_new, run_sql, DB, TABLE, and i).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: e87ecd62-248d-4df4-a6ff-b9eb65437f65
📒 Files selected for processing (3)
br/tests/br_gcs/run.shbr/tests/docker_compatible_gcs/_run.shbr/tests/docker_compatible_gcs/prepare.sh
| --cert "$TEST_DIR/certs/br.pem" \ | ||
| --key "$TEST_DIR/certs/br.key" \ | ||
| --pd $PD_ADDR -s "gcs://$BUCKET/${DB}_old?endpoint=http://$GCS_HOST:$GCS_PORT/storage/v1/" --check-requirements=false | ||
| run_br --pd $PD_ADDR backup full -s "gcs://$BUCKET/$DB?endpoint=http://$GCS_HOST:$GCS_PORT" |
There was a problem hiding this comment.
Quote variables to prevent word splitting.
Multiple instances of unquoted $PD_ADDR can lead to word splitting or globbing if the variable contains spaces or special characters.
🛡️ Proposed fix
# Line 92
-run_br --pd $PD_ADDR backup full -s "gcs://$BUCKET/$DB?endpoint=http://$GCS_HOST:$GCS_PORT"
+run_br --pd "$PD_ADDR" backup full -s "gcs://$BUCKET/$DB?endpoint=http://$GCS_HOST:$GCS_PORT"
# Line 101
--key "$TEST_DIR/certs/br.key" \
- --pd $PD_ADDR -s "gcs://$BUCKET/${DB}_old?endpoint=http://$GCS_HOST:$GCS_PORT/storage/v1/" --check-requirements=false
+ --pd "$PD_ADDR" -s "gcs://$BUCKET/${DB}_old?endpoint=http://$GCS_HOST:$GCS_PORT/storage/v1/" --check-requirements=false
# Line 111
-run_br restore full -s "gcs://$BUCKET/$DB?" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false
+run_br restore full -s "gcs://$BUCKET/$DB?" --pd "$PD_ADDR" --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false
# Line 139
- run_br restore full -s "gcs://$BUCKET/${DB}_old" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false
+ run_br restore full -s "gcs://$BUCKET/${DB}_old" --pd "$PD_ADDR" --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=falseAlso applies to: 101-101, 111-111, 139-139
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 92-92: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@br/tests/br_gcs/run.sh` at line 92, The run_br invocation uses unquoted shell
variables which can cause word splitting/globbing (e.g. --pd $PD_ADDR); update
the calls to quote all variable expansions used in run_br and similar lines so
each value is treated as a single argument: wrap $PD_ADDR, $BUCKET, $DB,
$GCS_HOST and $GCS_PORT in double quotes wherever they are expanded (references:
the run_br command and its --pd and -s arguments shown in the diff; also apply
the same quoting to the other occurrences flagged at lines 101, 111, 139).
| echo "restore ${TAG} data starts..." | ||
| # after BR merged into TiDB we need skip version check because the build from tidb is not a release version. | ||
| bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT/storage/v1/" --check-requirements=false | ||
| bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false |
There was a problem hiding this comment.
Quote $PD_ADDR to prevent word splitting.
While unlikely to cause issues in practice, unquoted variables can lead to word splitting or globbing if they contain spaces or special characters.
🛡️ Proposed fix
- bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false
+ bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd "$PD_ADDR" --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd $PD_ADDR --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false | |
| bin/br restore db --db test -s "gcs://$BUCKET/bk${TAG}" --pd "$PD_ADDR" --gcs.endpoint="http://$GCS_HOST:$GCS_PORT" --check-requirements=false |
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 45-45: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@br/tests/docker_compatible_gcs/_run.sh` at line 45, The restore command
invocation using "bin/br restore db" is passing the PD address unquoted
(PD_ADDR) which can cause word-splitting/globbing; update the command to quote
the variable (use "$PD_ADDR") in the --pd argument so the entire address is
passed as a single token. Locate the line with the bin/br restore db command and
replace --pd $PD_ADDR with --pd "$PD_ADDR".
|
/test pull-br-integration-test |
|
@jebter: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: D3Hunter, Leavrth The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
812fde7
into
pingcap:release-8.5-20260116-v8.5.5
successor of #62488
Issue Number: close #62541, ref #61547
Problem Summary:
Migration guide
https://docs.aws.amazon.com/sdk-for-go/v2/developer-guide/migrate-gosdk.html
What changed and how does it work?
This also picks:
Confliction Resolving(By GPT):
DEPS.bzl: kept the target commit side for AWS SDK v2 Bazel deps.br/pkg/storage/BUILD.bazel: kept target commit side for v2 deps.go.mod/go.sum: kept target commit side in conflict hunks.pkg/lightning/log/log.go: kept target commit side in conflict hunk.br/cmd/br/operator.go: keptnewTestStorageCommand()from target commit, droppednewPitrChecksumCommand()andnewUpstreamChecksumCommand().br/pkg/storage/s3.go: mostly adopted target commit side; removedrecording/AccessRecordingintegration points.br/pkg/storage/recordingprerequisite commit; preserving those references would not compile.br/pkg/storage/s3_test.go: changed conflict-hunk tests back tocreateS3Suite(...)form (without recording collector).Check List
Tests
A new subcommand
br operator test-storagewas added.Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Tests