disagg: Fix unexpected object storage usage caused by pre-lock residue (#10760) #10767
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: CalvinNeo, JaySon-Huang, JinheLin, kolafish, yinshuangfei.
f050924
into
pingcap:release-nextgen-20251011
This is an automated cherry-pick of #10760
What problem does this PR solve?
Issue Number: close #10763
Problem Summary:
- `PageDirectory` write-group semantics could cause follower writers to miss their own applied lock-id cleanup signals.
- `S3LockLocalManager.pre_lock_keys` could remain resident and be repeatedly written into manifest locks.
What is changed and how it works?
End-to-end correctness fixes for lock lifecycle
- `PageDirectory::apply` now returns writer-scoped `applied_data_files` for both the write-group owner and followers, so each writer gets its own cleanup signal.
- `UniversalPageStorage::write` uses those per-writer ids to clean pre-locks reliably after apply.
- `cleanPreLockKeysOnWriteFailure(...)` is invoked when remote write/apply fails.
- `createS3LockForWriteBatch` was adjusted to avoid partial pre-lock residue on partial lock-creation failures (it appends to `pre_lock_keys` only after the lock-creation pass), and its return value is now aligned with "newly appended keys" semantics.
Test coverage and regression guards
- `PageDirectory` and `UniversalPageStorage` paths.
- `S3LockLocalManager` tests for partial cleanup, failure cleanup, lock-return semantics, and partial-failure atomicity.
- `std::launch::async` to avoid deferred scheduling risk.
Observability and operations improvements
- `S3GCManagerService`.
- `tiflash_storage_s3_store_summary_bytes{store_id, type=data_file_bytes|dt_file_bytes}`
- `remote_summary_interval_seconds` and wired it through `TMTContext`; `<= 0` disables periodic summary task registration.
Check List
Tests
Side effects
Documentation
Release note
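The pre-lock lifecycle described under "End-to-end correctness fixes for lock lifecycle" can be illustrated with a minimal, self-contained sketch. It is not the TiFlash implementation: `PreLockRegistry`, `createLocks`, `cleanPreLockKeys`, and `residentCount` are hypothetical names chosen for illustration, and the remote lock creation is replaced by a caller-supplied callback. The sketch shows two ideas only: keys are appended to the resident set after the whole lock-creation pass succeeds, and the return value lists only the keys newly appended by that call, so each writer can later clean up exactly its own keys.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a pre-lock manager; all names are assumptions.
class PreLockRegistry
{
public:
    using CreateLockFn = std::function<bool(const std::string &)>;

    // Runs the lock-creation pass over every key first. Keys are appended to
    // pre_lock_keys only if the whole pass succeeded, so a partial failure
    // leaves no residue. Returns the keys newly appended by this call, or
    // std::nullopt if any lock creation failed.
    std::optional<std::vector<std::string>> createLocks(
        const std::vector<std::string> & keys,
        const CreateLockFn & create_lock)
    {
        for (const auto & key : keys)
        {
            if (!create_lock(key))
                return std::nullopt;
        }

        std::lock_guard<std::mutex> guard(mtx);
        std::vector<std::string> newly_appended;
        for (const auto & key : keys)
        {
            // insert().second is true only when the key was not resident yet.
            if (pre_lock_keys.insert(key).second)
                newly_appended.push_back(key);
        }
        return newly_appended;
    }

    // Removes the given keys. A writer would call this with its own applied
    // ids after a successful apply, or with the keys it appended when the
    // remote write/apply fails.
    void cleanPreLockKeys(const std::vector<std::string> & keys)
    {
        std::lock_guard<std::mutex> guard(mtx);
        for (const auto & key : keys)
            pre_lock_keys.erase(key);
    }

    std::size_t residentCount() const
    {
        std::lock_guard<std::mutex> guard(mtx);
        return pre_lock_keys.size();
    }

private:
    mutable std::mutex mtx;
    std::unordered_set<std::string> pre_lock_keys;
};
```

In this sketch a failed pass is reported with `std::nullopt` rather than an empty vector, so a caller can tell "lock creation failed" apart from "all keys were already resident"; how the real code signals failure is not stated in the PR description.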