pkg/dxf/importinto: limit conflicted row recording files max 1G#67119
Conversation
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a global atomic counter and enforcement for a per-subtask total size cap on conflicted-row files; collectors atomically account written bytes, stop recording when the cap is exceeded, and the subtask metadata records whether recording was truncated. Changes
Sequence Diagram(s)sequenceDiagram
participant Executor
participant Collector
participant ObjStore
participant SharedCounter as SharedTotalSize
participant Meta as SubtaskMeta
Executor->>Collector: NewCollector(..., &SharedTotalSize, ...)
loop for each encoded row
Collector->>Collector: recordRowToFile(encodedRow)
Collector->>Collector: compute rowSize
Collector->>SharedCounter: atomic.Add(rowSize) / load total
alt total <= maxTotalConflictRowFileSize
Collector->>ObjStore: write row to current file
Collector->>Collector: update currFileSize, TotalFileSize, RowCount
else
Collector->>Collector: onTotalSizeLimitExceeded()
Collector->>ObjStore: close/clear writer
Collector->>Meta: set ConflictedRowRecordingCapped = true
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Hi @D3Hunter. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #67119 +/- ##
================================================
- Coverage 77.7074% 77.5012% -0.2063%
================================================
Files 2016 1936 -80
Lines 552163 543625 -8538
================================================
- Hits 429072 421316 -7756
- Misses 121349 122298 +949
+ Partials 1742 11 -1731
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
Review Complete Findings: 1 issues ℹ️ Learn more details on Pantheon AI. |
|
🔍 New commits detected — starting re-review... |
ingress-bot
left a comment
There was a problem hiding this comment.
This review was generated by AI and should be verified by a human reviewer.
Manual follow-up is recommended before merge.
Summary
- Total findings: 4
- Inline comments: 3
- Summary-only findings (no inline anchor): 1
Findings (highest risk first)
🟡 [Minor] (3)
- TestCollectResultMerge does not verify FilesTruncated OR-propagation (pkg/dxf/importinto/conflictedkv/collector_test.go:43-86 (TestCollectResultMerge / checkResultFn))
handleEncodedRowInneruses a mechanical "Inner" suffix instead of describing behavior (pkg/dxf/importinto/conflictedkv/collector.go:172)- Writer not nil'd on close failure in
onTotalSizeLimitExceededcauses double-close (pkg/dxf/importinto/conflictedkv/collector.go:243-252 (onTotalSizeLimitExceededwriter-close block))
🧹 [Nit] (1)
FilesTruncatedcould be read as "file contents were truncated" rather than "file recording was capped" (pkg/dxf/importinto/conflictedkv/collector.go:58, pkg/dxf/importinto/proto.go:163)
Unanchored findings
🧹 [Nit] (1)
FilesTruncatedcould be read as "file contents were truncated" rather than "file recording was capped"- Scope: pkg/dxf/importinto/conflictedkv/collector.go:58, pkg/dxf/importinto/proto.go:163
- Request: Consider renaming to
RecordingTruncatedorRowRecordingCappedto match the cause-oriented naming ofTooManyConflictsFromIndex. If the current name is intentional for JSON-schema stability, add a brief inline note at the declaration explaining the naming choice.
|
Addressed the unanchored naming nit.
For compatibility with persisted subtask metadata, the struct field keeps the historical JSON key ( |
|
/retest |
| if sharedTotalFileSize == nil { | ||
| sharedTotalFileSize = &atomic.Int64{} | ||
| } |
There was a problem hiding this comment.
Is this some defensive checking? Seems no caller pass nil pointer.
There was a problem hiding this comment.
yes, i let AI add a comment
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: GMHDBJD, joechenrh The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
5 similar comments
|
/retest |
|
/retest |
|
/retest |
|
/retest |
|
/retest |
What problem does this PR solve?
Issue Number: ref #66019
Problem Summary:
During
IMPORT INTOcollect-conflicts, conflicted rows can be continuously recorded to object storage. This may produce very large conflicted-row files and high storage usage. This PR introduces a hardcoded total-size cap for conflicted-row recording so we can collect feedback first before deciding whether/how to expose user configuration. as 1GB is already quite large for human to process manually, and storing those files unbounded will add costWhat changed and how does it work?
1GiBlimit for total conflicted-row file size in one collect-conflicts subtask.Addthen check threshold intentionally; a small overflow (typically one row) above the threshold is acceptable to keep concurrent accounting simple.HandleEncodedRowto centralizeRowCount/Checksumupdates and move write/limit details to an inner helper.onTotalSizeLimitExceededpath is exercised (viastopRecording/writer state assertions).BUILD.bazel) viamake bazel_prepare.Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
New Features
Improvements
Testing