eventstore: refine event store key range format and cache handling#4953
ti-chi-bot[bot] merged 11 commits into master
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings and use the usual commands to manage reviews.
📝 Walkthrough
This PR makes three coordinated changes: it refactors event-store key encoding to fixed-offset segments, moves Pebble block and table caches to explicitly managed shared instances, and reworks GC range tracking around transaction commit-ts.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Code Review
This pull request introduces a custom Pebble table property collector and filter to optimize read performance by skipping SSTables that fall outside the requested timestamp range. It also refactors the event store to properly manage and cleanup shared Pebble block and table caches. The review feedback highlights an opportunity to simplify the timestamp tracking logic in the property collector and recommends handling the error from db.NewIter to improve robustness.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@logservice/eventstore/format.go`:
- Around line 132-134: DecodeKeyMetas slices
key[encodedKeyMetasOffset:encodedKeyMetasEnd] without checking length which can
panic; add a guard at the top of DecodeKeyMetas to ensure len(key) >=
encodedKeyMetasEnd (and implicitly >= encodedKeyMetasEnd -
encodedKeyMetasOffset) and if not, return safe zero/default values (e.g.,
DMLOrder(0), CompressionType(0)) so callers won’t panic; keep the remainder of
the function (binary.BigEndian.Uint16 and bit masks) unchanged and reference
encodedKeyMetasOffset/encodedKeyMetasEnd when adding the check.
In `@logservice/eventstore/table_properties.go`:
- Around line 74-85: The overlap filter newEventStoreTableFilter currently
returns false when parsed properties are internally inconsistent (minTs >
maxTs); change the logic so after calling parseEventStoreCRTsTableProperty for
eventStoreMinCRTsTableProperty and eventStoreMaxCRTsTableProperty, if both parse
OK and minTs > maxTs then return true (fail open) instead of performing the
overlap check; keep existing behavior for parse failures (still return true) and
otherwise return maxTs >= lowerTs && minTs <= upperTs.
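The fail-open decision reads naturally as a small predicate. This sketch stands in for Pebble's real block-property filter interface; the property map and key names here are hypothetical:

```go
package main

import "fmt"

// shouldReadTable decides whether an SSTable may contain events in
// [lowerTs, upperTs]. It fails open (returns true, i.e. "read the table")
// whenever the stored properties cannot be trusted.
func shouldReadTable(props map[string]uint64, lowerTs, upperTs uint64) bool {
	minTs, okMin := props["eventstore.min-crts"] // hypothetical property key
	maxTs, okMax := props["eventstore.max-crts"]
	if !okMin || !okMax {
		return true // parse failure: fail open
	}
	if minTs > maxTs {
		return true // internally inconsistent properties: fail open
	}
	return maxTs >= lowerTs && minTs <= upperTs // overlap check
}

func main() {
	good := map[string]uint64{"eventstore.min-crts": 100, "eventstore.max-crts": 200}
	fmt.Println(shouldReadTable(good, 150, 300)) // true: ranges overlap
	fmt.Println(shouldReadTable(good, 300, 400)) // false: table entirely below range
	bad := map[string]uint64{"eventstore.min-crts": 500, "eventstore.max-crts": 10}
	fmt.Println(shouldReadTable(bad, 300, 400)) // true: inconsistent, fail open
}
```

Failing open means a buggy property collector can only cost extra reads, never silently skip data.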
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e65a4fb0-6f2f-435e-b2f9-1d8a567b57ad
📒 Files selected for processing (4)
- logservice/eventstore/event_store.go
- logservice/eventstore/format.go
- logservice/eventstore/pebble_test.go
- logservice/eventstore/table_properties.go
🚧 Files skipped from review as they are similar to previous changes (2)
- logservice/eventstore/pebble_test.go
- logservice/eventstore/event_store.go
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
logservice/eventstore/event_store.go (1)
428-441: ⚠️ Potential issue | 🟠 Major

Only release the shared Pebble caches after all DBs close cleanly.

The shared `pebbleCache` and `tableCache` (created in `logservice/eventstore/pebble.go:81-122`) are unref'd unconditionally even when `db.Close()` fails on lines 433-441. If a DB fails to close due to active iterators, it remains open and still holds references to these shared caches. Calling `Unref()` on them then violates Pebble's reference-counting contract.

Additionally, `Close()` does not wait for background workers (tracked by `e.wg`) before releasing shared resources. Workers spawned by `writeTaskPool.run()` remain active and can continue writing to DBs or creating iterators during cache teardown.

🛠️ Minimal mitigation

```diff
 func (e *eventStore) Close(_ context.Context) error {
 	log.Info("event store start to close")
 	defer log.Info("event store closed")
 	e.closed.Store(true)
+	allDBsClosed := true
 	for _, db := range e.dbs {
 		if err := db.Close(); err != nil {
+			allDBsClosed = false
 			log.Error("failed to close pebble db", zap.Error(err))
 		}
 	}
+	if !allDBsClosed {
+		return nil
+	}
 	if e.tableCache != nil {
 		if err := e.tableCache.Unref(); err != nil {
 			log.Error("failed to unref pebble table cache", zap.Error(err))
 		}
 		e.tableCache = nil
```

Longer term: propagate a cancellation context to background workers, wait for `e.wg`, then close DBs and only unref caches if all close cleanly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@logservice/eventstore/event_store.go` around lines 428-441: the current shutdown unrefs shared pebble caches (e.tableCache, e.pebbleCache) even if some db.Close() calls on e.dbs fail, and without waiting for background workers (e.wg), which can violate Pebble's refcounting. Modify teardown so you first signal/cancel background workers (propagate a cancellation via the existing context/cancel or add one), wait for e.wg to finish, then attempt to close all dbs (call db.Close() for each and collect any errors), and only call e.tableCache.Unref() and e.pebbleCache.Unref() (and nil them) if all db.Close() calls succeeded; if any Close failed, log the errors and skip Unref to avoid breaking Pebble refcounting (also ensure proper logging where db.Close() fails).
♻️ Duplicate comments (1)
logservice/eventstore/format.go (1)
116-118: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Guard `DecodeKeyMetas` against short keys.

Line 117 still slices `key[encodedKeyMetasOffset:encodedKeyMetasEnd]` unconditionally. Any truncated or malformed key will panic the iterator path instead of failing closed.

🛠️ Proposed fix

```diff
 func DecodeKeyMetas(key []byte) (DMLOrder, CompressionType) {
+	if len(key) < encodedKeyMetasEnd {
+		return 0, CompressionNone
+	}
 	combinedOrder := binary.BigEndian.Uint16(key[encodedKeyMetasOffset:encodedKeyMetasEnd])
 	return DMLOrder((combinedOrder & dmlOrderMask) >> dmlOrderShift), CompressionType(combinedOrder & compressionMask)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@logservice/eventstore/format.go` around lines 116 - 118, Guard DecodeKeyMetas against keys shorter than encodedKeyMetasEnd by checking len(key) >= encodedKeyMetasEnd before slicing; if the check fails, return safe zero/unknown values (e.g., DMLOrder(0), CompressionType(0)) or an explicit error sentinel so the iterator fails closed instead of panicking. Update the function DecodeKeyMetas to perform this length check using the constants encodedKeyMetasOffset and encodedKeyMetasEnd and only call binary.BigEndian.Uint16 on the slice when it is long enough.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@logservice/eventstore/event_store.go`:
- Around line 428-441: The current shutdown unrefs shared pebble caches
(e.tableCache, e.pebbleCache) even if some db.Close() calls on e.dbs fail and
without waiting for background workers (e.wg), which can violate Pebble's
refcounting; modify teardown so you first signal/ cancel background workers
(propagate a cancellation via existing context/cancel or add one), wait for e.wg
to finish, then attempt to close all dbs (call db.Close() for each and collect
any errors), and only call e.tableCache.Unref() and e.pebbleCache.Unref() (and
nil them) if all db.Close() calls succeeded; if any Close failed, log the errors
and skip Unref to avoid breaking Pebble refcounting (also ensure proper logging
where db.Close() fails).
---
Duplicate comments:
In `@logservice/eventstore/format.go`:
- Around line 116-118: Guard DecodeKeyMetas against keys shorter than
encodedKeyMetasEnd by checking len(key) >= encodedKeyMetasEnd before slicing; if
the check fails, return safe zero/unknown values (e.g., DMLOrder(0),
CompressionType(0)) or an explicit error sentinel so the iterator fails closed
instead of panicking. Update the function DecodeKeyMetas to perform this length
check using the constants encodedKeyMetasOffset and encodedKeyMetasEnd and only
call binary.BigEndian.Uint16 on the slice when it is long enough.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d675e132-7a7d-4f3b-a338-eba4896bfc2a
📒 Files selected for processing (5)
- logservice/eventstore/event_store.go
- logservice/eventstore/format.go
- logservice/eventstore/pebble_test.go
- logservice/eventstore/table_properties.go
- pkg/metrics/event_store.go
✅ Files skipped from review due to trivial changes (1)
- logservice/eventstore/pebble_test.go
/gemini review
Code Review
This pull request refactors the event store key encoding logic by introducing specific boundary and scan lower bound encoding functions, along with renaming fields for clarity (e.g., CRTs to txnCommitTs). It also improves Pebble cache management by explicitly tracking and releasing pebble.Cache and pebble.TableCache. Feedback was provided regarding a potential integer overflow when calculating scan boundaries using CommitTsStart+1 and CommitTsEnd+1.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
logservice/eventstore/event_store_test.go (1)
1260-1261: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

`zstd.Decoder` goroutine leak: `decoder.Close()` is never called.

`zstd.NewReader` must have `Close()` called on it to release resources and stop all running background goroutines. The `decoder` created at line 1260 is passed to `iter`, but the sub-test only calls `iter.innerIter.Close()` at line 1282, skipping the decoder entirely. With 6 test cases, 6 decoders' goroutines are leaked on every test run. This will be flagged by goroutine-leak detectors (e.g., `goleak`) and `-race` runs.

🐛 Proposed fix

```diff
 decoder, err := zstd.NewReader(nil)
 require.NoError(t, err)
+defer decoder.Close()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@logservice/eventstore/event_store_test.go` around lines 1260 - 1261, The zstd decoder created with zstd.NewReader (the variable decoder) is never closed, leaking goroutines; ensure decoder.Close() is always called (e.g., immediately defer decoder.Close() after successful creation) so the decoder is cleaned up even if the test fails or when iter is closed; tie the lifecycle to the existing iter/innerIter by deferring decoder.Close() right after decoder, err := zstd.NewReader(nil) and/or ensuring any teardown path that closes iter.innerIter also closes decoder if iter doesn't already own it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@logservice/eventstore/format.go`:
- Around line 140-147: deleteDataRange and compactDataRange return raw pebble
errors; wrap the returned errors with errors.Trace() from
github.com/pingcap/ticdc/pkg/errors so stack traces are preserved. Update both
functions to import "github.com/pingcap/ticdc/pkg/errors" and wrap the calls to
db.DeleteRange(...) and db.Compact(...) with errors.Trace(...), returning that
wrapped error instead of the raw pebble error.
---
Outside diff comments:
In `@logservice/eventstore/event_store_test.go`:
- Around line 1260-1261: The zstd decoder created with zstd.NewReader (the
variable decoder) is never closed, leaking goroutines; ensure decoder.Close() is
always called (e.g., immediately defer decoder.Close() after successful
creation) so the decoder is cleaned up even if the test fails or when iter is
closed; tie the lifecycle to the existing iter/innerIter by deferring
decoder.Close() right after decoder, err := zstd.NewReader(nil) and/or ensuring
any teardown path that closes iter.innerIter also closes decoder if iter doesn't
already own it.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f1da2afe-0766-4a1f-9937-eef66c9329d2
📒 Files selected for processing (9)
- logservice/eventstore/event_store.go
- logservice/eventstore/event_store_bench_test.go
- logservice/eventstore/event_store_test.go
- logservice/eventstore/format.go
- logservice/eventstore/format_test.go
- logservice/eventstore/gc.go
- logservice/eventstore/gc_test.go
- logservice/eventstore/pebble.go
- logservice/eventstore/pebble_test.go
/gemini summary
Summary of Changes

This pull request refines the event store's internal key encoding and resource management. By moving to a fixed-offset encoding scheme and introducing explicit boundary key encoders, the system achieves more consistent data handling. Additionally, the PR optimizes database performance by sharing Pebble cache resources across database instances and improves the garbage collection logic by aligning range tracking with transaction commit timestamps.
Activity
/test all

@coderabbitai review

✅ Actions performed: review triggered.
♻️ Duplicate comments (2)
logservice/eventstore/format.go (2)
140-156: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wrap Pebble errors to preserve stack traces.

Per coding guidelines, third-party library errors should be wrapped with `errors.Trace()`. Both `deleteDataRange` and `compactDataRange` return raw Pebble errors.

🛠️ Proposed fix

Add the import:

```go
"github.com/pingcap/ticdc/pkg/errors"
```

Then update the functions:

```diff
 func deleteDataRange(
 	db *pebble.DB, uniqueKeyID uint64, tableID int64, startTxnCommitTs uint64, endTxnCommitTs uint64,
 ) error {
 	start := encodeTxnCommitTsBoundaryKey(uniqueKeyID, tableID, startTxnCommitTs)
 	end := encodeTxnCommitTsBoundaryKey(uniqueKeyID, tableID, endTxnCommitTs)
-
-	return db.DeleteRange(start, end, pebble.NoSync)
+	if err := db.DeleteRange(start, end, pebble.NoSync); err != nil {
+		return errors.Trace(err)
+	}
+	return nil
 }

 func compactDataRange(
 	db *pebble.DB, uniqueKeyID uint64, tableID int64, startTxnCommitTs uint64, endTxnCommitTs uint64,
 ) error {
 	start := encodeTxnCommitTsBoundaryKey(uniqueKeyID, tableID, startTxnCommitTs)
 	end := encodeTxnCommitTsBoundaryKey(uniqueKeyID, tableID, endTxnCommitTs)
-
-	return db.Compact(start, end, false)
+	if err := db.Compact(start, end, false); err != nil {
+		return errors.Trace(err)
+	}
+	return nil
 }
```

As per coding guidelines: "When an error comes from a third party/library call, wrap it immediately with `errors.Trace(err)` or `errors.WrapError(...)` to attach a stack trace".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@logservice/eventstore/format.go` around lines 140 - 156, deleteDataRange and compactDataRange currently return raw Pebble errors; wrap the Pebble calls' returned errors with errors.Trace(...) to preserve stack traces. Add the import "github.com/pingcap/ticdc/pkg/errors" and change the return paths in deleteDataRange (which calls db.DeleteRange(start, end, pebble.NoSync)) and compactDataRange (which calls db.Compact(start, end, false)) to capture the error and return errors.Trace(err). Keep the existing start/end keys produced by encodeTxnCommitTsBoundaryKey and do not alter other logic.
114-118: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Guard against short keys before slicing in `DecodeKeyAttributes`.

Line 116 slices `key[encodedKeyAttributesOffset:encodedKeyAttributesEnd]` without a length check, which can panic on malformed keys. This mirrors the pattern already applied in `decodeTxnCommitTsFromEncodedKey`.

🛠️ Proposed fix

```diff
 func DecodeKeyAttributes(key []byte) (DMLOrder, CompressionType) {
+	if len(key) < encodedKeyAttributesEnd {
+		return 0, CompressionNone
+	}
 	combinedOrder := binary.BigEndian.Uint16(key[encodedKeyAttributesOffset:encodedKeyAttributesEnd])
 	return DMLOrder((combinedOrder & dmlOrderMask) >> dmlOrderShift), CompressionType(combinedOrder & compressionMask)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@logservice/eventstore/format.go` around lines 114 - 118, DecodeKeyAttributes currently slices key[encodedKeyAttributesOffset:encodedKeyAttributesEnd] without checking length which can panic; add a guard at the start of DecodeKeyAttributes that checks if len(key) < encodedKeyAttributesEnd and, if so, return safe zero/default values (e.g., DMLOrder(0), CompressionType(0)) instead of slicing. Keep the rest of the function logic (using binary.BigEndian.Uint16, dmlOrderMask, dmlOrderShift, compressionMask) unchanged so valid keys are decoded as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@logservice/eventstore/format.go`:
- Around line 140-156: deleteDataRange and compactDataRange currently return raw
Pebble errors; wrap the Pebble calls' returned errors with errors.Trace(...) to
preserve stack traces. Add the import "github.com/pingcap/ticdc/pkg/errors" and
change the return paths in deleteDataRange (which calls db.DeleteRange(start,
end, pebble.NoSync)) and compactDataRange (which calls db.Compact(start, end,
false)) to capture the error and return errors.Trace(err). Keep the existing
start/end keys produced by encodeTxnCommitTsBoundaryKey and do not alter other
logic.
- Around line 114-118: DecodeKeyAttributes currently slices
key[encodedKeyAttributesOffset:encodedKeyAttributesEnd] without checking length
which can panic; add a guard at the start of DecodeKeyAttributes that checks if
len(key) < encodedKeyAttributesEnd and, if so, return safe zero/default values
(e.g., DMLOrder(0), CompressionType(0)) instead of slicing. Keep the rest of the
function logic (using binary.BigEndian.Uint16, dmlOrderMask, dmlOrderShift,
compressionMask) unchanged so valid keys are decoded as before.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: afda66e2-e0cb-464f-a22b-17031a73a7fd
📒 Files selected for processing (9)
- logservice/eventstore/event_store.go
- logservice/eventstore/event_store_bench_test.go
- logservice/eventstore/event_store_test.go
- logservice/eventstore/format.go
- logservice/eventstore/format_test.go
- logservice/eventstore/gc.go
- logservice/eventstore/gc_test.go
- logservice/eventstore/pebble.go
- logservice/eventstore/pebble_test.go
✅ Files skipped from review due to trivial changes (3)
- logservice/eventstore/event_store_test.go
- logservice/eventstore/pebble_test.go
- logservice/eventstore/gc_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- logservice/eventstore/event_store_bench_test.go
- logservice/eventstore/event_store.go
- logservice/eventstore/pebble.go
/test all
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: 3AceShowHand, wk989898. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: needs approval from an approver in each of these files.

Approvers can indicate their approval by writing

[LGTM Timeline notifier] Timeline:
/retest

/retest

7 similar comments:

/retest

/retest

/retest

/retest

/retest

/retest

/retest
What problem does this PR solve?
Issue Number: close #4981
What is changed and how it works?
This pull request refines the event store's internal key encoding and resource management. By moving to a fixed-offset encoding scheme and introducing explicit boundary key encoders, the system achieves more consistent data handling.
Highlights
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note
Summary by CodeRabbit
Bug Fixes
Tests
Chores