
eventstore: refine event store key range format and cache handling#4953

Merged
ti-chi-bot[bot] merged 11 commits into master from ldz/optimize-event-store0429 on May 7, 2026
Conversation

@lidezhu
Collaborator

@lidezhu lidezhu commented Apr 29, 2026

What problem does this PR solve?

Issue Number: close #4981

What is changed and how it works?

This pull request refines the event store's internal key encoding and resource management. By moving to a fixed-offset encoding scheme and introducing explicit boundary key encoders, the system achieves more consistent data handling.

Highlights

  • Key Encoding Refactor: Refactored event-store key encoding to use fixed-offset segments and introduced dedicated boundary key encoders for transaction commit timestamps.
  • Cache Lifecycle Management: Implemented shared Pebble cache and table cache lifecycle management, ensuring proper initialization and cleanup within the event store.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Summary by CodeRabbit

  • Bug Fixes

    • More accurate transaction-commit-timestamp handling for GC and iterator bounds, fixing scans that spanned unfinished transactions.
    • Correctly detects stored compression type during iteration.
  • Tests

    • Added and extended tests to validate key encoding, boundary keys, and iterator behavior.
  • Chores

    • Improved shared DB cache/table-cache initialization, sizing and cleanup.
    • Refactored key encoding/decoding layout for more consistent offsets and bounds.

@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 29, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Apr 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

This PR makes three coordinated changes: refactors event-store key encoding to fixed-offset segments and replaces DecodeKeyMetas with DecodeKeyAttributes; makes Pebble cache/table-cache creation and lifecycle explicit (shared caches returned and unref'd on Close); and migrates GC range tracking and APIs to use transaction-commit-timestamp semantics.
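The "returned and unref'd on Close" contract can be illustrated with a stdlib-only refcounting sketch. The real code manages *pebble.Cache and *pebble.TableCache; the refCounted type below is a hypothetical stand-in for that contract, not TiCDC code:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// refCounted mimics the Ref/Unref contract of pebble's shared caches:
// the underlying resources are released only when the last reference drops.
type refCounted struct {
	refs  atomic.Int64
	freed bool
}

func newRefCounted() *refCounted {
	c := &refCounted{}
	c.refs.Store(1) // the creator holds the initial reference
	return c
}

func (c *refCounted) Ref() { c.refs.Add(1) }

func (c *refCounted) Unref() {
	if c.refs.Add(-1) == 0 {
		c.freed = true // resources released with the last holder
	}
}

func main() {
	cache := newRefCounted()
	// Each DB opened against the shared cache takes a reference...
	for i := 0; i < 3; i++ {
		cache.Ref()
	}
	// ...and releases it on Close.
	for i := 0; i < 3; i++ {
		cache.Unref()
	}
	fmt.Println(cache.freed) // false: the store still holds its own reference
	cache.Unref()            // eventStore.Close releases the last reference
	fmt.Println(cache.freed) // true
}
```

This is why unref'ing the shared caches while a DB is still open (as a later review comment points out) violates the contract: a holder would outlive the release.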

Changes

Key Encoding, Pebble Cache, and GC commit-ts refactor

Changes by layer / file:

  • Encoded Key Layout / Data Shape (logservice/eventstore/format.go): Introduce fixed-offset encoded-key constants (encodedKeyTxnCommitTsStart/End, encodedKeyAttributesOffset/End), compute encodedKeyLen from the offsets, add decodeTxnCommitTsFromEncodedKey, and replace EncodeKeyPrefix usage with precise boundary/scan-key encoders (encodeTxnCommitTsBoundaryKey, encodeScanLowerBound).
  • Attribute Decoding API (logservice/eventstore/format.go): Remove DecodeKeyMetas; add DecodeKeyAttributes(key []byte) (DMLOrder, CompressionType), which decodes the DML order and compression type from the new attribute slice.
  • Pebble Options & Shared Cache Creation (logservice/eventstore/pebble.go): Add a maxOpenFilesPerDB constant, wire it into newPebbleOptions, build a single shared *pebble.Cache and *pebble.TableCache sized via dbNum * pebble.TableCacheSize(maxOpenFilesPerDB), and change createPebbleDBs to return ([]*pebble.DB, *pebble.Cache, *pebble.TableCache).
  • EventStore Wiring & Lifecycle (logservice/eventstore/event_store.go): Store the shared pebbleCache and tableCache on eventStore; New receives and stores them; Close unrefs tableCache (warning on unref error) and pebbleCache, and closes the DBs.
  • Iterator Bounds & Decompression (logservice/eventstore/event_store.go): GetIterator now computes the lower bound with encodeScanLowerBound when LastScannedTxnStartTs is set, and otherwise uses encodeTxnCommitTsBoundaryKey (comments document the bound semantics). The iterator decompression path uses DecodeKeyAttributes to obtain the compression type.
  • GC APIs & Internal State (logservice/eventstore/gc.go): Change range function types and internal structs to startTxnCommitTs/endTxnCommitTs (replacing the generic startTs/endTs); update addGCItem, shouldFlushDeleteRange, doGCJob, updateCompactRanges, and doCompaction to operate on commit-ts semantics.
  • Benchmarks / Tests / Golden Validation (logservice/eventstore/*_test.go, logservice/eventstore/format_test.go): Update tests and benchmarks to use the new encoders/decoders; replace DecodeKeyMetas with DecodeKeyAttributes; change iterator bounds to use encodeTxnCommitTsBoundaryKey; add TestEventStoreKeyBounds and the golden TestEventStoreKeyFormatGolden verifying the encoded layout, offsets, boundary/scan keys, decodeTxnCommitTsFromEncodedKey, and DecodeKeyAttributes.
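As a rough illustration of the fixed-offset scheme, the constants and helpers above might look like the following sketch. Field order, widths, and function names are assumptions inferred from the identifiers listed in the walkthrough, not the actual format.go:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Hypothetical fixed-offset layout; each segment's end is the next one's start,
// so the total key length falls out of the offsets.
const (
	encodedKeyUniqueIDStart    = 0
	encodedKeyUniqueIDEnd      = encodedKeyUniqueIDStart + 8
	encodedKeyTableIDStart     = encodedKeyUniqueIDEnd
	encodedKeyTableIDEnd       = encodedKeyTableIDStart + 8
	encodedKeyTxnCommitTsStart = encodedKeyTableIDEnd
	encodedKeyTxnCommitTsEnd   = encodedKeyTxnCommitTsStart + 8
	encodedKeyTxnStartTsStart  = encodedKeyTxnCommitTsEnd
	encodedKeyTxnStartTsEnd    = encodedKeyTxnStartTsStart + 8
	encodedKeyAttributesOffset = encodedKeyTxnStartTsEnd
	encodedKeyAttributesEnd    = encodedKeyAttributesOffset + 2
	encodedKeyLen              = encodedKeyAttributesEnd
)

// encodeKey writes each field big-endian at its fixed offset, so byte-wise
// key comparison orders keys by (uniqueKeyID, tableID, commitTs, startTs).
func encodeKey(uniqueKeyID uint64, tableID int64, commitTs, startTs uint64, attrs uint16) []byte {
	key := make([]byte, encodedKeyLen)
	binary.BigEndian.PutUint64(key[encodedKeyUniqueIDStart:], uniqueKeyID)
	binary.BigEndian.PutUint64(key[encodedKeyTableIDStart:], uint64(tableID))
	binary.BigEndian.PutUint64(key[encodedKeyTxnCommitTsStart:], commitTs)
	binary.BigEndian.PutUint64(key[encodedKeyTxnStartTsStart:], startTs)
	binary.BigEndian.PutUint16(key[encodedKeyAttributesOffset:], attrs)
	return key
}

// decodeTxnCommitTs guards against short keys instead of panicking,
// mirroring the length-check feedback from the review.
func decodeTxnCommitTs(key []byte) (uint64, bool) {
	if len(key) < encodedKeyTxnCommitTsEnd {
		return 0, false
	}
	return binary.BigEndian.Uint64(key[encodedKeyTxnCommitTsStart:encodedKeyTxnCommitTsEnd]), true
}

func main() {
	key := encodeKey(1, 42, 500, 400, 0)
	ts, ok := decodeTxnCommitTs(key)
	fmt.Println(len(key), ts, ok) // 34 500 true
}
```

A boundary key in this style would simply be the prefix up to and including the commit-ts segment, which is why dedicated boundary encoders are safer than a generic prefix helper.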

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

lgtm, approved

Suggested reviewers

  • wk989898
  • hongyunyan
  • 3AceShowHand

Poem

🐰 I hopped through bytes and offsets fine,
I packed commit-ts where fields align,
Cached Pebble shared, now cleanly freed,
GC follows commit-ts where it must lead.
A little rabbit cheers this tidy design. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 19.35%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description check ✅: The PR description includes the required issue reference (#4981), explains the changes (key encoding refactor and cache lifecycle management), and addresses the template sections with highlights and checks.
  • Linked Issues check ✅: The PR comprehensively addresses all five objectives from issue #4981: separates EncodeKeyPrefix semantics with dedicated boundary key encoders, renames fields to clarify commit-ts/start-ts meanings, adds focused golden tests for key format protection, updates GC code with clear commit-ts naming, and implements explicit Pebble cache lifecycle management.
  • Out of Scope Changes check ✅: All changes align with the five objectives from issue #4981; no unrelated modifications or scope creep detected across the eight modified files.
  • Title check ✅: The title "eventstore: refine event store key format and cache handling" accurately summarizes the main changes, which involve refactoring key encoding with fixed-size segment offsets and implementing shared Pebble cache lifecycle management.



@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a custom Pebble table property collector and filter to optimize read performance by skipping SSTables that fall outside the requested timestamp range. It also refactors the event store to properly manage and cleanup shared Pebble block and table caches. The review feedback highlights an opportunity to simplify the timestamp tracking logic in the property collector and recommends handling the error from db.NewIter to improve robustness.

Comment thread logservice/eventstore/table_properties.go Outdated
Comment thread logservice/eventstore/event_store.go
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@logservice/eventstore/format.go`:
- Around lines 132-134: DecodeKeyMetas slices key[encodedKeyMetasOffset:encodedKeyMetasEnd] without checking the key length, which can panic. Add a guard at the top of DecodeKeyMetas to ensure len(key) >= encodedKeyMetasEnd; if the check fails, return safe zero/default values (e.g., DMLOrder(0), CompressionType(0)) so callers won't panic. Keep the remainder of the function (binary.BigEndian.Uint16 and the bit masks) unchanged, and reference encodedKeyMetasOffset/encodedKeyMetasEnd when adding the check.

In `@logservice/eventstore/table_properties.go`:
- Around lines 74-85: The overlap filter newEventStoreTableFilter currently returns false when the parsed properties are internally inconsistent (minTs > maxTs). After calling parseEventStoreCRTsTableProperty for eventStoreMinCRTsTableProperty and eventStoreMaxCRTsTableProperty, if both parse OK but minTs > maxTs, return true (fail open) instead of performing the overlap check. Keep the existing behavior for parse failures (still return true), and otherwise return maxTs >= lowerTs && minTs <= upperTs.
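The fail-open rule described in this comment can be sketched as a pure function. shouldReadTable is a hypothetical name; the real filter parses Pebble table properties before applying this logic:

```go
package main

import "fmt"

// shouldReadTable decides whether an SSTable whose per-table properties record
// a commit-ts range [minTs, maxTs] can contain rows in the query range
// [lowerTs, upperTs]. minOK/maxOK stand in for property-parse success.
func shouldReadTable(minTs, maxTs uint64, minOK, maxOK bool, lowerTs, upperTs uint64) bool {
	// Parse failure: fail open and read the table rather than risk skipping data.
	if !minOK || !maxOK {
		return true
	}
	// Internally inconsistent properties: also fail open, per the review fix.
	if minTs > maxTs {
		return true
	}
	// Normal case: keep the table only if [minTs, maxTs] overlaps [lowerTs, upperTs].
	return maxTs >= lowerTs && minTs <= upperTs
}

func main() {
	fmt.Println(shouldReadTable(10, 20, true, true, 15, 30)) // overlap: true
	fmt.Println(shouldReadTable(10, 20, true, true, 25, 30)) // disjoint: false
	fmt.Println(shouldReadTable(20, 10, true, true, 25, 30)) // inconsistent: fail open, true
}
```

Failing open trades a wasted read for correctness: a skipped table that actually held matching rows would silently drop events, while an extra read only costs I/O.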

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e65a4fb0-6f2f-435e-b2f9-1d8a567b57ad

📥 Commits

Reviewing files that changed from the base of the PR and between af9b69b and 8668f0d.

📒 Files selected for processing (4)
  • logservice/eventstore/event_store.go
  • logservice/eventstore/format.go
  • logservice/eventstore/pebble_test.go
  • logservice/eventstore/table_properties.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • logservice/eventstore/pebble_test.go
  • logservice/eventstore/event_store.go

Comment thread logservice/eventstore/format.go Outdated
Comment thread logservice/eventstore/table_properties.go Outdated
@ti-chi-bot ti-chi-bot Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 2, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
logservice/eventstore/event_store.go (1)

428-441: ⚠️ Potential issue | 🟠 Major

Only release the shared Pebble caches after all DBs close cleanly.

The shared pebbleCache and tableCache (created in logservice/eventstore/pebble.go:81-122) are unreferenced unconditionally even when db.Close() fails on lines 433-441. If a DB fails to close due to active iterators, it remains open and still holds references to these shared caches. Calling Unref() on them violates Pebble's reference-counting contract.

Additionally, Close() does not wait for background workers (tracked by e.wg) before releasing shared resources. Workers spawned by writeTaskPool.run() remain active and can continue writing to DBs or creating iterators during cache teardown.

🛠️ Minimal mitigation
 func (e *eventStore) Close(_ context.Context) error {
 	log.Info("event store start to close")
 	defer log.Info("event store closed")

 	e.closed.Store(true)

+	allDBsClosed := true
 	for _, db := range e.dbs {
 		if err := db.Close(); err != nil {
+			allDBsClosed = false
 			log.Error("failed to close pebble db", zap.Error(err))
 		}
 	}
+	if !allDBsClosed {
+		return nil
+	}
 	if e.tableCache != nil {
 		if err := e.tableCache.Unref(); err != nil {
 			log.Error("failed to unref pebble table cache", zap.Error(err))
 		}
 		e.tableCache = nil

Longer term: propagate a cancellation context to background workers, wait for e.wg, then close DBs and only unref caches if all close cleanly.

♻️ Duplicate comments (1)
logservice/eventstore/format.go (1)

116-118: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Guard DecodeKeyMetas against short keys.

Line 117 still slices key[encodedKeyMetasOffset:encodedKeyMetasEnd] unconditionally. Any truncated or malformed key will panic the iterator path instead of failing closed.

🛠️ Proposed fix
 func DecodeKeyMetas(key []byte) (DMLOrder, CompressionType) {
+	if len(key) < encodedKeyMetasEnd {
+		return 0, CompressionNone
+	}
 	combinedOrder := binary.BigEndian.Uint16(key[encodedKeyMetasOffset:encodedKeyMetasEnd])
 	return DMLOrder((combinedOrder & dmlOrderMask) >> dmlOrderShift), CompressionType(combinedOrder & compressionMask)
 }

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d675e132-7a7d-4f3b-a338-eba4896bfc2a

📥 Commits

Reviewing files that changed from the base of the PR and between 81526df and a384cb1.

📒 Files selected for processing (5)
  • logservice/eventstore/event_store.go
  • logservice/eventstore/format.go
  • logservice/eventstore/pebble_test.go
  • logservice/eventstore/table_properties.go
  • pkg/metrics/event_store.go
✅ Files skipped from review due to trivial changes (1)
  • logservice/eventstore/pebble_test.go

@ti-chi-bot ti-chi-bot Bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 2, 2026
@ti-chi-bot ti-chi-bot Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 2, 2026
@lidezhu lidezhu changed the title logservice: improve event store Pebble table filtering eventstore: refine event store keys and Pebble cache lifecycle May 2, 2026
@lidezhu
Collaborator Author

lidezhu commented May 2, 2026

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the event store key encoding logic by introducing specific boundary and scan lower bound encoding functions, along with renaming fields for clarity (e.g., CRTs to txnCommitTs). It also improves Pebble cache management by explicitly tracking and releasing pebble.Cache and pebble.TableCache. Feedback was provided regarding a potential integer overflow when calculating scan boundaries using CommitTsStart+1 and CommitTsEnd+1.

Comment thread logservice/eventstore/event_store.go
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
logservice/eventstore/event_store_test.go (1)

1260-1261: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

zstd.Decoder goroutine leak — decoder.Close() is never called.

zstd.NewReader must have Close() called on it to release resources and stop all running background goroutines. The decoder created at line 1260 is passed to iter but the sub-test only calls iter.innerIter.Close() at line 1282, skipping the decoder entirely. With 6 test cases, 6 decoders' goroutines are leaked on every test run. This will be flagged by goroutine-leak detectors (e.g., goleak) and -race runs.

🐛 Proposed fix
 decoder, err := zstd.NewReader(nil)
 require.NoError(t, err)
+defer decoder.Close()
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@logservice/eventstore/format.go`:
- Around line 140-147: deleteDataRange and compactDataRange return raw pebble
errors; wrap the returned errors with errors.Trace() from
github.com/pingcap/ticdc/pkg/errors so stack traces are preserved. Update both
functions to import "github.com/pingcap/ticdc/pkg/errors" and wrap the calls to
db.DeleteRange(...) and db.Compact(...) with errors.Trace(...), returning that
wrapped error instead of the raw pebble error.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f1da2afe-0766-4a1f-9937-eef66c9329d2

📥 Commits

Reviewing files that changed from the base of the PR and between a384cb1 and d0123a1.

📒 Files selected for processing (9)
  • logservice/eventstore/event_store.go
  • logservice/eventstore/event_store_bench_test.go
  • logservice/eventstore/event_store_test.go
  • logservice/eventstore/format.go
  • logservice/eventstore/format_test.go
  • logservice/eventstore/gc.go
  • logservice/eventstore/gc_test.go
  • logservice/eventstore/pebble.go
  • logservice/eventstore/pebble_test.go

Comment thread logservice/eventstore/format.go
@lidezhu
Collaborator Author

lidezhu commented May 2, 2026

/gemini summary

@gemini-code-assist

Summary of Changes

This pull request refines the event store's internal key encoding and resource management. By moving to a fixed-offset encoding scheme and introducing explicit boundary key encoders, the system achieves more consistent data handling. Additionally, the PR optimizes database performance by sharing Pebble cache resources across database instances and improves the garbage collection logic by aligning range tracking with transaction commit timestamps.

Highlights

  • Key Encoding Refactor: Refactored event-store key encoding to use fixed-offset segments and introduced dedicated boundary key encoders for transaction commit timestamps.
  • Cache Lifecycle Management: Implemented shared Pebble cache and table cache lifecycle management, ensuring proper initialization and cleanup within the event store.
  • GC Operation Updates: Updated garbage collection operations to track ranges using transaction commit timestamps instead of generic timestamps for improved accuracy.
  • Test Coverage: Added comprehensive tests for key encoding, boundary verification, and golden file comparisons to ensure consistency.


Activity
  • Pull request created by lidezhu.
  • Automated review paused due to active development.
  • CodeRabbit provided a detailed walkthrough and identified potential issues regarding error handling and key slicing.
  • Addressed review comments, including adding safety guards for key decoding and updating error handling.
  • Added new unit tests for key formatting and boundary verification.
  • User requested a summary of the pull request.

@lidezhu lidezhu changed the title eventstore: refine event store keys and Pebble cache lifecycle eventstore: refine key format naming and Pebble cache lifecycle May 2, 2026
@lidezhu
Collaborator Author

lidezhu commented May 2, 2026

/test all

@lidezhu lidezhu changed the title eventstore: refine key format naming and Pebble cache lifecycle eventstore: refine event store key range and cache handling May 2, 2026
@lidezhu
Collaborator Author

lidezhu commented May 2, 2026

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented May 2, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (2)
logservice/eventstore/format.go (2)

140-156: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wrap Pebble errors to preserve stack traces.

Per coding guidelines, third-party library errors should be wrapped with errors.Trace(). Both deleteDataRange and compactDataRange return raw Pebble errors.

🛠️ Proposed fix

Add import:

"github.com/pingcap/ticdc/pkg/errors"

Then update the functions:

 func deleteDataRange(
 	db *pebble.DB, uniqueKeyID uint64, tableID int64, startTxnCommitTs uint64, endTxnCommitTs uint64,
 ) error {
 	start := encodeTxnCommitTsBoundaryKey(uniqueKeyID, tableID, startTxnCommitTs)
 	end := encodeTxnCommitTsBoundaryKey(uniqueKeyID, tableID, endTxnCommitTs)
-
-	return db.DeleteRange(start, end, pebble.NoSync)
+	if err := db.DeleteRange(start, end, pebble.NoSync); err != nil {
+		return errors.Trace(err)
+	}
+	return nil
 }
 
 func compactDataRange(
 	db *pebble.DB, uniqueKeyID uint64, tableID int64, startTxnCommitTs uint64, endTxnCommitTs uint64,
 ) error {
 	start := encodeTxnCommitTsBoundaryKey(uniqueKeyID, tableID, startTxnCommitTs)
 	end := encodeTxnCommitTsBoundaryKey(uniqueKeyID, tableID, endTxnCommitTs)
-
-	return db.Compact(start, end, false)
+	if err := db.Compact(start, end, false); err != nil {
+		return errors.Trace(err)
+	}
+	return nil
 }

As per coding guidelines: "When an error comes from a third party/library call, wrap it immediately with errors.Trace(err) or errors.WrapError(...) to attach a stack trace".


114-118: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Guard against short keys before slicing in DecodeKeyAttributes.

Line 116 slices key[encodedKeyAttributesOffset:encodedKeyAttributesEnd] without a length check, which can panic on malformed keys. This mirrors the pattern already applied in decodeTxnCommitTsFromEncodedKey.

🛠️ Proposed fix
 func DecodeKeyAttributes(key []byte) (DMLOrder, CompressionType) {
+	if len(key) < encodedKeyAttributesEnd {
+		return 0, CompressionNone
+	}
 	combinedOrder := binary.BigEndian.Uint16(key[encodedKeyAttributesOffset:encodedKeyAttributesEnd])
 	return DMLOrder((combinedOrder & dmlOrderMask) >> dmlOrderShift), CompressionType(combinedOrder & compressionMask)
 }

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: afda66e2-e0cb-464f-a22b-17031a73a7fd

📥 Commits

Reviewing files that changed from the base of the PR and between a384cb1 and d0123a1.

📒 Files selected for processing (9)
  • logservice/eventstore/event_store.go
  • logservice/eventstore/event_store_bench_test.go
  • logservice/eventstore/event_store_test.go
  • logservice/eventstore/format.go
  • logservice/eventstore/format_test.go
  • logservice/eventstore/gc.go
  • logservice/eventstore/gc_test.go
  • logservice/eventstore/pebble.go
  • logservice/eventstore/pebble_test.go
✅ Files skipped from review due to trivial changes (3)
  • logservice/eventstore/event_store_test.go
  • logservice/eventstore/pebble_test.go
  • logservice/eventstore/gc_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • logservice/eventstore/event_store_bench_test.go
  • logservice/eventstore/event_store.go
  • logservice/eventstore/pebble.go

@ti-chi-bot ti-chi-bot Bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels May 6, 2026
@lidezhu
Collaborator Author

lidezhu commented May 6, 2026

/test all

@ti-chi-bot ti-chi-bot Bot added the lgtm label May 6, 2026
@ti-chi-bot

ti-chi-bot Bot commented May 6, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 3AceShowHand, wk989898

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [3AceShowHand,wk989898]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label May 6, 2026
@ti-chi-bot

ti-chi-bot Bot commented May 6, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-05-06 05:59:57.564904658 +0000 UTC m=+247470.438254640: ☑️ agreed by 3AceShowHand.
  • 2026-05-06 06:52:02.941812151 +0000 UTC m=+250595.815162113: ☑️ agreed by wk989898.

@lidezhu
Collaborator Author

lidezhu commented May 6, 2026

/retest

@lidezhu lidezhu changed the title eventstore: refine event store key range and cache handling eventstore: refine event store key format and cache handling May 6, 2026
@lidezhu lidezhu changed the title eventstore: refine event store key format and cache handling eventstore: refine event store key range format and cache handling May 6, 2026
@lidezhu
Collaborator Author

lidezhu commented May 6, 2026

/retest

7 similar comments

@ti-chi-bot ti-chi-bot Bot merged commit 56c6d3b into master May 7, 2026
25 checks passed
@ti-chi-bot ti-chi-bot Bot deleted the ldz/optimize-event-store0429 branch May 7, 2026 00:38

Labels

  • approved
  • lgtm
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

eventstore: clarify key range encoding and Pebble cache handling

3 participants