Skip to content

add schema_version check for delta scan#219

Merged
thweetkomputer merged 2 commits intomainfrom
version_check_ckpt_zc
Nov 18, 2025
Merged

add schema_version check for delta scan#219
thweetkomputer merged 2 commits intomainfrom
version_check_ckpt_zc

Conversation

@thweetkomputer
Copy link
Collaborator

@thweetkomputer thweetkomputer commented Nov 14, 2025

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Reference the link of issue using fixes eloqdb/tx_service#issue_id
  • Reference the link of RFC if exists
  • Pass ./mtr --suite=mono_main,mono_multi,mono_basic

Summary by CodeRabbit

  • New Features

    • Scan and data-sync operations now carry and expose a schema version to enable version-aware processing.
  • Bug Fixes

    • Added schema-version validation: when a requested schema version conflicts with the active schema, the system logs a warning and returns a standardized error immediately instead of retrying, preventing delayed or silent failures.

@coderabbitai
Copy link

coderabbitai bot commented Nov 14, 2025

Walkthrough

Adds a uint64_t schema_version_ field and SchemaVersion() accessors to several CC request structs, propagates schema_version into call sites, and changes TemplateCcMap to detect requested/in-memory schema mismatches, set REQUESTED_TABLE_SCHEMA_MISMATCH, and return without re-enqueuing.

Changes

Cohort / File(s) Summary
CC request structs (schema plumbing)
include/cc/cc_request.h
Added uint64_t schema_version_ member and uint64_t SchemaVersion() const override to HashPartitionDataSyncScanCc, RangePartitionDataSyncScanCc, ScanSliceDeltaSizeCcForRangePartition, and ScanDeltaSizeCcForHashPartition. Constructors updated to accept/initialize schema_version. Added TemplateCcMap friend declarations.
TemplateCcMap runtime validation
include/cc/template_cc_map.h
When a request carries a non-zero schema_version_ that differs from the in-memory table schema version, code now asserts expected ordering, logs a warning, sets CcErrorCode::REQUESTED_TABLE_SCHEMA_MISMATCH on the request, and returns false (no re-enqueue). Applied across data-sync and scan paths handling hash/range partitions.
Call-site propagation of schema_version
src/cc/local_cc_shards.cpp
Updated calls to ScanSliceDeltaSizeCcForRangePartition and ScanDeltaSizeCcForHashPartition to pass an additional schema_version parameter (sourced from catalog/table schema versions).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Local as LocalCcShards
  participant Req as CC Request (has schema_version_)
  participant Map as TemplateCcMap
  Note over Local,Req: Request construction now includes schema_version
  Local->>Req: construct(..., schema_version)
  Local->>Map: submit(req)
  Map->>Map: if req.schema_version_ == 0 or equals table_schema_.Version()
  alt match or zero
    Map->>Map: proceed with scan/data-sync
    Map-->>Local: enqueue/continue
  else mismatch
    Map->>Map: assert req.schema_version_ > table_schema_.Version()
    Map-->>Req: SetError(REQUESTED_TABLE_SCHEMA_MISMATCH)
    Map-->>Local: return false (no retry)
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Review focus:
    • include/cc/template_cc_map.h — ensure all mismatch branches correctly set error and avoid re-enqueue.
    • include/cc/cc_request.h — constructors, SchemaVersion() signatures, and friend template declarations.
    • src/cc/local_cc_shards.cpp — correct schema_version values passed at call sites.

Possibly related PRs

Suggested reviewers

  • liunyl
  • lzxddz

Poem

"I hop through fields of schema light, 🐇
I tuck a version in each request tonight,
If versions clash I ring the bell,
A mismatch flagged — no retry spell,
Delta scans march tidy until first light."

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description contains only the template checklist with all items unchecked and no actual description of the changes, objectives, rationale, or which checklist items have been addressed. Replace the template with a meaningful description explaining what was changed, why it was changed, which tests were added, and check off completed items from the template.
Docstring Coverage ⚠️ Warning Docstring coverage is 6.25% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title 'add schema_version check for delta scan' accurately describes the main changes, which add schema_version handling to multiple scan/sync classes in the codebase.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch version_check_ckpt_zc

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@thweetkomputer thweetkomputer marked this pull request as ready for review November 14, 2025 09:20
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
include/cc/cc_request.h (3)

8492-8516: Schema version plumbed into range-partition delta-scan ctor — add accessor for consistency.

Ctor/initializers look good. To align with other requests and avoid relying solely on friend access, expose SchemaVersion() here as well.

Apply inside struct ScanSliceDeltaSizeCcForRangePartition:

@@
 struct ScanSliceDeltaSizeCcForRangePartition : public CcRequestBase
 {
@@
     void SetHasDmlSinceDdl()
     {
         bool expect = false;
         has_dml_since_ddl_.compare_exchange_strong(
             expect, true, std::memory_order_acq_rel);
     }
@@
+    // Provide uniform access like other requests.
+    uint64_t SchemaVersion() const override { return schema_version_; }

Also consider adding a small unit/integration test that verifies a mismatched schema_version triggers the intended guard in TemplateCcMap for this request.

Please confirm all call sites were updated to pass schema_version and that TemplateCcMap checks it for this request.


8723-8732: Minimize friend exposure; prefer accessor if possible.

You added a private member schema_version_ and friended TemplateCcMap. If you add SchemaVersion() as suggested, you likely can drop or narrow the friend dependency unless TemplateCcMap needs other internals. If friend is still required, keep it; otherwise, remove to reduce coupling.


8899-8915: Private schema_version_ + friend added — OK; consider const and/or accessor.

schema_version_ addition is fine. If it never changes post-ctor, making it const can communicate immutability. With an accessor in place, you may also be able to trim friend scope unless TemplateCcMap needs more than the version.

include/cc/template_cc_map.h (1)

8149-8160: Hash delta-size scan version check OK; align log/comment and table name usage

  • Consistent with range variant—good.
  • Same nits: message should say "delta-size scan"; fix "util"; and log via class table_name_ for consistency.

Suggested diff:

-        LOG(WARNING) << "Table schema version mismatched for data sync "
-                        "scan, table name: "
-                     << req.table_name_.String()
+        LOG(WARNING) << "Table schema version mismatched for delta-size scan, table name: "
+                     << table_name_.String()
                      << " ,scan carried version: " << req.schema_version_
                      << " ,ccmap version: " << table_schema_->Version();
-        // yield util version matched
+        // yield until version matches
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d7e04da and 9213ad2.

📒 Files selected for processing (3)
  • include/cc/cc_request.h (6 hunks)
  • include/cc/template_cc_map.h (2 hunks)
  • src/cc/local_cc_shards.cpp (2 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-10-21T06:46:53.700Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 149
File: src/remote/cc_stream_receiver.cpp:1066-1075
Timestamp: 2025-10-21T06:46:53.700Z
Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.

Applied to files:

  • include/cc/template_cc_map.h
  • include/cc/cc_request.h
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.

Applied to files:

  • include/cc/template_cc_map.h
  • include/cc/cc_request.h
  • src/cc/local_cc_shards.cpp
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.

Applied to files:

  • include/cc/cc_request.h
  • src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.

Applied to files:

  • src/cc/local_cc_shards.cpp
🧬 Code graph analysis (1)
include/cc/cc_request.h (3)
include/cc/template_cc_map.h (3)
  • bool (11198-11205)
  • RangePartitioned (11585-11588)
  • TemplateCcMap (164-168)
include/tx_request.h (1)
  • schema_version_ (583-583)
include/remote/remote_cc_request.h (1)
  • schema_version_ (450-450)
🔇 Additional comments (2)
src/cc/local_cc_shards.cpp (2)

3505-3506: Parameter order is correct; verify caller properly handles async re-enqueue semantics during version mismatch.

The constructor signature at line 8492 confirms is_dirty precedes schema_version ✓, and version-checking logic exists in template_cc_map.h. However, when a schema-version mismatch is detected during the data-sync scan, the request is re-enqueued (shard_->Enqueue(&req)) rather than setting an error. This means slices_delta_size remains empty while awaiting schema alignment—not a silent success but an intentional wait-and-retry.

Verify that the caller (lines 3543–3572) correctly handles the async retry semantics: specifically, confirm that on empty results during a version-mismatch scenario, the code does not prematurely commit or advance last_data_sync_ts before the re-enqueued scan eventually completes.


4353-4355: Arg order confirmed correct; schema version mismatch handling requires clarification.

The argument order in the hash path is correct: TxNumber() and catalog_rec.Schema()->Version() map to the txn and schema_version constructor parameters.

However, the schema version mismatch behavior is as flagged: in template_cc_map.h (line 8149–8156), when the schema version mismatches, the code re-queues the request via shard_->Enqueue(&req) without calling SetError(). The caller in src/cc/local_cc_shards.cpp (line 4359) checks if (scan_delta_size_cc.IsError()), which will be false on re-queue, allowing subsequent logic to proceed with an uninitialized UpdatedMemory() value. This conflates "scan complete with zero updates" and "schema mismatch occurred", potentially advancing state incorrectly on version conflicts.

Verify that this re-queue-without-error pattern is intentional or needs explicit error signaling to prevent premature state advancement.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
include/cc/template_cc_map.h (1)

7797-7808: Delta-size scan schema_version guard is good; tweak log wording and use class table_name_ for consistency

The version gate and error handling for ScanSliceDeltaSizeCcForRangePartition and ScanDeltaSizeCcForHashPartition are sound and parallel the data-sync paths. Two small polish points:

  • The log message says "Table schema version mismatched for data sync scan" on these delta-size paths, which can be confusing when debugging. Consider changing to "delta-size scan" to distinguish from the main data-sync scan.
  • You’re using req.table_name_.String() / req.table_name_->String(). Since TemplateCcMap already has table_name_, using table_name_.String() here avoids differences in how individual request types store the name.

For example:

-    LOG(WARNING) << "Table schema version mismatched for data sync "
-                    "scan, table name: "
-                 << req.table_name_.String()
+    LOG(WARNING) << "Table schema version mismatched for delta-size scan, table name: "
+                 << table_name_.String()
                  << " ,scan carried version: " << req.schema_version_
                  << " ,ccmap version: " << table_schema_->Version();

(Apply the same pattern to the hash-partition variant.)

Also applies to: 8155-8165

🧹 Nitpick comments (1)
include/cc/template_cc_map.h (1)

5110-5121: Schema-version mismatch gate for data-sync scans looks correct; consider strengthening the assertion message

The new mismatch handling for RangePartitionDataSyncScanCc and HashPartitionDataSyncScanCc—asserting the requested schema is ahead of the in-memory version, logging, setting REQUESTED_TABLE_SCHEMA_MISMATCH, and returning without continuing the scan—matches the intended behavior and aligns with the other SchemaVersion() checks in this class.

If you want easier debugging when the invariant is violated, you could embed a brief message in the assert, e.g.:

assert(req.schema_version_ > table_schema_->Version() &&
       "data-sync scan: requested schema_version must be > in-memory version");

This is optional; behavior-wise the change looks good.

Also applies to: 5667-5678

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ee44cd and 81f0b90.

📒 Files selected for processing (3)
  • include/cc/cc_request.h (10 hunks)
  • include/cc/template_cc_map.h (4 hunks)
  • src/cc/local_cc_shards.cpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cc/local_cc_shards.cpp
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.

Applied to files:

  • include/cc/template_cc_map.h
  • include/cc/cc_request.h
📚 Learning: 2025-10-21T06:46:53.700Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 149
File: src/remote/cc_stream_receiver.cpp:1066-1075
Timestamp: 2025-10-21T06:46:53.700Z
Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.

Applied to files:

  • include/cc/template_cc_map.h
  • include/cc/cc_request.h
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.

Applied to files:

  • include/cc/cc_request.h
🧬 Code graph analysis (2)
include/cc/template_cc_map.h (2)
include/error_messages.h (1)
  • CcErrorCode (237-436)
include/remote/remote_type.h (1)
  • CcErrorCode (474-479)
include/cc/cc_request.h (3)
include/tx_request.h (1)
  • schema_version_ (583-583)
include/remote/remote_cc_request.h (1)
  • schema_version_ (450-450)
include/cc/template_cc_map.h (1)
  • TemplateCcMap (164-168)
🔇 Additional comments (4)
include/cc/cc_request.h (4)

3936-3940: HashPartitionDataSyncScanCc: SchemaVersion() override added — LGTM

Accessor is correct and matches intended bypass semantics (0 means skip).


4318-4322: RangePartitionDataSyncScanCc: SchemaVersion() override added — LGTM

Consistent with other request types and exposes stored version.


8827-8835: Verify instantiation sites for new constructor parameters

The constructor signature at lines 8827-8835 has been updated to include txn and schema_version parameters. The corresponding accessor (SchemaVersion() at 8970-8974) and friend declaration (9000-9004) are in place. However, instantiation sites for ScanDeltaSizeCcForHashPartition cannot be located in the codebase to verify that all call sites supply these new parameters. Please manually confirm that every instantiation of this class provides both txn and schema_version arguments.


8581-8595: All constructor parameters verified—instantiation is correct

At src/cc/local_cc_shards.cpp:3494, the instantiation passes all 12 constructor parameters in the correct order, including the new is_dirty and schema_version parameters. The call site is properly aligned with the extended constructor signature.

The suggestion to make schema_version_ const (line 8807) is valid: it is initialized once in the constructor initializer list and accessed only via the read-only SchemaVersion() accessor. This is a safe and appropriate refactoring with no correctness implications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants