Walkthrough

The PR removes the prewarm lifecycle and related APIs, adds snapshot and standby-checkpoint event handlers (OnSnapshotReceived, OnUpdateStandbyCkptTs, OnShutdown), refactors cloud-mode detection and data-store interfaces (IsCloudMode, ReloadDataFromCloud), adds single-node detection, and propagates a data-store-write flag through checkpoint broadcasts.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Primary as Primary Node
    participant Checkpointer
    participant DataSyncStatus
    participant StandbyService as Standby Service
    participant StandbyNode as Standby Node
    participant SnapshotMgr as Snapshot Manager
    participant DataStoreHandler
    participant DataStore
    Primary->>Checkpointer: flush completes (with persisted KV)
    Checkpointer->>DataSyncStatus: MarkDataStoreWrite()
    Checkpointer->>StandbyService: BrocastPrimaryCkptTs(..., has_data_store_write=true)
    StandbyService->>StandbyNode: send UpdateStandbyCkptTs(has_data_store_write)
    StandbyNode->>DataStoreHandler: OnUpdateStandbyCkptTs(ng_id, term)
    DataStoreHandler->>DataStore: ReloadDataFromCloud(term)
    DataStore-->>DataStoreHandler: reload result
    SnapshotMgr->>StandbyService: snapshot ready
    StandbyService->>Primary: OnSnapshotSyncedRequest(ng_id, snapshot_path)
    Primary->>DataStoreHandler: OnSnapshotReceived(ng_id, bucket_ids, snapshot_path)
    DataStoreHandler->>DataStore: open/reload snapshot
```
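The gating shown in the diagram — a standby reloads from cloud only for checkpoints that actually wrote to the data store, and only for the current term — can be sketched in a few lines. The struct and helper below are illustrative stand-ins, not the real tx_service types:

```cpp
#include <cstdint>

// Illustrative sketch (not the actual tx_service API): the primary sets
// has_data_store_write only when the checkpoint flushed KV data, and a
// standby acts on the broadcast only when that flag is set and the term
// still matches its view of the current primary term.
struct CkptBroadcast
{
    uint32_t ng_id;
    int64_t term;
    bool has_data_store_write;
};

// Returns true when the standby should trigger ReloadDataFromCloud(term).
bool ShouldReloadFromCloud(const CkptBroadcast &msg, int64_t current_term)
{
    return msg.has_data_store_write && msg.term == current_term;
}
```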
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Force-pushed from fae35f1 to 6d82100 (Compare)
- Implement DataStoreService::OnSnapshotReceived and add func DataStoreHandler::StandbyReloadData
- Implement api to call datastore reload data from cloud
- Update DataStoreServiceClient::IsSharedStorage()
- Complete the EloqStoreDataStore::ReloadDataFromCloud
- For single node, always start eloqstore with term 0
- Add arg in DataSyncStatus to identify whether data has been flushed to the datastore in this checkpoint period
- Refactor OnSnapshotSyncedRequest: add ng_id.
Force-pushed from 9add740 to c955b90 (Compare)
```cmake
${ELOQ_STORE_SOURCE_DIR}/src/tasks/task.cpp
${ELOQ_STORE_SOURCE_DIR}/src/tasks/task_manager.cpp
${ELOQ_STORE_SOURCE_DIR}/src/tasks/write_task.cpp
${ELOQ_STORE_SOURCE_DIR}/src/tasks/reopen_task.cpp
```
already have, remove
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
store_handler/data_store_service_client.cpp (1)
3835-3859: ⚠️ Potential issue | 🔴 Critical

Guard snapshot apply with standby-term validation before reload.

`OnSnapshotReceived` applies/reloads snapshot data without checking whether `req->standby_node_term()` is still current. A delayed/stale callback can reload old data into the shard. The RocksDB path already guards this (store_handler/rocksdb_handler.cpp, Line 3528 onward).

🔧 Suggested fix
```diff
 bool DataStoreServiceClient::OnSnapshotReceived(
     const txservice::remote::OnSnapshotSyncedRequest *req)
 {
-    // TODO(lzx): implement this for eloqstore data store.
-    if (data_store_service_ != nullptr)
-    {
-        uint32_t ng_id = req->ng_id();
+    if (req == nullptr)
+    {
+        LOG(ERROR) << "OnSnapshotReceived got null request";
+        return false;
+    }
+
+    if (txservice::Sharder::Instance().CandidateStandbyNodeTerm() !=
+        req->standby_node_term())
+    {
+        LOG(WARNING) << "Ignore stale snapshot callback. standby_term="
+                     << req->standby_node_term();
+        return false;
+    }
+
+    if (data_store_service_ != nullptr)
+    {
+        uint32_t ng_id = req->ng_id();
         std::unordered_set<uint16_t> bucket_ids;
         for (auto &[bucket_id, bucket_info] : bucket_infos_)
         {
             if (bucket_info->BucketOwner() == ng_id)
             {
                 bucket_ids.insert(bucket_id);
             }
         }
         int64_t term =
             txservice::PrimaryTermFromStandbyTerm(req->standby_node_term());
         data_store_service_->OnSnapshotReceived(
             ng_id, term, std::move(bucket_ids), req->snapshot_path());
         return true;
     }
-
-    return true;
+    return false;
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@store_handler/data_store_service_client.cpp` around lines 3835 - 3859, In DataStoreServiceClient::OnSnapshotReceived, guard the snapshot apply by validating req->standby_node_term() is still current before calling data_store_service_->OnSnapshotReceived: compute term using txservice::PrimaryTermFromStandbyTerm(req->standby_node_term()) (as already done), obtain the current standby/primary term from the local state (the same source used by RocksDB handler), compare them and only proceed to build bucket_ids and call data_store_service_->OnSnapshotReceived(ng_id, term, std::move(bucket_ids), req->snapshot_path()) if the terms match; otherwise skip applying the snapshot to avoid reloading stale data.
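The guard this comment asks for reduces to a term comparison before any apply. A minimal sketch, assuming the current standby term is readable from local state (the real code would consult txservice::Sharder::Instance().CandidateStandbyNodeTerm(), and apply_fn stands in for the actual snapshot-apply call):

```cpp
#include <cstdint>
#include <utility>

// Sketch only: a stale callback carries a standby term that no longer
// matches the node's current standby term and must be dropped, not applied.
template <typename ApplyFn>
bool ApplySnapshotIfCurrent(int64_t req_standby_term,
                            int64_t current_standby_term,
                            ApplyFn &&apply_fn)
{
    if (req_standby_term != current_standby_term)
    {
        return false;  // stale delivery: skip to avoid reloading old data
    }
    std::forward<ApplyFn>(apply_fn)();
    return true;
}
```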
🧹 Nitpick comments (2)
store_handler/eloq_data_store_service/eloq_store_data_store.h (1)
99-105: Consider adding null-check to `ScanDeleteOperationData::Clear()` for consistency.

`EloqStoreOperationData::Clear()` now guards against null `data_store_request_ptr_`, but `ScanDeleteOperationData::Clear()` (lines 99-105) does not. If `ScanDeleteOperationData::Clear()` can be called when `data_store_request_ptr_` is null, this could cause a null dereference.

♻️ Suggested fix
```diff
     void Clear() override
     {
+        if (data_store_request_ptr_)
+        {
-        data_store_request_ptr_->Clear();
-        data_store_request_ptr_->Free();
-        data_store_request_ptr_ = nullptr;
+            data_store_request_ptr_->Clear();
+            data_store_request_ptr_->Free();
+            data_store_request_ptr_ = nullptr;
+        }
         entries_.clear();
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@store_handler/eloq_data_store_service/eloq_store_data_store.h` around lines 99 - 105, ScanDeleteOperationData::Clear currently calls data_store_request_ptr_->Clear() and ->Free() without checking for null; add the same null-guard used in EloqStoreOperationData::Clear by testing if (data_store_request_ptr_) before invoking Clear() and Free(), then set data_store_request_ptr_ = nullptr and call entries_.clear(); ensure you only call Clear/Free when data_store_request_ptr_ is non-null to avoid a potential null dereference.

store_handler/eloq_data_store_service/eloq_store_data_store.cpp (1)

882-883: Remove stale TODO in an already-implemented path.

The TODO now conflicts with the actual implementation and can mislead future maintenance.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@store_handler/eloq_data_store_service/eloq_store_data_store.cpp` around lines 882 - 883, Remove the stale TODO comment in the EloqStoreDataStore::ReloadDataFromCloud implementation: delete the line "// TODO(lzx): implement this for eloqstore data store." that sits immediately before the LOG(INFO) << "EloqStoreDataStore::ReloadDataFromCloud, term: " << term; so the code no longer claims the path is unimplemented while it logs the reload action.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@store_handler/data_store_service_client.cpp`:
- Around line 3861-3869: The method
DataStoreServiceClient::OnUpdateStandbyCkptTs currently always returns true even
when data_store_service_ is null, hiding failures; change it to return false
when data_store_service_ == nullptr and only return success when the update was
actually invoked: check data_store_service_ and if null return false, otherwise
call data_store_service_->OnUpdateStandbyCkptTs(ng_id, ng_term) and return its
boolean result if that method returns a bool, or return true only after
successfully invoking it; update callers/tests if they rely on the previous
always-true behavior.
In `@store_handler/eloq_data_store_service/build_eloq_store.cmake`:
- Line 127: ELOQ_STORE_SOURCES contains a duplicated entry for
src/tasks/reopen_task.cpp; open build_eloq_store.cmake, find the
ELOQ_STORE_SOURCES list and remove the redundant occurrence of reopen_task.cpp
(the duplicate at the later line), leaving only the single original entry so the
list is deduplicated and the build generator/toolchain won’t hit edge-case
issues.
In `@store_handler/eloq_data_store_service/data_store_service_config.cpp`:
- Around line 210-215: The public Topology::GetShardCount() (and its caller
DataStoreServiceClusterManager::GetShardCount()) should not assert(false);
instead mark the methods deprecated and leave them safe to call: remove the
assert(false) from Topology::GetShardCount(), add a [[deprecated("...")]]
attribute to the declarations/definitions of Topology::GetShardCount and
DataStoreServiceClusterManager::GetShardCount, and have Topology::GetShardCount
return a valid value (e.g., return GetAllShards().size() or shard_count_) so
future callers won’t crash while the API is being phased out.
In `@store_handler/eloq_data_store_service/data_store_service.cpp`:
- Around line 2259-2269: The OnSnapshotReceived function currently aborts the
process for local data store mode via LOG(FATAL); instead change this branch to
fail gracefully by returning an error status or logging an error without
crashing: in the OnSnapshotReceived implementation (referencing
OnSnapshotReceived, data_store_factory_, data_shards_) replace the LOG(FATAL)
path with a non-fatal handling that logs the missing implementation (use
LOG(ERROR) or VLOG) and returns a controlled failure (e.g., status code, bool,
or exception consistent with surrounding API) so callers can handle the error
until local snapshot loading is implemented; ensure any resources are left
consistent and add a TODO comment pointing to the future local-mode snapshot
replacement logic.
- Around line 2280-2286: The call sites invoking
ds_ref.data_store_->ReloadDataFromCloud(term) (e.g., in OnSnapshotReceived) must
be guarded: first null-check ds_ref.data_store_ before dereferencing, and only
call ReloadDataFromCloud when the shard’s state is stable (check ds_ref.state
equals ReadOnly or ReadWrite) to avoid triggering reloads while the shard is
transitioning (e.g., Starting); apply the same null- and state-guard to the
other reload site noted (around the other ReloadDataFromCloud call).
In `@store_handler/eloq_data_store_service/eloq_store_data_store.cpp`:
- Around line 924-926: In OnReLoaded replace the incorrect address-of logging
(&global_reopen_req) with the actual request pointer or its contents: use
DLOG(INFO) << "... reopen request: " << global_reopen_req (or, if you want the
object's fields, check for null and log *global_reopen_req or a suitable
toString()/DebugString() method). Update the DLOG call in
EloqStoreDataStore::OnReLoaded to avoid taking the address of the local pointer
variable and instead emit the pointer value or serialized request for correct
diagnostics.
In `@store_handler/eloq_data_store_service/rocksdb_data_store_common.h`:
- Around line 286-290: Change the default log-only implementation of
RocksDBDataStoreCommon::ReloadDataFromCloud(int64_t term) to a pure virtual
declaration so derived classes must implement it; replace the existing body that
logs the not-implemented message with a pure-virtual signature (e.g., virtual
void ReloadDataFromCloud(int64_t term) = 0;) and then update any derived classes
missing an override to implement the method (or explicitly provide a no-op) to
satisfy the new abstract requirement.
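The refactor suggested above is the standard abstract-interface move: once the base method is pure virtual, every concrete store must either implement a real reload or declare an explicit, documented no-op. A compilable sketch with simplified stand-in classes (the real hierarchy lives in rocksdb_data_store_common.h; these names are illustrative):

```cpp
#include <cstdint>

// Base with the pure-virtual signature the review proposes, replacing the
// previous log-only default body.
class DataStoreSketch
{
public:
    virtual ~DataStoreSketch() = default;
    virtual void ReloadDataFromCloud(int64_t term) = 0;
};

class CloudStoreSketch : public DataStoreSketch
{
public:
    void ReloadDataFromCloud(int64_t term) override { last_reload_term_ = term; }
    int64_t last_reload_term_ = -1;
};

class LocalStoreSketch : public DataStoreSketch
{
public:
    // Explicit no-op: a local-only store has no cloud copy to reload from.
    void ReloadDataFromCloud(int64_t /*term*/) override {}
};
```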
---
Outside diff comments:
In `@store_handler/data_store_service_client.cpp`:
- Around line 3835-3859: In DataStoreServiceClient::OnSnapshotReceived, guard
the snapshot apply by validating req->standby_node_term() is still current
before calling data_store_service_->OnSnapshotReceived: compute term using
txservice::PrimaryTermFromStandbyTerm(req->standby_node_term()) (as already
done), obtain the current standby/primary term from the local state (the same
source used by RocksDB handler), compare them and only proceed to build
bucket_ids and call data_store_service_->OnSnapshotReceived(ng_id, term,
std::move(bucket_ids), req->snapshot_path()) if the terms match; otherwise skip
applying the snapshot to avoid reloading stale data.
---
Nitpick comments:
In `@store_handler/eloq_data_store_service/eloq_store_data_store.cpp`:
- Around line 882-883: Remove the stale TODO comment in the
EloqStoreDataStore::ReloadDataFromCloud implementation: delete the line "//
TODO(lzx): implement this for eloqstore data store." that sits immediately
before the LOG(INFO) << "EloqStoreDataStore::ReloadDataFromCloud, term: " <<
term; so the code no longer claims the path is unimplemented while it logs the
reload action.
In `@store_handler/eloq_data_store_service/eloq_store_data_store.h`:
- Around line 99-105: ScanDeleteOperationData::Clear currently calls
data_store_request_ptr_->Clear() and ->Free() without checking for null; add the
same null-guard used in EloqStoreOperationData::Clear by testing if
(data_store_request_ptr_) before invoking Clear() and Free(), then set
data_store_request_ptr_ = nullptr and call entries_.clear(); ensure you only
call Clear/Free when data_store_request_ptr_ is non-null to avoid a potential
null dereference.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (27)
- store_handler/data_store_service_client.cpp
- store_handler/data_store_service_client.h
- store_handler/eloq_data_store_service/build_eloq_store.cmake
- store_handler/eloq_data_store_service/data_store.h
- store_handler/eloq_data_store_service/data_store_factory.h
- store_handler/eloq_data_store_service/data_store_service.cpp
- store_handler/eloq_data_store_service/data_store_service.h
- store_handler/eloq_data_store_service/data_store_service_config.cpp
- store_handler/eloq_data_store_service/data_store_service_config.h
- store_handler/eloq_data_store_service/eloq_store_data_store.cpp
- store_handler/eloq_data_store_service/eloq_store_data_store.h
- store_handler/eloq_data_store_service/eloq_store_data_store_factory.h
- store_handler/eloq_data_store_service/rocksdb_cloud_data_store_factory.h
- store_handler/eloq_data_store_service/rocksdb_data_store_common.h
- store_handler/eloq_data_store_service/rocksdb_data_store_factory.h
- tx_service/include/cc/object_cc_map.h
- tx_service/include/data_sync_task.h
- tx_service/include/proto/cc_request.proto
- tx_service/include/standby.h
- tx_service/include/store/data_store_handler.h
- tx_service/src/cc/local_cc_shards.cpp
- tx_service/src/checkpointer.cpp
- tx_service/src/data_sync_task.cpp
- tx_service/src/fault/cc_node.cpp
- tx_service/src/remote/cc_node_service.cpp
- tx_service/src/standby.cpp
- tx_service/src/store/snapshot_manager.cpp
```cpp
bool DataStoreServiceClient::OnUpdateStandbyCkptTs(uint32_t ng_id,
                                                   int64_t ng_term)
{
    if (data_store_service_ != nullptr)
    {
        data_store_service_->OnUpdateStandbyCkptTs(ng_id, ng_term);
    }
    return true;
}
```
Don’t report success when standby checkpoint update is skipped.
OnUpdateStandbyCkptTs always returns true, even when data_store_service_ == nullptr and no update is sent. This hides failures from callers and weakens retry/error handling.
🔧 Suggested fix

```diff
 bool DataStoreServiceClient::OnUpdateStandbyCkptTs(uint32_t ng_id,
                                                    int64_t ng_term)
 {
-    if (data_store_service_ != nullptr)
-    {
-        data_store_service_->OnUpdateStandbyCkptTs(ng_id, ng_term);
-    }
-    return true;
+    if (data_store_service_ == nullptr)
+    {
+        LOG(WARNING) << "OnUpdateStandbyCkptTs skipped: data_store_service_ is null";
+        return false;
+    }
+    data_store_service_->OnUpdateStandbyCkptTs(ng_id, ng_term);
+    return true;
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@store_handler/data_store_service_client.cpp` around lines 3861 - 3869, The
method DataStoreServiceClient::OnUpdateStandbyCkptTs currently always returns
true even when data_store_service_ is null, hiding failures; change it to return
false when data_store_service_ == nullptr and only return success when the
update was actually invoked: check data_store_service_ and if null return false,
otherwise call data_store_service_->OnUpdateStandbyCkptTs(ng_id, ng_term) and
return its boolean result if that method returns a bool, or return true only
after successfully invoking it; update callers/tests if they rely on the
previous always-true behavior.
```cpp
int Topology::GetShardCount() const
{
    // deprecated, use GetAllShards().size() instead
    assert(false);
    return shard_count_;
}
```
🧩 Analysis chain
🏁 Scripts executed (all against repository eloqdata/tx_service; the byte length of each output is noted per command):

```shell
cd store_handler/eloq_data_store_service && wc -l data_store_service_config.cpp          # output length: 97
cd store_handler/eloq_data_store_service && sed -n '210,230p' data_store_service_config.cpp  # output length: 563
cd store_handler/eloq_data_store_service && sed -n '520,530p' data_store_service_config.cpp  # output length: 333
rg -n "GetShardCount\s*\(" --type cpp store_handler/eloq_data_store_service/             # output length: 612
rg -rn "GetShardCount\s*\(" --type cpp                                                   # output length: 746
rg -n "GetShardCount" --type cpp --type h -B 2 -A 2 | head -100                          # output length: 3403
rg -rn "\.GetShardCount\(\)|->GetShardCount\(\)" --type cpp --type h                     # output length: 237
cd tx_service/src/remote && sed -n '1375,1385p' remote_cc_request.cpp                    # output length: 511
rg -n "cluster_manager.*GetShardCount\(\)|manager.*GetShardCount\(\)|dss_manager.*GetShardCount\(\)" --type cpp --type h  # output length: 45
grep -r "GetShardCount" --include="*.cpp" --include="*.h" tx_service/ | grep -v "test" | grep -v "^Binary"  # output length: 320
cd tx_service/src/remote && sed -n '1370,1390p' remote_cc_request.cpp | head -30         # output length: 897
cd tx_service/include/cc && grep -n "class.*CCRequest" cc_request.h | head -5            # output length: 45
cd tx_service/include/cc && sed -n '2740,2755p' cc_request.h                             # output length: 401
rg -rn "DataStoreServiceClusterManager" --type cpp --type h | grep -i "getshardcount"    # output length: 159
rg -rn "manager\s*->\s*GetShardCount\(\)|manager\s*\.\s*GetShardCount\(\)" --type cpp --type h  # output length: 45
cd store_handler && grep -rn "GetShardCount" --include="*.cpp" --include="*.h"           # output length: 542
rg -rn "DataStoreServiceClusterManager" --type cpp --type h | head -20                   # output length: 1821
rg -n "DataStoreServiceClusterManager" store_handler/eloq_data_store_service/data_store_service_config.h | head -10  # output length: 349
cd store_handler && rg -n "\.GetShardCount\(\)" --type cpp --type h -B 3                 # output length: 444
cd store_handler && rg -n "GetShardCount" --type h data_store_service_client.h           # output length: 45
cd store_handler && grep -rn "\.GetShardCount()" --include="*.cpp" --include="*.h" .     # output length: 160
```
Deprecation via assert(false) is problematic even though currently unused.
The assert(false) will crash any future callers in debug builds. While no external callers of DataStoreServiceClusterManager::GetShardCount() exist in the codebase today, the method is public (declared in the header at line 399) and the deprecation approach invites runtime crashes rather than compile-time warnings.
The pattern of using assert(false) for a deprecated public method is poor: it creates a time bomb for any future code that calls it. Consider either:
- Removing both methods (`DataStoreServiceClusterManager::GetShardCount()` and `Topology::GetShardCount()`) if truly deprecated
- Using compiler deprecation attributes (e.g., `[[deprecated]]`) instead of assert
- Keeping only the deprecation comment without the assert if removal is premature
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@store_handler/eloq_data_store_service/data_store_service_config.cpp` around
lines 210 - 215, The public Topology::GetShardCount() (and its caller
DataStoreServiceClusterManager::GetShardCount()) should not assert(false);
instead mark the methods deprecated and leave them safe to call: remove the
assert(false) from Topology::GetShardCount(), add a [[deprecated("...")]]
attribute to the declarations/definitions of Topology::GetShardCount and
DataStoreServiceClusterManager::GetShardCount, and have Topology::GetShardCount
return a valid value (e.g., return GetAllShards().size() or shard_count_) so
future callers won’t crash while the API is being phased out.
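The attribute-based alternative described above looks like the following. This is a hedged sketch: the class and member names mirror the comment but are simplified stand-ins for the real Topology:

```cpp
#include <cstddef>
#include <vector>

// Sketch: the deprecated accessor keeps returning a correct value, and any
// caller gets a compile-time warning instead of a debug-build crash.
class TopologySketch
{
public:
    const std::vector<int> &GetAllShards() const { return shards_; }

    [[deprecated("use GetAllShards().size() instead")]]
    std::size_t GetShardCount() const
    {
        return GetAllShards().size();  // safe value, no assert(false)
    }

    std::vector<int> shards_{10, 11, 12};
};
```

Callers that still use the old method compile with a warning, which surfaces every remaining call site without crashing anything at runtime.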
```cpp
    const std::string &snapshot_path)
{
#ifdef DATA_STORE_TYPE_ELOQDSS_ELOQSTORE
    auto &ds_ref = data_shards_.at(shard_id);

    if (!data_store_factory_->IsCloudMode())
    {
        // TODO(lzx): Handle the case of local data store mode. (replace the
        // data with the snapshot)
        LOG(FATAL)
            << "OnSnapshotReceived not implemented for local data store mode";
```
Avoid aborting the process for local snapshot handling.
LOG(FATAL) here turns an unimplemented branch into a service crash path. This should fail gracefully and preserve process availability.
💡 Safer fallback until local-mode snapshot load is implemented
```diff
-        LOG(FATAL)
-            << "OnSnapshotReceived not implemented for local data store mode";
+        LOG(ERROR)
+            << "OnSnapshotReceived local mode is not implemented yet, shard "
+            << shard_id << ", snapshot_path: " << snapshot_path;
+        return;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@store_handler/eloq_data_store_service/data_store_service.cpp` around lines
2259 - 2269, The OnSnapshotReceived function currently aborts the process for
local data store mode via LOG(FATAL); instead change this branch to fail
gracefully by returning an error status or logging an error without crashing: in
the OnSnapshotReceived implementation (referencing OnSnapshotReceived,
data_store_factory_, data_shards_) replace the LOG(FATAL) path with a non-fatal
handling that logs the missing implementation (use LOG(ERROR) or VLOG) and
returns a controlled failure (e.g., status code, bool, or exception consistent
with surrounding API) so callers can handle the error until local snapshot
loading is implemented; ensure any resources are left consistent and add a TODO
comment pointing to the future local-mode snapshot replacement logic.
```cpp
    else
    {
        LOG(INFO)
            << "OnSnapshotReceived, reload data from cloud for DSS shard "
            << shard_id << " and term " << term;
        ds_ref.data_store_->ReloadDataFromCloud(term);
        return;
```
Guard cloud reload calls to stable shard states.
Both call sites can invoke ReloadDataFromCloud() while the shard is transitioning (e.g., Starting). Gate reloads to ReadOnly/ReadWrite and null-check data_store_ before dereference.
🔒 Suggested state/pointer guard
```diff
-    else
+    else if (ds_ref.shard_status_.load(std::memory_order_acquire) ==
+                 DSShardStatus::ReadOnly ||
+             ds_ref.shard_status_.load(std::memory_order_acquire) ==
+                 DSShardStatus::ReadWrite)
     {
         LOG(INFO)
             << "OnSnapshotReceived, reload data from cloud for DSS shard "
             << shard_id << " and term " << term;
+        if (ds_ref.data_store_ == nullptr)
+        {
+            LOG(ERROR) << "OnSnapshotReceived skipped: data_store_ is null for shard "
+                       << shard_id;
+            return;
+        }
         ds_ref.data_store_->ReloadDataFromCloud(term);
         return;
     }
+    else
+    {
+        LOG(INFO) << "OnSnapshotReceived skipped while shard is transitioning, shard "
+                  << shard_id;
+    }
```

```diff
-    if (ds_ref.data_store_ != nullptr &&
-        ds_ref.shard_status_.load() != DSShardStatus::Closed)
+    auto shard_status = ds_ref.shard_status_.load(std::memory_order_acquire);
+    if ((shard_status == DSShardStatus::ReadOnly ||
+         shard_status == DSShardStatus::ReadWrite) &&
+        ds_ref.data_store_ != nullptr)
```

Also applies to: 2298-2305
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@store_handler/eloq_data_store_service/data_store_service.cpp` around lines
2280 - 2286, The call sites invoking
ds_ref.data_store_->ReloadDataFromCloud(term) (e.g., in OnSnapshotReceived) must
be guarded: first null-check ds_ref.data_store_ before dereferencing, and only
call ReloadDataFromCloud when the shard’s state is stable (check ds_ref.state
equals ReadOnly or ReadWrite) to avoid triggering reloads while the shard is
transitioning (e.g., Starting); apply the same null- and state-guard to the
other reload site noted (around the other ReloadDataFromCloud call).
```cpp
int Topology::GetShardCount() const
{
    // deprecated, use GetAllShards().size() instead
    assert(false);
```
if deprecated, why not just delete
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
store_handler/data_store_service_client.cpp (1)
3835-3859: ⚠️ Potential issue | 🟠 Major

Don’t return success when snapshot sync is skipped (Lines 3838-3859).

`OnSnapshotReceived` reports `true` even when `data_store_service_` is null and no snapshot is applied, which masks failure and weakens retry/error handling.

🔧 Suggested fix
```diff
 bool DataStoreServiceClient::OnSnapshotReceived(
     const txservice::remote::OnSnapshotSyncedRequest *req)
 {
 #ifdef DATA_STORE_TYPE_ELOQDSS_ELOQSTORE
-    if (data_store_service_ != nullptr)
+    if (req == nullptr || data_store_service_ == nullptr)
+    {
+        LOG(WARNING) << "OnSnapshotReceived skipped: "
+                     << (req == nullptr ? "request is null"
+                                        : "data_store_service_ is null");
+        return false;
+    }
+    if (data_store_service_ != nullptr)
     {
         uint32_t ng_id = req->ng_id();
         std::unordered_set<uint16_t> bucket_ids;
@@
         data_store_service_->OnSnapshotReceived(
             ng_id, term, std::move(bucket_ids), req->snapshot_path());
         return true;
     }
 #endif
-    return true;
+    return false;
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@store_handler/data_store_service_client.cpp` around lines 3835 - 3859, The method DataStoreServiceClient::OnSnapshotReceived currently always returns true even when data_store_service_ is null and no snapshot is applied; change the control flow so that if data_store_service_ is nullptr the function returns false (indicating the snapshot was not handled) instead of true, and only return true after successfully calling data_store_service_->OnSnapshotReceived with the computed ng_id, term (from txservice::PrimaryTermFromStandbyTerm(req->standby_node_term())), and the moved bucket_ids set built from bucket_infos_; ensure you still return true on success and false when skipping/applying fails.
🧹 Nitpick comments (1)
store_handler/data_store_service_client.cpp (1)
3831-3833: Define explicit shutdown behavior for `OnShutdown()` (Lines 3831-3833).

A no-op shutdown hook is easy to forget and can hide resource/lifecycle gaps. Either perform minimal cleanup here (for example, close bound data stores) or add a clear invariant comment explaining why no cleanup is required.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@store_handler/data_store_service_client.cpp` around lines 3831 - 3833, The OnShutdown() method in class DataStoreServiceClient is currently a no-op; either implement explicit minimal cleanup (e.g., close or flush bound data stores, release network/IPC handles, cancel timers, and set a shutdown flag) inside DataStoreServiceClient::OnShutdown(), or add a clear invariant comment inside the method explaining why no cleanup is required (e.g., lifetime managed elsewhere, all resources are RAII and already closed, and this method is never called after resource release). Reference the DataStoreServiceClient class and OnShutdown() to locate where to add the logic or the explanatory comment.
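One shape a non-trivial OnShutdown() could take is an idempotent guard flag, so the first call performs cleanup exactly once and later calls (or post-shutdown work) can check the flag. The class and member names below are illustrative, not the actual DataStoreServiceClient fields:

```cpp
#include <atomic>

// Sketch: record the shutdown transition once, so cleanup never runs twice
// and other methods can refuse work after shutdown.
class ClientShutdownSketch
{
public:
    void OnShutdown()
    {
        // exchange() returns the previous value, so only the first caller
        // enters the cleanup branch.
        if (!shut_down_.exchange(true))
        {
            // e.g. close bound data stores, cancel timers, drop handles here
            ++cleanup_runs_;
        }
    }

    bool IsShutDown() const { return shut_down_.load(); }

    int cleanup_runs_ = 0;

private:
    std::atomic<bool> shut_down_{false};
};
```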
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@store_handler/data_store_service_client.cpp`:
- Around line 3822-3823: The call to Connect() during the follower transition is
unchecked, so failures leave follower initialization partially applied; update
the follower transition code that calls Connect() to capture its return/result,
handle errors by logging via the existing logger (include the error) and
propagate failure (return an error status or throw) instead of continuing, and
ensure any partial state changes are rolled back or marked as not-initialized
when Connect() fails; reference the Connect() call and the follower
initialization/transition path to locate and fix the flow.
---
Outside diff comments:
In `@store_handler/data_store_service_client.cpp`:
- Around line 3835-3859: The method DataStoreServiceClient::OnSnapshotReceived
currently always returns true even when data_store_service_ is null and no
snapshot is applied; change the control flow so that if data_store_service_ is
nullptr the function returns false (indicating the snapshot was not handled)
instead of true, and only return true after successfully calling
data_store_service_->OnSnapshotReceived with the computed ng_id, term (from
txservice::PrimaryTermFromStandbyTerm(req->standby_node_term())), and the moved
bucket_ids set built from bucket_infos_; ensure you still return true on success
and false when skipping/applying fails.
---
Nitpick comments:
In `@store_handler/data_store_service_client.cpp`:
- Around line 3831-3833: The OnShutdown() method in class DataStoreServiceClient
is currently a no-op; either implement explicit minimal cleanup (e.g., close or
flush bound data stores, release network/IPC handles, cancel timers, and set a
shutdown flag) inside DataStoreServiceClient::OnShutdown(), or add a clear
invariant comment inside the method explaining why no cleanup is required (e.g.,
lifetime managed elsewhere, all resources are RAII and already closed, and this
method is never called after resource release). Reference the
DataStoreServiceClient class and OnShutdown() to locate where to add the logic
or the explanatory comment.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- store_handler/data_store_service_client.cpp
- tx_service/src/fault/cc_node.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
- tx_service/src/fault/cc_node.cpp
```cpp
    Connect();
}
```
Handle Connect() failure in follower transition (Line 3822).
Connect() can fail, but this path ignores the result, leaving follower initialization partially applied without a failure signal/log.
🔧 Suggested fix
- Connect();
+ if (!Connect())
+ {
+ LOG(ERROR) << "OnStartFollowing failed to connect data store, ng_id="
+ << ng_id << ", term=" << term;
+ return;
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-    Connect();
-}
+    if (!Connect())
+    {
+        LOG(ERROR) << "OnStartFollowing failed to connect data store, ng_id="
+                   << ng_id << ", term=" << term;
+        return;
+    }
+}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@store_handler/data_store_service_client.cpp` around lines 3822 - 3823, The
call to Connect() during the follower transition is unchecked, so failures leave
follower initialization partially applied; update the follower transition code
that calls Connect() to capture its return/result, handle errors by logging via
the existing logger (include the error) and propagate failure (return an error
status or throw) instead of continuing, and ensure any partial state changes are
rolled back or marked as not-initialized when Connect() fails; reference the
Connect() call and the follower initialization/transition path to locate and fix
the flow.
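If the transition is allowed to retry transient failures before giving up, a bounded retry with backoff is the usual shape. A sketch assuming `Connect()` returns `bool`; the `ConnectWithRetry` helper and its parameters are illustrative, not part of the client:

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Retry a connect callback up to max_attempts with linear backoff.
// Returns true as soon as one attempt succeeds, false if all fail.
bool ConnectWithRetry(const std::function<bool()> &connect,
                      int max_attempts,
                      std::chrono::milliseconds backoff)
{
    for (int attempt = 1; attempt <= max_attempts; ++attempt)
    {
        if (connect())
        {
            return true;
        }
        if (attempt < max_attempts)
        {
            // Back off a little longer after each failed attempt.
            std::this_thread::sleep_for(backoff * attempt);
        }
    }
    return false;
}
```

The caller would log and return (or mark the follower not-initialized) when this reports false, which preserves the failure signal the review asks for.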
Actionable comments posted: 1
♻️ Duplicate comments (3)
store_handler/eloq_data_store_service/eloq_store_data_store.cpp (1)
924-926: ⚠️ Potential issue | 🟡 Minor

Fix pointer logging in OnReLoaded.

Line [926] logs `&global_reopen_req`, which is the address of the local pointer variable, not the request pointer value.

Suggested fix

```diff
 DLOG(INFO) << "===============EloqStoreDataStore::OnReLoaded, reopen request: "
-           << &global_reopen_req << " from EloqStore";
+           << global_reopen_req << " from EloqStore";
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@store_handler/eloq_data_store_service/eloq_store_data_store.cpp` around lines 924 - 926, The log in EloqStoreDataStore::OnReLoaded is printing the address of the local pointer variable (&global_reopen_req) instead of the actual request pointer; change the DLOG call to log global_reopen_req (or its contents) rather than &global_reopen_req so the logged value is the request pointer (or a meaningful representation), e.g., use global_reopen_req or format its fields when calling DLOG in OnReLoaded.

store_handler/eloq_data_store_service/data_store_service.cpp (2)
2264-2269: ⚠️ Potential issue | 🟠 Major

Do not abort the process for unimplemented local snapshot handling.

Line [2268] uses LOG(FATAL), which crashes the service for a currently unimplemented branch. Return gracefully instead.

Suggested fix

```diff
-        LOG(FATAL)
-            << "OnSnapshotReceived not implemented for local data store mode";
+        LOG(ERROR)
+            << "OnSnapshotReceived local mode is not implemented yet, shard "
+            << shard_id << ", snapshot_path: " << snapshot_path;
+        return;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@store_handler/eloq_data_store_service/data_store_service.cpp` around lines 2264 - 2269, The OnSnapshotReceived path currently aborts the process when data_store_factory_->IsCloudMode() is false; replace the LOG(FATAL) crash with graceful handling: remove the LOG(FATAL) call in the non-cloud branch, log a warning or info (e.g., via LOG(WARNING)) that local snapshot handling is unimplemented, preserve the TODO comment, and return early from OnSnapshotReceived so the service does not terminate; ensure the change is made in the branch guarded by data_store_factory_->IsCloudMode().
2283-2294: ⚠️ Potential issue | 🟠 Major

Guard cloud reload calls to stable shard states and bound the wait loop.

`ReloadDataFromCloud()` is invoked without a strict stable-state guard, and the `Starting` wait loop is unbounded. This can block indefinitely and trigger reload during transitions.

Suggested fix

```diff
-    while (ds_ref.shard_status_.load(std::memory_order_acquire) ==
-           DSShardStatus::Starting)
+    constexpr int kMaxWaitMs = 5000;
+    int waited_ms = 0;
+    while (ds_ref.shard_status_.load(std::memory_order_acquire) ==
+               DSShardStatus::Starting &&
+           waited_ms < kMaxWaitMs)
     {
         bthread_usleep(1000);
-        LOG(INFO) << "OnSnapshotReceived, data store is starting, "
-                     "waiting for data store to be ready";
+        waited_ms += 1;
     }
-    LOG(INFO)
-        << "OnSnapshotReceived, reload data from cloud for DSS shard "
-        << shard_id << " and term " << term;
-    ds_ref.data_store_->ReloadDataFromCloud(term);
+    auto status = ds_ref.shard_status_.load(std::memory_order_acquire);
+    if ((status == DSShardStatus::ReadOnly ||
+         status == DSShardStatus::ReadWrite) &&
+        ds_ref.data_store_ != nullptr)
+    {
+        LOG(INFO) << "OnSnapshotReceived, reload data from cloud for DSS shard "
+                  << shard_id << " and term " << term;
+        ds_ref.data_store_->ReloadDataFromCloud(term);
+    }
+    else
+    {
+        LOG(WARNING) << "OnSnapshotReceived skipped reload for shard "
+                     << shard_id << ", status: "
+                     << static_cast<int>(status);
+    }
     return;
```

```diff
-    if (ds_ref.data_store_ != nullptr &&
-        ds_ref.shard_status_.load() != DSShardStatus::Closed)
+    auto status = ds_ref.shard_status_.load(std::memory_order_acquire);
+    if ((status == DSShardStatus::ReadOnly ||
+         status == DSShardStatus::ReadWrite) &&
+        ds_ref.data_store_ != nullptr)
     {
         LOG(INFO)
             << "StandbySyncAndReloadData reload data from cloud for DSS shard "
             << shard_id;
         ds_ref.data_store_->ReloadDataFromCloud(term);
```

Also applies to: 2307-2314

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@store_handler/eloq_data_store_service/data_store_service.cpp` around lines 2283 - 2294, The current OnSnapshotReceived path calls ds_ref.data_store_->ReloadDataFromCloud(term) without ensuring the shard is in a stable running/ready state and uses an unbounded wait loop on ds_ref.shard_status_ == DSShardStatus::Starting; change this by replacing the infinite wait with a bounded wait (e.g., max iterations or timeout using bthread_usleep) and only call ReloadDataFromCloud when the shard_status_ is a stable state such as DSShardStatus::Running or DSShardStatus::Ready (check your enum for the exact stable value), and if the timeout expires log and abort the reload; apply the same guarded, timed-wait pattern to the other occurrence around lines 2307-2314 that also uses ds_ref.shard_status_ and ReloadDataFromCloud.
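The bounded-wait idea in both suggestions can be isolated into a small helper: poll the atomic status under a deadline and report whether a reload-safe state was reached. The `ShardStatus` enum and `WaitForStableShard` below are simplified stand-ins for `DSShardStatus` and the shard state machine, not the real types:

```cpp
#include <atomic>
#include <chrono>
#include <thread>

enum class ShardStatus { Starting, ReadOnly, ReadWrite, Closed };

// Wait until the shard leaves Starting, or the deadline expires.
// Returns true only if a stable, reload-safe state was reached.
bool WaitForStableShard(const std::atomic<ShardStatus> &status,
                        std::chrono::milliseconds max_wait)
{
    auto deadline = std::chrono::steady_clock::now() + max_wait;
    while (status.load(std::memory_order_acquire) == ShardStatus::Starting)
    {
        if (std::chrono::steady_clock::now() >= deadline)
        {
            return false;  // timed out while still Starting
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    // Re-read once after leaving the loop: only ReadOnly/ReadWrite
    // are safe states for a cloud reload; Closed must be rejected.
    ShardStatus s = status.load(std::memory_order_acquire);
    return s == ShardStatus::ReadOnly || s == ShardStatus::ReadWrite;
}
```

Calling `ReloadDataFromCloud` only when this returns true gives both call sites the same guard instead of two slightly different inline checks.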
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@store_handler/eloq_data_store_service/eloq_store_data_store.cpp`:
- Around line 886-889: EloqStoreDataStore::ReloadDataFromCloud currently uses
LOG(FATAL) on a term mismatch which aborts the process; change this to a
non-fatal failure path: replace the LOG(FATAL) usage with a safer log level
(e.g., LOG(ERROR) or LOG(WARNING)) that includes the same context (expected term
and eloq_store_service_->Term()) and then return from ReloadDataFromCloud so the
reload fails gracefully without crashing the process.
---
Duplicate comments:
In `@store_handler/eloq_data_store_service/data_store_service.cpp`:
- Around line 2264-2269: The OnSnapshotReceived path currently aborts the
process when data_store_factory_->IsCloudMode() is false; replace the LOG(FATAL)
crash with graceful handling: remove the LOG(FATAL) call in the non-cloud
branch, log a warning or info (e.g., via LOG(WARNING)) that local snapshot
handling is unimplemented, preserve the TODO comment, and return early from
OnSnapshotReceived so the service does not terminate; ensure the change is made
in the branch guarded by data_store_factory_->IsCloudMode().
- Around line 2283-2294: The current OnSnapshotReceived path calls
ds_ref.data_store_->ReloadDataFromCloud(term) without ensuring the shard is in a
stable running/ready state and uses an unbounded wait loop on
ds_ref.shard_status_ == DSShardStatus::Starting; change this by replacing the
infinite wait with a bounded wait (e.g., max iterations or timeout using
bthread_usleep) and only call ReloadDataFromCloud when the shard_status_ is a
stable state such as DSShardStatus::Running or DSShardStatus::Ready (check your
enum for the exact stable value), and if the timeout expires log and abort the
reload; apply the same guarded, timed-wait pattern to the other occurrence
around lines 2307-2314 that also uses ds_ref.shard_status_ and
ReloadDataFromCloud.
In `@store_handler/eloq_data_store_service/eloq_store_data_store.cpp`:
- Around line 924-926: The log in EloqStoreDataStore::OnReLoaded is printing the
address of the local pointer variable (&global_reopen_req) instead of the actual
request pointer; change the DLOG call to log global_reopen_req (or its contents)
rather than &global_reopen_req so the logged value is the request pointer (or a
meaningful representation), e.g., use global_reopen_req or format its fields
when calling DLOG in OnReLoaded.
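The bug the last prompt describes is easy to reproduce with plain iostreams: streaming `&req` prints where the local pointer variable itself lives, while streaming `req` prints the pointee's address. The helper below is purely illustrative:

```cpp
#include <sstream>
#include <string>
#include <utility>

// Format both interpretations of a pointer for comparison.
// 'req' stands in for a request pointer like global_reopen_req.
std::pair<std::string, std::string> FormatPointerAndItsAddress(int *req)
{
    std::ostringstream value_of_ptr;   // what the fix logs
    value_of_ptr << req;               // address of the pointee
    std::ostringstream addr_of_local;  // what the buggy line logs
    addr_of_local << &req;             // address of the parameter itself
    return {value_of_ptr.str(), addr_of_local.str()};
}
```

The second string changes between calls (it tracks the stack frame), which is why the original `&global_reopen_req` log carried no useful information about the request.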
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f9a1fb7a-feb8-4c11-a076-ade3a0bf63f2
📒 Files selected for processing (2)
- store_handler/eloq_data_store_service/data_store_service.cpp
- store_handler/eloq_data_store_service/eloq_store_data_store.cpp
```cpp
LOG(FATAL) << "EloqStoreDataStore::ReloadDataFromCloud, term mismatch, "
              "expected: "
           << term << ", actual: " << eloq_store_service_->Term();
return;
```
Avoid crashing the process on term mismatch in reload path.
Line [886] uses LOG(FATAL), which turns a stale/late reload event into a full process abort. This should fail gracefully and return.
Suggested fix

```diff
-    LOG(FATAL) << "EloqStoreDataStore::ReloadDataFromCloud, term mismatch, "
-                  "expected: "
-               << term << ", actual: " << eloq_store_service_->Term();
+    LOG(ERROR) << "EloqStoreDataStore::ReloadDataFromCloud, term mismatch, "
+                  "expected: "
+               << term << ", actual: " << eloq_store_service_->Term();
     return;
```

📝 Committable suggestion
```diff
-LOG(FATAL) << "EloqStoreDataStore::ReloadDataFromCloud, term mismatch, "
-              "expected: "
-           << term << ", actual: " << eloq_store_service_->Term();
-return;
+LOG(ERROR) << "EloqStoreDataStore::ReloadDataFromCloud, term mismatch, "
+              "expected: "
+           << term << ", actual: " << eloq_store_service_->Term();
+return;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@store_handler/eloq_data_store_service/eloq_store_data_store.cpp` around lines
886 - 889, EloqStoreDataStore::ReloadDataFromCloud currently uses LOG(FATAL) on
a term mismatch which aborts the process; change this to a non-fatal failure
path: replace the LOG(FATAL) usage with a safer log level (e.g., LOG(ERROR) or
LOG(WARNING)) that includes the same context (expected term and
eloq_store_service_->Term()) and then return from ReloadDataFromCloud so the
reload fails gracefully without crashing the process.
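The non-fatal shape the prompt asks for is: validate the term, report the mismatch, and signal failure to the caller instead of aborting. A minimal sketch with glog swapped for a boolean return plus a stderr message; `ReloadGuard` is an illustrative stand-in, not the real EloqStoreDataStore:

```cpp
#include <cstdint>
#include <iostream>

// Simplified stand-in for the reload entry point: rejects a
// stale/mismatched term instead of tearing the process down.
class ReloadGuard
{
public:
    explicit ReloadGuard(int64_t current_term) : current_term_(current_term)
    {
    }

    bool ReloadDataFromCloud(int64_t term)
    {
        if (term != current_term_)
        {
            // Non-fatal: a late reload event is reported and dropped.
            std::cerr << "ReloadDataFromCloud term mismatch, expected: "
                      << term << ", actual: " << current_term_ << "\n";
            return false;
        }
        // ... real reload work would happen here ...
        return true;
    }

private:
    int64_t current_term_;
};
```

Returning a status (instead of `void` plus `LOG(FATAL)`) also lets the snapshot handler decide whether a failed reload should be retried or just logged.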
Here are some reminders before you submit the pull request
- fixes eloqdb/tx_service#issue_id.
- ./mtr --suite=mono_main,mono_multi,mono_basic

Summary by CodeRabbit
New Features
Bug Fixes & Improvements