
Restructure for eloqconverge #168

Merged

githubzilla merged 40 commits into main from converge on Nov 28, 2025
Conversation

@liunyl (Contributor) commented Oct 23, 2025

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Reference the issue link using fixes eloqdb/tx_service#issue_id
  • Reference the RFC link if one exists
  • Pass ./mtr --suite=mono_main,mono_multi,mono_basic

Summary by CodeRabbit

  • New Features

    • Modular build system with configurable storage backends (RocksDB, DynamoDB, BigTable, Cloud variants)
    • Comprehensive metrics collection and monitoring framework
    • Enhanced configuration management via INI-based settings
    • Multi-cloud storage support (AWS S3, Google Cloud Storage)
    • Improved logging infrastructure with customizable prefixes
  • Refactor

    • Restructured project architecture for better modularity
    • Updated CMake build configuration with improved organization
    • Consolidated data store handler implementations


@coderabbitai bot commented Oct 23, 2025

Walkthrough

This PR comprehensively restructures the project from a monolithic eloq build into a modular data_substrate system with pluggable storage backends, logging, metrics, and transaction services. It introduces centralized lifecycle management, multi-phase initialization with INI-based configuration, and integrates BigTable, DynamoDB, and EloqDS backends alongside new metrics and logging frameworks.

Changes

Cohort / File(s) Summary
Build System Infrastructure
.gitignore, .gitmodules, .pre-commit-config.yaml
Updated protobuf ignore patterns, relocated submodules (abseil-cpp, tx-log-protos to tx_service/), added log_service submodule, introduced pre-commit hooks for clang-format and code cleanup.
Top-level CMake Configuration
CMakeLists.txt
Complete rewrite replacing narrow CMake paradigm with modular, option-driven system; renamed project to data_substrate; introduced cache variables for storage backend selection (DynamoDB, BigTable, RocksDB variants, EloqDS), feature flags (WITH_LOG_SERVICE, STATISTICS, etc.), and conditional compilation definitions; set up public library targets, installation rules, and included modular build scripts.
Modular Build Scripts
build_tx_service.cmake, build_eloq_metrics.cmake, build_log_service.cmake, build_eloq_log_service.cmake
New CMake configuration files establishing standalone build pipelines for TxService, metrics library, and log service with dependency resolution, protobuf compilation, platform-specific flags, and public target exposure.
Core Data Substrate Framework
core/include/data_substrate.h, core/src/data_substrate.cpp, core/src/log_init.cpp, core/src/metrics_init.cpp, core/src/storage_init.cpp, core/src/tx_service_init.cpp
Introduced DataSubstrate singleton managing global lifecycle with config structs (EngineConfig, CoreConfig, NetworkConfig, LogServiceConfig); multi-phase initialization (core/network config, engine registration, log service, storage handler, metrics, TxService); integrated INI-based configuration loading and error propagation across subsystems.
Logging Infrastructure
core/include/glog_error_logging.h
New glog integration header with custom log prefix formatting (timestamp, severity, thread ID, optional source location), environment-aware initialization, and directory management.
Metrics Headers
core/include/mysql_metrics.h, core/include/redis_metrics.h
New metric collection interfaces for MySQL and Redis operations; defines metric names, tracking macros (TRACK_TX_METRICS, COLLECT_TX_METRICS, etc.), and registration functions.
Metrics Framework (eloq_metrics/)
eloq_metrics/*
Complete metrics library with Prometheus backend integration; includes abstract interfaces (MetricsRegistry, MetricsCollector, Meter), concrete implementations (MetricsRegistryImpl, PrometheusCollector), manager/singleton patterns, benchmarking infrastructure (eloq-metrics-bench, tx-service-metrics-bench, redis-metrics-bench), test suite, and CMake packaging/installation.
Store Handler — BigTable
store_handler/bigtable_handler.h, store_handler/bigtable_handler.cpp, store_handler/bigtable_scanner.h, store_handler/bigtable_scanner.cpp
BigTable-backed DataStoreHandler with catalog management, range partitioning, async bulk operations, MVCC support, archive handling, and per-partition scanner implementations.
Store Handler — DynamoDB
store_handler/dynamo_handler.h, store_handler/dynamo_handler_typed.h, store_handler/dynamo_handler_typed.cpp, store_handler/dynamo_scanner.h, store_handler/dynamo_scanner.cpp
DynamoDB-backed store handler with typed record/key binding, payload serialization, pushdown predicate filtering, multi-partition scanning with heap-based result merging, and per-partition query orchestration.
Store Handler — Data Store Service Client
store_handler/data_store_service_client.h, store_handler/data_store_service_client_closure.cpp, store_handler/data_store_service_scanner.h, store_handler/data_store_service_scanner.cpp
Distributed data store client with multi-shard routing, batch planning, async callback infrastructure (ReadClosure, BatchWriteRecordsClosure, FlushDataClosure, etc.), range metadata management, archive/MVCC operations, and single/hash-partition scanner implementations.
Data Store Service (eloq_data_store_service/)
store_handler/eloq_data_store_service/*
New service module with CMake build (multi-backend support), INI configuration reader, abstract interfaces (DataStore, DataStoreFactory), fault injection framework with action triggering (PANIC, SLEEP, etc.), README and RocksDB configuration documentation.
Formatting & Documentation
eloq_metrics/.clang-format, eloq_metrics/README.md, store_handler/.clang-format, store_handler/README.md, store_handler/eloq_data_store_service/README.md, store_handler/eloq_data_store_service/RocksDB_Configuration_Flags.md
Added project-specific clang-format configs (C++ and Proto), README files documenting build/usage, and RocksDB configuration flag reference.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

This PR introduces substantial architectural changes across multiple subsystems with high interconnection complexity:

  • High-density logic: Multi-phase initialization sequences in DataSubstrate with interdependent error handling and state transitions across storage, logging, metrics, and TxService layers.
  • Heterogeneous changes: Diverse implementations spanning build configuration, abstract interfaces, concrete backends (BigTable, DynamoDB, distributed service client), metrics framework, and lifecycle management—each requiring separate reasoning.
  • New public APIs: Significant new surface area (DataSubstrate, multiple store handlers, metrics registry/collector/meter, INI reader, fault injection) with complex initialization semantics.
  • Async callback infrastructure: Data store client closures and scanner implementations involve intricate concurrency patterns, state management, and error propagation across multiple layers.
  • Cross-module integration: Configuration loading, backend selection logic, and module interdependencies demand careful verification of initialization order and error recovery.

Areas requiring extra attention during review:

  • DataSubstrate initialization sequencing (core/src/data_substrate.cpp, log_init.cpp, storage_init.cpp, tx_service_init.cpp, metrics_init.cpp): Verify all phase dependencies, early-exit conditions on failures, and resource cleanup paths.
  • CMake configuration logic (CMakeLists.txt, build_*.cmake): Validate WITH_DATA_STORE/WITH_LOG_STATE synchronization, backend-specific compile definitions, and cross-module dependency resolution.
  • Store handler async callbacks (data_store_service_client_closure.cpp, bigtable_handler.cpp, dynamo_handler implementations): Confirm closure state management, result serialization/deserialization, thread-safety, and heap-switching correctness.
  • Metrics framework public API (meter.h, metrics.h, metrics_registry_impl.h, prometheus_collector.cc): Validate label generation, Cartesian product logic, metric key mapping, and singleton initialization.
  • Multi-backend scanner implementations (bigtable_scanner, dynamo_scanner, data_store_service_scanner): Verify partition iteration, result caching/merging, and per-partition state transitions.
  • INI configuration integration: Ensure configuration precedence rules (CLI flags, INI, defaults) are consistently applied across all modules.
  • Error propagation and early exits: Trace failure paths through multi-phase initialization to confirm no partial state leaks or orphaned resources.

Possibly related PRs

Suggested reviewers

  • MrGuin
  • thweetkomputer

Poem

🐰 From monolith to modular spire,
Data substrates climb ever higher,
Metrics, logs, and stores aligned,
Backends plugged with careful design,
Init phases dance in grand sequence—
A rabbit's refactored renaissance! 🎭

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
  • Description check — ⚠️ Warning: The PR description is empty except for the template checklist reminders; all items remain unchecked, and no description of the changes, objectives, or context is provided. Resolution: describe the restructuring in detail, explain the new modular architecture, document the affected subsystems, and complete the checklist with issue/RFC links.
  • Docstring coverage — ⚠️ Warning: Docstring coverage is 6.06%, below the required 80.00% threshold. You can run @coderabbitai generate docstrings to improve coverage.
  • Title check — ❓ Inconclusive: The title "Restructure for eloqconverge" is vague and does not convey the scope or purpose of the changeset. Resolution: use a more specific title, such as "Refactor build system and add data substrate abstraction layer" or "Restructure modular build with multiple storage backends support".

@liunyl force-pushed the converge branch 2 times, most recently from 821dd5f to 85235fc on October 24, 2025 06:58
@liunyl force-pushed the converge branch 4 times, most recently from bfe5559 to afbec58 on November 13, 2025 07:56
@liunyl force-pushed the converge branch 2 times, most recently from 4f36bee to 8caa739 on November 24, 2025 09:23
@githubzilla (Collaborator) left a comment

+1

@githubzilla marked this pull request as ready for review on November 28, 2025 09:08
@coderabbitai bot left a comment

Actionable comments posted: 21

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (36)
store_handler/eloq_data_store_service/data_store_fault_inject.h-57-73 (1)

57-73: Static map in header duplicates mutable state across translation units.

Defining a static variable at namespace scope in a header gives each translation unit that includes it its own internal-linkage copy of the map. This wastes memory, and changes made through one translation unit are invisible to the others.

Option 1 (C++17): Use inline:

-static std::unordered_map<std::string, FaultAction> action_name_to_enum_map{
+inline std::unordered_map<std::string, FaultAction> action_name_to_enum_map{

Option 2: Move definition to .cpp file and declare extern in header:

// In header:
extern std::unordered_map<std::string, FaultAction> action_name_to_enum_map;

// In .cpp:
std::unordered_map<std::string, FaultAction> action_name_to_enum_map{...};
store_handler/eloq_data_store_service/data_store_fault_inject.h-163-176 (1)

163-176: Race condition: returned pointer can be invalidated by concurrent operations.

Entry() returns a raw pointer to a map element after releasing the lock. If another thread calls InjectFault(..., "remove") or modifies the map, the pointer may become dangling, leading to use-after-free.

The same issue exists in TriggerAction(std::string fault_name) at lines 178-195, where the pointer is used after the lock is released.

Consider:

  1. Returning a copy of FaultEntry instead of a pointer
  2. Using std::shared_ptr<FaultEntry> for entries
  3. Holding the lock for the entire duration of action triggering (may impact concurrency)
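Option 2 above can be sketched as follows. This is a minimal illustration, not the real fault-injection API: the class and method names are assumptions. The key point is that once a caller holds a shared_ptr, a concurrent remove only erases the map's reference; the entry object stays alive until the last holder releases it.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical registry (illustrative names): entries are held by
// shared_ptr, so Entry() can safely return shared ownership instead
// of a raw pointer into the map.
struct FaultEntry
{
    std::string action;
};

class FaultRegistry
{
public:
    void Inject(const std::string &name, std::string action)
    {
        std::lock_guard<std::mutex> lk(mux_);
        auto entry = std::make_shared<FaultEntry>();
        entry->action = std::move(action);
        entries_[name] = std::move(entry);
    }

    // Returns shared ownership; the pointer cannot dangle even if
    // another thread removes the entry from the map afterwards.
    std::shared_ptr<FaultEntry> Entry(const std::string &name)
    {
        std::lock_guard<std::mutex> lk(mux_);
        auto it = entries_.find(name);
        return it == entries_.end() ? nullptr : it->second;
    }

    void Remove(const std::string &name)
    {
        std::lock_guard<std::mutex> lk(mux_);
        entries_.erase(name);  // callers' shared_ptrs stay valid
    }

private:
    std::mutex mux_;
    std::unordered_map<std::string, std::shared_ptr<FaultEntry>> entries_;
};
```

The trade-off versus option 3 (holding the lock through the action) is that the caller may act on an entry that has just been removed, which is usually acceptable for fault injection.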
store_handler/README.md-1-2 (1)

1-2: Expand documentation for the store_handler module.

This README is minimal (just a title and one-line description) for what appears to be a core module based on the PR context. Consider adding:

  • Architecture overview
  • Supported storage backends
  • Configuration examples
  • API usage examples
  • Integration guide
eloq_metrics/src/metrics_app.cc-40-51 (1)

40-51: Memory leak: dynamically allocated Executor objects are never deleted.

Raw pointers are stored in exec_vec but never freed. After Join(), the objects remain allocated.

Apply this diff to fix the memory leak:

-    std::vector<eloq_metrics_app::Executor *> exec_vec;
+    std::vector<std::unique_ptr<eloq_metrics_app::Executor>> exec_vec;
     for (int i = 0; i < 3; ++i)
     {
         std::string task_name = "Task_" + std::to_string(i);
-        exec_vec.push_back(new Executor(task_name,
+        exec_vec.push_back(std::make_unique<Executor>(task_name,
                                         i,
                                         cpu_set,
                                         metrics_registry.get(),
-                                        {{"task", task_name}}));
+                                        {{"task", task_name}}));
     }

Also add #include <memory> if not already transitively included.

Committable suggestion skipped: line range outside the PR's diff.

eloq_metrics/src/metrics_registry_impl.cpp-71-77 (1)

71-77: Use find() instead of operator[] for lookup.

collectors_[key] inserts a default-constructed (null) entry if the key doesn't exist, silently modifying the map. Using find() is safer and more explicit:

 void MetricsRegistryImpl::Collect(metrics::MetricKey key,
                                   const metrics::Value &val)
 {
-    auto collector = collectors_[key].get();
-    assert(collector);
+    auto it = collectors_.find(key);
+    assert(it != collectors_.end() && it->second);
+    auto collector = it->second.get();
     collector->Collect(val);
 }
eloq_metrics/CMakeLists.txt-11-14 (1)

11-14: Undefined variable ELOQ_METRICS_ROOT_DIR will cause empty paths.

Line 11 defines METRICS_ROOT_DIR, but lines 12-13 reference the undefined ELOQ_METRICS_ROOT_DIR. This will result in empty METRICS_SRC_DIR and METRICS_INCLUDE_DIR variables.

 set(METRICS_ROOT_DIR ${CMAKE_CURRENT_SOURCE_DIR})
-set(METRICS_SRC_DIR "${ELOQ_METRICS_ROOT_DIR}/src")
-set(METRICS_INCLUDE_DIR "${ELOQ_METRICS_ROOT_DIR}/include")
+set(METRICS_SRC_DIR "${METRICS_ROOT_DIR}/src")
+set(METRICS_INCLUDE_DIR "${METRICS_ROOT_DIR}/include")
eloq_metrics/src/dummy_executor.cc-43-54 (1)

43-54: CPU affinity is set after thread has already started - race condition.

The constructor delegates to Executor(task_name, metrics_registry, common_labels) which starts the thread at line 63-64. After delegation completes, lines 50-53 attempt to set CPU affinity, but the thread is already running. This is a race condition and the affinity may not take effect as intended.

Consider restructuring to set affinity before starting the thread, or use a two-phase initialization pattern.
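The two-phase initialization pattern can be sketched as follows. This is a minimal, portable illustration with assumed names; the actual affinity call (e.g. pthread_setaffinity_np on the thread's native handle) is shown only as a comment, since it is Linux-specific.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical two-phase start: the worker blocks on a gate until the
// parent finishes configuring it, so the work loop never runs before
// the CPU affinity (or any other setup) has been applied.
class GatedWorker
{
public:
    void Start()
    {
        thread_ = std::thread([this] { Run(); });
        // Phase 1: configure the not-yet-running thread here, e.g.
        //   pthread_setaffinity_np(thread_.native_handle(), ...);
        configured_before_run_ = true;
        // Phase 2: open the gate; the work loop starts fully configured.
        {
            std::lock_guard<std::mutex> lk(mux_);
            ready_ = true;
        }
        cv_.notify_one();
    }

    void Join()
    {
        if (thread_.joinable())
        {
            thread_.join();
        }
    }

    bool configured_before_run_ = false;
    bool observed_configured_ = false;

private:
    void Run()
    {
        std::unique_lock<std::mutex> lk(mux_);
        cv_.wait(lk, [this] { return ready_; });
        // By the time real work would begin, configuration is done.
        observed_configured_ = configured_before_run_;
    }

    std::thread thread_;
    std::mutex mux_;
    std::condition_variable cv_;
    bool ready_ = false;
};
```

The mutex hand-off guarantees the configuration write in Start() happens-before the worker's first read after the gate opens.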

eloq_metrics/src/dummy_executor.cc-77-99 (1)

77-99: Busy loop without sleep will consume 100% CPU.

The while (true) loop has no sleep or yield, causing the thread to spin at full CPU utilization. This is problematic for a benchmark/demo executor. Consider adding a small sleep interval.

     while (true)
     {
         metrics::Value val = metrics::Value(max_range(rng));
         meter_->Collect(dummy_executor_hist_metric, val);
         auto tick = boost::posix_time::second_clock::local_time();
         boost::posix_time::time_duration diff = tick - start;
+        boost::this_thread::sleep_for(boost::chrono::milliseconds(1));
         if (diff.total_seconds() > 0 && diff.total_seconds() % 5 == 0)
eloq_metrics/src/dummy_executor.cc-106-108 (1)

106-108: Destructor doesn't join the thread, risking undefined behavior.

If the Executor is destroyed without calling Join(), the thread may still be running and accessing member variables that have been destroyed.

 Executor::~Executor()
 {
+    if (inner_thread_ && inner_thread_->joinable())
+    {
+        inner_thread_->join();
+    }
 }
eloq_metrics/src/dummy_executor.cc-31-41 (1)

31-41: Move constructor and assignment operator are incomplete.

The move constructor only moves inner_thread_ but not task_name_, meter_, or running_state_. The move assignment copies task_name_ but doesn't move meter_ or running_state_.

 Executor::Executor(eloq_metrics_app::Executor &&executor) noexcept
-    : inner_thread_(std::move(executor.inner_thread_))
+    : inner_thread_(std::move(executor.inner_thread_)),
+      task_name_(std::move(executor.task_name_)),
+      meter_(std::move(executor.meter_)),
+      running_state_(executor.running_state_)
 {
 }

 Executor &Executor::operator=(Executor &&executor_) noexcept
 {
     this->inner_thread_ = std::move(executor_.inner_thread_);
-    this->task_name_ = executor_.task_name_;
+    this->task_name_ = std::move(executor_.task_name_);
+    this->meter_ = std::move(executor_.meter_);
+    this->running_state_ = executor_.running_state_;
     return *this;
 }
core/src/log_init.cpp-141-148 (1)

141-148: std::stoi can throw on malformed port values.

If the config contains a non-numeric value after the colon, std::stoi will throw std::invalid_argument or std::out_of_range. Consider adding exception handling or validation.

             size_t c_idx = token.find_first_of(':');
             if (c_idx != std::string::npos)
             {
                 txlog_ips.emplace_back(token.substr(0, c_idx));
-                uint16_t pt = std::stoi(token.substr(c_idx + 1));
-                txlog_ports.emplace_back(pt);
+                try
+                {
+                    uint16_t pt = std::stoi(token.substr(c_idx + 1));
+                    txlog_ports.emplace_back(pt);
+                }
+                catch (const std::exception &e)
+                {
+                    LOG(ERROR) << "Invalid port in txlog_service_list: " << token;
+                    return false;
+                }
             }
core/src/log_init.cpp-387-399 (1)

387-399: Invalid time format logs error but doesn't fail initialization.

When std::get_time fails to parse the log purger schedule, the code logs an error but continues with uninitialized log_purger_tm values. This could lead to unexpected purger scheduling behavior.

             if (iss.fail())
             {
-                LOG(ERROR) << "Invalid time format." << std::endl;
+                LOG(ERROR) << "Invalid time format for txlog_rocksdb_cloud_log_purger_schedule: "
+                           << log_purger_schedule;
+                return false;
             }
             else
             {
eloq_metrics/benchmark/CMakeLists.txt-29-32 (1)

29-32: Test command references wrong executable.

The test is named eloq-metrics-bench but the COMMAND specifies mono-bench, which doesn't match any executable defined in this file. This will cause test execution to fail.

 add_test(
         NAME eloq-metrics-bench
-        COMMAND mono-bench
+        COMMAND eloq-metrics-bench
 )
core/include/glog_error_logging.h-80-87 (1)

80-87: Add error handling for readlink failure.

readlink returns -1 on failure, which would result in undefined behavior when constructing the string with a negative length. Additionally, this approach is Linux-specific and won't work on macOS or other platforms.

         // Get the absolute path of the bin directory
         char bin_path[PATH_MAX];
         ssize_t len = readlink("/proc/self/exe", bin_path, PATH_MAX);
+        if (len == -1 || len >= PATH_MAX)
+        {
+            // Fallback to current directory if readlink fails
+            FLAGS_log_dir = "./logs";
+        }
+        else
+        {
-        std::filesystem::path fullPath(std::string(bin_path, len));
-        std::filesystem::path dir_path =
-            fullPath.parent_path().parent_path();
-        FLAGS_log_dir = dir_path.string() + "/logs";
+            std::filesystem::path fullPath(std::string(bin_path, len));
+            std::filesystem::path dir_path =
+                fullPath.parent_path().parent_path();
+            FLAGS_log_dir = dir_path.string() + "/logs";
+        }
core/src/data_substrate.cpp-443-448 (1)

443-448: Logic appears inverted for enable_wal configuration.

The ternary is backwards: when enable_wal is left at its command-line default, the code uses FLAGS_enable_wal directly and only consults config_reader when the user set the flag explicitly. The intended precedence (explicit CLI flag wins, otherwise the INI value, otherwise the default) is inverted, so the INI setting is ignored in the common case.

     core_config_.enable_wal =
         !core_config_.bootstrap &&
-        (CheckCommandLineFlagIsDefault("enable_wal")
-             ? FLAGS_enable_wal
-             : config_reader.GetBoolean(
-                   "local", "enable_wal", FLAGS_enable_wal));
+        (!CheckCommandLineFlagIsDefault("enable_wal")
+             ? FLAGS_enable_wal
+             : config_reader.GetBoolean(
+                   "local", "enable_wal", FLAGS_enable_wal));
core/src/data_substrate.cpp-216-216 (1)

216-216: Method name "Destory" is misspelled in the Sequences class definition—it should be "Destroy".

The typo exists not only in the call at line 216 but also in the method definition itself (tx_service/include/sequences/sequences.h:92). Both the definition and all call sites must be updated consistently:

-    txservice::Sequences::Destory();
+    txservice::Sequences::Destroy();

Additionally, update the method definition in tx_service/include/sequences/sequences.h:92 from Destory() to Destroy().

build_tx_service.cmake-212-280 (1)

212-280: Remove large commented-out code block.

This 68-line commented block duplicates the active FORK_HM_PROCESS logic below it. Keeping dead code increases maintenance burden and causes confusion.

Delete lines 212-280 entirely, as the active implementation exists in lines 282-354.

eloq_metrics/include/prometheus_collector.h-42-50 (1)

42-50: static const vector in header duplicates data in every translation unit.

Each translation unit that includes this header gets its own internal-linkage copy of PROMETHEUS_HISTOGRAM_DEF_BUCKETS, bloating the binary and wasting memory.

Move to an inline variable (C++17) or define in the .cc file:

-static const std::vector<double> PROMETHEUS_HISTOGRAM_DEF_BUCKETS = {
+inline const std::vector<double> PROMETHEUS_HISTOGRAM_DEF_BUCKETS = {
     1e+1, 2e+1, 4e+1, 6e+1, 8e+1,  // <  100 us
     ...
 };

Or declare extern in header and define in .cc:

// In header:
extern const std::vector<double> PROMETHEUS_HISTOGRAM_DEF_BUCKETS;
// In .cc:
const std::vector<double> PROMETHEUS_HISTOGRAM_DEF_BUCKETS = {...};
store_handler/data_store_service_scanner.cpp-102-118 (1)

102-118: ScanClose not called when partition has no more data.

When last_batch_size_ < batch_size, the partition is exhausted but ScanClose is never sent to the server. However, IncrementInFlightFetchCount() is called unconditionally. This causes asymmetric increment/decrement when the condition on line 107 is false.

Consider restructuring to either:

  1. Only increment the counter when actually sending a close request
  2. Or always send close and let the server handle the no-op case
 bool SinglePartitionScanner::ScanClose()
 {
     ...
-    scanner_->IncrementInFlightFetchCount();
-
     assert(last_batch_size_ <= scanner_->GetBatchSize());
    // If the partition scanner has not run out of data,
     // we need to close the scanner
     if (last_batch_size_ == scanner_->GetBatchSize())
     {
+        scanner_->IncrementInFlightFetchCount();
         client->ScanClose(scanner_->GetTableName(),
                           partition_id_,
                           data_shard_id_,
                           session_id_,
                           this,
                           ProcessScanCloseResult);
     }

     return true;
 }
build_tx_service.cmake-291-303 (1)

291-303: BRAFT_INCLUDE_PATH used before it's defined.

HOST_MANAGER_INCLUDE_DIR references BRAFT_INCLUDE_PATH on line 293, but find_path(BRAFT_INCLUDE_PATH ...) is called on line 299. This will result in an empty include path.

Move the find_path and find_library calls before the HOST_MANAGER_INCLUDE_DIR assignment:

 if (FORK_HM_PROCESS)
+    find_path(BRAFT_INCLUDE_PATH NAMES braft/raft.h)
+    find_library(BRAFT_LIB NAMES braft)
+    if ((NOT BRAFT_INCLUDE_PATH) OR (NOT BRAFT_LIB))
+        message (FATAL_ERROR "Fail to find braft")
+    endif()
+
     SET (HOST_MANAGER_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/tx_service/raft_host_manager)
     set(HOST_MANAGER_INCLUDE_DIR
         ${HOST_MANAGER_SOURCE_DIR}/include
         ...
         ${BRAFT_INCLUDE_PATH}
         ...)
-
-    find_path(BRAFT_INCLUDE_PATH NAMES braft/raft.h)
-    find_library(BRAFT_LIB NAMES braft)
-    if ((NOT BRAFT_INCLUDE_PATH) OR (NOT BRAFT_LIB))
-        message (FATAL_ERROR "Fail to find braft")
-    endif()

Committable suggestion skipped: line range outside the PR's diff.

store_handler/data_store_service_scanner.cpp-76-80 (1)

76-80: Search conditions are built but not used.

The constructor (lines 250-258) builds search_conditions_ from pushdown_cond, but ScanNext() is called with nullptr instead of the search conditions. This means pushdown predicates are never applied.

     client->ScanNext(
         scanner_->GetTableName(),
         partition_id_,
         data_shard_id_,
         last_key_,
         scanner_->GetEndKey(),
         session_id_,
         true,
         // only set inclusive_start for the first batch
         first_batch_fetched_ ? false : scanner_->IsInclusiveStart(),
         scanner_->IsInclusiveEnd(),
         scanner_->IsScanForward(),
         scanner_->GetBatchSize(),
-        // scanner_->SearchConditions(),
-        nullptr,
+        scanner_->SearchConditions(),
         this,
         ProcessScanNextResult);

Ensure SearchConditions() accessor exists in DataStoreServiceScanner base class.

Committable suggestion skipped: line range outside the PR's diff.

store_handler/data_store_service_client_closure.cpp-372-379 (1)

372-379: Same issue: std::abort() on deserialization failure.

This is the same pattern as in FetchRecordCallback. Consider consistent error handling that returns an error code rather than crashing.

store_handler/data_store_service_client_closure.cpp-1231-1239 (1)

1231-1239: Another std::abort() on deserialization failure.

Consistent with earlier findings, this should return an error rather than terminating the process.

eloq_metrics/include/metrics_manager.h-121-121 (1)

121-121: std::stoi may throw on invalid input.

If ELOQ_METRICS_PORT contains a non-numeric value, std::stoi will throw std::invalid_argument or std::out_of_range, crashing initialization.

-        auto bind_port = std::stoi(bind_port_str);
+        int bind_port;
+        try
+        {
+            bind_port = std::stoi(bind_port_str);
+        }
+        catch (const std::exception &e)
+        {
+#ifdef WITH_GLOG
+            LOG(ERROR) << "Invalid port value: " << bind_port_str;
+#endif
+            mgr_init_err_ = MetricsMgrErrors::CollectorOpenErr;
+            return;
+        }
store_handler/data_store_service_client_closure.cpp-196-203 (1)

196-203: Calling std::abort() on deserialization failure may be too aggressive.

If DeserializeTxRecordStr fails, the process terminates immediately. Consider returning an error code to allow graceful error handling and recovery.

             if (!DataStoreServiceClient::DeserializeTxRecordStr(
                     val, is_deleted, offset))
             {
                 LOG(ERROR) << "====fetch record===decode error==" << " key: "
                            << read_closure->Key()
                            << " status: " << (int) fetch_cc->rec_status_;
-                std::abort();
+                fetch_cc->SetFinish(
+                    static_cast<int>(txservice::CcErrorCode::DATA_STORE_ERR));
+                return;
             }
store_handler/dynamo_scanner.h-132-144 (1)

132-144: Potential dangling pointer: start_key passed by reference but stored as pointer.

HashPartitionDynamoScanner takes start_key by const reference but the base class stores &start_key as a pointer. If the caller's start_key object goes out of scope before the scanner, this becomes a dangling pointer. Consider storing a copy or documenting the lifetime requirement clearly.

Either store a copy of the key:

-        const EloqKV::EloqKey *start_key_;
+        std::unique_ptr<EloqKV::EloqKey> start_key_;

Or document the lifetime requirement if the caller guarantees validity:

    // NOTE: start_key must remain valid for the lifetime of this scanner
    HashPartitionDynamoScanner(
        ...
        const EloqKV::EloqKey &start_key,  // must outlive this scanner
        ...
store_handler/dynamo_handler_typed.cpp-159-193 (1)

159-193: Multiple unimplemented methods with assert(false) stubs.

NewTxRecord, NegInfKey, PosInfKey, and ColumnsToScanWhenBuildSk are all stubs. These are likely called at runtime based on the scanner implementations. Either implement them or document that this handler doesn't support these operations by throwing descriptive exceptions.

Would you like me to help identify where these methods are called from to understand the expected behavior?
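The pure-virtual alternative mentioned above can be sketched minimally. The class and method names are illustrative, not the real handler interface: making the operation abstract in the base class turns "forgot to implement" into a compile-time error instead of an assert(false) that vanishes in release builds.

```cpp
#include <memory>
#include <string>

// Hypothetical base class (illustrative names): the pure virtual
// method forces every concrete handler to provide a definition, or
// the handler cannot be instantiated at all.
class HandlerBase
{
public:
    virtual ~HandlerBase() = default;
    // Pure virtual: no silent stub is possible.
    virtual std::string NewTxRecord() = 0;
};

class TypedHandler : public HandlerBase
{
public:
    std::string NewTxRecord() override
    {
        return "tx-record";
    }
};
```

If some backends legitimately never support an operation, a thrown std::logic_error with a descriptive message (as in the suggestion above) is the runtime-checked alternative.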

store_handler/dynamo_handler_typed.cpp-36-53 (1)

36-53: Stub implementations with assert(false) will cause runtime failures.

These methods (SetSequenceTableSchema, GetSequenceTableSchema, GetIndexes) unconditionally trigger assert(false). In debug builds, this crashes; in release builds, behavior is undefined. If these are intended to be overridden by derived classes, consider making them pure virtual in the base class. Otherwise, throw a proper exception with a meaningful message.

 void DynamoHandlerTyped::SetSequenceTableSchema(
     const txservice::TableSchema *tbl_schema)
 {
-    assert(false);
+    throw std::logic_error("SetSequenceTableSchema not implemented for DynamoHandlerTyped");
 }
 const txservice::TableSchema *DynamoHandlerTyped::GetSequenceTableSchema()
 {
-    assert(false);
+    throw std::logic_error("GetSequenceTableSchema not implemented for DynamoHandlerTyped");
+    return nullptr;
 }
CMakeLists.txt-215-219 (1)

215-219: Proto compilation at configure time may cause stale artifacts.

Using execute_process() runs protoc during CMake configure, not during build. If .proto files change, the generated files won't update until CMake is re-run.

Consider using add_custom_command() for build-time generation:

add_custom_command(
    OUTPUT ${DS_PROTO_DIR}/ds_request.pb.cc ${DS_PROTO_DIR}/ds_request.pb.h
    COMMAND protoc --cpp_out=${DS_PROTO_DIR} -I${DS_PROTO_DIR} ${DS_PROTO_DIR}/ds_request.proto
    DEPENDS ${DS_PROTO_DIR}/ds_request.proto
    COMMENT "Generating protobuf files for ds_request"
)
store_handler/dynamo_handler.h-584-588 (1)

584-588: Copy constructor doesn't copy ttl_set member.

The copy constructor copies kv_table_name_ and kv_index_names_ but omits ttl_set, which will default to false in copied objects regardless of the source value.

     DynamoCatalogInfo(const DynamoCatalogInfo &kv_info)
     {
         kv_table_name_ = kv_info.kv_table_name_;
         kv_index_names_ = kv_info.kv_index_names_;
+        ttl_set = kv_info.ttl_set;
     }
store_handler/dynamo_handler.h-388-406 (1)

388-406: Member old_table_schema_ is declared but never initialized in constructors.

UpsertTableData::old_table_schema_ (line 391) is declared but none of the three constructors initialize it, leaving it with indeterminate value.

Initialize the member in all constructors:

         UpsertTableData(
             DynamoHandler *dynamo_hd,
             const txservice::TableName *table_name,
             const txservice::TableSchema *old_schema,
             const txservice::TableSchema *schema,
             txservice::OperationType op_type,
             uint64_t write_time,
             std::shared_ptr<void> defer_unpin,
             txservice::CcHandlerResult<txservice::Void> *hd_res,
             const txservice::AlterTableInfo *alter_table_info = nullptr)
             : hd_res_(hd_res),
               dynamo_hd_(dynamo_hd),
               table_name_(table_name),
+              old_table_schema_(old_schema),
               table_schema_(schema),

And similarly for the copy constructors, copying from other.old_table_schema_.

Committable suggestion skipped: line range outside the PR's diff.

store_handler/data_store_service_scanner.h-173-177 (1)

173-177: Dangling reference risk: storing reference to external string.

table_name_ is stored as const std::string &, creating a lifetime dependency on the caller's string. If the source string is destroyed while the scanner is active, this becomes a dangling reference.

Consider storing by value to avoid lifetime issues:

 protected:
     // initialized parameters
     DataStoreServiceClient *client_;
-    const std::string &table_name_;
+    std::string table_name_;
     std::string start_key_;

And update the constructor initialization:

-          table_name_(table_name),
+          table_name_(table_name),  // Now copies the string
store_handler/bigtable_handler.cpp-1926-1927 (1)

1926-1927: Potential data race in async callback modifying futures vector.

The callback at line 1926-1927 modifies the futures vector (swap and pop_back) while holding flight_bulks_mux_. However, the futures vector is passed by reference from the caller (PutAll, PutArchivesAll) and new elements are added at line 1932 after releasing the lock. This creates a race between the callback modifying the vector and the main thread adding to it.

The idx = futures.size() capture at line 1892 becomes invalid after the swap/pop_back operations from other callbacks.

Consider using an index-stable container or tracking completed futures separately rather than modifying the vector in-place.

store_handler/bigtable_handler.cpp-1863-1867 (1)

1863-1867: Return value of PutArchivesAll is ignored, potentially masking failures.

The variable ret is assigned the result of PutArchivesAll but is never checked or used. If archiving fails, the function still returns true, which could lead to data inconsistency.

Apply this diff to fix the issue:

-        bool ret = PutArchivesAll(node_group,
+        if (!PutArchivesAll(node_group,
                                   table_name,
                                   table_schema->GetKVCatalogInfo(),
-                                  archive_vec);
+                                  archive_vec))
+        {
+            return false;
+        }
     }
store_handler/bigtable_handler.h-35-35 (1)

35-35: Avoid using namespace in header files.

using namespace MyEloq; in a header file pollutes the global namespace for all files that include this header, potentially causing name collisions and making code harder to reason about.

-using namespace MyEloq;
 namespace EloqDS
 {
+using MyEloq::EloqKey;
+using MyEloq::EloqRecord;
+// Add other specific using declarations as needed
store_handler/bigtable_handler.h-350-371 (1)

350-371: strtok is not thread-safe and const_cast violates const correctness.

The Deserialize methods use const_cast to remove const from payload_ and then use strtok, which modifies the string in place and uses static internal state, making it not thread-safe.

Consider using a safer alternative:

         void Deserialize(std::vector<long> &nums) const
         {
-            char *payload = const_cast<char *>(payload_.c_str());
-            char *token = strtok(payload, ",");
-            while (token)
+            std::string_view sv(payload_);
+            size_t pos = 0;
+            while (pos < sv.size())
             {
-                long num = strtol(token, nullptr, 10);
-                assert(errno != ERANGE);
-                nums.push_back(num);
-                token = strtok(nullptr, ",");
+                size_t next = sv.find(',', pos);
+                if (next == std::string_view::npos) next = sv.size();
+                long num = std::strtol(sv.data() + pos, nullptr, 10);
+                nums.push_back(num);
+                pos = next + 1;
             }
         }
🟡 Minor comments (35)
store_handler/eloq_data_store_service/RocksDB_Configuration_Flags.md-82-82 (1)

82-82: Fix typo: "dialy" should be "daily". Line 82 contains a spelling error in the flag name rocksdb_dialy_offpeak_time_utc.

-| `rocksdb_dialy_offpeak_time_utc` | No | Local time zone's 00:00-05:00 converted to UTC | Time range in format "HH:MM-HH:MM" | Daily off-peak time in UTC for scheduling compactions |
+| `rocksdb_daily_offpeak_time_utc` | No | Local time zone's 00:00-05:00 converted to UTC | Time range in format "HH:MM-HH:MM" | Daily off-peak time in UTC for scheduling compactions |
store_handler/eloq_data_store_service/data_store_fault_inject.h-106-106 (1)

106-106: assert silently passes in release builds if parameter format is invalid.

If paras contains a malformed key-value pair without =, assert(pos3 != std::string::npos) will crash in debug builds but continue with undefined behavior (using npos as index) in release builds.

Consider using explicit error handling consistent with line 94-96:

             std::string sbs = paras.substr(pos1, pos2 - pos1);
             size_t pos3 = sbs.find('=');
-            assert(pos3 != std::string::npos);
+            if (pos3 == std::string::npos)
+            {
+                LOG(ERROR) << "Error parameters for fault inject: missing '=' in name="
+                           << fault_name << ", segment=" << sbs;
+                abort();
+            }
store_handler/eloq_data_store_service/data_store_fault_inject.cpp-100-109 (1)

100-109: std::stoi can throw if para contains invalid input.

If the parameter string is not a valid integer (e.g., "abc" or overflow), std::stoi throws std::invalid_argument or std::out_of_range, causing an unhandled exception and process termination.

Consider wrapping with try-catch or validating input:

         case FaultAction::SLEEP:
         {
             int secs = 1;
             if (!para.empty())
             {
-                secs = std::stoi(para);
+                try
+                {
+                    secs = std::stoi(para);
+                }
+                catch (const std::exception &e)
+                {
+                    LOG(ERROR) << "Invalid sleep parameter: " << para
+                               << ", using default 1s";
+                }
             }

             sleep(secs);
             break;
         }
store_handler/.clang-format-9-9 (1)

9-9: Fix typo in comment.

"sepecified" should be "specified".

-# When an option is not sepecified, it defaults to Google's style (this can be
+# When an option is not specified, it defaults to Google's style (this can be
store_handler/.clang-format-6-6 (1)

6-6: Fix typo in comment.

"Advacned" should be "Advanced".

-# compiled by the Advacned Technology department.
+# compiled by the Advanced Technology department.
eloq_metrics/.clang-format-6-6 (1)

6-6: Fix typo in comment.

"Advacned" should be "Advanced".

-# compiled by the Advacned Technology department.
+# compiled by the Advanced Technology department.
eloq_metrics/.clang-format-9-9 (1)

9-9: Fix typo in comment.

"sepecified" should be "specified".

-# When an option is not sepecified, it defaults to Google's style (this can be
+# When an option is not specified, it defaults to Google's style (this can be
store_handler/eloq_data_store_service/.gitignore-58-60 (1)

58-60: Remove duplicate "bin" entry.

The pattern bin appears on both Line 58 and Line 60.

 bin
 bld
-bin
 cmake/
eloq_metrics/README.md-94-102 (1)

94-102: Complete the cmake command.

Lines 94-102 show an incomplete cmake command - the initial cmake -E make_directory build && cmake -S . -B build \ is missing.

 cd eloq-metrics
+cmake -E make_directory build && \
+cmake -S . -B build \
   -DCMAKE_BUILD_TYPE=Release \
   -DCMAKE_CXX_FLAGS="${CXXFLAGS} -fPIC" \
   -DCMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake \
   -DENABLE_BENCHMARK=ON \
   -DWITH_ABSEIL=ON
 
 cd build && cmake --build .  -j4
store_handler/eloq_data_store_service/build_eloq_store.cmake-28-28 (1)

28-28: Optional zstd is linked unconditionally.

find_library(ZSTD_LIBRARY zstd) doesn't use REQUIRED, making zstd optional. However, line 110 links against ${ZSTD_LIBRARY} unconditionally. If zstd isn't found, this will be empty or NOTFOUND.

Either make it required or link conditionally:

 find_library(ZSTD_LIBRARY zstd)
+if(ZSTD_LIBRARY)
+    set(ZSTD_LINK_LIB ${ZSTD_LIBRARY})
+else()
+    set(ZSTD_LINK_LIB "")
+    message(STATUS "zstd not found, compression support disabled")
+endif()

Then on line 110, use ${ZSTD_LINK_LIB} instead of ${ZSTD_LIBRARY}.

Committable suggestion skipped: line range outside the PR's diff.

log_service-1-1 (1)

1-1: Improve submodule initialization documentation; correct verification command syntax.

The .gitmodules configuration for log_service is correctly set up. However:

  1. The verification command in your comment is syntactically incorrect. git ls-remote does not accept commit hashes as arguments—it only works with refs (branches/tags). To verify a commit exists, use git ls-remote --heads --tags git@github.com:eloqdata/log_service.git and search the output, or clone the repository and check with git cat-file -t <hash>.

  2. The SSH URL (git@github.com:...) may create friction for developers who prefer HTTPS. Consider documenting both protocols or using HTTPS by default.

  3. Add explicit submodule initialization instructions to README.md or a setup guide (e.g., git clone --recursive or git submodule update --init --recursive). Currently, the README only mentions that transaction logs are stored in the Log Service but lacks setup steps.
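For point 1, the `git cat-file` check can be exercised locally; the scratch repo below is a stand-in for an actual clone of log_service:

```shell
# Sketch: `git ls-remote` only lists refs, so a raw commit hash must be
# verified against a local clone with `git cat-file -t`.
tmp=$(mktemp -d)
git init -q "$tmp/demo"
cd "$tmp/demo"
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "init"
hash=$(git rev-parse HEAD)
obj_type=$(git cat-file -t "$hash")   # prints "commit" for an existing object
echo "$obj_type"
```

On an unknown hash, `git cat-file -t` exits non-zero instead of printing a type.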

eloq_metrics/tests/metrics_collector_test.cc-111-137 (1)

111-137: Test logic issue: shared collector state across WHEN blocks.

The two WHEN blocks at lines 118 and 127 share the same collector instance. When tests run, the first WHEN opens the collector, and the second WHEN's behavior depends on whether both blocks execute independently (fresh collector) or sequentially (already opened). Catch2's BDD macros create separate execution paths, but this structure is confusing and may not test the intended scenario.

Consider restructuring to make each scenario independent:

-SCENARIO("Metrics collector call Open several times",
-         "[MCCallOpenSeveralTimes]")
+SCENARIO("Metrics collector Open returns true on first call",
+         "[MCOpenOnce]")
 {
-    INFO("Run unit test MCCallOpenSeveralTimes");
-    WHEN("init prometheus collector")
+    GIVEN("A fresh prometheus collector")
     {
         metrics::PrometheusCollector collector{"0.0.0.0", 18082};
         WHEN("The open method is called")
         {
             auto open_one = collector.Open();
             THEN("open one will return success")
             {
                 REQUIRE(open_one == true);
             }
         }
-
-        WHEN("The open method is call twice")
-        {
-            collector.Open();
-            auto open_two = collector.Open();
-            THEN("open two will return false")
-            {
-                REQUIRE(open_two == false);
-            }
-        }
     }
 }
+
+SCENARIO("Metrics collector Open returns false on second call",
+         "[MCOpenTwice]")
+{
+    GIVEN("A prometheus collector that has been opened")
+    {
+        metrics::PrometheusCollector collector{"0.0.0.0", 18083};
+        collector.Open();
+        WHEN("Open is called again")
+        {
+            auto open_two = collector.Open();
+            THEN("it returns false")
+            {
+                REQUIRE(open_two == false);
+            }
+        }
+    }
+}
eloq_metrics/tests/metrics_http_service_test.cc-83-90 (1)

83-90: Reorder assertions to avoid potential UB.

Line 86 constructs a std::string from mgr_result.not_ok_ before asserting it's non-null on line 88. If the test assumption is violated and not_ok_ is null, this causes undefined behavior.

         THEN("port conflict MetricsMgr init failure")
         {
             auto mgr_result = metrics::MetricsMgr::GetMetricMgrInstance();
-            std::string err_msg(mgr_result.not_ok_);
-            INFO("MetricsMgrResult ErrMsg: " << err_msg);
             REQUIRE(mgr_result.not_ok_ != nullptr);
+            std::string err_msg(mgr_result.not_ok_);
+            INFO("MetricsMgrResult ErrMsg: " << err_msg);
             REQUIRE(mgr_result.mgr_ == nullptr);
         }
eloq_metrics/tests/metrics_manager_test.cc-41-52 (1)

41-52: Add null-checks before dereferencing mgr_.

The test asserts mgr_res_1.not_ok_ == mgr_res_2.not_ok_ but does not verify that mgr_ pointers are non-null before dereferencing them on lines 49-50. If both results have not_ok_ != nullptr, dereferencing mgr_ would cause undefined behavior.

         THEN("Two MetricsMgr instance equal")
         {
             INFO("check MetricsMgrResult");
             REQUIRE(mgr_res_1.not_ok_ == mgr_res_2.not_ok_);
+            REQUIRE(mgr_res_1.mgr_ != nullptr);
+            REQUIRE(mgr_res_2.mgr_ != nullptr);
             metrics::MetricsMgr &mgr_1 = *mgr_res_1.mgr_;
             metrics::MetricsMgr &mgr_2 = *mgr_res_2.mgr_;
             auto mgr_instance_equals = mgr_1 == mgr_2;
             REQUIRE(mgr_instance_equals);
         }
eloq_metrics/tests/metrics_http_service_test.cc-46-64 (1)

46-64: Ensure curl cleanup occurs unconditionally.

curl_easy_cleanup(curl) is only called inside the THEN block. If the REQUIRE on line 57 or 61 fails, the curl handle leaks. Also, curl_global_cleanup() is never called.

Consider restructuring to ensure cleanup:

         curl_easy_setopt(curl, CURLOPT_WRITEDATA, &rsp_buffer);
         CURLcode curl_response = curl_easy_perform(curl);
+        long http_status = 0;
+        curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &http_status);
+        curl_easy_cleanup(curl);
+        curl_global_cleanup();
         THEN("access to eloq_metrics http service return 200.")
         {
             REQUIRE(curl_response == CURLE_OK);
-            long http_status;
-            curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &http_status);
             INFO("/eloq_metrics response \n" << rsp_buffer);
             REQUIRE(http_status == 200);
-            curl_easy_cleanup(curl);
         }
eloq_metrics/benchmark/tx_service_metrics_bench.cpp-31-31 (1)

31-31: Typo: "guage" should be "gauge".

-    metrics::Name name{"guage"};
+    metrics::Name name{"gauge"};
eloq_metrics/include/metrics_collector.h-24-26 (1)

24-26: Use angle brackets for standard library includes.

Line 24 uses "memory" which searches the local directory first. Standard library headers should use angle brackets.

-#include "memory"
+#include <memory>
 #include "metrics.h"
 #include "prometheus/client_metric.h"
eloq_metrics/benchmark/tx_service_metrics_bench.cpp-19-25 (1)

19-25: Unused variable time_start will cause compiler warning.

The variable time_start is assigned but never used. Either use it (e.g., with benchmark::DoNotOptimize) or remove the assignment.

 static void BM_get_time_point(benchmark::State &state)
 {
     while (state.KeepRunning())
     {
-        auto time_start = metrics::Clock::now();
+        benchmark::DoNotOptimize(metrics::Clock::now());
     }
 }
eloq_metrics/CMakeLists.txt-43-49 (1)

43-49: Undefined DEF_INSTALL_CMAKEDIR and loop variable mismatch.

  1. Line 43: DEF_INSTALL_CMAKEDIR is not defined, so INSTALL_CMAKEDIR will be empty.
  2. Line 45: The loop uses lowercase "include", so the INSTALL_${p}DIR pattern expands to INSTALL_includeDIR instead of the intended INSTALL_INCLUDEDIR.

-set(INSTALL_CMAKEDIR ${DEF_INSTALL_CMAKEDIR} CACHE PATH "Installation directory for CMake files")
+set(DEF_INSTALL_CMAKEDIR "${CMAKE_INSTALL_LIBDIR}/cmake/${PROJECT_NAME}" CACHE PATH "Default CMake install directory")
+set(INSTALL_CMAKEDIR ${DEF_INSTALL_CMAKEDIR} CACHE PATH "Installation directory for CMake files")

-foreach(p LIB BIN include CMAKE)
+foreach(p LIB BIN INCLUDE CMAKE)
core/src/metrics_init.cpp-14-14 (1)

14-14: Typo in comment: "TDOO" should be "TODO".

-#include "mysql_metrics.h"  //TDOO(liunyl): move this file to correct location
+#include "mysql_metrics.h"  //TODO(liunyl): move this file to correct location
eloq_metrics/benchmark/redis_metrics_bench.cpp-325-331 (1)

325-331: Unused variable may be optimized away.

The benchmark measures time to obtain a time point, but time_start is unused. The compiler may optimize away the call entirely, making the benchmark meaningless.

Apply this diff to prevent optimization:

 static void BM_get_time_point(benchmark::State &state)
 {
     while (state.KeepRunning())
     {
         auto time_start = metrics::Clock::now();
+        benchmark::DoNotOptimize(time_start);
     }
 }
eloq_metrics/benchmark/redis_metrics_bench.cpp-19-123 (1)

19-123: Duplicate entries in command_types vector.

"dump" and "restore" appear twice in the list (lines 28-29 and 122-123). This will cause duplicate metric registrations in the benchmark.

Remove the duplicate entries at lines 122-123:

                                                     "sunionstore",
                                                     "sort",
-                                                    "sort_ro",
-                                                    "dump",
-                                                    "restore"};
+                                                    "sort_ro"};
store_handler/eloq_data_store_service/CMakeLists.txt-201-223 (1)

201-223: Duplicate AWS Kinesis discovery blocks.

The find_path and find_library for AWS Kinesis are duplicated (lines 201-211 and 213-223). Remove the duplicate block.

         find_library(AWS_KINESIS_LIB aws-cpp-sdk-kinesis)
         if((NOT AWS_KINESIS_LIB))
           message(FATAL_ERROR "Fail to find aws-cpp-sdk-kinesis lib")
         endif()
         message(STATUS "aws-cpp-sdk-kinesis library: ${AWS_KINESIS_LIB}")

-        find_path(AWS_KINESIS_INCLUDE_PATH aws/kinesis/KinesisClient.h)
-        if((NOT AWS_KINESIS_INCLUDE_PATH))
-          message(FATAL_ERROR "Fail to find aws/kinesis include path")
-        endif()
-        message(STATUS "aws/kinesis include path: ${AWS_KINESIS_INCLUDE_PATH}")
-
-        find_library(AWS_KINESIS_LIB aws-cpp-sdk-kinesis)
-        if((NOT AWS_KINESIS_LIB))
-          message(FATAL_ERROR "Fail to find aws-cpp-sdk-kinesis lib")
-        endif()
-        message(STATUS "aws-cpp-sdk-kinesis library: ${AWS_KINESIS_LIB}")
-
         find_path(AWS_S3_INCLUDE_PATH aws/s3/S3Client.h)
store_handler/eloq_data_store_service/data_store.h-106-110 (1)

106-110: Typo in documentation comment.

The parameter name in the comment says "drop_talbe_req" but the actual parameter is drop_table_req.

     /**
      * @brief Drop kv table.
-     * @param drop_talbe_req The pointer of the request.
+     * @param drop_table_req The pointer of the request.
      */
     virtual void DropTable(DropTableRequest *drop_table_req) = 0;
core/src/data_substrate.cpp-579-584 (1)

579-584: Incorrect error message text.

The error message mentions enable_io_uring twice, but should reference raft_log_async_fsync as the condition being validated.

     if (raft_log_async_fsync && !enable_io_uring)
     {
-        LOG(ERROR) << "Invalid config: when set `enable_io_uring`, "
-                      "should also set `enable_io_uring`.";
+        LOG(ERROR) << "Invalid config: when set `raft_log_async_fsync`, "
+                      "should also set `enable_io_uring`.";
         return false;
     }
eloq_metrics/src/prometheus_collector.cc-241-245 (1)

241-245: Logging prints pointer address instead of enum value.

&metric_type logs the address of the enum variable, not its value.

 #ifdef WITH_GLOG
-        LOG(ERROR) << "ERR: For now not support metrics_type " << &metric_type;
+        LOG(ERROR) << "ERR: For now not support metrics_type " << static_cast<int>(metric_type);
 #endif
-        ;
+        break;

Also, the trailing ; after #endif is unnecessary—replace with break; for consistency.

store_handler/data_store_service_scanner.cpp-181-189 (1)

181-189: Incorrect error message in ProcessScanCloseResult.

The error message says "Failed to fetch next batch" but this is the close result callback.

     if (result.error_code() != remote::DataStoreError::NO_ERROR)
     {
-        LOG(ERROR) << "Failed to fetch next batch: " << result.error_msg();
+        LOG(ERROR) << "Failed to close scan: " << result.error_msg();
         sp_scanner->scanner_->SetError(
build_eloq_log_service.cmake-51-51 (1)

51-51: find_library does not support VERSION argument.

The VERSION parameter is not a valid argument for find_library(). This line may silently ignore the version constraint.

-    find_library(GLOG_LIB NAMES glog VERSION ">=0.6.0" REQUIRED)
+    find_library(GLOG_LIB NAMES glog REQUIRED)

Consider using find_package(glog 0.6.0 REQUIRED) if version checking is needed.

eloq_metrics/include/metrics_manager.h-128-128 (1)

128-128: Unused variable registry.

The prometheus::Registry is created but never passed to PrometheusCollector or used elsewhere.

Either remove the unused variable or pass it to the collector if intended:

-        auto registry = std::make_shared<prometheus::Registry>();
-
         metrics_collector_ =
-            std::make_shared<PrometheusCollector>(bind_addr, bind_port);
+            std::make_shared<PrometheusCollector>(bind_addr, bind_port, registry);

Committable suggestion skipped: line range outside the PR's diff.

build_eloq_log_service.cmake-134-157 (1)

134-157: Duplicate AWS Kinesis detection code.

The AWS Kinesis include path and library detection (lines 134-144) is duplicated exactly at lines 147-157. This appears to be a copy-paste error.

Remove the duplicate block:

       message(STATUS "aws-cpp-sdk-kinesis library: ${AWS_KINESIS_LIB}")
 
-
-      find_path(AWS_KINESIS_INCLUDE_PATH aws/kinesis/KinesisClient.h)
-      if((NOT AWS_KINESIS_INCLUDE_PATH))
-        message(FATAL_ERROR "Fail to find aws/kinesis include path")
-      endif()
-      message(STATUS "aws/kinesis include path: ${AWS_KINESIS_INCLUDE_PATH}")
-
-      find_library(AWS_KINESIS_LIB aws-cpp-sdk-kinesis)
-      if((NOT AWS_KINESIS_LIB))
-        message(FATAL_ERROR "Fail to find aws-cpp-sdk-kinesis lib")
-      endif()
-      message(STATUS "aws-cpp-sdk-kinesis library: ${AWS_KINESIS_LIB}")
-
       find_path(AWS_S3_INCLUDE_PATH aws/s3/S3Client.h)
eloq_metrics/include/metrics_manager.h-32-34 (1)

32-34: Inconsistent environment variable naming.

MOMO_METRICS_PUSH_ADDR uses a different prefix than the other ELOQ_* variables. This appears to be a typo or legacy naming.

-const std::string BIND_ADDR_ENV = "MOMO_METRICS_PUSH_ADDR";
+const std::string BIND_ADDR_ENV = "ELOQ_METRICS_BIND_ADDR";
store_handler/dynamo_handler_typed.cpp-113-128 (1)

113-128: Type mismatch assertion could fail at runtime.

Line 124 asserts that typed_rec->ObjectType() == obj_type, but if the stored object type doesn't match the provided record's type, this will crash in debug builds. Consider handling this mismatch gracefully with an error return or exception.

-        assert(typed_rec->ObjectType() == obj_type);
+        if (typed_rec->ObjectType() != obj_type)
+        {
+            LOG(ERROR) << "Object type mismatch: expected "
+                       << static_cast<int>(typed_rec->ObjectType())
+                       << " but found " << static_cast<int>(obj_type);
+            return;  // or throw
+        }

Committable suggestion skipped: line range outside the PR's diff.

store_handler/data_store_service_scanner.h-207-221 (1)

207-221: Reset() should clear scan_cache_ before reserving.

When reusing a pooled scanner, Reset() only reserves capacity but doesn't clear existing cache entries. If Clear() wasn't called before Reset(), stale data could remain.

     void Reset(DataStoreServiceScanner *scanner,
                int32_t partition_id,
                uint32_t data_shard_id,
                const std::string_view start_key)
     {
         scanner_ = scanner;
         partition_id_ = partition_id;
         data_shard_id_ = data_shard_id;
         last_key_ = start_key;
         session_id_ = "";
         last_batch_size_ = scanner_->GetBatchSize();
         first_batch_fetched_ = false;
         head_ = 0;
+        scan_cache_.clear();
         scan_cache_.reserve(scanner_->GetBatchSize());
     }
store_handler/data_store_service_scanner.h-117-136 (1)

117-136: Potential data race when reading error message.

While SetError correctly uses CAS to ensure only one thread writes error_msg_, GetErrorMsg() reads error_msg_ without memory ordering guarantees relative to error_code_. A caller might observe the new error code but stale/partial error message.

Consider adding a memory fence or using error_code_.load(std::memory_order_acquire) before reading error_msg_:

     const std::string &GetErrorMsg()
     {
+        // Ensure we see error_msg_ written by SetError
+        (void)error_code_.load(std::memory_order_acquire);
         return error_msg_;
     }

Or document that callers must check GetErrorCode() first (which already uses acquire semantics implicitly via the atomic load).

store_handler/bigtable_handler.cpp-2718-2718 (1)

2718-2718: substr result is discarded; trailing space is not removed.

The call index_names.substr(0, index_names.size() - 1) returns a new string but the result is not assigned back to index_names. The trailing space remains in the serialized output.

Apply this diff to fix the issue:

-        index_names.substr(0, index_names.size() - 1);
+        index_names.pop_back();  // Remove trailing space

Comment on lines +98 to +107
FOREACH(PROTO_FILE ${PROTO_FILES})
STRING(REGEX REPLACE "[^/]proto" "" PROTO_NAME ${PROTO_FILE})
LIST(APPEND PROTO_CC_FILES ${PROTO_SRC}/${PROTO_NAME}.pb.cc)
ADD_CUSTOM_COMMAND(
OUTPUT "${PROTO_SRC}/${PROTO_NAME}.pb.cc" "${PROTO_SRC}/${PROTO_NAME}.pb.h"
DEPENDS ${PROTO_SRC}/${PROTO_FILE}
COMMAND protoc ${PROTO_FILE} --proto_path=./ --cpp_out=./
WORKING_DIRECTORY ${PROTO_SRC}
)
ENDFOREACH(PROTO_FILE)

⚠️ Potential issue | 🔴 Critical

Incorrect regex pattern for extracting proto name.

The regex [^/]proto matches any character (except /) followed by proto, which is not the intended behavior. This will likely produce malformed filenames.

To strip the .proto extension correctly:

 FOREACH(PROTO_FILE ${PROTO_FILES})
-    STRING(REGEX REPLACE "[^/]proto" "" PROTO_NAME ${PROTO_FILE})
+    STRING(REGEX REPLACE "\\.proto$" "" PROTO_NAME ${PROTO_FILE})
     LIST(APPEND PROTO_CC_FILES ${PROTO_SRC}/${PROTO_NAME}.pb.cc)
🤖 Prompt for AI Agents
In build_tx_service.cmake around lines 98-107, the current STRING(REGEX REPLACE
"[^/]proto" "" PROTO_NAME ${PROTO_FILE}) uses an incorrect regex and produces
malformed names; replace it with a reliable extraction of the proto basename
without extension — for example use CMake's GET_FILENAME_COMPONENT(PROTO_NAME
${PROTO_FILE} NAME_WE) (or, if you prefer regex, use STRING(REGEX REPLACE
"\\.proto$" "" PROTO_NAME ${PROTO_FILE})), keeping the rest of the loop (APPEND,
ADD_CUSTOM_COMMAND outputs/depends) unchanged so the generated .pb.cc/.pb.h
filenames are correct.

${HOST_MANAGER_SOURCE_DIR}/src/raft_host_manager.cpp
${HOST_MANAGER_SOURCE_DIR}/src/ini.c
${HOST_MANAGER_SOURCE_DIR}/src/INIReader.cpp
${PROTO_SRC}/${PROTO_NAME}.pb.cc
⚠️ Potential issue | 🔴 Critical

Undefined variable ${PROTO_NAME} in host_manager sources.

PROTO_NAME is defined inside the FOREACH loop (lines 98-107) and is not accessible in this scope. This will result in an empty or undefined path.

Either use ${PROTO_CC_FILES} which contains all compiled proto sources, or explicitly list the required proto source files:

     SET(RaftHM_SOURCES
         ${HOST_MANAGER_SOURCE_DIR}/src/main.cpp
         ${HOST_MANAGER_SOURCE_DIR}/src/raft_host_manager_service.cpp
         ${HOST_MANAGER_SOURCE_DIR}/src/raft_host_manager.cpp
         ${HOST_MANAGER_SOURCE_DIR}/src/ini.c
         ${HOST_MANAGER_SOURCE_DIR}/src/INIReader.cpp
-        ${PROTO_SRC}/${PROTO_NAME}.pb.cc
+        ${PROTO_CC_FILES}
         ${LOG_PROTO_SRC}/log_agent.cpp
         ${LOG_PROTO_SRC}/${LOG_PROTO_NAME}.pb.cc
         )
🤖 Prompt for AI Agents
In build_tx_service.cmake around line 325, the line referencing
${PROTO_NAME}.pb.cc uses PROTO_NAME which was only defined inside the FOREACH at
lines 98-107 and is out of scope here, producing an empty/invalid path; replace
that reference by using the aggregate variable ${PROTO_CC_FILES} (which contains
all generated proto cc files) or explicitly add the specific proto source(s)
required here, or move the ${PROTO_NAME}-dependent logic inside the FOREACH so
the variable is in scope.

Comment on lines +202 to +206
if(NOT ROCKSDB_CLOUD_LIB)
message(FATAL_ERROR "Fail to find RocksDB Cloud lib path")
endif()
message(STATUS "ROCKSDB_CLOUD_LIB: ${ROCKSDB_CLOUD_LIB}")
set(ROCKSDB_LIBRARIES ${ROCKSDB_LIBRARIES} ${ROCKSDB_CLOUD_LIB})

⚠️ Potential issue | 🔴 Critical

ROCKSDB_CLOUD_LIB check may fail for GCS configuration.

ROCKSDB_CLOUD_LIB is only set for S3 configurations (line 190), but the check at line 202 is common to both S3 and GCS. GCS configurations will hit this FATAL_ERROR.

Add find_library for GCS:

         find_library(ROCKSDB_CLOUD_LIB NAMES rocksdb-cloud-aws)
     endif()
+    
+    if((WITH_DATA_STORE STREQUAL "ROCKSDB_CLOUD_GCS") OR (WITH_DATA_STORE STREQUAL "ELOQDSS_ROCKSDB_CLOUD_GCS"))
+        find_library(ROCKSDB_CLOUD_LIB NAMES rocksdb-cloud-gcs)
+    endif()

     # Common RocksDB Cloud setup
🤖 Prompt for AI Agents
In CMakeLists.txt around lines 202-206, the current unconditional check for
ROCKSDB_CLOUD_LIB will FATAL_ERROR for GCS builds because ROCKSDB_CLOUD_LIB is
only set for S3; update the logic to also attempt to locate the GCS cloud lib
(use find_library to set ROCKSDB_CLOUD_LIB_GCS or similar) and only require
ROCKSDB_CLOUD_LIB when S3 is enabled, or merge the found GCS/S3 library into
ROCKSDB_CLOUD_LIB before the FATAL_ERROR check so that the fatal branch is only
triggered when neither cloud provider library could be found.

Comment on lines +387 to +389
install(TARGETS logservice
ARCHIVE DESTINATION lib
LIBRARY DESTINATION lib)

⚠️ Potential issue | 🔴 Critical

Unconditional install of logservice will fail when WITH_LOG_SERVICE is OFF.

The logservice target is only created when WITH_LOG_SERVICE is ON (lines 251-257), but the install rule is unconditional.

-install(TARGETS logservice
-        ARCHIVE DESTINATION lib
-        LIBRARY DESTINATION lib)
+if(WITH_LOG_SERVICE)
+    install(TARGETS logservice
+            ARCHIVE DESTINATION lib
+            LIBRARY DESTINATION lib)
+endif()
🤖 Prompt for AI Agents
In CMakeLists.txt around lines 387-389, the install(TARGETS logservice ...)
stanza is unconditional but the logservice target is only created when
WITH_LOG_SERVICE is ON; guard the install so it only runs when the target exists
by wrapping the install() block in the same condition (if(WITH_LOG_SERVICE) ...
endif()) or by checking target existence (if(TARGET logservice) ... endif()) so
installation does not fail when WITH_LOG_SERVICE is OFF.

Comment on lines +91 to +95
// remote request metrics
metrics::enable_remote_request_metrics = config_reader.GetBoolean(
"metrics", "enable_busy_round_metrics", metrics::enable_tx_metrics);
LOG(INFO) << "enable_remote_request_metrics: "
<< (metrics::enable_remote_request_metrics ? "ON" : "OFF");

⚠️ Potential issue | 🔴 Critical

Wrong config key used for enable_remote_request_metrics.

The code reads "enable_busy_round_metrics" but assigns to enable_remote_request_metrics. This appears to be a copy-paste error that will cause the remote request metrics to be incorrectly configured based on busy round settings.

         // remote request metrics
         metrics::enable_remote_request_metrics = config_reader.GetBoolean(
-            "metrics", "enable_busy_round_metrics", metrics::enable_tx_metrics);
+            "metrics", "enable_remote_request_metrics", metrics::enable_tx_metrics);
🤖 Prompt for AI Agents
In core/src/metrics_init.cpp around lines 91 to 95, the code mistakenly reads
the "enable_busy_round_metrics" key when setting
metrics::enable_remote_request_metrics; change the config key to the correct
"enable_remote_request_metrics" when calling config_reader.GetBoolean and keep
the same default (metrics::enable_remote_request_metrics) so the remote request
metric flag is read from the correct config entry and logged accurately.

uint16_t port,
uint64_t shard_version)
{
assert(expired_ts_.load(std::memory_order_acquire) == 0);

⚠️ Potential issue | 🔴 Critical

Assertion in DssNode::Reset will always fail for newly constructed nodes.

expired_ts_ is initialized to 1U (line 811), but Reset() asserts expired_ts_.load() == 0 (line 782). This assertion will always fail for nodes that haven't been explicitly marked as expired.

-        std::atomic<uint64_t> expired_ts_{1U};
+        std::atomic<uint64_t> expired_ts_{0};

Or remove the assertion if nodes can be reused:

         void Reset(...)
         {
-            assert(expired_ts_.load(std::memory_order_acquire) == 0);
+            expired_ts_.store(0, std::memory_order_release);
             host_name_ = hostname;

Also applies to: 811-811

🤖 Prompt for AI Agents
In store_handler/data_store_service_client.h around lines 782 and 811, the
Reset() assertion expects expired_ts_ == 0 but expired_ts_ is initialized to 1U,
so the assert always fails for newly constructed nodes; fix by making the
initial value and the Reset() invariant consistent: either initialize
expired_ts_ to 0 (change the initializer at line ~811 to 0U) or adjust/remove
the assertion in Reset() to match the intended initial state (e.g.,
assert(expired_ts_.load(...) == 1U) or drop the assert if nodes may be reused).

Comment on lines +160 to +175
template <typename KeyT>
void DynamoHandlerTyped::BindDynamoReqForKeyPredecessor(
Aws::DynamoDB::Model::AttributeValue &sk, const KeyT &key)
{
assert(false);

std::string_view predecessor_key_str = key.KVSerialize();
auto data_ptr =
reinterpret_cast<const unsigned char *>(predecessor_key_str.data());
predecessor(const_cast<unsigned char *>(data_ptr),
predecessor_key_str.size());

sk.SetB(Aws::Utils::ByteBuffer(
reinterpret_cast<const unsigned char *>(predecessor_key_str.data()),
predecessor_key_str.size()));
}

⚠️ Potential issue | 🔴 Critical

Critical: Undefined behavior via const_cast on read-only data and dead code after assert(false).

This function has two serious issues:

  1. Dead code: Lines 166-174 execute after assert(false) which will always abort in debug builds. In release builds where asserts are disabled, the function proceeds to execute potentially dangerous code.

  2. Undefined behavior: key.KVSerialize() returns a std::string_view to potentially read-only memory. Using const_cast to modify this data (line 169) is undefined behavior.

If this function is intentionally unimplemented, consider:

 template <typename KeyT>
 void DynamoHandlerTyped::BindDynamoReqForKeyPredecessor(
     Aws::DynamoDB::Model::AttributeValue &sk, const KeyT &key)
 {
-    assert(false);
-
-    std::string_view predecessor_key_str = key.KVSerialize();
-    auto data_ptr =
-        reinterpret_cast<const unsigned char *>(predecessor_key_str.data());
-    predecessor(const_cast<unsigned char *>(data_ptr),
-                predecessor_key_str.size());
-
-    sk.SetB(Aws::Utils::ByteBuffer(
-        reinterpret_cast<const unsigned char *>(predecessor_key_str.data()),
-        predecessor_key_str.size()));
+    // TODO: Implement predecessor key binding
+    (void)sk;
+    (void)key;
+    throw std::logic_error("BindDynamoReqForKeyPredecessor not implemented");
 }

If implementation is needed, copy to a mutable buffer first:

std::string key_copy(key.KVSerialize());
predecessor(reinterpret_cast<unsigned char*>(key_copy.data()), key_copy.size());
sk.SetB(Aws::Utils::ByteBuffer(
    reinterpret_cast<const unsigned char*>(key_copy.data()), key_copy.size()));
🤖 Prompt for AI Agents
In store_handler/dynamo_handler_typed.h around lines 160-175, the function
contains an assert(false) followed by code that uses const_cast on data from a
std::string_view (undefined behavior) and therefore is both dead/unsafe; either
mark the function as unimplemented (e.g., remove the body and throw or return)
or properly implement it by making a mutable copy of the serialized key
(std::string key_copy(key.KVSerialize())), call predecessor using
key_copy.data() and key_copy.size(), and pass a const pointer to
Aws::Utils::ByteBuffer created from key_copy.data() and key_copy.size(),
avoiding any const_cast on read-only memory.

Comment on lines +404 to +409
include_directories(
${PROJECT_SOURCE_DIR}/
if(WITH_DATA_STORE STREQUAL "ELOQDSS_ELOQSTORE")
${PROJECT_SOURCE_DIR}/eloqstore/
endif ()
)

⚠️ Potential issue | 🔴 Critical

Invalid CMake syntax: if() inside include_directories().

CMake does not support conditional statements inside include_directories(). This will cause a parse error.

Apply this diff to fix the syntax:

-include_directories(
-        ${PROJECT_SOURCE_DIR}/
-if(WITH_DATA_STORE STREQUAL "ELOQDSS_ELOQSTORE")
-        ${PROJECT_SOURCE_DIR}/eloqstore/
-endif ()
-)
+include_directories(${PROJECT_SOURCE_DIR}/)
+if(WITH_DATA_STORE STREQUAL "ELOQDSS_ELOQSTORE")
+    include_directories(${PROJECT_SOURCE_DIR}/eloqstore/)
+endif()
🤖 Prompt for AI Agents
In store_handler/eloq_data_store_service/CMakeLists.txt around lines 404-409,
the include_directories() call contains an if() block which is invalid CMake
syntax; to fix it, close include_directories() after the unconditional path(s)
and move the conditional outside so you either call
include_directories(${PROJECT_SOURCE_DIR}/eloqstore/) only when WITH_DATA_STORE
STREQUAL "ELOQDSS_ELOQSTORE" or build a list variable (e.g., set(INCLUDE_DIRS
${PROJECT_SOURCE_DIR}/) and if(CONDITION) list(APPEND INCLUDE_DIRS
${PROJECT_SOURCE_DIR}/eloqstore/) then call
include_directories(${INCLUDE_DIRS})), ensuring no if() appears inside
include_directories().

Comment on lines +27 to +41
#include "data_store.h"

namespace EloqDS
{

class DataStoreFactory
{
public:
virtual ~DataStoreFactory() = default;

virtual std::unique_ptr<DataStore> CreateDataStore(
bool create_if_missing,
uint32_t shard_id,
DataStoreService *data_store_service,
bool start_db = true) = 0;

⚠️ Potential issue | 🔴 Critical

Missing forward declaration for DataStoreService.

DataStoreService* is used as a parameter in CreateDataStore() but is not forward-declared or included. This will cause a compilation error.

 #include "data_store.h"

 namespace EloqDS
 {

+class DataStoreService;
+
 class DataStoreFactory
 {
🤖 Prompt for AI Agents
In store_handler/eloq_data_store_service/data_store_factory.h around lines 27 to
41, the header uses DataStoreService* in the CreateDataStore signature but does
not forward-declare or include DataStoreService, which will cause a compilation
error; fix by adding a forward declaration “class DataStoreService;” above the
DataStoreFactory declaration (or include the DataStoreService header if full
type is required), and ensure no circular include is introduced.

Comment on lines +36 to +44
if (entry->start_strike_ >= 0 && entry->end_strike_ >= 0)
{
entry->count_strike_++;
if (entry->start_strike_ < entry->count_strike_ ||
entry->end_strike_ > entry->count_strike_)
{
return;
}
}

⚠️ Potential issue | 🔴 Critical

Strike-based gating condition is inverted — actions will never trigger.

The condition on lines 39-40 returns early (skips execution) when it should proceed. With the current logic:

  • If start_strike_=2, end_strike_=5, count_strike_=3: 2 < 3 is true → returns, action skipped
  • This means actions are skipped when they should be triggered

The intended behavior is to skip when count_strike_ is outside the [start_strike_, end_strike_] range.

Apply this diff to fix the condition:

     if (entry->start_strike_ >= 0 && entry->end_strike_ >= 0)
     {
         entry->count_strike_++;
-        if (entry->start_strike_ < entry->count_strike_ ||
-            entry->end_strike_ > entry->count_strike_)
+        if (entry->count_strike_ < entry->start_strike_ ||
+            entry->count_strike_ > entry->end_strike_)
         {
             return;
         }
     }
🤖 Prompt for AI Agents
In store_handler/eloq_data_store_service/data_store_fault_inject.cpp around
lines 36 to 44, the strike-range gate is inverted: after incrementing
count_strike_ the code currently returns when count is inside the
[start_strike_, end_strike_] range, preventing intended actions; change the
condition to return only when count_strike_ is outside the range (i.e., if
count_strike_ < start_strike_ || count_strike_ > end_strike_) so actions proceed
when count_strike_ falls within the inclusive bounds.

@githubzilla githubzilla merged commit ea5b4ad into main Nov 28, 2025
5 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Dec 3, 2025
This was referenced Dec 12, 2025