
Restructure for eloqconverge #168

Merged

githubzilla merged 40 commits into main from converge on Nov 28, 2025
Conversation

@liunyl (Contributor) commented Oct 23, 2025

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Reference the issue link using fixes eloqdb/tx_service#issue_id
  • Reference the RFC link if one exists
  • Pass ./mtr --suite=mono_main,mono_multi,mono_basic

Summary by CodeRabbit

  • New Features

    • Modular build system with configurable storage backends (RocksDB, DynamoDB, BigTable, Cloud variants)
    • Comprehensive metrics collection and monitoring framework
    • Enhanced configuration management via INI-based settings
    • Multi-cloud storage support (AWS S3, Google Cloud Storage)
    • Improved logging infrastructure with customizable prefixes
  • Refactor

    • Restructured project architecture for better modularity
    • Updated CMake build configuration with improved organization
    • Consolidated data store handler implementations


@coderabbitai bot commented Oct 23, 2025

Walkthrough

This PR comprehensively restructures the project from a monolithic eloq build into a modular data_substrate system with pluggable storage backends, logging, metrics, and transaction services. It introduces centralized lifecycle management, multi-phase initialization with INI-based configuration, and integrates BigTable, DynamoDB, and EloqDS backends alongside new metrics and logging frameworks.

Changes

Cohort / File(s) Summary
Build System Infrastructure
.gitignore, .gitmodules, .pre-commit-config.yaml
Updated protobuf ignore patterns, relocated submodules (abseil-cpp, tx-log-protos to tx_service/), added log_service submodule, introduced pre-commit hooks for clang-format and code cleanup.
Top-level CMake Configuration
CMakeLists.txt
Complete rewrite replacing narrow CMake paradigm with modular, option-driven system; renamed project to data_substrate; introduced cache variables for storage backend selection (DynamoDB, BigTable, RocksDB variants, EloqDS), feature flags (WITH_LOG_SERVICE, STATISTICS, etc.), and conditional compilation definitions; set up public library targets, installation rules, and included modular build scripts.
Modular Build Scripts
build_tx_service.cmake, build_eloq_metrics.cmake, build_log_service.cmake, build_eloq_log_service.cmake
New CMake configuration files establishing standalone build pipelines for TxService, metrics library, and log service with dependency resolution, protobuf compilation, platform-specific flags, and public target exposure.
Core Data Substrate Framework
core/include/data_substrate.h, core/src/data_substrate.cpp, core/src/log_init.cpp, core/src/metrics_init.cpp, core/src/storage_init.cpp, core/src/tx_service_init.cpp
Introduced DataSubstrate singleton managing global lifecycle with config structs (EngineConfig, CoreConfig, NetworkConfig, LogServiceConfig); multi-phase initialization (core/network config, engine registration, log service, storage handler, metrics, TxService); integrated INI-based configuration loading and error propagation across subsystems.
Logging Infrastructure
core/include/glog_error_logging.h
New glog integration header with custom log prefix formatting (timestamp, severity, thread ID, optional source location), environment-aware initialization, and directory management.
Metrics Headers
core/include/mysql_metrics.h, core/include/redis_metrics.h
New metric collection interfaces for MySQL and Redis operations; defines metric names, tracking macros (TRACK_TX_METRICS, COLLECT_TX_METRICS, etc.), and registration functions.
Metrics Framework (eloq_metrics/)
eloq_metrics/*
Complete metrics library with Prometheus backend integration; includes abstract interfaces (MetricsRegistry, MetricsCollector, Meter), concrete implementations (MetricsRegistryImpl, PrometheusCollector), manager/singleton patterns, benchmarking infrastructure (eloq-metrics-bench, tx-service-metrics-bench, redis-metrics-bench), test suite, and CMake packaging/installation.
Store Handler — BigTable
store_handler/bigtable_handler.h, store_handler/bigtable_handler.cpp, store_handler/bigtable_scanner.h, store_handler/bigtable_scanner.cpp
BigTable-backed DataStoreHandler with catalog management, range partitioning, async bulk operations, MVCC support, archive handling, and per-partition scanner implementations.
Store Handler — DynamoDB
store_handler/dynamo_handler.h, store_handler/dynamo_handler_typed.h, store_handler/dynamo_handler_typed.cpp, store_handler/dynamo_scanner.h, store_handler/dynamo_scanner.cpp
DynamoDB-backed store handler with typed record/key binding, payload serialization, pushdown predicate filtering, multi-partition scanning with heap-based result merging, and per-partition query orchestration.
Store Handler — Data Store Service Client
store_handler/data_store_service_client.h, store_handler/data_store_service_client_closure.cpp, store_handler/data_store_service_scanner.h, store_handler/data_store_service_scanner.cpp
Distributed data store client with multi-shard routing, batch planning, async callback infrastructure (ReadClosure, BatchWriteRecordsClosure, FlushDataClosure, etc.), range metadata management, archive/MVCC operations, and single/hash-partition scanner implementations.
Data Store Service (eloq_data_store_service/)
store_handler/eloq_data_store_service/*
New service module with CMake build (multi-backend support), INI configuration reader, abstract interfaces (DataStore, DataStoreFactory), fault injection framework with action triggering (PANIC, SLEEP, etc.), README and RocksDB configuration documentation.
Formatting & Documentation
eloq_metrics/.clang-format, eloq_metrics/README.md, store_handler/.clang-format, store_handler/README.md, store_handler/eloq_data_store_service/README.md, store_handler/eloq_data_store_service/RocksDB_Configuration_Flags.md
Added project-specific clang-format configs (C++ and Proto), README files documenting build/usage, and RocksDB configuration flag reference.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

This PR introduces substantial architectural changes across multiple subsystems with high interconnection complexity:

  • High-density logic: Multi-phase initialization sequences in DataSubstrate with interdependent error handling and state transitions across storage, logging, metrics, and TxService layers.
  • Heterogeneous changes: Diverse implementations spanning build configuration, abstract interfaces, concrete backends (BigTable, DynamoDB, distributed service client), metrics framework, and lifecycle management—each requiring separate reasoning.
  • New public APIs: Significant new surface area (DataSubstrate, multiple store handlers, metrics registry/collector/meter, INI reader, fault injection) with complex initialization semantics.
  • Async callback infrastructure: Data store client closures and scanner implementations involve intricate concurrency patterns, state management, and error propagation across multiple layers.
  • Cross-module integration: Configuration loading, backend selection logic, and module interdependencies demand careful verification of initialization order and error recovery.

Areas requiring extra attention during review:

  • DataSubstrate initialization sequencing (core/src/data_substrate.cpp, log_init.cpp, storage_init.cpp, tx_service_init.cpp, metrics_init.cpp): Verify all phase dependencies, early-exit conditions on failures, and resource cleanup paths.
  • CMake configuration logic (CMakeLists.txt, build_*.cmake): Validate WITH_DATA_STORE/WITH_LOG_STATE synchronization, backend-specific compile definitions, and cross-module dependency resolution.
  • Store handler async callbacks (data_store_service_client_closure.cpp, bigtable_handler.cpp, dynamo_handler implementations): Confirm closure state management, result serialization/deserialization, thread-safety, and heap-switching correctness.
  • Metrics framework public API (meter.h, metrics.h, metrics_registry_impl.h, prometheus_collector.cc): Validate label generation, Cartesian product logic, metric key mapping, and singleton initialization.
  • Multi-backend scanner implementations (bigtable_scanner, dynamo_scanner, data_store_service_scanner): Verify partition iteration, result caching/merging, and per-partition state transitions.
  • INI configuration integration: Ensure configuration precedence rules (CLI flags, INI, defaults) are consistently applied across all modules.
  • Error propagation and early exits: Trace failure paths through multi-phase initialization to confirm no partial state leaks or orphaned resources.

Possibly related PRs

Suggested reviewers

  • MrGuin
  • thweetkomputer

Poem

🐰 From monolith to modular spire,
Data substrates climb ever higher,
Metrics, logs, and stores aligned,
Backends plugged with careful design,
Init phases dance in grand sequence—
A rabbit's refactored renaissance! 🎭

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
  • Description check — ⚠️ Warning: The PR description is empty except for the template checklist reminders; all items remain unchecked, and no description of the changes, objectives, or context is provided. Resolution: describe the restructuring in detail, explain the new modular architecture, document the affected subsystems, and complete the checklist with issue/RFC links.
  • Docstring coverage — ⚠️ Warning: Docstring coverage is 6.06%, below the required 80.00% threshold. You can run @coderabbitai generate docstrings to improve coverage.
  • Title check — ❓ Inconclusive: The title "Restructure for eloqconverge" is vague and does not convey the scope or purpose of the changeset. Resolution: use a more specific title, such as "Refactor build system and add data substrate abstraction layer" or "Restructure modular build with multiple storage backends support".

@liunyl force-pushed the converge branch 2 times, most recently from 821dd5f to 85235fc on October 24, 2025 06:58
@liunyl force-pushed the converge branch 4 times, most recently from bfe5559 to afbec58 on November 13, 2025 07:56
@liunyl force-pushed the converge branch 2 times, most recently from 4f36bee to 8caa739 on November 24, 2025 09:23
@githubzilla (Collaborator) left a comment

+1

@githubzilla marked this pull request as ready for review on November 28, 2025 09:08
@coderabbitai bot left a comment

Actionable comments posted: 21

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (36)
store_handler/eloq_data_store_service/data_store_fault_inject.h-57-73 (1)

57-73: Static map in header duplicates mutable state across translation units.

Defining a static variable at namespace scope in a header gives each translation unit that includes it its own internal-linkage copy of the map. This wastes memory, and changes made through one translation unit are invisible to the others.

Option 1 (C++17): Use inline:

-static std::unordered_map<std::string, FaultAction> action_name_to_enum_map{
+inline std::unordered_map<std::string, FaultAction> action_name_to_enum_map{

Option 2: Move definition to .cpp file and declare extern in header:

// In header:
extern std::unordered_map<std::string, FaultAction> action_name_to_enum_map;

// In .cpp:
std::unordered_map<std::string, FaultAction> action_name_to_enum_map{...};
store_handler/eloq_data_store_service/data_store_fault_inject.h-163-176 (1)

163-176: Race condition: returned pointer can be invalidated by concurrent operations.

Entry() returns a raw pointer to a map element after releasing the lock. If another thread calls InjectFault(..., "remove") or modifies the map, the pointer may become dangling, leading to use-after-free.

The same issue exists in TriggerAction(std::string fault_name) at lines 178-195, where the pointer is used after the lock is released.

Consider:

  1. Returning a copy of FaultEntry instead of a pointer
  2. Using std::shared_ptr<FaultEntry> for entries
  3. Holding the lock for the entire duration of action triggering (may impact concurrency)
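Option 2 above can be sketched as follows. This is a minimal illustration, not the real fault-injection API: the class and method names are assumptions. The key point is that once a caller holds a shared_ptr, a concurrent remove only erases the map's reference; the entry object stays alive until the last holder releases it.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical registry (illustrative names): entries are held by
// shared_ptr, so Entry() can safely return shared ownership instead
// of a raw pointer into the map.
struct FaultEntry
{
    std::string action;
};

class FaultRegistry
{
public:
    void Inject(const std::string &name, std::string action)
    {
        std::lock_guard<std::mutex> lk(mux_);
        auto entry = std::make_shared<FaultEntry>();
        entry->action = std::move(action);
        entries_[name] = std::move(entry);
    }

    // Returns shared ownership; the pointer cannot dangle even if
    // another thread removes the entry from the map afterwards.
    std::shared_ptr<FaultEntry> Entry(const std::string &name)
    {
        std::lock_guard<std::mutex> lk(mux_);
        auto it = entries_.find(name);
        return it == entries_.end() ? nullptr : it->second;
    }

    void Remove(const std::string &name)
    {
        std::lock_guard<std::mutex> lk(mux_);
        entries_.erase(name);  // callers' shared_ptrs stay valid
    }

private:
    std::mutex mux_;
    std::unordered_map<std::string, std::shared_ptr<FaultEntry>> entries_;
};
```

The trade-off versus option 3 (holding the lock through the action) is that the caller may act on an entry that has just been removed, which is usually acceptable for fault injection.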
store_handler/README.md-1-2 (1)

1-2: Expand documentation for the store_handler module.

This README is minimal (just a title and one-line description) for what appears to be a core module based on the PR context. Consider adding:

  • Architecture overview
  • Supported storage backends
  • Configuration examples
  • API usage examples
  • Integration guide
eloq_metrics/src/metrics_app.cc-40-51 (1)

40-51: Memory leak: dynamically allocated Executor objects are never deleted.

Raw pointers are stored in exec_vec but never freed. After Join(), the objects remain allocated.

Apply this diff to fix the memory leak:

-    std::vector<eloq_metrics_app::Executor *> exec_vec;
+    std::vector<std::unique_ptr<eloq_metrics_app::Executor>> exec_vec;
     for (int i = 0; i < 3; ++i)
     {
         std::string task_name = "Task_" + std::to_string(i);
-        exec_vec.push_back(new Executor(task_name,
+        exec_vec.push_back(std::make_unique<Executor>(task_name,
                                         i,
                                         cpu_set,
                                         metrics_registry.get(),
-                                        {{"task", task_name}}));
+                                        {{"task", task_name}}));
     }

Also add #include <memory> if not already transitively included.

Committable suggestion skipped: line range outside the PR's diff.

eloq_metrics/src/metrics_registry_impl.cpp-71-77 (1)

71-77: Use find() instead of operator[] for lookup.

collectors_[key] inserts a default-constructed (null) entry if the key doesn't exist, silently modifying the map. Using find() is safer and more explicit:

 void MetricsRegistryImpl::Collect(metrics::MetricKey key,
                                   const metrics::Value &val)
 {
-    auto collector = collectors_[key].get();
-    assert(collector);
+    auto it = collectors_.find(key);
+    assert(it != collectors_.end() && it->second);
+    auto collector = it->second.get();
     collector->Collect(val);
 }
eloq_metrics/CMakeLists.txt-11-14 (1)

11-14: Undefined variable ELOQ_METRICS_ROOT_DIR will cause empty paths.

Line 11 defines METRICS_ROOT_DIR, but lines 12-13 reference the undefined ELOQ_METRICS_ROOT_DIR. This will result in empty METRICS_SRC_DIR and METRICS_INCLUDE_DIR variables.

 set(METRICS_ROOT_DIR ${CMAKE_CURRENT_SOURCE_DIR})
-set(METRICS_SRC_DIR "${ELOQ_METRICS_ROOT_DIR}/src")
-set(METRICS_INCLUDE_DIR "${ELOQ_METRICS_ROOT_DIR}/include")
+set(METRICS_SRC_DIR "${METRICS_ROOT_DIR}/src")
+set(METRICS_INCLUDE_DIR "${METRICS_ROOT_DIR}/include")
eloq_metrics/src/dummy_executor.cc-43-54 (1)

43-54: CPU affinity is set after thread has already started - race condition.

The constructor delegates to Executor(task_name, metrics_registry, common_labels) which starts the thread at line 63-64. After delegation completes, lines 50-53 attempt to set CPU affinity, but the thread is already running. This is a race condition and the affinity may not take effect as intended.

Consider restructuring to set affinity before starting the thread, or use a two-phase initialization pattern.
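The two-phase initialization pattern can be sketched as follows. This is a minimal, portable illustration with assumed names; the actual affinity call (e.g. pthread_setaffinity_np on the thread's native handle) is shown only as a comment, since it is Linux-specific.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical two-phase start: the worker blocks on a gate until the
// parent finishes configuring it, so the work loop never runs before
// the CPU affinity (or any other setup) has been applied.
class GatedWorker
{
public:
    void Start()
    {
        thread_ = std::thread([this] { Run(); });
        // Phase 1: configure the not-yet-running thread here, e.g.
        //   pthread_setaffinity_np(thread_.native_handle(), ...);
        configured_before_run_ = true;
        // Phase 2: open the gate; the work loop starts fully configured.
        {
            std::lock_guard<std::mutex> lk(mux_);
            ready_ = true;
        }
        cv_.notify_one();
    }

    void Join()
    {
        if (thread_.joinable())
        {
            thread_.join();
        }
    }

    bool configured_before_run_ = false;
    bool observed_configured_ = false;

private:
    void Run()
    {
        std::unique_lock<std::mutex> lk(mux_);
        cv_.wait(lk, [this] { return ready_; });
        // By the time real work would begin, configuration is done.
        observed_configured_ = configured_before_run_;
    }

    std::thread thread_;
    std::mutex mux_;
    std::condition_variable cv_;
    bool ready_ = false;
};
```

The mutex hand-off guarantees the configuration write in Start() happens-before the worker's first read after the gate opens.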

eloq_metrics/src/dummy_executor.cc-77-99 (1)

77-99: Busy loop without sleep will consume 100% CPU.

The while (true) loop has no sleep or yield, causing the thread to spin at full CPU utilization. This is problematic for a benchmark/demo executor. Consider adding a small sleep interval.

     while (true)
     {
         metrics::Value val = metrics::Value(max_range(rng));
         meter_->Collect(dummy_executor_hist_metric, val);
         auto tick = boost::posix_time::second_clock::local_time();
         boost::posix_time::time_duration diff = tick - start;
+        boost::this_thread::sleep_for(boost::chrono::milliseconds(1));
         if (diff.total_seconds() > 0 && diff.total_seconds() % 5 == 0)
eloq_metrics/src/dummy_executor.cc-106-108 (1)

106-108: Destructor doesn't join the thread, risking undefined behavior.

If the Executor is destroyed without calling Join(), the thread may still be running and accessing member variables that have been destroyed.

 Executor::~Executor()
 {
+    if (inner_thread_ && inner_thread_->joinable())
+    {
+        inner_thread_->join();
+    }
 }
eloq_metrics/src/dummy_executor.cc-31-41 (1)

31-41: Move constructor and assignment operator are incomplete.

The move constructor only moves inner_thread_ but not task_name_, meter_, or running_state_. The move assignment copies task_name_ but doesn't move meter_ or running_state_.

 Executor::Executor(eloq_metrics_app::Executor &&executor) noexcept
-    : inner_thread_(std::move(executor.inner_thread_))
+    : inner_thread_(std::move(executor.inner_thread_)),
+      task_name_(std::move(executor.task_name_)),
+      meter_(std::move(executor.meter_)),
+      running_state_(executor.running_state_)
 {
 }

 Executor &Executor::operator=(Executor &&executor_) noexcept
 {
     this->inner_thread_ = std::move(executor_.inner_thread_);
-    this->task_name_ = executor_.task_name_;
+    this->task_name_ = std::move(executor_.task_name_);
+    this->meter_ = std::move(executor_.meter_);
+    this->running_state_ = executor_.running_state_;
     return *this;
 }
core/src/log_init.cpp-141-148 (1)

141-148: std::stoi can throw on malformed port values.

If the config contains a non-numeric value after the colon, std::stoi will throw std::invalid_argument or std::out_of_range. Consider adding exception handling or validation.

             size_t c_idx = token.find_first_of(':');
             if (c_idx != std::string::npos)
             {
                 txlog_ips.emplace_back(token.substr(0, c_idx));
-                uint16_t pt = std::stoi(token.substr(c_idx + 1));
-                txlog_ports.emplace_back(pt);
+                try
+                {
+                    uint16_t pt = std::stoi(token.substr(c_idx + 1));
+                    txlog_ports.emplace_back(pt);
+                }
+                catch (const std::exception &e)
+                {
+                    LOG(ERROR) << "Invalid port in txlog_service_list: " << token;
+                    return false;
+                }
             }
core/src/log_init.cpp-387-399 (1)

387-399: Invalid time format logs error but doesn't fail initialization.

When std::get_time fails to parse the log purger schedule, the code logs an error but continues with uninitialized log_purger_tm values. This could lead to unexpected purger scheduling behavior.

             if (iss.fail())
             {
-                LOG(ERROR) << "Invalid time format." << std::endl;
+                LOG(ERROR) << "Invalid time format for txlog_rocksdb_cloud_log_purger_schedule: "
+                           << log_purger_schedule;
+                return false;
             }
             else
             {
eloq_metrics/benchmark/CMakeLists.txt-29-32 (1)

29-32: Test command references wrong executable.

The test is named eloq-metrics-bench but the COMMAND specifies mono-bench, which doesn't match any executable defined in this file. This will cause test execution to fail.

 add_test(
         NAME eloq-metrics-bench
-        COMMAND mono-bench
+        COMMAND eloq-metrics-bench
 )
core/include/glog_error_logging.h-80-87 (1)

80-87: Add error handling for readlink failure.

readlink returns -1 on failure, which would result in undefined behavior when constructing the string with a negative length. Additionally, this approach is Linux-specific and won't work on macOS or other platforms.

         // Get the absolute path of the bin directory
         char bin_path[PATH_MAX];
         ssize_t len = readlink("/proc/self/exe", bin_path, PATH_MAX);
+        if (len == -1 || len >= PATH_MAX)
+        {
+            // Fallback to current directory if readlink fails
+            FLAGS_log_dir = "./logs";
+        }
+        else
+        {
-        std::filesystem::path fullPath(std::string(bin_path, len));
-        std::filesystem::path dir_path =
-            fullPath.parent_path().parent_path();
-        FLAGS_log_dir = dir_path.string() + "/logs";
+            std::filesystem::path fullPath(std::string(bin_path, len));
+            std::filesystem::path dir_path =
+                fullPath.parent_path().parent_path();
+            FLAGS_log_dir = dir_path.string() + "/logs";
+        }
core/src/data_substrate.cpp-443-448 (1)

443-448: Logic appears inverted for enable_wal configuration.

The ternary is backwards: when enable_wal is left at its command-line default, the code uses FLAGS_enable_wal directly and only consults config_reader when the user set the flag explicitly. The intended precedence (explicit CLI flag wins, otherwise the INI value, otherwise the default) is inverted, so the INI setting is ignored in the common case.

     core_config_.enable_wal =
         !core_config_.bootstrap &&
-        (CheckCommandLineFlagIsDefault("enable_wal")
-             ? FLAGS_enable_wal
-             : config_reader.GetBoolean(
-                   "local", "enable_wal", FLAGS_enable_wal));
+        (!CheckCommandLineFlagIsDefault("enable_wal")
+             ? FLAGS_enable_wal
+             : config_reader.GetBoolean(
+                   "local", "enable_wal", FLAGS_enable_wal));
core/src/data_substrate.cpp-216-216 (1)

216-216: Method name "Destory" is misspelled in the Sequences class definition—it should be "Destroy".

The typo exists not only in the call at line 216 but also in the method definition itself (tx_service/include/sequences/sequences.h:92). Both the definition and all call sites must be updated consistently:

-    txservice::Sequences::Destory();
+    txservice::Sequences::Destroy();

Additionally, update the method definition in tx_service/include/sequences/sequences.h:92 from Destory() to Destroy().

build_tx_service.cmake-212-280 (1)

212-280: Remove large commented-out code block.

This 68-line commented block duplicates the active FORK_HM_PROCESS logic below it. Keeping dead code increases maintenance burden and causes confusion.

Delete lines 212-280 entirely, as the active implementation exists in lines 282-354.

eloq_metrics/include/prometheus_collector.h-42-50 (1)

42-50: static const vector in header duplicates data in every translation unit.

Each translation unit that includes this header gets its own internal-linkage copy of PROMETHEUS_HISTOGRAM_DEF_BUCKETS, bloating the binary and wasting memory.

Move to an inline variable (C++17) or define in the .cc file:

-static const std::vector<double> PROMETHEUS_HISTOGRAM_DEF_BUCKETS = {
+inline const std::vector<double> PROMETHEUS_HISTOGRAM_DEF_BUCKETS = {
     1e+1, 2e+1, 4e+1, 6e+1, 8e+1,  // <  100 us
     ...
 };

Or declare extern in header and define in .cc:

// In header:
extern const std::vector<double> PROMETHEUS_HISTOGRAM_DEF_BUCKETS;
// In .cc:
const std::vector<double> PROMETHEUS_HISTOGRAM_DEF_BUCKETS = {...};
store_handler/data_store_service_scanner.cpp-102-118 (1)

102-118: ScanClose not called when partition has no more data.

When last_batch_size_ < batch_size, the partition is exhausted but ScanClose is never sent to the server. However, IncrementInFlightFetchCount() is called unconditionally. This causes asymmetric increment/decrement when the condition on line 107 is false.

Consider restructuring to either:

  1. Only increment the counter when actually sending a close request
  2. Or always send close and let the server handle the no-op case
 bool SinglePartitionScanner::ScanClose()
 {
     ...
-    scanner_->IncrementInFlightFetchCount();
-
     assert(last_batch_size_ <= scanner_->GetBatchSize());
    // If the partition scanner has not run out of data,
     // we need to close the scanner
     if (last_batch_size_ == scanner_->GetBatchSize())
     {
+        scanner_->IncrementInFlightFetchCount();
         client->ScanClose(scanner_->GetTableName(),
                           partition_id_,
                           data_shard_id_,
                           session_id_,
                           this,
                           ProcessScanCloseResult);
     }

     return true;
 }
build_tx_service.cmake-291-303 (1)

291-303: BRAFT_INCLUDE_PATH used before it's defined.

HOST_MANAGER_INCLUDE_DIR references BRAFT_INCLUDE_PATH on line 293, but find_path(BRAFT_INCLUDE_PATH ...) is called on line 299. This will result in an empty include path.

Move the find_path and find_library calls before the HOST_MANAGER_INCLUDE_DIR assignment:

 if (FORK_HM_PROCESS)
+    find_path(BRAFT_INCLUDE_PATH NAMES braft/raft.h)
+    find_library(BRAFT_LIB NAMES braft)
+    if ((NOT BRAFT_INCLUDE_PATH) OR (NOT BRAFT_LIB))
+        message (FATAL_ERROR "Fail to find braft")
+    endif()
+
     SET (HOST_MANAGER_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/tx_service/raft_host_manager)
     set(HOST_MANAGER_INCLUDE_DIR
         ${HOST_MANAGER_SOURCE_DIR}/include
         ...
         ${BRAFT_INCLUDE_PATH}
         ...)
-
-    find_path(BRAFT_INCLUDE_PATH NAMES braft/raft.h)
-    find_library(BRAFT_LIB NAMES braft)
-    if ((NOT BRAFT_INCLUDE_PATH) OR (NOT BRAFT_LIB))
-        message (FATAL_ERROR "Fail to find braft")
-    endif()

Committable suggestion skipped: line range outside the PR's diff.

store_handler/data_store_service_scanner.cpp-76-80 (1)

76-80: Search conditions are built but not used.

The constructor (lines 250-258) builds search_conditions_ from pushdown_cond, but ScanNext() is called with nullptr instead of the search conditions. This means pushdown predicates are never applied.

     client->ScanNext(
         scanner_->GetTableName(),
         partition_id_,
         data_shard_id_,
         last_key_,
         scanner_->GetEndKey(),
         session_id_,
         true,
         // only set inclusive_start for the first batch
         first_batch_fetched_ ? false : scanner_->IsInclusiveStart(),
         scanner_->IsInclusiveEnd(),
         scanner_->IsScanForward(),
         scanner_->GetBatchSize(),
-        // scanner_->SearchConditions(),
-        nullptr,
+        scanner_->SearchConditions(),
         this,
         ProcessScanNextResult);

Ensure SearchConditions() accessor exists in DataStoreServiceScanner base class.

Committable suggestion skipped: line range outside the PR's diff.

store_handler/data_store_service_client_closure.cpp-372-379 (1)

372-379: Same issue: std::abort() on deserialization failure.

This is the same pattern as in FetchRecordCallback. Consider consistent error handling that returns an error code rather than crashing.

store_handler/data_store_service_client_closure.cpp-1231-1239 (1)

1231-1239: Another std::abort() on deserialization failure.

Consistent with earlier findings, this should return an error rather than terminating the process.

eloq_metrics/include/metrics_manager.h-121-121 (1)

121-121: std::stoi may throw on invalid input.

If ELOQ_METRICS_PORT contains a non-numeric value, std::stoi will throw std::invalid_argument or std::out_of_range, crashing initialization.

-        auto bind_port = std::stoi(bind_port_str);
+        int bind_port;
+        try
+        {
+            bind_port = std::stoi(bind_port_str);
+        }
+        catch (const std::exception &e)
+        {
+#ifdef WITH_GLOG
+            LOG(ERROR) << "Invalid port value: " << bind_port_str;
+#endif
+            mgr_init_err_ = MetricsMgrErrors::CollectorOpenErr;
+            return;
+        }
store_handler/data_store_service_client_closure.cpp-196-203 (1)

196-203: Calling std::abort() on deserialization failure may be too aggressive.

If DeserializeTxRecordStr fails, the process terminates immediately. Consider returning an error code to allow graceful error handling and recovery.

             if (!DataStoreServiceClient::DeserializeTxRecordStr(
                     val, is_deleted, offset))
             {
                 LOG(ERROR) << "====fetch record===decode error==" << " key: "
                            << read_closure->Key()
                            << " status: " << (int) fetch_cc->rec_status_;
-                std::abort();
+                fetch_cc->SetFinish(
+                    static_cast<int>(txservice::CcErrorCode::DATA_STORE_ERR));
+                return;
             }
store_handler/dynamo_scanner.h-132-144 (1)

132-144: Potential dangling pointer: start_key passed by reference but stored as pointer.

HashPartitionDynamoScanner takes start_key by const reference but the base class stores &start_key as a pointer. If the caller's start_key object goes out of scope before the scanner, this becomes a dangling pointer. Consider storing a copy or documenting the lifetime requirement clearly.

Either store a copy of the key:

-        const EloqKV::EloqKey *start_key_;
+        std::unique_ptr<EloqKV::EloqKey> start_key_;

Or document the lifetime requirement if the caller guarantees validity:

    // NOTE: start_key must remain valid for the lifetime of this scanner
    HashPartitionDynamoScanner(
        ...
        const EloqKV::EloqKey &start_key,  // must outlive this scanner
        ...
store_handler/dynamo_handler_typed.cpp-159-193 (1)

159-193: Multiple unimplemented methods with assert(false) stubs.

NewTxRecord, NegInfKey, PosInfKey, and ColumnsToScanWhenBuildSk are all stubs. These are likely called at runtime based on the scanner implementations. Either implement them or document that this handler doesn't support these operations by throwing descriptive exceptions.

Would you like me to help identify where these methods are called from to understand the expected behavior?
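The pure-virtual alternative mentioned above can be sketched minimally. The class and method names are illustrative, not the real handler interface: making the operation abstract in the base class turns "forgot to implement" into a compile-time error instead of an assert(false) that vanishes in release builds.

```cpp
#include <memory>
#include <string>

// Hypothetical base class (illustrative names): the pure virtual
// method forces every concrete handler to provide a definition, or
// the handler cannot be instantiated at all.
class HandlerBase
{
public:
    virtual ~HandlerBase() = default;
    // Pure virtual: no silent stub is possible.
    virtual std::string NewTxRecord() = 0;
};

class TypedHandler : public HandlerBase
{
public:
    std::string NewTxRecord() override
    {
        return "tx-record";
    }
};
```

If some backends legitimately never support an operation, a thrown std::logic_error with a descriptive message (as in the suggestion above) is the runtime-checked alternative.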

store_handler/dynamo_handler_typed.cpp-36-53 (1)

36-53: Stub implementations with assert(false) will cause runtime failures.

These methods (SetSequenceTableSchema, GetSequenceTableSchema, GetIndexes) unconditionally trigger assert(false). In debug builds, this crashes; in release builds, behavior is undefined. If these are intended to be overridden by derived classes, consider making them pure virtual in the base class. Otherwise, throw a proper exception with a meaningful message.

 void DynamoHandlerTyped::SetSequenceTableSchema(
     const txservice::TableSchema *tbl_schema)
 {
-    assert(false);
+    throw std::logic_error("SetSequenceTableSchema not implemented for DynamoHandlerTyped");
 }
 const txservice::TableSchema *DynamoHandlerTyped::GetSequenceTableSchema()
 {
-    assert(false);
+    throw std::logic_error("GetSequenceTableSchema not implemented for DynamoHandlerTyped");
+    return nullptr;
 }
CMakeLists.txt-215-219 (1)

215-219: Proto compilation at configure time may cause stale artifacts.

Using execute_process() runs protoc during CMake configure, not during build. If .proto files change, the generated files won't update until CMake is re-run.

Consider using add_custom_command() for build-time generation:

add_custom_command(
    OUTPUT ${DS_PROTO_DIR}/ds_request.pb.cc ${DS_PROTO_DIR}/ds_request.pb.h
    COMMAND protoc --cpp_out=${DS_PROTO_DIR} -I${DS_PROTO_DIR} ${DS_PROTO_DIR}/ds_request.proto
    DEPENDS ${DS_PROTO_DIR}/ds_request.proto
    COMMENT "Generating protobuf files for ds_request"
)
store_handler/dynamo_handler.h-584-588 (1)

584-588: Copy constructor doesn't copy ttl_set member.

The copy constructor copies kv_table_name_ and kv_index_names_ but omits ttl_set, which will default to false in copied objects regardless of the source value.

     DynamoCatalogInfo(const DynamoCatalogInfo &kv_info)
     {
         kv_table_name_ = kv_info.kv_table_name_;
         kv_index_names_ = kv_info.kv_index_names_;
+        ttl_set = kv_info.ttl_set;
     }
store_handler/dynamo_handler.h-388-406 (1)

388-406: Member old_table_schema_ is declared but never initialized in constructors.

UpsertTableData::old_table_schema_ (line 391) is declared but none of the three constructors initialize it, leaving it with indeterminate value.

Initialize the member in all constructors:

         UpsertTableData(
             DynamoHandler *dynamo_hd,
             const txservice::TableName *table_name,
             const txservice::TableSchema *old_schema,
             const txservice::TableSchema *schema,
             txservice::OperationType op_type,
             uint64_t write_time,
             std::shared_ptr<void> defer_unpin,
             txservice::CcHandlerResult<txservice::Void> *hd_res,
             const txservice::AlterTableInfo *alter_table_info = nullptr)
             : hd_res_(hd_res),
               dynamo_hd_(dynamo_hd),
               table_name_(table_name),
+              old_table_schema_(old_schema),
               table_schema_(schema),

And similarly for the copy constructors, copying from other.old_table_schema_.

Committable suggestion skipped: line range outside the PR's diff.

store_handler/data_store_service_scanner.h-173-177 (1)

173-177: Dangling reference risk: storing reference to external string.

table_name_ is stored as const std::string &, creating a lifetime dependency on the caller's string. If the source string is destroyed while the scanner is active, this becomes a dangling reference.

Consider storing by value to avoid lifetime issues:

 protected:
     // initialized parameters
     DataStoreServiceClient *client_;
-    const std::string &table_name_;
+    std::string table_name_;
     std::string start_key_;

And update the constructor initialization:

-          table_name_(table_name),
+          table_name_(table_name),  // Now copies the string
store_handler/bigtable_handler.cpp-1926-1927 (1)

1926-1927: Potential data race in async callback modifying futures vector.

The callback at line 1926-1927 modifies the futures vector (swap and pop_back) while holding flight_bulks_mux_. However, the futures vector is passed by reference from the caller (PutAll, PutArchivesAll) and new elements are added at line 1932 after releasing the lock. This creates a race between the callback modifying the vector and the main thread adding to it.

The idx = futures.size() capture at line 1892 becomes invalid after the swap/pop_back operations from other callbacks.

Consider using an index-stable container or tracking completed futures separately rather than modifying the vector in-place.

store_handler/bigtable_handler.cpp-1863-1867 (1)

1863-1867: Return value of PutArchivesAll is ignored, potentially masking failures.

The variable ret is assigned the result of PutArchivesAll but is never checked or used. If archiving fails, the function still returns true, which could lead to data inconsistency.

Apply this diff to fix the issue:

-        bool ret = PutArchivesAll(node_group,
+        if (!PutArchivesAll(node_group,
                                   table_name,
                                   table_schema->GetKVCatalogInfo(),
-                                  archive_vec);
+                                  archive_vec))
+        {
+            return false;
+        }
     }
store_handler/bigtable_handler.h-35-35 (1)

35-35: Avoid using namespace in header files.

using namespace MyEloq; in a header file pollutes the global namespace for all files that include this header, potentially causing name collisions and making code harder to reason about.

-using namespace MyEloq;
 namespace EloqDS
 {
+using MyEloq::EloqKey;
+using MyEloq::EloqRecord;
+// Add other specific using declarations as needed
store_handler/bigtable_handler.h-350-371 (1)

350-371: strtok is not thread-safe and const_cast violates const correctness.

The Deserialize methods use const_cast to remove const from payload_ and then use strtok, which modifies the string in place and uses static internal state, making it not thread-safe.

Consider using a safer alternative:

         void Deserialize(std::vector<long> &nums) const
         {
-            char *payload = const_cast<char *>(payload_.c_str());
-            char *token = strtok(payload, ",");
-            while (token)
+            std::string_view sv(payload_);
+            size_t pos = 0;
+            while (pos < sv.size())
             {
-                long num = strtol(token, nullptr, 10);
-                assert(errno != ERANGE);
-                nums.push_back(num);
-                token = strtok(nullptr, ",");
+                size_t next = sv.find(',', pos);
+                if (next == std::string_view::npos) next = sv.size();
+                long num = std::strtol(sv.data() + pos, nullptr, 10);
+                nums.push_back(num);
+                pos = next + 1;
             }
         }
🟡 Minor comments (35)
store_handler/eloq_data_store_service/RocksDB_Configuration_Flags.md-82-82 (1)

82-82: Fix typo: "dialy" should be "daily". Line 82 contains a spelling error in the flag name rocksdb_dialy_offpeak_time_utc.

-| `rocksdb_dialy_offpeak_time_utc` | No | Local time zone's 00:00-05:00 converted to UTC | Time range in format "HH:MM-HH:MM" | Daily off-peak time in UTC for scheduling compactions |
+| `rocksdb_daily_offpeak_time_utc` | No | Local time zone's 00:00-05:00 converted to UTC | Time range in format "HH:MM-HH:MM" | Daily off-peak time in UTC for scheduling compactions |
store_handler/eloq_data_store_service/data_store_fault_inject.h-106-106 (1)

106-106: assert silently passes in release builds if parameter format is invalid.

If paras contains a malformed key-value pair without =, assert(pos3 != std::string::npos) will crash in debug builds but continue with undefined behavior (using npos as index) in release builds.

Consider using explicit error handling consistent with line 94-96:

             std::string sbs = paras.substr(pos1, pos2 - pos1);
             size_t pos3 = sbs.find('=');
-            assert(pos3 != std::string::npos);
+            if (pos3 == std::string::npos)
+            {
+                LOG(ERROR) << "Error parameters for fault inject: missing '=' in name="
+                           << fault_name << ", segment=" << sbs;
+                abort();
+            }
store_handler/eloq_data_store_service/data_store_fault_inject.cpp-100-109 (1)

100-109: std::stoi can throw if para contains invalid input.

If the parameter string is not a valid integer (e.g., "abc" or overflow), std::stoi throws std::invalid_argument or std::out_of_range, causing an unhandled exception and process termination.

Consider wrapping with try-catch or validating input:

         case FaultAction::SLEEP:
         {
             int secs = 1;
             if (!para.empty())
             {
-                secs = std::stoi(para);
+                try
+                {
+                    secs = std::stoi(para);
+                }
+                catch (const std::exception &e)
+                {
+                    LOG(ERROR) << "Invalid sleep parameter: " << para
+                               << ", using default 1s";
+                }
             }

             sleep(secs);
             break;
         }
store_handler/.clang-format-9-9 (1)

9-9: Fix typo in comment.

"sepecified" should be "specified".

-# When an option is not sepecified, it defaults to Google's style (this can be
+# When an option is not specified, it defaults to Google's style (this can be
store_handler/.clang-format-6-6 (1)

6-6: Fix typo in comment.

"Advacned" should be "Advanced".

-# compiled by the Advacned Technology department.
+# compiled by the Advanced Technology department.
eloq_metrics/.clang-format-6-6 (1)

6-6: Fix typo in comment.

"Advacned" should be "Advanced".

-# compiled by the Advacned Technology department.
+# compiled by the Advanced Technology department.
eloq_metrics/.clang-format-9-9 (1)

9-9: Fix typo in comment.

"sepecified" should be "specified".

-# When an option is not sepecified, it defaults to Google's style (this can be
+# When an option is not specified, it defaults to Google's style (this can be
store_handler/eloq_data_store_service/.gitignore-58-60 (1)

58-60: Remove duplicate "bin" entry.

The pattern bin appears on both Line 58 and Line 60.

 bin
 bld
-bin
 cmake/
eloq_metrics/README.md-94-102 (1)

94-102: Complete the cmake command.

Lines 94-102 show an incomplete cmake command - the initial cmake -E make_directory build && cmake -S . -B build \ is missing.

 cd eloq-metrics
+cmake -E make_directory build && \
+cmake -S . -B build \
   -DCMAKE_BUILD_TYPE=Release \
   -DCMAKE_CXX_FLAGS="${CXXFLAGS} -fPIC" \
   -DCMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake \
   -DENABLE_BENCHMARK=ON \
   -DWITH_ABSEIL=ON
 
 cd build && cmake --build .  -j4
store_handler/eloq_data_store_service/build_eloq_store.cmake-28-28 (1)

28-28: Optional zstd is linked unconditionally.

find_library(ZSTD_LIBRARY zstd) doesn't use REQUIRED, making zstd optional. However, line 110 links against ${ZSTD_LIBRARY} unconditionally. If zstd isn't found, this will be empty or NOTFOUND.

Either make it required or link conditionally:

 find_library(ZSTD_LIBRARY zstd)
+if(ZSTD_LIBRARY)
+    set(ZSTD_LINK_LIB ${ZSTD_LIBRARY})
+else()
+    set(ZSTD_LINK_LIB "")
+    message(STATUS "zstd not found, compression support disabled")
+endif()

Then on line 110, use ${ZSTD_LINK_LIB} instead of ${ZSTD_LIBRARY}.

Committable suggestion skipped: line range outside the PR's diff.

log_service-1-1 (1)

1-1: Improve submodule initialization documentation; correct verification command syntax.

The .gitmodules configuration for log_service is correctly set up. However:

  1. The verification command in your comment is syntactically incorrect. git ls-remote does not accept commit hashes as arguments—it only works with refs (branches/tags). To verify a commit exists, use git ls-remote --heads --tags git@github.com:eloqdata/log_service.git and search the output, or clone the repository and check with git cat-file -t <hash>.

  2. The SSH URL (git@github.com:...) may create friction for developers who prefer HTTPS. Consider documenting both protocols or using HTTPS by default.

  3. Add explicit submodule initialization instructions to README.md or a setup guide (e.g., git clone --recursive or git submodule update --init --recursive). Currently, the README only mentions that transaction logs are stored in the Log Service but lacks setup steps.
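For point 1, the `git cat-file` check can be exercised locally; the scratch repo below is a stand-in for an actual clone of log_service:

```shell
# Sketch: `git ls-remote` only lists refs, so a raw commit hash must be
# verified against a local clone with `git cat-file -t`.
tmp=$(mktemp -d)
git init -q "$tmp/demo"
cd "$tmp/demo"
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "init"
hash=$(git rev-parse HEAD)
obj_type=$(git cat-file -t "$hash")   # prints "commit" for an existing object
echo "$obj_type"
```

On an unknown hash, `git cat-file -t` exits non-zero instead of printing a type.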

eloq_metrics/tests/metrics_collector_test.cc-111-137 (1)

111-137: Test logic issue: shared collector state across WHEN blocks.

The two WHEN blocks at lines 118 and 127 share the same collector instance. When tests run, the first WHEN opens the collector, and the second WHEN's behavior depends on whether both blocks execute independently (fresh collector) or sequentially (already opened). Catch2's BDD macros create separate execution paths, but this structure is confusing and may not test the intended scenario.

Consider restructuring to make each scenario independent:

-SCENARIO("Metrics collector call Open several times",
-         "[MCCallOpenSeveralTimes]")
+SCENARIO("Metrics collector Open returns true on first call",
+         "[MCOpenOnce]")
 {
-    INFO("Run unit test MCCallOpenSeveralTimes");
-    WHEN("init prometheus collector")
+    GIVEN("A fresh prometheus collector")
     {
         metrics::PrometheusCollector collector{"0.0.0.0", 18082};
         WHEN("The open method is called")
         {
             auto open_one = collector.Open();
             THEN("open one will return success")
             {
                 REQUIRE(open_one == true);
             }
         }
-
-        WHEN("The open method is call twice")
-        {
-            collector.Open();
-            auto open_two = collector.Open();
-            THEN("open two will return false")
-            {
-                REQUIRE(open_two == false);
-            }
-        }
     }
 }
+
+SCENARIO("Metrics collector Open returns false on second call",
+         "[MCOpenTwice]")
+{
+    GIVEN("A prometheus collector that has been opened")
+    {
+        metrics::PrometheusCollector collector{"0.0.0.0", 18083};
+        collector.Open();
+        WHEN("Open is called again")
+        {
+            auto open_two = collector.Open();
+            THEN("it returns false")
+            {
+                REQUIRE(open_two == false);
+            }
+        }
+    }
+}
eloq_metrics/tests/metrics_http_service_test.cc-83-90 (1)

83-90: Reorder assertions to avoid potential UB.

Line 86 constructs a std::string from mgr_result.not_ok_ before asserting it's non-null on line 88. If the test assumption is violated and not_ok_ is null, this causes undefined behavior.

         THEN("port conflict MetricsMgr init failure")
         {
             auto mgr_result = metrics::MetricsMgr::GetMetricMgrInstance();
-            std::string err_msg(mgr_result.not_ok_);
-            INFO("MetricsMgrResult ErrMsg: " << err_msg);
             REQUIRE(mgr_result.not_ok_ != nullptr);
+            std::string err_msg(mgr_result.not_ok_);
+            INFO("MetricsMgrResult ErrMsg: " << err_msg);
             REQUIRE(mgr_result.mgr_ == nullptr);
         }
eloq_metrics/tests/metrics_manager_test.cc-41-52 (1)

41-52: Add null-checks before dereferencing mgr_.

The test asserts mgr_res_1.not_ok_ == mgr_res_2.not_ok_ but does not verify that mgr_ pointers are non-null before dereferencing them on lines 49-50. If both results have not_ok_ != nullptr, dereferencing mgr_ would cause undefined behavior.

         THEN("Two MetricsMgr instance equal")
         {
             INFO("check MetricsMgrResult");
             REQUIRE(mgr_res_1.not_ok_ == mgr_res_2.not_ok_);
+            REQUIRE(mgr_res_1.mgr_ != nullptr);
+            REQUIRE(mgr_res_2.mgr_ != nullptr);
             metrics::MetricsMgr &mgr_1 = *mgr_res_1.mgr_;
             metrics::MetricsMgr &mgr_2 = *mgr_res_2.mgr_;
             auto mgr_instance_equals = mgr_1 == mgr_2;
             REQUIRE(mgr_instance_equals);
         }
eloq_metrics/tests/metrics_http_service_test.cc-46-64 (1)

46-64: Ensure curl cleanup occurs unconditionally.

curl_easy_cleanup(curl) is only called inside the THEN block. If the REQUIRE on line 57 or 61 fails, the curl handle leaks. Also, curl_global_cleanup() is never called.

Consider restructuring to ensure cleanup:

         curl_easy_setopt(curl, CURLOPT_WRITEDATA, &rsp_buffer);
         CURLcode curl_response = curl_easy_perform(curl);
+        long http_status = 0;
+        curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &http_status);
+        curl_easy_cleanup(curl);
+        curl_global_cleanup();
         THEN("access to eloq_metrics http service return 200.")
         {
             REQUIRE(curl_response == CURLE_OK);
-            long http_status;
-            curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &http_status);
             INFO("/eloq_metrics response \n" << rsp_buffer);
             REQUIRE(http_status == 200);
-            curl_easy_cleanup(curl);
         }
eloq_metrics/benchmark/tx_service_metrics_bench.cpp-31-31 (1)

31-31: Typo: "guage" should be "gauge".

-    metrics::Name name{"guage"};
+    metrics::Name name{"gauge"};
eloq_metrics/include/metrics_collector.h-24-26 (1)

24-26: Use angle brackets for standard library includes.

Line 24 uses "memory" which searches the local directory first. Standard library headers should use angle brackets.

-#include "memory"
+#include <memory>
 #include "metrics.h"
 #include "prometheus/client_metric.h"
eloq_metrics/benchmark/tx_service_metrics_bench.cpp-19-25 (1)

19-25: Unused variable time_start will cause compiler warning.

The variable time_start is assigned but never used. Either use it (e.g., with benchmark::DoNotOptimize) or remove the assignment.

 static void BM_get_time_point(benchmark::State &state)
 {
     while (state.KeepRunning())
     {
-        auto time_start = metrics::Clock::now();
+        benchmark::DoNotOptimize(metrics::Clock::now());
     }
 }
eloq_metrics/CMakeLists.txt-43-49 (1)

43-49: Undefined DEF_INSTALL_CMAKEDIR and loop variable mismatch.

  1. Line 43: DEF_INSTALL_CMAKEDIR is not defined, so INSTALL_CMAKEDIR will be empty.
  2. Line 45: The loop uses lowercase "include", so the INSTALL_${p}DIR pattern expands to INSTALL_includeDIR instead of the intended INSTALL_INCLUDEDIR.

-set(INSTALL_CMAKEDIR ${DEF_INSTALL_CMAKEDIR} CACHE PATH "Installation directory for CMake files")
+set(DEF_INSTALL_CMAKEDIR "${CMAKE_INSTALL_LIBDIR}/cmake/${PROJECT_NAME}" CACHE PATH "Default CMake install directory")
+set(INSTALL_CMAKEDIR ${DEF_INSTALL_CMAKEDIR} CACHE PATH "Installation directory for CMake files")

-foreach(p LIB BIN include CMAKE)
+foreach(p LIB BIN INCLUDE CMAKE)
core/src/metrics_init.cpp-14-14 (1)

14-14: Typo in comment: "TDOO" should be "TODO".

-#include "mysql_metrics.h"  //TDOO(liunyl): move this file to correct location
+#include "mysql_metrics.h"  //TODO(liunyl): move this file to correct location
eloq_metrics/benchmark/redis_metrics_bench.cpp-325-331 (1)

325-331: Unused variable may be optimized away.

The benchmark measures time to obtain a time point, but time_start is unused. The compiler may optimize away the call entirely, making the benchmark meaningless.

Apply this diff to prevent optimization:

 static void BM_get_time_point(benchmark::State &state)
 {
     while (state.KeepRunning())
     {
         auto time_start = metrics::Clock::now();
+        benchmark::DoNotOptimize(time_start);
     }
 }
eloq_metrics/benchmark/redis_metrics_bench.cpp-19-123 (1)

19-123: Duplicate entries in command_types vector.

"dump" and "restore" appear twice in the list (lines 28-29 and 122-123). This will cause duplicate metric registrations in the benchmark.

Remove the duplicate entries at lines 122-123:

                                                     "sunionstore",
                                                     "sort",
-                                                    "sort_ro",
-                                                    "dump",
-                                                    "restore"};
+                                                    "sort_ro"};
store_handler/eloq_data_store_service/CMakeLists.txt-201-223 (1)

201-223: Duplicate AWS Kinesis discovery blocks.

The find_path and find_library for AWS Kinesis are duplicated (lines 201-211 and 213-223). Remove the duplicate block.

         find_library(AWS_KINESIS_LIB aws-cpp-sdk-kinesis)
         if((NOT AWS_KINESIS_LIB))
           message(FATAL_ERROR "Fail to find aws-cpp-sdk-kinesis lib")
         endif()
         message(STATUS "aws-cpp-sdk-kinesis library: ${AWS_KINESIS_LIB}")

-        find_path(AWS_KINESIS_INCLUDE_PATH aws/kinesis/KinesisClient.h)
-        if((NOT AWS_KINESIS_INCLUDE_PATH))
-          message(FATAL_ERROR "Fail to find aws/kinesis include path")
-        endif()
-        message(STATUS "aws/kinesis include path: ${AWS_KINESIS_INCLUDE_PATH}")
-
-        find_library(AWS_KINESIS_LIB aws-cpp-sdk-kinesis)
-        if((NOT AWS_KINESIS_LIB))
-          message(FATAL_ERROR "Fail to find aws-cpp-sdk-kinesis lib")
-        endif()
-        message(STATUS "aws-cpp-sdk-kinesis library: ${AWS_KINESIS_LIB}")
-
         find_path(AWS_S3_INCLUDE_PATH aws/s3/S3Client.h)
store_handler/eloq_data_store_service/data_store.h-106-110 (1)

106-110: Typo in documentation comment.

The parameter name in the comment says "drop_talbe_req" but the actual parameter is drop_table_req.

     /**
      * @brief Drop kv table.
-     * @param drop_talbe_req The pointer of the request.
+     * @param drop_table_req The pointer of the request.
      */
     virtual void DropTable(DropTableRequest *drop_table_req) = 0;
core/src/data_substrate.cpp-579-584 (1)

579-584: Incorrect error message text.

The error message mentions enable_io_uring twice, but should reference raft_log_async_fsync as the condition being validated.

     if (raft_log_async_fsync && !enable_io_uring)
     {
-        LOG(ERROR) << "Invalid config: when set `enable_io_uring`, "
-                      "should also set `enable_io_uring`.";
+        LOG(ERROR) << "Invalid config: when set `raft_log_async_fsync`, "
+                      "should also set `enable_io_uring`.";
         return false;
     }
eloq_metrics/src/prometheus_collector.cc-241-245 (1)

241-245: Logging prints pointer address instead of enum value.

&metric_type logs the address of the enum variable, not its value.

 #ifdef WITH_GLOG
-        LOG(ERROR) << "ERR: For now not support metrics_type " << &metric_type;
+        LOG(ERROR) << "ERR: For now not support metrics_type " << static_cast<int>(metric_type);
 #endif
-        ;
+        break;

Also, the trailing ; after #endif is unnecessary—replace with break; for consistency.

store_handler/data_store_service_scanner.cpp-181-189 (1)

181-189: Incorrect error message in ProcessScanCloseResult.

The error message says "Failed to fetch next batch" but this is the close result callback.

     if (result.error_code() != remote::DataStoreError::NO_ERROR)
     {
-        LOG(ERROR) << "Failed to fetch next batch: " << result.error_msg();
+        LOG(ERROR) << "Failed to close scan: " << result.error_msg();
         sp_scanner->scanner_->SetError(
build_eloq_log_service.cmake-51-51 (1)

51-51: find_library does not support VERSION argument.

The VERSION parameter is not a valid argument for find_library(). This line may silently ignore the version constraint.

-    find_library(GLOG_LIB NAMES glog VERSION ">=0.6.0" REQUIRED)
+    find_library(GLOG_LIB NAMES glog REQUIRED)

Consider using find_package(glog 0.6.0 REQUIRED) if version checking is needed.

eloq_metrics/include/metrics_manager.h-128-128 (1)

128-128: Unused variable registry.

The prometheus::Registry is created but never passed to PrometheusCollector or used elsewhere.

Either remove the unused variable or pass it to the collector if intended:

-        auto registry = std::make_shared<prometheus::Registry>();
-
         metrics_collector_ =
-            std::make_shared<PrometheusCollector>(bind_addr, bind_port);
+            std::make_shared<PrometheusCollector>(bind_addr, bind_port, registry);

Committable suggestion skipped: line range outside the PR's diff.

build_eloq_log_service.cmake-134-157 (1)

134-157: Duplicate AWS Kinesis detection code.

The AWS Kinesis include path and library detection (lines 134-144) is duplicated exactly at lines 147-157. This appears to be a copy-paste error.

Remove the duplicate block:

       message(STATUS "aws-cpp-sdk-kinesis library: ${AWS_KINESIS_LIB}")
 
-
-      find_path(AWS_KINESIS_INCLUDE_PATH aws/kinesis/KinesisClient.h)
-      if((NOT AWS_KINESIS_INCLUDE_PATH))
-        message(FATAL_ERROR "Fail to find aws/kinesis include path")
-      endif()
-      message(STATUS "aws/kinesis include path: ${AWS_KINESIS_INCLUDE_PATH}")
-
-      find_library(AWS_KINESIS_LIB aws-cpp-sdk-kinesis)
-      if((NOT AWS_KINESIS_LIB))
-        message(FATAL_ERROR "Fail to find aws-cpp-sdk-kinesis lib")
-      endif()
-      message(STATUS "aws-cpp-sdk-kinesis library: ${AWS_KINESIS_LIB}")
-
       find_path(AWS_S3_INCLUDE_PATH aws/s3/S3Client.h)
eloq_metrics/include/metrics_manager.h-32-34 (1)

32-34: Inconsistent environment variable naming.

MOMO_METRICS_PUSH_ADDR uses a different prefix than the other ELOQ_* variables. This appears to be a typo or legacy naming.

-const std::string BIND_ADDR_ENV = "MOMO_METRICS_PUSH_ADDR";
+const std::string BIND_ADDR_ENV = "ELOQ_METRICS_BIND_ADDR";
store_handler/dynamo_handler_typed.cpp-113-128 (1)

113-128: Type mismatch assertion could fail at runtime.

Line 124 asserts that typed_rec->ObjectType() == obj_type, but if the stored object type doesn't match the provided record's type, this will crash in debug builds. Consider handling this mismatch gracefully with an error return or exception.

-        assert(typed_rec->ObjectType() == obj_type);
+        if (typed_rec->ObjectType() != obj_type)
+        {
+            LOG(ERROR) << "Object type mismatch: expected "
+                       << static_cast<int>(typed_rec->ObjectType())
+                       << " but found " << static_cast<int>(obj_type);
+            return;  // or throw
+        }

Committable suggestion skipped: line range outside the PR's diff.

store_handler/data_store_service_scanner.h-207-221 (1)

207-221: Reset() should clear scan_cache_ before reserving.

When reusing a pooled scanner, Reset() only reserves capacity but doesn't clear existing cache entries. If Clear() wasn't called before Reset(), stale data could remain.

     void Reset(DataStoreServiceScanner *scanner,
                int32_t partition_id,
                uint32_t data_shard_id,
                const std::string_view start_key)
     {
         scanner_ = scanner;
         partition_id_ = partition_id;
         data_shard_id_ = data_shard_id;
         last_key_ = start_key;
         session_id_ = "";
         last_batch_size_ = scanner_->GetBatchSize();
         first_batch_fetched_ = false;
         head_ = 0;
+        scan_cache_.clear();
         scan_cache_.reserve(scanner_->GetBatchSize());
     }
store_handler/data_store_service_scanner.h-117-136 (1)

117-136: Potential data race when reading error message.

While SetError correctly uses CAS to ensure only one thread writes error_msg_, GetErrorMsg() reads error_msg_ without memory ordering guarantees relative to error_code_. A caller might observe the new error code but stale/partial error message.

Consider adding a memory fence or using error_code_.load(std::memory_order_acquire) before reading error_msg_:

     const std::string &GetErrorMsg()
     {
+        // Ensure we see error_msg_ written by SetError
+        (void)error_code_.load(std::memory_order_acquire);
         return error_msg_;
     }

Or document that callers must check GetErrorCode() first (which already uses acquire semantics implicitly via the atomic load).

store_handler/bigtable_handler.cpp-2718-2718 (1)

2718-2718: substr result is discarded; trailing space is not removed.

The call index_names.substr(0, index_names.size() - 1) returns a new string but the result is not assigned back to index_names. The trailing space remains in the serialized output.

Apply this diff to fix the issue:

-        index_names.substr(0, index_names.size() - 1);
+        index_names.pop_back();  // Remove trailing space

Comment on lines +98 to +107
FOREACH(PROTO_FILE ${PROTO_FILES})
STRING(REGEX REPLACE "[^/]proto" "" PROTO_NAME ${PROTO_FILE})
LIST(APPEND PROTO_CC_FILES ${PROTO_SRC}/${PROTO_NAME}.pb.cc)
ADD_CUSTOM_COMMAND(
OUTPUT "${PROTO_SRC}/${PROTO_NAME}.pb.cc" "${PROTO_SRC}/${PROTO_NAME}.pb.h"
DEPENDS ${PROTO_SRC}/${PROTO_FILE}
COMMAND protoc ${PROTO_FILE} --proto_path=./ --cpp_out=./
WORKING_DIRECTORY ${PROTO_SRC}
)
ENDFOREACH(PROTO_FILE)

⚠️ Potential issue | 🔴 Critical

Incorrect regex pattern for extracting proto name.

The regex [^/]proto matches any character (except /) followed by proto, which is not the intended behavior. This will likely produce malformed filenames.

To strip the .proto extension correctly:

 FOREACH(PROTO_FILE ${PROTO_FILES})
-    STRING(REGEX REPLACE "[^/]proto" "" PROTO_NAME ${PROTO_FILE})
+    STRING(REGEX REPLACE "\\.proto$" "" PROTO_NAME ${PROTO_FILE})
     LIST(APPEND PROTO_CC_FILES ${PROTO_SRC}/${PROTO_NAME}.pb.cc)
🤖 Prompt for AI Agents
In build_tx_service.cmake around lines 98-107, the current STRING(REGEX REPLACE
"[^/]proto" "" PROTO_NAME ${PROTO_FILE}) uses an incorrect regex and produces
malformed names; replace it with a reliable extraction of the proto basename
without extension — for example use CMake's GET_FILENAME_COMPONENT(PROTO_NAME
${PROTO_FILE} NAME_WE) (or, if you prefer regex, use STRING(REGEX REPLACE
"\\.proto$" "" PROTO_NAME ${PROTO_FILE})), keeping the rest of the loop (APPEND,
ADD_CUSTOM_COMMAND outputs/depends) unchanged so the generated .pb.cc/.pb.h
filenames are correct.

${HOST_MANAGER_SOURCE_DIR}/src/raft_host_manager.cpp
${HOST_MANAGER_SOURCE_DIR}/src/ini.c
${HOST_MANAGER_SOURCE_DIR}/src/INIReader.cpp
${PROTO_SRC}/${PROTO_NAME}.pb.cc
⚠️ Potential issue | 🔴 Critical

Undefined variable ${PROTO_NAME} in host_manager sources.

PROTO_NAME is defined inside the FOREACH loop (lines 98-107) and is not accessible in this scope. This will result in an empty or undefined path.

Either use ${PROTO_CC_FILES} which contains all compiled proto sources, or explicitly list the required proto source files:

     SET(RaftHM_SOURCES
         ${HOST_MANAGER_SOURCE_DIR}/src/main.cpp
         ${HOST_MANAGER_SOURCE_DIR}/src/raft_host_manager_service.cpp
         ${HOST_MANAGER_SOURCE_DIR}/src/raft_host_manager.cpp
         ${HOST_MANAGER_SOURCE_DIR}/src/ini.c
         ${HOST_MANAGER_SOURCE_DIR}/src/INIReader.cpp
-        ${PROTO_SRC}/${PROTO_NAME}.pb.cc
+        ${PROTO_CC_FILES}
         ${LOG_PROTO_SRC}/log_agent.cpp
         ${LOG_PROTO_SRC}/${LOG_PROTO_NAME}.pb.cc
         )
🤖 Prompt for AI Agents
In build_tx_service.cmake around line 325, the line referencing
${PROTO_NAME}.pb.cc uses PROTO_NAME which was only defined inside the FOREACH at
lines 98-107 and is out of scope here, producing an empty/invalid path; replace
that reference by using the aggregate variable ${PROTO_CC_FILES} (which contains
all generated proto cc files) or explicitly add the specific proto source(s)
required here, or move the ${PROTO_NAME}-dependent logic inside the FOREACH so
the variable is in scope.

Comment on lines +202 to +206
if(NOT ROCKSDB_CLOUD_LIB)
message(FATAL_ERROR "Fail to find RocksDB Cloud lib path")
endif()
message(STATUS "ROCKSDB_CLOUD_LIB: ${ROCKSDB_CLOUD_LIB}")
set(ROCKSDB_LIBRARIES ${ROCKSDB_LIBRARIES} ${ROCKSDB_CLOUD_LIB})

⚠️ Potential issue | 🔴 Critical

ROCKSDB_CLOUD_LIB check may fail for GCS configuration.

ROCKSDB_CLOUD_LIB is only set for S3 configurations (line 190), but the check at line 202 is common to both S3 and GCS. GCS configurations will hit this FATAL_ERROR.

Add find_library for GCS:

         find_library(ROCKSDB_CLOUD_LIB NAMES rocksdb-cloud-aws)
     endif()
+    
+    if((WITH_DATA_STORE STREQUAL "ROCKSDB_CLOUD_GCS") OR (WITH_DATA_STORE STREQUAL "ELOQDSS_ROCKSDB_CLOUD_GCS"))
+        find_library(ROCKSDB_CLOUD_LIB NAMES rocksdb-cloud-gcs)
+    endif()

     # Common RocksDB Cloud setup
🤖 Prompt for AI Agents
In CMakeLists.txt around lines 202-206, the current unconditional check for
ROCKSDB_CLOUD_LIB will FATAL_ERROR for GCS builds because ROCKSDB_CLOUD_LIB is
only set for S3; update the logic to also attempt to locate the GCS cloud lib
(use find_library to set ROCKSDB_CLOUD_LIB_GCS or similar) and only require
ROCKSDB_CLOUD_LIB when S3 is enabled, or merge the found GCS/S3 library into
ROCKSDB_CLOUD_LIB before the FATAL_ERROR check so that the fatal branch is only
triggered when neither cloud provider library could be found.

Comment on lines +387 to +389
install(TARGETS logservice
ARCHIVE DESTINATION lib
LIBRARY DESTINATION lib)

⚠️ Potential issue | 🔴 Critical

Unconditional install of logservice will fail when WITH_LOG_SERVICE is OFF.

The logservice target is only created when WITH_LOG_SERVICE is ON (lines 251-257), but the install rule is unconditional.

-install(TARGETS logservice
-        ARCHIVE DESTINATION lib
-        LIBRARY DESTINATION lib)
+if(WITH_LOG_SERVICE)
+    install(TARGETS logservice
+            ARCHIVE DESTINATION lib
+            LIBRARY DESTINATION lib)
+endif()
🤖 Prompt for AI Agents
In CMakeLists.txt around lines 387-389, the install(TARGETS logservice ...)
stanza is unconditional but the logservice target is only created when
WITH_LOG_SERVICE is ON; guard the install so it only runs when the target exists
by wrapping the install() block in the same condition (if(WITH_LOG_SERVICE) ...
endif()) or by checking target existence (if(TARGET logservice) ... endif()) so
installation does not fail when WITH_LOG_SERVICE is OFF.

Comment on lines +91 to +95
// remote request metrics
metrics::enable_remote_request_metrics = config_reader.GetBoolean(
"metrics", "enable_busy_round_metrics", metrics::enable_tx_metrics);
LOG(INFO) << "enable_remote_request_metrics: "
<< (metrics::enable_remote_request_metrics ? "ON" : "OFF");

⚠️ Potential issue | 🔴 Critical

Wrong config key used for enable_remote_request_metrics.

The code reads "enable_busy_round_metrics" but assigns to enable_remote_request_metrics. This appears to be a copy-paste error that will cause the remote request metrics to be incorrectly configured based on busy round settings.

         // remote request metrics
         metrics::enable_remote_request_metrics = config_reader.GetBoolean(
-            "metrics", "enable_busy_round_metrics", metrics::enable_tx_metrics);
+            "metrics", "enable_remote_request_metrics", metrics::enable_tx_metrics);
🤖 Prompt for AI Agents
In core/src/metrics_init.cpp around lines 91 to 95, the code mistakenly reads
the "enable_busy_round_metrics" key when setting
metrics::enable_remote_request_metrics; change the config key to the correct
"enable_remote_request_metrics" when calling config_reader.GetBoolean and keep
the same default (metrics::enable_remote_request_metrics) so the remote request
metric flag is read from the correct config entry and logged accurately.

uint16_t port,
uint64_t shard_version)
{
assert(expired_ts_.load(std::memory_order_acquire) == 0);

⚠️ Potential issue | 🔴 Critical

Assertion in DssNode::Reset will always fail for newly constructed nodes.

expired_ts_ is initialized to 1U (line 811), but Reset() asserts expired_ts_.load() == 0 (line 782). This assertion will always fail for nodes that haven't been explicitly marked as expired.

-        std::atomic<uint64_t> expired_ts_{1U};
+        std::atomic<uint64_t> expired_ts_{0};

Or remove the assertion if nodes can be reused:

         void Reset(...)
         {
-            assert(expired_ts_.load(std::memory_order_acquire) == 0);
+            expired_ts_.store(0, std::memory_order_release);
             host_name_ = hostname;

Also applies to: 811-811

🤖 Prompt for AI Agents
In store_handler/data_store_service_client.h around lines 782 and 811, the
Reset() assertion expects expired_ts_ == 0 but expired_ts_ is initialized to 1U,
so the assert always fails for newly constructed nodes; fix by making the
initial value and the Reset() invariant consistent: either initialize
expired_ts_ to 0 (change the initializer at line ~811 to 0U) or adjust/remove
the assertion in Reset() to match the intended initial state (e.g.,
assert(expired_ts_.load(...) == 1U) or drop the assert if nodes may be reused).

Comment on lines +160 to +175
template <typename KeyT>
void DynamoHandlerTyped::BindDynamoReqForKeyPredecessor(
Aws::DynamoDB::Model::AttributeValue &sk, const KeyT &key)
{
assert(false);

std::string_view predecessor_key_str = key.KVSerialize();
auto data_ptr =
reinterpret_cast<const unsigned char *>(predecessor_key_str.data());
predecessor(const_cast<unsigned char *>(data_ptr),
predecessor_key_str.size());

sk.SetB(Aws::Utils::ByteBuffer(
reinterpret_cast<const unsigned char *>(predecessor_key_str.data()),
predecessor_key_str.size()));
}

⚠️ Potential issue | 🔴 Critical

Critical: Undefined behavior via const_cast on read-only data and dead code after assert(false).

This function has two serious issues:

  1. Dead code: Lines 166-174 execute after assert(false) which will always abort in debug builds. In release builds where asserts are disabled, the function proceeds to execute potentially dangerous code.

  2. Undefined behavior: key.KVSerialize() returns a std::string_view to potentially read-only memory. Using const_cast to modify this data (line 169) is undefined behavior.

If this function is intentionally unimplemented, consider:

 template <typename KeyT>
 void DynamoHandlerTyped::BindDynamoReqForKeyPredecessor(
     Aws::DynamoDB::Model::AttributeValue &sk, const KeyT &key)
 {
-    assert(false);
-
-    std::string_view predecessor_key_str = key.KVSerialize();
-    auto data_ptr =
-        reinterpret_cast<const unsigned char *>(predecessor_key_str.data());
-    predecessor(const_cast<unsigned char *>(data_ptr),
-                predecessor_key_str.size());
-
-    sk.SetB(Aws::Utils::ByteBuffer(
-        reinterpret_cast<const unsigned char *>(predecessor_key_str.data()),
-        predecessor_key_str.size()));
+    // TODO: Implement predecessor key binding
+    (void)sk;
+    (void)key;
+    throw std::logic_error("BindDynamoReqForKeyPredecessor not implemented");
 }

If implementation is needed, copy to a mutable buffer first:

std::string key_copy(key.KVSerialize());
predecessor(reinterpret_cast<unsigned char*>(key_copy.data()), key_copy.size());
sk.SetB(Aws::Utils::ByteBuffer(
    reinterpret_cast<const unsigned char*>(key_copy.data()), key_copy.size()));
🤖 Prompt for AI Agents
In store_handler/dynamo_handler_typed.h around lines 160-175, the function
contains an assert(false) followed by code that uses const_cast on data from a
std::string_view (undefined behavior) and therefore is both dead/unsafe; either
mark the function as unimplemented (e.g., remove the body and throw or return)
or properly implement it by making a mutable copy of the serialized key
(std::string key_copy(key.KVSerialize())), call predecessor using
key_copy.data() and key_copy.size(), and pass a const pointer to
Aws::Utils::ByteBuffer created from key_copy.data() and key_copy.size(),
avoiding any const_cast on read-only memory.

Comment on lines +404 to +409
include_directories(
${PROJECT_SOURCE_DIR}/
if(WITH_DATA_STORE STREQUAL "ELOQDSS_ELOQSTORE")
${PROJECT_SOURCE_DIR}/eloqstore/
endif ()
)

⚠️ Potential issue | 🔴 Critical

Invalid CMake syntax: if() inside include_directories().

CMake does not support conditional statements inside include_directories(). This will cause a parse error.

Apply this diff to fix the syntax:

-include_directories(
-        ${PROJECT_SOURCE_DIR}/
-if(WITH_DATA_STORE STREQUAL "ELOQDSS_ELOQSTORE")
-        ${PROJECT_SOURCE_DIR}/eloqstore/
-endif ()
-)
+include_directories(${PROJECT_SOURCE_DIR}/)
+if(WITH_DATA_STORE STREQUAL "ELOQDSS_ELOQSTORE")
+    include_directories(${PROJECT_SOURCE_DIR}/eloqstore/)
+endif()
🤖 Prompt for AI Agents
In store_handler/eloq_data_store_service/CMakeLists.txt around lines 404-409,
the include_directories() call contains an if() block which is invalid CMake
syntax; to fix it, close include_directories() after the unconditional path(s)
and move the conditional outside so you either call
include_directories(${PROJECT_SOURCE_DIR}/eloqstore/) only when WITH_DATA_STORE
STREQUAL "ELOQDSS_ELOQSTORE" or build a list variable (e.g., set(INCLUDE_DIRS
${PROJECT_SOURCE_DIR}/) and if(CONDITION) list(APPEND INCLUDE_DIRS
${PROJECT_SOURCE_DIR}/eloqstore/) then call
include_directories(${INCLUDE_DIRS})), ensuring no if() appears inside
include_directories().

Comment on lines +27 to +41
#include "data_store.h"

namespace EloqDS
{

class DataStoreFactory
{
public:
virtual ~DataStoreFactory() = default;

virtual std::unique_ptr<DataStore> CreateDataStore(
bool create_if_missing,
uint32_t shard_id,
DataStoreService *data_store_service,
bool start_db = true) = 0;

⚠️ Potential issue | 🔴 Critical

Missing forward declaration for DataStoreService.

DataStoreService* is used as a parameter in CreateDataStore() but is not forward-declared or included. This will cause a compilation error.

 #include "data_store.h"

 namespace EloqDS
 {

+class DataStoreService;
+
 class DataStoreFactory
 {
🤖 Prompt for AI Agents
In store_handler/eloq_data_store_service/data_store_factory.h around lines 27 to
41, the header uses DataStoreService* in the CreateDataStore signature but does
not forward-declare or include DataStoreService, which will cause a compilation
error; fix by adding a forward declaration “class DataStoreService;” above the
DataStoreFactory declaration (or include the DataStoreService header if full
type is required), and ensure no circular include is introduced.

Comment on lines +36 to +44
if (entry->start_strike_ >= 0 && entry->end_strike_ >= 0)
{
entry->count_strike_++;
if (entry->start_strike_ < entry->count_strike_ ||
entry->end_strike_ > entry->count_strike_)
{
return;
}
}

⚠️ Potential issue | 🔴 Critical

Strike-based gating condition is inverted — actions will never trigger.

The condition on lines 39-40 returns early (skips execution) when it should proceed. With the current logic:

  • If start_strike_=2, end_strike_=5, count_strike_=3: 2 < 3 is true → returns, action skipped
  • This means actions are skipped when they should be triggered

The intended behavior is to skip when count_strike_ is outside the [start_strike_, end_strike_] range.

Apply this diff to fix the condition:

     if (entry->start_strike_ >= 0 && entry->end_strike_ >= 0)
     {
         entry->count_strike_++;
-        if (entry->start_strike_ < entry->count_strike_ ||
-            entry->end_strike_ > entry->count_strike_)
+        if (entry->count_strike_ < entry->start_strike_ ||
+            entry->count_strike_ > entry->end_strike_)
         {
             return;
         }
     }
🤖 Prompt for AI Agents
In store_handler/eloq_data_store_service/data_store_fault_inject.cpp around
lines 36 to 44, the strike-range gate is inverted: after incrementing
count_strike_ the code currently returns when count is inside the
[start_strike_, end_strike_] range, preventing intended actions; change the
condition to return only when count_strike_ is outside the range (i.e., if
count_strike_ < start_strike_ || count_strike_ > end_strike_) so actions proceed
when count_strike_ falls within the inclusive bounds.

@githubzilla githubzilla merged commit ea5b4ad into main Nov 28, 2025
5 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Dec 3, 2025
This was referenced Dec 12, 2025