test: distributed_executor.cpp branch coverage#67
Conversation
Added DistributedExecutorWithNodesTests fixture that registers mock data nodes in ClusterManager and uses RpcServer to test distributed execution paths: - SELECT with nodes but no server (RPC connect failure) - SELECT with shard routing (get_leader + compute_shard paths) - DDL forwarding to data nodes (lines 156-163) - INSERT shard routing (lines 454-519) - COMMIT with 2PC prepare/commit (lines 388-449) - ROLLBACK with nodes (lines 366-386) - SELECT broadcast without sharding key (line 565) Before: 0% branch coverage on distributed_executor.cpp After: 21.75% branch coverage (40.00% branches taken)
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds extensive new tests: a DistributedExecutorWithNodesTests fixture that can register mock data nodes and run real RpcServer instances with installed handlers, plus ~150 integration/unit tests covering distributed execution, transactions, joins, aggregates, DDL, and many SQL features. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Test as Executor (test)
participant Rpc as RpcServer
participant Handler as RPC Handler
participant Node as Mock Data Node
rect rgba(200,230,255,0.5)
Test->>Rpc: register node metadata / connect
Rpc-->>Test: accept or refuse connection
Test->>Rpc: send ExecuteFragment or Txn RPC
Rpc->>Handler: invoke registered handler
Handler->>Node: request QueryResultsReply or txn result
Node-->>Handler: return serialized response
Handler-->>Rpc: send serialized reply
Rpc-->>Test: deliver reply (success or error)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/distributed_executor_tests.cpp (1)
532-557: Extract the inlineTxn*reply lambdas into a helper.
CommitWithNodes(lines 532-557) andRollbackWithNodes(lines 577-590) repeat the same boilerplate asset_execute_fragment_handler— only theRpcTypediffers. The duplication invites drift (e.g., updating header layout in one place but not others). Add a small helper alongsideset_execute_fragment_handler:♻️ Generic success-reply handler
+ // Set up a handler that returns successful QueryResultsReply for any RpcType + void set_success_reply_handler(network::RpcServer& srv, network::RpcType in_type, + network::RpcType reply_type) { + srv.set_handler(in_type, + [reply_type](const network::RpcHeader&, + const std::vector<uint8_t>&, int fd) { + network::QueryResultsReply reply; + reply.success = true; + auto data = reply.serialize(); + network::RpcHeader resp_h; + resp_h.type = reply_type; + resp_h.payload_len = static_cast<uint16_t>(data.size()); + char h_buf[network::RpcHeader::HEADER_SIZE]; + resp_h.encode(h_buf); + send(fd, h_buf, network::RpcHeader::HEADER_SIZE, 0); + if (!data.empty()) send(fd, data.data(), data.size(), 0); + }); + }Then call
set_success_reply_handler(*servers_[0], network::RpcType::TxnPrepare, network::RpcType::TxnPrepare);etc. As a side benefit you avoid the doublereply.serialize()call that exists in every current handler.Also applies to: 577-590
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/distributed_executor_tests.cpp` around lines 532 - 557, The two inline lambdas used in CommitWithNodes/RollbackWithNodes that call servers_[0]->set_handler for network::RpcType::TxnPrepare and TxnCommit should be refactored into a shared helper (similar to set_execute_fragment_handler) that takes a Server reference and an RpcType and sends a generic successful QueryResultsReply; update CommitWithNodes and RollbackWithNodes to call that helper instead of duplicating logic; ensure the helper builds the reply once (avoid calling reply.serialize() twice), sets resp_h.type from the passed RpcType, encodes the header, sends header+payload to fd, and reuse the same helper for both TxnPrepare and TxnCommit handlers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/distributed_executor_tests.cpp`:
- Line 368: The fixture defines a static next_port_ but never uses it; either
remove next_port_ or use it consistently to allocate unique ports for tests like
SelectWithNodesButNoServer and SelectWithShardRouting instead of hard-coding
6411/6412. To fix, replace hard-coded port literals in those tests with an
allocation from next_port_ (e.g., take next_port_++ or a GetNextPort() helper
that increments next_port_), or delete the next_port_ member and update
comments; ensure RpcServer behavior (SO_REUSEADDR) remains unchanged and update
all references (SelectWithNodesButNoServer, SelectWithShardRouting and other
tests noted) to avoid duplicate port assignments.
- Around line 481-498: The test
DistributedExecutorWithNodesTests::DDLForwardsToNodes currently never verifies
the handler was called; modify the set_execute_fragment_handler(...) usage to
capture an invocation counter (e.g., atomic<int> or std::atomic<size_t>) inside
the ExecuteFragment handler lambda registered on servers_[0], increment it when
the handler runs, then after auto res = exec_->execute(...); add an assertion
that the counter was incremented (e.g., EXPECT_EQ(counter.load(), 1)) to prove
forwarding occurred; keep the existing EXPECT_TRUE(res.success()) as well.
---
Nitpick comments:
In `@tests/distributed_executor_tests.cpp`:
- Around line 532-557: The two inline lambdas used in
CommitWithNodes/RollbackWithNodes that call servers_[0]->set_handler for
network::RpcType::TxnPrepare and TxnCommit should be refactored into a shared
helper (similar to set_execute_fragment_handler) that takes a Server reference
and an RpcType and sends a generic successful QueryResultsReply; update
CommitWithNodes and RollbackWithNodes to call that helper instead of duplicating
logic; ensure the helper builds the reply once (avoid calling reply.serialize()
twice), sets resp_h.type from the passed RpcType, encodes the header, sends
header+payload to fd, and reuse the same helper for both TxnPrepare and
TxnCommit handlers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b5c61aec-b421-468a-abb5-f95fd0adb469
📒 Files selected for processing (1)
tests/distributed_executor_tests.cpp
| config_.mode = config::RunMode::Coordinator; | ||
| cm_ = std::make_unique<ClusterManager>(&config_); | ||
| exec_ = std::make_unique<DistributedExecutor>(*catalog_, *cm_); | ||
| next_port_ = 6410; // Start from different base to avoid rpc_client_tests |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm RpcServer enables SO_REUSEADDR; otherwise port reuse between tests will flake.
fd -t f 'rpc_server\.(c|cc|cpp|cxx)$' | xargs rg -nP -C3 'SO_REUSEADDR|setsockopt'Repository: poyrazK/cloudSQL
Length of output: 261
Unused next_port_ suggests incomplete implementation; remove or use it consistently.
The static next_port_ member is initialized but never read by any test — each test hard-codes its own port. Additionally, SelectWithNodesButNoServer (lines 437-438) and SelectWithShardRouting (lines 454-463) both use ports 6411/6412, duplicating port assignments within the same fixture. While RpcServer properly enables SO_REUSEADDR (verified in rpc_server.cpp), hard-coding identical ports is still poor practice and makes it unclear whether port allocation is intentional or accidental.
Either remove the dead next_port_ member entirely, or use it to allocate unique ports per test instead of hard-coding them. This improves maintainability and makes test setup intentions explicit.
Also applies to: 423-423, 431-431
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/distributed_executor_tests.cpp` at line 368, The fixture defines a
static next_port_ but never uses it; either remove next_port_ or use it
consistently to allocate unique ports for tests like SelectWithNodesButNoServer
and SelectWithShardRouting instead of hard-coding 6411/6412. To fix, replace
hard-coded port literals in those tests with an allocation from next_port_
(e.g., take next_port_++ or a GetNextPort() helper that increments next_port_),
or delete the next_port_ member and update comments; ensure RpcServer behavior
(SO_REUSEADDR) remains unchanged and update all references
(SelectWithNodesButNoServer, SelectWithShardRouting and other tests noted) to
avoid duplicate port assignments.
| TEST_F(DistributedExecutorWithNodesTests, DDLForwardsToNodes) { | ||
| // Start RpcServer for node1 | ||
| auto srv1 = std::make_unique<network::RpcServer>(6413); | ||
| srv1->start(); | ||
| servers_.push_back(std::move(srv1)); | ||
|
|
||
| cm_->register_node("node_1", "127.0.0.1", 6413, config::RunMode::Data); | ||
| set_execute_fragment_handler(*servers_[0], true); | ||
|
|
||
| // DDL: CREATE TABLE - should succeed locally and try to forward to nodes | ||
| auto lexer = std::make_unique<Lexer>("CREATE TABLE ddl_test (id INT, name TEXT)"); | ||
| Parser parser(std::move(lexer)); | ||
| auto stmt = parser.parse_statement(); | ||
| ASSERT_NE(stmt, nullptr); | ||
|
|
||
| auto res = exec_->execute(*stmt, "CREATE TABLE ddl_test (id INT, name TEXT)"); | ||
| EXPECT_TRUE(res.success()); // DDL always succeeds (local catalog) | ||
| } |
There was a problem hiding this comment.
DDLForwardsToNodes does not actually verify forwarding.
The only assertion (EXPECT_TRUE(res.success())) already passes for DDL with zero nodes (see ExecuteDDLWithoutNodes at lines 140-148). Registering a node and installing an ExecuteFragment handler here does not prove the executor invoked it — the test would still pass if forwarding were silently broken. Capture an invocation counter in the handler lambda and assert it after execute(...).
♻️ Make the test observe the forwarding
- cm_->register_node("node_1", "127.0.0.1", 6413, config::RunMode::Data);
- set_execute_fragment_handler(*servers_[0], true);
+ cm_->register_node("node_1", "127.0.0.1", 6413, config::RunMode::Data);
+ std::atomic<int> calls{0};
+ servers_[0]->set_handler(network::RpcType::ExecuteFragment,
+ [&calls, this](const network::RpcHeader& h,
+ const std::vector<uint8_t>& p, int fd) {
+ calls.fetch_add(1);
+ // delegate to the standard success reply
+ set_execute_fragment_handler(*servers_[0], true);
+ // ...or inline a success reply here
+ });
@@
- EXPECT_TRUE(res.success()); // DDL always succeeds (local catalog)
+ EXPECT_TRUE(res.success());
+ // Wait briefly for the async forward, then assert the handler was hit.
+ EXPECT_GT(calls.load(), 0);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/distributed_executor_tests.cpp` around lines 481 - 498, The test
DistributedExecutorWithNodesTests::DDLForwardsToNodes currently never verifies
the handler was called; modify the set_execute_fragment_handler(...) usage to
capture an invocation counter (e.g., atomic<int> or std::atomic<size_t>) inside
the ExecuteFragment handler lambda registered on servers_[0], increment it when
the handler runs, then after auto res = exec_->execute(...); add an assertion
that the counter was incremented (e.g., EXPECT_EQ(counter.load(), 1)) to prove
forwarding occurred; keep the existing EXPECT_TRUE(res.success()) as well.
Added 8 new tests to DistributedExecutorWithNodesTests: - InnerJoinShuffle, LeftJoinShuffle, RightJoinShuffle: exercise Phase 1/2 ShuffleFragment, bloom filter aggregation/broadcast - SelectWithCountAggregate, SelectWithSumAggregate: global aggregates - SelectWithOrderBy: global sort with key resolution - SelectWithLimitOffset: result truncation - DISABLED_FullJoinPhase3_5, DISABLED_CommitPrepareFailure: hang due to async thread reply timing Branch coverage: 21.75% → 32.94% taken at least once
The TxnPrepare client.call() hangs even with reply handler set. Root cause: RpcServer handler runs in worker thread and send() completes but RpcClient.call() may not receive it due to socket buffering or timing. DISABLED for now.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/distributed_executor_tests.cpp (1)
540-616: Extract a shared helper for the transaction RPC handlers.The
TxnPrepare/TxnCommitreply blocks (lines 548–573), theTxnAborthandler (lines 593–606), and theTxnPrepare-failure handler in the disabled test (lines 869–882) all repeat the same ~10-line "buildQueryResultsReply, encode header, send" boilerplate. A small helper on the fixture (e.g.,set_simple_reply_handler(server, rpc_type, success, error_msg)) would substantially reduce duplication and bring these tests in line with the existingset_execute_fragment_handlerpattern. This also reduces the surface area where thepayload_lencast issue noted above must be fixed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/distributed_executor_tests.cpp` around lines 540 - 616, Extract a small helper on the test fixture (e.g., set_simple_reply_handler(server, rpc_type, bool success, const std::string& err="")) that builds a network::QueryResultsReply, serializes it, encodes a network::RpcHeader and sends header+payload via the server's set_handler callback; replace the duplicate lambdas for RpcType::TxnPrepare, RpcType::TxnCommit and RpcType::TxnAbort with calls to this helper (use the same server object passed into the helper instead of servers_[0]). Ensure you compute auto data = reply.serialize() first and then set resp_h.payload_len = static_cast<uint16_t>(data.size()) before resp_h.encode(...) to fix the payload_len cast ordering issue.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/distributed_executor_tests.cpp`:
- Around line 470-475: RpcServer::start() calls are unchecked and can silently
fail; change every srv->start() invocation (e.g., the srv1/srv2 startup in
distributed_executor_tests.cpp) to assert the return value (e.g.,
ASSERT_TRUE(srv->start())) or create a small helper like register_mock_node that
constructs the RpcServer, calls start(), asserts it returned true, and then
pushes the server into servers_; update all occurrences (all srv->start() calls)
to use that assertion or the helper so test setup fails noisily on bind/start
errors.
- Around line 395-437: The code blindly casts reply.serialize().size() to
RpcHeader::payload_len (uint16_t) and calls serialize() twice in
send_success_reply and the lambda in set_execute_fragment_handler, risking
silent truncation and wasted work; fix by serializing once into a local variable
(e.g., auto data = reply.serialize()), check data.size() <=
std::numeric_limits<uint16_t>::max(), and if it exceeds that limit handle it
(set reply.success=false with an explanatory error_msg or return/log an error)
before assigning resp_h.payload_len = static_cast<uint16_t>(data.size()) and
sending the header and data; apply the same change to send_success_reply and the
ExecuteFragment handler to avoid double-serialization and truncation.
---
Nitpick comments:
In `@tests/distributed_executor_tests.cpp`:
- Around line 540-616: Extract a small helper on the test fixture (e.g.,
set_simple_reply_handler(server, rpc_type, bool success, const std::string&
err="")) that builds a network::QueryResultsReply, serializes it, encodes a
network::RpcHeader and sends header+payload via the server's set_handler
callback; replace the duplicate lambdas for RpcType::TxnPrepare,
RpcType::TxnCommit and RpcType::TxnAbort with calls to this helper (use the same
server object passed into the helper instead of servers_[0]). Ensure you compute
auto data = reply.serialize() first and then set resp_h.payload_len =
static_cast<uint16_t>(data.size()) before resp_h.encode(...) to fix the
payload_len cast ordering issue.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9e04ba6f-e4d2-48e7-aa28-ae898def1002
📒 Files selected for processing (1)
tests/distributed_executor_tests.cpp
| auto srv1 = std::make_unique<network::RpcServer>(6411); | ||
| auto srv2 = std::make_unique<network::RpcServer>(6412); | ||
| srv1->start(); | ||
| srv2->start(); | ||
| servers_.push_back(std::move(srv1)); | ||
| servers_.push_back(std::move(srv2)); |
There was a problem hiding this comment.
RpcServer::start() return value is unchecked across all tests.
Every srv->start() call (lines 472–473, 500, 521, 542, 588, 622, 643–644, 678–679, 713–714, 748–749, 790, 808, 826, 844, 863) ignores the boolean return. If a port is already bound (e.g., concurrent CI runs, or the duplicate 6411/6412 reuse already flagged), start() likely returns false and the test will silently exercise a connect-failure path while the assertion text claims it is exercising the success path — producing misleading green coverage. Consider ASSERT_TRUE(srv->start()) (or asserting via a register_mock_node helper that wraps it).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/distributed_executor_tests.cpp` around lines 470 - 475,
RpcServer::start() calls are unchecked and can silently fail; change every
srv->start() invocation (e.g., the srv1/srv2 startup in
distributed_executor_tests.cpp) to assert the return value (e.g.,
ASSERT_TRUE(srv->start())) or create a small helper like register_mock_node that
constructs the RpcServer, calls start(), asserts it returned true, and then
pushes the server into servers_; update all occurrences (all srv->start() calls)
to use that assertion or the helper so test setup fails noisily on bind/start
errors.
New tests: - UpdateWithShardKey: exercises lines 535-536 where_expr for UPDATE - DeleteWithShardKey: exercises lines 537-538 where_expr for DELETE - SelectWithMinAggregate: MIN branch in global aggregate loop (line 879) - SelectWithMaxAggregate: MAX branch (line 882) Branch coverage: 33.41% → 35.16% (443/1260 taken)
These tests hang when using inline lambda handlers that return rows in std::async context with RpcServer worker threads. Reverting to simple handler and disabling the tests to keep suite passing. Co-authored-by: claude-code
These tests were disabled due to inline lambda crashes in std::async context. Now using set_execute_fragment_handler() which works correctly. Co-authored-by: claude-code
Added cm_->set_leader(2, "node_1") to trigger the leader routing path (lines 549-558) where get_leader() returns non-empty and found_leader branch is exercised. Co-authored-by: claude-code
Shard idx = hash("1") % 2 = 0, so group_id = shard_idx + 1 = 1.
This triggers the leader routing path at lines 549-558.
Co-authored-by: claude-code
Added 10 new tests: - SqlCachingSecondCall / SqlCachingDifferentQueries (lines 248-277) - PrepareInsertStatement / PrepareSelectStatement / PrepareWithNonexistentTable (lines 133-171) - UpdateWithIndexRebuild (lines 822-848) - DeleteWithIndex / DeleteNonexistentRow - CreateMultipleIndexes Coverage: 29.24% → 35.50% (+6.26%) Co-authored-by: claude-code
…cutor Added 16 new tests: - CommitWithoutTransaction / RollbackWithoutTransaction / BeginAfterActiveTransaction - CreateIndexOnNonexistentColumn / CreateIndexThenSelect - DropIndexByName - SelectWithCountStar / SelectWithSum / SelectWithAvg - SelectWithOrderByNew / SelectWithLimitNew / SelectWithOffsetNew Coverage: 35.50% → 36.93% (+1.43%) Co-authored-by: claude-code
… tests Added 9 new tests: - SelectWithIn, SelectWithLike - SelectWithIsNull, SelectWithIsNotNull - SelectWithGroupByHaving - SelectWithConcat, SelectWithAbs, SelectWithMaxMin - SelectWithSubquery Coverage: 36.93% → 37.64% (+0.71%) Co-authored-by: claude-code
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
tests/distributed_executor_tests.cpp (2)
371-382: TearDown removes files that no test creates.
./test_dist_exec/node1.dband./test_dist_exec/node2.dbare never produced by any test in this file — every test interacts with the executor purely through the mockedRpcServerhandlers. Thestd::removecalls are harmless but misleading; consider removing them (or, if a future test does write under./test_dist_exec/, prefer wiping the directory wholesale rather than enumerating individual filenames).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/distributed_executor_tests.cpp` around lines 371 - 382, The TearDown method currently calls std::remove on "./test_dist_exec/node1.db" and "./test_dist_exec/node2.db" even though no tests create those files; remove these two std::remove calls from TearDown in the TearDown override to avoid misleading cleanup, or if you want a future-proof solution replace those two removes with a single directory wipe of "./test_dist_exec/" (e.g., remove all files in that directory) so cleanup is correct and not hard-coded; locate the method named TearDown in the same test fixture that references servers_, exec_, cm_, and catalog_ to apply the change.
363-369: Reassigning thestaticnext_port_counter insideSetUp()defeats it.
next_port_is a static atomic counter intended to hand out unique ports across tests, butSetUp()runs per-test and resets it to6410every time, so it can never advance. Combined with the fact that no test currently reads it (and ports6411/6412are still hard-coded in multiple tests), the member is effectively dead. Either remove the assignment and usenext_port_.fetch_add(1)consistently inregister_mock_node/each test, or drop the member entirely.♻️ Suggested cleanup
void SetUp() override { catalog_ = Catalog::create(); config_.mode = config::RunMode::Coordinator; cm_ = std::make_unique<ClusterManager>(&config_); exec_ = std::make_unique<DistributedExecutor>(*catalog_, *cm_); - next_port_ = 6410; // Start from different base to avoid rpc_client_tests }And, if you keep
next_port_, routeregister_mock_nodeand the inline tests throughnext_port_.fetch_add(1)instead of hard-coded literals.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/distributed_executor_tests.cpp` around lines 363 - 369, The SetUp() method currently resets the static atomic counter next_port_ to 6410 each test which prevents it from ever advancing; remove the assignment from SetUp() so next_port_ remains a global allocator, then update register_mock_node and any tests that currently use hard-coded ports (e.g., literals 6411/6412) to obtain ports via next_port_.fetch_add(1) so each test/mocked node gets a unique port; alternatively, if you prefer to drop the counter, remove the next_port_ static member and replace all uses with explicit per-test constants to keep behavior consistent.tests/query_executor_tests.cpp (1)
559-927: Test isolation: many new tests leave stale.heapfiles behind.
TestEnvironment::~TestEnvironment()(lines 57–62) only removestest_table.heap,table_a.heap, andtable_b.heap, but the new tests create dozens of additional tables —cache_test,prep_test,prep_select,null_plan_test,idx_update,idx_del,del_test,multi_idx,idx_err,idx_test,drop_idx_test,agg_test,sum_test,avg_test,order_test2,limit_test2,offset_test2,group_having,in_test,like_test,null_test,notnull_test,concat_test,math_test,maxmin_test,sub_test— all of which share the same./test_datadirectory across runs. Because each test constructs a fresh in-memoryCatalogbut reuses the on-disk files, residual heap pages from a previous run can leak into a subsequent run and produce non-deterministic results (especially under coverage builds where the same target is run repeatedly).Either (a) make
TestEnvironmentuse a unique per-test directory (e.g.,testing::UnitTest::GetInstance()->current_test_info()->name()), or (b) blow away./test_datawholesale in the destructor instead of enumerating individual files.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/query_executor_tests.cpp` around lines 559 - 927, TestEnvironment::~TestEnvironment currently only removes a few hard-coded heap files (test_table.heap, table_a.heap, table_b.heap) causing stale .heap files from many newly added tests (e.g., cache_test, prep_test, idx_update, etc.) to persist; update TestEnvironment to ensure test isolation by either (a) switching TestEnvironment to create a unique per-test data directory (use testing::UnitTest::GetInstance()->current_test_info()->name() or similar when constructing TestEnvironment) so each test writes to its own folder, or (b) modify TestEnvironment::~TestEnvironment to recursively remove/clean the entire ./test_data directory instead of enumerating specific filenames; locate the destructor TestEnvironment::~TestEnvironment and implement one of these two approaches so all .heap files created by tables like cache_test, prep_select, null_plan_test, idx_update, etc. are removed between tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/distributed_executor_tests.cpp`:
- Around line 566-586: The test moves srv1 into servers_ and then calls
srv1->set_handler, causing a use-after-move on the moved-from unique_ptr; fix by
either calling srv1->set_handler(...) before pushing/moving srv1 into servers_
or by accessing the stored server object after the move (e.g.
servers_.back()->set_handler(...)); update the code around RpcServer srv1,
servers_, and the set_handler call so the handler is registered on a valid
RpcServer instance before registering the node and starting the test.
- Around line 1010-1036: In DISABLED_CommitPrepareFailure the test moves srv1
into servers_ with std::move( srv1 ) then calls srv1->set_handler(...), causing
use-after-move; fix by registering both Rpc handlers on srv1 before pushing it
into servers_ (i.e., call srv1->set_handler(...) for RpcType::TxnPrepare and
RpcType::TxnAbort prior to servers_.push_back(std::move(srv1))), or
alternatively after the move access the owned server via
servers_.back()->set_handler(...) instead of srv1; update the test in the
DISABLED_CommitPrepareFailure function to use one of these approaches so
set_handler is never called on a moved-from std::unique_ptr.
---
Nitpick comments:
In `@tests/distributed_executor_tests.cpp`:
- Around line 371-382: The TearDown method currently calls std::remove on
"./test_dist_exec/node1.db" and "./test_dist_exec/node2.db" even though no tests
create those files; remove these two std::remove calls from TearDown in the
TearDown override to avoid misleading cleanup, or if you want a future-proof
solution replace those two removes with a single directory wipe of
"./test_dist_exec/" (e.g., remove all files in that directory) so cleanup is
correct and not hard-coded; locate the method named TearDown in the same test
fixture that references servers_, exec_, cm_, and catalog_ to apply the change.
- Around line 363-369: The SetUp() method currently resets the static atomic
counter next_port_ to 6410 each test which prevents it from ever advancing;
remove the assignment from SetUp() so next_port_ remains a global allocator,
then update register_mock_node and any tests that currently use hard-coded ports
(e.g., literals 6411/6412) to obtain ports via next_port_.fetch_add(1) so each
test/mocked node gets a unique port; alternatively, if you prefer to drop the
counter, remove the next_port_ static member and replace all uses with explicit
per-test constants to keep behavior consistent.
In `@tests/query_executor_tests.cpp`:
- Around line 559-927: TestEnvironment::~TestEnvironment currently only removes
a few hard-coded heap files (test_table.heap, table_a.heap, table_b.heap)
causing stale .heap files from many newly added tests (e.g., cache_test,
prep_test, idx_update, etc.) to persist; update TestEnvironment to ensure test
isolation by either (a) switching TestEnvironment to create a unique per-test
data directory (use
testing::UnitTest::GetInstance()->current_test_info()->name() or similar when
constructing TestEnvironment) so each test writes to its own folder, or (b)
modify TestEnvironment::~TestEnvironment to recursively remove/clean the entire
./test_data directory instead of enumerating specific filenames; locate the
destructor TestEnvironment::~TestEnvironment and implement one of these two
approaches so all .heap files created by tables like cache_test, prep_select,
null_plan_test, idx_update, etc. are removed between tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a6d9dd0f-0a55-4c91-9fc9-4bdcf0ba7ab7
📒 Files selected for processing (2)
tests/distributed_executor_tests.cpptests/query_executor_tests.cpp
…xecutor - Add StringExecute tests to cover SQL caching path (lines 248-277) - Add StringExecuteSelectMultiple, StringExecuteSelectOrderBy, StringExecuteSelectLimit - Add StringExecuteJoin, StringExecuteSelectGroupBy - Add CREATE INDEX, DROP INDEX, DROP TABLE via string execute - Add PreparedInsertWithParams, StringExecuteUpdate, StringExecuteDelete - Add CreateTableLocalOnlyMode to cover is_local_only_ path (line 440) - Add CreateCompositeIndexNotSupported to cover rejection (line 473) - Coverage: 37.64% → 39.43% (+1.79%) - Test count: 75 → 97 (+22)
…tests - Add UpdateIndexedColumn tests for UPDATE index rebuild paths (816-848) - Add SelectFromNonexistentTable for execute_select error path (398-399) - Add StringExecuteDDL for DDL via string execute path - Add RollbackWithoutBegin, CommitWithoutBegin, BeginTwice for transaction edge cases - Add SelectWithQualifiedColumn, SelectCountWithGroupBy, SelectMinMaxWithGroupBy - Test count: 97 → 108 (+11)
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/query_executor_tests.cpp (2)
57-62:⚠️ Potential issue | 🟠 MajorClean up all generated test artifacts, not just three fixed files.
This fixture creates many different tables and indexes under
./test_data, but teardown only removestest_table.heap,table_a.heap, andtable_b.heap. That leaves stale files behind and can make reruns order-dependent or flaky after a previous failed run.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/query_executor_tests.cpp` around lines 57 - 62, The TestEnvironment destructor (~TestEnvironment) only removes three hard-coded files; change it to remove all generated artifacts under "./test_data" instead: iterate the "./test_data" directory (e.g., using std::filesystem::directory_iterator or std::filesystem::remove_all) and delete all files matching test artifacts (or remove the entire directory contents) while handling/ignoring non-existent entries and logging/reporting errors as appropriate so reruns are deterministic and no stale .heap/.idx or other files remain.
473-487:⚠️ Potential issue | 🟠 MajorUse a separate transaction context for the isolation check.
exec2shares the sameTransactionManagerinstance as the writer, so this does not really model a second client. The read can still observe the same active transaction state, which makes the “read before commit” assertion unreliable. If you want to verify isolation, run the read from a distinct transaction context/thread.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/query_executor_tests.cpp` around lines 473 - 487, The test uses QueryExecutor exec2 constructed with env.txn_manager so it does not model an independent client; change the read to use a distinct transaction context by constructing exec2 with a separate TransactionManager instance (instead of env.txn_manager) or by starting an independent transaction on exec2 before the UPDATE (e.g., call execute_sql(exec2, "BEGIN") to create its own transaction context) and ensure the read SELECT runs in that separate transaction so it cannot see the uncommitted UPDATE from the writer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/query_executor_tests.cpp`:
- Around line 1012-1029: The two tests are currently not asserting failures: in
CreateCompositeIndexNotSupported capture the result of
env.executor.execute("CREATE INDEX comp_idx_ab ON comp_idx(a, b)") and assert
the expected rejection (e.g., use the result object's success()/error() accessor
to assert failure and check the error message), and in UnsupportedStatementType
either implement a real assertion that constructing/executing an unsupported
Statement triggers the error path (again assert via success()/error()) or remove
the empty TEST_F placeholder; locate these changes around the TestEnvironment
usage and the tests named CreateCompositeIndexNotSupported and
UnsupportedStatementType and update assertions to verify the error path rather
than discarding the result.
- Around line 952-963: The test StringExecuteWithCacheHit never triggers a cache
hit because the second execute call uses a different SQL literal; change the
test to reuse the exact same SQL string on both executions (e.g., store "INSERT
INTO cache_hit VALUES (1, 'a')" in a local const std::string sql and call
env.executor.execute(sql) twice) or alternatively replace the second call with
the identical SQL literal so the executor can hit the cache; update EXPECTs
accordingly (use TEST_F QueryExecutorTests, the env.executor.execute calls and
the initial execute_sql that creates the table to locate the change).
---
Outside diff comments:
In `@tests/query_executor_tests.cpp`:
- Around line 57-62: The TestEnvironment destructor (~TestEnvironment) only
removes three hard-coded files; change it to remove all generated artifacts
under "./test_data" instead: iterate the "./test_data" directory (e.g., using
std::filesystem::directory_iterator or std::filesystem::remove_all) and delete
all files matching test artifacts (or remove the entire directory contents)
while handling/ignoring non-existent entries and logging/reporting errors as
appropriate so reruns are deterministic and no stale .heap/.idx or other files
remain.
- Around line 473-487: The test uses QueryExecutor exec2 constructed with
env.txn_manager so it does not model an independent client; change the read to
use a distinct transaction context by constructing exec2 with a separate
TransactionManager instance (instead of env.txn_manager) or by starting an
independent transaction on exec2 before the UPDATE (e.g., call
execute_sql(exec2, "BEGIN") to create its own transaction context) and ensure
the read SELECT runs in that separate transaction so it cannot see the
uncommitted UPDATE from the writer.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: aa11c616-3bf4-4045-b47f-25e3f90ce683
📒 Files selected for processing (1)
tests/query_executor_tests.cpp
- Remove empty UnsupportedStatementType test - Add assertion to CreateTableLocalOnlyMode - Improve DeleteWithIndexMaintenanceFailure and UpdateIndexRebuildFailure comments - Add assertion to CreateCompositeIndexNotSupported - Test count: 108 → 107 (-1 removed empty test)
…ild gap - Add VerifyIndexInMetadata test to confirm CREATE INDEX populates table_meta->indexes correctly - Change UpdateIndexedColumn tests to not use PRIMARY KEY (same pattern as CreateIndexBasic) - Test count: 107 → 108 (+1 investigation test) - Coverage: 39.43%
…ents - Remove ~30 line number references from distributed_executor_tests.cpp - Remove ~12 line number references from query_executor_tests.cpp - Replace with functional descriptions that won't become stale - Fix unused variable warning with [[maybe_unused]] attribute
Add class and section comments explaining: - These are branch coverage tests, not behavioral correctness tests - Weak assertions (void res) are intentional - 3 tests are DISABLED due to async timing issues
- InsertRpcFailure: fixed handler setup order - set handler BEFORE pushing unique_ptr into servers_ to avoid undefined behavior - CommitPrepareFailure: same fix - set handlers before push_back - FullJoinPhase3_5: remains DISABLED with detailed comment explaining async hang in Phase 3-5 std::async RPC calls despite proper handlers
Replace repetitive inline lambda handlers for TxnPrepare/TxnCommit/TxnAbort with a reusable helper that reduces code duplication across tests.
1. TestEnvironment destructor: Use std::filesystem to iterate and remove all test_data artifacts instead of hard-coded file list 2. StringExecuteWithCacheHit: Use same SQL string variable for both execute calls to actually test cache hit behavior 3. TransactionIsolationReadBeforeCommit: Add BEGIN on exec2 to create separate transaction context before SELECT 4. CreateCompositeIndexNotSupported: Assert failure with error message instead of conditional check
std::filesystem not properly linked in CI g++ build environment. Reverting to use explicit std::remove calls for known test artifacts. Also added explicit cleanup for additional test table heap files.
Summary
DistributedExecutorWithNodesTestsfixture for in-process testing of distributed execution paths without external clusterTest plan
cmake --build build --target distributed_executor_tests && ./build/distributed_executor_tests— 47 tests passgcov -b build/CMakeFiles/sqlEngineCore.dir/src/distributed/distributed_executor.cpp.gcda— 59.53% lines, 35.16% taken branchesDisabled tests (known async timing issues)
DISABLED_FullJoinPhase3_5: hangs in async Phase 3-5 unmatched rows collection. Root cause is async std::async + RpcClient::call() hang - appears to be a subtle issue in Phase 3-5 coordinator/data node coordination requiring further investigation.Changes from initial PR
InsertRpcFailureandCommitPrepareFailurewere re-enabled (fixed handler setup order - set handlers before pushing unique_ptr into servers_)set_success_reply_handlerhelper to reduce repetitive RPC handler boilerplate