Skip to content

Comments

Range read block range lock if RW conflict#180

Merged
githubzilla merged 9 commits intoeloqdata:eloq-10.6.10from
githubzilla:range_read_block_on_rw_conflict
Dec 12, 2025
Merged

Range read block range lock if RW conflict#180
githubzilla merged 9 commits intoeloqdata:eloq-10.6.10from
githubzilla:range_read_block_on_rw_conflict

Conversation

@githubzilla
Copy link
Collaborator

@githubzilla githubzilla commented Dec 9, 2025

Summary by CodeRabbit

  • Chores

    • Updated a tracked submodule to a newer commit.
  • Tests

    • Added a MySQL integration test verifying reads block while a concurrent write lock and range split are in progress, then resume.
    • Added a MySQL integration test simulating a range-split deadlock, verifying retry and successful completion with no lingering blocked queries.
    • Added test configuration entries setting a long checkpointer interval for these tests.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 9, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Updated the data_substrate submodule pointer and added three MySQL ELoQ test artifacts exercising range-split behavior: a read-block-on-write-lock test and a range-split deadlock-abort test with options and expected results.

Changes

Cohort / File(s) Change Summary
Submodule update
data_substrate
Advanced submodule pointer from 0ee68ce2451e4a0f6a46b37f1d289038279ef4be to 7c0a42621ceaae1283ae0a8a531796b49c547a3a.
Range-read block test
storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test, storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.opt
Added test that seeds data to force a range split, pauses the split while a write lock is held, opens concurrent read connections to assert they block and later complete, and drops t1; added --checkpointer_interval=86400.
Range-split deadlock abort test
storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test, storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.opt, storage/eloq/mysql-test/mono_basic/r/range_split_deadlock_abort.result
Added test that injects a DEAD_LOCK_ABORT at range-split phase 8, coordinates lock downgrade and retry, verifies completion and absence of lingering blocked queries, includes --checkpointer_interval=86400, and adds expected results.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review focus:
    • storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test — timing, processlist-based blocking checks, debug-point synchronization, and cleanup.
    • storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test and .../r/...result — fault injection sequencing and expected-result assertions.
    • data_substrate submodule bump — confirm compatibility with test debug hooks or API changes.

Possibly related PRs

Suggested reviewers

  • liunyl

Poem

"🐇 I nudged a pointer, paused a split with care,
Three readers waited patient on a hare's stare.
A cheeky deadlock hopped, then bowed and withdrew,
Rows danced back in order — tests finished true.
Carrots for all — burrow, bounce, and chew!"

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Range read block range lock if RW conflict' directly describes the main functionality being tested and implemented across the PR changes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8cd7f5d and b502805.

📒 Files selected for processing (6)
  • data_substrate (1 hunks)
  • storage/eloq/mysql-test/mono_basic/r/range_split_deadlock_abort.result (1 hunks)
  • storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.opt (1 hunks)
  • storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test (1 hunks)
  • storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.opt (1 hunks)
  • storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test (1 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test (2)

104-106: Remove commented-out debug command.

Line 105 contains a commented-out debug command that appears to be leftover from development. Consider removing it to keep the test clean.

 SET SESSION debug_dbug="-d,eloq;term_SplitFlushOp_CommitAcquireAllWriteOp_Continue;node_id=-1";
-# SET SESSION debug_dbug="+d,eloq;at_once;node_id=-1;action=NOTIFY_CHECKPOINTER";
 --sleep 5

110-123: Redundant query re-execution after --reap.

After --reap retrieves the result from the async --send, the same queries are executed again (lines 112, 117, 122). This is redundant unless intentionally verifying queries work after lock release—if so, add a comment explaining the intent.

 --connection conn_read1
 --reap
-select count(*) from t1;
 commit;

 --connection conn_read2
 --reap
-select count(*) from t1 where c4 = 1;
 commit;

 --connection conn_read3
 --reap
-select c0, c1, c2 from t1;
 commit;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 363d781 and 6b5f697.

📒 Files selected for processing (1)
  • storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test (1 hunks)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test (2)

20-24: Consider verifying the fault injector actually triggers.

The test assumes the fault injector will fire and cause the downgrade/retry logic, but doesn't explicitly verify this happened. Consider adding diagnostic output or checking logs to confirm the fault injection occurred.


57-57: Redundant connection statements.

The --connection default statements appear redundant since no other connections are established in this test. Consider removing them for clarity, or add a comment explaining why they're needed.

Also applies to: 63-63

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3dea8fd and e0d4e68.

📒 Files selected for processing (4)
  • data_substrate (1 hunks)
  • storage/eloq/mysql-test/mono_basic/r/range_split_deadlock_abort.result (1 hunks)
  • storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.opt (1 hunks)
  • storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.opt
🚧 Files skipped from review as they are similar to previous changes (1)
  • data_substrate
🔇 Additional comments (7)
storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test (6)

1-6: LGTM!

Standard MySQL test setup with proper include file, storage engine configuration, and defensive table cleanup.


8-13: LGTM!

Clear documentation explaining the test objective and the fault injection mechanism.


15-18: LGTM!

Table structure with multiple char(250) columns is appropriate for triggering range split by exceeding size thresholds.


26-40: LGTM!

The data buildup logic correctly doubles the row count 11 times (reaching 2048 rows), which with the char(250) columns should reliably exceed the range size threshold. Good practice disabling query and result logs for bulk operations.


60-70: Good defensive verification.

The count verification confirms the range split completed, and the wait_condition properly ensures no queries remain blocked, which validates the downgrade/retry logic succeeded.


73-73: LGTM!

Proper test cleanup.

storage/eloq/mysql-test/mono_basic/r/range_split_deadlock_abort.result (1)

1-26: LGTM!

The expected result file correctly matches the test script output. The count of 2048 rows (line 23) aligns with the 11 iterations of data doubling (2^11 = 2048), confirming successful range split completion.

@githubzilla githubzilla force-pushed the range_read_block_on_rw_conflict branch from 96061d7 to 2498f86 Compare December 11, 2025 07:54
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test (1)

125-135: Final wait_condition is still overly broad; can fail due to unrelated idle threads

The last check:

select count(*) = 0 from information_schema.processlist
    where ((state like "%lock%") or
           (info IS NULL and time >= 1))
    and (info like "select%from t1%" or info IS NULL);

will match any sleeping thread with info IS NULL and time >= 1 (because of the (info IS NULL and time >= 1) branch combined with or info IS NULL), even if it has nothing to do with this test. That can cause count(*) = 0 to never become true in environments with background/system threads or other idle sessions, making the test flaky or failing for the wrong reason. This is the same class of issue that was fixed earlier in the file for the >= 1 / >= 3 conditions.

You can tighten this to only consider the specific read queries started in the test and exclude the current connection, mirroring the earlier fixes, e.g.:

-let $wait_condition=
-    select count(*) = 0 from information_schema.processlist
-    where ((state like "%lock%") or
-           (info IS NULL and time >= 1))
-    and (info like "select%from t1%" or info IS NULL);
+let $wait_condition=
+    select count(*) = 0 from information_schema.processlist
+    where (state like "%lock%" or info like "select%from t1%")
+      and id != CONNECTION_ID();

(or even narrow info like further to the exact t1 queries if you prefer).

🧹 Nitpick comments (2)
storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test (2)

17-21: Confirm debug_dbug SET semantics; second assignment may drop the pause hook

You first install the pause hook (term_SplitFlushOp_CommitAcquireAllWriteOp_Continue) at Line 20, then later overwrite debug_dbug with a different string at Line 40 that omits that term, and finally only remove the term at Line 104. If SET SESSION debug_dbug = ... is not cumulative, the Line 40 assignment will clear the pause directive and the test may no longer guarantee that the range split actually stalls while reads are sent.

Consider either:

  • combining both behaviors in a single value, or
  • re‑including term_SplitFlushOp_CommitAcquireAllWriteOp_Continue when you add the at_once/NOTIFY_CHECKPOINTER options, and resetting debug_dbug fully at the end.

Also applies to: 38-42, 101-106


63-65: Comments still mention NULL info/time checks that no longer exist

In Test Case 1 (Lines 63–65) and Test Case 2 (Lines 88–90), the comments list “3. NULL info with time >= 1 (waiting/blocked)” as part of the detection criteria, but the corresponding WHERE clauses now only use state like "%lock%" and specific info like "select ... from t1%" patterns.

To avoid confusion for future readers, either:

  • drop bullet 3 from the comments, or
  • reintroduce the corresponding predicate if that behavior is still desired.

Also applies to: 88-90

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e0d4e68 and 96061d7.

📒 Files selected for processing (2)
  • data_substrate (1 hunks)
  • storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • data_substrate
🔇 Additional comments (1)
storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test (1)

44-48: Connection usage and cleanup for concurrent reads looks solid

The pattern of opening three dedicated read connections, using --send/--reap around the blocking selects, then committing and finally disconnecting all three connections before dropping t1 is clean and avoids leaked sessions or dangling blocked queries. This addresses the earlier missing‑disconnect concern.

Also applies to: 110-123, 141-147

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test (1)

50-54: Replace fixed sleeps with deterministic wait for split progress (flakiness risk).

The fixed --sleep 5 and --sleep 10 are brittle: slow environments may not have reached phase 8 yet, and fast environments will just waste time. This is the same concern already raised earlier on this file.

Prefer a deterministic wait that polls for the desired state with a timeout, e.g.:

  • Use wait_condition.inc around a query that checks that:
    • the range split on t1 has started and
    • has reached phase 8 / completed the downgrade-and-retry cycle.
  • Or, if there is no explicit range-split status table, derive a condition from cluster metadata, a diagnostic view, or a specific processlist pattern that indicates the split has passed phase 8.

Conceptual sketch (pseudo-MTR):

let $wait_condition=
  <SELECT 1 FROM some_status_view
   WHERE table='t1' AND phase >= 8>;
--source include/wait_condition.inc

This would remove the magic timings and make the test robust across machines and build configurations.

🧹 Nitpick comments (1)
storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test (1)

62-71: Good use of wait_condition.inc to assert no blocked queries; consider also clearing debug flags afterwards.

The post-split wait_condition on information_schema.processlist is a good way to assert the system quiesces and nothing remains blocked on t1.

As a small hygiene improvement, you might want to reset debug_dbug at the end of the test to reduce cross-test coupling, e.g.:

 # Cleanup
 drop table t1;
+SET SESSION debug_dbug="";

This keeps subsequent tests from inheriting the eLOQ / split / checkpointer debug knobs if they happen to reuse the same connection.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 96061d7 and edd6cc3.

📒 Files selected for processing (6)
  • data_substrate (1 hunks)
  • storage/eloq/mysql-test/mono_basic/r/range_split_deadlock_abort.result (1 hunks)
  • storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.opt (1 hunks)
  • storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test (1 hunks)
  • storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.opt (1 hunks)
  • storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test
  • storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.opt
  • storage/eloq/mysql-test/mono_basic/r/range_split_deadlock_abort.result
  • data_substrate
  • storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.opt
🔇 Additional comments (2)
storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test (2)

31-41: Data amplification and logging suppression look good.

The let $counter= 11 loop that doubles t1 each time and the surrounding --disable_query_log / --disable_result_log are a solid pattern for driving the table over the test threshold without polluting logs. No issues here.


1-19: Overall test structure and setup/teardown are sound.

The eLOQ check (have_eloq.inc), engine selection, DROP TABLE IF EXISTS, table definition, seed insert, and explicit cleanup are all consistent with typical MTR patterns and look correct for this scenario.

Also applies to: 72-73

@githubzilla githubzilla force-pushed the range_read_block_on_rw_conflict branch from 8cd7f5d to b502805 Compare December 12, 2025 03:13
@githubzilla githubzilla merged commit 3e820a6 into eloqdata:eloq-10.6.10 Dec 12, 2025
1 of 2 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Dec 15, 2025
yi-xmu added a commit that referenced this pull request Dec 18, 2025
This was referenced Dec 18, 2025
yi-xmu added a commit that referenced this pull request Dec 18, 2025
* Revert "Range read block range lock if RW conflict (#180)"

This reverts commit 3e820a6.

* Update submodule
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants