Stabilize FlatKV EVM migration stop point #3592

Merged: masih merged 2 commits into main from masih/more-flatkv-flaky-test-fix on Jun 15, 2026

masih (Collaborator) commented Jun 15, 2026

The FlatKV EVM migration test could flake while switching validators from memiavl_only to migrate_evm. The script only checked that validators had the same committed height before flipping modes, but a validator can already have signed a vote for the next height without committing it. After the mode flip, consensus WAL replay could ask that validator to sign different data for the same height, triggering Tendermint's double-sign guard with "conflicting data" and leaving the cluster stuck.

Fix the migration choreography by stopping validators, configuring a temporary future halt-height, restarting them in memiavl_only, and waiting for all nodes to halt themselves at the same durable commit boundary. The script also records priv_validator_state heights and refuses to flip if last-sign state advanced past the committed halt height. Reset halt-height when switching to migrate_evm so the migrated restart can make progress.
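
As a rough illustration of the last-sign guard described above, the sketch below reads the height from a priv_validator_state.json file and refuses the flip when any validator has signed past the committed halt height. The function names `last_sign_height` and `safe_to_flip` are hypothetical and are not taken from the actual script; the sketch also assumes the file stores the height as a decimal value under a `"height"` key.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the "height" value from a priv_validator_state.json
# file. Assumes the value is a decimal number, quoted or unquoted.
last_sign_height() {
  sed -n 's/.*"height"[[:space:]]*:[[:space:]]*"\{0,1\}\([0-9][0-9]*\)"\{0,1\}.*/\1/p' "$1" | head -n 1
}

# Hypothetical guard: safe_to_flip COMMITTED_HEIGHT LAST_SIGN_HEIGHT...
# Succeeds only if no validator's last-sign height is past the committed height.
safe_to_flip() {
  local committed=$1
  shift
  local h
  for h in "$@"; do
    if [ "$h" -gt "$committed" ]; then
      echo "refusing to flip: last-sign height $h is past committed height $committed" >&2
      return 1
    fi
  done
  return 0
}
```

A caller would collect one last-sign height per validator and run `safe_to_flip "$HALT_HEIGHT" "${heights[@]}"` before changing the write mode.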

Flaked for unrelated changes.


cursor Bot commented Jun 15, 2026


PR Summary

Low Risk
Changes are confined to an integration test shell script; no production node or consensus code paths are modified.

Overview
Replaces the pre-flip validator shutdown loop in verify_flatkv_evm_migrate.sh with a halt-height–driven stop so the cluster flips sc-write-mode only after every node halts on the same committed boundary, avoiding Tendermint double-sign flakes when last-sign state is ahead of committed height.

The script stops all validators, sets a temporary halt-height (default +300 blocks from the max of committed height, priv_validator_state last-sign height, and the pre-flip sync floor), restarts in memiavl_only, and waits up to HALT_TIMEOUT. It records last-sign heights and aborts the flip if committed heights diverge or last-sign advanced past the halt commit. halt-height is reset to 0 when switching to migrate_evm.
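
The halt target described above can be sketched as the maximum of the three heights plus a block window. This is a minimal sketch, not the script's actual code: `max_of` and `compute_halt_height` are hypothetical names, `PRE_FLIP_HALT_BLOCKS` is the variable that appears in the reviewed excerpt, and the default of 300 is taken from this summary.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the largest of the given integers.
# Empty arguments after the first are skipped (e.g. an unset pre-flip floor).
max_of() {
  local max=$1
  shift
  local v
  for v in "$@"; do
    if [ -n "$v" ] && [ "$v" -gt "$max" ]; then
      max=$v
    fi
  done
  echo "$max"
}

# Hypothetical helper: compute_halt_height COMMITTED LAST_SIGN PRE_FLIP_FLOOR
# Prints max(committed, last-sign, floor) + PRE_FLIP_HALT_BLOCKS (default 300).
compute_halt_height() {
  local committed=$1 last_sign=$2 pre_flip_floor=$3
  local blocks=${PRE_FLIP_HALT_BLOCKS:-300}
  local start
  start=$(max_of "$committed" "$last_sign" "$pre_flip_floor")
  echo $((start + blocks))
}
```

With the default window, `compute_halt_height 100 103 98` prints 403, because the last-sign height of 103 is the largest input.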

New helpers factor out coordinated barrier-synchronized starts, parallel TERM stops, and halt-height writes with verification on node 0. The old docker pause / multi-attempt freeze path and PRE_FLIP_STOP_ATTEMPTS are removed.
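
A halt-height write with verification might look roughly like the following. The helper name `set_halt_height` is hypothetical, and the sketch assumes the target config file already contains a top-level `halt-height = ...` line; it does not reproduce the script's actual helper.

```shell
#!/usr/bin/env bash

# Hypothetical helper: set_halt_height CONFIG_FILE HEIGHT
# Rewrites the existing "halt-height = ..." line, then verifies the new value
# is present. Fails if the key is missing from the file.
set_halt_height() {
  local file=$1 height=$2 tmp
  tmp=$(mktemp) || return 1
  if ! sed "s/^halt-height[[:space:]]*=.*/halt-height = ${height}/" "$file" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  # Write back through redirection so the original file's permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
  # Verification step: the exact line must now exist.
  grep -q "^halt-height = ${height}\$" "$file"
}
```

Calling the same helper with a height of 0 covers the reset that is needed before restarting in migrate_evm.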

Reviewed by Cursor Bugbot for commit 26ddc6a.


github-actions Bot commented Jun 15, 2026


The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Jun 15, 2026, 5:02 PM |

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dc65617ab5


if [ -n "$PRE_FLIP_HEIGHT" ] && [ "$PRE_FLIP_HEIGHT" -gt "$halt_start_height" ]; then
    halt_start_height=$PRE_FLIP_HEIGHT
fi
HALT_HEIGHT=$((halt_start_height + PRE_FLIP_HALT_BLOCKS))

P2: Wait for all validators before starting halt countdown

When one validator restarts slower than the other three, this fixed 10-block window can be consumed before the lagging node is up and caught up. In the local 4-validator devnet, any three equal-power validators can form quorum, and the config allows very fast blocks (unsafe-commit-timeout-override = "50ms" in docker/localnode/config/config.toml:358). Those three validators can then halt at HALT_HEIGHT, leaving the fourth without live peers to commit the halt block, so wait_for_all_seid_stop times out instead of stabilizing the migration. Choose the halt target only after all validators are observed running and caught up, or make the window large enough for the slowest restart path.
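
One way to act on this suggestion is to poll until every validator reports a height at or above some floor before the halt target is chosen. The sketch below is illustrative only: `all_at_least` and `wait_for_all_caught_up` are hypothetical names, and `node_heights` is an assumed caller-provided function that prints one committed height per running validator.

```shell
#!/usr/bin/env bash

# Hypothetical helper: all_at_least MIN COUNT HEIGHT...
# Succeeds only if exactly COUNT heights were given and each is >= MIN.
all_at_least() {
  local min=$1 count=$2
  shift 2
  [ "$#" -eq "$count" ] || return 1
  local h
  for h in "$@"; do
    [ "$h" -ge "$min" ] || return 1
  done
  return 0
}

# Hypothetical helper: wait_for_all_caught_up MIN COUNT TIMEOUT_SECS
# Polls node_heights (assumed to be defined by the caller) once per second
# until all COUNT validators report a height >= MIN, or the timeout expires.
wait_for_all_caught_up() {
  local min=$1 count=$2 timeout=$3 waited=0 heights
  while [ "$waited" -lt "$timeout" ]; do
    heights=$(node_heights) || heights=""
    # Word splitting is intentional: one height per line becomes one argument.
    # shellcheck disable=SC2086
    if all_at_least "$min" "$count" $heights; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}
```

Only after this wait succeeds would the script compute HALT_HEIGHT, so a slow restart cannot consume the countdown window.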


masih requested a review from cody-littley June 15, 2026 16:45

codecov Bot commented Jun 15, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.36%. Comparing base (e7c0eff) to head (26ddc6a).
⚠️ Report is 1 commit behind head on main.


@@            Coverage Diff             @@
##             main    #3592      +/-   ##
==========================================
- Coverage   59.22%   58.36%   -0.87%     
==========================================
  Files        2213     2139      -74     
  Lines      183115   174568    -8547     
==========================================
- Hits       108453   101878    -6575     
+ Misses      64914    63642    -1272     
+ Partials     9748     9048     -700     
| Flag | Coverage Δ |
| --- | --- |
| sei-db | 70.41% <ø> (ø) |
| sei-db-state-db | ? |

Flags with carried forward coverage won't be shown.
See 74 files with indirect coverage changes.


masih enabled auto-merge June 15, 2026 17:01
masih added this pull request to the merge queue Jun 15, 2026
Merged via the queue into main with commit 1d41e3e Jun 15, 2026
60 checks passed
masih deleted the masih/more-flatkv-flaky-test-fix branch June 15, 2026 17:37