Skip to content

Add two-way network split playbook#181

Merged
pk910 merged 7 commits into
masterfrom
parithosh/two-way-network-split
May 19, 2026
Merged

Add two-way network split playbook#181
pk910 merged 7 commits into
masterfrom
parithosh/two-way-network-split

Conversation

@parithosh
Copy link
Copy Markdown
Member

@parithosh parithosh commented May 19, 2026

Summary

  • Adds playbooks/dev/two-way-network-split.yaml — a Kurtosis devnet playbook that uses the disruptoor HTTP API to partition the network into two halves, verifies finality stalls for splitObservationEpochs, clears the split, then verifies finality recovers within recoveryEpochs.
  • Updates .hack/devnet/run.sh to pass --privileged to kurtosis run when disruptoor is present in the args file (disruptoor requires it).
  • Regenerates playbooks/_index.yaml to include the new playbook.

Test plan

  • Launch a 4-node ethereum-package devnet with disruptoor enabled and run the playbook via make devnet-run
  • Confirm split partitions are observed via GET /v1/state while the playbook runs
  • Confirm finality stalls during the split and recovers within recoveryEpochs after clearing

🤖 Generated with Claude Code

Adds a Kurtosis devnet playbook that uses the disruptoor HTTP API to
partition the network into two halves, verifies finality stalls during
the split, heals it, and verifies finality recovers. Also passes
--privileged to `kurtosis run` when disruptoor is in the args file,
which it requires.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@qu0b-reviewer
Copy link
Copy Markdown

qu0b-reviewer Bot commented May 19, 2026

🤖 qu0b-reviewer

Now I have enough context. Let me write the review.

Summary

The PR adds a new two-way-network-split playbook that uses the disruptoor HTTP API to partition a devnet into two halves and verify finality behavior, plus a small related change to the devnet launch script to pass --privileged when disruptoor is enabled.

Issues

  • 🔴 blocker playbooks/dev/two-way-network-split.yaml:21slotsPerEpoch: 32 is hardcoded. Every other playbook that needs slot/epoch math fetches it dynamically from tasks.get_specs.outputs.specs.SLOTS_PER_EPOCH. Hardcoding 32 will silently produce wrong slot targets on any network with a different slot count (e.g., fast slots, testnets with custom configs). The two minSlotNumber configVars at lines 112 and 144 both depend on this value.

  • 🟡 concern playbooks/dev/two-way-network-split.yaml:18disruptoorUrl has no http:// prefix in the default but the shell command does jq -r . on it and uses it directly in curl, so it's fine — but the config default "http://disruptoor:7700" is redundant since the shell strips the JSON string wrapper anyway. Minor, but inconsistent with other string config defaults that are plain values.

Suggestions

  • playbooks/dev/two-way-network-split.yaml — Add a get_consensus_specs task (with ID get_specs) before the slot-range tasks so the full spec is available; this is the established pattern across the repo and future-proofs the playbook if other spec-dependent logic is added.
  • playbooks/dev/two-way-network-split.yaml:107 — The verification curl ... | jq -e '.partitions | length == 1' succeeds quietly because the >/dev/null discards it; since set -euo pipefail is in effect, a non-empty jq result (0 exit) still passes. Consider adding || { echo "partition not created"; exit 1; } after the PUT curl for explicit failure signaling.
  • .hack/devnet/run.sh:25 — The grep pattern ^[[:space:]]*-[[:space:]]*disruptoor uses a YAML-unquoted literal - disruptoor; if the args file uses _disruptoor, a different key pattern, or any YAML formatting (e.g. disruptoor: with nested keys), this check silently misses the need for --privileged. A comment in this section noting this limitation would help future editors know what the pattern is and isn't matching.

Reviewed @ d4d7cdf5
"The bug is always in the last place you look — because you stop looking after you find it."

parithosh and others added 6 commits May 19, 2026 20:55
Replaces the hardcoded splitLeftParticipants/splitRightParticipants
config with jq expressions that compute the two halves from the
check_clients_are_healthy totalCount output: nodes 1..floor(N/2) on the
left, floor(N/2)+1..N on the right. For odd N the right half gets the
extra node. minClientCount (default 4) keeps either half below the 2/3
finality threshold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two nodes is the minimum that supports a two-way split (1 vs 1). At
N=3 the larger half retains a 2/3 finality majority; that's an operator
choice rather than something this playbook should refuse.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Disruptoor can take a few seconds to bind its HTTP listener after the
enclave starts, so the immediate curl in the health-check task would
fail before the API was reachable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the hardcoded slotsPerEpoch=32 config with a get_consensus_specs
task that exposes the value via tasks.get_specs.outputs.specs.SLOTS_PER_EPOCH,
matching the established pattern in other playbooks. Also documents the
limited YAML shapes the run.sh disruptoor grep recognises.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: pk910 <philipp@pk910.de>
@pk910 pk910 enabled auto-merge May 19, 2026 21:31
@pk910 pk910 merged commit 0058109 into master May 19, 2026
9 checks passed
@pk910 pk910 deleted the parithosh/two-way-network-split branch May 19, 2026 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants