Skip to content

claude: add skill for running local kurtosis testnets#20876

Merged
taratorio merged 6 commits intomainfrom
kurtosis_skill
Apr 29, 2026
Merged

claude: add skill for running local kurtosis testnets#20876
taratorio merged 6 commits intomainfrom
kurtosis_skill

Conversation

@taratorio
Copy link
Copy Markdown
Member

No description provided.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new Claude skill document to guide running and iterating on local Kurtosis ethereum-package testnets against a locally built test/erigon:current image, mirroring the repo’s test-kurtosis-assertoor CI flow.

Changes:

  • Introduces a new kurtosis-test skill with end-to-end local workflow steps (build image → run enclave → monitor → triage → rebuild/rerun loop).
  • Documents CI-aligned ethereum-package branch pinning guidance and troubleshooting tips.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .claude/skills/kurtosis-test/SKILL.md Outdated
Comment thread .claude/skills/kurtosis-test/SKILL.md Outdated
Comment thread .claude/skills/kurtosis-test/SKILL.md Outdated
Comment thread .claude/skills/kurtosis-test/SKILL.md
Comment thread .claude/skills/kurtosis-test/SKILL.md
Comment thread .claude/skills/kurtosis-test/SKILL.md Outdated
Comment thread .claude/skills/kurtosis-test/SKILL.md
Comment thread .claude/skills/kurtosis-test/SKILL.md
Comment thread .claude/skills/kurtosis-test/SKILL.md Outdated
Copy link
Copy Markdown
Member

@yperbasis yperbasis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking issues

  1. The image-freshness check is GNU-only (won't run on macOS). SKILL.md:84
    img_ts=$(... | xargs -I{} date -d {} +%s ...)
    On BSD date (macOS), -d is "set DST flag", not "parse this string" — verified locally just now: date -d '2025-01-01' +%s → date: illegal option -- d. CLAUDE.md is explicit: "all
    scripts and shell code… must be cross platform, for macOS, Windows and Linux." Fix is one Go-template hop:
    img_ts=$(docker image inspect test/erigon:current --format '{{.Created.Unix}}' 2>/dev/null || echo 0)

  2. The freshness check ignores uncommitted changes. SKILL.md:85
    src_ts=$(git log -1 --format=%ct)
    This compares to the last commit's timestamp. The skill's whole premise is the fix → rebuild → rerun loop where the user (or the skill itself, via Edit) is making local source edits —
    those won't be in git log yet, so the check will skip the rebuild and silently run the stale image. Either always rebuild and lean on BuildKit's layer cache, or compare against find
    . -name '*.go' -newer -print -quit. Always-rebuild is simpler and matches CI.

Real bugs (medium)

  1. Block-progress monitor never honors duration. SKILL.md:120-138 — the while true loop only exits on stall (deadline exceeded). The Inputs table at SKILL.md:24 promises duration=Nm
    is "the wall-clock window the monitor watches before declaring stable if no failures trip," but no duration variable appears anywhere in the body. Add either:
    end=$(( $(date +%s) + duration_secs ))
    while [ "$(date +%s)" -lt "$end" ]; do ...
    or drop duration from the inputs.

  2. The service-name regex won't match kurtosis enclave inspect output. SKILL.md:101, 144
    grep -oE '^el-[0-9]+-erigon-[a-z]+'
    That output is columnar — UUID, then service name, indented. The ^ anchor will fail; the script will silently set EL_SERVICE="", and every subsequent kurtosis port print "$ENCLAVE" ""
    returns nothing, then EL_RPC_PORT is empty, and curl quietly hits port 0. Either drop the anchor (grep -oE 'el-[0-9]+-erigon-[a-z]+') or pipe through awk '$2 ~ /^el-/ {print $2}'.
    Same fix for the cross-client log loop at SKILL.md:144.

Worth verifying with the actual CLI

  1. kurtosis service exec -- syntax. SKILL.md:200, 202 — newer kurtosis CLI versions take the command as a single quoted string (kurtosis service exec
    "ls -la /jwt/"), not -- . Worth confirming against the version pinned for this skill.

  2. kurtosis service logs --follow=false. SKILL.md:140, 148 — kurtosis service logs is one-shot by default; --follow is what enables tailing. --follow=false may parse fine but is
    redundant. Drop the flag.

Smaller stuff

  • Inputs duration/auto/max-attempts are listed but never parsed in the bash. SKILL.md:18-30 + body. Fine if the model is the parser, but say so explicitly.
  • Triage suggestion eth_call for payload replay. SKILL.md:172 — eth_call runs one tx against committed state; for full payload replay you want debug_traceBlockByNumber /
    debug_traceBlockByHash. Trim the eth_call mention.
  • Style: the skill is noticeably wordier than hive-test/SKILL.md and launch-bal-devnet-2/SKILL.md. Lines like "The user's rule: challenge other implementations against erigon — figure
    out who is in the wrong" read like AI commentary. The same point is made by the triage table; the editorializing prose can come out without losing information.

- Inline the config -> ethereum-package branch mapping as a real `case` block
  (previously referenced an undefined `get_package_ref_from_config_path`).
- Add caplin-assertoor.io to the mapping and explicitly note caplin suites
  must use the erigontech fork.
- Reconcile GpuConfig guidance: the 1.18.1 CLI requirement only applies to
  ethereum-package branches that include the GpuConfig built-in; the pinned
  branches in the mapping (5.0.1 / 6.1.0 / erigontech fork) all predate it.
- Update troubleshooting row to point at the pinned versions instead of the
  contradictory "<= 6.0.0".
- Reorder kurtosis run args to put the package ref immediately after `run`,
  matching the convention in kurtosis docs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@taratorio
Copy link
Copy Markdown
Member Author

@yperbasis addressed


5 accepted (substantive issues fixed):

  • Inlined the case "$(basename "$CONFIG")" block — was previously calling an
    undefined get_package_ref_from_config_path.
  • Added caplin-assertoor.io to both the mapping table and the case block (was
    missing — would have hit the 5.0.1 default fallback).
  • Reconciled the GpuConfig contradiction in Prerequisites (1.18.1 only needed for
    branches that include the built-in; the pinned branches all predate it).
  • Reconciled the GpuConfig contradiction in Troubleshooting (was saying "≤ 6.0.0"
    while the rest of the doc pins to 6.1.0).
  • Reordered kurtosis run args to put the package ref immediately after run, per
    kurtosis-docs convention.

4 challenged (Copilot hallucinations):

  • Four separate comments claimed tables used ||-prefixed rows. They don't — every
    table in the file uses standard single-pipe markdown. Verified with grep -n '^||'
    SKILL.md returning nothing. Pushed back on each with the verification.

1 partially challenged:

  • The kurtosis run ordering comment claimed wrong order would cause "CLI parse
    errors". I accepted the reorder for stylistic consistency but flagged that the
    reasoning is incorrect — kurtosis is Cobra-based and accepts flags in any order.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .claude/skills/kurtosis-test/SKILL.md Outdated

```bash
img_ts=$(docker image inspect test/erigon:current --format '{{.Created}}' 2>/dev/null \
| xargs -I{} date -d {} +%s 2>/dev/null || echo 0)
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “skip rebuild if image is fresh” snippet uses date -d to parse the Docker image Created timestamp. date -d is GNU-date specific and will fail on macOS/BSD environments, which makes the snippet non-runnable for many local setups. Consider either (a) documenting the GNU date/coreutils requirement for this optimization, or (b) replacing the parsing with a more portable approach (e.g., python3 -c ...), or (c) dropping the optimization and always rebuilding.

Suggested change
| xargs -I{} date -d {} +%s 2>/dev/null || echo 0)
| python3 -c 'import sys; from datetime import datetime, timezone; s = sys.stdin.read().strip(); print(int(datetime.fromisoformat(s.replace("Z", "+00:00")).astimezone(timezone.utc).timestamp()) if s else 0)' \
2>/dev/null || echo 0)

Copilot uses AI. Check for mistakes.
Comment on lines +162 to +166
## Monitor

Three checks run on a polling loop until either (a) `duration` elapses, or (b) any
failure trips. All three are captured in the run history.

Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section says the three checks run on a polling loop until either duration elapses or a failure trips, but the only provided loop is the block-height loop and it has no overall duration cutoff (it runs indefinitely unless stalled). Either update the example to include a duration end-time check, or clarify that duration handling is implemented outside these snippets so readers don’t assume the shown loop matches the described behavior.

Copilot uses AI. Check for mistakes.
Comment thread .claude/skills/kurtosis-test/SKILL.md Outdated
-H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
| jq -r '.result')
height=$(printf '%d\n' "$height_hex")
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the block-height snippet, if the RPC isn’t reachable yet (or jq returns null), height_hex may be empty/null, and printf '%d' will emit an “invalid number” warning and treat it as 0, potentially misclassifying the situation as a stall. Consider explicitly validating that .result matches ^0x[0-9a-fA-F]+$ (and treating anything else as an immediate failure with a helpful message) before converting to an integer.

Suggested change
height=$(printf '%d\n' "$height_hex")
case "$height_hex" in
0x[0-9a-fA-F]*)
height=$((16#${height_hex#0x}))
;;
*)
echo "FAIL: invalid eth_blockNumber result from EL RPC on port ${EL_RPC_PORT}: ${height_hex:-<empty>}"
break
;;
esac

Copilot uses AI. Check for mistakes.
Comment thread .claude/skills/kurtosis-test/SKILL.md Outdated

| Argument | Default | Notes |
|---|---|---|
| `$1` config path | required | Path to an `ethereum-package` YAML (same schema as `.github/workflows/kurtosis/*.io`). If omitted, list those files and ask the user to pick. |
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “If $1 is omitted, list those files” guidance points at .github/workflows/kurtosis/*.io, but that directory contains both ethereum-package args files (participants/participants_matrix) and assertoor playbooks (id/tasks). Listing all *.io makes it easy to pick a non-ethereum-package file that kurtosis run --args-file can’t consume. Consider filtering the suggested list (e.g., only files that contain participants:/participants_matrix: or match *-assertoor.io plus known suites), or explicitly warning that assertoor playbooks in that folder are not valid --args-file inputs.

Suggested change
| `$1` config path | required | Path to an `ethereum-package` YAML (same schema as `.github/workflows/kurtosis/*.io`). If omitted, list those files and ask the user to pick. |
| `$1` config path | required | Path to an `ethereum-package` YAML. Valid examples live among `.github/workflows/kurtosis/*.io`, but that directory also contains assertoor playbooks that are **not** valid `kurtosis run --args-file` inputs. If omitted, list only the `.io` files that look like ethereum-package args files (for example, files containing `participants:` or `participants_matrix:`) and ask the user to pick from that filtered set. |

Copilot uses AI. Check for mistakes.
Comment on lines +32 to +37
## Prerequisites

1. **Docker** running: `docker info >/dev/null` should succeed.
2. **Kurtosis CLI** installed: `kurtosis version`. Install from
https://docs.kurtosis.com/install if missing. CLI **≥ 1.18.1** is only needed when
the `--package@branch` you run includes the `GpuConfig` Starlark built-in (i.e.
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prerequisites list doesn’t mention jq (and curl), but multiple later snippets rely on them (jq -r '.result', parsing assertoor API, etc.). Add them as explicit prerequisites (or add a short “requires curl+jq” note) so users don’t hit confusing runtime failures when copy/pasting.

Copilot uses AI. Check for mistakes.
Comment thread .claude/skills/kurtosis-test/SKILL.md Outdated
3. **Erigon source tree** at the cwd: `Makefile` exists and `go.mod` contains
`module github.com/erigontech/erigon`.
4. **Fork detection** — skim the YAML for fork-epoch keys
(`{cancun,prague,osaka,fulu,gloas,amsterdam}_fork_epoch`) to know which fork is
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fork-epoch key list omits keys that appear in the repo’s kurtosis configs (e.g. electra_fork_epoch in pectra.io, deneb_fork_epoch in caplin configs). This can cause incorrect fork identification when following these steps. Consider expanding the list to cover the fork keys used in .github/workflows/kurtosis/*.io (at least deneb/electra in addition to the newer ones).

Suggested change
(`{cancun,prague,osaka,fulu,gloas,amsterdam}_fork_epoch`) to know which fork is
(`{deneb,cancun,electra,prague,osaka,fulu,gloas,amsterdam}_fork_epoch`) to know which fork is

Copilot uses AI. Check for mistakes.
taratorio and others added 3 commits April 29, 2026 12:58
Blocking issues:
- Drop the GNU-only `date -d` freshness check; always rebuild instead.
  BuildKit's layer cache makes the no-op fast and this picks up
  uncommitted source edits which a `git log` check would miss.

Real bugs:
- Block-progress monitor now honors `duration_secs`: the loop exits on
  STABLE after the full window or STALL on no progress.
- Switch service-name extraction from `grep -oE '^...'` (which never
  matches columnar `enclave inspect` output) to awk on field 2.

CLI usage:
- `kurtosis service exec` takes the command as a single positional arg
  (multi-word quoted), not `-- <cmd>`. Verified via `--help`.
- Drop redundant `--follow=false` from `kurtosis service logs` (one-shot
  by default).

Smaller:
- Note that the model parses Inputs args into shell vars.
- Replace `eth_call` with `debug_traceBlockByNumber/Hash` for payload
  replay in the triage table (eth_call runs against committed state).
- Trim the "user's rule" editorial prose; the triage table carries the
  same information.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- RPC validation in monitor: guard `printf '%d'` against non-hex
  responses (RPC unreachable, jq null), so the loop logs "RPC
  unreachable" and stays in the stall window without spamming
  printf "invalid number" warnings or falsely incrementing height.
- Add `curl` and `jq` to prerequisites — both are used by the
  monitor and triage snippets.
- Fix fork-epoch key list: configs use CL-side names
  (`deneb`/`electra`/`fulu`/`gloas`_fork_epoch), not the EL names
  the previous list had (`cancun`/`prague`/`osaka`/`amsterdam`).
  Map back to EL names so users still recognise the fork.
- Sharpen `$1` config-path guidance: `.github/workflows/kurtosis/`
  contains both ethereum-package args files and assertoor
  playbooks. Direct users to `grep -lE '^participants(_matrix)?:'`
  to filter to valid args files only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@yperbasis yperbasis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Real bugs (medium):

  • Stall detection can trip before the first block. SKILL.md monitor section. stall_deadline is initialized to start + stall_window (e.g. T+36s for a 12s slot), and is only refreshed
    once height > prev. If the genesis-to-first-block gap exceeds 3 × seconds_per_slot — likely on cold-start configs with genesis_delay, e.g. glamsterdam.io has genesis_delay: 20 on a 6s
    slot, so first block can land near T+26..32s, dangerously close to the T+18s stall window — a false STALL fires. Fix: gate the stall check on observed progress:
    if [ "$prev" -gt 0 ] && [ "$(date +%s)" -gt "$stall_deadline" ]; then
    ...
    fi
  • Both STALL and STABLE can print. Same block. After break from the STALL branch, the post-loop if [ "$(date +%s)" -ge "$end" ] can also be true if the stall happened to land in the
    last poll window. Use a flag (stable=true; … stable=false; break) so you print exactly one outcome.

Doc inconsistencies (low):

  • Cleanup section advertises trap but doesn't set one. The header reads Run on success, failure, and signal interruption (trap): but the snippet has no trap '... cleanup ...' EXIT INT
    TERM. Either drop the (trap) mention or actually show the trap installation.
  • Check A pass criterion description ("advances ≥1 within every 2 × seconds_per_slot window") doesn't quite match the code (stall_window = 3 × slot). The 2× is the poll cadence, the
    3× is the failure threshold — easy to clarify.

Nits:

  • slot_secs=$(grep -E '^\s*seconds_per_slot:' …) works (verified \s parses on macOS BSD grep against the actual YAMLs in .github/workflows/kurtosis/), so no portability fix needed
    there. But it would silently break if a YAML quotes the value (seconds_per_slot: "12"); none of the current configs do, so this is a future-proofing note, not a bug.
  • caplin-assertoor.io is in the mapping table but not in the CI matrix — the author mentioned this is intentional (skill is broader than CI). Fine to keep, just worth knowing.

Real bugs:
- Don't trip stall before first block. Configs with `genesis_delay` >
  `stall_window` (e.g. glamsterdam.io: 20s delay vs 18s window at 6s
  slots) would fire a false STALL during the pre-genesis gap. Gate the
  stall check on `prev > 0` so only real post-genesis stalls fire.
- Single outcome guarantee. Track outcome in a variable so STALL and
  STABLE can't both print on the same iteration when the stall lands
  in the last poll window. Adds NO_PROGRESS as a third explicit outcome
  for "chain never started" (validators down, etc.) which the previous
  code would have mislabelled as STABLE.

Doc:
- Drop the misleading `(trap)` mention from Cleanup — the snippet does
  not install a trap; the model executes cleanup as a final step.
- Fix Check A pass criterion: `2 ×` is poll cadence, `3 ×` is the stall
  window / failure threshold. Previous text said advance must happen
  within `2 × seconds_per_slot`, conflating the two.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@taratorio
Copy link
Copy Markdown
Member Author

Real bugs (medium):

  • Stall detection can trip before the first block. SKILL.md monitor section. stall_deadline is initialized to start + stall_window (e.g. T+36s for a 12s slot), and is only refreshed
    once height > prev. If the genesis-to-first-block gap exceeds 3 × seconds_per_slot — likely on cold-start configs with genesis_delay, e.g. glamsterdam.io has genesis_delay: 20 on a 6s
    slot, so first block can land near T+26..32s, dangerously close to the T+18s stall window — a false STALL fires. Fix: gate the stall check on observed progress:
    if [ "$prev" -gt 0 ] && [ "$(date +%s)" -gt "$stall_deadline" ]; then
    ...
    fi
  • Both STALL and STABLE can print. Same block. After break from the STALL branch, the post-loop if [ "$(date +%s)" -ge "$end" ] can also be true if the stall happened to land in the
    last poll window. Use a flag (stable=true; … stable=false; break) so you print exactly one outcome.

Doc inconsistencies (low):

  • Cleanup section advertises trap but doesn't set one. The header reads Run on success, failure, and signal interruption (trap): but the snippet has no trap '... cleanup ...' EXIT INT
    TERM. Either drop the (trap) mention or actually show the trap installation.
  • Check A pass criterion description ("advances ≥1 within every 2 × seconds_per_slot window") doesn't quite match the code (stall_window = 3 × slot). The 2× is the poll cadence, the
    3× is the failure threshold — easy to clarify.

Nits:

  • slot_secs=$(grep -E '^\s*seconds_per_slot:' …) works (verified \s parses on macOS BSD grep against the actual YAMLs in .github/workflows/kurtosis/), so no portability fix needed
    there. But it would silently break if a YAML quotes the value (seconds_per_slot: "12"); none of the current configs do, so this is a future-proofing note, not a bug.
  • caplin-assertoor.io is in the mapping table but not in the CI matrix — the author mentioned this is intentional (skill is broader than CI). Fine to keep, just worth knowing.

addressed

@taratorio taratorio enabled auto-merge April 29, 2026 14:48
@taratorio taratorio added this pull request to the merge queue Apr 29, 2026
Merged via the queue into main with commit cc38945 Apr 29, 2026
37 checks passed
@taratorio taratorio deleted the kurtosis_skill branch April 29, 2026 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants