Skip to content

CI validation is structural only; add semantic drift checks #55

@plx

Description

@plx

Problem Statement

plugin-validate.yml currently validates plugin structure, not plugin behavior/meaning.
The workflow confirms frontmatter, file presence, JSON shape, and executable bits, but it does not verify that:

  1. documented trop commands/flags in skills/commands still match the actual CLI contract, or
  2. hook scripts behave correctly for representative hook payloads.

This creates a semantic drift gap: docs and hooks can silently diverge from reality while CI stays green.

Severity: Medium
Impact: Incorrect guidance and hook behavior can ship undetected, causing user confusion and automation failures.

Affected Files / Lines

File Lines Why this matters
.github/workflows/plugin-validate.yml 52-107 Structural checks only (SKILL.md frontmatter, command frontmatter, hooks.json has hooks, executable bit, word counts). No semantic assertions.
plugins/trop/skills/trop-reference/SKILL.md 18-25, 81 Documents concrete commands/flags/output semantics that can drift from CLI behavior.
plugins/trop/skills/trop-reference/references/cli-commands.md 7-10, 57-62 Canonical command/flag matrix in docs; currently unvalidated against CLI source-of-truth.
plugins/trop/commands/migrate.md 37-56 Prescribes specific runtime commands/flows that should remain consistent with CLI behavior.
plugins/trop/hooks/hooks.json 4-24 Hook wiring is only shape-validated, not behavior-validated.
plugins/trop/hooks/scripts/worktree-enter.sh 16-27 Payload parsing and command invocation logic are untested in CI.
plugins/trop/hooks/scripts/worktree-exit.sh 16-23 Same as above; no fixture-based behavior checks.
trop-cli/src/cli.rs 46-114 Source of truth for available subcommands.

Recommended Fix

Add semantic validation to CI in two parts:

1. Docs-to-CLI semantic check

Parse plugin markdown for trop ... command examples and --flags, and fail if they reference unknown subcommands/flags compared to CLI help (or clap metadata).

2. Hook behavior fixture tests

Execute hook scripts with representative JSON payload fixtures and a stub trop binary to verify:

  • payload key extraction works,
  • expected command is invoked,
  • script always returns valid {"permissionDecision":"allow"} JSON.

Workflow update (example)

# .github/workflows/plugin-validate.yml
- name: Validate plugin semantics
  run: ./.github/scripts/validate-plugin-semantics.sh

- name: Validate hook behavior fixtures
  run: ./.github/scripts/test-plugin-hooks.sh

Docs-to-CLI validator sketch

#!/usr/bin/env bash
set -euo pipefail

# Build allowed command/flag set from CLI help
cargo run -q -p trop-cli -- --help > /tmp/trop-help.txt
# ...collect subcommands and per-command flags...

# Extract documented trop commands/flags from plugin markdown
rg -No '`trop [^`]+`' plugins/trop/skills plugins/trop/commands > /tmp/doc-cmds.txt
# ...fail on unknown subcommand/flag references...

Hook fixture test sketch

#!/usr/bin/env bash
set -euo pipefail

tmp="$(mktemp -d)"
mkdir -p "$tmp/bin" "$tmp/worktree"
touch "$tmp/worktree/trop.yaml"

cat > "$tmp/bin/trop" <<'STUB'
#!/usr/bin/env bash
echo "$@" >> "$TMPDIR/trop-calls.log"
STUB
chmod +x "$tmp/bin/trop"

export PATH="$tmp/bin:$PATH"
export TMPDIR="$tmp"

printf '{"tool_result":{"path":"%s"}}' "$tmp/worktree" \
  | bash plugins/trop/hooks/scripts/worktree-enter.sh \
  | jq -e '.permissionDecision=="allow"'

grep -q 'autoreserve --quiet' "$TMPDIR/trop-calls.log"

Validation Strategy

  1. Add automated CI checks above and make them required in PRs touching plugins/** or .claude-plugin/**.
  2. Add negative tests:
    • Introduce a fake flag in docs (e.g. --does-not-exist) and confirm CI fails.
    • Introduce a fake subcommand in docs and confirm CI fails.
  3. Add hook fixture cases:
    • Enter payload with .tool_result.path and .tool_result.cwd.
    • Exit payload with .tool_input.path and .tool_result.path.
    • Missing path payload must still return permissionDecision=allow without failing.
  4. Manual smoke verification:
    • Run act or local workflow equivalent and confirm semantic checks fail on intentional drift and pass after correction.

Acceptance Criteria

  1. CI fails when plugin docs reference non-existent trop subcommands/flags.
  2. CI fails when hook scripts stop handling expected payload shapes.
  3. CI remains green for valid plugin/docs/hooks changes.
  4. plugin-validate.yml includes semantic checks in addition to existing structural checks.

🤖 Discovered by Codex · expanded by Codex · filed by Claude Code

Metadata

Metadata

Assignees

No one assigned

    Labels

    P2Non-blocking semantic, stability, completeness, or correctness.claude-code-pluginConcerns trop's claude code plugin.discovered-by-codexIssues discovered by codex.

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions