Skip to content

[codex] Fix OpenCode integration tests#1285

Merged
arittr merged 1 commit into
devfrom
codex/pri-1369-opencode-integration-tests
Apr 27, 2026
Merged

[codex] Fix OpenCode integration tests#1285
arittr merged 1 commit into
devfrom
codex/pri-1369-opencode-integration-tests

Conversation

@arittr
Copy link
Copy Markdown
Collaborator

@arittr arittr commented Apr 27, 2026

What problem are you trying to solve?

The baseline OpenCode integration suite failed on dev while validating the #1232 bootstrap-caching PR. The failures were not caused by #1232:

  • tests/opencode/test-tools.sh still prompted for old custom OpenCode tools (find_skills / use_skill) that the current plugin no longer exposes. Current OpenCode uses the native skill tool.
  • tests/opencode/test-priority.sh hard-failed on a confirmed OpenCode duplicate-name behavior: bundled Superpowers skills can currently shadow project/personal skills with the same native skill name.

The priority behavior is real, but the first attempted plugin fix was too magical for this PR. That fix has been split out to PRI-1370. This PR keeps the integration suite aligned with current OpenCode behavior and keeps the unresolved priority issue visible without landing the odd plugin workaround.

What does this PR change?

This PR rewrites the OpenCode integration tests to exercise OpenCode's native skill tool with JSON-event assertions instead of old find_skills / use_skill prompts. It also changes the priority test into documentation mode for the current duplicate-name behavior, verifies non-colliding bundled Superpowers skills still load, and raises the real-OpenCode command timeout from 60s to 120s to avoid normal model/tool latency flakes.

Is this change appropriate for the core library?

Yes. This is test coverage for the OpenCode plugin support shipped by Superpowers core. It does not add a new third-party integration surface; it keeps the existing OpenCode harness tests accurate for the current plugin and OpenCode CLI behavior.

What alternatives did you consider?

I first tried fixing the local-skill-shadowing behavior in .opencode/plugins/superpowers.js by building a filtered symlink directory of bundled skills during plugin config registration. That passed tests, but it was too surprising and filesystem-heavy for a maintenance PR. Drew asked to split it out, so the behavior fix now lives in PRI-1370 for separate design.

I also considered restoring custom find_skills / use_skill tools, but the current plugin intentionally maps skill usage to OpenCode's native skill tool, so reintroducing custom tools would add unnecessary surface area and diverge from current OpenCode behavior.

Does this PR contain multiple unrelated changes?

No. Both test files are part of the same OpenCode integration-test maintenance pass: update stale native-tool assumptions, document the split-out priority behavior, and make the real CLI sessions less timing-sensitive.

Existing PRs

This came from the OpenCode PR triage pass. #1232 fixed bootstrap caching and is already merged into dev; #1210 and #1216 were duplicate caching fixes; #1247 is a broader OpenCode refactor that should remain separate; #981 is related to OpenCode skill naming/docs but does not fix this integration-test cleanup. The split-out behavior follow-up is PRI-1370.

Environment tested

Harness (e.g. Claude Code, Cursor) Harness version Model Model version/ID
OpenCode 1.14.28 Default configured OpenCode model Not specified by the test harness

Evaluation

  • What was the initial prompt you (or your human partner) used to start the session that led to this change?
    • Drew: "ok let's go ahead and fix those opencode integeration tests now before we fix anything else"
  • How many eval sessions did you run AFTER making the change?
    • Final validation ran the full OpenCode integration suite once after the split. That suite launches real OpenCode CLI sessions from test-tools.sh and test-priority.sh. During debugging, focused reruns were also used for the two failing tests.
  • How did outcomes change compared to before the change?
    • Before: bash tests/opencode/run-tests.sh --integration failed in test-tools.sh and test-priority.sh on plain dev.
    • After: bash tests/opencode/run-tests.sh --integration passed with 4 passed, 0 failed, 0 skipped.
    • Before: test-tools.sh asked for obsolete custom tools.
    • After: test-tools.sh verifies OpenCode's native skill tool loads personal, project, and bundled Superpowers skills.
    • Before: test-priority.sh failed on the unresolved duplicate-name shadowing behavior.
    • After: test-priority.sh documents that current behavior and points at the split-out follow-up, while still verifying non-colliding bundled skills load.

Rigor

  • Not a skills content change: superpowers:writing-skills and adversarial skill-content pressure testing are not applicable
  • This change was tested adversarially, not just on the happy path
  • I did not modify carefully-tuned content (Red Flags table, rationalizations, "human partner" language) without extensive evals showing the change is an improvement

Additional verification run:

node --check .opencode/plugins/superpowers.js
bash -n tests/opencode/test-tools.sh
bash -n tests/opencode/test-priority.sh
git diff --check
bash tests/opencode/run-tests.sh --integration
bash tests/opencode/run-tests.sh

All commands passed.

Human review

  • A human has reviewed the COMPLETE proposed diff before submission

This is still a draft PR and is pending Drew's review of the complete diff.

@arittr arittr marked this pull request as ready for review April 27, 2026 19:28
@arittr arittr force-pushed the codex/pri-1369-opencode-integration-tests branch from 63d4a19 to 68040f2 Compare April 27, 2026 19:34
@obra
Copy link
Copy Markdown
Owner

obra commented Apr 27, 2026

lgtm

@arittr arittr merged commit 88eb667 into dev Apr 27, 2026
@scicco scicco mentioned this pull request May 5, 2026
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants