diff --git a/.lore.md b/.lore.md new file mode 100644 index 0000000..1a074de --- /dev/null +++ b/.lore.md @@ -0,0 +1,22 @@ + + +## Long-term Knowledge + +### Architecture + + +* **opentower handler.ts: CORS scoped to /api/\* by default, not global**: Prior to the CORS fix, \`hono/cors\` was only applied to \`/api/\*\` routes, leaving \`/healthz\` inaccessible from browser origins. The correct pattern is to register \`app.use('\*', cors({...}))\` as the first middleware before \`app.onError\` and before any route. \`allowMethods\` must include \`POST\` for webhook routes if browser preflight checks are needed. + +### Gotcha + + +* **@loreai/opencode 0.13.0–0.13.3 unusable in bun workspaces due to fastembed workspace deps**: \`@loreai/opencode\` versions 0.13.0–0.13.3 and \`@loreai/core\` 0.13.3 transitively depend on \`fastembed@^2.1.0\`, which declared unresolvable \`workspace:\*\` deps (\`gearhash-jit\`, \`@huggingface/blake3-jit\`). Bun failed with "Workspace dependency not found" during install. This was resolved in 0.13.4 — \`fastembed@2.1.0\` on npm no longer carries the broken workspace deps. Note: bun's registry cache may lag npm by minutes — a version appearing in \`npm view ... versions\` may not resolve immediately in bun. + + +* **Biome expects 2-space indentation; tabs cause CI failure**: The repo's \`biome.json\` enforces \`indentStyle: 'space'\` with \`indentWidth: 2\`. Files written with tabs will fail the \`biome check\` CI step. Fix: run \`bunx biome check --write\` (or \`node\_modules/.bin/biome check --write\`) from the repo root to auto-convert all files before committing. Also note: \`useExhaustiveDependencies\` warns when stable setter functions are listed as dependencies — suppress intentional cases with \`// biome-ignore lint/correctness/useExhaustiveDependencies: intentional\`. + + +* **bun.lock goes stale when adding packages inside a workspace subdirectory**: Running \`bun add\` inside a workspace package (e.g. \`packages/dashboard\`) updates that package's local state but may not update the root \`bun.lock\`. CI uses \`bun install --frozen-lockfile\` at the repo root, so a stale root lockfile causes build failure with "lockfile had changes, but lockfile is frozen". Fix: always run \`bun install\` from the repo root after adding any dependency, then commit the updated \`bun.lock\`. + + +* **Hono middleware registration order: /healthz bypasses app.use('\*') registered after it**: In \`handler.ts\`, \`/healthz\` is registered before \`app.use('\*', ...)\` middleware. Hono evaluates routes/middleware in registration order — any middleware registered after a matched route won't run for that route. To apply CORS (or any global middleware) to ALL routes including \`/healthz\`, register it before any route definitions, as the very first call on the \`app\` instance. diff --git a/agents/github-agent.md b/agents/github-agent.md index 50583fc..0520f7c 100644 --- a/agents/github-agent.md +++ b/agents/github-agent.md @@ -122,6 +122,7 @@ load the situation skill for the task at hand. - `pr` — create a draft PR - `mark-pr-ready` — promote draft to ready-for-review - `apply-fixes` — apply review findings as code changes + - `auto-merge` — merge small, non-disruptive PRs after checks pass ### Multi-repo investigation diff --git a/skills/apply-fixes/SPEC.md b/skills/apply-fixes/SPEC.md new file mode 100644 index 0000000..e5ac97b --- /dev/null +++ b/skills/apply-fixes/SPEC.md @@ -0,0 +1,53 @@ +# apply-fixes — Specification + +## Intent + +Turn structured review findings into minimal code changes, committed +and pushed. Used on the bot's own PRs after the `review` skill +produces findings. + +## Scope + +### In scope + +- Parsing the findings JSON array +- Implementing one finding at a time as the smallest possible change +- Running tests when available +- Loading `deslop` before committing +- Reporting which findings were addressed and which were skipped + +### Out of scope + +- Opening new PRs or branches +- Force-pushing +- Fixing findings that require architectural changes + +## Invocation + +Loaded by `review-pr` when findings exist on the bot's own PR, or +by any skill that needs to apply structured fixes. + +## Runtime contract + +### Input + +- JSON array of findings: `[{ kind, file, line, summary, suggested_fix }]` + +### Output + +- Commit SHA, list of findings addressed, list of findings skipped with reasons + +### Side effects + +- Commits and pushes to the current feature branch + +## Evaluation criteria + +- Each finding is addressed with the smallest possible change +- Skipped findings have clear, valid reasons +- Only files that were edited are staged +- The `deslop` skill runs before committing + +## Maintenance + +- The findings schema must stay in sync with the `review` skill's output format diff --git a/skills/auto-merge/SPEC.md b/skills/auto-merge/SPEC.md new file mode 100644 index 0000000..3502e46 --- /dev/null +++ b/skills/auto-merge/SPEC.md @@ -0,0 +1,57 @@ +# auto-merge — Specification + +## Intent + +Automatically merge small, non-disruptive PRs after they pass all +checks and have no review objections. Acts as the final step after +`mark-pr-ready` for trivial changes that don't need human review. + +## Scope + +### In scope + +- Verifying PR state, target branch, and CI status +- Evaluating the size gate (lines, files, sensitive paths) +- Checking for review objections and unresolved threads +- Squash-merging with branch cleanup + +### Out of scope + +- PRs that were created as ready-for-review from the start +- PRs targeting non-default branches +- PRs with review objections or unresolved threads +- PRs exceeding the size gate + +## Invocation + +Loaded by `github-agent` after `mark-pr-ready` completes, or in +response to a `pull_request.ready_for_review` webhook for a PR +that transitioned from draft. + +## Runtime contract + +### Input + +- PR number and repository context + +### Output + +- Merge confirmation comment, or a comment explaining why auto-merge was skipped + +### Side effects + +- Merges the PR (squash) and deletes the feature branch +- Uses `--auto` to respect branch protection rules + +## Evaluation criteria + +- All six preconditions are verified before merging +- The size gate correctly identifies sensitive files (CI, lockfiles, migrations, auth, public API) +- PRs that need human review are never auto-merged +- The `--auto` flag is always used (never bypasses branch protection) + +## Maintenance + +- The size gate thresholds (150 lines, 5 files) may need tuning based on experience +- New sensitive file patterns may need to be added to the size gate +- The GraphQL query for unresolved threads depends on GitHub's API schema diff --git a/skills/deslop/SKILL.md b/skills/deslop/SKILL.md index 7642eb9..f82e20c 100644 --- a/skills/deslop/SKILL.md +++ b/skills/deslop/SKILL.md @@ -16,14 +16,23 @@ slop introduced in this branch. - Extra comments that a human wouldn't add or that are inconsistent with the rest of the file (no `// loop through items`, no - `# increment counter`). + `# increment counter`, no `// Handle the error case`). - Extra defensive checks or try/catch blocks that are abnormal for that area of the codebase, especially when called from trusted/validated code paths. - Casts to `any` (or `as unknown as X`) that exist purely to silence - the type checker. + the type checker. If a type assertion is needed, use the narrowest + correct type instead. - Inline imports in Python — move to the top of the file alongside the - other imports. + other imports. Group with the appropriate import section (stdlib, + third-party, local). +- Redundant type annotations where TypeScript inference handles it + (e.g., `const x: string = "hello"` → `const x = "hello"`). +- Unnecessary `else` after `return`, `throw`, `continue`, or `break`. +- Console.log / print statements left from debugging. +- Overly verbose variable names that don't match the file's naming + convention (e.g., `isCurrentlyLoadingDataFromServer` when neighbors + use `loading`). - Any other style that's inconsistent with the file (string quote choice, brace style, trailing commas, etc.). diff --git a/skills/deslop/SPEC.md b/skills/deslop/SPEC.md new file mode 100644 index 0000000..24b9616 --- /dev/null +++ b/skills/deslop/SPEC.md @@ -0,0 +1,58 @@ +# deslop — Specification + +## Intent + +Strip AI-generated noise from the diff before committing, ensuring +the final code reads as if a human wrote it. Clean diffs get merged +faster and avoid distracting reviewers. + +## Scope + +### In scope + +- Removing unnecessary comments that restate the code +- Removing defensive try/catch blocks in trusted code paths +- Removing type casts (`any`, `as unknown as X`) that silence the checker +- Moving inline imports to the top of the file (Python) +- Removing redundant type annotations where inference handles it +- Removing unnecessary `else` after early returns +- Removing leftover debug logging (console.log, print) +- Replacing overly verbose variable names that break file conventions +- Fixing style inconsistencies with the surrounding file + +### Out of scope + +- Adding new functionality +- Refactoring code beyond slop removal +- Changing the logic or behavior of the code + +## Invocation + +Loaded as a utility skill before every commit, typically by +`resolve-issue`, `fix-ci`, `respond-to-comment`, and `apply-fixes`. + +## Runtime contract + +### Input + +- A branch with uncommitted or committed changes ahead of the default branch + +### Output + +- A 1-3 sentence summary of what was cleaned up + +### Side effects + +- Modifies files in the working tree (removes slop patterns) + +## Evaluation criteria + +- Removed comments were genuinely redundant (not meaningful documentation) +- Removed try/catch blocks were genuinely unnecessary (not protecting against real failures) +- Style consistency with the surrounding file is improved, not degraded +- No behavioral changes introduced by slop removal + +## Maintenance + +- New AI slop patterns should be added as they are observed in practice +- Language-specific patterns (Python imports, TypeScript casts) may need expansion diff --git a/skills/fix-ci/SKILL.md b/skills/fix-ci/SKILL.md index 7f6187c..61105b9 100644 --- a/skills/fix-ci/SKILL.md +++ b/skills/fix-ci/SKILL.md @@ -22,8 +22,10 @@ prefix is required for counting but the rest should read naturally. 1. Find failed runs: `gh run list --branch --status failure` 2. Read logs: `gh run view --log-failed` -3. Categorize: test failure, type/lint error, build error, snapshot - diff, flaky test, or infra issue. +3. Categorize the failure — see `references/failure-taxonomy.md` for + the full taxonomy and decision tree. Categories: test failure, + type/lint error, build error, snapshot diff, flaky test, or infra + issue. 4. Flaky? Re-run once (`gh run rerun --failed`) and stop. 5. Infra/dependency issue? BLOCKED. 6. Otherwise: make the smallest fix. Reproduce locally if possible. @@ -32,5 +34,6 @@ prefix is required for counting but the rest should read naturally. how. Write it like a teammate explaining the fix, not a status report. -Don't modify CI config unless the failure is specifically in it. -Don't bump dependency versions. Don't force-push. Don't merge. +Avoid modifying CI config unless the failure is specifically in it. +Avoid bumping dependency versions — the fix should target the code, +not the toolchain. Don't force-push. Don't merge. diff --git a/skills/fix-ci/SPEC.md b/skills/fix-ci/SPEC.md new file mode 100644 index 0000000..89dff10 --- /dev/null +++ b/skills/fix-ci/SPEC.md @@ -0,0 +1,59 @@ +# fix-ci — Specification + +## Intent + +Diagnose and fix failing CI on a PR the bot authored, with a hard +budget of 3 attempts to prevent infinite fix loops. + +## Scope + +### In scope + +- Reading CI failure logs +- Categorizing failures (test, type/lint, build, snapshot, flaky, infra) +- Making targeted fixes for test, type, lint, and build failures +- Re-running flaky tests once +- Loading `deslop` and `review` before committing fixes + +### Out of scope + +- Modifying CI configuration (unless the failure is specifically in it) +- Bumping dependency versions +- Fixing infrastructure issues (BLOCKED) +- Fixing failures unrelated to the bot's changes + +## Invocation + +Loaded by `github-agent` when a `check_suite` or `workflow_run` event +arrives with `conclusion: failure` for a PR the bot authored. +Requires `repo-setup` first. + +## Runtime contract + +### Input + +- PR number and branch (from webhook payload) +- A prepared worktree on the PR's branch + +### Output + +- A fix commit pushed to the branch, or `BLOCKED: ` + +### Side effects + +- Posts `fix-ci: attempt N` comments for budget tracking +- Commits and pushes fixes to the PR branch +- May re-run failed workflows for flaky tests + +## Evaluation criteria + +- The 3-attempt budget is respected (counted via comment prefix) +- Fixes are minimal — don't change more than necessary +- Flaky tests are re-run rather than "fixed" +- Infrastructure issues are correctly identified as BLOCKED +- The fix actually resolves the CI failure + +## Maintenance + +- The `gh run view --log-failed` output format may change across `gh` versions +- New CI failure categories may emerge (e.g., security scanning failures) diff --git a/skills/fix-ci/references/failure-taxonomy.md b/skills/fix-ci/references/failure-taxonomy.md new file mode 100644 index 0000000..98ac0c0 --- /dev/null +++ b/skills/fix-ci/references/failure-taxonomy.md @@ -0,0 +1,76 @@ +# CI Failure Taxonomy + +Categorize the failure before attempting a fix. The category +determines the response strategy. + +## Categories + +### 1. Test failure + +**Signature:** Test runner output with `FAIL`, assertion errors, +expected/actual diffs. + +**Response:** Read the failing test, understand what it asserts, check +if the test expectation is wrong (your change intentionally altered +behavior) or if the code has a bug. Fix the test if the expectation +is outdated; fix the code if the behavior is wrong. + +### 2. Type / lint error + +**Signature:** `tsc` errors (TS####), ESLint/Biome errors with rule +names, type mismatch messages. + +**Response:** Fix the type or lint issue directly. These are usually +mechanical. For lint rules you disagree with, fix the code anyway — +don't modify lint config. + +### 3. Build error + +**Signature:** Bundler/compiler errors, missing modules, import +resolution failures. + +**Response:** Check if your change broke an import path, removed an +export, or changed a file name. Fix the import/export. If the build +error is in unrelated code, note it and investigate whether it's +pre-existing. + +### 4. Snapshot diff + +**Signature:** Snapshot test failures showing before/after diffs. + +**Response:** If your change intentionally altered the output, update +the snapshot (`--update-snapshots`, `-u`, etc.). If the diff is +unexpected, investigate why the output changed. + +### 5. Flaky test + +**Signature:** The test passes on retry. The failure involves timing, +network, or random ordering. The test name may appear in known-flaky +lists. + +**Response:** Re-run once: `gh run rerun --failed`. Do not +attempt to fix the test — flaky test fixes are out of scope for a +CI-fix skill. + +### 6. Infrastructure issue + +**Signature:** Network timeouts, registry errors (`npm ERR! 503`), +Docker pull failures, runner out of disk, GitHub Actions service +degradation. + +**Response:** BLOCKED. These are transient or platform-level issues. +Post a comment noting the infrastructure failure and stop. + +## Decision tree + +``` +Is it a test failure? +├── Yes → Did your change intentionally alter behavior? +│ ├── Yes → Update the test/snapshot +│ └── No → Fix the code bug +└── No → Is it a type/lint/build error? + ├── Yes → Fix the type/lint/import issue + └── No → Is it intermittent / passes on retry? + ├── Yes → Re-run once, then stop + └── No → Infrastructure issue → BLOCKED +``` diff --git a/skills/mark-pr-ready/SPEC.md b/skills/mark-pr-ready/SPEC.md new file mode 100644 index 0000000..a7fdf5f --- /dev/null +++ b/skills/mark-pr-ready/SPEC.md @@ -0,0 +1,57 @@ +# mark-pr-ready — Specification + +## Intent + +Promote a draft PR to ready-for-review after CI passes and +self-review is clean. The `auto-merge` skill may follow for small, +non-disruptive changes, but this skill itself never merges. + +## Scope + +### In scope + +- Verifying CI status via `gh pr checks` +- Marking the PR ready with `gh pr ready` +- Requesting reviewers from CODEOWNERS +- Adding labels (`bot-generated` + priority labels from the issue) +- Posting a confirmation comment + +### Out of scope + +- Merging the PR +- Creating labels that don't exist in the repo +- Running self-review (should be done before this skill is loaded) + +## Invocation + +Loaded by `github-agent` after a `check_suite` or `workflow_run` +event with `conclusion: success` on a draft PR where the bot is +the author. + +## Runtime contract + +### Input + +- PR number and repository context +- CI must be green + +### Output + +- A confirmation comment noting the PR is ready + +### Side effects + +- Marks the PR as ready-for-review on GitHub +- May request reviewers and add labels + +## Evaluation criteria + +- CI is verified green before marking ready +- Reviewer assignment failures are silently skipped +- Label creation failures are silently skipped (don't create labels) +- The confirmation comment varies its wording + +## Maintenance + +- CODEOWNERS file location may vary (`.github/CODEOWNERS`, `CODEOWNERS`, `docs/CODEOWNERS`) +- Priority label patterns may need updating diff --git a/skills/pr/SKILL.md b/skills/pr/SKILL.md index 057608d..ac44eb0 100644 --- a/skills/pr/SKILL.md +++ b/skills/pr/SKILL.md @@ -26,11 +26,16 @@ short and to the point — not overly long or detailed. (i.e. not the repo's default branch), reuse it. Don't create a new one on top. -2. **Open the PR** with `gh pr create --draft`. Title should match the - commit subject. Body should be a 1–3 sentence summary plus a - "Testing" line if relevant — followed by the full implementation - plan inside a hidden HTML comment so reviewers can read it without - leaving GitHub but it doesn't bloat the visible description: +2. **Open the PR** with `gh pr create --draft`. Title should follow + the repo's commit convention. If the repo uses conventional commits + (check recent history with `git log --oneline -10`), use the format + `(): ` where type is `fix`, `feat`, `chore`, + `refactor`, `docs`, `test`, etc. Do not include AI-attribution + labels like `[bot]`, `[claude]`, or `[ai]` in the title. Body + should be a 1–3 sentence summary plus a "Testing" line if relevant + — followed by the full implementation plan inside a hidden HTML + comment so reviewers can read it without leaving GitHub but it + doesn't bloat the visible description: ```sh gh pr create --draft \ diff --git a/skills/pr/SPEC.md b/skills/pr/SPEC.md new file mode 100644 index 0000000..7941844 --- /dev/null +++ b/skills/pr/SPEC.md @@ -0,0 +1,55 @@ +# pr — Specification + +## Intent + +Create a draft PR with a concise description and the full +implementation plan embedded as an invisible HTML comment for +reviewer context. + +## Scope + +### In scope + +- Creating draft PRs via `gh pr create --draft` +- Writing concise PR descriptions (1-3 sentences) +- Embedding the implementation plan as a hidden HTML comment +- Linking to the originating issue via `Closes #N` + +### Out of scope + +- Marking PRs ready for review (handled by `mark-pr-ready`) +- Merging PRs +- Creating new branches (the caller handles this) + +## Invocation + +Loaded as the final step by `resolve-issue` after implementation, +testing, `deslop`, and `review` are complete. + +## Runtime contract + +### Input + +- A branch with committed and pushed changes +- The implementation plan and issue number from the calling context + +### Output + +- The PR URL printed as the final line + +### Side effects + +- Creates a draft PR on GitHub + +## Evaluation criteria + +- PR is created as a draft (not ready-for-review) +- Title follows conventional commit format when the repo uses it +- Description is concise (not bloated with redundant context) +- Implementation plan is present in the HTML comment +- Issue is linked via `Closes #N` + +## Maintenance + +- The heredoc approach for PR body is important for preserving special characters +- The `git notes` alternative is documented but not the default path diff --git a/skills/repo-setup/SPEC.md b/skills/repo-setup/SPEC.md new file mode 100644 index 0000000..f766cd2 --- /dev/null +++ b/skills/repo-setup/SPEC.md @@ -0,0 +1,54 @@ +# repo-setup — Specification + +## Intent + +Provide a safe, isolated working directory for each agent session so +concurrent sessions never overwrite each other's changes. This is the +prerequisite skill for every situation skill. + +## Scope + +### In scope + +- Cloning or refreshing the target repository +- Creating git worktrees for branch isolation +- Determining the correct branch name from issue or PR context + +### Out of scope + +- Installing dependencies or running setup scripts (the situation skill handles that) +- Making any code changes +- Pushing or committing + +## Invocation + +Loaded first by `github-agent` before every situation skill. Every +code-touching workflow starts with `repo-setup`. + +## Runtime contract + +### Input + +- Repository identifier (`owner/repo`) +- Context: issue number (for new branches) or PR number (for existing branches) + +### Output + +- A worktree at `~/dev//-wt/` ready for work +- Current directory set to the worktree + +### Side effects + +- Creates or updates the shared clone at `~/dev//` +- Creates a worktree directory on disk + +## Evaluation criteria + +- Concurrent sessions for different branches on the same repo do not interfere +- Worktree is on the correct branch (new branch from default for issues, PR head branch for PRs) +- `git status` in the worktree is clean after setup + +## Maintenance + +- If git worktree behavior changes across git versions, the commands here may need updating +- The `--depth=50` clone depth is a balance between speed and having enough history for rebasing diff --git a/skills/resolve-issue/SKILL.md b/skills/resolve-issue/SKILL.md index 0bc0c7e..4c9e641 100644 --- a/skills/resolve-issue/SKILL.md +++ b/skills/resolve-issue/SKILL.md @@ -42,14 +42,21 @@ and respond to feedback. gh repo clone / ~/dev// -- --depth=50 ``` Only push changes to the repo where the fix belongs. -4. Plan — state what you'll do before doing it. -5. Implement. Keep the diff focused. -6. **Discover and run tests** — see the test discovery section below. -7. Load `deslop` skill — clean up AI noise. -8. Load `review` skill — self-review. Fix issues and re-run deslop +4. **Investigate before implementing.** Before writing any code, you + should be able to state: "This breaks because **X**, in **Y** path, + after **Z** condition." If you cannot fill in X, Y, and Z, keep + investigating. See `references/investigation-protocol.md` for the + full protocol. +5. Plan — state what you'll change and why it addresses the root cause. +6. Implement. Keep the diff focused. +7. **Verify the change.** Select the right verification method for the + shape of the change — see `references/verification-harness.md`. + Then run the project's test suite (see test discovery below). +8. Load `deslop` skill — clean up AI noise. +9. Load `review` skill — self-review. Fix issues and re-run deslop until clean. -9. Commit with `Fixes #` in the message. Push. -10. Load `pr` skill — open a draft PR. Be explicit about what's +10. Commit with `Fixes #` in the message. Push. +11. Load `pr` skill — open a draft PR. Be explicit about what's complete vs. uncertain. ## Conflict resolution @@ -115,4 +122,6 @@ note "no test suite found" in the PR description. Don't spend more than 2 minutes on a failing test suite that's unrelated to your changes — note it and move on. -Only BLOCKED for genuine impossibility (auth, repo missing, contradictory issue). +Reserve BLOCKED for genuine impossibility (missing auth, deleted repo, +contradictory requirements). A best-effort draft PR is almost always +better than blocking. diff --git a/skills/resolve-issue/SPEC.md b/skills/resolve-issue/SPEC.md new file mode 100644 index 0000000..e0b69ce --- /dev/null +++ b/skills/resolve-issue/SPEC.md @@ -0,0 +1,59 @@ +# resolve-issue — Specification + +## Intent + +Take a GitHub issue from "assigned" to "draft PR opened" with a +focused, tested implementation. Prefer shipping a best-effort draft +over blocking on ambiguity. + +## Scope + +### In scope + +- Reading and interpreting the issue +- Checking for existing PRs that already address the issue +- Codebase exploration (including read-only cross-repo investigation) +- Planning, implementing, and testing the fix +- Self-review via `deslop` and `review` skills +- Opening a draft PR via the `pr` skill + +### Out of scope + +- Marking the PR ready for review (handled by `mark-pr-ready`) +- Merging (handled by `auto-merge`) +- Responding to review feedback after PR creation (handled by `respond-to-comment`) +- Closing the issue (GitHub auto-closes on merge via `Fixes #N`) + +## Invocation + +Loaded by `github-agent` when an issue is assigned to the bot. +Requires `repo-setup` to run first. + +## Runtime contract + +### Input + +- Issue number and repository context (from webhook payload) +- A prepared worktree (from `repo-setup`) + +### Output + +- A draft PR URL, or `BLOCKED: ` for genuine impossibility + +### Side effects + +- Commits and pushes to a feature branch +- Opens a draft PR on GitHub + +## Evaluation criteria + +- The PR addresses the issue's core request (smallest interpretation) +- Tests pass (or "no test suite found" is documented) +- The diff is focused — no unrelated changes +- Self-review produces an empty findings array before PR creation +- Existing PRs for the same issue are detected and continued rather than duplicated + +## Maintenance + +- Test discovery heuristics should be updated as new ecosystems are encountered +- The conflict resolution section assumes standard git rebase workflow diff --git a/skills/resolve-issue/references/investigation-protocol.md b/skills/resolve-issue/references/investigation-protocol.md new file mode 100644 index 0000000..ac9eb32 --- /dev/null +++ b/skills/resolve-issue/references/investigation-protocol.md @@ -0,0 +1,47 @@ +# Investigation Protocol + +Before editing any code, you should be able to articulate: + +> "This breaks because **X**, in **Y** path, after **Z** condition." + +If you cannot fill in X, Y, and Z, keep investigating. + +## Steps + +1. **Reproduce the problem.** Find the code path described in the + issue. Trace it from entry point to the failure site. If the issue + includes an error message or stack trace, locate the exact line. + +2. **Understand the current behavior.** Read the code, not just the + function — read its callers and the data flowing into it. Check + tests (if any) to see what the expected behavior was. + +3. **Identify the root cause.** Distinguish between: + - The **symptom** (what the user sees) + - The **proximate cause** (what line/condition triggers it) + - The **root cause** (why that condition exists) + + Fix the root cause when possible. Fix the proximate cause only when + the root cause is out of scope. + +4. **State the fix hypothesis.** Before writing code, state in one + sentence what you plan to change and why it addresses the root + cause. This becomes part of the commit message. + +## When investigation is blocked + +- **Missing reproduction context**: if the issue lacks enough detail + to identify the code path, ask via a PR comment (not a blocking + question — ship what you can). +- **External dependencies**: if the root cause is in a dependency or + external service, note it in the PR description and fix what's + within scope. +- **Multiple possible causes**: if you find several plausible causes, + fix the most likely one and note the alternatives in the PR. + +## Anti-patterns + +- Starting to code before understanding the failure path +- Reading only the function mentioned in the issue without checking callers +- Guessing based on function names without reading the implementation +- Fixing the symptom while leaving the root cause intact diff --git a/skills/resolve-issue/references/verification-harness.md b/skills/resolve-issue/references/verification-harness.md new file mode 100644 index 0000000..4d74b4b --- /dev/null +++ b/skills/resolve-issue/references/verification-harness.md @@ -0,0 +1,35 @@ +# Verification Harness Selection + +Match the shape of the change to the right verification method. +Use the first row that fits. + +| Change shape | Verification method | Example | +|---|---|---| +| Logic bug in a pure function | Unit test targeting the fixed path | `expect(parse("")).toBe(null)` | +| State management / data flow | Integration test with real data flow | Test the full handler, not just the helper | +| CLI command behavior | Run the actual command | `bun run cli --flag` and check output | +| HTTP endpoint | `curl` or test client against dev server | `curl localhost:3000/api/health` | +| Build / compilation | Full build | `bun run build` or `npm run build` | +| Type error fix | Type checker | `tsc --noEmit` or `bunx tsc --noEmit` | +| Lint / format fix | Linter | `bunx biome check` or project lint command | +| Config change | Smoke test the affected system | Start the server, run the workflow | +| UI component | Browser test or visual check | Storybook, Playwright, or manual | +| CI workflow change | Dry-run where possible | `act` for GitHub Actions, or push and observe | + +## Principles + +- **Run the existing test suite first.** If it passes, your change + didn't break existing behavior. If it fails on unrelated code, + note it and move on. + +- **Verify the fix, not just the absence of error.** A test that + previously crashed and now doesn't crash might be passing for the + wrong reason. Assert the expected output. + +- **Prefer the project's own verification tools.** If the repo has + `make test`, use it. Don't introduce new test infrastructure for + a single fix. + +- **Time-box verification.** Spend up to 2 minutes finding and + running tests. If the project has no test suite and the change is + simple, note "no test suite found" and move on. diff --git a/skills/respond-to-comment/SPEC.md b/skills/respond-to-comment/SPEC.md new file mode 100644 index 0000000..dafd416 --- /dev/null +++ b/skills/respond-to-comment/SPEC.md @@ -0,0 +1,56 @@ +# respond-to-comment — Specification + +## Intent + +Triage and respond to comments on PRs the bot is involved in, +implementing fixes for actionable feedback and explaining decisions +for non-actionable feedback. + +## Scope + +### In scope + +- Self-loop detection (skip own comments) +- Classifying comments as actionable vs. not actionable +- Implementing fixes for actionable comments on own PRs +- Replying with suggestions for actionable comments on others' PRs +- Explaining why a comment is not actionable + +### Out of scope + +- Approving or merging PRs +- Pushing to others' branches +- Responding to approval-only reviews (thumbs-up with no code refs) + +## Invocation + +Loaded by `github-agent` when a `pull_request_review_comment`, +`issue_comment`, or `pull_request_review` event arrives for a PR the +bot is involved in. Requires `repo-setup` first. + +## Runtime contract + +### Input + +- Comment body, author, and location (inline or top-level) +- PR number and repository context + +### Output + +- A reply comment (and optionally a fix commit with SHA) + +### Side effects + +- May push commits to own PRs +- Posts reply comments on GitHub + +## Evaluation criteria + +- Own comments are detected and skipped (no self-loops) +- Actionable feedback on own PRs results in a fix commit +- Replies are concise and natural — teammate tone, not support bot +- Approval thumbs-ups are not replied to + +## Maintenance + +- The thread API vs. top-level comment distinction depends on GitHub's API diff --git a/skills/review-pr/SPEC.md b/skills/review-pr/SPEC.md new file mode 100644 index 0000000..70fab5b --- /dev/null +++ b/skills/review-pr/SPEC.md @@ -0,0 +1,53 @@ +# review-pr — Specification + +## Intent + +Review a PR the bot is involved in, either self-fixing findings on +its own PRs or posting review feedback on others' PRs. + +## Scope + +### In scope + +- Determining PR authorship (own vs. others') +- Loading the `review` skill for structured analysis +- Applying fixes on own PRs via `apply-fixes` +- Posting review comments on others' PRs + +### Out of scope + +- Merging PRs +- Approving PRs (reserve for when correctness can be vouched for) +- Pushing changes to others' branches + +## Invocation + +Loaded by `github-agent` when a PR event involves the bot (as author, +reviewer, or assignee). Requires `repo-setup` first. + +## Runtime contract + +### Input + +- PR number and repository context +- A prepared worktree on the PR's branch + +### Output + +- A review URL (posted via `gh pr review`), or fix commits pushed to own PR + +### Side effects + +- May push commits to own PRs (via `apply-fixes`) +- Posts review comments on GitHub + +## Evaluation criteria + +- Own PR findings are fixed, not just reported +- Others' PR reviews are specific and actionable +- Review tone matches "senior dev" style — direct, no filler +- Drafts are not skipped (own drafts are reviewed for self-improvement) + +## Maintenance + +- The `explore` sub-agent delegation for large diffs should be monitored for quality diff --git a/skills/review/SKILL.md b/skills/review/SKILL.md index 0f2fa6b..38f8858 100644 --- a/skills/review/SKILL.md +++ b/skills/review/SKILL.md @@ -69,6 +69,20 @@ Keep `summary` and `suggested_fix` terse — one or two sentences each. The fix-applier reading these has the file open already; don't restate context the diff already carries. +## Confidence calibration + +Assign a confidence level to each potential finding before including +it. See `references/confidence-calibration.md` for the full table. + +- **HIGH** — you traced the code path and verified the issue. Report it. +- **MEDIUM** — the pattern is suspicious and likely wrong. Report it, + noting uncertainty in `summary` if relevant. +- **LOW** — the code looks unusual but could be intentional. Do **not** + report it. + +Bug findings should almost always be HIGH. If you can't trace the +code path to confirm the issue, it's speculation — downgrade or drop. + ## When to flag vs. not - A `style` finding only counts if the PR's *neighbouring* code uses a @@ -80,6 +94,18 @@ context the diff already carries. - Don't include "everything looks good" notes in `findings`. Empty array is the correct positive signal. +## Not a finding + +Before reporting, check `references/not-a-finding.md` for patterns +that look suspicious but are typically correct. Common examples: + +- Null checks on values that "should" be set (defensive, valid if + function is public or called from multiple sites) +- Empty catch blocks in error boundaries (framework convention) +- Trivial getters/setters (not worth a test-gap finding) +- Fixing a typo adjacent to changed code (not scope creep) +- TODO comments describing known limitations (not stale docs) + --- *Adapted from [BYK/dotskills](https://github.com/BYK/dotskills) diff --git a/skills/review/SPEC.md b/skills/review/SPEC.md new file mode 100644 index 0000000..53ec25a --- /dev/null +++ b/skills/review/SPEC.md @@ -0,0 +1,56 @@ +# review — Specification + +## Intent + +Produce a structured list of findings from the current branch's +changes, enabling downstream consumers (fix-appliers, PR reviewers) +to act on them mechanically. + +## Scope + +### In scope + +- Diffing the branch against the default branch (three-dot syntax) +- Classifying findings by kind: bug, test-gap, style, scope, docs +- Running the test suite to catch regressions +- Delegating to an `explore` sub-agent for large diffs + +### Out of scope + +- Fixing findings (handled by `apply-fixes`) +- Posting reviews on GitHub (handled by `review-pr`) +- Reviewing files outside the branch's diff + +## Invocation + +Loaded as a utility skill by `resolve-issue`, `review-pr`, `fix-ci`, +and any other skill that needs a quality check before committing. + +## Runtime contract + +### Input + +- A branch with commits ahead of the default branch + +### Output + +- A JSON block with a `findings` array (empty array = no issues) +- Each finding: `{ kind, file, line, summary, suggested_fix }` + +### Side effects + +- None (read-only analysis). May spawn an `explore` sub-agent. + +## Evaluation criteria + +- Findings are actionable — each one points to a specific file and line +- False positive rate is low (style findings require neighboring code to differ) +- Bug findings reflect real logic errors, not hypothetical edge cases +- Empty findings array is used correctly (no "everything looks good" entries) +- Confidence levels are applied appropriately (see references/confidence-calibration.md) +- Known false-positive patterns are filtered (see references/not-a-finding.md) + +## Maintenance + +- Finding categories may expand as new patterns emerge +- The three-dot diff syntax depends on the branch having a merge-base with the default branch diff --git a/skills/review/references/confidence-calibration.md b/skills/review/references/confidence-calibration.md new file mode 100644 index 0000000..d42bc03 --- /dev/null +++ b/skills/review/references/confidence-calibration.md @@ -0,0 +1,40 @@ +# Confidence Calibration + +Assign a confidence level to each finding before including it in the +output. Only report findings at HIGH or MEDIUM confidence. + +## Levels + +| Level | Criteria | Action | +|---|---|---| +| **HIGH** | You traced the code path and verified the issue exists. The finding is demonstrably wrong, not hypothetically wrong. | Report as a finding. | +| **MEDIUM** | The pattern is suspicious and likely a problem, but you haven't fully traced every code path. The surrounding code suggests this wasn't intentional. | Report as a finding. Note uncertainty in `summary` if relevant. | +| **LOW** | The code looks unusual but could be intentional. You can construct a scenario where it breaks, but it requires unlikely inputs or specific timing. | Do **not** report. | + +## Calibration guidelines + +- **Bug findings should almost always be HIGH.** If you can't trace + the code path to confirm the bug, it's not a bug finding — it's + speculation. Downgrade to MEDIUM or drop it. + +- **Test-gap findings are typically MEDIUM.** You can see the behavior + isn't tested but can't always prove it would catch a real bug class. + +- **Style findings should be HIGH** (you can see the neighboring code + uses a different convention) or dropped (it's a taste preference). + +- **Scope findings should be HIGH.** The change is demonstrably + unrelated to the stated intent, or it isn't. + +- **Docs findings are typically HIGH.** The comment is stale or it isn't. + +## Common false positive patterns + +Before reporting, check whether the "issue" is actually: + +- **Intentional defensive code.** The function is called from multiple + sites and the check is valid for some of them. +- **Framework convention.** The pattern looks unusual but is standard + for the framework (e.g., empty catch blocks in error boundaries). +- **Existing behavior, not new.** The "issue" exists in the base + branch too — the PR didn't introduce it. diff --git a/skills/review/references/not-a-finding.md b/skills/review/references/not-a-finding.md new file mode 100644 index 0000000..2d5313a --- /dev/null +++ b/skills/review/references/not-a-finding.md @@ -0,0 +1,56 @@ +# Not a Finding + +These patterns look suspicious but are typically correct. Do not +report them unless you have specific evidence they cause a problem +in this codebase. + +## By category + +### Bug — safe patterns + +- **Null/undefined checks on values that "should" be set.** If the + function is public or called from multiple sites, defensive checks + are reasonable. +- **Empty catch blocks in error boundaries** (React, Express, Hono). + The framework expects them. Only flag if the error is silently + swallowed in a non-boundary context. +- **Loose equality (`==`) in JavaScript** when comparing to `null` to + catch both `null` and `undefined`. This is an intentional pattern. +- **Re-throwing the same error.** Sometimes done to add context or + trigger a different catch block. + +### Test-gap — not worth flagging + +- **Trivial getters/setters** that delegate directly to another + property or method. +- **Type-only changes** (adding/removing TypeScript types) that don't + affect runtime behavior. +- **Log/telemetry additions** that don't change control flow. +- **Comment or documentation updates.** + +### Style — taste preferences, not findings + +- **Named vs. default exports.** Both are valid; flag only if the + file's neighbors are consistent and this one breaks the pattern. +- **Arrow functions vs. function declarations.** Same rule — only + flag if the file is inconsistent with itself. +- **Trailing commas.** Unless the formatter config enforces one way. +- **Single vs. double quotes.** Unless the formatter config enforces + one way. +- **Import ordering.** Unless an auto-sorter is configured. + +### Scope — not unrelated + +- **Fixing a typo in a comment** adjacent to the changed code. This + is a natural drive-by fix, not scope creep. +- **Renaming a variable** in the same function being modified. This + improves readability of the change. +- **Updating a type** that the changed code depends on. This is a + necessary part of the change. + +### Docs — not stale + +- **TODO comments** that describe known limitations. These are + intentional markers, not stale documentation. +- **Comments explaining "why"** (not "what"). Even if the code + changes, the reasoning may still be valid.