Skip to content

fix: rivet binary path resolution + drop fragile .issues guard#29

Merged
avrabe merged 1 commit intomainfrom
fix/binary-path-and-review-guard
Apr 27, 2026
Merged

fix: rivet binary path resolution + drop fragile .issues guard#29
avrabe merged 1 commit intomainfrom
fix/binary-path-and-review-guard

Conversation

@avrabe
Copy link
Copy Markdown
Contributor

@avrabe avrabe commented Apr 27, 2026

Two production bugs surfaced by today's live test

Bug 1 — rivet binary looked up in the wrong cwd

Logs at 20:14:31 UTC: `err: 'spawn data/rivet/rivet ENOENT'`. `runRivetOracle` calls `execFile(binary, args, { cwd: repoPath })` where `repoPath` is the freshly-extracted PR tarball. A relative `binary_path` is resolved against THAT cwd, not the temper deploy root. Node looks for `/data/rivet/rivet` instead of `/opt/temper/data/rivet/rivet`.

Fix: resolve a relative path against `__dirname/..` before passing to the oracle.

Bug 3 — `/review-pr` silently swallowed

The `issue_comment.created` handler had:
```js
if (!comment?.body || !context.octokit?.issues) return;
```
Same Probot v14 issue as PR #22: `context.octokit.issues` is sometimes `undefined` for this event type. The guard then silently kills every ChatOps command. Today's `/review-pr` on rivet#213 and temper#28 hit this — webhook arrived, handler bailed in 1 ms, no log.

Fix: drop the `.issues` portion of the guard. Downstream calls that work today keep working; one that fails throws into `onError` instead of vanishing.

Test plan

  • 766 tests pass, eslint clean
  • After merge + self-update: comment `/review-pr` on rivet#213 — should see a "🔍 AI review in progress..." within seconds.
  • Logs show `rivet validate` executing (no ENOENT). Oracle findings prepend the review.

Risk & rollout

  • Risk: low. Both changes are narrow. Guard removal restores the contract the rest of the chatops already depend on; binary-path resolve is additive (only fires for relative paths).
  • Rollout: self-update on merge.

🤖 Generated with Claude Code

## Two production bugs surfaced by today's live test

### Bug 1 — rivet binary path looked up in the wrong directory
Logs from netcup at 20:14:31 UTC:

  err: "spawn data/rivet/rivet ENOENT"
  msg: "rivet validate failed to spawn"

\`runRivetOracle\` calls \`execFile(binary, args, { cwd: repoPath })\` where
\`repoPath\` is the freshly-extracted PR tarball. A *relative* binary_path
is then resolved against THAT cwd — not the temper deploy root. So Node
looked for \`<tmpdir>/data/rivet/rivet\` and never finds the real binary at
\`/opt/temper/data/rivet/rivet\`.

Fix: in \`src/ai-review.js\`, resolve a relative \`binary_path\` against
\`__dirname/..\` (the temper repo root) before passing it to the oracle.
Absolute paths are passed through unchanged.

### Bug 3 — \`/review-pr\` silently dropped on every comment
The \`issue_comment.created\` handler had a defensive early-exit:

    if (!comment?.body || !context.octokit?.issues) {
      return;
    }

In Probot v14, \`context.octokit.issues\` is sometimes \`undefined\`
(same root cause as PR #22 but at a different call site). When that
happens, the entire ChatOps system silently breaks — every \`/review-pr\`,
\`/configure-repo\`, \`/sync-all-repos\` etc. evaporates with no log,
no comment, no error. Today's \`/review-pr\` triggers on rivet#213 and
temper#28 hit exactly this.

Fix: drop the \`.issues\` portion of the guard. Keep \`!context.octokit\`
because we genuinely need an octokit. The downstream \`.issues.X\` calls
that already work (e.g., the existing \`/configure-repo\` ChatOps in
production) keep working; if any individual one fails, it'll throw and
the Probot \`onError\` handler logs it — *much* better signal than silent
skipping every command.

## Test plan
- [x] All 766 tests pass
- [x] eslint clean
- [ ] After merge + self-update: comment \`/review-pr\` on a recent PR.
      The bot should post a "🔍 AI review in progress..." comment within
      a couple of seconds (whereas today nothing appears at all).
- [ ] Trigger \`/review-pr\` on a rivet-instrumented PR. Logs should now
      show successful \`rivet validate\` execution (no more ENOENT). If
      the binary path is correct, the oracle's findings prepend the
      review comment.

## Risk & rollout
- Risk: low. Two narrow changes. Path resolution is purely additive (new
  branch only fires when binary_path is relative). Guard removal restores
  behaviour to what the rest of the org's chatops have been depending on.
- Rollout: self-update on merge.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@avrabe avrabe merged commit b3f6170 into main Apr 27, 2026
5 checks passed
@avrabe avrabe deleted the fix/binary-path-and-review-guard branch April 27, 2026 04:10
avrabe added a commit that referenced this pull request Apr 27, 2026
## Why
PR #29 deployed and removed the early-exit guard that silently swallowed
chatops, but at app.js:644 the very next line was:

    const workingComment = await context.octokit.issues.createComment({...});

Logs from netcup at 04:39 UTC after retriggering /review-pr on rivet#213:

    err: TypeError: Cannot read properties of undefined (reading 'createComment')
        at file:///opt/temper/src/app.js:644:59
        at issue_comment processWebhook

Same root cause as PR #22 (ai-review.js) — Probot v14's `context.octokit.issues`
is undefined for issue_comment.created webhooks too. The PR #22 fix only
covered ai-review.js; app.js had 28 more `.issues.X` call sites that all
throw the moment they're hit.

## What
- New `issueOps` helper at the top of app.js wraps the four methods that
  app.js actually uses (`createComment`, `updateComment`, `deleteComment`,
  `create`/`createIssue`) with `octokit.request(<route>, args)` calls.
- `replace_all` sweep replaced all 28 call sites:
    `context.octokit.issues.X(args)` → `issueOps.X(context.octokit, args)`
- Updated `createMockOctokit()` to dispatch `.request(route, params)` into
  the same namespaced jest.fn()s as before, so existing test assertions
  on `octokit.issues.createComment` etc. keep working unchanged.
- Three /generate-dependabot tests previously used
  `request.mockResolvedValue({...})` which clobbered the dispatcher;
  changed to `mockImplementationOnce` so dispatch survives.
- E2e webhook test: removing the guard caused the issue_comment.created
  handler to actually run; with fake test credentials it hits Probot's
  installation token fetch and returns 401. Added 401 to the accepted
  status list (alongside the existing 200/202/500).

## Test plan
- [x] All 773 tests pass (was 766; net +7 from PR #30's schema work)
- [x] eslint clean
- [ ] After deploy: `/review-pr` on rivet#213 should now post the
      "🔍 AI review in progress..." comment within seconds. Then either
      finish with a real review (oracle + model) or update with an error
      message — but no longer silently die.
- [ ] All other chatops (`/configure-repo`, `/sync-all-repos`,
      `/check-config`, etc.) keep working.

## Risk & rollout
- Risk: medium. Sweeping mechanical change across app.js. Mitigated by:
  - the 28 sites are all the same shape (object arg → object arg)
  - the `request()` form is what every other callsite in the codebase
    already uses successfully
  - 773 tests including the previously-broken e2e
- Rollout: self-update on merge.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant