fix: fall back to tmpdir state when CLAUDE_PLUGIN_DATA state is empty#125
fix: fall back to tmpdir state when CLAUDE_PLUGIN_DATA state is empty#125tmchow wants to merge 6 commits intoopenai:mainfrom
Conversation
Setup commands run via Bash (no CLAUDE_PLUGIN_DATA) write state to tmpdir. Stop hooks run via Claude Code (CLAUDE_PLUGIN_DATA set) read from the persistent plugin data dir, missing the setup state entirely. resolveStateDir() now checks for existing state at the tmpdir fallback when CLAUDE_PLUGIN_DATA is set but the primary location has no state.json. Also fixes the pre-existing test failure by properly isolating env vars. Fixes openai#59
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 22d42f43ed
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Address Codex review feedback: returning the fallback tmpdir path means state stays in tmpdir forever and can be lost on tmp cleanup. Now copies all state files from the tmpdir fallback to the CLAUDE_PLUGIN_DATA path on first discovery, then always returns the persistent path. Updated test to verify migration behavior.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f59803fe31
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The fallback state directory can contain subdirectories (e.g., jobs/) which causes copyFileSync to throw EISDIR. Use fs.cpSync with recursive: true to copy the entire directory tree in one call.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e68aef3ebd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
After cpSync copies state from tmpdir to the persistent plugin data dir, job.logFile paths inside state.json still reference the old /tmp/... location. Replace all occurrences of the fallback dir path with the new primary dir path in the migrated state file.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 954bc67b71
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
On Windows, JSON stores paths with doubled backslashes (C:\\Users\\...). The plain replaceAll only matches unescaped paths. Now also replaces the JSON-escaped variant so logFile paths are rewritten correctly on all platforms.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8f6f1f95cd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Individual jobs/*.json files also contain logFile paths pointing at the tmpdir fallback. After migration, resumed jobs would read stale paths from their job file and fail with ENOENT after tmp cleanup. Extracted the path rewrite into a helper and apply it to state.json plus all .json files in the jobs/ subdirectory.
|
🦀 Assigned! I will submit a PR for this issue. Looking at the fix now. |
@D2758695161 you're clearly an automated bot who's logic is gone wrong 😑. Don't propose PRs to issues fixed by the PRs you're reading. |
The fallback path was hard-coded to '/tmp' — broken on Windows. Use os.tmpdir() so Windows and other platforms get a real tmp path. Additionally: when CLAUDE_PLUGIN_DATA is set on a later call but state was previously written to the tmpdir fallback, copy it into the plugin-data dir and rewrite absolute path references inside state.json and jobs/*.json so logFile pointers don't dangle. Prevents job history from being silently dropped when commands run under different env contexts within one Claude session. Closes #47. Port of openai/codex-plugin-cc#125.
* fix: quote $ARGUMENTS in cancel/result/status commands Unquoted $ARGUMENTS allows shell splitting on user-supplied job IDs containing metacharacters. Wrap in double quotes to match review.md and adversarial-review.md. Closes #38. Port of openai/codex-plugin-cc#168. * fix: declare model: sonnet in opencode-rescue agent frontmatter Without a model declaration the agent tier was unpredictable. The rescue subagent is a thin forwarder that invokes the companion via a single Bash call and applies trivial routing logic — sonnet is sufficient and gives users a cost guarantee. Closes #39. Port of openai/codex-plugin-cc#169. * fix: scope /opencode:cancel default to current Claude session Without ref, resolveCancelableJob now filters running jobs by sessionId so a cancel in session A cannot kill jobs in session B. Explicit ref still searches all sessions — naming a job counts as intent. Closes #45. Port of openai/codex-plugin-cc#84. * fix: enforce hard wall-clock timeout on runTrackedJob Wrap the runner with Promise.race against a 30-minute default timeout. On expiry the job transitions to failed/phase:failed so zombie 'running' rows can't accumulate when a runner hangs. OPENCODE_COMPANION_JOB_TIMEOUT_MS overrides the default. Closes #41. Port of openai/codex-plugin-cc#184. * fix: reconcile dead-PID jobs on every status read Adds isProcessAlive helper and reconcileIfDead / reconcileAllJobs / markDeadPidJobFailed in job-control. buildStatusSnapshot and the handleResult/handleCancel paths now probe kill(pid, 0) on any active-state job and rewrite dead ones to failed before consuming the list. A single /opencode:status / result / cancel surfaces stuck workers without waiting for SessionEnd. markDeadPidJobFailed is race-safe: it re-reads state and refuses to downgrade terminal states or rewrite when the pid has changed. Closes #42. Port of openai/codex-plugin-cc#176 + dead-PID parts of #184. * fix: avoid embedding large diffs in review prompts Classify review scope before building the prompt. When the diff exceeds ~5 files or ~256 KB, fall back to a lightweight context (status, changed-files, diff_stat) and tell OpenCode to inspect the diff itself via read-only git commands. Prevents HTTP 400 / shallow findings on moderate-to-large changesets. Adversarial template grows a {{REVIEW_COLLECTION_GUIDANCE}} slot. Thresholds overridable via opts.maxInlineDiffFiles/Bytes. Closes #40. Port of openai/codex-plugin-cc#179. * fix: respect \$SHELL on Windows when spawning child processes Add platformShellOption() helper that picks false on POSIX, and \$SHELL || true on win32 so Git Bash users get their shell while cmd fallback still resolves .cmd/.bat shims. Apply to runCommand, spawnDetached, resolveOpencodeBinary, getOpencodeVersion, and the ensureServer spawn of 'opencode serve'. Uses 'where' instead of 'which' on win32, and parses the first line of its CRLF-separated output. Closes #46. Port of openai/codex-plugin-cc#178. * fix: migrate tmpdir state to CLAUDE_PLUGIN_DATA + fix /tmp literal The fallback path was hard-coded to '/tmp' — broken on Windows. Use os.tmpdir() so Windows and other platforms get a real tmp path. Additionally: when CLAUDE_PLUGIN_DATA is set on a later call but state was previously written to the tmpdir fallback, copy it into the plugin-data dir and rewrite absolute path references inside state.json and jobs/*.json so logFile pointers don't dangle. Prevents job history from being silently dropped when commands run under different env contexts within one Claude session. Closes #47. Port of openai/codex-plugin-cc#125. * feat: pass last review findings to rescue automatically After a successful /opencode:review or /opencode:adversarial-review, save the rendered output to ~/.opencode-companion/last-review-<hash>.md (per repo, SHA-256 of workspace path). Add a new 'last-review' subcommand that reports availability or streams the content. rescue.md now checks for a saved review when invoked without task text and asks via AskUserQuestion whether to fix the prior findings or describe a new task. The save is best-effort — a failed persistence never fails the review itself. Closes #44. Port of openai/codex-plugin-cc#129 (simplified: logic lives in the companion script rather than an inline node -e one-liner). * feat: throttle controls for stop-time review gate Add --review-gate-max and --review-gate-cooldown flags to /opencode:setup so users can bound the review gate's spend: /opencode:setup --review-gate-max 5 /opencode:setup --review-gate-cooldown 10 /opencode:setup --review-gate-max off The stop hook now loads state before touching stdin, checks reviewGateMaxPerSession and reviewGateCooldownMinutes against the current session's reviewGateUsage entry, and allows the stop without running OpenCode when a limit would be exceeded. Usage entries older than 7 days are pruned on each successful run so state.json doesn't grow unbounded. renderSetup surfaces the configured limits. Closes #48. Port of openai/codex-plugin-cc#20. * feat: --worktree flag for isolated write-capable rescue tasks Add a disposable-git-worktree mode so /opencode:rescue --write --worktree runs OpenCode inside .worktrees/opencode-<ts> on a fresh opencode/<ts> branch instead of editing the working tree in place. Useful for exploratory runs, parallel rescues, and running against a dirty tree. Pieces: - lib/git.mjs: createWorktree / removeWorktree / deleteWorktreeBranch / getWorktreeDiff / applyWorktreePatch. Adds .worktrees/ to .git/info/exclude on first use so the dir never shows in status. - lib/worktree.mjs: session wrapper — createWorktreeSession, diffWorktreeSession, cleanupWorktreeSession (keep applies patch back, discard just removes). - opencode-companion.mjs: handleTask threads --worktree + swaps cwd + stores session data on the job record + renders a keep/discard footer. New worktree-cleanup subcommand reads the stored session and runs the keep or discard path. - agents/opencode-rescue.md, commands/rescue.md, skills/opencode-runtime: propagate --worktree through the forwarding layer. - tests/worktree.test.mjs: create, diff, keep-applies, discard, no-change no-op. Closes #43. Port of openai/codex-plugin-cc#137. * fix: address pr51 review findings * fix: keep tracked job timeout referenced * fix: address pr51 review conversations * fix: add exclusive file lock to updateState for concurrency safety updateState's read-modify-write cycle was not protected against concurrent companion processes (background worker + status/cancel handler), which could silently lose each other's writes. Acquire an exclusive lock file (state.json.lock via O_EXCL) before reading, hold it through mutation and write, release in finally. Stale locks older than 30s are evicted. Blocks up to 5s with retry. Closes the pre-existing concurrency race amplified by PR #51's dead-PID reconciliation (which adds upsertJob calls on every status read). * fix: address brownfield discovery bugs Critical/high fixes: - BUG-1: saveLastReview use copyFileSync+unlinkSync instead of renameSync (fixes Windows compatibility issue where rename fails if target exists) - BUG-2: handleTask worktree leak - wrap in try/finally to guarantee cleanup - BUG-3: State migration race - add fallback directory lock during migration - BUG-4/13: handleTaskWorker missing signal handlers for graceful shutdown Medium fixes: - BUG-5: releaseStateLock now fsyncs directory after lock removal - BUG-11: Error from getConfiguredProviders now logged instead of swallowed Low fixes: - BUG-6: PR number validation now rejects negative values - BUG-7: getBundledConfigDir checks directory exists before returning - BUG-8: tailLines now properly filters empty lines after split - BUG-9: resolveReviewAgent always returns tools property (undefined if not used) - BUG-10: Diff retrieval failure now logs warning instead of silent swallow - BUG-12: resolveOpencodeBinary now handles spawn errors properly Additional pre-existing work included: - safe-command.mjs wrapper for secure command execution - Command documentation updates - Test improvements * fix: polish pr51 follow-up fixes * fix: address Copilot PR#51 review comments Four findings, all valid: C1 (prompts.mjs, git.mjs, process.mjs) — buildReviewPrompt materialized the full diff string before checking thresholds. For huge diffs the git/gh subprocess could OOM the companion before the size check ran. Fix: runCommand gains maxOutputBytes, killing the child and reporting overflowed once the cap is exceeded. getDiff and getPrDiff thread maxBytes through. buildReviewPrompt now bounds the read at maxBytes+1 and treats overflow as over-byte-limit without ever materializing the rest. C2 (git.mjs) — getDiffByteSize had a docstring claiming it avoided streaming the full contents, but the implementation did exactly that. It was also dead code (zero callers). Removed. C3 (tests/state-lock.test.mjs) — the test injected path.resolve(...) into a generated ESM import specifier. On Windows that path contains backslashes and a drive letter, producing an invalid module specifier. Fix: pathToFileURL(...).href for the injected specifier. C4 (tests/dead-pid-reconcile.test.mjs) — beforeEach mutated the object returned by loadState() without saving it back, leaving on-disk state from earlier tests intact. Fix: saveState(workspace, { config:{}, jobs:[] }). Adds coverage: - tests/process.test.mjs: runCommand overflow path and non-overflow path. - tests/review-prompt-size.test.mjs: bounds-huge-diff end-to-end test that writes a 50k-byte file and asserts fewer than 10k 'x' chars land in the prompt. All 167 tests pass. * fix: keep reading fallback state while migrate lock is held Addresses a Codex P1 on PR#51: when another migrator holds \`primaryDir.migrate.lock\`, migrateTmpdirStateIfNeeded waits up to 2s and then returns without copying anything. Before this fix, stateRoot still returned primaryDir — but primary/state.json didn't exist yet, so loadState returned an empty state and the next upsertJob created primary/state.json with only the new entry, orphaning every seeded fallback job. Fix: after a migration attempt, if primary/state.json is absent and fallback/state.json is present, stateRoot returns the fallback dir. Reads see real data, writes land in fallback, and a later migration retry can pick them up cleanly. Adds a regression test that pre-creates the migrate lock, seeds fallback with a job, switches to CLAUDE_PLUGIN_DATA, and verifies that stateRoot falls back, loadState sees the seeded job, and a subsequent write preserves both the seeded and the in-flight rows. The symlink-refusal test had to be updated because it was reusing stateRoot to name the "primary" dir — with the new fallback guard, that call now returns fallbackDir on failed migration. The test now computes the expected primary path directly.
Summary
/codex:setup --enable-review-gatewritesstopReviewGate: trueto a temp directory becauseCLAUDE_PLUGIN_DATAisn't set when commands run via Bash. The Stop hook then reads from the persistent plugin data directory (whereCLAUDE_PLUGIN_DATApoints) and never finds the config.Changes
resolveStateDir()inplugins/codex/scripts/lib/state.mjsnow checks both locations whenCLAUDE_PLUGIN_DATAis set. If nostate.jsonexists at the plugin data path but one exists in the tmpdir fallback, it returns the tmpdir path. New state still defaults to the plugin data directory.Also fixes a pre-existing test failure in
state.test.mjswhere the tmpdir assertion failed because the test didn't isolateCLAUDE_PLUGIN_DATA. Added a new test covering the fallback behavior.Testing
npm test: 62 pass, 3 fail (pre-existing in commands.test.mjs, unrelated to this change. Was 4 failures on main, this PR fixes 1.)npm run build: passesCLAUDE_PLUGIN_DATAis found whenCLAUDE_PLUGIN_DATAis later setThis contribution was developed with AI assistance (Codex).
Fixes #59