Skip to content

fix(multi_review): route filesystem ops through the gateway container#32

Merged
samestrin merged 6 commits into
mainfrom
fix/multi-review-container-fs
May 13, 2026
Merged

fix(multi_review): route filesystem ops through the gateway container#32
samestrin merged 6 commits into
mainfrom
fix/multi-review-container-fs

Conversation

@samestrin
Copy link
Copy Markdown
Owner

Summary

PR #30 pre-computed the diff so reviewers only had to cat it — correct intent, but smoke-testing against a real sprint showed all six reviewers reporting the diff file missing, including the strong ones (kai, mira) that actually ran ls -la <diffPath> and got "No such file or directory".

Root cause: the openclaw-gateway container has no /tmp bind mount (confirmed via docker inspect). Container /tmp is overlay-only. Every prior multi_review run since #29 wrote the bundle, clone, and diff to the host's /tmp, where reviewers running inside the container could never see them. The "0-findings" runs from earlier "successful" reviews were reviewing stale orphan clones (HEAD 43c9840 vs requested 9326759) placed in the container by some manual openclaw setup.

Fix

Route every filesystem operation through docker exec, so the bundle, clone, and diff all land inside the container's /tmp:

Step Command
1 ssh <host> -- docker exec <c> mkdir -p <container workdir>
2 ssh <host> -- mkdir -p <host staging>
3 scp bundle.git <host>:<host staging>/bundle.git
4 ssh <host> -- docker cp <host staging>/bundle.git <c>:<container workdir>/bundle.git
5 ssh <host> -- docker exec <c> 'cd <workdir> && git clone -q bundle.git <repo>'
6 ssh <host> -- docker exec <c> 'git -C <repo> diff <base>..<head> > diff.txt && wc -c && wc -l'

Cleanup is two-stage: docker exec rm -rf <container workdir> + ssh rm -rf <host staging>. Both honor --skip-cleanup. Defers are registered before ShipBundle so partial-state failures don't leak overlay-filesystem space inside the container.

New / changed APIs

  • New ContainerExec helper (internal/support/multireview/container.go) — mirrors SSHRun but wraps ssh <host> -- docker exec <c> sh -c "$(echo <b64> | base64 -d)". Base64 round-trip removes all quoting concerns.
  • ShipBundleParams gains GatewayContainer (defaults to openclaw-gateway) and HostStagingDir (required).
  • PreComputeDiffParams gains GatewayContainer.
  • ShipBundleResult.RemoteBundlePath renamed to HostStagingBundlePath (the on-disk bundle lives on the host, not in the container). RemoteRepoPath is unchanged in name but now returns a container-local path.

Test plan

  • go test ./internal/support/multireview/... — 9 PreComputeDiff cases, 11 ShipBundle cases (including new container-mkdir-fails, docker-cp-fails, container-clone-fails, container-path-returned), 8 ContainerExec cases (build, defaults, special chars, timeout, etc.). 83.9% coverage on the package.
  • go test ./internal/support/commands/... — 11 multi_review cases (4 new: GatewayContainerFlagsThreaded, HostStagingDirThreadsToShipBundle, BothCleanupsFire, SkipCleanupSkipsBoth) plus CleanupFiresOnShipBundleFailure from the adversarial pass. 80.7% coverage; runMultiReview at 94.3%.
  • All subprocess calls mocked via execCommandContext swap and the new sshRunFn / containerExecFn package vars. No real SSH, no real docker, no real openclaw, no campaigns.
  • go vet, gofmt -l ./internal/, go install ./cmd/llm-support all clean. --help output reviewed.
  • Manual smoke test post-merge against yamtorah.estr.in/sprint/1.1.2-I_tier_c_classifier_microservice:
    rm -rf .planning/sprints/active/1.1.2-I_tier_c_classifier_microservice/code-review/multi-agent
    llm-support multi_review \
      --reviewers bruce,greta,kai,mira,dax,otto \
      --serial-reviewers dax,otto \
      --repo ~/Documents/GitHub/yamtorah.estr.in \
      --base 10054c8 --head 9326759 \
      --openclaw-host samestrin@nucleus.lan \
      --output-dir .planning/sprints/active/1.1.2-I_tier_c_classifier_microservice/code-review/multi-agent
    Pass criteria: 5+/6 reviewers status: ok with tdLineCount > 0; multi-review-summary.json.totalFindings > 10; no reviewer reply contains "clone missing" / "directory doesn't exist" / "refs not found"; container's /tmp/multi-review-<ts>/ is removed after run; host's /tmp/multi-review-staging-<ts>/ is removed.

Adversarial review notes

Walked through 15 scenarios including container/host ordering, base64 transport, docker cp container:path shell parsing, container restart mid-run, parallel reads of diff.txt, partial-state leaks. The cleanup-deferred-after-ship bug (pre-existing from PR #29) was found and fixed.

Files changed

New

  • internal/support/multireview/container.go + _test.go

Modified

  • internal/support/multireview/bundle.go + _test.go
  • internal/support/multireview/diff.go + _test.go
  • internal/support/commands/multi_review.go + _test.go
  • docs/llm-support-commands.md
  • CHANGELOG.md

samestrin added 6 commits May 13, 2026 15:11
…he gateway container

Wraps:  ssh <host> -- docker exec <c> sh -c "$(echo <b64> | base64 -d)"

The base64 round-trip removes all quoting concerns for the caller's
command. Mirrors SSHRun semantics: non-zero exit is data, not error;
returned error means the call could not be executed.

8 unit tests via the execCommandContext swap. No real SSH, no real
docker. 100% coverage of the new function.

Needed because the openclaw-gateway container has no /tmp bind mount,
so prior runs (PR #29, PR #30) wrote bundles and diffs to host /tmp
where reviewers running inside the container could never see them.
Subsequent commits will route ShipBundle and PreComputeDiff through
this helper.
The openclaw-gateway container has no /tmp bind mount, so the previous
sequence (mkdir + scp + clone, all on host) created files reviewers
could never see. New sequence:

  1. ssh -- docker exec mkdir <container workdir>
  2. ssh -- mkdir <host staging dir>
  3. scp bundle.git -> host staging
  4. ssh -- docker cp <host staging>/bundle.git <container>:<workdir>/
  5. ssh -- docker exec git clone (inside container workdir)

ShipBundleParams gains:
- GatewayContainer (defaults to openclaw-gateway)
- HostStagingDir (required; scp landing pad)

ShipBundleResult.RemoteRepoPath now returns a container-local path.
RemoteBundlePath was renamed to HostStagingBundlePath since the bundle
on disk lives on the host, not the container.

7 bundle tests verify the 5-call sequence, container defaults,
field-required validations, and per-step failure modes
(container-mkdir fail, docker-cp fail, container-clone fail). The
RemoteRepoPath-is-container-path invariant is asserted directly.

multi_review_test.go mock updated to the renamed field. All existing
tests still pass.
git diff + wc now run inside the openclaw-gateway container via
ContainerExec, so the resulting diff.txt lands in the container's
overlay /tmp where reviewers can actually see it. Writing to the
host's /tmp (PR #30's approach) put the file in a filesystem the
container couldn't read.

PreComputeDiffParams gains GatewayContainer (defaults to
"openclaw-gateway"). Inner pipeline shape and wc parsing are
unchanged.

2 new tests:
- TestPreComputeDiff_RoutedThroughContainer: outer ssh contains
  'docker exec', container name, and 'base64 -d'; decoded inner
  is still 'git ... diff > diff.txt && wc -c && wc -l'
- TestPreComputeDiff_DefaultsContainerName: empty GatewayContainer
  defaults via ContainerExec

Existing tests updated: HeadRefDefaultsToHEAD now decodes the
base64 inner payload to find the quoted 'HEAD' token (the outer
ssh args no longer contain it in clear text).

Coverage 91.3% on PreComputeDiff.
…n two

runMultiReview now passes:
- GatewayContainer into BOTH ShipBundleParams and PreComputeDiffParams
- HostStagingDir into ShipBundleParams as /tmp/multi-review-staging-<ts>

Cleanup is now two defers (LIFO):
1. ContainerExec rm -rf <container workdir>  via containerExecFn
2. SSHRun        rm -rf <host staging dir>   via sshRunFn

Both honor --skip-cleanup. The new sshRunFn and containerExecFn
package vars let tests assert on cleanup commands deterministically.

4 new tests:
- GatewayContainerFlagsThreaded: --gateway-container value reaches both
  ShipBundle and PreComputeDiff (different default name verifies the
  thread isn't just defaulting both ends)
- HostStagingDirThreadsToShipBundle: HostStagingDir is set, has the
  /tmp/multi-review-staging- prefix, and differs from RemoteWorkdir
- BothCleanupsFire: on success, container rm -rf <workdir> and host
  rm -rf <staging> both run
- SkipCleanupSkipsBoth: --skip-cleanup suppresses both channels

Coverage on runMultiReview: 94.3%.
…LOG entry

docs/llm-support-commands.md: replaces the pre-compute-step section
with a 6-step container-routed pipeline (mkdir container + mkdir host
staging + scp + docker cp + git clone in container + diff pre-compute
via docker exec). Adds failure-semantics rows for the new container
failure modes.

CHANGELOG.md: new Unreleased fix entry under llm-support describing
the host/container filesystem boundary bug and the ContainerExec /
ShipBundle / PreComputeDiff refactor. Includes the breaking rename
of ShipBundleResult.RemoteBundlePath -> HostStagingBundlePath.
Adversarial review surfaced a pre-existing (PR #29) bug: if ShipBundle
fails partway (e.g. container mkdir succeeds, then docker cp fails),
the container workdir leaks because cleanup defers were registered
only AFTER ShipBundle returned success.

Fix: register both cleanup defers BEFORE the ShipBundle call. rm -rf
on a path that was never created is a harmless no-op, so this is safe.

New test TestMultiReview_CleanupFiresOnShipBundleFailure verifies
both cleanup channels fire even when ShipBundle returns an error.

Also cleaned up dead code in TestShipBundle_ContainerCloneFails (the
test had an unused first execCommandContext mock immediately
overwritten by a second). And tightened the --help text to mention
container routing and clarify the diff lives inside the container,
not on the remote host.
@samestrin samestrin merged commit 1bb5d1a into main May 13, 2026
8 checks passed
@samestrin samestrin deleted the fix/multi-review-container-fs branch May 13, 2026 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant