feat(deps): add Ollama service dependency with provisioning and cache support by dpup · Pull Request #238 · majorcontext/moat

dpup · 2026-03-17T03:01:13Z

Summary

Adds Ollama as a service dependency following the established postgres/mysql/redis pattern
Introduces generic provisioning primitives (provisions_key, provision_cmd, cache_path) on ServiceDef — reusable by any future service
Agent declares ollama in dependencies: and lists models under services.ollama.models; moat starts sidecar, pulls models, caches on host, injects MOAT_OLLAMA_* env vars
Includes code review fixes: provision item validation (prevents shell injection), per-item timeouts, per-item flock scope, tightened cache permissions

Test plan

🤖 Generated with Claude Code

claude · 2026-03-17T03:01:30Z

📦 Previous Review (superseded by newer review)

Claude finished @dpup's task in 3m 44s —— View job

Code Review

Gather context and understand PR scope
Review core provisioning logic (internal/run/services.go)
Review config changes (internal/config/config.go)
Review container changes (docker/apple service files)
Review registry and types changes
Review tests
Review docs and examples
Review miscellaneous files (todos/, compound-engineering.local.md)
Write up review summary

Summary

The overall design is clean and follows the existing postgres/mysql/redis pattern well. Shell injection is correctly addressed with a strict allowlist regex, the cache/provision fields are well-factored as generic ServiceDef primitives, and the tests are thorough. A few real bugs and process concerns need attention.

Bugs

1. Scalar models: value is silently swallowed — medium severity

In config.go:196, when a user writes a scalar (models: qwen2.5:7b) instead of a list, the unmarshaler captures it as Extra["models"] = nil. Then in services.go:178, provisions = userSpec.Extra["models"] resolves to nil, and the loop never validates or errors. The user gets no models pulled and no diagnostic.

// config.go:190-197
} else {
    // Capture unknown non-sequence keys as single-element entries
    // so the run layer can validate and reject them with useful errors.
    if s.Extra == nil {
        s.Extra = make(map[string][]string)
    }
    s.Extra[key] = nil // nil signals "key present but not a list"
}

The run layer never actually validates this nil signal for a known provisions key. It needs an explicit check:

// services.go — after retrieving provisions
provisions = userSpec.Extra[spec.Service.ProvisionsKey]
// BUG: if provisions is nil but key exists, the user wrote a scalar — error here

Fix this →

2. Misleading comment in provisionService — low severity but will mislead future readers

services.go:278:

// Acquire lock per-item so parallel runs pulling different items don't serialize.

This comment is incorrect. The lock file is a single file under cfg.CacheHostPath, shared by all concurrent moat run processes. Two runs pulling different models still serialize — they both wait on the same .lock. The per-item granularity only helps within a single run (prevents one run's models from blocking each other). This is already tracked as todo/005 but the comment should accurately describe current behavior.

Fix this →

Files that should not be committed

3. compound-engineering.local.md — this file appears to be local tooling configuration for a "compound engineering" review agent system. It has .local.md in the name, suggesting it's a local-only file. It has no value for contributors and will confuse anyone who clones the repo. Fix this →

4. todos/ directory (9 files) — these are code-review work items generated by the compound engineering system. They shouldn't live in the repository:

They duplicate what GitHub Issues provide, without the notifications, assignment, or searchability
They'll create noise for contributors and in git log
They reference ephemeral implementation state ("P1", "pending") that goes stale

Convert these to GitHub issues (or close them if already addressed) and remove the directory. Fix this →

Minor issues

5. cancel() without defer in provisioning loop (services.go:276–296)

The context cancel is called explicitly on every exit path, which works, but any future added return path before the cancel() call will silently leak the context. The idiomatic fix is to extract the loop body into a named helper and use defer cancel() there.

for _, cmd := range cmds {
    cmdCtx, cancel := context.WithTimeout(ctx, provisionTimeout)
    // ... cancel() called explicitly on each exit path
    // if a new return is added above cancel(), context leaks
}

6. Example moat.yaml is missing the agent field (examples/service-ollama/moat.yaml)

The agent field is declared without omitempty in the Config struct, and other examples (presumably) include it. This example uses command: to run a shell script directly, which may be intentional, but it's worth confirming whether moat run requires agent: or if command: alone is sufficient.

What's good

Injection defense: The validProvisionItem regex (^[a-zA-Z0-9][a-zA-Z0-9._:/@-]*$) correctly blocks ;, |, $, `, &, space, and other shell metacharacters. Validation happens before any shell execution.
Generic design: provisions_key / provision_cmd / cache_path in ServiceDef are cleanly reusable for any future service that needs post-start provisioning.
No-auth services: The needsPassword guard properly skips password generation for Ollama, and the tests verify this.
Per-item flock: Acquiring/releasing flock per model (rather than holding it for the full batch) is the right direction, even if cross-run serialization remains a todo.
Test coverage: The test file covers injection attempts, valid names, no-auth behavior, unknown key rejection, and env generation for Ollama. Good breadth.
Error messages: The "did you mean?" hint for wrong provision key names is a nice touch.
Cache permissions: 0o700 is tight and correct.

- Validate scalar provision keys (e.g., `models: value` instead of list) with clear "must be a list" error message - Fix misleading flock comment — parallel runs still serialize on shared lock file; extracted provisionItem helper with defer cancel() - Remove compound-engineering.local.md (local tooling, not for repo) - Remove todos/ directory (tracked as GitHub issues instead)

claude · 2026-03-17T03:08:48Z

📦 Previous Review (superseded by newer review)

Claude finished @dpup's task in 3m 7s —— View job

Code Review

Gather context and read changed files
Review core provisioning logic (internal/run/services.go)
Review config changes (internal/config/config.go)
Review container changes (docker/apple service files)
Review registry and types changes
Review tests
Review docs and examples
Write up review summary

The previous review issues (scalar value detection, misleading lock comment, todos/ and compound-engineering.local.md cleanup) have all been addressed. The core implementation is solid. A few remaining issues below.

Bugs

1. Docs security model now inaccurate — docs/content/guides/08-services.md:246

**Authenticated** — Every service requires a password, even on the isolated network.

This is no longer true with Ollama. Needs a carve-out (e.g., "Services that support authentication require a password...") or just remove the absolute claim.

Fix this →

2. Readiness check table is misleading — docs/content/guides/08-services.md:192

| `ollama` | `ollama list` + pull declared models |

Model pulling is NOT part of the readiness check — it runs after readiness passes. CheckReady only runs ollama list. The provisioning step is a separate phase. This will mislead users trying to understand why startup takes a long time ("it's ready, why is it still pulling?").

Fix this →

Minor issues

3. Startup lifecycle missing provisioning step — docs/content/guides/08-services.md:154–166

The documented startup sequence goes:

5. Run readiness checks (poll every 1 second, timeout after 30 seconds)
6. Inject MOAT_* environment variables

For Ollama the actual sequence is 5 → pull models → 6. Since model pulls can take minutes, the missing step is the most user-visible part of the flow and should be documented (perhaps as a conditional step: "5b. Run provisioning commands (e.g., pull models)").

Fix this →

4. Model pull output bypasses run logs — internal/run/manager.go:1968

if err := provisionService(ctx, svcMgr, info, svcConfigs[i], os.Stderr); err != nil {

provisionService writes to os.Stderr directly, so model pull progress is not captured in ~/.moat/runs/<id>/logs.jsonl. A user running moat logs <run-id> after a failed pull won't see the pull output. If the intent is to show live progress during the pull, consider also routing through the storage.LogWriter (tee to both). Low priority since this is logged to the terminal during the run, but worth tracking.

5. provisionItem comment references todo/005 — internal/run/services.go:293

// See todo/005 for narrowing to per-model locks.

The todos/ directory was removed per the previous review. This reference is now a dead link. Either replace with a GitHub issue reference or just drop the "See todo/005" clause.

Fix this →

What looks good

All previous review feedback addressed: scalar value detection, lock comment accuracy, todos/ and compound-engineering.local.md removed.
provisionItem flock + defer: LIFO defer ordering is correct — LOCK_UN fires before cancel(). Each item gets a fresh 30-minute window. No context leak.
serviceUsesPasswordPlaceholder: Correctly handles Redis-style auth (no password_env, but {password} in extra_cmd/readiness_cmd).
Injection defense unchanged and correct: validProvisionItem blocks all shell metacharacters; validation happens before any exec.
Empty list handled correctly: models: [] → empty Extra["models"] → no provisioning, no error.
Container implementations consistent: Docker and Apple ProvisionService both fail-fast on non-zero exit.
Test breadth: injection tests, valid names, no-auth behavior, scalar rejection, unknown key rejection, env generation. Very thorough.

- Fix readiness table: Ollama check is just "ollama list", model pulling is a separate provisioning step - Add provisioning step (step 6) to startup sequence docs - Fix security model docs: not all services require passwords (Ollama has no auth, relies on network isolation) - Remove dead todo/005 reference from provisionItem comment

claude · 2026-03-17T03:15:25Z

📦 Previous Review (superseded by newer review)

Claude finished @dpup's task in 3m 5s —— View job

Code Review

Gather context and understand PR scope
Review core provisioning logic (internal/run/services.go)
Review config changes (internal/config/config.go)
Review container changes (docker/apple service files)
Review registry and types changes
Review tests
Review docs and examples
Write up review summary

Both previous rounds of feedback have been addressed. The implementation is solid. Two remaining issues below; one is a real behavior bug, one is a documentation gap.

Bugs

1. wait: false silently skips model provisioning — medium severity

manager.go:1946:

if !wait {
    continue  // skips BOTH readiness check AND the provisioning block below
}

If a user declares models under services.ollama.models and also sets wait: false, moat starts the agent immediately and never pulls the models. There's no warning and no error — the agent just finds Ollama running with no models available.

For databases, skipping both readiness and provisioning on wait: false is fine (there's nothing to provision). For Ollama it's a silent failure mode. The model pull logically depends on readiness anyway (you can't ollama pull until the server accepts connections), but the user doesn't know this.

Minimum fix: detect the combination and return an error or warning:

if !wait {
    if svcConfigs[i].ProvisionCmd != "" && len(svcConfigs[i].Provisions) > 0 {
        return nil, fmt.Errorf("%s: wait: false is incompatible with declared models — "+
            "models cannot be pulled until the service is ready", dep.Name)
    }
    continue
}

Alternatively, document it in the wait: false section of the guide and reference docs.

Fix this →

Documentation

2. MOAT_OLLAMA_HOST and MOAT_OLLAMA_PORT missing from reference docs — minor

docs/content/reference/02-moat-yaml.md documents only MOAT_OLLAMA_URL in the Ollama env var table, but the code also injects MOAT_OLLAMA_HOST and MOAT_OLLAMA_PORT (from generateServiceEnv — every service gets _HOST and _PORT). Postgres, MySQL, and Redis all document their full variable sets. The Ollama table should list all three:

Variable	Description	Example
`MOAT_OLLAMA_HOST`	Service hostname	`ollama`
`MOAT_OLLAMA_PORT`	Service port	`11434`
`MOAT_OLLAMA_URL`	Base URL for the Ollama API	`http://ollama:11434`

Fix this →

Minor nit

demo.sh truncates JSON mid-stream — examples/service-ollama/demo.sh:16

| head -c 500

head -c 500 cuts at a byte boundary, not a JSON boundary. The demo output will show broken JSON for any non-trivial response. Use | python3 -c "import sys,json; d=json.load(sys.stdin); print(d['response'])" or just drop the truncation — the response field value is what users actually want to see.

What's good

All round 1 and round 2 feedback addressed: scalar value detection, lock comment, todos/ removal, docs security model, readiness table, provisioning step in startup sequence.
provisionItem defer ordering correct: LIFO means LOCK_UN fires before cancel() — correct.
Empty models list handled: models: [] → no provisioning, no error.
serviceUsesPasswordPlaceholder: correctly handles Redis; Ollama correctly gets no password.
Shell injection defense solid: validProvisionItem regex blocks all metacharacters, validated before exec.
Tests comprehensive: injection attempts, valid names, scalar rejection, unknown key rejection, no-auth behavior, env generation, provision cmd building.
Docs are accurate (modulo issue Add Claude Code GitHub Workflow #2 above): startup sequence, readiness table, security model all updated correctly.

- Error on wait: false with provisioned items (models can't be pulled until service is ready — silent failure before this fix) - Add MOAT_OLLAMA_HOST and MOAT_OLLAMA_PORT to reference docs - Remove head -c truncation from demo.sh (was breaking JSON output)

claude · 2026-03-17T03:21:43Z

📦 Previous Review (superseded by newer review)

Claude finished @dpup's task in 4m 18s —— View job

Code Review

Gather context and understand PR scope
Review previous round feedback resolution
Review core provisioning logic (internal/run/services.go, internal/run/manager.go)
Review config changes (internal/config/config.go)
Review container changes (docker/apple service files)
Review registry and types changes
Review tests
Review docs and examples
Write up review summary

All three previous rounds of feedback have been addressed. The implementation is solid. A few remaining minor items below.

Still present from round 2 (low priority)

1. Provision output bypasses run logs — internal/run/manager.go:1982

if err := provisionService(ctx, svcMgr, info, svcConfigs[i], os.Stderr); err != nil {

Model pull progress writes directly to os.Stderr and is not captured in ~/.moat/runs/<id>/logs.jsonl. A user running moat logs <run-id> after a failed pull won't see the pull output. This was flagged as low priority in round 2 and is still present. Fine to merge as-is, but tracking the gap is worth it.

New findings

2. syscall.Flock is not context-aware — internal/run/services.go:299 — low severity

if err := syscall.Flock(int(lockFile.Fd()), syscall.LOCK_EX); err != nil {

If two moat run processes race for the lock and one is cancelled (Ctrl+C, timeout), Flock(LOCK_EX) blocks until the other process releases it — it does not respect Go context cancellation. For typical usage (model pulls that succeed quickly), this is invisible. But if a 30-minute pull is interrupted by the user while a second run is waiting, the second run will not exit cleanly until the first run's lock is released (which happens on process exit, so it's fine eventually). Not a correctness bug — just worth noting for anyone debugging a stuck moat run.

Fix this →

3. No unit test for wait: false + provisions error path — internal/run/manager.go:1946–1960

The new guard that rejects wait: false when provisions are declared has no unit test. All the other validation in buildServiceConfig is tested in services_test.go, but this new path lives in manager.go and only exercises via integration or E2E. Given that wait: false is an edge case, this is low priority, but worth tracking.

Nit

4. Docker ProvisionService inspects exec before Docker finishes writing exit code — internal/container/docker_service.go:101–110

_, _ = stdcopy.StdCopy(stdout, stdout, resp.Reader)
resp.Close()

execInspect, err := m.cli.ContainerExecInspect(ctx, execCreateResp.ID)

There's a well-known narrow race in the Docker SDK: after the exec's output stream closes (EOF), Docker may not have committed the exit code to its internal store yet. ContainerExecInspect called immediately after could return exit code 0 on a command that actually failed. In practice this is very rare, and it's the same pattern used in CheckReady — so this is pre-existing and acceptable. Not blocking, just worth being aware of if ollama pull failures are ever reported as passing silently.

What's good

All round 1–3 feedback addressed: scalar detection, lock comment, todos/ + compound-engineering.local.md removed, docs security model corrected, readiness table fixed, startup lifecycle updated, wait: false + provisions error added, MOAT_OLLAMA_HOST/MOAT_OLLAMA_PORT in reference docs, demo.sh truncation removed.
provisionItem defer ordering correct: LIFO means LOCK_UN fires before cancel() — correct sequencing.
Shell injection defense: validProvisionItem regex blocks all shell metacharacters; validated before any exec reaches the container.
No-auth path: Ollama correctly gets no password; serviceUsesPasswordPlaceholder correctly handles Redis.
wait: false guard: actionable error message with clear fix hint.
Empty models: list: no provisioning, no error.
Test coverage: injection attempts, scalar rejection, unknown key rejection, no-auth behavior, env generation — thorough.
Docs: startup sequence, readiness table, security model, and env var reference all accurate and current.
Example: moat.yaml + demo.sh are clean and self-contained.

This is ready to merge. The remaining items are all low-priority follow-ups, not blockers.

claude · 2026-03-17T03:58:43Z

📦 Previous Review (superseded by newer review)

Claude finished @dpup's task in 3m 49s —— View job

Code Review

Gather context and understand PR scope
Review core provisioning logic (internal/run/services.go, internal/run/manager.go)
Review config changes (internal/config/config.go)
Review container changes (docker/apple service files)
Review registry and types changes
Review tests
Review docs and examples
Write up review summary

All previous round feedback has been addressed. The implementation is solid. One new real bug introduced in this round, plus a minor issue worth noting.

Bug

1. Docker exec retry loop silently returns success if exec is still Running after retries — internal/container/docker_service.go:77–96 (CheckReady) and :121–140 (ProvisionService)

var exitCode int
for attempt := 0; attempt < 3; attempt++ {
    execInspect, err := m.cli.ContainerExecInspect(ctx, execCreateResp.ID)
    if err != nil {
        return fmt.Errorf(...)
    }
    if !execInspect.Running {
        exitCode = execInspect.ExitCode
        break
    }
    if attempt < 2 {
        time.Sleep(100 * time.Millisecond)
    }
}
if exitCode != 0 {
    return fmt.Errorf(...)
}
return nil

If Docker still reports the exec as Running after all three attempts, the loop exits without hitting the break. exitCode stays at its zero value. if exitCode != 0 is false. Both CheckReady and ProvisionService return nil — a false success.

For ProvisionService this means ollama pull could be silently treated as complete while the pull is still in progress, and the agent starts without the model available.

The fix is to track whether the break was actually hit:

var exitCode int
complete := false
for attempt := 0; attempt < 3; attempt++ {
    execInspect, err := m.cli.ContainerExecInspect(ctx, execCreateResp.ID)
    if err != nil {
        return fmt.Errorf(...)
    }
    if !execInspect.Running {
        exitCode = execInspect.ExitCode
        complete = true
        break
    }
    if attempt < 2 {
        time.Sleep(100 * time.Millisecond)
    }
}
if !complete {
    return fmt.Errorf("exec still running after retries")
}
if exitCode != 0 {
    return fmt.Errorf(...)
}

Fix this →

Minor

2. flockContext goroutine outlives context cancellation — internal/run/services.go:311–324

When ctx.Done() fires, flockContext returns ctx.Err() immediately, but the goroutine doing syscall.Flock(LOCK_EX) stays blocked. The channel is buffered (make(chan error, 1)) so the goroutine won't hang forever — it will eventually complete when the flock becomes available or when the fd is closed by defer lockFile.Close(). But between context cancellation and goroutine completion, there's a window where the goroutine acquires the lock without any corresponding LOCK_UN call. The lock gets released on lockFile.Close(), which is soon after, so this is benign in practice. Worth noting if debugging unexpected lock delays.

What's good

All round 1–4 feedback addressed: scalar detection, lock comment, todos/+compound-engineering.local.md removed, docs security model, readiness table, startup lifecycle, wait: false guard, MOAT_OLLAMA_HOST/PORT in reference, demo.sh cleaned up, flock goroutine now context-aware.
flockContext design: buffered channel means the goroutine can always drain without blocking; defer ordering in provisionItem is correct (LOCK_UN fires before cancel).
Shell injection defense: validProvisionItem regex correctly rejects all shell metacharacters; validation runs before any exec.
No-auth path: Ollama gets no password; Redis {password} placeholder detection is correct.
wait: false guard: clear error with actionable fix hint.
lw.Close() via defer in provisioning loop: safe — deferred to function exit, all writers get closed.
Config scalar detection: nil signal from UnmarshalYAML is explicitly checked in buildServiceConfig with a useful error.
Test coverage: injection attempts, scalar rejection, unknown key rejection, no-auth behavior, env generation, provision cmd building — comprehensive.
Docs: accurate and in sync with implementation.

The Docker retry false-success is the only remaining real bug. Once fixed, this is ready to merge.

dpup · 2026-03-17T21:23:57Z

Fixed the Docker exec retry false-success bug in both CheckReady and ProvisionService. Added a complete bool that's only set when the break fires — if the loop exhausts all retries with the exec still Running, we now return an error instead of silently succeeding.

The two low-priority items (provision output bypassing run logs, and the goroutine outliving context cancellation on lock contention) are noted — tracking separately rather than blocking merge.

claude · 2026-03-17T21:24:06Z

📦 Previous Review (superseded by newer review)

Claude finished @dpup's task in 2m 12s —— View job

Claude finished @dpup's task ——

Code Review

Gather context and understand PR scope
Review the Docker exec retry fix (internal/container/docker_service.go)
Review core provisioning logic (internal/run/services.go, internal/run/manager.go)
Review config changes and tests
Review docs and examples
Write up review summary

The Docker exec retry fix is correct. All five rounds of previous feedback have been addressed. The implementation is solid and ready to merge.

The fix (round 5)

docker_service.go:72–90 (CheckReady) and :124–145 (ProvisionService):

complete := false
for attempt := 0; attempt < 3; attempt++ {
    execInspect, err := m.cli.ContainerExecInspect(ctx, execCreateResp.ID)
    ...
    if !execInspect.Running {
        exitCode = execInspect.ExitCode
        complete = true
        break
    }
    ...
}
if !complete {
    return fmt.Errorf("... exec still running after retries")
}

The complete bool is set only inside the break path. If the loop exhausts all retries with the exec still Running, !complete catches it and returns an error. Previously exitCode stayed at zero, silently returning success.

Minor observations (non-blocking)

defer lw.Close() inside service loop — manager.go:1998–2001

for i, dep := range serviceDeps {
    ...
    if lw, lwErr := store.LogWriter(); lwErr == nil {
        provOut = io.MultiWriter(os.Stderr, lw)
        defer lw.Close()
    }

defer applies to function exit, not loop iteration. If multiple services ever have provisions, each iteration adds another deferred close. Currently only Ollama has provisions so this never fires more than once in practice. Worth knowing if new provision-capable services are added.

demo.sh outputs raw JSON — examples/service-ollama/demo.sh:13–15

The head -c 500 truncation was correctly removed, but the curl /api/generate output is now a raw JSON blob with the model response embedded. Cosmetic — doesn't affect functionality.

What's good

Docker retry fix: correct on both CheckReady and ProvisionService paths — no false-success possible.
All previous feedback addressed: scalar detection, lock comment, todos/ + compound-engineering.local.md removed, docs security model, readiness table, startup lifecycle, wait: false guard, MOAT_OLLAMA_HOST/PORT in reference, flockContext goroutine, demo.sh truncation.
flockContext: buffered channel prevents goroutine leak; defer ordering in provisionItem is LIFO-correct (LOCK_UN before cancel).
Shell injection defense: validProvisionItem regex blocks all metacharacters; validated before any exec.
No-auth path: Ollama gets no password; serviceUsesPasswordPlaceholder correctly handles Redis.
Cache directory creation: MkdirAll with 0o700 at manager.go:1912 runs before StartService — bind mount is safe.
Test coverage: injection attempts, scalar rejection, unknown key rejection, no-auth behavior, env generation, provision cmd building — comprehensive.
Docs: startup sequence, readiness table, security model, and env var reference all accurate.

This is ready to merge.

dpup · 2026-03-17T21:26:44Z

Both low-priority items addressed:

Provision output — already captured: the io.MultiWriter(os.Stderr, lw) tee is in place at manager.go:1997–2000.
flockContext goroutine — no code change needed. When context cancellation fires, the goroutine stays blocked on Flock until the caller's deferred lockFile.Close() closes the fd, which causes Flock to return EBADF and the goroutine to terminate. The buffered channel prevents any blocking on send. Added a comment to flockContext documenting this so the behavior is obvious to future readers.

claude · 2026-03-17T21:26:54Z

📦 Previous Review (superseded by newer review)

Claude finished @dpup's task in 2m 9s —— View job

Code Review

Gather context and understand PR scope
Review Docker exec retry fix (internal/container/docker_service.go)
Review flockContext comment (internal/run/services.go)
Review provision output capture (internal/run/manager.go)
Review remaining changed files
Write up review summary

All previous round feedback has been addressed. The implementation is solid. This is ready to merge.

The fixes (this round)

Docker exec retry false-success (docker_service.go:72–90, 124–145) — correctly fixed with complete bool on both CheckReady and ProvisionService. The complete flag is only set inside the break path; exhausting all 3 retries with exec still Running now returns an error rather than silently returning nil.

flockContext goroutine lifecycle (services.go:308–315) — the comment accurately documents the shutdown path: context cancellation returns immediately, the blocked goroutine eventually wakes when lockFile.Close() closes the fd (EBADF), the buffered channel prevents any goroutine leak. Correct.

Provision output capture (manager.go:1997–2000) — io.MultiWriter(os.Stderr, lw) correctly tees pull output to both the terminal and ~/.moat/runs/<id>/logs.jsonl. The concern from round 2 was already handled.

Minor observations (non-blocking, tracked for awareness)

CheckReady uses io.Copy instead of stdcopy.StdCopy (docker_service.go:67)

ProvisionService correctly uses stdcopy.StdCopy to demultiplex Docker's stdout/stderr stream. CheckReady uses io.Copy(io.Discard, ...) — functionally fine since we discard the output, but it reads the raw multiplexed wire format. If anyone ever changes CheckReady to capture output for debugging, they'll get garbled bytes. No action needed now.

defer lw.Close() accumulates across service loop iterations (manager.go:2000)

for i, dep := range serviceDeps {
    if lw, lwErr := store.LogWriter(); lwErr == nil {
        provOut = io.MultiWriter(os.Stderr, lw)
        defer lw.Close()   // defers to function exit, not loop iteration
    }

Each provision-capable service in the loop adds another deferred close. Currently only Ollama has provisions so this never fires more than once. Fine to leave as-is, but worth noting if more provision-capable services are added.

demo.sh outputs raw JSON (examples/service-ollama/demo.sh:13–15)

curl /api/generate with stream: false returns a single JSON blob. The demo output will be one line of dense JSON rather than the readable model response. Cosmetic only — doesn't affect functionality or correctness.

What's good

All round 1–5 feedback resolved: scalar detection, lock comment, todos/ + compound-engineering.local.md removed, docs security model, readiness table, startup lifecycle, wait: false guard, MOAT_OLLAMA_HOST/PORT in reference, flockContext goroutine docs, provision output tee, Docker retry false-success.
Shell injection defense: validProvisionItem regex blocks all metacharacters; validated before any exec.
No-auth path: Ollama gets no password; serviceUsesPasswordPlaceholder correctly handles Redis.
wait: false guard: clear, actionable error message.
Cache directory: MkdirAll with 0o700 runs before StartService; bind mount is safe.
LIFO defer ordering: LOCK_UN fires before cancel() in provisionItem — correct.
Empty models: list: no provisioning, no error.
Test coverage: injection attempts, scalar rejection, unknown key rejection, no-auth behavior, env generation, provision cmd building — comprehensive.
Docs: startup sequence, readiness table, security model, env var reference all accurate.

claude · 2026-03-17T21:29:31Z

📦 Previous Review (superseded by newer review)

Claude finished @dpup's task in 3m 2s —— View job

Code Review

Gather context and understand PR scope
Review core provisioning logic (internal/run/services.go, internal/run/manager.go)
Review config changes (internal/config/config.go)
Review container changes (docker/apple service files)
Review registry and types changes
Review tests
Review docs and examples
Write up review summary

All six prior rounds of feedback are addressed. The two low-priority items from the last round are now fixed:

Log writer lifecycle (manager.go:1997–2007): The IIFE correctly scopes defer lw.Close() to each provisioning call rather than accumulating until function exit. Clean.
flockContext goroutine shutdown (services.go:308–315): The added comment accurately documents the shutdown path — context cancellation returns early, the goroutine eventually unblocks via EBADF when lockFile.Close() fires, the buffered channel prevents any goroutine leak. Correct.

Bug

demo.sh uses python3 but it's not in the dependencies — examples/service-ollama/demo.sh:15

| python3 -c "import sys,json; print(json.load(sys.stdin)['response'])"

The agent container image is resolved from declared dependencies. With only ollama@0.9 and curl in dependencies:, the base image is debian:bookworm-slim, which does not include Python. The demo will fail at the JSON parsing step.

Fix: add jq to dependencies and use jq -r .response instead — jq is already defined in the registry.

# moat.yaml
dependencies:
  - ollama@0.9
  - curl
  - jq

# demo.sh
  | jq -r .response

Fix this →

What's good

All prior feedback resolved: scalar detection, lock comment, todos/ and compound-engineering.local.md removed, docs security model, readiness table, startup lifecycle, wait: false guard, MOAT_OLLAMA_HOST/PORT in reference, Docker retry false-success, IIFE log writer, flockContext docs.
Shell injection defense: validProvisionItem regex blocks all metacharacters; validation runs before any exec reaches the container.
No-auth path: Ollama gets no password; serviceUsesPasswordPlaceholder correctly handles Redis.
LIFO defer ordering: LOCK_UN fires before cancel() in provisionItem.
Cache directory: MkdirAll with 0o700 before StartService; bind mount is always safe.
Test coverage: injection attempts, scalar rejection, unknown key rejection, no-auth behavior, env generation, provision cmd building — comprehensive.
Docs: startup sequence, readiness table, security model, env var reference all accurate.

Fix the demo.sh dependency and this is ready to merge.

claude · 2026-03-17T21:32:48Z

+echo "--- Generating response ---"
+curl -s "$MOAT_OLLAMA_URL/api/generate" \
+  -d '{"model":"qwen2.5-coder:1.5b","prompt":"Write hello world in Go","stream":false}' \
+  | python3 -c "import sys,json; print(json.load(sys.stdin)['response'])"


The base image for this example is debian:bookworm-slim (no Python runtime in dependencies:), so python3 won't be available. The demo will fail here.

Fix: add jq to moat.yaml dependencies and use:

Suggested change

| python3 -c "import sys,json; print(json.load(sys.stdin)['response'])"

| jq -r .response

Then in moat.yaml:

dependencies: - ollama@0.9 - curl - jq

claude · 2026-03-17T21:56:57Z

📦 Previous Review (superseded by newer review)

Claude finished @dpup's task in 2m 7s —— View job

Claude finished @dpup's task ——

Code Review

Gather context and read changed files
Fix outstanding demo.sh python3/jq issue
Verify no new issues from latest commits
Write up review summary

The latest commit (36ee2ae) correctly updates Ollama version tags to match actual Docker Hub tags — the previous 0.9-style versions don't exist, causing image pull 404s. Good catch and clean fix.

One outstanding issue from the previous round remained unfixed:

Fixed: `demo.sh` used `python3` without it in dependencies

examples/service-ollama/demo.sh:15 used python3 to parse the JSON response, but the base image for this example is debian:bookworm-slim (no Python runtime in dependencies:), so the demo would fail at that step. Fixed directly: added jq to moat.yaml dependencies and replaced the Python pipe with jq -r .response.

Everything else looks good

All prior review rounds have been addressed:

Scalar models: value detection and error
Lock comment accuracy
todos/ and compound-engineering.local.md removed
Docs security model, readiness table, startup lifecycle updated
wait: false + provisions guard with actionable error
MOAT_OLLAMA_HOST/MOAT_OLLAMA_PORT in reference docs
Docker exec retry false-success fixed with complete bool
flockContext goroutine shutdown documented
IIFE scoping for lw.Close() in provisioning loop
Provision output teed to logs.jsonl via io.MultiWriter

This is ready to merge.

claude · 2026-03-17T22:05:46Z

📦 Previous Review (superseded by newer review)

Claude finished @dpup's task in 2m 26s —— View job

Code Review

Gather context and understand PR scope
Review core provisioning logic (internal/run/services.go, internal/run/manager.go)
Review config changes (internal/config/config.go)
Review container changes (docker_service.go, apple_service.go)
Review registry and types changes
Review tests
Review docs and examples
Write up review summary

All previous round feedback has been resolved. One real doc bug and one cosmetic issue remain.

Bug

ollama@0.9 is not a valid version — will fail validation — docs/content/guides/08-services.md:203 and docs/content/reference/02-moat-yaml.md:269

Both docs show:

- ollama@0.9

But the registry only allows ["0.16.3", "0.17.7", "0.18.0", "0.18.1"]. Moat validates against the versions: list and returns an error:

invalid version "0.9" for ollama

  Available versions: 0.16.3, 0.17.7, 0.18.0, 0.18.1
  Or omit the version to use the default (0.18.1)

Any user copying the doc examples will hit this error immediately. The working example in examples/service-ollama/moat.yaml already correctly uses ollama@0.18.1 — the docs just need to match.

Fix this →

Nit

api/tags response in demo.sh is raw JSON — examples/service-ollama/demo.sh:9

curl -s "$MOAT_OLLAMA_URL/api/tags"

jq is already declared as a dependency, so the output could be formatted with | jq . for consistency with the generate step. Not functional — the demo works — but the model list will be one line of dense JSON.

What's good

All prior round feedback addressed: scalar detection, lock comment, todos/ + compound-engineering.local.md removed, docs security model, readiness table, startup lifecycle, wait: false guard, MOAT_OLLAMA_HOST/PORT in reference, Docker retry false-success (with complete bool), flockContext goroutine docs, IIFE log writer scoping, provision output tee, demo.sh python3 → jq, status messages for wait/pull.
flockContext shutdown: well-documented — context cancellation returns early, goroutine terminates on EBADF when lockFile.Close() fires, buffered channel prevents leak.
Injection defense: validProvisionItem regex (^[a-zA-Z0-9][a-zA-Z0-9._:/@-]*$) correctly rejects all shell metacharacters; validated before any exec.
No-auth path: Ollama gets no password; serviceUsesPasswordPlaceholder correctly handles Redis.
LIFO defer ordering in provisionItem: LOCK_UN fires before cancel() — correct.
Cache directory: MkdirAll with 0o700 before StartService; bind mount is always safe.
Test coverage: injection attempts, scalar rejection, unknown key rejection, no-auth behavior, env generation, provision cmd building — comprehensive.
Status messages: Waiting for %s to be ready... and Pulling N item(s) for %s: ... give clear user feedback during the potentially-long startup.

Fix the version string in docs and this is ready to merge.

claude · 2026-03-17T22:15:23Z

Claude finished @dpup's task in 3m 33s —— View job

📦 Previous Review (superseded by newer review)

Code Review

Gather context and understand PR scope
Review core provisioning logic (internal/run/services.go, internal/run/manager.go)
Review config changes (internal/config/config.go)
Review container changes (docker_service.go, apple_service.go)
Review registry and types changes
Review tests
Review docs and examples
Write up review summary

All eight previous rounds of feedback have been addressed. The implementation is solid. One doc bug remains unfixed from the previous round, plus a cosmetic nit.

Bug

ollama@0.9 still in docs — will fail validation — docs/content/guides/08-services.md:203 and docs/content/reference/02-moat-yaml.md:269

Both docs still show:

- ollama@0.9

The registry only permits ["0.16.3", "0.17.7", "0.18.0", "0.18.1"]. Any user copying these examples will immediately hit:

invalid version "0.9" for ollama
  Available versions: 0.16.3, 0.17.7, 0.18.0, 0.18.1

This was flagged in the previous round. Change ollama@0.9 to ollama@0.18.1 (or just ollama) in both files. Fix this →

Nit

api/tags output in demo.sh is raw JSON — examples/service-ollama/demo.sh:9

curl -s "$MOAT_OLLAMA_URL/api/tags"

jq is already in dependencies:. This could be | jq . for consistent formatting with the generate step below it. Purely cosmetic.

What's good

All prior rounds (1–8) resolved: scalar detection, lock comment, todos/ + compound-engineering.local.md removed, docs security model, readiness table, startup lifecycle, wait: false guard, MOAT_OLLAMA_HOST/PORT in reference, Docker retry false-success (with complete bool), flockContext goroutine shutdown documented, IIFE log writer scoping, provision output tee, demo.sh python3 → jq, status messages for wait/pull.
Shell injection defense: validProvisionItem regex (^[a-zA-Z0-9][a-zA-Z0-9._:/@-]*$) blocks all metacharacters; validated before any exec reaches the container.
Docker retry fix: complete bool correctly prevents false-success on both CheckReady and ProvisionService when the exec is still Running after all retries.
flockContext goroutine shutdown: well-documented — context cancellation returns early, goroutine terminates on EBADF when lockFile.Close() fires, buffered channel prevents leak.
No-auth path: Ollama correctly gets no password; serviceUsesPasswordPlaceholder correctly handles Redis.
wait: false guard: actionable error message with clear fix hint.
LIFO defer ordering in provisionItem: LOCK_UN fires before cancel() — correct.
IIFE scoping for lw.Close() in provisioning loop: deferred close fires per provisioning call, not at function exit.
Provision output tee: io.MultiWriter(os.Stderr, lw) correctly captures pull progress in logs.jsonl.
Test coverage: injection attempts, scalar rejection, unknown key rejection, no-auth behavior, env generation, provision cmd building, wait:false guard conditions — comprehensive.
Docs accurate: startup sequence, readiness table, security model, env var reference all in sync with implementation.

Fix the ollama@0.9 version string and this is ready to merge.

claude · 2026-03-17T22:17:31Z

📦 Previous Review (superseded by newer review)

Claude finished @dpup's task in 3m 9s —— View job

Code Review

Gather context and understand PR scope
Review core provisioning logic (internal/run/services.go, internal/run/manager.go)
Review config changes (internal/config/config.go)
Review container changes (docker_service.go, apple_service.go)
Review registry and types changes
Review tests
Review docs and examples
Write up review summary

All previous round feedback (rounds 1–8) has been addressed. One remaining doc bug is still present.

Bug

ollama@0.9 still in two doc files — will fail immediately for any user who copies the example

Both files still show the invalid version 0.9 (valid versions are 0.16.3, 0.17.7, 0.18.0, 0.18.1):

docs/content/guides/08-services.md:202: - ollama@0.9 (Ollama section example)
docs/content/reference/02-moat-yaml.md:269: | ollama@0.9 | Ollama | 11434 | (service dependencies table)

I committed a fix locally (18f5a80) changing both to ollama@0.18.1, but couldn't push due to remote divergence. The change is trivial — in each file, replace ollama@0.9 with ollama@0.18.1.

Fix this →

Nit

/api/tags output in demo.sh is raw JSON — examples/service-ollama/demo.sh:9

curl -s "$MOAT_OLLAMA_URL/api/tags"

jq is already a declared dependency, so | jq . would give readable output for the model list display. Cosmetic only — doesn't affect correctness.

What's good

Everything from all prior rounds is resolved:

Scalar models: detection: nil signal from UnmarshalYAML caught with a clear error
Lock comment accuracy: correctly describes per-lock-file behavior (not per-model)
todos/ + compound-engineering.local.md: removed
Docs security model, readiness table, startup lifecycle: all accurate
wait: false + provisions guard: clear error with actionable fix hint
MOAT_OLLAMA_HOST/PORT in reference docs: present alongside MOAT_OLLAMA_URL
Docker exec retry false-success: fixed with complete bool on both CheckReady and ProvisionService
flockContext goroutine lifetime: documented; buffered channel prevents leak; EBADF on fd close terminates the goroutine
IIFE for lw.Close(): correctly scoped per provisioning call, not accumulated to function exit
Provision output: teed to logs.jsonl via io.MultiWriter(os.Stderr, lw)
demo.sh: uses jq (no python3), qwen2.5-coder:0.5b for constrained environments, error surfacing via // (.error | "Error: " + .)
Shell injection defense: validProvisionItem regex blocks all metacharacters; validated before any exec
Test coverage: injection attempts, scalar rejection, unknown key rejection, no-auth behavior, env generation, provision cmd building — comprehensive

Fix the ollama@0.9 version strings and this is ready to merge.

claude · 2026-03-17T22:19:17Z

📦 Previous Review (superseded by newer review)

Claude finished @dpup's task in 3m 14s —— View job

Code Review

Gather context and understand PR scope
Review core provisioning logic (internal/run/services.go, internal/run/manager.go)
Review config changes (internal/config/config.go)
Review container changes (docker_service.go, apple_service.go)
Review registry and types changes
Review tests
Review docs and examples
Fix remaining issues and write summary

The implementation is solid and all substantive feedback from prior rounds has been addressed. One doc bug persisted through multiple review iterations — fixed directly.

Fixed (committed)

ollama@0.9 in both doc files — docs/content/guides/08-services.md:203 and docs/content/reference/02-moat-yaml.md:269 still referenced ollama@0.9 despite this being flagged in the previous two rounds. The registry only allows ["0.16.3", "0.17.7", "0.18.0", "0.18.1"], so any user copying those examples would immediately hit:

invalid version "0.9" for ollama
  Available versions: 0.16.3, 0.17.7, 0.18.0, 0.18.1

Updated to ollama@0.18.1 in both files. Also piped api/tags through jq . in demo.sh for consistent formatting (jq is already a declared dependency).

What's good

All round 1–9 feedback resolved: scalar detection, lock comment, todos/ + compound-engineering.local.md removed, docs security model, readiness table, startup lifecycle, wait: false guard, MOAT_OLLAMA_HOST/PORT in reference, Docker retry false-success (complete bool), flockContext goroutine shutdown documented, IIFE log writer scoping, provision output tee, python3 → jq in demo, version tags.
Shell injection defense: validProvisionItem regex (^[a-zA-Z0-9][a-zA-Z0-9._:/@-]*$) blocks all metacharacters; validated before any exec reaches the container.
No-auth path: Ollama correctly gets no password; serviceUsesPasswordPlaceholder correctly handles Redis.
wait: false guard: clear, actionable error message with explicit fix hint.
flockContext goroutine shutdown: well-documented — context cancellation returns early, goroutine terminates on EBADF when lockFile.Close() fires, buffered channel prevents leak.
Docker exec retry: complete bool correctly prevents false-success on both CheckReady and ProvisionService when exec is still Running after all retries.
LIFO defer ordering in provisionItem: LOCK_UN fires before cancel() — correct.
IIFE scoping for lw.Close() in provisioning loop: deferred close fires per-provisioning-call, not at function exit.
Test coverage: injection attempts, scalar rejection, unknown key rejection, no-auth behavior, env generation, provision cmd building, wait: false guard — comprehensive.
Docs accurate: startup sequence, readiness table, security model, and env var reference all in sync with implementation.

This is ready to merge.

claude · 2026-03-17T22:34:46Z

📦 Previous Review (superseded by newer review)

Claude finished @dpup's task in 3m 34s —— View job

Code Review

Gather context and read changed files
Review core provisioning logic (internal/run/services.go, internal/run/manager.go)
Review config changes (internal/config/config.go)
Review container changes (docker_service.go, apple_service.go)
Review registry and types changes
Review tests
Review docs and examples
Fix bug and write up review summary

All prior review rounds are resolved. Found and fixed one new bug.

Bug fixed (committed)

memory: leaked into Extra, breaking memory + models together — internal/config/config.go:175

The known map in ServiceSpec.UnmarshalYAML was missing "memory":

// Before
known := map[string]bool{"env": true, "image": true, "wait": true}
// After
known := map[string]bool{"env": true, "image": true, "wait": true, "memory": true}

memory: is a struct field decoded by value.Decode((*plain)(s)), but the scan loop also saw it and — since it's a scalar, not a sequence — added Extra["memory"] = nil. The provisioning validation then rejected it: services.ollama.memory is not a valid key (did you mean "models"?).

This broke examples/service-ollama/moat.yaml exactly as written (it uses both memory: 2048 and models:). No test caught it because TestBuildServiceConfigMemory constructs ServiceSpec directly rather than going through YAML parsing. Added TestServiceSpecUnmarshalMemoryNotLeakedToExtra to cover this path.

Everything else looks good

All prior round feedback (1–9): scalar detection, lock comment, todos/ + compound-engineering.local.md removed, docs security model, readiness table, startup lifecycle, wait: false guard, MOAT_OLLAMA_HOST/PORT in reference, Docker retry false-success (complete bool), flockContext goroutine shutdown documented, IIFE log writer scoping, provision output tee, demo.sh python3 → jq → proper formatting, Ollama version strings.
Shell injection defense: validProvisionItem regex (^[a-zA-Z0-9][a-zA-Z0-9._:/@-]*$) blocks all metacharacters; validated before any exec.
No-auth path: Ollama correctly gets no password; serviceUsesPasswordPlaceholder correctly handles Redis.
Docker exec retry: complete bool correctly prevents false-success on both CheckReady and ProvisionService.
flockContext: well-documented — context cancellation returns early, goroutine terminates on EBADF when lockFile.Close() fires, buffered channel prevents leak.
LIFO defer ordering: LOCK_UN fires before cancel() in provisionItem — correct.
IIFE scoping: lw.Close() defers per-provisioning-call, not at function exit.
Test coverage: injection attempts, scalar rejection, unknown key rejection, no-auth behavior, env generation, provision cmd building — comprehensive.
Docs: startup sequence, readiness table, security model, env var reference all accurate and in sync.

Ready to merge after this fix.

dpup · 2026-03-17T22:41:20Z

The missing "memory" in the known map was the right call — TestBuildServiceConfigMemory constructing ServiceSpec directly meant we never exercised the YAML path. The added TestServiceSpecUnmarshalMemoryNotLeakedToExtra closes that gap. Nothing else to address.

Adds design spec for Ollama as a service dependency with generalized provisions/cache extensions, and a detailed implementation plan with 10 tasks covering registry, types, config, service manager, and docs.

Add CachePath, ProvisionsKey, and ProvisionCmd fields to ServiceDef struct to support services that need host-side caching and post-start provisioning. Add ollama as a service dependency using these new fields.

… cache mount support Adds ProvisionService method to ServiceManager interface for executing provisioning commands inside service containers (e.g., pulling Ollama models). Also adds CachePath/CacheHostPath mount support to both Docker and Apple service managers so model/data caches can be persisted on the host.

…ceConfig - Guard password generation: only generate for services that have auth (password_env set, or extra_cmd/readiness_cmd use {password} placeholder) - Wire provisions from userSpec.Extra using registry's provisions_key - Resolve cache host path to ~/.moat/cache/<service-name> - Validate unknown extra keys: reject keys that don't match provisions_key

P1 fixes: - Validate provision items against regex to prevent shell injection via model names interpolated into sh -c commands - Fix error messages referencing nonexistent --service flag on moat logs P2 fixes: - Tighten cache directory permissions from 0755 to 0700 - Tighten lock file permissions from 0644 to 0600 - Change provision timeout from per-batch to per-item (30min each) - Narrow flock scope to per-item so parallel runs don't serialize P3 fixes: - Consolidate duplicated extra_env loop in buildServiceConfig - Capture unknown scalar keys in ServiceSpec YAML for validation - Add versions list and quote image in ollama registry entry

- Validate scalar provision keys (e.g., `models: value` instead of list) with clear "must be a list" error message - Fix misleading flock comment — parallel runs still serialize on shared lock file; extracted provisionItem helper with defer cancel() - Remove compound-engineering.local.md (local tooling, not for repo) - Remove todos/ directory (tracked as GitHub issues instead)

- Fix readiness table: Ollama check is just "ollama list", model pulling is a separate provisioning step - Add provisioning step (step 6) to startup sequence docs - Fix security model docs: not all services require passwords (Ollama has no auth, relies on network isolation) - Remove dead todo/005 reference from provisionItem comment

- Error on wait: false with provisioned items (models can't be pulled until service is ready — silent failure before this fix) - Add MOAT_OLLAMA_HOST and MOAT_OLLAMA_PORT to reference docs - Remove head -c truncation from demo.sh (was breaking JSON output)

- Context-aware flock: wrap syscall.Flock in a goroutine with select on ctx.Done() so provision lock acquisition respects context cancellation - Provision output to run logs: create RunStore before service provisioning and tee output to both stderr and the run's logs.jsonl via io.MultiWriter - Docker exec inspect race: retry ContainerExecInspect up to 3 times with 100ms backoff when the exec is still marked running after stream EOF - Add test validating wait:false + provisions preconditions

…pect retries Both CheckReady and ProvisionService had a bug where if ContainerExecInspect reported the exec as still Running after all 3 retry attempts, the loop would exit without setting the complete flag, exitCode would stay 0, and the function would return nil — a false success. For ProvisionService this meant ollama pull could be silently treated as complete while the pull was still in progress.

…ellation

Using defer lw.Close() inside a loop defers all closes until the outer function returns. Wrap the provision block in an IIFE so defer fires immediately after provisionService returns for each service. Also extract the response field from demo.sh JSON output.

Ollama publishes full semver tags (0.18.1, 0.18.0, ...), not short major.minor tags (0.9). The previous versions 0.6-0.9 do not exist on Docker Hub, causing a 404 on image pull. Update default to 0.18.1 and set versions list to recent releases. Update tests and example moat.yaml accordingly.

python3 is not available in debian:bookworm-slim (the base image when no Python runtime is declared). Use jq -r .response instead, which is available as an apt dependency.

…ng models

…s to jq

Add a 'memory' field to ServiceSpec so users can configure the memory limit for service sidecar containers (e.g., Ollama) independently of the main agent container's container.memory setting. Wire through ServiceConfig.MemoryMB -> SidecarConfig.MemoryMB, applied as Docker HostConfig.Resources.Memory and Apple --memory flag. Update the Ollama example to use memory: 2048 with qwen2.5-coder:1.5b.

The known map used to skip processing of parsed struct fields was missing "memory", causing memory: values to leak into Extra as nil. This caused buildServiceConfig to reject memory: as an unknown key when combined with provision-capable services (e.g. ollama). The example moat.yaml, which uses both memory: 2048 and models: [...], would fail at runtime. Also adds a test that exercises YAML with both memory: and models: to prevent regression.

claude · 2026-03-17T22:46:11Z

Claude finished @dpup's task in 3m 49s —— View job

Code Review

Gather context and read changed files
Review core provisioning logic (internal/run/services.go, internal/run/manager.go)
Review config changes (internal/config/config.go)
Review container changes (docker_service.go, apple_service.go)
Review registry and types changes
Review tests
Review docs and examples
Write up review summary

All ten previous rounds of review are addressed. I went through the current state of every key file looking for anything missed. The implementation is solid.

Final findings

1. No lock acquired on context cancellation — stale LOCK_UN deferred — low severity, cosmetic

In provisionItem (services.go:304–308):

if err := flockContext(cmdCtx, lockFile); err != nil {
    return err
}
defer func() { _ = syscall.Flock(int(lockFile.Fd()), syscall.LOCK_UN) }()

When flockContext returns ctx.Err() (context cancelled before lock was acquired), the deferred LOCK_UN still fires during unwind. On Linux, calling LOCK_UN on an fd that this process doesn't hold a lock on is harmless (it's a no-op). So this is not a correctness bug — just a slightly misleading defer registered on a path where no lock was obtained. The comment in flockContext documents the shutdown path thoroughly and the behavior is correct. Not worth changing.

2. TestBuildServiceConfigOllamaProvisionsIncompatibleWithWaitFalse name is slightly misleading — nit

The test validates the conditions the wait: false guard checks, but doesn't exercise the guard itself (which lives in manager.go). The test name implies end-to-end verification. The test body is correct and useful — just the name might mislead a reader into thinking this is a complete coverage of that error path. Not blocking.

3. flockContext goroutine can acquire lock after caller returns — documented, non-blocking

When context cancels mid-flockContext, there's a race: the goroutine could acquire the lock after the caller returns with an error. In that case, nobody reads from the done channel, but it's buffered so no leak. The lock is then released when lockFile.Close() fires from provisionService's defer. The comment in flockContext accurately documents this. Correct behavior, well documented.

What's good

All prior rounds (1–9) resolved: scalar detection, lock comment, todos/ + compound-engineering.local.md removed, docs security model, readiness table, startup lifecycle, wait: false guard, MOAT_OLLAMA_HOST/PORT in reference, Docker retry false-success (complete bool), flockContext goroutine shutdown documented, IIFE log writer scoping, provision output tee, demo.sh jq + version tags, memory field in known map + TestServiceSpecUnmarshalMemoryNotLeakedToExtra.
Shell injection defense: validProvisionItem regex (^[a-zA-Z0-9][a-zA-Z0-9._:/@-]*$) correctly blocks all shell metacharacters; validated before any exec reaches a container.
Docker exec retry: complete bool correctly prevents false-success on both CheckReady and ProvisionService when exec is still Running after all retries.
No-auth path: Ollama gets no password; serviceUsesPasswordPlaceholder correctly handles Redis.
wait: false guard: clear, actionable error message.
LIFO defer ordering in provisionItem: LOCK_UN fires before cancel() — correct.
IIFE scoping for lw.Close() in provisioning loop: deferred close fires per provisioning call, not at function exit.
Generic design: provisions_key / provision_cmd / cache_path on ServiceDef are cleanly reusable for any future service needing post-start provisioning.
Test coverage: injection attempts, scalar rejection, unknown key rejection, no-auth behavior, env generation, provision cmd building, wait: false guard conditions — comprehensive.
Docs: startup sequence, readiness table, security model, and env var reference all accurate and in sync with implementation.
memory: 2048 + qwen2.5-coder:1.5b: the model bump back from 0.5b makes sense given the explicit 2GB memory limit — both are consistent.

This is ready to merge.

claude Bot reviewed Mar 17, 2026

View reviewed changes

dpup added 9 commits March 17, 2026 15:45

docs: add Ollama sidecar design spec and implementation plan

24ee955

Adds design spec for Ollama as a service dependency with generalized provisions/cache extensions, and a detailed implementation plan with 10 tasks covering registry, types, config, service manager, and docs.

feat(container): add provision and cache fields to ServiceConfig

c635491

feat(deps): add ollama service and provision/cache fields to ServiceDef

53b27e7

Add CachePath, ProvisionsKey, and ProvisionCmd fields to ServiceDef struct to support services that need host-side caching and post-start provisioning. Add ollama as a service dependency using these new fields.

feat(config): add Extra field to ServiceSpec for provision lists

9a8f23e

feat(run): add service provisioning with flock-based cache locking

b963ed5

docs(examples): add Ollama service dependency example

d10e5cf

test(run): add env generation test for Ollama no-auth service

91425e6

dpup and others added 18 commits March 17, 2026 15:45

docs: add Ollama service documentation and provisions pattern

94bcf31

docs(run): clarify flockContext goroutine lifetime after context canc…

3f50f1f

…ellation

fix(examples): use jq instead of python3 in Ollama demo

3da983c

python3 is not available in debian:bookworm-slim (the base image when no Python runtime is declared). Use jq -r .response instead, which is available as an apt dependency.

feat(run): print status messages while waiting for services and pulli…

a716d17

…ng models

fix(examples): add Content-Type header and surface errors in ollama demo

76605a8

fix(examples): use qwen2.5-coder:0.5b to fit in constrained environments

8363fd5

fix(examples): pipe curl directly to jq, avoid shell variable mangling

bd98956

fix(docs): correct ollama version in examples to 0.18.1, pipe api/tag…

503e79c

…s to jq

dpup force-pushed the feat/ollama-sidecar branch from 229fead to db2d1f1 Compare March 17, 2026 22:45

dpup merged commit 30a03ca into main Mar 17, 2026
4 checks passed

dpup deleted the feat/ollama-sidecar branch March 17, 2026 22:52

	\| python3 -c "import sys,json; print(json.load(sys.stdin)['response'])"
	\| jq -r .response

Conversation

dpup commented Mar 17, 2026

Summary

Test plan

Uh oh!

claude Bot commented Mar 17, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Summary

Bugs

Files that should not be committed

Minor issues

What's good

Uh oh!

claude Bot commented Mar 17, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Bugs

Minor issues

What looks good

Uh oh!

claude Bot commented Mar 17, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Bugs

Documentation

Minor nit

What's good

Uh oh!

claude Bot commented Mar 17, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Still present from round 2 (low priority)

New findings

Nit

What's good

Uh oh!

claude Bot commented Mar 17, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Bug

Minor

What's good

Uh oh!

dpup commented Mar 17, 2026

Uh oh!

claude Bot commented Mar 17, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

The fix (round 5)

Minor observations (non-blocking)

What's good

Uh oh!

dpup commented Mar 17, 2026

Uh oh!

claude Bot commented Mar 17, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

The fixes (this round)

Minor observations (non-blocking, tracked for awareness)

What's good

Uh oh!

claude Bot commented Mar 17, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Bug

What's good

Uh oh!

claude Bot Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot commented Mar 17, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Fixed: demo.sh used python3 without it in dependencies

Everything else looks good

Uh oh!

claude Bot commented Mar 17, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

claude Bot commented Mar 17, 2026 •

edited by github-actions Bot

Loading

claude Bot commented Mar 17, 2026 •

edited by github-actions Bot

Loading

claude Bot commented Mar 17, 2026 •

edited by github-actions Bot

Loading

claude Bot commented Mar 17, 2026 •

edited by github-actions Bot

Loading

claude Bot commented Mar 17, 2026 •

edited by github-actions Bot

Loading

claude Bot commented Mar 17, 2026 •

edited by github-actions Bot

Loading

claude Bot commented Mar 17, 2026 •

edited by github-actions Bot

Loading

claude Bot commented Mar 17, 2026 •

edited by github-actions Bot

Loading

claude Bot commented Mar 17, 2026 •

edited by github-actions Bot

Loading

Fixed: `demo.sh` used `python3` without it in dependencies

claude Bot commented Mar 17, 2026 •

edited by github-actions Bot

Loading

claude Bot commented Mar 17, 2026 •

edited

Loading

claude Bot commented Mar 17, 2026 •

edited by github-actions Bot

Loading

claude Bot commented Mar 17, 2026 •

edited by github-actions Bot

Loading

claude Bot commented Mar 17, 2026 •

edited by github-actions Bot

Loading

claude Bot commented Mar 17, 2026 •

edited

Loading