feat(deps): add Ollama service dependency with provisioning and cache support #238
Conversation
📦 Previous Review (superseded by newer review) — Claude finished @dpup's task in 3m 44s — Code Review
**Summary**

The overall design is clean and follows the existing postgres/mysql/redis pattern well. Shell injection is correctly addressed with a strict allowlist regex, and the cache/provision fields are well-factored as generic extensions.

**Bugs**

1. Scalar provision values are captured but never validated. In `config.go:190-197`:

   ```go
   } else {
       // Capture unknown non-sequence keys as single-element entries
       // so the run layer can validate and reject them with useful errors.
       if s.Extra == nil {
           s.Extra = make(map[string][]string)
       }
       s.Extra[key] = nil // nil signals "key present but not a list"
   }
   ```

   The run layer never actually validates this nil signal for a known provisions key. It needs an explicit check (a sketch follows the snippet):

   ```go
   // services.go — after retrieving provisions
   provisions = userSpec.Extra[spec.Service.ProvisionsKey]
   // BUG: if provisions is nil but the key exists, the user wrote a scalar — error here
   ```
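   For illustration, a minimal sketch of that explicit check, assuming the `Extra` map shape from the snippet above (the helper name and error wording are hypothetical, not the PR's code):

   ```go
   // Sketch only — names and error text are assumptions.
   // Assumes: import "fmt".
   func provisionsFromExtra(extra map[string][]string, key, svc string) ([]string, error) {
       items, present := extra[key]
       if present && items == nil {
           // Key was written as a scalar (e.g. `models: value`), not a list.
           return nil, fmt.Errorf("%s: %q must be a list", svc, key)
       }
       return items, nil
   }
   ```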
2. Misleading comment on the provisioning lock:

   ```go
   // Acquire lock per-item so parallel runs pulling different items don't serialize.
   ```

   This comment is incorrect. The lock file is a single file shared by all runs, so parallel runs pulling different items still serialize on it.

3. `compound-engineering.local.md` is local tooling and should not be committed.

4. The `todos/` directory should not be committed. Convert these to GitHub issues (or close them if already addressed) and remove the directory.

**Minor issues**

5. The context cancel is called explicitly on every exit path, which works, but any future return path added before `cancel()` would leak the context (a sketch of a safer shape follows):

   ```go
   for _, cmd := range cmds {
       cmdCtx, cancel := context.WithTimeout(ctx, provisionTimeout)
       // ... cancel() called explicitly on each exit path
       // if a new return is added above cancel(), context leaks
   }
   ```
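   A sketch of the per-item helper shape the follow-up commit describes (extracting the body so `defer cancel()` covers every return path); `runProvisionCmd` and `provisionTimeout` are assumed names:

   ```go
   // Sketch: one item per call, so the timeout context can never leak.
   // Assumes: import ("context"; "time").
   func provisionItem(ctx context.Context, item string) error {
       cmdCtx, cancel := context.WithTimeout(ctx, provisionTimeout)
       defer cancel() // fires on every return path, including future ones
       return runProvisionCmd(cmdCtx, item) // hypothetical executor
   }
   ```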
**What's good**
- Validate scalar provision keys (e.g., `models: value` instead of a list) with a clear "must be a list" error message
- Fix misleading flock comment — parallel runs still serialize on the shared lock file; extract a provisionItem helper with `defer cancel()`
- Remove compound-engineering.local.md (local tooling, not for repo)
- Remove todos/ directory (tracked as GitHub issues instead)
📦 Previous Review (superseded by newer review) — Claude finished @dpup's task in 3m 7s — Code Review
The previous review issues (scalar value detection, the misleading lock comment, and the stray local files) are addressed.

**Bugs**

1. Docs security model now inaccurate. The docs claim: "**Authenticated** — Every service requires a password, even on the isolated network." This is no longer true with Ollama. Needs a carve-out (e.g., "Services that support authentication require a password...") or just remove the absolute claim.

2. Readiness check table is misleading. The table shows:

   | `ollama` | `ollama list` + pull declared models |

   Model pulling is NOT part of the readiness check — it runs after readiness passes.

**Minor issues**

3. Startup lifecycle missing provisioning step. The documented startup sequence has no provisioning step; for Ollama the actual sequence is step 5 → pull models → step 6. Since model pulls can take minutes, the missing step is the most user-visible part of the flow and should be documented (perhaps as a conditional step: "5b. Run provisioning commands (e.g., pull models)").

4. Model pull output bypasses run logs:

   ```go
   if err := provisionService(ctx, svcMgr, info, svcConfigs[i], os.Stderr); err != nil {
   ```
   Pull progress goes straight to stderr and never reaches the run's logs.

5. Dead reference:

   ```go
   // See todo/005 for narrowing to per-model locks.
   ```

   The `todos/` directory was removed, so this comment now points at nothing.

**What looks good**
- Fix readiness table: the Ollama check is just "ollama list"; model pulling is a separate provisioning step
- Add provisioning step (step 6) to startup sequence docs
- Fix security model docs: not all services require passwords (Ollama has no auth, relies on network isolation)
- Remove dead todo/005 reference from provisionItem comment
📦 Previous Review (superseded by newer review) — Claude finished @dpup's task in 3m 5s — Code Review
Both previous rounds of feedback have been addressed. The implementation is solid. Two remaining issues below; one is a real behavior bug, one is a documentation gap.

**Bugs**

1. `wait: false` silently skips provisioning:
   ```go
   if !wait {
       continue // skips BOTH readiness check AND the provisioning block below
   }
   ```

   If a user declares models under `services.ollama.models` but sets `wait: false`, the models are never pulled and nothing reports it. For databases, skipping both readiness and provisioning on `wait: false` is harmless; for Ollama it is a silent failure. Minimum fix: detect the combination and return an error or warning:
if svcConfigs[i].ProvisionCmd != "" && len(svcConfigs[i].Provisions) > 0 {
return nil, fmt.Errorf("%s: wait: false is incompatible with declared models — "+
"models cannot be pulled until the service is ready", dep.Name)
}
continue
}Alternatively, document it in the Documentation2.
**Minor nit**
demo.sh truncates the response with `| head -c 500`, which breaks the JSON output.
**What's good**
- Error on wait: false with provisioned items (models can't be pulled until the service is ready — silent failure before this fix)
- Add MOAT_OLLAMA_HOST and MOAT_OLLAMA_PORT to reference docs
- Remove head -c truncation from demo.sh (was breaking JSON output)
📦 Previous Review (superseded by newer review) — Claude finished @dpup's task in 4m 18s — Code Review
All three previous rounds of feedback have been addressed. The implementation is solid. A few remaining minor items below.

**Still present from round 2 (low priority)**

1. Provision output bypasses run logs:

   ```go
   if err := provisionService(ctx, svcMgr, info, svcConfigs[i], os.Stderr); err != nil {
   ```

   Model pull progress writes directly to stderr and never reaches the run's logs.

**New findings**

2. Lock acquisition ignores context cancellation:

   ```go
   if err := syscall.Flock(int(lockFile.Fd()), syscall.LOCK_EX); err != nil {
   ```

   If two runs contend for the lock, the blocked one cannot be cancelled until the flock returns.

3. No unit test for the `wait: false` + provisions guard. The new guard that rejects the combination has no test coverage.

**Nit**

4. Docker exec exit-code race:

   ```go
   _, _ = stdcopy.StdCopy(stdout, stdout, resp.Reader)
   resp.Close()
   execInspect, err := m.cli.ContainerExecInspect(ctx, execCreateResp.ID)
   ```

   There's a well-known narrow race in the Docker SDK: after the exec's output stream closes (EOF), Docker may not have committed the exit code to its internal store yet.

**What's good**
This is ready to merge. The remaining items are all low-priority follow-ups, not blockers.
📦 Previous Review (superseded by newer review) — Claude finished @dpup's task in 3m 49s — Code Review
All previous round feedback has been addressed. The implementation is solid. One new real bug introduced in this round, plus a minor issue worth noting.

**Bug**

1. Docker exec retry loop silently returns success if the exec is still running:

   ```go
   var exitCode int
   for attempt := 0; attempt < 3; attempt++ {
       execInspect, err := m.cli.ContainerExecInspect(ctx, execCreateResp.ID)
       if err != nil {
           return fmt.Errorf(...)
       }
       if !execInspect.Running {
           exitCode = execInspect.ExitCode
           break
       }
       if attempt < 2 {
           time.Sleep(100 * time.Millisecond)
       }
   }
   if exitCode != 0 {
       return fmt.Errorf(...)
   }
   return nil
   ```

   If Docker still reports the exec as running after all three attempts, the loop exits with `exitCode` still 0 and the function returns nil — a false success. For ProvisionService this means an ollama pull can be silently treated as complete while it is still in progress. The fix is to track whether the inspect ever observed completion:

   ```go
   var exitCode int
   complete := false
   for attempt := 0; attempt < 3; attempt++ {
       execInspect, err := m.cli.ContainerExecInspect(ctx, execCreateResp.ID)
       if err != nil {
           return fmt.Errorf(...)
       }
       if !execInspect.Running {
           exitCode = execInspect.ExitCode
           complete = true
           break
       }
       if attempt < 2 {
           time.Sleep(100 * time.Millisecond)
       }
   }
   if !complete {
       return fmt.Errorf("exec still running after retries")
   }
   if exitCode != 0 {
       return fmt.Errorf(...)
   }
   ```

**Minor**

2. When lock acquisition loses to context cancellation, the flock goroutine outlives the cancelled context (low priority).

**What's good**
The Docker retry false-success is the only remaining real bug. Once fixed, this is ready to merge.
Fixed the Docker exec retry false-success bug in both CheckReady and ProvisionService. The two low-priority items (provision output bypassing run logs, and the goroutine outliving context cancellation on lock contention) are noted — tracking separately rather than blocking merge.
📦 Previous Review (superseded by newer review) — Claude finished @dpup's task in 2m 12s — Code Review
The Docker exec retry fix is correct. All five rounds of previous feedback have been addressed. The implementation is solid and ready to merge.

**The fix (round 5)**

```go
complete := false
for attempt := 0; attempt < 3; attempt++ {
    execInspect, err := m.cli.ContainerExecInspect(ctx, execCreateResp.ID)
    ...
    if !execInspect.Running {
        exitCode = execInspect.ExitCode
        complete = true
        break
    }
    ...
}
if !complete {
    return fmt.Errorf("... exec still running after retries")
}
```

The fix is applied in both CheckReady and ProvisionService.

**Minor observations (non-blocking)**
```go
for i, dep := range serviceDeps {
    ...
    if lw, lwErr := store.LogWriter(); lwErr == nil {
        provOut = io.MultiWriter(os.Stderr, lw)
        defer lw.Close()
    }
```

The `defer lw.Close()` is scoped to the enclosing function, not the loop iteration, so each provisioned service keeps its log writer open until the run finishes.

**What's good**
This is ready to merge.
Both low-priority items addressed: provision output now tees into the run's logs, and flock acquisition is context-aware.
📦 Previous Review (superseded by newer review) — Claude finished @dpup's task in 2m 9s — Code Review
All previous round feedback has been addressed. The implementation is solid. This is ready to merge.

**The fixes (this round)**

- Docker exec retry false-success: fixed with a `complete` flag in both CheckReady and ProvisionService.
- Provision output capture: provision output is now teed into the run's logs via `io.MultiWriter`.

**Minor observations (non-blocking, tracked for awareness)**
```go
for i, dep := range serviceDeps {
    if lw, lwErr := store.LogWriter(); lwErr == nil {
        provOut = io.MultiWriter(os.Stderr, lw)
        defer lw.Close() // defers to function exit, not loop iteration
    }
}
```

Each provision-capable service in the loop adds another deferred close. Currently only Ollama has provisions, so this never fires more than once. Fine to leave as-is, but worth noting if more provision-capable services are added (a sketch of a per-iteration fix follows).
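A sketch of the per-iteration scoping the later fix commit describes — wrapping the provision block in an immediately invoked function so the `defer` fires each iteration; the surrounding names follow the snippet above and are assumptions:

```go
// Sketch: scope the log writer's defer to one loop iteration.
// Assumes: import ("io"; "os").
for i := range serviceDeps {
    if err := func() error {
        provOut := io.Writer(os.Stderr)
        if lw, lwErr := store.LogWriter(); lwErr == nil {
            provOut = io.MultiWriter(os.Stderr, lw)
            defer lw.Close() // closes when this closure returns, not at run end
        }
        return provisionService(ctx, svcMgr, info, svcConfigs[i], provOut)
    }(); err != nil {
        return nil, err
    }
}
```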
**What's good**
📦 Previous Review (superseded by newer review) — Claude finished @dpup's task in 3m 2s — Code Review
All six prior rounds of feedback are addressed. The two low-priority items from the last round are now fixed.
**Bug**
demo.sh parses the response with:

```sh
| python3 -c "import sys,json; print(json.load(sys.stdin)['response'])"
```

The agent container image is resolved from declared dependencies. With no Python runtime declared, the base image is debian:bookworm-slim, which has no python3 — the demo fails here. Fix: add `jq`:
```yaml
# moat.yaml
dependencies:
  - ollama@0.9
  - curl
  - jq
```

```sh
# demo.sh
| jq -r .response
```

**What's good**
Fix the python3 usage and this is ready to merge.
Review comment on demo.sh:

```sh
echo "--- Generating response ---"
curl -s "$MOAT_OLLAMA_URL/api/generate" \
  -d '{"model":"qwen2.5-coder:1.5b","prompt":"Write hello world in Go","stream":false}' \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['response'])"
```
The base image for this example is debian:bookworm-slim (no Python runtime in dependencies:), so python3 won't be available. The demo will fail here.
Fix: add jq to moat.yaml dependencies and use:
```diff
- | python3 -c "import sys,json; print(json.load(sys.stdin)['response'])"
+ | jq -r .response
```
Then in moat.yaml:
```yaml
dependencies:
  - ollama@0.9
  - curl
  - jq
```
📦 Previous Review (superseded by newer review) — Claude finished @dpup's task in 2m 7s — Code Review
The latest commit addresses the prior feedback. One outstanding issue from the previous round remained unfixed; the rest is fixed.
📦 Previous Review (superseded by newer review) — Claude finished @dpup's task in 2m 26s — Code Review
All previous round feedback has been resolved. One real doc bug and one cosmetic issue remain.

**Bug**
Both docs show `- ollama@0.9`, but the registry only allows versions from its versions list — Ollama publishes full semver tags (0.18.1, 0.18.0, ...), and 0.9 does not exist on Docker Hub. Any user copying the doc examples will hit this error immediately. The working example in examples/service-ollama uses a valid version.

**Nit**
```sh
curl -s "$MOAT_OLLAMA_URL/api/tags"
```
**What's good**
Fix the version string in docs and this is ready to merge.
📦 Previous Review (superseded by newer review) — Claude finished @dpup's task in 3m 33s — Code Review
All eight previous rounds of feedback have been addressed. The implementation is solid. One doc bug remains unfixed from the previous round, plus a cosmetic nit.

**Bug**
Both docs still show `- ollama@0.9`. The registry only permits versions that exist on Docker Hub (recent 0.18.x releases). This was flagged in the previous round. Change `0.9` to a valid version such as `0.18.1`.

**Nit**
```sh
curl -s "$MOAT_OLLAMA_URL/api/tags"
```
**What's good**
Fix the version string and this is ready to merge.
📦 Previous Review (superseded by newer review) — Claude finished @dpup's task in 3m 9s — Code Review
All previous round feedback (rounds 1–8) has been addressed. One remaining doc bug is still present.

**Bug**
Both files still show the invalid version `ollama@0.9`.
I committed a fix locally.

**Nit**
```sh
curl -s "$MOAT_OLLAMA_URL/api/tags"
```
**What's good**

Everything from all prior rounds is resolved.
Fix the version string and this is ready to merge.
📦 Previous Review (superseded by newer review) — Claude finished @dpup's task in 3m 14s — Code Review
The implementation is solid and all substantive feedback from prior rounds has been addressed. One doc bug persisted through multiple review iterations — fixed directly.

**Fixed (committed)**
Updated the docs to a valid Ollama version (`0.18.1`).

**What's good**
This is ready to merge.
📦 Previous Review (superseded by newer review) — Claude finished @dpup's task in 3m 34s — Code Review
All prior review rounds are resolved. Found and fixed one new bug.

**Bug fixed (committed)**
The `known` map used to skip already-parsed struct fields was missing `"memory"`:

```go
// Before
known := map[string]bool{"env": true, "image": true, "wait": true}
// After
known := map[string]bool{"env": true, "image": true, "wait": true, "memory": true}
```

This broke the example moat.yaml, which uses both `memory: 2048` and `models:` — the `memory:` value leaked into Extra as nil and buildServiceConfig rejected it as an unknown key at runtime.

**Everything else looks good**
Ready to merge after this fix.
The missing `memory` entry is fixed.
Adds design spec for Ollama as a service dependency with generalized provisions/cache extensions, and a detailed implementation plan with 10 tasks covering registry, types, config, service manager, and docs.
Add CachePath, ProvisionsKey, and ProvisionCmd fields to ServiceDef struct to support services that need host-side caching and post-start provisioning. Add ollama as a service dependency using these new fields.
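For orientation, a sketch of how those fields could sit on the registry type — the three new field names come from this commit; the neighboring fields are assumptions:

```go
// Sketch — new fields per this commit; the other fields are illustrative.
type ServiceDef struct {
	Image        string // sidecar container image
	ReadinessCmd string // command polled until the service is up

	CachePath     string // container path persisted to the host cache
	ProvisionsKey string // user-facing list key, e.g. "models" for ollama
	ProvisionCmd  string // per-item command run after readiness, e.g. a pull
}
```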
… cache mount support

Adds a ProvisionService method to the ServiceManager interface for executing provisioning commands inside service containers (e.g., pulling Ollama models). Also adds CachePath/CacheHostPath mount support to both the Docker and Apple service managers so model/data caches can be persisted on the host.
…ceConfig
- Guard password generation: only generate for services that have auth
(password_env set, or extra_cmd/readiness_cmd use {password} placeholder)
- Wire provisions from userSpec.Extra using registry's provisions_key
- Resolve cache host path to ~/.moat/cache/<service-name>
- Validate unknown extra keys: reject keys that don't match provisions_key
P1 fixes:
- Validate provision items against a regex to prevent shell injection via model names interpolated into sh -c commands (see the sketch after this list)
- Fix error messages referencing the nonexistent --service flag on moat logs

P2 fixes:
- Tighten cache directory permissions from 0755 to 0700
- Tighten lock file permissions from 0644 to 0600
- Change the provision timeout from per-batch to per-item (30 min each)
- Narrow flock scope to per-item so parallel runs don't serialize

P3 fixes:
- Consolidate the duplicated extra_env loop in buildServiceConfig
- Capture unknown scalar keys in ServiceSpec YAML for validation
- Add a versions list and quote the image in the ollama registry entry
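A sketch of the kind of allowlist validation the P1 fix describes. The exact pattern is an assumption, shaped by the test plan at the bottom of this PR (letters, digits, and `.` `-` `_` `/` `:` pass; shell metacharacters fail):

```go
// Sketch — the PR's real pattern may differ. This one admits the test
// plan's valid names (qwen2.5-coder:7b, nomic-embed-text, library/model:tag)
// and rejects shell metacharacters (; | $ ` &).
// Assumes: import ("fmt"; "regexp").
var provisionItemRe = regexp.MustCompile(`^[A-Za-z0-9][A-Za-z0-9._:/-]*$`)

func validateProvisionItem(item string) error {
	if !provisionItemRe.MatchString(item) {
		return fmt.Errorf("invalid provision item %q", item)
	}
	return nil
}
```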
- Validate scalar provision keys (e.g., `models: value` instead of a list) with a clear "must be a list" error message
- Fix misleading flock comment — parallel runs still serialize on the shared lock file; extract a provisionItem helper with `defer cancel()`
- Remove compound-engineering.local.md (local tooling, not for repo)
- Remove todos/ directory (tracked as GitHub issues instead)
- Fix readiness table: the Ollama check is just "ollama list"; model pulling is a separate provisioning step
- Add provisioning step (step 6) to startup sequence docs
- Fix security model docs: not all services require passwords (Ollama has no auth, relies on network isolation)
- Remove dead todo/005 reference from provisionItem comment
- Error on wait: false with provisioned items (models can't be pulled until the service is ready — silent failure before this fix)
- Add MOAT_OLLAMA_HOST and MOAT_OLLAMA_PORT to reference docs
- Remove head -c truncation from demo.sh (was breaking JSON output)
- Context-aware flock: wrap syscall.Flock in a goroutine with a select on ctx.Done() so provision lock acquisition respects context cancellation (sketched below)
- Provision output to run logs: create the RunStore before service provisioning and tee output to both stderr and the run's logs.jsonl via io.MultiWriter
- Docker exec inspect race: retry ContainerExecInspect up to 3 times with 100ms backoff when the exec is still marked running after stream EOF
- Add a test validating the wait:false + provisions preconditions
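A sketch of what that context-aware wrapper could look like, per the description above (a goroutine plus a select on `ctx.Done()`); the function body is an assumption, not the PR's code:

```go
// Sketch of flockContext: block in a goroutine so cancellation can win.
// Assumes: import ("context"; "os"; "syscall").
func flockContext(ctx context.Context, f *os.File) error {
	errCh := make(chan error, 1) // buffered so the goroutine never blocks on send
	go func() { errCh <- syscall.Flock(int(f.Fd()), syscall.LOCK_EX) }()
	select {
	case err := <-errCh:
		return err
	case <-ctx.Done():
		// The goroutine may still acquire the lock after we return;
		// it is only released when the file descriptor closes.
		return ctx.Err()
	}
}
```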
…pect retries

Both CheckReady and ProvisionService had a bug where, if ContainerExecInspect reported the exec as still Running after all 3 retry attempts, the loop would exit without setting the complete flag, exitCode would stay 0, and the function would return nil — a false success. For ProvisionService this meant an ollama pull could be silently treated as complete while the pull was still in progress.
Using defer lw.Close() inside a loop defers all closes until the outer function returns. Wrap the provision block in an IIFE so defer fires immediately after provisionService returns for each service. Also extract the response field from demo.sh JSON output.
Ollama publishes full semver tags (0.18.1, 0.18.0, ...), not short major.minor tags (0.9). The previous versions 0.6-0.9 do not exist on Docker Hub, causing a 404 on image pull. Update default to 0.18.1 and set versions list to recent releases. Update tests and example moat.yaml accordingly.
python3 is not available in debian:bookworm-slim (the base image when no Python runtime is declared). Use jq -r .response instead, which is available as an apt dependency.
Add a 'memory' field to ServiceSpec so users can configure the memory limit for service sidecar containers (e.g., Ollama) independently of the main agent container's container.memory setting. Wire through ServiceConfig.MemoryMB -> SidecarConfig.MemoryMB, applied as Docker HostConfig.Resources.Memory and Apple --memory flag. Update the Ollama example to use memory: 2048 with qwen2.5-coder:1.5b.
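A sketch of the Docker side of that wiring. The MemoryMB field name comes from this commit; the surrounding code is an assumption, though HostConfig.Resources.Memory is the real Docker API field (in bytes):

```go
// Sketch — cfg.MemoryMB per this commit; Docker expects bytes.
// Assumes: import "github.com/docker/docker/api/types/container".
hostConfig := &container.HostConfig{}
if cfg.MemoryMB > 0 {
	hostConfig.Resources.Memory = int64(cfg.MemoryMB) * 1024 * 1024
}
```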
The known map used to skip processing of parsed struct fields was missing "memory", causing memory: values to leak into Extra as nil. This caused buildServiceConfig to reject memory: as an unknown key when combined with provision-capable services (e.g. ollama). The example moat.yaml, which uses both memory: 2048 and models: [...], would fail at runtime. Also adds a test that exercises YAML with both memory: and models: to prevent regression.
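A sketch of what that regression test might look like — the ServiceSpec unmarshalling details and package layout are assumptions:

```go
// Sketch — assumes ServiceSpec routes unknown YAML keys into Extra,
// per the descriptions in this PR, and gopkg.in/yaml.v3 for parsing.
// Assumes: import ("testing"; "gopkg.in/yaml.v3").
func TestMemoryIsKnownKey(t *testing.T) {
	src := "memory: 2048\nmodels:\n  - qwen2.5-coder:1.5b\n"
	var s ServiceSpec
	if err := yaml.Unmarshal([]byte(src), &s); err != nil {
		t.Fatal(err)
	}
	if _, leaked := s.Extra["memory"]; leaked {
		t.Fatalf("memory leaked into Extra: %v", s.Extra)
	}
	if got := s.Extra["models"]; len(got) != 1 || got[0] != "qwen2.5-coder:1.5b" {
		t.Fatalf("models not captured: %v", got)
	}
}
```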
229fead to db2d1f1
Claude finished @dpup's task in 3m 49s — Code Review
All ten previous rounds of review are addressed. I went through the current state of every key file looking for anything missed. The implementation is solid.

**Final findings**

1. No lock acquired on context cancellation — stale LOCK_UN deferred (low severity, cosmetic). In the provisioning path:

   ```go
   if err := flockContext(cmdCtx, lockFile); err != nil {
       return err
   }
   defer func() { _ = syscall.Flock(int(lockFile.Fd()), syscall.LOCK_UN) }()
   ```

   When flockContext returns early on cancellation, the LOCK_UN defer is never registered, yet the goroutine inside flockContext may still acquire the lock afterward; it is only released when the process exits and the file descriptor closes (see the flockContext sketch in the commit list above).

2. The test validates the conditions the wait:false + provisions guard enforces …

3. When context cancels mid-provision …

**What's good**
This is ready to merge.
**Summary**
- Adds generic provisioning fields (`provisions_key`, `provision_cmd`, `cache_path`) on `ServiceDef` — reusable by any future service
- Adds `ollama`: declare it in `dependencies:` and list models under `services.ollama.models`; moat starts the sidecar, pulls the models, caches them on the host, and injects `MOAT_OLLAMA_*` env vars

**Test plan**
- `go test ./internal/run/ ./internal/config/ ./internal/container/ -race` passes
- `make lint` — 0 issues
- Shell metacharacters (`;`, `|`, `$`, `` ` ``, `&`) in model names fail validation
- Valid model names pass: `qwen2.5-coder:7b`, `nomic-embed-text`, `library/model:tag`
- `moat run examples/service-ollama` starts the sidecar, pulls the model, prints a response via `MOAT_OLLAMA_URL`

🤖 Generated with Claude Code