Inject run config store into WorkloadFromContainerInfo#4342
WorkloadFromContainerInfo called loadRunConfigFields, which internally created a real state.LocalStore hitting the XDG filesystem on every call. This caused flaky tests when parallel tests created or truncated runconfig files, leading to intermittent EOF errors from json.Decode.

Add a state.Store parameter to WorkloadFromContainerInfo and loadRunConfigFields so callers inject the store, matching the existing dependency injection pattern in fileStatusManager. Add a runConfigStore field to runtimeStatusManager for parity.

Fix file_status_test.go mock readers to use DoAndReturn for fresh readers on each call, preventing EOF when the store is read more than once per test.

Fixes #4341

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4342 +/- ##
==========================================
- Coverage 69.08% 68.90% -0.18%
==========================================
Files 478 479 +1
Lines 48432 48517 +85
==========================================
- Hits 33457 33431 -26
- Misses 12314 12324 +10
- Partials 2661 2762 +101
Check reader.Close error return and break long function signature line to satisfy errcheck and lll linters. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
yrobla left a comment
The DI approach is architecturally correct and cleanly fixes the flaky test root cause. The DoAndReturn changes are exactly right — sharing a single pre-consumed reader across .AnyTimes() calls was the actual bug.
Two things need addressing before merge (marked blocker); two more are nits that can be tackled in a follow-up.
Blockers
1. loadRunConfigFields: reader.Close() error is silently discarded, which diverges from the established project pattern
2. loadRunConfigFields: TOCTOU between store.Exists and store.GetReader; a concurrent deletion between the two calls returns an error that was previously handled gracefully as "not found"
Nits (follow-up PR OK)
3. store.Exists and store.GetReader errors are returned without workload name context, making debugging harder
4. TestNewStatusManagerFromRuntime hits the real XDG filesystem via NewRunConfigStore, inconsistent with the mock-based pattern used everywhere else in the suite
pkg/workloads/types/types.go
	if err != nil {
		return nil, err
	}
	defer func() { _ = reader.Close() }()
Blocker: reader.Close() error is silently discarded with _ =, but the established project pattern (see state/runconfig.go:93-95, statuses/file_status.go:99-101) is to log it. A file-backed reader can return flush/OS errors on Close() that are worth observing.
Suggested change:
-	defer func() { _ = reader.Close() }()
+	defer func() {
+		if err := reader.Close(); err != nil {
+			slog.Warn("failed to close run config reader", "workload", name, "error", err)
+		}
+	}()
pkg/workloads/types/types.go
	return runConfig, nil
	reader, err := store.GetReader(ctx, name)
	if err != nil {
		return nil, err
Blocker: TOCTOU between store.Exists (line 27) and store.GetReader here. If a workload is deleted between the two calls (e.g. a concurrent thv delete), GetReader returns an error — but the old code via state.LoadRunConfig handled this case gracefully by returning an empty config.
The PR description says "semantics are identical", but that only holds in the absence of concurrent writes. The fix is to treat a not-found error from GetReader the same as !exists:
Suggested change:
-	return nil, err
+	if state.IsNotFound(err) {
+		return &minimalRunConfig{}, nil
+	}
+	return nil, err
(adjust the not-found check to match the actual sentinel/error type the store returns)
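To illustrate why a single GetReader call with not-found handling avoids the race, here is a minimal sketch. Everything in it is an assumption made for the example: errNotFound, the readerStore interface, and mapStore are hypothetical and do not reflect the real state.Store API or its error types.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errNotFound is a hypothetical sentinel; the real store may signal
// not-found with a different error type.
var errNotFound = errors.New("run config not found")

// readerStore is a hypothetical, trimmed-down stand-in for the injected store.
type readerStore interface {
	GetReader(name string) (io.ReadCloser, error)
}

// mapStore keeps run configs in memory for the example.
type mapStore map[string]string

func (m mapStore) GetReader(name string) (io.ReadCloser, error) {
	data, ok := m[name]
	if !ok {
		return nil, errNotFound
	}
	return io.NopCloser(strings.NewReader(data)), nil
}

// loadConfig makes one call to the store. A missing entry yields an empty
// config and a nil error, so there is no window between a separate
// existence check and the read in which a deletion could surface as an error.
func loadConfig(s readerStore, name string) (string, error) {
	r, err := s.GetReader(name)
	if err != nil {
		if errors.Is(err, errNotFound) {
			return "", nil
		}
		return "", fmt.Errorf("failed to open run config for workload %q: %w", name, err)
	}
	defer r.Close() // io.NopCloser never fails; real code should log this error

	data, err := io.ReadAll(r)
	if err != nil {
		return "", fmt.Errorf("failed to read run config for workload %q: %w", name, err)
	}
	return string(data), nil
}

func main() {
	store := mapStore{"present": `{"name":"present"}`}

	cfg, err := loadConfig(store, "present")
	fmt.Printf("present: %q, err=%v\n", cfg, err)

	cfg, err = loadConfig(store, "deleted")
	fmt.Printf("deleted: %q, err=%v\n", cfg, err)
}
```

Because the not-found case is decided by the same call that opens the reader, a concurrent deletion can only ever produce the "empty config" result, never a spurious error.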
pkg/workloads/types/types.go
	if errors.Is(err, werr.ErrRunConfigNotFound) {
		return &minimalRunConfig{}, nil
	}
	return nil, err
Nit (follow-up OK): both store.Exists and store.GetReader errors are returned without wrapping the workload name, making it hard to tell which workload triggered a ListWorkloads failure. Consider:
Suggested change:
-	return nil, err
+	return nil, fmt.Errorf("failed to check run config for workload %q: %w", name, err)
Same pattern applies to the GetReader error a few lines below.
	mockRuntime := rtmocks.NewMockRuntime(ctrl)
-	manager := NewStatusManagerFromRuntime(mockRuntime)
+	manager, err := NewStatusManagerFromRuntime(mockRuntime)
Nit (follow-up OK): NewStatusManagerFromRuntime internally calls state.NewRunConfigStore → os.MkdirAll on the real XDG state directory. Every other test in this file injects a stateMocks.NewMockStore. This one is inconsistent and could flake in sandboxed CI environments.
The simplest fix is t.Setenv("XDG_STATE_HOME", t.TempDir()) before the call, or restructure NewStatusManagerFromRuntime to accept the store as a parameter (following the DI pattern this PR establishes).
…elds The Exists+GetReader sequence had a race where a concurrent workload deletion between the two calls would return an error instead of the intended empty config. Remove the Exists call entirely and handle not-found from GetReader directly via httperr status code. Also log reader.Close() errors with slog.Warn to match the established project pattern, and wrap errors with workload name for debuggability. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- WorkloadFromContainerInfo called loadRunConfigFields, which internally created a real state.LocalStore hitting the XDG filesystem on every call. In CI, parallel tests could create or truncate runconfig files, causing intermittent EOF errors from json.Decode and making TestRuntimeStatusManager_GetWorkload flaky.
- Add a state.Store parameter to WorkloadFromContainerInfo and loadRunConfigFields so callers inject the store, matching the existing DI pattern in fileStatusManager. Add a runConfigStore field to runtimeStatusManager for parity.
- Fix file_status_test.go mock readers to use DoAndReturn for fresh readers on each call, preventing EOF when the store is read more than once per test.

Fixes #4341
Type of change
Test plan
task test)
Changes
- pkg/workloads/types/types.go: Add state.Store param to WorkloadFromContainerInfo and loadRunConfigFields; rewrite loadRunConfigFields to use injected store directly
- pkg/workloads/types/workload_test.go: store to updated WorkloadFromContainerInfo
- pkg/workloads/statuses/status.go: Add runConfigStore field to runtimeStatusManager; update NewStatusManagerFromRuntime to return error; pass store at 2 call sites
- pkg/workloads/statuses/file_status.go: f.runConfigStore at 6 call sites; name receiver on mergeHealthyWorkloadData
- pkg/workloads/statuses/status_test.go: runtimeStatusManager; set up Exists expectations
- pkg/workloads/statuses/file_status_test.go: Return(mockReader, nil).AnyTimes() to DoAndReturn(...) for fresh readers
Does this introduce a user-facing change?
No
Special notes for reviewers
The loadRunConfigFields rewrite switches from state.LoadRunConfig (which creates a new store internally) to direct store.Exists + store.GetReader calls. The semantics are identical: it returns an empty config when not found and propagates errors otherwise, but it no longer creates a store on every invocation.

Generated with Claude Code