Inject run config store into WorkloadFromContainerInfo by JAORMX · Pull Request #4342 · stacklok/toolhive

JAORMX · 2026-03-24T13:22:20Z

Summary

WorkloadFromContainerInfo called loadRunConfigFields which internally created a real state.LocalStore hitting the XDG filesystem on every call. In CI, parallel tests could create or truncate runconfig files, causing intermittent EOF errors from json.Decode — making TestRuntimeStatusManager_GetWorkload flaky.
Add a state.Store parameter to WorkloadFromContainerInfo and loadRunConfigFields so callers inject the store, matching the existing DI pattern in fileStatusManager. Add a runConfigStore field to runtimeStatusManager for parity.
Fix file_status_test.go mock readers to use DoAndReturn for fresh readers on each call, preventing EOF when the store is read more than once per test.

Fixes #4341

Type of change

Bug fix

Test plan

Unit tests (task test)

Changes

File	Change
`pkg/workloads/types/types.go`	Add `state.Store` param to `WorkloadFromContainerInfo` and `loadRunConfigFields`; rewrite `loadRunConfigFields` to use injected store directly
`pkg/workloads/types/workload_test.go`	Pass existing `store` to updated `WorkloadFromContainerInfo`
`pkg/workloads/statuses/status.go`	Add `runConfigStore` field to `runtimeStatusManager`; update `NewStatusManagerFromRuntime` to return error; pass store at 2 call sites
`pkg/workloads/statuses/file_status.go`	Pass `f.runConfigStore` at 6 call sites; name receiver on `mergeHealthyWorkloadData`
`pkg/workloads/statuses/status_test.go`	Inject mock store into `runtimeStatusManager`; set up `Exists` expectations
`pkg/workloads/statuses/file_status_test.go`	Change `Return(mockReader, nil).AnyTimes()` to `DoAndReturn(...)` for fresh readers

Does this introduce a user-facing change?

No

Special notes for reviewers

The loadRunConfigFields rewrite switches from state.LoadRunConfig (which creates a new store internally) to direct store.Exists + store.GetReader calls. The semantics are identical: returns empty config when not found, propagates errors otherwise — but no longer creates a store on every invocation.

Generated with Claude Code

WorkloadFromContainerInfo called loadRunConfigFields which internally created a real state.LocalStore hitting the XDG filesystem on every call. This caused flaky tests when parallel tests created or truncated runconfig files, leading to intermittent EOF errors from json.Decode. Add a state.Store parameter to WorkloadFromContainerInfo and loadRunConfigFields so callers inject the store, matching the existing dependency injection pattern in fileStatusManager. Add a runConfigStore field to runtimeStatusManager for parity. Fix file_status_test.go mock readers to use DoAndReturn for fresh readers on each call, preventing EOF when the store is read more than once per test. Fixes #4341 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov · 2026-03-24T13:27:57Z

Codecov Report

❌ Patch coverage is 52.63158% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.90%. Comparing base (df1a5cd) to head (ff81f7d).
⚠️ Report is 3 commits behind head on main.

Files with missing lines	Patch %	Lines
pkg/workloads/types/types.go	37.50%	6 Missing and 4 partials ⚠️
pkg/workloads/statuses/status.go	61.53%	3 Missing and 2 partials ⚠️
pkg/workloads/statuses/file_status.go	66.66%	2 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4342      +/-   ##
==========================================
- Coverage   69.08%   68.90%   -0.18%     
==========================================
  Files         478      479       +1     
  Lines       48432    48517      +85     
==========================================
- Hits        33457    33431      -26     
- Misses      12314    12324      +10     
- Partials     2661     2762     +101

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Check reader.Close error return and break long function signature line to satisfy errcheck and lll linters. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

yrobla

The DI approach is architecturally correct and cleanly fixes the flaky test root cause. The DoAndReturn changes are exactly right — sharing a single pre-consumed reader across .AnyTimes() calls was the actual bug.

Two things need addressing before merge (marked blocker); two more are nits that can be tackled in a follow-up.

Blockers

loadRunConfigFields: reader.Close() error is silently discarded — diverges from established project pattern
loadRunConfigFields: TOCTOU between store.Exists and store.GetReader — a concurrent deletion between the two calls returns an error that was previously handled gracefully as "not found"

Nits (follow-up PR OK)
3. store.Exists and store.GetReader errors are returned without workload name context, making debugging harder
4. TestNewStatusManagerFromRuntime hits the real XDG filesystem via NewRunConfigStore, inconsistent with the mock-based pattern used everywhere else in the suite

yrobla · 2026-03-24T14:31:16Z

pkg/workloads/types/types.go

+	if err != nil {
+		return nil, err
+	}
+	defer func() { _ = reader.Close() }()


Blocker: reader.Close() error is silently discarded with _ =, but the established project pattern (see state/runconfig.go:93-95, statuses/file_status.go:99-101) is to log it. A file-backed reader can return flush/OS errors on Close() that are worth observing.

Suggested change

defer func() { _ = reader.Close() }()

defer func() {

if err := reader.Close(); err != nil {

slog.Warn("failed to close run config reader", "workload", name, "error", err)

}

}()

yrobla · 2026-03-24T14:31:16Z

pkg/workloads/types/types.go

-	return runConfig, nil
+	reader, err := store.GetReader(ctx, name)
+	if err != nil {
+		return nil, err


Blocker: TOCTOU between store.Exists (line 27) and store.GetReader here. If a workload is deleted between the two calls (e.g. a concurrent thv delete), GetReader returns an error — but the old code via state.LoadRunConfig handled this case gracefully by returning an empty config.

The PR description says "semantics are identical", but that only holds in the absence of concurrent writes. The fix is to treat a not-found error from GetReader the same as !exists:

Suggested change

return nil, err

if state.IsNotFound(err) {

return &minimalRunConfig{}, nil

}

return nil, err

(adjust the not-found check to match the actual sentinel/error type the store returns)

yrobla · 2026-03-24T14:31:16Z

pkg/workloads/types/types.go

-		if errors.Is(err, werr.ErrRunConfigNotFound) {
-			return &minimalRunConfig{}, nil
-		}
 		return nil, err


Nit (follow-up OK): both store.Exists and store.GetReader errors are returned without wrapping the workload name, making it hard to tell which workload triggered a ListWorkloads failure. Consider:

Suggested change

return nil, err

return nil, fmt.Errorf("failed to check run config for workload %q: %w", name, err)

Same pattern applies to the GetReader error a few lines below.

yrobla · 2026-03-24T14:31:16Z

pkg/workloads/statuses/status_test.go


 	mockRuntime := rtmocks.NewMockRuntime(ctrl)
-	manager := NewStatusManagerFromRuntime(mockRuntime)
+	manager, err := NewStatusManagerFromRuntime(mockRuntime)


Nit (follow-up OK): NewStatusManagerFromRuntime internally calls state.NewRunConfigStore → os.MkdirAll on the real XDG state directory. Every other test in this file injects a stateMocks.NewMockStore. This one is inconsistent and could flake in sandboxed CI environments.

The simplest fix is t.Setenv("XDG_STATE_HOME", t.TempDir()) before the call, or restructure NewStatusManagerFromRuntime to accept the store as a parameter (following the DI pattern this PR establishes).

…elds The Exists+GetReader sequence had a race where a concurrent workload deletion between the two calls would return an error instead of the intended empty config. Remove the Exists call entirely and handle not-found from GetReader directly via httperr status code. Also log reader.Close() errors with slog.Warn to match the established project pattern, and wrap errors with workload name for debuggability. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

JAORMX requested review from amirejaz and lujunsan as code owners March 24, 2026 13:22

github-actions bot added the size/S Small PR: 100-299 lines changed label Mar 24, 2026

Fix lint errors in loadRunConfigFields and mergeHealthyWorkloadData

7cfac82

Check reader.Close error return and break long function signature line to satisfy errcheck and lll linters. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 24, 2026

yrobla requested changes Mar 24, 2026

View reviewed changes

github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 24, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inject run config store into WorkloadFromContainerInfo#4342

Inject run config store into WorkloadFromContainerInfo#4342
JAORMX wants to merge 3 commits intomainfrom
fix-flaky-workload-from-container-info

JAORMX commented Mar 24, 2026

Uh oh!

codecov bot commented Mar 24, 2026 •

edited

Loading

Uh oh!

yrobla left a comment

Uh oh!

yrobla Mar 24, 2026

Uh oh!

yrobla Mar 24, 2026

Uh oh!

yrobla Mar 24, 2026

Uh oh!

yrobla Mar 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	return nil, err
	return nil, fmt.Errorf("failed to check run config for workload %q: %w", name, err)

Conversation

JAORMX commented Mar 24, 2026

Summary

Type of change

Test plan

Changes

Does this introduce a user-facing change?

Special notes for reviewers

Uh oh!

codecov bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

yrobla left a comment

Choose a reason for hiding this comment

Uh oh!

yrobla Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

yrobla Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

yrobla Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

yrobla Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov bot commented Mar 24, 2026 •

edited

Loading