Skip to content

Fix GetDefaultProvider bypassing registered config factory#4740

Merged
reyortiz3 merged 6 commits intomainfrom
fix/registry-factory-bypass-config-provider
Apr 10, 2026
Merged

Fix GetDefaultProvider bypassing registered config factory#4740
reyortiz3 merged 6 commits intomainfrom
fix/registry-factory-bypass-config-provider

Conversation

@reyortiz3
Copy link
Copy Markdown
Contributor

Summary

GetDefaultProvider() in pkg/registry/factory.go called config.NewDefaultProvider(), which bypasses the RegisterProviderFactory hook entirely and always reads the local XDG config file. This means any enterprise build that registers a factory (e.g. to supply an EnterpriseProvider that fetches config from a remote server) was silently ignored — thv registry commands always showed the embedded registry instead of the configured one.

  • Change the call from config.NewDefaultProvider() to config.NewProvider(), which checks the registered factory first and falls back to the default only when no factory is registered
  • Add unit tests covering: factory respected, factory returns nil falls through, caching semantics (sync.Once), and reset behaviour

Type of change

  • Bug fix

Test plan

  • Unit tests (task test)

The regression test TestGetDefaultProvider_RespectsRegisteredFactory registers a factory pointing to a custom registry with a sentinel server name and asserts that server appears in ListServers() — this test fails on the old code and passes with the fix.

Does this introduce a user-facing change?

No change in open-source builds. In enterprise builds that register a ProviderFactory, thv registry list and related commands now correctly use the factory-backed config (e.g. a remote registry URL from the enterprise config server) instead of falling back to the embedded registry.

Generated with Claude Code

config.NewDefaultProvider() skips the RegisterProviderFactory hook and
always reads the local XDG config file. Switch to config.NewProvider(),
which checks the registered factory first and falls back to the default
only when no factory is registered.

Without this fix, enterprise builds that register a factory to supply an
EnterpriseProvider (e.g. fetching config from a remote config server)
were silently ignored, causing thv registry commands to use the embedded
registry instead of the configured one.

Add unit tests covering the factory-respected and fall-through cases,
caching semantics, and reset behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added the size/S Small PR: 100-299 lines changed label Apr 10, 2026
peppescg
peppescg previously approved these changes Apr 10, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 10, 2026

Codecov Report

❌ Patch coverage is 77.77778% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.85%. Comparing base (443d2fa) to head (0fc5546).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
pkg/registry/factory.go 75.00% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4740      +/-   ##
==========================================
+ Coverage   68.78%   68.85%   +0.07%     
==========================================
  Files         516      516              
  Lines       54307    54313       +6     
==========================================
+ Hits        37353    37399      +46     
+ Misses      14100    14054      -46     
- Partials     2854     2860       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

JAORMX
JAORMX previously approved these changes Apr 10, 2026
Comment thread pkg/registry/factory.go
Both NewRegistryRoutes and NewRegistryRoutesForServe called
config.NewDefaultProvider(), which skips the RegisterProviderFactory
hook and always reads the local XDG config file. Switch to
config.NewProvider() so enterprise builds that register a factory
(e.g. to supply a remote config server) are correctly honoured.

Add unit tests covering the regression case and the no-factory fallback
for both constructors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@reyortiz3 reyortiz3 dismissed stale reviews from JAORMX and peppescg via 60a2ccb April 10, 2026 20:32
@reyortiz3 reyortiz3 requested a review from amirejaz as a code owner April 10, 2026 20:32
@github-actions github-actions Bot added size/M Medium PR: 300-599 lines changed and removed size/S Small PR: 100-299 lines changed labels Apr 10, 2026
Go evaluates function arguments eagerly, so the previous implementation
called config.NewProvider() (and thus any registered ProviderFactory)
on every invocation of GetDefaultProvider, even after sync.Once had
already fired and the resulting provider was discarded.

Move the call inside the Do closure so the factory is invoked at most
once. GetDefaultProvider now owns its sync.Once block directly rather
than delegating to GetDefaultProviderWithConfig.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels Apr 10, 2026
@reyortiz3
Copy link
Copy Markdown
Contributor Author

@claude can you review the comments are addressed?

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 10, 2026

Claude finished @reyortiz3's task in 1m 21s —— View job


Review: are the prior comments addressed?

  • Read current state of pkg/registry/factory.go
  • Check Bug 1: eager factory invocation (config.NewProvider inside sync.Once)
  • Check Bug 2: data race between GetDefaultProviderWithConfig and ResetDefaultProvider
  • Post findings

Bug 1 — Eager factory invocation ✅ Fixed

Commit 28a2927 moves config.NewProvider() inside the defaultProviderOnce.Do closure (factory.go:87). The factory is now called at most once, not on every call to GetDefaultProvider(). This fully addresses the original concern.

Bug 2 — Data race between GetDefaultProviderWithConfig / GetDefaultProvider and ResetDefaultProvider ❌ Not fixed

The race is still present. Here's the current state:

ResetDefaultProvider (lines 119–127) writes to defaultProviderOnce under defaultProviderMu:

func ResetDefaultProvider() {
    defaultProviderMu.Lock()
    defer defaultProviderMu.Unlock()
    defaultProviderOnce = sync.Once{}  // struct-level write, guarded
    ...
}

GetDefaultProviderWithConfig (lines 101–112) reads defaultProviderOnce without the mutex:

func GetDefaultProviderWithConfig(...) (Provider, error) {
    defaultProviderOnce.Do(func() { ... })  // struct-level read, NOT guarded
    ...
}

GetDefaultProvider (lines 85–95) has the same problem:

func GetDefaultProvider() (Provider, error) {
    defaultProviderOnce.Do(func() { ... })  // also NOT guarded by mutex
    ...
}

sync.Once.Do serialises concurrent calls to Do() against each other, but it does not protect against a concurrent struct assignment (defaultProviderOnce = sync.Once{}) to the variable that holds the sync.Once. The struct assignment in ResetDefaultProvider is not atomic — it overwrites all fields including the internal done uint32. If ResetDefaultProvider and either getter run concurrently, the Go race detector would flag this.

The code comment at lines 24–26 still says:

The mutex is NOT needed for GetDefaultProviderWithConfig since sync.Once already provides thread-safety for initialization.

This reasoning is incorrect and should be updated as part of the fix.

Suggested fix — acquire defaultProviderMu in both getter functions before touching defaultProviderOnce:

func GetDefaultProvider() (Provider, error) {
    defaultProviderMu.Lock()
    defer defaultProviderMu.Unlock()
    defaultProviderOnce.Do(func() {
        cfg, err := config.NewProvider().LoadOrCreateConfig()
        if err != nil {
            defaultProviderErr = err
            return
        }
        defaultProvider, defaultProviderErr = NewRegistryProvider(cfg)
    })
    return defaultProvider, defaultProviderErr
}

func GetDefaultProviderWithConfig(configProvider config.Provider, opts ...ProviderOption) (Provider, error) {
    defaultProviderMu.Lock()
    defer defaultProviderMu.Unlock()
    defaultProviderOnce.Do(func() {
        cfg, err := configProvider.LoadOrCreateConfig()
        if err != nil {
            defaultProviderErr = err
            return
        }
        defaultProvider, defaultProviderErr = NewRegistryProvider(cfg, opts...)
    })
    return defaultProvider, defaultProviderErr
}

Note: sync.Once.Do is safe to call with a mutex already held — the Do closure runs inline in the goroutine that wins the race, while other callers block inside Do. Since ResetDefaultProvider also holds defaultProviderMu for its full duration, this ensures struct-level reads and writes to defaultProviderOnce are always serialised.

Fix this →


Summary: Bug 1 is fully addressed. Bug 2 (data race) remains open.

…Config

ResetDefaultProvider assigned defaultProviderOnce = sync.Once{} (a
non-atomic struct write) while GetDefaultProviderWithConfig called
defaultProviderOnce.Do(), which does an atomic load of the internal
done field. Mixed-width concurrent access = data race.

Replace the three separate package-level variables (sync.Once,
Provider, error) with a single providerState struct stored behind an
atomic.Pointer. ResetDefaultProvider now atomically swaps in a fresh
struct instead of writing to one that may be in use. Goroutines that
loaded a pointer before a reset keep a stable reference and complete
safely; goroutines that load after the swap initialise against the new
state. The mutex is no longer needed and is removed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels Apr 10, 2026
@reyortiz3
Copy link
Copy Markdown
Contributor Author

@claude can you review if the comments are finally addressed?

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 10, 2026

Claude finished @reyortiz3's task in 58s —— View job


Review: Are the prior comments addressed?

  • Read current state of pkg/registry/factory.go
  • Check Bug 1: eager factory invocation (config.NewProvider inside sync.Once)
  • Check Bug 2: data race between GetDefaultProvider/GetDefaultProviderWithConfig and ResetDefaultProvider
  • Post findings

Bug 1 — Eager factory invocation ✅ Fixed

config.NewProvider() is now called inside the sync.Once.Do closure (factory.go:96), so the factory is invoked at most once regardless of how many times GetDefaultProvider() is called.

Bug 2 — Data race ✅ Fixed (with a better approach than suggested)

Rather than the mutex-before-Do approach suggested in the prior review, the code now uses a lock-free atomic.Pointer[providerState] design:

type providerState struct {
    once     sync.Once
    provider Provider
    err      error
}

var currentProviderState atomic.Pointer[providerState]
  • ResetDefaultProvider() atomically swaps in a fresh &providerState{} (line 129)
  • GetDefaultProvider() and GetDefaultProviderWithConfig() atomically load the pointer before calling Do (lines 94, 111)

This eliminates the race cleanly: any goroutine that already holds a reference to the old providerState finishes against that state safely, while goroutines that load after the swap initialise against the new one. No mutex contention between readers, and no struct-level write vs. atomic-read conflict.

This is a better solution than the mutex approach — it's lock-free and avoids holding a mutex across potentially slow initialization (config file I/O).

Tests ✅

The five tests cover the required scenarios:

  • TestGetDefaultProvider_NoFactoryRegistered — fallback to embedded registry
  • TestGetDefaultProvider_RespectsRegisteredFactory — regression test (factory respected)
  • TestGetDefaultProvider_FactoryReturnsNil_FallsThrough — nil factory falls through
  • TestGetDefaultProvider_CachesResult — caching semantics
  • TestResetDefaultProvider_AllowsReinit — reset creates a fresh instance

The resetGlobalState helper correctly uses t.Cleanup for isolation.


Summary: Both previously identified issues are fully addressed. The implementation is clean, correct, and the approach is sound. Ready to merge.

These tests mutate global singletons (config.registeredFactory and
currentProviderState) so they cannot run in parallel. Suppress the
paralleltest linter warning with an explanatory comment on each test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels Apr 10, 2026
gci requires nolint directives to appear immediately before the func
declaration, after the doc comment block.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels Apr 10, 2026
@reyortiz3 reyortiz3 merged commit aa68b58 into main Apr 10, 2026
40 checks passed
@reyortiz3 reyortiz3 deleted the fix/registry-factory-bypass-config-provider branch April 10, 2026 21:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/M Medium PR: 300-599 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants