Skip to content

fix(registry): surface legacy registry format as a structured API error#5260

Merged
peppescg merged 3 commits into
mainfrom
issues/5259
May 12, 2026
Merged

fix(registry): surface legacy registry format as a structured API error#5260
peppescg merged 3 commits into
mainfrom
issues/5259

Conversation

@peppescg
Copy link
Copy Markdown
Contributor

@peppescg peppescg commented May 12, 2026

Summary

When a configured custom registry URL serves data in the legacy ToolHive format, the API now returns HTTP 502 Bad Gateway with a structured registry_legacy_format code, instead of a generic 500 that hides the actionable migration hint in the log.

Before, NewRemoteRegistryProvider.validateConnectivity() returned a plain fmt.Errorf that the API handler did not recognise (pkg/api/v1/registry.go:259), so the response was HTTP 500 Failed to get registry provider and clients had no way to branch on the legacy-format case. The hint (run thv registry convert --in <path> --in-place) only showed up in main.log.

This PR adds a typed *LegacyFormatError in pkg/registry/errors.go (sister of *UnavailableError) that carries the offending source URL, and wires errors.As detection plus a new writeRegistryLegacyFormatError writer through all four GET handlers and the v0.1 router.

The package-private errLegacyFormat sentinel in pkg/registry/upstream_parser.go is now an instance of *LegacyFormatError; its Is() method preserves errors.Is(err, errLegacyFormat) for existing callers while enabling errors.As extraction of the URL upstream.

Closes #5259

Why HTTP 502 Bad Gateway

Per RFC 9110 §15.6.3: 502 = "while acting as a gateway or proxy, received an invalid response from an inbound server." thv serve is acting as a gateway to the configured upstream registry, and that upstream is returning data we cannot process. Exact fit.

502 also matches what the existing PUT registry handler already does for the same legacy-format detection (pkg/api/v1/registry.go:545, via ErrRegistryValidationFailed). Returning 503 from GET would split the same condition across two status codes inside the same file.

503 ("temporary overload or scheduled maintenance") does not fit a permanently misconfigured registry source.

The registry_unavailable path also handles upstream 404s on a 503 today — arguably also 502 territory — but reclassifying that is out of scope for this PR.

Changes

  • pkg/registry/errors.go — new LegacyFormatError type with URL, Error(), Is().
  • pkg/registry/provider_remote.govalidateConnectivity() returns &LegacyFormatError{URL: p.registryURL} instead of fmt.Errorf.
  • pkg/registry/upstream_parser.goerrLegacyFormat switches to &LegacyFormatError{} (same Error() output, now typed for errors.As).
  • pkg/api/v1/registry.go — new RegistryLegacyFormatCode + writeRegistryLegacyFormatError returning 502; new errors.As(err, &legacyErr) branch in getCurrentProvider, listRegistries, getRegistry, listServers.
  • pkg/api/v1/registry_v01.go — same branch in getRegistryProvider.
  • pkg/registry/errors_test.go — covers Error() with/without URL, Is matching across populated/empty URLs and via Unwrap, and As extraction from a wrapped error.
  • pkg/api/v1/registry_test.go — end-to-end test: configure a remote-URL registry pointing at an httptest server that serves legacy JSON, hit each GET handler, assert 502 + code: registry_legacy_format + migration hint in the message + source URL in the message.

Test plan

  • go test ./pkg/registry/ -run "TestLegacyFormatError" -count=1 — new error type tests pass
  • go test ./pkg/api/v1/ -run "TestRegistryAPI_GetEndpoint_LegacyFormat" -count=1 — all three GET handlers return 502 with the new code
  • go test ./pkg/registry/ -run "TestParseRegistryData_LegacyFormatDetection|TestLocalRegistryProvider_LegacyFileReturnsMigrationHint|TestRemoteRegistryProvider_ValidateConnectivity" -count=1 — existing legacy-detection coverage still green
  • go vet ./pkg/registry/... ./pkg/api/v1/... clean
  • gofmt -l clean

Four pre-existing failures on main (TestGetDefaultProvider_NoFactoryRegistered, TestGetDefaultProvider_FactoryReturnsNil_FallsThrough, TestRegistryV01Router_GetServer_NotFound, TestRegistryV01Router_ListSkills_PaginationBeyondResults) reproduce on git stash of this branch — unrelated to this change.

🤖 Generated with Claude Code

Before this change, when a configured custom registry URL served data in
the legacy ToolHive format, NewRemoteRegistryProvider returned a plain
fmt.Errorf that the API handler did not recognise. The API responded with
a generic HTTP 500 "Failed to get registry provider", and the actionable
migration hint was visible only in main.log. Desktop clients had no way
to distinguish this case from an internal error.

Add a typed *LegacyFormatError (sister of *UnavailableError) carrying the
offending source URL. Wire it through the four registry GET handlers and
the v0.1 router so the response is now HTTP 503 with a structured
"registry_legacy_format" code and the migration hint in the message body.

The upstream_parser sentinel errLegacyFormat is now an instance of
*LegacyFormatError; its Is() method preserves errors.Is(err, errLegacyFormat)
for existing callers while enabling errors.As extraction of the URL.

Fixes #5259

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the size/S Small PR: 100-299 lines changed label May 12, 2026
@peppescg peppescg self-assigned this May 12, 2026
503 was inconsistent with the existing PUT registry handler that already
returns 502 for the same legacy-format detection via ErrRegistryValidationFailed
(pkg/api/v1/registry.go updateRegistry handler). 502 is also the correct
RFC 9110 §15.6.3 code: thv serve acts as a gateway to the upstream registry
and the upstream returned a response we cannot process.

503 is reserved for "temporary overload or scheduled maintenance" which does
not match a misconfigured registry source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels May 12, 2026
@peppescg peppescg requested a review from Copilot May 12, 2026 10:06
@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.04%. Comparing base (030594a) to head (4ba3e90).

Files with missing lines Patch % Lines
pkg/api/v1/registry.go 44.44% 15 Missing ⚠️
pkg/api/v1/registry_v01.go 0.00% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5260      +/-   ##
==========================================
+ Coverage   68.02%   68.04%   +0.02%     
==========================================
  Files         616      616              
  Lines       63005    63044      +39     
==========================================
+ Hits        42857    42900      +43     
+ Misses      16945    16940       -5     
- Partials     3203     3204       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves API ergonomics when a configured custom registry URL serves legacy ToolHive registry JSON by returning a structured, machine-detectable error (code: "registry_legacy_format") with HTTP 502 Bad Gateway, instead of a generic 500 that only logs the migration hint.

Changes:

  • Introduces a typed *registry.LegacyFormatError (with Is() support) and uses it for legacy-format detection in both parsing and remote connectivity validation.
  • Adds API-layer errors.As handling and a dedicated JSON error writer to consistently return 502 + structured error code across registry GET paths (including v0.1 router).
  • Adds/extends unit and API tests to validate Is/As behavior and the 502 structured response.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pkg/registry/upstream_parser.go Switches legacy-format sentinel to *LegacyFormatError for errors.As support.
pkg/registry/provider_remote.go Returns *LegacyFormatError{URL: ...} from connectivity validation on legacy data.
pkg/registry/errors.go Adds LegacyFormatError type (sibling to UnavailableError).
pkg/registry/errors_test.go Adds tests for LegacyFormatError Error(), Is, and As.
pkg/api/v1/registry.go Adds registry_legacy_format code + 502 writer and wires errors.As handling into GET handlers.
pkg/api/v1/registry_v01.go Adds legacy-format error handling to v0.1 provider resolution.
pkg/api/v1/registry_test.go Adds end-to-end coverage asserting 502 + structured legacy error code/message for GET endpoints.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/registry/errors.go Outdated
Comment thread pkg/registry/errors_test.go Outdated
Two doc comments missed the 502 update from the previous commit. The code
itself already returns 502 Bad Gateway via writeRegistryLegacyFormatError;
only the comments were inconsistent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels May 12, 2026
@peppescg
Copy link
Copy Markdown
Contributor Author

@claude succinct review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 12, 2026

Claude finished @peppescg's task in 1m 33s —— View job


Succinct review

Overall: clean, well-scoped, and the 502 rationale in the PR body is solid. Two minor suggestions, plus a coverage note.

Status of prior feedback

  • Copilot's two stale-503-doc comments are already addressed in 4ba3e90errors.go:37 and errors_test.go:136 now say 502.

🟢 Strengths

  • LegacyFormatError mirrors UnavailableError cleanly (sibling type, same handler shape). Easy to extend later.
  • The Is() method correctly preserves errors.Is(err, errLegacyFormat) for the package-private sentinel callers while enabling errors.As extraction of the URL — the comment on upstream_parser.go:22-25 makes the intent obvious.
  • HTTP 502 matches the existing PUT handler's behavior for the same condition (pkg/api/v1/registry.go:545), so the same legacy-format detection no longer splits across two status codes in one file.
  • Tests cover Error() with/without URL, Is symmetry, As extraction across an fmt.Errorf wrap, and an end-to-end HTTP assertion per GET handler.

🟡 Suggestions (non-blocking)

  1. Duplication. The errors.As(&legacyErr) / errors.As(&unavailableErr) / fallback 500 block is now copy-pasted in five places (getCurrentProvider, listRegistries, getRegistry, listServers, getRegistryProvider). Worth extracting:

    // handleProviderError writes a structured response for known registry
    // failure modes and returns true if the error was handled.
    func handleProviderError(w http.ResponseWriter, err error) bool {
        if isRegistryAuthError(err) { writeRegistryAuthRequiredError(w); return true }
        var legacyErr *regpkg.LegacyFormatError
        if errors.As(err, &legacyErr) { writeRegistryLegacyFormatError(w, legacyErr); return true }
        var unavailableErr *regpkg.UnavailableError
        if errors.As(err, &unavailableErr) { writeRegistryUnavailableError(w, unavailableErr); return true }
        return false
    }

    Each call site then collapses to ~4 lines. Reduces drift risk when a new error code is added (e.g. the 404→502 reclassification you note as out of scope).

  2. Post-GetRegistry() legacy branches are unreachable from the test. getCurrentProvider already short-circuits on *LegacyFormatError, so the duplicated errors.As branches in listRegistries:408, getRegistry:495, listServers:788 are defensive but never hit in the new test. That's why Codecov flags pkg/api/v1/registry.go at 44% patch coverage. Two options:

    • Accept it as defensive (the provider singleton is cached, so a refresh that newly detects legacy data could in principle hit these paths).
    • Or remove the duplicate post-GetRegistry branches since validateConnectivity is the only producer of *LegacyFormatError today.

    I'd lean toward option 1 + the helper from fix(typo): corrects readme #1, since it keeps the code symmetrical with the existing UnavailableError handling.

  3. Log level. slog.Error for "upstream registry in legacy format" — this is user misconfiguration, not a system error. slog.Warn would fit better, and matches .claude/rules/go-style.md ("WARN for non-fatal issues … fallback behavior"). Minor nit; the unavailable path uses Error too, so leaving it is also defensible for consistency.

Nits

  • errors.go:53 — the URL == "" branch returns only legacyhint.MigrationMessage with no prefix. Functional but reads slightly oddly when surfaced as Error(). Not worth changing.

LGTM with the above. The helper extraction in suggestion #1 would be the highest-value follow-up; either ship in this PR or as a quick sequel.
· branch issues/5259

@peppescg peppescg merged commit 6ec592d into main May 12, 2026
73 of 74 checks passed
@peppescg peppescg deleted the issues/5259 branch May 12, 2026 11:21
rdimitrov added a commit that referenced this pull request May 12, 2026
The same `errors.As` ladder for `*LegacyFormatError`,
`*UnavailableError`, plus the `auth.ErrRegistryAuthRequired` check was
copy-pasted across `getCurrentProvider`, `listRegistries`,
`getRegistry`, `listServers`, and `getRegistryProvider`. Each new
error code that needs structured handling had to be wired into five
places, and the existing v0.1 router did the same `errors.Is` check
inline instead of going through `isRegistryAuthError`.

Extract a single `writeProviderError(w, err) bool` helper that
handles all three known failure modes and returns whether a response
was written. Each callsite collapses to two lines and a generic
fallback. Drop the now-unused `errors` and `registry/auth` imports
from `registry_v01.go`.

Also lower the legacy-format log to `slog.Warn`: it signals a user
misconfiguration that the API has already translated into a 502 with
an actionable hint, not a server-side failure that on-call should
investigate. The upstream-unavailable path stays at `slog.Error`
since it usually does indicate a real outage.

Improves on #5260
rdimitrov added a commit that referenced this pull request May 12, 2026
The same `errors.As` ladder for `*LegacyFormatError`,
`*UnavailableError`, plus the `auth.ErrRegistryAuthRequired` check was
copy-pasted across `getCurrentProvider`, `listRegistries`,
`getRegistry`, `listServers`, and `getRegistryProvider`. Each new
error code that needs structured handling had to be wired into five
places, and the existing v0.1 router did the same `errors.Is` check
inline instead of going through `isRegistryAuthError`.

Extract a single `writeProviderError(w, err) bool` helper that
handles all three known failure modes and returns whether a response
was written. Each callsite collapses to two lines and a generic
fallback. Drop the now-unused `errors` and `registry/auth` imports
from `registry_v01.go`.

Also lower the legacy-format log to `slog.Warn`: it signals a user
misconfiguration that the API has already translated into a 502 with
an actionable hint, not a server-side failure that on-call should
investigate. The upstream-unavailable path stays at `slog.Error`
since it usually does indicate a real outage.

Improves on #5260
peppescg added a commit to stacklok/toolhive-studio that referenced this pull request May 12, 2026
* feat(registry): surface legacy registry format with targeted UI

The backend (stacklok/toolhive#5260) now returns a structured
HTTP 502 Bad Gateway with `code: "registry_legacy_format"` when the
configured custom registry serves data in the deprecated ToolHive format.
Without renderer-side handling the response was treated as a generic
fetch error and the user fell into the catch-all GenericError screen.

This change mirrors the existing `registry_auth_required` and
`registry_unavailable` plumbing:

- Add `isRegistryLegacyFormatError` detector plus the user-facing
  `REGISTRY_LEGACY_FORMAT_UI_MESSAGE` and toast copy (no raw CLI hint —
  desktop users should not see `thv registry convert`).
- Extend `getRegistryErrorToastMessage` so the root `beforeLoad` guard
  redirects to `/settings?tab=registry` with a persistent toast instead
  of letting the error reach `RootErrorComponent`.
- Add a third empty-state branch in `RegistryError` for users who
  navigate directly to `/registry` while the source is misconfigured.

Closes #2228

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(registry): drop CLI/MCP jargon from legacy-format UI copy

REGISTRY_LEGACY_FORMAT_UI_MESSAGE referred to the "upstream MCP format",
which is a CLI/protocol term that desktop users would not recognise.
Rephrase in plain language: the message now says to update the registry
source or reset to default, without naming the destination format.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/S Small PR: 100-299 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Surface legacy registry format as a structured API error

3 participants