Fix MCPRegistry probe port and bump registry image to v1.3.0#5020
Open
Fix MCPRegistry probe port and bump registry image to v1.3.0#5020
Conversation
Two bugs shipped together in operator v0.23.1 when deploying an MCPRegistry with default settings: 1. Liveness and readiness probes target port 8080, but toolhive-registry-server v1.1.0+ serves /health and /readiness on its internal listener at :8081. The main API on :8080 stops answering those paths, so probes fail and the pod enters a restart loop. 2. The registry image pinned in the Helm chart is v1.1.1, which requires source.format to be non-empty. An MCPRegistry that omits format fails to sync with "registry data validation failed: unsupported format:". Both have a shared root cause: the chart's default registry image tag had not been bumped while toolhive-registry-server evolved. v1.3.0 dropped the format field entirely and retained the internal probe port, so bumping the default covers the second symptom and matches the probe fix. Introduce RegistryAPIHealthPort = 8081 and point the Liveness and Readiness probes at it. Keep the container's published ContainerPort on RegistryAPIPort; the health port is pod-local and does not need a Service entry. Harden TestBuildRegistryAPIContainer with explicit probe-port assertions to guard against a regression. Bump registryAPI.image to v1.3.0 in deploy/charts/operator/values.yaml and regenerate the chart README. Strip format: lines from the six examples/operator/mcp-registries/mcpregistry-configyaml-*.yaml files to match v1.3.0 behavior, and convert the ConfigMap example's inline registry data from the removed toolhive JSON format to the upstream MCP registry format so the example stays functional against the new default image. Drop the Format plumbing from cmd/thv-operator/test-integration/mcp-registry/registry_helpers.go and remove the now-stale format: toolhive assertions from registry_lifecycle_test.go. Fixes #5012
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5020 +/- ##
==========================================
- Coverage 69.11% 69.06% -0.05%
==========================================
Files 554 554
Lines 73176 73176
==========================================
- Hits 50577 50541 -36
- Misses 19590 19622 +32
- Partials 3009 3013 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This comment was marked as outdated.
This comment was marked as outdated.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes two bugs reported in #5012 that both surface when deploying an
MCPRegistrythrough the operator with defaults:/healthand/readinesson its internal listener (:8081) since v1.1.0. The probe port was never updated, so every registry pod enters a restart loop.source.format: upstreamstill required: the operator pinsthv-registry-api:v1.1.1in its chart, where an emptyformatfield producesregistry data validation failed: unsupported format:. toolhive-registry-server v1.3.0 dropped the field entirely, but the pinned image never advanced.Both share a root cause — the chart's default registry image tag was three minor releases behind the registry server. Bumping to v1.3.0 covers the format issue and aligns with the probe-port fix (v1.3.0 still serves probes on 8081).
What changed:
RegistryAPIHealthPort = 8081; point bothLivenessProbeandReadinessProbeat it. The container's publishedContainerPortstays on 8080 (the API) — probes are pod-local and don't need aServiceentry.TestBuildRegistryAPIContainerwith explicit probe-port assertions to guard against regression.registryAPI.imagefromv1.1.1tov1.3.0indeploy/charts/operator/values.yaml; regenerate the chart README viahelm-docs.format:lines from all six examplemcpregistry-configyaml-*.yamlmanifests. Convert the ConfigMap example's embedded registry data from the removed toolhive JSON format to the upstream MCP registry format so it stays functional against v1.3.0.Formatplumbing fromcmd/thv-operator/test-integration/mcp-registry/registry_helpers.go(WithUpstreamFormat,CreateUpstreamFormatRegistry,format: %semission) and remove the now-staleformat: toolhiveassertions fromregistry_lifecycle_test.go.Fixes #5012
Type of change
Test plan
task test)task lint-fix) — 0 issuestask operator-test) — pass, including the new probe-port assertionsManual verification recommended before merge:
helm template deploy/charts/operator | grep -A2 "Probe:"→ probes on port 8081, imagev1.3.0.examples/operator/mcp-registries/mcpregistry-configyaml-minimal.yaml, confirm the registry-api pod reachesRunning/Readyand logs show a successful sync with nounsupported formaterror.API Compatibility
v1beta1API, OR theapi-break-allowedlabel is applied and the migration guidance is described above.No CRD schema changes.
formatwas never a CRD field — it lived inside the free-formConfigYAMLstring, which the operator does not parse.Changes
cmd/thv-operator/pkg/registryapi/types.goRegistryAPIHealthPort = 8081cmd/thv-operator/pkg/registryapi/podtemplatespec.goRegistryAPIHealthPortcmd/thv-operator/pkg/registryapi/podtemplatespec_test.goRegistryAPIHealthPortdeploy/charts/operator/values.yamlregistryAPI.imagetov1.3.0deploy/charts/operator/README.mdhelm-docsexamples/operator/mcp-registries/mcpregistry-configyaml-*.yaml(6 files)format:lines; rewrite ConfigMap example's inline JSON to upstream formatcmd/thv-operator/test-integration/mcp-registry/registry_helpers.goFormatfield,WithUpstreamFormat,CreateUpstreamFormatRegistry,format:emissioncmd/thv-operator/test-integration/mcp-registry/registry_lifecycle_test.goformat: toolhiveassertionsDoes this introduce a user-facing change?
Yes. Users installing the operator chart at this version will get a registry-api pod that actually passes probes, and writing an
MCPRegistrywithoutsource.formatis now the expected shape. Users who pinnedregistryAPI.imageto their own older tag in their Helm values are unaffected but will still see the original symptoms against those older images.Implementation plan
Approved implementation plan
Root cause
The operator's Helm chart pins
ghcr.io/stacklok/thv-registry-api:v1.1.1— three minor releases behind currentv1.3.0. Registry server evolution the operator never tracked::8081for/healthand/readiness. The main API on:8080stopped answering these paths. The operator's hardcoded probe port of 8080 has been wrong since v1.1.0.formatfield entirely. In v1.1.1,ValidateData()switches onformatand falls through tounsupported format:when empty.Strategy
Ship as one PR — both fixes address the same user-visible symptom (MCPRegistry deployment breaks), and releasing the image bump alone would leave probes broken. No changes needed in
toolhive-registry-server.Changes
types.goaddsRegistryAPIHealthPort = 8081;podtemplatespec.goswitches both probes; tests hardened.values.yaml→v1.3.0; chart README regenerated.format:from all sixconfigyaml-*.yaml; rewrite ConfigMap example's inline JSON to upstream format (v1.3.0 rejects the legacy toolhive JSON).Formatplumbing now that the server ignores it.Verification
task lint-fix→ 0 issues.task operator-test→ pass.Follow-ups (not in this PR)
--internal-address=:8081to the registry-api container Args as belt-and-suspenders; requires small refactor of the Args composition split betweenBuildRegistryAPIContainerandWithRegistryServerConfigMount.Special notes for reviewers
format:removed) because v1.3.0 rejects the legacy toolhive schema entirely. The new JSON follows the upstream registry schema shape (version/meta.last_updated/data.servers[]) and places tags under_meta.io.modelcontextprotocol.registry/publisher-provided.<publisher>.<image>.tagswhere the registry server'sExtractTagslooks for them.filter.tags.include: ["production"]) still makes sense given the rewritten data.🤖 Generated with Claude Code