Fix MCPRegistry probe port and bump registry image to v1.3.0 #5020

Open
rdimitrov wants to merge 1 commit into main from fix/mcpregistry-probe-port-and-image-bump
@rdimitrov
Member

Summary

Fixes two bugs reported in #5012 that both surface when deploying an MCPRegistry through the operator with defaults:

  • Failing liveness/readiness probes: the operator's probes target port 8080, but toolhive-registry-server has served /health and /readiness on its internal listener (:8081) since v1.1.0. The probe port was never updated, so every registry pod enters a restart loop.
  • source.format: upstream still required: the operator pins thv-registry-api:v1.1.1 in its chart, where an empty format field produces registry data validation failed: unsupported format:. toolhive-registry-server v1.3.0 dropped the field entirely, but the pinned image never advanced.

Both share a root cause: the chart's default registry image tag (v1.1.1) was two minor versions behind the registry server (v1.3.0). Bumping to v1.3.0 covers the format issue and aligns with the probe-port fix (v1.3.0 still serves probes on 8081).

What changed:

  • Introduce RegistryAPIHealthPort = 8081; point both LivenessProbe and ReadinessProbe at it. The container's published ContainerPort stays on 8080 (the API) — probes are pod-local and don't need a Service entry.
  • Harden TestBuildRegistryAPIContainer with explicit probe-port assertions to guard against regression.
  • Bump registryAPI.image from v1.1.1 to v1.3.0 in deploy/charts/operator/values.yaml; regenerate the chart README via helm-docs.
  • Strip format: lines from all six example mcpregistry-configyaml-*.yaml manifests. Convert the ConfigMap example's embedded registry data from the removed toolhive JSON format to the upstream MCP registry format so it stays functional against v1.3.0.
  • Drop the stale Format plumbing from cmd/thv-operator/test-integration/mcp-registry/registry_helpers.go (WithUpstreamFormat, CreateUpstreamFormatRegistry, format: %s emission) and remove the now-stale format: toolhive assertions from registry_lifecycle_test.go.
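The probe-port change listed above can be sketched in Go roughly as follows. This is a minimal illustration, not the operator's code: the httpProbe and container types are simplified stand-ins for the Kubernetes core/v1 types, the buildRegistryAPIContainer helper here is hypothetical, and pairing liveness with /health and readiness with /readiness is an assumption. Only the constant names and port numbers come from this PR.

```go
package main

import "fmt"

const (
	// RegistryAPIPort is the main API port, kept as the published ContainerPort.
	RegistryAPIPort = 8080
	// RegistryAPIHealthPort is the internal listener serving /health and /readiness.
	RegistryAPIHealthPort = 8081
)

// httpProbe is a simplified stand-in for a Kubernetes HTTP GET probe (assumption).
type httpProbe struct {
	Path string
	Port int
}

// container is a simplified stand-in for a Kubernetes container spec (assumption).
type container struct {
	ContainerPort  int
	LivenessProbe  httpProbe
	ReadinessProbe httpProbe
}

// buildRegistryAPIContainer is a hypothetical builder: the API port is
// published, while both probes target the pod-local health port.
func buildRegistryAPIContainer() container {
	return container{
		ContainerPort:  RegistryAPIPort,
		LivenessProbe:  httpProbe{Path: "/health", Port: RegistryAPIHealthPort},
		ReadinessProbe: httpProbe{Path: "/readiness", Port: RegistryAPIHealthPort},
	}
}

func main() {
	c := buildRegistryAPIContainer()
	fmt.Printf("containerPort=%d liveness=%s:%d readiness=%s:%d\n",
		c.ContainerPort,
		c.LivenessProbe.Path, c.LivenessProbe.Port,
		c.ReadinessProbe.Path, c.ReadinessProbe.Port)
}
```

A regression test in the spirit of the hardened TestBuildRegistryAPIContainer would compare both probe ports against RegistryAPIHealthPort rather than a literal, so the assertion follows the constant if the port ever changes again.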

Fixes #5012

Type of change

  • Bug fix
  • New feature
  • Refactoring (no behavior change)
  • Dependency update
  • Documentation
  • Other (describe):

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix) — 0 issues
  • Operator tests (task operator-test) — pass, including the new probe-port assertions

Manual verification recommended before merge:

  • Render the chart: helm template deploy/charts/operator | grep -A2 "Probe:" → probes on port 8081, image v1.3.0.
  • Kind e2e: deploy the operator, apply examples/operator/mcp-registries/mcpregistry-configyaml-minimal.yaml, confirm the registry-api pod reaches Running/Ready and logs show a successful sync with no unsupported format error.

API Compatibility

  • This PR does not break the v1beta1 API, OR the api-break-allowed label is applied and the migration guidance is described above.

No CRD schema changes. format was never a CRD field — it lived inside the free-form ConfigYAML string, which the operator does not parse.

Changes

| File | Change |
| --- | --- |
| cmd/thv-operator/pkg/registryapi/types.go | Add RegistryAPIHealthPort = 8081 |
| cmd/thv-operator/pkg/registryapi/podtemplatespec.go | Point both probes at RegistryAPIHealthPort |
| cmd/thv-operator/pkg/registryapi/podtemplatespec_test.go | Assert probe ports equal RegistryAPIHealthPort |
| deploy/charts/operator/values.yaml | Bump registryAPI.image to v1.3.0 |
| deploy/charts/operator/README.md | Regenerated via helm-docs |
| examples/operator/mcp-registries/mcpregistry-configyaml-*.yaml (6 files) | Remove format: lines; rewrite ConfigMap example's inline JSON to upstream format |
| cmd/thv-operator/test-integration/mcp-registry/registry_helpers.go | Drop Format field, WithUpstreamFormat, CreateUpstreamFormatRegistry, format: emission |
| cmd/thv-operator/test-integration/mcp-registry/registry_lifecycle_test.go | Remove stale format: toolhive assertions |

Does this introduce a user-facing change?

Yes. Users installing the operator chart at this version will get a registry-api pod that actually passes probes, and writing an MCPRegistry without source.format is now the expected shape. Users who pinned registryAPI.image to their own older tag in their Helm values are unaffected but will still see the original symptoms against those older images.

Implementation plan

Approved implementation plan

Root cause

The operator's Helm chart pins ghcr.io/stacklok/thv-registry-api:v1.1.1, two minor versions behind the current v1.3.0. Registry server changes the operator never tracked:

  • v1.1.0 (PR toolhive-registry-server#701) introduced a separate internal HTTP listener on :8081 for /health and /readiness. The main API on :8080 stopped answering these paths. The operator's hardcoded probe port of 8080 has been wrong since v1.1.0.
  • v1.3.0 (PR toolhive-registry-server#724) dropped the format field entirely. In v1.1.1, ValidateData() switches on format and falls through to unsupported format: when empty.
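The v1.1.1 failure mode described above can be sketched as a switch with no case for the empty string. This is an illustration under stated assumptions, not the registry server's source: validateData and syncRegistry are hypothetical names, and the accepted values "toolhive" and "upstream" are inferred from the formats mentioned in this PR.

```go
package main

import "fmt"

// validateData is a hypothetical stand-in for the v1.1.1 ValidateData()
// behavior: it switches on format and rejects anything it does not know,
// including the empty string.
func validateData(format string) error {
	switch format {
	case "toolhive", "upstream":
		return nil
	default:
		return fmt.Errorf("unsupported format: %s", format)
	}
}

// syncRegistry is a hypothetical caller that wraps the validation error
// the way the reported log message suggests.
func syncRegistry(format string) error {
	if err := validateData(format); err != nil {
		return fmt.Errorf("registry data validation failed: %w", err)
	}
	return nil
}

func main() {
	// An MCPRegistry that omits source.format passes an empty string.
	for _, format := range []string{"upstream", "toolhive", ""} {
		fmt.Printf("format=%q err=%v\n", format, syncRegistry(format))
	}
}
```

With an empty format the message ends right after the colon, which matches the truncated-looking error reported in #5012.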

Strategy

Ship as one PR — both fixes address the same user-visible symptom (MCPRegistry deployment breaks), and releasing the image bump alone would leave probes broken. No changes needed in toolhive-registry-server.

Changes

  1. Probe port: types.go adds RegistryAPIHealthPort = 8081; podtemplatespec.go switches both probes; tests hardened.
  2. Default registry image: values.yaml → v1.3.0; chart README regenerated.
  3. Examples: strip format: from all six configyaml-*.yaml; rewrite ConfigMap example's inline JSON to upstream format (v1.3.0 rejects the legacy toolhive JSON).
  4. Test-integration helpers: drop Format plumbing now that the server ignores it.

Verification

  • task lint-fix → 0 issues.
  • task operator-test → pass.
  • Kind e2e walkthrough documented in test plan.

Follow-ups (not in this PR)

  • Add Chainsaw coverage for MCPRegistry pod readiness (its absence is why this shipped).
  • Consider appending --internal-address=:8081 to the registry-api container Args as belt-and-suspenders; requires small refactor of the Args composition split between BuildRegistryAPIContainer and WithRegistryServerConfigMount.

Special notes for reviewers

  • The ConfigMap example's embedded JSON had to be rewritten (not just format: removed) because v1.3.0 rejects the legacy toolhive schema entirely. The new JSON follows the upstream registry schema shape (version / meta.last_updated / data.servers[]) and places tags under _meta.io.modelcontextprotocol.registry/publisher-provided.<publisher>.<image>.tags where the registry server's ExtractTags looks for them.
  • Sanity-check: verify that the tag-filter assertion in the ConfigMap example (filter.tags.include: ["production"]) still makes sense given the rewritten data.
  • A follow-up issue for Chainsaw e2e coverage of MCPRegistry probe readiness is worth filing — the absence of that coverage is why this bug shipped.
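To help with the tag-filter sanity check, here is a rough Go sketch of looking up tags at the path described in the first note. It is not the registry server's ExtractTags: the extractTags and parseServer helpers, the sample publisher and image names, and the reading of the dotted path as four nested JSON objects (with io.modelcontextprotocol.registry/publisher-provided as a single key) are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// publisherProvidedKey is treated as one JSON key under _meta (assumption).
const publisherProvidedKey = "io.modelcontextprotocol.registry/publisher-provided"

// sampleServer is a made-up server entry; publisher and image are hypothetical.
const sampleServer = `{
  "name": "example/server",
  "_meta": {
    "io.modelcontextprotocol.registry/publisher-provided": {
      "example-publisher": {
        "example.io/image:latest": {
          "tags": ["production", "database"]
        }
      }
    }
  }
}`

// parseServer decodes one server entry into a generic map.
func parseServer(raw string) (map[string]any, error) {
	var server map[string]any
	err := json.Unmarshal([]byte(raw), &server)
	return server, err
}

// extractTags walks _meta -> publisher-provided -> publisher -> image and
// returns the string entries of "tags", or nil if any step is missing.
func extractTags(server map[string]any, publisher, image string) []string {
	node := any(server)
	for _, key := range []string{"_meta", publisherProvidedKey, publisher, image} {
		m, ok := node.(map[string]any)
		if !ok {
			return nil
		}
		node = m[key]
	}
	m, ok := node.(map[string]any)
	if !ok {
		return nil
	}
	raw, ok := m["tags"].([]any)
	if !ok {
		return nil
	}
	var tags []string
	for _, t := range raw {
		if s, ok := t.(string); ok {
			tags = append(tags, s)
		}
	}
	return tags
}

func main() {
	server, err := parseServer(sampleServer)
	if err != nil {
		panic(err)
	}
	fmt.Println(extractTags(server, "example-publisher", "example.io/image:latest"))
}
```

If the rewritten ConfigMap data places tags anywhere else, a lookup of this shape returns nothing, and a filter such as filter.tags.include: ["production"] would match no servers.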

🤖 Generated with Claude Code

Two bugs shipped together in operator v0.23.1 when deploying an
MCPRegistry with default settings:

1. Liveness and readiness probes target port 8080, but
   toolhive-registry-server v1.1.0+ serves /health and /readiness on
   its internal listener at :8081. The main API on :8080 stops
   answering those paths, so probes fail and the pod enters a restart
   loop.

2. The registry image pinned in the Helm chart is v1.1.1, which
   requires source.format to be non-empty. An MCPRegistry that omits
   format fails to sync with
   "registry data validation failed: unsupported format:".

Both have a shared root cause: the chart's default registry image tag
had not been bumped while toolhive-registry-server evolved. v1.3.0
dropped the format field entirely and retained the internal probe
port, so bumping the default covers the second symptom and matches
the probe fix.

Introduce RegistryAPIHealthPort = 8081 and point the Liveness and
Readiness probes at it. Keep the container's published ContainerPort
on RegistryAPIPort; the health port is pod-local and does not need a
Service entry. Harden TestBuildRegistryAPIContainer with explicit
probe-port assertions to guard against a regression.

Bump registryAPI.image to v1.3.0 in deploy/charts/operator/values.yaml
and regenerate the chart README. Strip format: lines from the six
examples/operator/mcp-registries/mcpregistry-configyaml-*.yaml files
to match v1.3.0 behavior, and convert the ConfigMap example's inline
registry data from the removed toolhive JSON format to the upstream
MCP registry format so the example stays functional against the new
default image.

Drop the Format plumbing from
cmd/thv-operator/test-integration/mcp-registry/registry_helpers.go
and remove the now-stale format: toolhive assertions from
registry_lifecycle_test.go.

Fixes #5012

@github-actions bot added the size/S (Small PR: 100-299 lines changed) label on Apr 22, 2026

codecov Bot commented Apr 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.06%. Comparing base (bc5b9a3) to head (dfc4dac).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5020      +/-   ##
==========================================
- Coverage   69.11%   69.06%   -0.05%     
==========================================
  Files         554      554              
  Lines       73176    73176              
==========================================
- Hits        50577    50541      -36     
- Misses      19590    19622      +32     
- Partials     3009     3013       +4     

Development

Successfully merging this pull request may close these issues.

[Toolhive Operator] MCPRegistry: Issues after upgrading to v0.23.1
