fix: validate muster CRDs via discovery API instead of namespaced list #574

Merged: teemow merged 1 commit into main from fix-validatecrds-namespace on Apr 20, 2026

@teemow (Member) commented Apr 20, 2026

Summary

The Kubernetes-backed muster client validates the presence of muster CRDs at startup. The previous probe listed MCPServer resources in the hard-coded default namespace and treated any error as "CRD not available". When muster runs with namespace-scoped RBAC (a Role limited to its own namespace), this probe returns Forbidden, the constructor wraps it as "MCPServer CRD not available", and NewMusterClient silently falls back to filesystem mode.

Symptom in the wild (BWI demo cluster): configured MCPServer CRs were never auto-started and the aggregator only exposed the 11 built-in meta-tools. Logs:

KubernetesDetector  Started watching Kubernetes resources in namespace: bwi-backstage
Orchestrator        Found 0 MCPServer definitions for auto-start processing
MCPServerReconciler Deleting MCPServer service: mcp-kubernetes
Aggregator          Received tool update event ... tools=11

The KubernetesDetector informer kept emitting reconcile events for the existing CR, but the orchestrator's MCPServerManager was filesystem-backed and returned NotFound, which the reconciler classified as a delete -- so the service was never started.
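
The failure chain described above can be modeled roughly as follows. This is an illustrative sketch, not muster's code: `probeByList`, `chooseBackend`, `classify`, and the two sentinel errors are hypothetical names standing in for the real probe, the fallback in NewMusterClient, and the reconciler's error handling.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for the Kubernetes API
// Forbidden and NotFound status errors.
var (
	errForbidden = errors.New("forbidden")
	errNotFound  = errors.New("not found")
)

// probeByList models the old probe: it lists in the hard-coded "default"
// namespace and reports any error as a missing CRD.
func probeByList(list func(namespace string) error) error {
	if err := list("default"); err != nil {
		return fmt.Errorf("MCPServer CRD not available: %w", err)
	}
	return nil
}

// chooseBackend models the silent fallback: any probe error selects
// filesystem mode.
func chooseBackend(probeErr error) string {
	if probeErr != nil {
		return "filesystem"
	}
	return "kubernetes"
}

// classify models a reconciler that reads NotFound from the manager as
// "the resource was deleted".
func classify(getErr error) string {
	switch {
	case errors.Is(getErr, errNotFound):
		return "delete"
	case getErr != nil:
		return "retry"
	default:
		return "start"
	}
}
```

With a lister that only permits the pod's own namespace, `probeByList` returns a wrapped Forbidden error, `chooseBackend` picks `"filesystem"`, and a later lookup that yields `errNotFound` is classified as `"delete"`, which matches the log sequence above.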

Fix

Switch the probe in validateCRDs to the discovery API (ServerResourcesForGroupVersion). Discovery checks that the muster API group is served and exposes the MCPServer kind without requiring list/get permissions on the muster CRDs in any specific namespace. This works for both cluster-scoped and namespace-scoped RBAC.

  • internal/client/kubernetes_client.go
    • Add discovery.DiscoveryInterface to kubernetesClient.
    • Construct a DiscoveryClient from the rest config in NewKubernetesClient.
    • Rewrite validateCRDs to use ServerResourcesForGroupVersion(musterv1alpha1.GroupVersion) and look for the MCPServer kind.
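
A minimal sketch of the discovery-based probe, under stated assumptions. To keep it dependency-free, `apiResource`, `apiResourceList`, and `groupVersionDiscoverer` are simplified stand-ins for `metav1.APIResource`, `metav1.APIResourceList`, and the discovery interface in client-go. `fakeDiscovery` and the function signature are hypothetical. The actual `validateCRDs` in this PR is a method on `kubernetesClient` and may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// apiResource and apiResourceList are simplified stand-ins for the
// metav1 types returned by the Kubernetes discovery API.
type apiResource struct {
	Name string // plural resource name, e.g. "mcpservers"
	Kind string // kind, e.g. "MCPServer"
}

type apiResourceList struct {
	GroupVersion string
	APIResources []apiResource
}

// groupVersionDiscoverer mirrors the one discovery method the probe needs.
type groupVersionDiscoverer interface {
	ServerResourcesForGroupVersion(groupVersion string) (*apiResourceList, error)
}

// validateCRDs checks that groupVersion is served and exposes kind.
// It never lists or gets resources in any namespace.
func validateCRDs(d groupVersionDiscoverer, groupVersion, kind string) error {
	list, err := d.ServerResourcesForGroupVersion(groupVersion)
	if err != nil {
		return fmt.Errorf("API group %s is not served: %w", groupVersion, err)
	}
	if list == nil {
		return fmt.Errorf("no resources returned for %s", groupVersion)
	}
	for _, r := range list.APIResources {
		if r.Kind == kind {
			return nil
		}
	}
	return fmt.Errorf("%s kind not found in %s", kind, groupVersion)
}

// fakeDiscovery is a hypothetical in-memory discoverer for trying the probe.
type fakeDiscovery map[string]*apiResourceList

func (f fakeDiscovery) ServerResourcesForGroupVersion(gv string) (*apiResourceList, error) {
	if l, ok := f[gv]; ok {
		return l, nil
	}
	return nil, errors.New("group version not found")
}
```

Because the kind is a parameter, the same helper could be called once per kind if other muster CRDs should be validated as well. In the real client the discoverer would be the DiscoveryClient built from the rest config, and the group/version string would come from `musterv1alpha1.GroupVersion`.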

Test plan

  • go build ./...
  • go vet ./..., gofmt, goimports
  • go test ./...
  • Verify on the affected cluster: roll the muster pod after merging into the chart and confirm that the mcp-kubernetes MCPServer is auto-started and its tools are surfaced through the aggregator.

Made with Cursor

The previous CRD validation probe in the Kubernetes-backed muster client
listed `MCPServer` resources in the hard-coded `default` namespace. When
muster runs with namespace-scoped RBAC -- a `Role` limited to its own
namespace, as is typical for multi-tenant Helm deployments -- the probe
returned `Forbidden`, the constructor wrapped it as "MCPServer CRD not
available", and `NewMusterClient` silently fell back to filesystem mode.

Symptom from the wild: configured `MCPServer` CRs were never auto-started
and the aggregator only exposed the 11 built-in meta-tools. Logs showed:

    Found 0 MCPServer definitions for auto-start processing
    Deleting MCPServer service: <name>

(The `KubernetesDetector` informer kept emitting reconcile events, but
the orchestrator's `MCPServerManager` was filesystem-backed and returned
NotFound, which the reconciler classified as a delete.)

Switch the probe to the discovery API, which checks that the muster API
group is served and exposes the `MCPServer` kind. Discovery does not
require namespaced list/get permissions on the muster CRDs, so this works
for both cluster-scoped and namespace-scoped RBAC.

Signed-off-by: Timo Derstappen <teemow@gmail.com>
Made-with: Cursor
@teemow teemow requested a review from a team as a code owner April 20, 2026 13:32
// permissions on the muster CRDs in any specific namespace, which is
// important when muster runs with namespace-scoped RBAC (e.g. a Role limited
// to its own namespace).
func (k *kubernetesClient) validateCRDs(ctx context.Context) error {
Contributor

ctx is unused.

Is there any reason we are not doing the same validation for the other CRDs, like workflow?

@teemow teemow merged commit 1a22158 into main Apr 20, 2026
8 of 9 checks passed
@teemow teemow deleted the fix-validatecrds-namespace branch April 20, 2026 14:12