move spire-controller-manager to a separate pod #341

drewwells · 2024-04-30T13:37:40Z

For background (skip if you know this): ingress and k8s Services only send traffic to pods marked ready. If any container in a pod is not ready, the whole pod is not ready and no traffic is sent to it. This is what allows zero-downtime rotation of pods in ReplicaSets.
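To make the readiness rule concrete, here is a minimal sketch in Python. It is a simplified model, not the Kubernetes API: the `Container`, `Pod`, and `endpoints` names are hypothetical, and real readiness also involves probes and readiness gates. It only illustrates that one unready container removes the whole pod from the set of traffic targets, and that the same failure in a separate pod would not. The pod name `spire-controller-manager-0` is an assumption for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    name: str
    ready: bool = True


@dataclass
class Pod:
    name: str
    containers: list[Container] = field(default_factory=list)

    def is_ready(self) -> bool:
        # Simplified rule: the pod is ready only if every container is ready.
        return all(c.ready for c in self.containers)


def endpoints(pods: list[Pod]) -> list[str]:
    """Names of the pods that would receive traffic in this simplified model."""
    return [p.name for p in pods if p.is_ready()]


# Co-located: spire-server and the controller manager share one pod,
# and the controller manager is currently not ready.
shared = Pod(
    "spire-server-0",
    [Container("spire-server"), Container("spire-controller-manager", ready=False)],
)

# Split (hypothetical layout): each component runs in its own pod.
split_server = Pod("spire-server-0", [Container("spire-server")])
split_controller = Pod(
    "spire-controller-manager-0",
    [Container("spire-controller-manager", ready=False)],
)

print(endpoints([shared]))                          # []
print(endpoints([split_server, split_controller]))  # ['spire-server-0']
```

In the co-located case the healthy spire-server container receives no traffic; in the split case it keeps serving while the controller manager is unready.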

The spire-server and spire-controller-manager have different roles in spire. spire-server is responsible for the API and for serving requests. If it's down, especially in the statefulset deployment, spire eventually stops working entirely. However, spire-controller-manager is responsible for managing CRs in the cluster. If it's down, the impact is more nuanced.

Since these two containers are stuck in the same pod, when either of them is down, the spire backend workload API is down. This will eventually take down the spire-agents' ability to service requests. The controller needs to be moved to a separate pod so its outages do not impact spire itself. I'm facing a problem where a federated endpoint lost its SSL cert. The controller manager is restarting, which causes outages to spire as a whole (not just federation problems).

2024-04-30T12:59:33Z    ERROR   setup   problem running manager {"error": "failed to wait for clusterspiffeid caches to sync: timed out waiting for cache to be synced for Kind *v1alpha1.ClusterSPIFFEID"}
main.run
        /workspace/main.go:347
main.main
        /workspace/main.go:82
runtime.main
        /usr/local/go/src/runtime/proc.go:250
2024-04-30T12:59:33Z    DEBUG   events  spire-server-0_84b56424-9ab4-495f-bf58-c2efca64d303 stopped leading     {"type": "Normal", "object": {"kind":"Lease","namespace":"spire-server","name":"8aa27f40.spiffe.io","uid":"5deab3b1-5f1d-4855-8cc5-f15bcdcbbee0","apiVersion":"coordination.k8s.io/v1","resourceVersion":"1272111663"}, "reason": "LeaderElection"}
2024-04-30T12:59:33Z    ERROR   error received after stop sequence was engaged  {"error": "leader election lost"}
sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.1/pkg/manager/internal.go:555


faisal-memon · 2024-04-30T16:05:22Z

Created an issue on the controller manager to see if there is interest in supporting this deployment mode: spiffe/spire-controller-manager#363

faisal-memon mentioned this issue Apr 30, 2024

Support running as a separate Pod spiffe/spire-controller-manager#363

Open
