move spire-controller-manager to a separate pod #341

Open
drewwells opened this issue Apr 30, 2024 · 1 comment

Comments

@drewwells
Contributor

drewwells commented Apr 30, 2024

For background (skip this if you already know it): ingress controllers and Kubernetes Services only send traffic to pods marked ready. If any container in a pod is not ready, the whole pod is not ready and receives no traffic. This is what makes zero-downtime rollouts of pods in ReplicaSets possible.
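As a rough illustration of that rule, here is a toy Go model. It is not Kubernetes code: the `Pod` and `Container` types and the pod names are hypothetical, and readiness gates and other pod conditions are ignored. A pod counts as ready only when every container in it is ready, and a Service routes only to ready pods.

```go
package main

import "fmt"

// Container is a hypothetical, simplified container status: only readiness is modeled.
type Container struct {
	Name  string
	Ready bool
}

// Pod is a hypothetical, simplified stand-in for a Kubernetes Pod.
type Pod struct {
	Name       string
	Containers []Container
}

// podReady returns true only if every container in the pod is ready
// (readiness gates and other pod conditions are ignored in this sketch).
func podReady(p Pod) bool {
	for _, c := range p.Containers {
		if !c.Ready {
			return false
		}
	}
	return true
}

// serviceEndpoints returns the names of the pods a Service would route to:
// only the pods that are ready.
func serviceEndpoints(pods []Pod) []string {
	var eps []string
	for _, p := range pods {
		if podReady(p) {
			eps = append(eps, p.Name)
		}
	}
	return eps
}

func main() {
	pods := []Pod{
		// One unready sidecar is enough to pull this pod out of the Service.
		{Name: "spire-server-0", Containers: []Container{{"spire-server", true}, {"spire-controller-manager", false}}},
		{Name: "spire-server-1", Containers: []Container{{"spire-server", true}, {"spire-controller-manager", true}}},
	}
	fmt.Println(serviceEndpoints(pods)) // prints [spire-server-1]
}
```

Note that `spire-server-0` is dropped even though its `spire-server` container is healthy.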

spire-server and spire-controller-manager play different roles in SPIRE. spire-server serves the API and handles requests; if it is down, especially in the StatefulSet deployment, SPIRE eventually stops working entirely. spire-controller-manager, on the other hand, manages custom resources (CRs) in the cluster; if it is down, the impact is more nuanced.

Because these two containers share the same pod, when either of them is down the pod is not ready, and the SPIRE backend (spire-server) stops receiving traffic. This eventually takes down the spire-agents' ability to serve requests. The controller needs to be moved to a separate pod so that its outages do not impact SPIRE itself. I'm facing a problem where a federated endpoint lost its SSL certificate: the controller manager keeps restarting, which causes outages in SPIRE as a whole (not just federation problems).

2024-04-30T12:59:33Z    ERROR   setup   problem running manager {"error": "failed to wait for clusterspiffeid caches to sync: timed out waiting for cache to be synced for Kind *v1alpha1.ClusterSPIFFEID"}
main.run
        /workspace/main.go:347
main.main
        /workspace/main.go:82
runtime.main
        /usr/local/go/src/runtime/proc.go:250
2024-04-30T12:59:33Z    DEBUG   events  spire-server-0_84b56424-9ab4-495f-bf58-c2efca64d303 stopped leading     {"type": "Normal", "object": {"kind":"Lease","namespace":"spire-server","name":"8aa27f40.spiffe.io","uid":"5deab3b1-5f1d-4855-8cc5-f15bcdcbbee0","apiVersion":"coordination.k8s.io/v1","resourceVersion":"1272111663"}, "reason": "LeaderElection"}
2024-04-30T12:59:33Z    ERROR   error received after stop sequence was engaged  {"error": "leader election lost"}
sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.1/pkg/manager/internal.go:555

@faisal-memon
Collaborator

Created an issue on the controller manager repository to see if there is interest in supporting this deployment mode: spiffe/spire-controller-manager#363
