Clouddriver in Spinnaker 1.6.1 is more fragile #2683
Comments
@wheleph spinnaker/clouddriver#2531 was merged last night which takes a similar approach for Docker registries so it's possible that this could be implemented for Kubernetes. I'm not sure if it would conflict with the proposal in #2604 but I don't think so.
@ethanfrogers something like that would definitely help, but the previous version of clouddriver became healthy even if some of the accounts were initially invalid, and we relied on that fact in our automation.
Alright, I've found the exact commit that introduces the behavior that breaks our deployment of Spinnaker: spinnaker/clouddriver@87c6921
Before this commit, if you launch clouddriver with a faulty k8s account it reports "UP" at
Note that if clouddriver is configured in such a way that the k8s account info points to a non-existent file it will never come up healthy, before or after said commit.
On the current master the exception originates here: https://github.com/spinnaker/clouddriver/blob/master/clouddriver-kubernetes/src/main/groovy/com/netflix/spinnaker/clouddriver/kubernetes/v1/security/KubernetesV1Credentials.java#L191
The exception occurs because the health check hits this: https://github.com/spinnaker/clouddriver/blob/a681c7eab5d6945b2fe60ae071577f5bfb5092af/clouddriver-kubernetes/src/main/groovy/com/netflix/spinnaker/clouddriver/kubernetes/health/KubernetesHealthIndicator.groovy#L65
@ethanfrogers the active namespace check that results from your commit makes the very first health check fail, which means that Halyard will never report the clouddriver deploy as successful because it never becomes healthy. Before your commit things at least started out healthy so Halyard could complete the deploy.
Given this prerequisite, I guess it would make sense to have clouddriver report "UP" even though some k8s accounts might be (temporarily) broken. Would you agree?
@mdirkse great find! if i remember correctly, i made this change because we removed the hard dependency on perhaps we should implement something like the above for the V1 provider that will prevent clouddriver from being deployed with invalid credentials but will not kill clouddriver if those credentials are invalidated in the future? or maybe that's missing the point?
Well, my larger question was: when is clouddriver not healthy? If we assume the requirement that Clouddriver should be able to handle unavailability of (some) k8s accounts, then it would stand to reason that it's still healthy even though an account may be non-functional. So then I'd also not expect a broken account to stop clouddriver from being deployed (otherwise you could get into a situation where you can't do a Spinnaker upgrade because a network connection to a particular k8s cluster is temporarily down, which seems a little strange). Lemme know if I misunderstood your comment about clouddriver health. If not, then we could explore ways to not have k8s account health impact clouddriver health.
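To make the distinction between service health and account health concrete, here is a minimal Java sketch of the semantics argued for above. It is not clouddriver's actual `KubernetesHealthIndicator`: the class `AccountTolerantHealth`, its `check` method, and the account names in the usage note are all hypothetical, and the policy that the overall status is always "UP" is an assumption taken from this comment, not from any merged change.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these names exist in clouddriver.
public class AccountTolerantHealth {

    public enum Status { UP, DOWN }

    // Overall service status plus the status of each individual account.
    public static final class Report {
        public final Status overall;
        public final Map<String, Status> accounts;

        Report(Status overall, Map<String, Status> accounts) {
            this.overall = overall;
            this.accounts = Collections.unmodifiableMap(accounts);
        }
    }

    // Runs every account check. A check that returns false or throws marks
    // only its own account as DOWN; it never fails the whole health check.
    public static Report check(Map<String, Supplier<Boolean>> accountChecks) {
        Map<String, Status> perAccount = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<Boolean>> entry : accountChecks.entrySet()) {
            Status status;
            try {
                status = Boolean.TRUE.equals(entry.getValue().get()) ? Status.UP : Status.DOWN;
            } catch (RuntimeException e) {
                // e.g. expired credentials or an unreachable cluster
                status = Status.DOWN;
            }
            perAccount.put(entry.getKey(), status);
        }
        // Assumed policy: the service itself stays UP while broken accounts
        // remain visible in the per-account details.
        return new Report(Status.UP, perAccount);
    }
}
```

Calling `check` with one passing check (say `"prod-cluster"`) and one that throws (say `"stale-cluster"`) would yield an overall status of `UP` with `stale-cluster` recorded as `DOWN`, so a deploy tool polling only the overall status could still complete.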
@mdirkse you're right, that is where i was going with that. i guess we just need to define what the semantics are around clouddriver health vs account health with respect to kubernetes. from my perspective, we have 2 options:
@lwander do you have any thoughts on this?
Halyard can completely skip validation with |
This issue hasn't been updated in 103 days, so we are tagging it as 'stale'. If you want to remove this label, comment:
@spinnakerbot remove-label stale
I believe this was fixed for 1.8 and greater: spinnaker/clouddriver#2752
This issue hasn't been updated in 45 days, so we are tagging it as 'stale'. If you want to remove this label, comment:
closing because 1.6 is deprecated. if the problem still exists in a newer version of Spinnaker please submit a new issue.
Cloud Provider
GKE (Kubernetes)
Environment
Spinnaker 1.6.1 running on GKE deployed with halyard 0.49.0
Description
In our Spinnaker setup we use many Kubernetes accounts and sometimes some of them become outdated. This wasn’t a problem with Spinnaker 1.5.4 (Clouddriver 1.0.4-20180110144440) since the Clouddriver health endpoint returned OK and the clouddriver was up and running (even though some accounts had outdated keys):
After upgrading to Spinnaker 1.6.0 (Clouddriver 2.0.0-20180221152902) and/or 1.6.1 (Clouddriver 2.1.0-20180319132609) we noticed that an error in even a single Kubernetes account causes the health endpoint to fail:
This in turn makes Kubernetes think that Clouddriver is not healthy, and Spinnaker becomes non-functional.
Additional information
More discussion: https://community.spinnaker.io/t/clouddriver-in-spinnaker-1-6-0-is-more-fragile/232