Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce global resources checks to install and multi-stage install #2987

Merged
merged 12 commits into from
Jun 27, 2019

Conversation

ihcsim
Copy link
Contributor

@ihcsim ihcsim commented Jun 24, 2019

This PR introduces the following two new checks to pkg/healthcheck/healthcheck.go:

LinkerdPreInstallGlobalResourcesChecks which checks for the existence of Linkerd global resources when user runs linkerd check --pre. The check fails if any of the Linkerd cluster roles, cluster role bindings, mutating webhook configuration, validating webhook configuration, custom resource definition or pod security policy exist. This error will point to new troubleshooting documentation informing the user that multi control plane setup isn't supported.

LinkerdConfigNotExistsChecks which checks if the linkerd-config config map exists. The check fails if the config map already exists. This error will point to new troubleshooting documentation informing the user can't install a control plane in the specified namespace, unless overridden with the --ignore-cluster option.

The checks use the linkerd.io/control-plane-ns label to select the resources. See #2971.

The following are updates to install and multi-stage install:

Previously, linkerd install checks for existence of the linkerd-config config map to determine if the control plane installation is permitted. This is now changed to check for the existence of global resources, to ensure user can't install the control plane into a different namespace.

Previously, linkerd install control-plane checks for the existence of the specified control plane namespace to determine if linkerd install config has already been run. This is now changed to check for the existence of global resources that are installed by linkerd install config.

Previously, the linkerd install code calls the Kubernetes API directly. This is now changed to delegate the API interaction to pkg/healthcheck/healthcheck.go.

Test cases can be found here.

Open questions:

  • The usage of the linkerd.io/control-plane-ns label won't work if the existing control plane is of an older version, since the older global resources aren't labelled. Do we need to support this use case?

Fixes #2954.

Ivan Sim added 4 commits June 21, 2019 23:16
Signed-off-by: Ivan Sim <ivan@buoyant.io>
'linkerd-config' config map.

This ensures that multiple control planes can't be installed into
different namespaces.

Signed-off-by: Ivan Sim <ivan@buoyant.io>
Signed-off-by: Ivan Sim <ivan@buoyant.io>
Signed-off-by: Ivan Sim <ivan@buoyant.io>
@ihcsim ihcsim added the priority/P0 Release Blocker label Jun 24, 2019
@ihcsim ihcsim requested review from siggy and alpeb June 24, 2019 16:33
@ihcsim ihcsim self-assigned this Jun 24, 2019
@ihcsim ihcsim added this to In progress in 2.4 - Release via automation Jun 24, 2019
@ihcsim ihcsim force-pushed the isim/disable-multi-control-plane branch from cf5018f to d17ef92 Compare June 24, 2019 16:43
Signed-off-by: Ivan Sim <ivan@buoyant.io>
Ivan Sim added 2 commits June 24, 2019 10:59
Signed-off-by: Ivan Sim <ivan@buoyant.io>
Signed-off-by: Ivan Sim <ivan@buoyant.io>
@l5d-bot
Copy link
Collaborator

l5d-bot commented Jun 24, 2019

Integration test results for cad4452: fail 😕
Log output: https://gist.github.com/a8d9d268023adc813e59406ef843aff4

Signed-off-by: Ivan Sim <ivan@buoyant.io>
cli/cmd/install.go Outdated Show resolved Hide resolved
@l5d-bot
Copy link
Collaborator

l5d-bot commented Jun 24, 2019

Integration test results for 6567deb: fail 😕
Log output: https://gist.github.com/6b84d383620ab82c6cbc29519e83478f

@ihcsim ihcsim force-pushed the isim/disable-multi-control-plane branch 5 times, most recently from dfe93c3 to 301f252 Compare June 24, 2019 23:04
@l5d-bot
Copy link
Collaborator

l5d-bot commented Jun 24, 2019

Integration test results for 301f252: fail 😕
Log output: https://gist.github.com/8e8686b2e50705ad14dcdb0ddf2457a9

@ihcsim ihcsim force-pushed the isim/disable-multi-control-plane branch from 301f252 to eab2888 Compare June 24, 2019 23:43
@l5d-bot
Copy link
Collaborator

l5d-bot commented Jun 25, 2019

Integration test results for eab2888: fail 😕
Log output: https://gist.github.com/42c125fcd8a4a85742c80b046ea11b63

Signed-off-by: Ivan Sim <ivan@buoyant.io>
@ihcsim ihcsim force-pushed the isim/disable-multi-control-plane branch from eab2888 to 913db94 Compare June 25, 2019 00:39
@l5d-bot
Copy link
Collaborator

l5d-bot commented Jun 25, 2019

Integration test results for 913db94: success 🎉
Log output: https://gist.github.com/9d0912245e82bd7b75956f6cb1f070f8

Copy link
Member

@siggy siggy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works! had some comments around interaction between linkerd install and the healthcheckers.

also, we may want to revisit our error messaging, and how we check for existing installations. my (admittedly non-standard) workflow usually involves kubectl delete ns linkerd prior to testing a new build. that now leaves me in this state:

$ bin/linkerd install
Linkerd has already been installed on your cluster in the linkerd namespace. Please run upgrade if you'd like to update this installation. Otherwise, use the --ignore-cluster flag.

$ bin/linkerd upgrade
× Failed to build upgrade configuration: could not fetch configs from kubernetes: configmaps "linkerd-config" not found
For troubleshooting help, visit: https://linkerd.io/upgrade/#troubleshooting

consider embellishing the check during install, to determine if Linkerd is actually installed, or just the global resources:

# linkerd namespace and configmap/linkerd-config not found, but global resources exist

$ bin/linkerd install
Global resources from an existing Linkerd installation detected:
- customresourcedefinitions/serviceprofiles.linkerd.io
- mutatingwebhookconfigurations/linkerd-proxy-injector-webhook-config
- podsecuritypolicies/linkerd-linkerd-control-plane
- validatingwebhookconfigurations/linkerd-sp-validator-webhook-config
Remove these resources, or use the --ignore-cluster flag to overwrite them.

i think this also begs the question: what's our uninstallation story? linkerd install --ignore cluster | kubectl delete -f - ?

thoughts @grampelberg ?

pkg/healthcheck/healthcheck.go Outdated Show resolved Hide resolved
pkg/healthcheck/healthcheck.go Outdated Show resolved Hide resolved
pkg/healthcheck/healthcheck.go Outdated Show resolved Hide resolved
pkg/healthcheck/healthcheck.go Outdated Show resolved Hide resolved
cli/cmd/install.go Outdated Show resolved Hide resolved
@grampelberg
Copy link
Contributor

@siggy I think it is time for an uninstall command that is consumed by kubectl delete. We can add that to the warn text and to the website.

Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look great! 👏
While testing install, it was indeed clear the necessity to see the full list of offending resources.

#2954 mentions an additional check for the webhook config's service namespace; will that be tackled in a separate PR?

func (hc *HealthChecker) checkClusterRoles(shouldExist bool) error {
options := metav1.ListOptions{
LabelSelector: k8s.ControllerNSLabel,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth moving this into a global var given it's used multiple times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, I prefer to keep this close to where it's used. As this is a big file, I'm reluctant to make this variable (which is used only by a handful of methods) a global, and am willing to live with a bit of redundancy.

Signed-off-by: Ivan Sim <ivan@buoyant.io>
@ihcsim
Copy link
Contributor Author

ihcsim commented Jun 25, 2019

@alpeb Appreciate the feedback!

While testing install, it was indeed clear the necessity to see the full list of offending resources.

👍 I will surface those information.

#2954 mentions an additional check for the webhook config's service namespace; will that be tackled in a separate PR?

Probably not going to do it, as it requires the user running linkerd check to have cluster-scoped permissions, which will defeat the whole purpose of multi-stage install.

Signed-off-by: Ivan Sim <ivan@buoyant.io>
@ihcsim ihcsim force-pushed the isim/disable-multi-control-plane branch from 58e4ab3 to 081ce1c Compare June 26, 2019 05:38
@l5d-bot
Copy link
Collaborator

l5d-bot commented Jun 26, 2019

Integration test results for 081ce1c: fail 😕
Log output: https://gist.github.com/380622b78aedaf87f5fb9aabd4a6518a

Copy link
Member

@siggy siggy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is working great, and these new resource-aware check outputs are super helpful. most of my comments were around formatting of output and error messages.

cli/cmd/install.go Outdated Show resolved Hide resolved
pkg/healthcheck/healthcheck.go Outdated Show resolved Hide resolved
cli/cmd/install.go Outdated Show resolved Hide resolved
cli/cmd/check.go Outdated Show resolved Hide resolved
pkg/healthcheck/healthcheck.go Outdated Show resolved Hide resolved
Signed-off-by: Ivan Sim <ivan@buoyant.io>
@l5d-bot
Copy link
Collaborator

l5d-bot commented Jun 26, 2019

Integration test results for 5c488b0: success 🎉
Log output: https://gist.github.com/e6f9c5baa30aeb187e4907c2ce320efb

Copy link
Member

@siggy siggy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work! 🚢 👍

@ihcsim ihcsim merged commit 866fe6f into master Jun 27, 2019
2.4 - Release automation moved this from In progress to Done Jun 27, 2019
@ihcsim ihcsim deleted the isim/disable-multi-control-plane branch June 27, 2019 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/P0 Release Blocker
Projects
No open projects
2.4 - Release
  
Done
Development

Successfully merging this pull request may close these issues.

Disable installation of multiple control planes in 2.4
6 participants