Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add webhook with cert-manager recipe #2100

Closed
wants to merge 5 commits into from

Conversation

anyasabo
Copy link
Contributor

@anyasabo anyasabo commented Nov 1, 2019

Related to #1769

  • Enables the webhook server if it's toggled in the config. It is not on by default because it looks for certificates in a specific location, so if they're not there it will be sad.
  • Adds docs on how to install the webhook with cert-manager
  • Updates the hasMaster validator to use a more specific error message

The update to use cert-manager v0.11 was only merged a week ago in kubebuilder so a lot of the docs there still need some work. I also added comments in the yaml of some of the weird things I noticed in Kubebuilder about how it expects webhooks to function.

The other unfortunate part is that kubebuilder docs very much expect you to be working with their deployments + kustomize configuration, which we have not migrated to yet.

To test this you may need to run make operator-image and then update the image field with the newly built image. Once this is in master I think we can update the example to a snapshot build.

@anyasabo
Copy link
Contributor Author

anyasabo commented Nov 1, 2019

Added some upstream docs PRs that should hopefully make things a little easier for people in the future:
kubernetes-sigs/controller-tools#353
kubernetes-sigs/kubebuilder#1147

@pebrc pebrc added the >enhancement Enhancement of existing functionality label Nov 8, 2019
Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments, will try it out on my installation as well.

namespace: elastic-system
spec:
selfSigned: {}
...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason why you put document end markers ("...") between the documents?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just got in the habit of it, I can take it out if there's a reason I'm missing

@@ -122,6 +120,11 @@ func init() {
certificates.DefaultRotateBefore,
"Duration representing how long before expiration TLS certificates should be reissued",
)
Cmd.Flags().Bool(
EnableWebhooksFlag,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This somewhat duplicates what we used to express with the webhook role. I am fine with moving away from the operator roles concept but I think we should not have both the role and the explicit flag at the same time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that is an oversight on my part, I forgot about the roles. I'll change this, we'll just want to change the default to not-webhook now I think. And change it back when we have automatic webhook configuration

// os.Exit(1)
// }
if viper.GetBool(EnableWebhooksFlag) {
if err = (&estype.Elasticsearch{}).SetupWebhookWithManager(mgr); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only validates Elasticsearch resources but we used to have validations on (license) Secrets as well. Just flicking through the docs I think it should be possible to re-introduce them, but it might be worth revisiting the decision to validate secrets because it can potentially have a big impact on the stability of the k8s installation. Maybe worth re-introducing a license CRD?

[id="{p}-webhook"]
== Validating webhook

A validating webhook provides additional validation of Elasticsearch resources beyond what can be specified in OpenAPI specs. This allows advanced validation to help prevent Elasticsearch resources that are invalid and will not function correctly from being admitted by the Kubernetes API server, which can aid troubleshooting. To enable the validating webhook, you must first install cert-manager v0.11+ as described in their instructions link:from being admitted by the Kubernetes API server[here].
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link to cert-manager doc is missing, probably a copy/paste mistake

Copy link
Contributor

@barkbay barkbay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First review, still need to do some additional tests

[id="{p}-webhook"]
== Validating webhook

A validating webhook provides additional validation of Elasticsearch resources beyond what can be specified in OpenAPI specs. This allows advanced validation to help prevent Elasticsearch resources that are invalid and will not function correctly from being admitted by the Kubernetes API server, which can aid troubleshooting. To enable the validating webhook, you must first install cert-manager v0.11+ as described in their instructions link:from being admitted by the Kubernetes API server[here].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link at the end of the sentence if not set


[source,yaml]
----
cat $$<<$$EOF | kubectl apply -f -
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$$<<$$EOF is rendered "as is" with github, not sure why

path: /validate-elasticsearch-k8s-elastic-co-v1beta1-elasticsearch
failurePolicy: Ignore
name: elastic-es-validation.k8s.elastic.co
timeoutSeconds: 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get this error when applying the ValidationWebhookConfiguration on GKE v1.13.12-gke.8 :
error: error validating "STDIN": error validating data: ValidationError(ValidatingWebhookConfiguration.webhooks[0]): unknown field "timeoutSeconds" in io.k8s.api.admissionregistration.v1beta1.Webhook; if you choose to ignore these errors, turn validation off with --validate=false

TBH I don't understand why... 😕

It's fine on GKE 1.14, so unless someone else understand the error I have on GKE 1.13 I would remove timeoutSeconds and keep the default (which is 30 seconds)

Copy link
Collaborator

@pebrc pebrc Nov 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field simply did not exist prior to 1.14 see kubernetes/api@c33fb2b and kubernetes/kubernetes#74562

@@ -0,0 +1,168 @@
[id="{p}-webhook"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not add this file in our current documentation ? Not sure I would look into the recipes directory if I was looking for information about how to configure the webhook with the cert manager.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with moving it there. I think I had it here with the idea that this would be a rarely used option once we have automatic configuration enabled


=== Troubleshooting

You can change the `failurePolicy` of the webhook configuration to `Fail`, which will cause creations to error out if there is an error contacting the webhook.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember why we have a default failurePolicy set to Ignore, I have some mixed feelings about this as I have been able to inject a incorrect manifest while testing certificate renewal: certificate was not up to date in the operator, got this message in ECK: http: TLS handshake error from 10.64.1.1:38142: remote error: tls: bad certificate but the resource was created anyway.

As a side note this part is not specific to the cert-manager so I guess we should eventually include it in a more generic documentation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used to have Fail but that caused a lot of issues with users having trouble with the webhooks. We decided because of these issues to loosen the policy to strike a balance between giving a better experience through immediate validation and not blocking users (especially because we were also validating k8s secrets in < 1.0)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement Enhancement of existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants