Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Admission Webhooks to set default values and validate to Nexus CR #174

Open
ricardozanini opened this issue Oct 9, 2020 · 13 comments · May be fixed by #195
Open

Use Admission Webhooks to set default values and validate to Nexus CR #174

ricardozanini opened this issue Oct 9, 2020 · 13 comments · May be fixed by #195
Assignees
Labels
enhancement 👑 New feature or request
Milestone

Comments

@ricardozanini
Copy link
Member

We might need to get rid of the validation package or call it via Admission Webhooks after #140

See: https://sdk.operatorframework.io/docs/building-operators/golang/webhooks/

@LCaparelli
Copy link
Member

This has some unforeseen complications. The current state of implementation reworks a lot of our design in order to move the admission logic into the v1alpha1 package (see this issue).

In addition to that, mutating and validating webhooks require TLS termination, which in turn requires us to serve a TLS certificate. Kubebuilder's approach suggests the use of cert-manager for injecting the CA bundle on the webhook configuration resources as well as for populating a secret containing a self-signed cert (and its private key).

This approach forces users of our Operator to install cert-manager in order to use it, which is something we'd like to avoid.

Worth considering is that many Operators rely on the operator-sdk or kubebuilder at some level, and pulling cert-manager as a dependency might eventually become commonplace, in the sense that all operator users will already have it installed for one of the many operators installed which would require it.

In any case, some alternatives to avoid this dependency:

  1. not implementing validation through webhooks: simply go on as we do today, which consists of setting defaults and validating the CR on the reconcile loop
  2. change kubebuilder-generated manifests to remove the volume which mounts the cert secret and generate the cert during initialization, populating /tmp/k8s-webhook-server/serving-certs with it.
  3. create a sidecar container which would generate a self-signed cert and populate the webhook-server-cert secret with it, allowing it to be mounted on the manager container.

In the light of these complications, we should take a step back to analyze our options and further discuss the strategy moving forward. Moving this to v0.6.0 instead of v0.5.0.

My 2c

1. not implementing validation through webhooks

I personally dislike this approach. It allows users to persist invalid CRs to the etcd store, in case of validation failures the user does not get immediately notified with an error message thrown to the terminal (they need to check the status of the CR or its logs to understand what is going on). It does not follow the suite of the core API resources. It changes the CR during the reconcile loop, which seems like anti-pattern.

2. change kubebuilder-generated manifests and write to the container FS directly

While this is good in the sense that it allows the use of webhooks, the means to do it are pretty ugly. We change auto-generated stuff from kubebuilder, which is likely to come back to bite us on the ass in the long-run as well as generates more stuff to maintain. It also makes us interact directly with the container's filesystem, which is definitely something we should always try to avoid. We should always try to rely on K8s API instead, which should abstract this sort of low-level, implementation-specific interactions. This too might come back to bite us on the ass and generates more stuff to maintain.

3. create a sidecar container to populate the webhook-server-cert secret

This allows the use of webhooks, keeps auto-generated stuff intact and relies solely on K8s API in order to work. On the other hand, we'll need to maintain this new container which implements part of the functionality already provided by cert-manager. This isn't necessarily a bad thing, as long as we implement it correctly, in the spirit of "A little copying is better than a little dependency".

@LCaparelli LCaparelli modified the milestones: v0.5.0, v0.6.0 Dec 2, 2020
@ricardozanini
Copy link
Member Author

@LCaparelli I believe we can add cert-manager as a dependency of ours instead of copying their behavior. I think that what you said:

Worth considering is that many Operators rely on the operator-sdk or kubebuilder at some level, and pulling cert-manager as a dependency might eventually become commonplace, in the sense that all operator users will already have it installed for one of the many operators installed which would require it.

It's super ludic and we should commit to the best practices whenever is possible. Since they recommend that, why not? :)

Question: if cert-manager is not installed, the operator stops working? Do we depend on it code-wise (I believe we do not)? Or we will just accept non-validated CRs? If it's the last case, we can still support "trials" without too many installation bits.

@LCaparelli
Copy link
Member

LCaparelli commented Dec 2, 2020

It's super ludic and we should commit to the best practices whenever is possible. Since they recommend that, why not? :)

+1, I actually thought you wouldn't want to add it as dependency. Cool, the implementation is almost finished then (since the scaffolding for this is already performed by kubebuilder), just need to test it more.

Question: if cert-manager is not installed, the operator stops working? Do we depend on it code-wise (I believe we do not)? Or we will just accept non-validated CRs? If it's the last case, we can still support "trials" without too many installation bits.

It stops working in the sense it will run into nil dereference if pointer fields are not correctly set, for example the probes. Without the webhooks in place, the user MUST supply a valid Nexus which does not require setting any defaults, otherwise the operator will be in a crash loop condition, constantly panicking when attempting to process the reconcile req.

But yes, if the user makes sure it's a completely valid and populated CR, it should cause no trouble.

@ricardozanini
Copy link
Member Author

ricardozanini commented Dec 2, 2020

the user MUST supply a valid Nexus which does not require setting any defaults, otherwise the operator will be in a crash loop condition, constantly panicking when attempting to process the reconcile req.

I don't see a problem if we state this in the documentation.

@LCaparelli
Copy link
Member

Ugh, sorry. I was wrong. The container will not start without cert-manager because it requires a secret containing the TLS cert. This is part of the webhooks scaffolding.

Additionally, the scaffolded code in main.go which registers the webhook exits if that fail, but we could easily control this with an env variable (Don't register the webhook if USE_WEBHOOKS==FALSE)

We'd need to provide an alternate operator.yaml which does not attempt to mount this secret and add a check for this env var. Think it's worth the hassle @ricardozanini?

@ricardozanini
Copy link
Member Author

I don't think so. We can also add to the doc how to install the operator without the cert-manager dependency. One of the steps is to change the deployment to add the environment variable USE_WEBHOOKS.

But can't we be smart and just ignore the cert-manager if it's not there?

@LCaparelli
Copy link
Member

I don't think so. We can also add to the doc how to install the operator without the cert-manager dependency. One of the steps is to change the deployment to add the environment variable USE_WEBHOOKS.

To be clear: let's not create a custom operator.yaml, but let's create a USE_WEBHOOK env var which can control whether the hooks should be registered. This variable will be set by users experimenting with our operator and we will document the necessary changes to the deployment in order for this to work.

But can't we be smart and just ignore the cert-manager if it's not there?

No, because kubebuilder inserts this secret mount in the deployment itself, present in nexus-operator.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    control-plane: controller-manager
  name: nexus-operator-controller-manager
  namespace: nexus-operator-system
spec:
# output omitted
       volumeMounts:
        - mountPath: /tmp/k8s-webhook-server/serving-certs
          name: cert
          readOnly: true
      terminationGracePeriodSeconds: 10
      volumes:
      - name: cert
        secret:
          defaultMode: 420
          secretName: webhook-server-cert

If this secret does not exist, the container doesn't start. This is all before we're actually running, so can't escape this via code. This secret is created by cert-manager. Removing this secret is also one of the steps users would need to follow in order to install the operator without webhooks.

@ricardozanini
Copy link
Member Author

Where are you going to set the USE_WEBHOOKS env then? In the installation script? Since this is generated by kubebuilder directly in our deployment.yaml file, I don't think we will be able to do anything after that.

I think I need to take a look at Knative to see how they do there.

@LCaparelli
Copy link
Member

What I'm proposing is:

  1. for regular installations, pull in cert-manager as a dependency.
  2. if the user just wants to give it a swing, without cert-manager, we document the perils of doing this and the modifications they need to perform to the deployment (including removing the secret volume mount and adding the environment variable). To do this, they can't use OLM as changes to the deployment will be undone to match the CSV, so users that want to do this will need to use the script.

The supported, canon way to install is using webhooks, but we document the steps to try it without them. We don't provide a "ready-to-apply" manifest for it, just give the instructions to modify the manifest we supply for regular installations.

Wdyt?

@ricardozanini
Copy link
Member Author

I think it's better to just generate an operator.yaml file without the cert-manager then :/
Too much trouble for users that just want to try the operator.

@LCaparelli
Copy link
Member

LCaparelli commented Dec 2, 2020

Ok, but you mean a separate one, right? Not changing the one generated by kubebuilder and that goes into the CSV.

@ricardozanini
Copy link
Member Author

yes, definitely a separate one.

@ibotty
Copy link

ibotty commented May 28, 2021

Pretty late to the party, but for some environments there are other ways to get a valid certificate (for some definitions of valid), e.g. on OKD/OpenShift: https://docs.okd.io/latest/security/certificates/service-serving-certificate.html
It would be great to easily omit the dependency on these kubernetes deployments.

@ricardozanini ricardozanini modified the milestones: v0.6.0, v1.0.0 Jun 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement 👑 New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants