
Refactor cosigned to take advantage of duck typing. #637

Merged: 3 commits, Sep 11, 2021

Conversation

mattmoor (Member) commented Sep 8, 2021

With this change, the webhook can take advantage of duck typing to parse all of the "Pod Specable" types currently supported.

This also takes advantage of the knative.dev/pkg webhook infrastructure to reduce boilerplate and eliminate the need for cert-manager.

Lastly, this starts to sketch out some cosigned e2e tests to verify that things work.

Signed-off-by: Matt Moore mattomata@gmail.com

mattmoor (Member Author) commented Sep 8, 2021

TODO list for myself:

  • Change the exclude label to an include label, and label the namespace for tests.
  • Add coverage for a PodSpecable in addition to Pod (given ./cmd/sample, maybe Job?)
  • Split off the :nonroot base image as separate PR.
  • Split off the fulcioroot package split as separate PR.

@mattmoor mattmoor force-pushed the knative-pkg branch 2 times, most recently from 345ec18 to 9da1f06 Compare September 8, 2021 18:28
mattmoor (Member Author) commented Sep 8, 2021

Ok, added a Job test, and broke off: #638 and #639 (will rebase this when those merge)

@mattmoor mattmoor force-pushed the knative-pkg branch 2 times, most recently from 9d14f2e to a2953f9 Compare September 8, 2021 20:10
@mattmoor mattmoor changed the title from "[WIP] Refactor cosigned to take advantage of duck typing." to "Refactor cosigned to take advantage of duck typing." Sep 8, 2021
dlorenc (Member) commented Sep 8, 2021

Cc @hectorj2f FYI

This looks good to me, but let me know if you have any concerns

package main

import "log"

func main() {
	log.Printf("Hello, World!")
}
Contributor:

This looks like an unnecessary piece of code to maintain. Can you explain why you added it?

mattmoor (Member Author):

An image is published with this code to a registry running alongside KinD in the e2e tests. We use this image to verify that:

  1. Unsigned images are fine in namespaces not subject to the webhook,
  2. Unsigned images are rejected in namespaces subject to the webhook,
  3. Signed images are fine in namespaces subject to the webhook.

Personally, I'm less worried about maintaining "Hello world" than I am about maintaining a published image somewhere and logic to copy it down.
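In shell, the three checks could be sketched roughly like this. This is a hypothetical outline, not the PR's actual script: the namespace names, the include label, and the image references are all illustrative.

```shell
# Hypothetical sketch of the three webhook checks; names are illustrative.
# 1. Unsigned image in a namespace NOT subject to the webhook: should succeed.
kubectl create ns unwatched
kubectl create job demo -n unwatched --image="${REGISTRY}/sample:unsigned"

# 2. Unsigned image in a namespace subject to the webhook: should be rejected.
kubectl create ns watched
kubectl label ns watched cosigned.sigstore.dev/include=true   # illustrative label
if kubectl create job demo -n watched --image="${REGISTRY}/sample:unsigned"; then
  echo "ERROR: unsigned image was admitted" && exit 1
fi

# 3. Signed image in the same namespace: should succeed.
cosign sign --key cosign.key "${REGISTRY}/sample:signed"
kubectl create job demo-signed -n watched --image="${REGISTRY}/sample:signed"
```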

Contributor:

> Unsigned images are fine in namespaces not subject to the webhook,
> Unsigned images are rejected in namespaces subject to the webhook,
> Signed images are fine in namespaces subject to the webhook.

Couldn't we build our own cosign image and publish it, unsigned, to this registry running alongside the cluster? Then we could test using our own docker image.

On the other hand, if this code is only used for testing purposes, would it be a better choice to move it to the test directory? wdyt?

mattmoor (Member Author):

Yeah, that's what this does: ko publishes this test image to the local registry, so we don't need to rely on a prebuilt image. I'd be happy to move this to a more clearly named directory, I just didn't see a precedent to follow yet.


- name: Collect diagnostics
if: ${{ failure() }}
run: |
Contributor:

Could we use scripts (potentially stored in a hack directory) instead of this long yaml file?

mattmoor (Member Author):

The point of this logic is to dump useful diagnostic information from the ephemeral cluster before it is torn down. If we find ourselves creating more workflows that copy/paste this (within the same repo) then it might be useful to extract, but I don't think it's useful for local environments because they stick around for post-mortem inspection.

Contributor:

Perhaps a simple solution would have been to use kind export logs to dump the logs to a directory, so they could be made available as a downloadable bundle.

mattmoor (Member Author):

We can certainly change this to do that. This snippet is just one I've been using across a variety of projects for this, and it's nice to be able to jump directly to the pod logs for the component you care about.

In Knative we uploaded the full logs to GCS, and they got very very large. Not sure this will end up with as much logging or as many tests, but we can tweak this however we want 👍
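For reference, the kind export logs approach mentioned above could look roughly like this as GitHub Actions steps (a sketch; the step names, directory, and artifact name are illustrative):

```yaml
# Sketch: export KinD logs on failure and attach them as a workflow artifact.
- name: Export KinD logs
  if: ${{ failure() }}
  run: |
    kind export logs ./kind-logs

- name: Upload log bundle
  if: ${{ failure() }}
  uses: actions/upload-artifact@v2
  with:
    name: kind-logs
    path: ./kind-logs
```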

# Wait for the webhook to come up and become Ready
kubectl rollout status --timeout 5m --namespace cosign-system deployments/webhook

- name: Run Tests
Contributor:

I would prefer using ginkgo e2e tests instead of this raw script. It is easier to maintain and less error-prone. We were planning to add e2e tests next week too.

Contributor:

Again, instead of having the whole piece of code here, it would be easier to maintain if we called scripts instead.

mattmoor (Member Author):

I created ./test/e2e_test_cosigned.sh with this for now, which mostly just outlines what was here.

My goal here wasn't to set a precedent for how we build all future tests, but to at least get a workflow set up which validates something presubmit, so we don't end up with another -log_dir situation.

Suffice it to say that I'd be happy to see all of the "tests" rewritten in something better, but wanted some measure of validation before checking this in. Rather than bikeshed on which test framework to rewrite the tests in here, my inclination would be to get some test coverage in, and then we can follow up.

Contributor:

Sounds good to me.

@@ -0,0 +1 @@
../../../../LICENSE
Contributor:

Question: why do we need this here?

mattmoor (Member Author):

HEAD and refs are symlinked in so that knative.dev/pkg/changeset can infuse various things (e.g. structured logging) with the changeset at which the image was built.

This is here as part of license compliance. Really this repo should also start running github.com/google/go-licenses as well to produce a third_party/VENDOR-LICENSE/... that we symlink here as well for license compliance, but that is sufficiently beyond the scope of my change that I left it out.

Contributor:

Could we add these kodata files to .gitignore?

mattmoor (Member Author):

They aren't generated, and they are generally checked in because you want the published images to contain the metadata. If there were a way of generating these, we could conceivably use .gitignore, but ko isn't going to create them (kodata is ko's convention for bundling data into images, but ko doesn't populate it).
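For illustration, the kodata convention looks roughly like this (paths assume a command directory four levels below the repo root, matching the symlink in the diff; ko follows these symlinks when bundling the files into the image at $KO_DATA_PATH):

```shell
# Create the kodata directory and symlink repo-level files into it.
# ko copies the resolved contents into the image; nothing here is generated,
# which is why the links are checked in rather than gitignored.
mkdir -p cmd/cosign/webhook/kodata
ln -sfn ../../../../LICENSE   cmd/cosign/webhook/kodata/LICENSE
ln -sfn ../../../../.git/HEAD cmd/cosign/webhook/kodata/HEAD
ln -sfn ../../../../.git/refs cmd/cosign/webhook/kodata/refs

# The symlinks store relative targets:
readlink cmd/cosign/webhook/kodata/HEAD   # → ../../../../.git/HEAD
```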

func main() {
ctx := webhook.WithOptions(signals.NewContext(), webhook.Options{
ServiceName: "webhook",
Port: 8443,
Contributor:

Port should remain configurable.
Metrics port and other settings would need to be configurable as well.

mattmoor (Member Author):

The metrics port (if you are using prometheus) is configurable through config-observability: https://github.com/knative/pkg/blob/9a4b6128207c17418c4524f9f9a07cd9cb3babda/metrics/config.go#L101-L104

IIRC 9090 is the typical prometheus port, so IDK where 8080 came from. Hosting metrics on a port is also kind of an anti-pattern because it doesn't work well in ephemeral environments (a la serverless), so other metrics options (like stackdriver) push metrics and don't expose a port.

mattmoor (Member Author):

I take that back. The configmap controls a nested configuration. The prometheus port is configurable through environment variables: https://github.com/knative/pkg/blob/50410e0b833abc1a464d334b9e52e2c873dafcd4/metrics/config.go#L55-L56
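For illustration, the environment-variable route might look like this on the webhook container. The variable name is taken from my reading of the linked config.go; verify METRICS_PROMETHEUS_PORT against the vendored knative.dev/pkg version, and treat the METRICS_DOMAIN value as illustrative.

```yaml
# Sketch: override the prometheus metrics port via env vars on the
# webhook container (names assumed from knative.dev/pkg metrics config).
env:
  - name: METRICS_PROMETHEUS_PORT
    value: "9090"
  - name: METRICS_DOMAIN
    value: sigstore.dev/cosigned   # illustrative domain
```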

ctx := webhook.WithOptions(signals.NewContext(), webhook.Options{
ServiceName: "webhook",
Port: 8443,
SecretName: "webhook-certs",
Contributor:

Shouldn't it be the value of secretName?

mattmoor (Member Author):

No, secretName holds the name of the signing key. This is the secret that holds the TLS cert, which we reconcile with certificates.NewController, and which the sharedmain logic uses to host a TLS-terminated endpoint.

Contributor:

Consider updating the description on L39 and expanding which secret it is, something like:

The name of the secret in the webhook's namespace that holds the TLS certificate used for signing

Or maybe change the flag from secret-name to signing-secret-name or just signing-secret?

mattmoor (Member Author):

Updated the description, this is for verification (vs. signing). I was thinking earlier that we might be able to just use a configmap for this since it's mainly intended for the public key (vs. proper "secret" data), but that's beyond the scope of the change.


// Add healthz and readyz handlers to webhook server. The controller-runtime AddHealthzCheck/AddReadyzCheck methods
// are served via separate http server - better to serve these from the same webhook http server.
webhookServer.WebhookMux.Handle("/readyz/", http.StripPrefix("/readyz/", &healthz.Handler{}))
Contributor:

Question: are we losing the healthz and readyz checks with these changes?

mattmoor (Member Author):

No, the Knative webhook logic has relatively sophisticated logic to respond to probes. You can see these configured in config/webhook.yaml.

# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
Contributor:

We decided to move all the installation scripts into our Helm charts, as done in #609. We have a PR open on the sigstore/helm-charts repo.

mattmoor (Member Author):

Yes, I found that very confusing.

What's the plan to write e2e tests to validate changes presubmit? How do you validate changes presubmit that need both code and config changes (this PR itself is a somewhat dramatic example of that)?

Personally, I think it's fine to have Helm-specific stuff separate, and we can rationalize that when this lands (it still hadn't landed when I looked yesterday). My $0.02 is that Helm shouldn't be a requirement to install this, and the "lowest common denominator" is yaml configs, which is effectively all this is (ko resolve -f config > release.yaml is how quite a few projects produce rendered release yaml).
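The plain-yaml flow described above amounts to just (a sketch, assuming ko is installed, KO_DOCKER_REPO is set, and the manifests live under config/):

```shell
# Render the config/ manifests, with ko building and substituting
# image references; the result is an installable release yaml.
ko resolve -f config/ > release.yaml

# Consumers can then install without Helm or ko:
kubectl apply -f release.yaml
```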

verbs: ["create"]

# Allow the reconciliation of exactly our validating webhook.
- apiGroups: ["admissionregistration.k8s.io"]
Contributor:

Why do we need all these permissions? In the original chart, we don't need more than those: https://github.com/sigstore/helm-charts/pull/10/files#diff-dba06e52f4da92f91dbb6c70da49d6d5c18f8822eb0516d76fba88c436689a0eR17.

mattmoor (Member Author):

The original code also needed cert-manager, and these permissions are actually extremely narrow if you pick through them.

We reconcile the validating webhook to make sure it has the appropriate caBundle, which is important for certificate rotation. If you look closely, the only webhook this can update is the cosigned.sigstore.dev webhook.

We support event creation (above this) at the cluster scope because we reconcile cluster-scoped resources (the webhook), and need to support attaching events to it when we fail to reconcile it.

The block below this is so the webhook can fetch the UID of the "system" namespace, and use this to create an OwnerReference link between the validating webhook and the namespace. A fairly common "uninstall" failure mode is that folks just YOLO delete the "system" namespace, which can be disastrous if cluster-scoped resources (not within the namespace) are left around. In the case of webhooks, there would remain a webhook, which will 503 indefinitely for any resources within its purview. With this capability, the webhook is cleaned up as soon as the namespace is deleted.
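The OwnerReference wiring described above would render roughly like this on the reconciled webhook object (a hypothetical sketch; the uid placeholder is filled in at reconcile time from the fetched namespace):

```yaml
# Sketch: the reconciled ValidatingWebhookConfiguration carries an
# ownerReference to the system namespace, so deleting that namespace
# garbage-collects the webhook instead of leaving it to 503 forever.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: cosigned.sigstore.dev
  ownerReferences:
    - apiVersion: v1
      kind: Namespace
      name: cosign-system     # the "system" namespace
      uid: <namespace-uid>    # fetched at reconcile time
```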

Contributor:

@mattmoor I was referring to all the yaml files in general. We decided to move the yaml manifests to the helm-charts repo.
cc @cpanato @dlorenc

mattmoor (Member Author):

I'm happy to help rationalize things here, I just hadn't seen the "dots" connected for how y'all plan to do e2e testing with the configuration in a separate repo. I really wanted e2e tests to help validate things here.

One thing I've done elsewhere is vendor the config files across repos, which is another possible option.

mattmoor (Member Author) left a comment:

Thanks for the comments @hectorj2f.

I pushed a couple changes in a second commit, but may amend that commit as needed if I broke something doing it.

Comment on lines +74 to +84
readinessProbe: &probe
failureThreshold: 6
initialDelaySeconds: 20
periodSeconds: 1
httpGet:
scheme: HTTPS
port: 8443
httpHeaders:
- name: k-kubelet-probe
value: "webhook"
livenessProbe: *probe
mattmoor (Member Author):

@hectorj2f this is the readiness probe configuration, which validates that the webhook is still serving. The underlying webhook code has logic built in to start failing readiness probes on SIGTERM, but continues to serve traffic until a duration has elapsed without receiving any traffic, or SIGKILL (obvs).

Signed-off-by: Matt Moore <mattomata@gmail.com>
vaikas (Contributor) left a comment:

just a couple of nits

Resolved review threads:

  • .github/workflows/kind-e2e-cosigned.yaml
  • cmd/cosign/webhook/main.go
  • config/500-webhook-configuration.yaml
  • pkg/cosign/kubernetes/webhook/validator.go

@cpanato cpanato added this to the v1.2.0 milestone Sep 9, 2021
…oist and comment webhook name as constant

Signed-off-by: Matt Moore <mattomata@gmail.com>
hectorj2f (Contributor):

@mattmoor I will have another look and manually test these changes tomorrow.

@dlorenc dlorenc merged commit fb04df8 into sigstore:main Sep 11, 2021
@mattmoor mattmoor deleted the knative-pkg branch September 13, 2021 17:59