
Old versions of deployment with untrusted image cannot be torn down #38

@vpnachev

Description

I have an already running Deployment whose image is not signed. Cosign has been enabled for the namespace of the Deployment, and I would like to update the Deployment with another image that is signed and can be successfully verified by the policy-controller.
So, the change was accepted for the Deployment resource, a new ReplicaSet was created, and then the first Pod was also created. At this point, kube-controller-manager tries to scale down the old ReplicaSet, however updating the old ReplicaSet is disallowed by cosign as it is still using the untrusted image.

Steps to reproduce (assuming cosign is installed and running).

  1. kubectl create namespace test
  2. cat <<EOF | kubectl -n test create -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: foo-bar
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: foo-bar
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: foo-bar
    spec:
      containers:
      - command: 
        - "sh"
        - "-c"
        - "sleep 3600"
        image: alpine:3.16
        imagePullPolicy: IfNotPresent
        name: sleep
EOF
  3. kubectl label namespace test policy.sigstore.dev/include=true
  4. kubectl -n test patch deployments.apps foo-bar -p '{"spec":{"template":{"spec":{"containers":[{"name":"sleep","image":"<my-signed-image>"}]}}}}'

After several seconds, the Deployment gets stuck with the new ReplicaSet at 1 replica and the old ReplicaSet that cannot be scaled down:

kubectl -n test get deployment,replicaset,pod
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/foo-bar   3/2     1            3           8m

NAME                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/foo-bar-7f7559865f   2         2         2       8m
replicaset.apps/foo-bar-bc97b45d     1         1         1       5m

NAME                           READY   STATUS    RESTARTS   AGE
pod/foo-bar-7f7559865f-d99v8   1/1     Running   0          8m
pod/foo-bar-7f7559865f-qgfjh   1/1     Running   0          8m
pod/foo-bar-bc97b45d-ndr4d     1/1     Running   0          5m
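
The stalled rollout can also be observed directly; with the Deployment from the steps above, a command like the following is expected to keep waiting on the old replicas (the exact message wording may differ):

$ kubectl -n test rollout status deployment/foo-bar
Waiting for deployment "foo-bar" rollout to finish: 1 old replicas are pending termination...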

The corresponding error in the policy-controller logs is:

{
  "level": "error",
  "ts": "2022-06-14T19:23:57.097Z",
  "logger": "policy-controller",
  "caller": "validation/validation_admit.go:170",
  "msg": "Failed the resource specific validation",
  "commit": "a4cb262",
  "knative.dev/kind": "apps/v1, Kind=ReplicaSet",
  "knative.dev/namespace": "test",
  "knative.dev/name": "foo-bar-7f7559865f",
  "knative.dev/operation": "UPDATE",
  "knative.dev/resource": "apps/v1, Resource=replicasets",
  "knative.dev/subresource": "",
  "knative.dev/userinfo": "{system:serviceaccount:kube-system:deployment-controller a1132716-21af-4e79-9d5c-ff0440b2953d [system:serviceaccounts system:serviceaccounts:kube-system system:authenticated] map[]}",
  "error": "no matching signatures:\n: spec.template.spec.containers[0].image\n<my-signed-image-digest>",
  "stacktrace": "knative.dev/pkg/webhook/resourcesemantics/validation.validate\n\tknative.dev/pkg@v0.0.0-20220325200448-1f7514acd0c2/webhook/resourcesemantics/validation/validation_admit.go:170\nknative.dev/pkg/webhook/resourcesemantics/validation.(*reconciler).Admit\n\tknative.dev/p
kg@v0.0.0-20220325200448-1f7514acd0c2/webhook/resourcesemantics/validation/validation_admit.go:80\nknative.dev/pkg/webhook.admissionHandler.func1\n\tknative.dev/pkg@v0.0.0-20220325200448-1f7514acd0c2/webhook/admission.go:117\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2047\
nnet/http.(*ServeMux).ServeHTTP\n\tnet/http/server.go:2425\nknative.dev/pkg/webhook.(*Webhook).ServeHTTP\n\tknative.dev/pkg@v0.0.0-20220325200448-1f7514acd0c2/webhook/webhook.go:262\nknative.dev/pkg/network/handlers.(*Drainer).ServeHTTP\n\tknative.dev/pkg@v0.0.0-20220325200448-1f7514
acd0c2/network/handlers/drain.go:110\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2879\nnet/http.(*conn).serve\n\tnet/http/server.go:1930"
}

I am not sure how this can be avoided when the policy-controller validates not only Pods but also higher-level resources like ReplicaSets and Deployments. Also, it looks like kube-controller-manager does not use the /scale subresource, which is not subject to admission control by cosign and would allow the old ReplicaSet to be torn down. Here are sample patch requests, first without and then with the /scale subresource:

$ kubectl -n test patch replicaset foo-bar-7f7559865f  -p '{"spec":{"replicas": 1}}'
Error from server (BadRequest): admission webhook "policy.sigstore.dev" denied the request: validation failed: no matching signatures:
: spec.template.spec.containers[0].image
index.docker.io/library/alpine@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454

$ kubectl -n test patch replicaset foo-bar-7f7559865f  -p '{"spec":{"replicas": 1}}' --subresource=scale
scale.autoscaling/foo-bar-7f7559865f patched
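
As a possible manual workaround (not a fix), kubectl scale should also go through the /scale subresource and therefore not be blocked by the webhook; assuming the same old ReplicaSet name as above, something like this could be used to drain it and unblock the rollout:

$ kubectl -n test scale replicaset foo-bar-7f7559865f --replicas=0
replicaset.apps/foo-bar-7f7559865f scaled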

Version

{
  "gitVersion": "v1.9.0",
  "gitCommit": "a4cb262dc3d45a283a6a7513bb767a38a2d3f448",
  "gitTreeState": "clean",
  "buildDate": "2022-06-03T13:47:07Z",
  "goVersion": "go1.17.11",
  "compiler": "gc",
  "platform": "linux/amd64"
}
