Start moving vmi verifications to the CRD #11156

Merged
merged 1 commit into kubevirt:main from the poc-start-crd-validation branch
Feb 8, 2024

nunnatsa
Contributor

@nunnatsa nunnatsa commented Feb 6, 2024

What this PR does

The CRD language supports many validations. Performing these validations in the CRD instead of in the webhook code has a few main advantages:

  1. The K8s API server validates the CR instead of the webhook, reducing the load on the webhook.
  2. Transparency: the validations are visible to the user as part of the CRD.
  3. Less code duplication. For example, the same validation may be needed in both the create and the update webhook; the validations in this PR were implemented only in the create validation, not in the update validation. Adding the validation to the CRD guarantees that both cases are covered.

This PR adds maxItems to several arrays in the VMI and the VMI preset:

  • Volumes array
  • Networks array
  • AccessCredentials array
  • Disks array
  • Interfaces array

The PR then removes the validation from the VMI create validator, along with the relevant unit test.
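A minimal sketch of the idea, assuming kubebuilder-style `+kubebuilder:validation:MaxItems` markers on the Go API types (the type changes in the PR are not reproduced here, so the marker placement and the 256 limit, taken from the release note's error messages, are assumptions). `VMISpecSketch`, `Network`, `Volume` and `checkMaxItems` are hypothetical stand-ins; `checkMaxItems` only mimics the kind of check the API server applies for a `maxItems` schema constraint, and its error wording is illustrative.

```go
package main

import "fmt"

// Hypothetical, trimmed-down stand-ins for the VMI spec types.
type Network struct {
	Name string `json:"name"`
}

type Volume struct {
	Name string `json:"name"`
}

// VMISpecSketch shows where a maxItems marker could sit on the Go types
// (assumed kubebuilder marker syntax). A CRD generator that understands these
// markers would emit "maxItems: 256" into the CRD's OpenAPI v3 schema.
type VMISpecSketch struct {
	// +kubebuilder:validation:MaxItems=256
	Networks []Network `json:"networks,omitempty"`
	// +kubebuilder:validation:MaxItems=256
	Volumes []Volume `json:"volumes,omitempty"`
}

// maxItems is assumed to match the old webhook's 256-element limit.
const maxItems = 256

// checkMaxItems mimics the API server's schema check: a list longer than
// maxItems is rejected before any validating webhook is called.
// The error text is illustrative, not the API server's exact wording.
func checkMaxItems(path string, n int) error {
	if n > maxItems {
		return fmt.Errorf("%s: Too many: %d: must have at most %d items", path, n, maxItems)
	}
	return nil
}

func main() {
	spec := VMISpecSketch{Volumes: make([]Volume, 257)}
	fmt.Println(checkMaxItems("spec.networks", len(spec.Networks)))
	fmt.Println(checkMaxItems("spec.volumes", len(spec.Volumes)))
}
```

With the limit in the schema, the same rule covers every write path without any webhook code.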

Checklist

This checklist is not enforced, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note

Move some validations from the VMI create validation webhook to the CRD

The webhook will no longer return the following errors:
* "spec.accessCredentials list exceeds the 256 element limit in length"
* "spec.domain.devices.disks list exceeds the 256 element limit in length"
* "spec.domain.devices.interfaces list exceeds the 256 element limit in length"
* "spec.networks list exceeds the 256 element limit in length"
* "spec.volumes list exceeds the 256 element limit in length"
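For comparison, a stdlib-only sketch of the kind of list-length check the create webhook used to perform, reconstructed only from the error messages listed above. `arrayLenMax` is the constant name mentioned in the review; `specLists` and `validateListLengths` are hypothetical, and the real KubeVirt code (which works with `k8sfield.Path` values) is not reproduced here.

```go
package main

import "fmt"

// arrayLenMax mirrors the webhook constant discussed in the review; the value
// comes from the release note. Everything else here is hypothetical.
const arrayLenMax = 256

// specLists holds the lengths of the lists the webhook used to check.
type specLists struct {
	AccessCredentials, Disks, Interfaces, Networks, Volumes int
}

// validateListLengths returns one message per list that exceeds the limit,
// using the wording from the release note.
func validateListLengths(s specLists) []string {
	checks := []struct {
		path string
		n    int
	}{
		{"spec.accessCredentials", s.AccessCredentials},
		{"spec.domain.devices.disks", s.Disks},
		{"spec.domain.devices.interfaces", s.Interfaces},
		{"spec.networks", s.Networks},
		{"spec.volumes", s.Volumes},
	}
	var msgs []string
	for _, c := range checks {
		if c.n > arrayLenMax {
			msgs = append(msgs, fmt.Sprintf("%s list exceeds the %d element limit in length", c.path, arrayLenMax))
		}
	}
	return msgs
}

func main() {
	for _, m := range validateListLengths(specLists{Disks: 300, Volumes: 10}) {
		fmt.Println(m)
	}
}
```

Because a check like this lived only in the create validator, an update could bypass it, which is the duplication problem described in point 3; a maxItems constraint in the CRD schema is applied to creates and updates alike.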

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/L labels Feb 6, 2024
@@ -146,16 +146,6 @@ func ValidateVirtualMachineInstanceSpec(field *k8sfield.Path, spec *v1.VirtualMa
volumeNameMap := make(map[string]*v1.Volume)
networkNameMap := make(map[string]*v1.Network)

maxNumberOfDisksExceeded := len(spec.Domain.Devices.Disks) > arrayLenMax
Contributor

Seems like you can delete `const arrayLenMax`, right?

Contributor Author

Oops. Thanks!

Contributor Author

done

Contributor

@ShellyKa13 ShellyKa13 left a comment

Small comment. I'll wait for the same handling of all the maxLen variables here :)
Nice!

Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
@nunnatsa
Contributor Author

nunnatsa commented Feb 6, 2024

/retest

Contributor

@ShellyKa13 ShellyKa13 left a comment

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Feb 6, 2024
@nunnatsa
Contributor Author

nunnatsa commented Feb 7, 2024

/cc alaypatel07

Member

@EdDev EdDev left a comment

Thank you!

@EdDev
Member

EdDev commented Feb 7, 2024

Any reason you left the "POC" prefix on the PR and commit?

@nunnatsa
Contributor Author

nunnatsa commented Feb 7, 2024

> Any reason you left the "POC" prefix on the PR and commit?

Because it's very preliminary. The change is tiny relative to all the places where we could use CRD validation instead of webhook logic.

I want to see which parts of the source are affected and so on.

@nunnatsa
Contributor Author

nunnatsa commented Feb 7, 2024

/test all

@nunnatsa nunnatsa changed the title POC: move vmi verifications to the CRD Start moving vmi verifications to the CRD Feb 7, 2024
@fabiand
Member

fabiand commented Feb 7, 2024

I like this direction.

In the past there were limitations that prevented CRD-based validation from working for KubeVirt.

Do we have an understanding of the historic reasons?
Have you already seen limitations that impact us today?
Do you plan to move all logic over to the CRD side?

@nunnatsa
Contributor Author

nunnatsa commented Feb 7, 2024

> Do we have an understanding of the historic reasons?

Not really. What could go wrong? I'm asking because we're using standard CRD fields for the validation.

> Do you plan to move all logic over to the CRD side?

Most of the logic is more complex than list-length checks. I guess some of it can be implemented with CEL, and some of it will have to stay as code in the webhook. However, CEL validation is only GA in K8s 1.29, so we can't use it while we still support 1.27 and 1.28.
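To illustrate the difference between a list-length check and "more complex" logic, here is a sketch of a hypothetical cross-field rule (every interface name must match a network name), written once as a CEL expression and once as webhook-side Go. The CEL string is untested and assumes the rule is attached to the VMI spec schema via `x-kubernetes-validations`; the types and `interfacesHaveNetworks` are hypothetical and not KubeVirt's actual code.

```go
package main

import "fmt"

// celRule is an illustrative, untested x-kubernetes-validations rule for the
// VMI spec schema. It requires CRD validation rules (CEL), which the thread
// notes are GA only from Kubernetes 1.29.
const celRule = `!has(self.domain.devices.interfaces) || ` +
	`self.domain.devices.interfaces.all(i, has(self.networks) && self.networks.exists(n, n.name == i.name))`

// Hypothetical, trimmed-down stand-ins for the VMI types.
type Interface struct{ Name string }
type Network struct{ Name string }

// interfacesHaveNetworks is the webhook-side Go equivalent of celRule: every
// interface must reference a network with the same name.
func interfacesHaveNetworks(ifaces []Interface, nets []Network) []string {
	known := make(map[string]bool, len(nets))
	for _, n := range nets {
		known[n.Name] = true
	}
	var errs []string
	for i, iface := range ifaces {
		if !known[iface.Name] {
			errs = append(errs, fmt.Sprintf("spec.domain.devices.interfaces[%d].name: %q has no matching network", i, iface.Name))
		}
	}
	return errs
}

func main() {
	fmt.Println("CEL rule:", celRule)
	ifaces := []Interface{{Name: "default"}, {Name: "secondary"}}
	nets := []Network{{Name: "default"}}
	for _, e := range interfacesHaveNetworks(ifaces, nets) {
		fmt.Println(e)
	}
}
```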

@nunnatsa
Contributor Author

nunnatsa commented Feb 7, 2024

/retest-required

@acardace
Member

acardace commented Feb 7, 2024

/approve

@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: acardace

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 7, 2024
@kubevirt-commenter-bot

Required labels detected, running phase 2 presubmits:
/test pull-kubevirt-e2e-windows2016
/test pull-kubevirt-e2e-kind-1.27-vgpu
/test pull-kubevirt-e2e-kind-sriov
/test pull-kubevirt-e2e-k8s-1.29-ipv6-sig-network
/test pull-kubevirt-e2e-k8s-1.27-sig-network
/test pull-kubevirt-e2e-k8s-1.27-sig-storage
/test pull-kubevirt-e2e-k8s-1.27-sig-compute
/test pull-kubevirt-e2e-k8s-1.27-sig-operator
/test pull-kubevirt-e2e-k8s-1.28-sig-network
/test pull-kubevirt-e2e-k8s-1.28-sig-storage
/test pull-kubevirt-e2e-k8s-1.28-sig-compute
/test pull-kubevirt-e2e-k8s-1.28-sig-operator

@kubevirt-bot kubevirt-bot merged commit 953e69a into kubevirt:main Feb 8, 2024
37 checks passed
@nunnatsa nunnatsa deleted the poc-start-crd-validation branch February 8, 2024 04:10
@xpivarc
Member

xpivarc commented Feb 8, 2024

> Do we have an understanding of the historic reasons?

> Not really. What could go wrong? I'm asking because we're using standard CRD fields for the validation.

> Do you plan to move all logic over to the CRD side?

> Most of the logic is more complex than list-length checks. I guess some of it can be implemented with CEL, and some of it will have to stay as code in the webhook. However, CEL validation is only GA in K8s 1.29, so we can't use it while we still support 1.27 and 1.28.

So how do we justify this split, where we do one thing in multiple places? Is server-side validation already enabled by default in Kubernetes? I remember it was only client-side a while ago. Do we know whether this has a performance penalty or any other constraints?

@fabiand Have your questions been answered?

@fabiand
Member

fabiand commented Feb 8, 2024

@xpivarc, somewhat :)

> So how do we justify this split, where we do one thing in multiple places? Is server-side validation already enabled by default in Kubernetes?

These are good questions.

@nunnatsa, can you help answer those as well?

@enp0s3
Contributor

enp0s3 commented Feb 8, 2024

@xpivarc, I have a counter-question: why would we undermine CRD validation by disabling server-side validation in k8s? And why would CRD validation be allowed at all if it might introduce severe performance degradation?

@xpivarc
Member

xpivarc commented Feb 8, 2024

> @xpivarc, I have a counter-question: why would we undermine CRD validation by disabling server-side validation in k8s?

A while back, when I was working with this, server-side validation did not exist or was disabled. That is why I am asking whether all supported Kubernetes versions have server-side validation GA and enabled by default. This is what I am missing in the PR description and the comments.

> And why would CRD validation be allowed at all if it might introduce severe performance degradation?

Any work or feature has an associated cost; I am simply asking what the cost is. Note that I am also asking about general disadvantages.
Note 2: The author argues that the load is decreased
("The K8s API server validates the CR instead of the webhook, reducing the load on the webhook."); I am simply asking how we measured this...

@nunnatsa
Contributor Author

nunnatsa commented Feb 8, 2024

It's very hard to tell from the K8s documentation, but I think these validations are already supported. The "next level", CEL validation, is only GA in 1.29, so we can't use it yet.

As for the performance question, this is very simple: less logic == better performance. This PR is only the start; we can remove a meaningful amount of validation logic by using CRD validation (https://book.kubebuilder.io/reference/markers/crd-validation)

@nunnatsa
Contributor Author

nunnatsa commented Feb 8, 2024

The doc here says:

> Custom resources are validated via OpenAPI v3 schemas, by x-kubernetes-validations when the Validation Rules feature is enabled, and you can add additional validation using admission webhooks.

As far as I understand this (somewhat cryptic) text, OpenAPI v3 schema validation can always be used; CEL can be used only if the feature gate is set (1.27-1.28) or from 1.29 on; and on top of both you can also use a webhook.
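A toy model of the layering described in that quote, assuming the usual API-server request flow in which mutating admission runs first, then OpenAPI v3 schema validation (together with any CEL rules), then validating admission webhooks. It is only meant to show why an object rejected by the schema never reaches the validating webhook; `object`, `stage` and `run` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// object is a hypothetical stand-in for an incoming VMI; only the volume count matters here.
type object struct{ Volumes int }

// stage is one step of the (simplified) admission flow.
type stage struct {
	name  string
	check func(object) error
}

// run pushes obj through the stages in order and stops at the first failure.
// It returns the failing stage's name ("" if accepted) and how many times the
// validating webhook was actually called.
func run(obj object) (string, int) {
	webhookCalls := 0
	stages := []stage{
		// Mutating admission (omitted here) would run before schema validation.
		{"schema", func(o object) error {
			if o.Volumes > 256 {
				return errors.New("spec.volumes: must have at most 256 items")
			}
			return nil
		}},
		{"validating-webhook", func(o object) error {
			webhookCalls++
			return nil
		}},
	}
	for _, s := range stages {
		if err := s.check(obj); err != nil {
			fmt.Println("rejected:", err)
			return s.name, webhookCalls
		}
	}
	return "", webhookCalls
}

func main() {
	for _, n := range []int{10, 300} {
		failedAt, calls := run(object{Volumes: n})
		fmt.Printf("volumes=%d failedAt=%q webhookCalls=%d\n", n, failedAt, calls)
	}
}
```

The model says nothing about the relative cost of the two stages, which is the open question raised later in the thread.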

@EdDev
Member

EdDev commented Feb 8, 2024

I was under the impression this had been verified.
@nunnatsa, can you please verify this on a setup that has this CRD deployed, using an old client?

If the validation is not occurring, we need to revert this change.

UPDATE: Or add to the webhook a general check that validates the schema based on the CRD. I think that is possible.

@nunnatsa
Contributor Author

nunnatsa commented Feb 8, 2024

Here is the same section from the 1.26 documentation (before the introduction of CEL): https://v1-26.docs.kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#specifying-a-structural-schema

There is no condition here, such as a feature gate or other configuration. It can simply be used.

@xpivarc
Member

xpivarc commented Feb 8, 2024

See ServerSideFieldValidation.

> As for the performance question, this is very simple: less logic == better performance.

Less is more, but in this case we are moving one thing from one place to another. That does not necessarily make it more performant: it can be less, equally, or more performant. Did we do any experiments?

> This PR is only the start; we can remove a meaningful amount of validation logic by using CRD validation (https://book.kubebuilder.io/reference/markers/crd-validation)

An issue or design doc outlining the path forward, with pros and cons, would be much appreciated.
Please keep in mind that we always need to verify that all supported Kubernetes versions have the features we use, and that they are GA.

@EdDev
Member

EdDev commented Feb 8, 2024

I think this is the relevant feature in our case: https://kubernetes.io/blog/2023/04/24/openapi-v3-field-validation-ga/

But still, it needs verification to make sure we are good.
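The verbose kubectl log later in this thread shows the client sending `fieldValidation=Strict` on the create request. A stdlib-only sketch of building that URL; `createURL` is a hypothetical helper, the host and port are copied from the log as placeholders, and the accepted values `Strict`, `Warn` and `Ignore` are those documented for server-side field validation. As we understand it, this directive governs unknown and duplicate fields, not schema constraints such as `maxItems` or `pattern`.

```go
package main

import (
	"fmt"
	"net/url"
)

// createURL builds the VirtualMachine collection URL used for a create (POST),
// with the fieldValidation directive as a query parameter.
func createURL(host, namespace, directive string) (string, error) {
	switch directive {
	case "Strict", "Warn", "Ignore":
	default:
		return "", fmt.Errorf("unknown fieldValidation directive %q", directive)
	}
	u := url.URL{
		Scheme: "https",
		Host:   host,
		Path:   fmt.Sprintf("/apis/kubevirt.io/v1/namespaces/%s/virtualmachines", namespace),
	}
	q := url.Values{}
	q.Set("fieldManager", "kubectl-client-side-apply")
	q.Set("fieldValidation", directive)
	u.RawQuery = q.Encode() // Encode sorts keys: fieldManager, then fieldValidation
	return u.String(), nil
}

func main() {
	u, err := createURL("127.0.0.1:39803", "default", "Strict")
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```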

@nunnatsa
Contributor Author

nunnatsa commented Feb 8, 2024

Running on a kubevirtci cluster with KUBEVIRT_PROVIDER=1.28

I added a fake pattern validation to the network.name field:

name:
  description: 'Network name. Must be a DNS_LABEL and unique ...'
  pattern: ^[a-fA-F]+$
  type: string
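A quick local check, using Go's `regexp` package, that this deliberately wrong pattern should reject the VM's `default` network name (it only accepts hex letters). `fakeNamePattern` is a hypothetical name, and this is a local sanity check only, not the code the API server runs.

```go
package main

import (
	"fmt"
	"regexp"
)

// fakeNamePattern is the deliberately wrong pattern added to network.name:
// only the letters a-f / A-F are accepted.
var fakeNamePattern = regexp.MustCompile(`^[a-fA-F]+$`)

func main() {
	for _, name := range []string{"default", "abc", "Face"} {
		fmt.Printf("%q matches: %v\n", name, fakeNamePattern.MatchString(name))
	}
}
```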

Then I created a VM with kubectl v1.24.0, using the -v8 flag for verbosity.

This is the output; the error clearly comes from the API server:

$ ./oldk -v8 apply -f ../hyperconverged-cluster-operator/hack/vm.yaml                  
I0208 16:10:31.940341 3886602 loader.go:372] Config loaded from file:  /home/nunnatsa/GIT/kubevirt/_ci-configs/k8s-1.28/.kubeconfig
I0208 16:10:31.940718 3886602 round_trippers.go:463] GET https://127.0.0.1:39803/openapi/v2?timeout=32s
I0208 16:10:31.940724 3886602 round_trippers.go:469] Request Headers:
I0208 16:10:31.940729 3886602 round_trippers.go:473]     Accept: application/com.github.proto-openapi.spec.v2@v1.0+protobuf
I0208 16:10:31.940733 3886602 round_trippers.go:473]     User-Agent: oldk/v1.24.0 (linux/amd64) kubernetes/4ce5a89
I0208 16:10:31.954772 3886602 round_trippers.go:574] Response Status: 200 OK in 14 milliseconds
I0208 16:10:31.954793 3886602 round_trippers.go:577] Response Headers:
I0208 16:10:31.954798 3886602 round_trippers.go:580]     X-From-Cache: 1
I0208 16:10:31.954802 3886602 round_trippers.go:580]     X-Varied-Accept: application/com.github.proto-openapi.spec.v2@v1.0+protobuf
I0208 16:10:31.954807 3886602 round_trippers.go:580]     Cache-Control: no-cache, private
I0208 16:10:31.954815 3886602 round_trippers.go:580]     Content-Type: application/com.github.proto-openapi.spec.v2.v1.0+protobuf
I0208 16:10:31.954819 3886602 round_trippers.go:580]     Etag: "9A3E304599795F2104AA82DB03812C51CCFD4298E1FAF900855C9D00E11123015DC1329E84EA3358665B3C10919EC085CDDE9269E23237F85C79C5380B2E8954"
I0208 16:10:31.954823 3886602 round_trippers.go:580]     Date: Thu, 08 Feb 2024 14:10:31 GMT
I0208 16:10:31.954827 3886602 round_trippers.go:580]     Accept-Ranges: bytes
I0208 16:10:31.954830 3886602 round_trippers.go:580]     Vary: Accept-Encoding
I0208 16:10:31.954833 3886602 round_trippers.go:580]     Vary: Accept
I0208 16:10:31.954837 3886602 round_trippers.go:580]     X-Kubernetes-Pf-Prioritylevel-Uid: 0d0eb00d-718d-4aeb-9f12-e4029abf2ec3
I0208 16:10:31.954841 3886602 round_trippers.go:580]     Audit-Id: 5de40ffd-09cc-43b8-aa21-0bca59270856
I0208 16:10:31.954844 3886602 round_trippers.go:580]     Last-Modified: Thu, 08 Feb 2024 12:28:18 GMT
I0208 16:10:31.954848 3886602 round_trippers.go:580]     X-Kubernetes-Pf-Flowschema-Uid: a3d3d20b-0767-4a56-88c4-f64a1b7a6c3e
I0208 16:10:32.004431 3886602 request.go:1071] Response Body:
00000000  0a 03 32 2e 30 12 15 0a  0a 4b 75 62 65 72 6e 65  |..2.0....Kuberne|
00000010  74 65 73 12 07 76 31 2e  32 38 2e 36 42 d2 b7 5c  |tes..v1.28.6B..\|
00000020  12 8c 02 0a 22 2f 2e 77  65 6c 6c 2d 6b 6e 6f 77  |...."/.well-know|
00000030  6e 2f 6f 70 65 6e 69 64  2d 63 6f 6e 66 69 67 75  |n/openid-configu|
00000040  72 61 74 69 6f 6e 2f 12  e5 01 12 e2 01 0a 09 57  |ration/........W|
00000050  65 6c 6c 4b 6e 6f 77 6e  1a 57 67 65 74 20 73 65  |ellKnown.Wget se|
00000060  72 76 69 63 65 20 61 63  63 6f 75 6e 74 20 69 73  |rvice account is|
00000070  73 75 65 72 20 4f 70 65  6e 49 44 20 63 6f 6e 66  |suer OpenID conf|
00000080  69 67 75 72 61 74 69 6f  6e 2c 20 61 6c 73 6f 20  |iguration, also |
00000090  6b 6e 6f 77 6e 20 61 73  20 74 68 65 20 27 4f 49  |known as the 'OI|
000000a0  44 43 20 64 69 73 63 6f  76 65 72 79 20 64 6f 63  |DC discovery doc|
000000b0  27 2a 2a 67 65 74 53 65  72 76 69 63 65 41 63 63  |'**getServiceAcc|
000000c0  6f 75 6e 74 49 73 73 75  65 72 4f 70 65 6e 49 44  |ountIssuerOpenI [truncated 18625358 chars]
I0208 16:10:32.037641 3886602 round_trippers.go:463] GET https://127.0.0.1:39803/apis/kubevirt.io/v1/namespaces/default/virtualmachines/testvm
I0208 16:10:32.037653 3886602 round_trippers.go:469] Request Headers:
I0208 16:10:32.037658 3886602 round_trippers.go:473]     Accept: application/json
I0208 16:10:32.037662 3886602 round_trippers.go:473]     User-Agent: oldk/v1.24.0 (linux/amd64) kubernetes/4ce5a89
I0208 16:10:32.039381 3886602 round_trippers.go:574] Response Status: 404 Not Found in 1 milliseconds
I0208 16:10:32.039391 3886602 round_trippers.go:577] Response Headers:
I0208 16:10:32.039396 3886602 round_trippers.go:580]     Cache-Control: no-cache, private
I0208 16:10:32.039400 3886602 round_trippers.go:580]     Content-Type: application/json
I0208 16:10:32.039404 3886602 round_trippers.go:580]     X-Kubernetes-Pf-Flowschema-Uid: a3d3d20b-0767-4a56-88c4-f64a1b7a6c3e
I0208 16:10:32.039407 3886602 round_trippers.go:580]     X-Kubernetes-Pf-Prioritylevel-Uid: 0d0eb00d-718d-4aeb-9f12-e4029abf2ec3
I0208 16:10:32.039411 3886602 round_trippers.go:580]     Content-Length: 236
I0208 16:10:32.039415 3886602 round_trippers.go:580]     Date: Thu, 08 Feb 2024 14:10:32 GMT
I0208 16:10:32.039419 3886602 round_trippers.go:580]     Audit-Id: f54369b1-47ec-4fa6-9928-340c6632d9fe
I0208 16:10:32.039431 3886602 request.go:1073] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"virtualmachines.kubevirt.io \"testvm\" not found","reason":"NotFound","details":{"name":"testvm","group":"kubevirt.io","kind":"virtualmachines"},"code":404}
I0208 16:10:32.039591 3886602 request.go:1073] Request Body: {"apiVersion":"kubevirt.io/v1","kind":"VirtualMachine","metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"kubevirt.io/v1\",\"kind\":\"VirtualMachine\",\"metadata\":{\"annotations\":{},\"name\":\"testvm\",\"namespace\":\"default\"},\"spec\":{\"running\":false,\"template\":{\"metadata\":{\"labels\":{\"kubevirt.io/domain\":\"testvm\",\"kubevirt.io/size\":\"small\"}},\"spec\":{\"domain\":{\"devices\":{\"disks\":[{\"disk\":{\"bus\":\"virtio\"},\"name\":\"containerdisk\"},{\"disk\":{\"bus\":\"virtio\"},\"name\":\"cloudinitdisk\"}],\"interfaces\":[{\"masquerade\":{},\"name\":\"default\"}]},\"resources\":{\"requests\":{\"memory\":\"64M\"}}},\"networks\":[{\"name\":\"default\",\"pod\":{}}],\"volumes\":[{\"containerDisk\":{\"image\":\"quay.io/kubevirt/cirros-container-disk-demo\"},\"name\":\"containerdisk\"},{\"cloudInitNoCloud\":{\"secretRef\":{\"name\":\"testvm-secret\"}},\"name\":\"cloudinitdisk\"}]}}}}\n"},"name":"testvm","namespace":"default"},"spec":{"running":false," [truncated 560 chars]
I0208 16:10:32.039616 3886602 round_trippers.go:463] POST https://127.0.0.1:39803/apis/kubevirt.io/v1/namespaces/default/virtualmachines?fieldManager=kubectl-client-side-apply&fieldValidation=Strict
I0208 16:10:32.039619 3886602 round_trippers.go:469] Request Headers:
I0208 16:10:32.039624 3886602 round_trippers.go:473]     Accept: application/json
I0208 16:10:32.039628 3886602 round_trippers.go:473]     Content-Type: application/json
I0208 16:10:32.039632 3886602 round_trippers.go:473]     User-Agent: oldk/v1.24.0 (linux/amd64) kubernetes/4ce5a89
I0208 16:10:32.043264 3886602 round_trippers.go:574] Response Status: 422 Unprocessable Entity in 3 milliseconds
I0208 16:10:32.043274 3886602 round_trippers.go:577] Response Headers:
I0208 16:10:32.043278 3886602 round_trippers.go:580]     Date: Thu, 08 Feb 2024 14:10:32 GMT
I0208 16:10:32.043282 3886602 round_trippers.go:580]     Audit-Id: 34191634-561f-4001-aac3-3a24303dfa4f
I0208 16:10:32.043286 3886602 round_trippers.go:580]     Cache-Control: no-cache, private
I0208 16:10:32.043290 3886602 round_trippers.go:580]     Content-Type: application/json
I0208 16:10:32.043293 3886602 round_trippers.go:580]     X-Kubernetes-Pf-Flowschema-Uid: a3d3d20b-0767-4a56-88c4-f64a1b7a6c3e
I0208 16:10:32.043297 3886602 round_trippers.go:580]     X-Kubernetes-Pf-Prioritylevel-Uid: 0d0eb00d-718d-4aeb-9f12-e4029abf2ec3
I0208 16:10:32.043301 3886602 round_trippers.go:580]     Content-Length: 570
I0208 16:10:32.043319 3886602 request.go:1073] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"VirtualMachine.kubevirt.io \"testvm\" is invalid: spec.template.spec.networks[0].name: Invalid value: \"default\": spec.template.spec.networks[0].name in body should match '^[a-fA-F]+$'","reason":"Invalid","details":{"name":"testvm","group":"kubevirt.io","kind":"VirtualMachine","causes":[{"reason":"FieldValueInvalid","message":"Invalid value: \"default\": spec.template.spec.networks[0].name in body should match '^[a-fA-F]+$'","field":"spec.template.spec.networks[0].name"}]},"code":422}
The VirtualMachine "testvm" is invalid: spec.template.spec.networks[0].name: Invalid value: "default": spec.template.spec.networks[0].name in body should match '^[a-fA-F]+$'

By the way, this is what I get from a new kubectl as well.
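The 422 body in the log is a standard Kubernetes `Status` object. A stdlib-only sketch of how a client could extract the failing field from it; `status`, `statusCause` and `parseStatus` are a hand-written subset (the real type is `metav1.Status` in k8s.io/apimachinery), and `body` is the response from the log trimmed to the decoded fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusCause and status are a hand-written subset of the Kubernetes Status object.
type statusCause struct {
	Reason  string `json:"reason"`
	Message string `json:"message"`
	Field   string `json:"field"`
}

type status struct {
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Code    int    `json:"code"`
	Details struct {
		Causes []statusCause `json:"causes"`
	} `json:"details"`
}

// body is the 422 response body from the log, trimmed to the decoded fields.
const body = `{"kind":"Status","status":"Failure","reason":"Invalid","code":422,` +
	`"details":{"causes":[{"reason":"FieldValueInvalid","message":"Invalid value: \"default\": spec.template.spec.networks[0].name in body should match '^[a-fA-F]+$'","field":"spec.template.spec.networks[0].name"}]}}`

// parseStatus decodes a Status response body.
func parseStatus(b []byte) (status, error) {
	var s status
	err := json.Unmarshal(b, &s)
	return s, err
}

func main() {
	s, err := parseStatus([]byte(body))
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Code, s.Reason)
	for _, c := range s.Details.Causes {
		fmt.Printf("%s: %s\n", c.Field, c.Message)
	}
}
```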

@nunnatsa
Contributor Author

nunnatsa commented Feb 8, 2024

As for justification, my view is a bit different: I think we need to justify any custom validation in the webhook if it can be done in what is now the standard Kubernetes way. And that is something I can't justify.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L
9 participants