Skip to content

Conversation

@lyarwood
Copy link
Member

@lyarwood lyarwood commented Apr 11, 2024

What this PR does / why we need it:

These new env variables being useful when testing dedicatedCpuPlacement and guestMappingPassthrough without requiring a physical host with multiple NUMA nodes etc.

$ sudo dnf install numactl -y && numactl --hardware
[..]
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 0 size: 63432 MB
node 0 free: 619 MB
node distances:
node   0 
  0:  10 
$ env | grep KUBEVIRT
KUBEVIRT_PROVIDER=k8s-1.28
KUBEVIRT_MEMORY_SIZE=16384M
KUBEVIRT_HUGEPAGES_2M=1024
KUBEVIRT_DEPLOY_CDI=true
KUBEVIRT_CPU_MANAGER_POLICY=static
KUBEVIRT_NUM_NUMA_NODES=2
KUBEVIRT_NUM_VCPU=8
KUBEVIRTCI_CONTAINER_SUFFIX=latest
KUBEVIRTCI_GOCLI_CONTAINER=quay.io/kubevirtci/gocli:latest
$ make cluster-up
[..]
$ ./cluster-up/ssh.sh node01
[..]
[vagrant@node01 ~]$ sudo dnf install numactl && numactl --hardware
[..]
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 7997 MB
node 0 free: 5610 MB
node 1 cpus: 4 5 6 7
node 1 size: 8018 MB
node 1 free: 5932 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
$ cd ../kubevirt
$ rsync -av ../kubevirtci/_ci-configs/ ./_ci-configs/
$ rsync -av ../kubevirtci/cluster-up/ ./cluster-up
$ make cluster-sync
[..]
$ ./cluster-up/kubectl.sh patch kv/kubevirt -n kubevirt --type merge -p '{"spec":{"configuration":{"developerConfiguration":{"featureGates": ["CPUManager","NUMA"]}}}}'
$ ./cluster-up/kubectl.sh apply -f -<<EOF
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: example
spec:
  domain:
    cpu:
      cores: 5
      dedicatedCpuPlacement: true
      numa:
        guestMappingPassthrough: {}
    devices:
      disks:
        - disk:
            bus: virtio
          name: containerdisk
        - disk:
            bus: virtio
          name: cloudinitdisk
    resources:
      requests:
        memory: 1Gi
    memory:
      hugepages:
        pageSize: 2Mi
  volumes:
    - containerDisk:
        image: quay.io/containerdisks/fedora:39
      name: containerdisk
    - cloudInitNoCloud:
        userData: |
          #!/bin/sh
          mkdir -p  /home/fedora/.ssh
          curl https://github.com/lyarwood.keys > /home/fedora/.ssh/authorized_keys
          chown -R fedora: /home/fedora/.ssh
      name: cloudinitdisk
EOF
[..]
$ ./cluster-up/virtctl.sh ssh -lfedora example
fedora@example $ sudo dnf install numactl -y && numactl --hardware
[..]
available: 2 nodes (0-1)
node 0 cpus: 0
node 0 size: 502 MB
node 0 free: 293 MB
node 1 cpus: 1 2 3 4
node 1 size: 444 MB
node 1 free: 44 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
$ ./cluster-up/kubectl.sh exec pods/virt-launcher-example-glcfz -- virsh vcpuinfo 1
selecting podman as container runtime
VCPU:           0
CPU:            1
State:          running
CPU time:       14.3s
CPU Affinity:   -y------

VCPU:           1
CPU:            4
State:          running
CPU time:       5.6s
CPU Affinity:   ----y---

VCPU:           2
CPU:            5
State:          running
CPU time:       2.8s
CPU Affinity:   -----y--

VCPU:           3
CPU:            6
State:          running
CPU time:       3.8s
CPU Affinity:   ------y-

VCPU:           4
CPU:            7
State:          running
CPU time:       5.2s
CPU Affinity:   -------y
$ ./cluster-up/kubectl.sh get pods/virt-launcher-example-glcfz -o json | jq '.spec.containers
[] | select(.name == "compute") | .resources'
selecting podman as container runtime
{
  "limits": {
    "cpu": "5",
    "devices.kubevirt.io/kvm": "1",
    "devices.kubevirt.io/tun": "1",
    "devices.kubevirt.io/vhost-net": "1",
    "hugepages-2Mi": "1Gi",
    "memory": "394264576"
  },
  "requests": {
    "cpu": "5",
    "devices.kubevirt.io/kvm": "1",
    "devices.kubevirt.io/tun": "1",
    "devices.kubevirt.io/vhost-net": "1",
    "ephemeral-storage": "50M",
    "hugepages-2Mi": "1Gi",
    "memory": "394264576"
  }
}

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note:

NONE

@kubevirt-bot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Apr 11, 2024
@lyarwood lyarwood force-pushed the numa branch 2 times, most recently from ead80fc to 9ab8fc3 Compare April 16, 2024 10:54
@lyarwood lyarwood changed the title Introduce KUBEVIRT_NUM_NUMA_NODES and KUBEVIRT_NUM_VCPU Introduce KUBEVIRT_NUM_NUMA_NODES, KUBEVIRT_NUM_VCPU and KUBEVIRT_CPU_MANAGER_POLICY Apr 16, 2024
@lyarwood lyarwood marked this pull request as ready for review April 16, 2024 11:13
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 16, 2024
@kubevirt-bot kubevirt-bot requested review from dhiller and rmohr April 16, 2024 11:13
@lyarwood
Copy link
Member Author

/hold cancel

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 16, 2024
Copy link
Member

@0xFelix 0xFelix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

From what I can tell, looks good to me!

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 16, 2024
@lyarwood
Copy link
Member Author

/retest-required

+ ./cluster-up/ssh.sh node01 -- ip l show eth1
selecting podman as container runtime
Device "eth1" does not exist.

I can't reproduce this locally so trying another run to ensure it isn't transient in CI.

@lyarwood
Copy link
Member Author

/retest-required

+ ./cluster-up/ssh.sh node01 -- ip l show eth1
selecting podman as container runtime
Device "eth1" does not exist.

I can't reproduce this locally so trying another run to ensure it isn't transient in CI.

+ ../gocli/build/cli provision 1.30 --phases k8s
time="2024-04-16T17:06:15Z" level=info msg="Using remote image quay.io/kubevirtci/centos9:2404020400-62457f2"

Ah I see it now, the issue is the tests don't also rebuild centos so the vm.sh script within is old?

I assume I need to break these changes out of 2618f66 into their own PR, land that and have a fresh centos image created?

@lyarwood
Copy link
Member Author

/hold

@kubevirt-bot kubevirt-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 17, 2024
@kubevirt-bot kubevirt-bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 17, 2024
@lyarwood lyarwood marked this pull request as draft April 17, 2024 17:04
@kubevirt-bot kubevirt-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 17, 2024
This change introduces the KUBEVIRT_NUM_NUMA_NODES env variable allowing
for 2 or more NUMA nodes to be defined per host. This requires that both
the amount of memory and vCPUs be divisible by this value in order for
an even distribution of resources across the resulting nodes.

Signed-off-by: Lee Yarwood <lyarwood@redhat.com>
This change introduces the KUBEVIRT_NUM_VCPU env variable allowing users
to define a number of vCPUS exposed as cores for each host.

This was previously hardcoded to 6 so the value is retained as the
default to avoid breaking any existing envs where this value is assumed.

Signed-off-by: Lee Yarwood <lyarwood@redhat.com>
This change introduces the KUBEVIRT_CPU_MANAGER_POLICY env var that
allows a user to enable the CPUManager with a specific policy.

The static policy is the only supported policy at present with the
full-pcpus-only option provided.

Two FIXMEs are left in the code to move to config drop-ins and fix an
issue with HereDocs that could clean up this implementation slightly.

Signed-off-by: Lee Yarwood <lyarwood@redhat.com>
NewEnvClient being deprecated in favor of NewClientWithOpts:

https://pkg.go.dev/github.com/docker/docker/client#NewEnvClient

Signed-off-by: Lee Yarwood <lyarwood@redhat.com>
@lyarwood
Copy link
Member Author

/hold cancel

@lyarwood lyarwood marked this pull request as ready for review April 22, 2024 08:45
@kubevirt-bot kubevirt-bot removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Apr 22, 2024
@kubevirt-bot kubevirt-bot requested a review from vladikr April 22, 2024 08:45
@brianmcarey
Copy link
Member

/cc

Copy link
Member

@0xFelix 0xFelix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 25, 2024
Copy link
Member

@brianmcarey brianmcarey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

thanks @lyarwood !

@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: brianmcarey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 26, 2024
@kubevirt-bot
Copy link
Contributor

kubevirt-bot commented Apr 26, 2024

@lyarwood: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
check-up-kind-1.27-vgpu 4151e66 link false /test check-up-kind-1.27-vgpu

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kubevirt-bot kubevirt-bot merged commit 2981e7e into kubevirt:main Apr 26, 2024
kubevirt-bot added a commit to kubevirt-bot/kubevirt that referenced this pull request Apr 27, 2024
[2981e7e Introduce `KUBEVIRT_NUM_NUMA_NODES`, `KUBEVIRT_NUM_VCPU` and `KUBEVIRT_CPU_MANAGER_POLICY`](kubevirt/kubevirtci#1171)
[02b52b9 Bump golang.org/x/net from 0.17.0 to 0.23.0 in /cluster-provision/gocli](kubevirt/kubevirtci#1180)
[8341f20 Run ./hack/bump-cdi.sh](kubevirt/kubevirtci#1158)
[e1ec7d8 cdi: Deploy latest available CDI manifests](kubevirt/kubevirtci#1165)

```release-note
NONE
```

Signed-off-by: kubevirt-bot <kubevirtbot@redhat.com>
kubevirt-bot added a commit to kubevirt-bot/kubevirt that referenced this pull request Apr 28, 2024
[2981e7e Introduce `KUBEVIRT_NUM_NUMA_NODES`, `KUBEVIRT_NUM_VCPU` and `KUBEVIRT_CPU_MANAGER_POLICY`](kubevirt/kubevirtci#1171)
[02b52b9 Bump golang.org/x/net from 0.17.0 to 0.23.0 in /cluster-provision/gocli](kubevirt/kubevirtci#1180)
[8341f20 Run ./hack/bump-cdi.sh](kubevirt/kubevirtci#1158)
[e1ec7d8 cdi: Deploy latest available CDI manifests](kubevirt/kubevirtci#1165)

```release-note
NONE
```

Signed-off-by: kubevirt-bot <kubevirtbot@redhat.com>
kubevirt-bot added a commit to kubevirt-bot/kubevirt that referenced this pull request Apr 28, 2024
[2981e7e Introduce `KUBEVIRT_NUM_NUMA_NODES`, `KUBEVIRT_NUM_VCPU` and `KUBEVIRT_CPU_MANAGER_POLICY`](kubevirt/kubevirtci#1171)
[02b52b9 Bump golang.org/x/net from 0.17.0 to 0.23.0 in /cluster-provision/gocli](kubevirt/kubevirtci#1180)
[8341f20 Run ./hack/bump-cdi.sh](kubevirt/kubevirtci#1158)
[e1ec7d8 cdi: Deploy latest available CDI manifests](kubevirt/kubevirtci#1165)

```release-note
NONE
```

Signed-off-by: kubevirt-bot <kubevirtbot@redhat.com>
kubevirt-bot added a commit to kubevirt-bot/kubevirt that referenced this pull request Apr 29, 2024
[2981e7e Introduce `KUBEVIRT_NUM_NUMA_NODES`, `KUBEVIRT_NUM_VCPU` and `KUBEVIRT_CPU_MANAGER_POLICY`](kubevirt/kubevirtci#1171)
[02b52b9 Bump golang.org/x/net from 0.17.0 to 0.23.0 in /cluster-provision/gocli](kubevirt/kubevirtci#1180)
[8341f20 Run ./hack/bump-cdi.sh](kubevirt/kubevirtci#1158)
[e1ec7d8 cdi: Deploy latest available CDI manifests](kubevirt/kubevirtci#1165)

```release-note
NONE
```

Signed-off-by: kubevirt-bot <kubevirtbot@redhat.com>
kubevirt-bot added a commit to kubevirt-bot/kubevirt that referenced this pull request Apr 29, 2024
[2981e7e Introduce `KUBEVIRT_NUM_NUMA_NODES`, `KUBEVIRT_NUM_VCPU` and `KUBEVIRT_CPU_MANAGER_POLICY`](kubevirt/kubevirtci#1171)
[02b52b9 Bump golang.org/x/net from 0.17.0 to 0.23.0 in /cluster-provision/gocli](kubevirt/kubevirtci#1180)
[8341f20 Run ./hack/bump-cdi.sh](kubevirt/kubevirtci#1158)
[e1ec7d8 cdi: Deploy latest available CDI manifests](kubevirt/kubevirtci#1165)

```release-note
NONE
```

Signed-off-by: kubevirt-bot <kubevirtbot@redhat.com>
kubevirt-bot added a commit to kubevirt-bot/kubevirt that referenced this pull request Apr 30, 2024
[239678e Remove modprobe of vfio_mdev as no longer present](kubevirt/kubevirtci#1183)
[2981e7e Introduce `KUBEVIRT_NUM_NUMA_NODES`, `KUBEVIRT_NUM_VCPU` and `KUBEVIRT_CPU_MANAGER_POLICY`](kubevirt/kubevirtci#1171)
[02b52b9 Bump golang.org/x/net from 0.17.0 to 0.23.0 in /cluster-provision/gocli](kubevirt/kubevirtci#1180)
[8341f20 Run ./hack/bump-cdi.sh](kubevirt/kubevirtci#1158)
[e1ec7d8 cdi: Deploy latest available CDI manifests](kubevirt/kubevirtci#1165)

```release-note
NONE
```

Signed-off-by: kubevirt-bot <kubevirtbot@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants