Setting the VMI-Under-Test CPUs to be four #55

RamLavi · 2023-12-31T12:48:25Z

This PR changes the number of CPUs requested for the VMI under test to four.
This allows the checkup to run the Oslat test on a full core (in this checkup, guest CPUs 2-3),
so that Oslat is not disturbed by noise from a neighboring CPU.

RamLavi · 2023-12-31T12:49:11Z

@orelmisan let's hold this PR until we have a cluster with the HCO alignCPUs enabler

orelmisan · 2023-12-31T12:58:18Z

pkg/internal/checkup/checkup.go

+		CPUCoresCount     = 2
+		CPUTreadsCount    = 2


Currently we have a guest with SMT disabled (CPUTreadsCount = 1).
I'm not sure we need to change that.
@matosatti WDYT?

My reasons for adding this:

This is how we did it in the DPDK repo.

Our stakeholders wanted the checkup to run in an SMT-enabled environment, so I guess this is what they meant.

In https://access.redhat.com/solutions/7007632 (which serves as our reference), SMT is disabled in the guest.

OK, then let's wait for @matosatti before we proceed.

Talked offline; changed to CPUTreadsCount = 1.
The question remains open, though, and will be addressed in a separate PR if needed.

It should not matter whether SMT is disabled or not in the guest vCPUs.
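
The two layouts discussed in this thread can be compared with a small Go sketch. The `guestCPU` type is a local stand-in, not the KubeVirt API type, and the single socket and the 4x1 layout are assumptions inferred from the PR title (four CPUs) combined with CPUTreadsCount = 1. Both layouts expose four vCPUs; they differ only in whether the guest sees SMT siblings.

```go
package main

import "fmt"

// guestCPU is a local stand-in for the guest CPU topology under discussion.
// It is NOT the KubeVirt API type; the field names are illustrative.
type guestCPU struct {
	Sockets uint32
	Cores   uint32
	Threads uint32
}

// vCPUs returns the total number of guest CPUs the topology exposes.
func (c guestCPU) vCPUs() uint32 {
	return c.Sockets * c.Cores * c.Threads
}

// smtEnabled reports whether the guest sees more than one thread per core.
func (c guestCPU) smtEnabled() bool {
	return c.Threads > 1
}

func main() {
	// Layout from the reviewed diff: 2 cores x 2 threads (guest SMT enabled).
	proposed := guestCPU{Sockets: 1, Cores: 2, Threads: 2}
	// Assumed layout after the offline discussion: 4 cores x 1 thread.
	settled := guestCPU{Sockets: 1, Cores: 4, Threads: 1}

	for _, c := range []guestCPU{proposed, settled} {
		fmt.Printf("cores=%d threads=%d vCPUs=%d guestSMT=%t\n",
			c.Cores, c.Threads, c.vCPUs(), c.smtEnabled())
	}
}
```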

RamLavi · 2024-01-01T13:08:04Z

Passed on a CNV 4.15 cluster:

make e2e-test
mkdir -p /home/ralavi/go/src/github.com/kiagnose/kubevirt-realtime-checkup/_go-cache
podman run --rm \
	-v /home/ralavi/go/src/github.com/kiagnose/kubevirt-realtime-checkup:/go/src/github.com/kiagnose/kubevirt-realtime-checkup:Z \
	-v /home/ralavi/go/src/github.com/kiagnose/kubevirt-realtime-checkup/_go-cache:/root/.cache/go-build:Z \
	-v /home/ralavi/.kube/sno03-cnvqe2-rdu2:/root/.kube:Z,ro \
	--workdir /go/src/github.com/kiagnose/kubevirt-realtime-checkup \
	-e KUBECONFIG=/root/.kube/kubeconfig \
	-e TEST_NAMESPACE=realtime-checkup-1 \
	-e TEST_CHECKUP_IMAGE=quay.io/ramlavi/kubevirt-realtime-checkup:devel \
	-e VM_UNDER_TEST_CONTAINER_DISK_IMAGE=quay.io/ramlavi/kubevirt-realtime-checkup-vm:latest \
	docker.io/library/golang:1.20.12 \
	go test -v ./tests/... -test.v -test.timeout=1h -ginkgo.v -ginkgo.timeout=1h 
=== RUN   TestTests
Running Suite: Tests Suite - /go/src/github.com/kiagnose/kubevirt-realtime-checkup/tests
========================================================================================
Random Seed: 1704113663

Will run 1 of 1 specs
------------------------------
[BeforeSuite] 
/go/src/github.com/kiagnose/kubevirt-realtime-checkup/tests/tests_suite_test.go:39
[BeforeSuite] PASSED [0.002 seconds]
------------------------------
Checkup execution should complete successfully
/go/src/github.com/kiagnose/kubevirt-realtime-checkup/tests/checkup_test.go:80
• [784.826 seconds]
------------------------------

Ran 1 of 1 Specs in 784.828 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestTests (784.83s)
PASS
ok  	github.com/kiagnose/kubevirt-realtime-checkup

logs:

$ oc logs -f job/realtime-checkup
2024/01/01 12:54:28 kubevirt-realtime-checkup starting...
2024/01/01 12:54:28 Using the following config:
2024/01/01 12:54:28 	"vmUnderTestTargetNodeName": ""
2024/01/01 12:54:28 	"vmUnderTestContainerDiskImage": "quay.io/ramlavi/kubevirt-realtime-checkup-vm:latest"
2024/01/01 12:54:28 	"oslatDuration": "10m0s"
2024/01/01 12:54:28 	"oslatLatencyThresholdMicroSeconds": "45µs"
2024/01/01 12:54:28 Waiting for VMI "realtime-checkup-1/realtime-vmi-under-test-xp2nr" to boot...
2024/01/01 12:56:23 VMI "realtime-checkup-1/realtime-vmi-under-test-xp2nr" had successfully booted
2024/01/01 12:56:23 Login to VMI under test...
2024/01/01 12:57:13 Running Oslat test on VMI under test for 10m0s...
2024/01/01 13:07:17 Oslat test completed:
taskset -c 2-3 oslat --cpu-list 2-3 --rtprio 1 --duration 10m0s --workload memmove --workload-mem 4K 
oslat V 2.60
Total runtime: 		600 seconds
Thread priority: 	SCHED_FIFO:1
CPU list: 		2-3
CPU for main thread: 	0
Workload: 		memmove
Workload mem: 		4 (KiB)
Preheat cores: 		2

Pre-heat for 1 seconds...
Test starts...
Test completed.

        Core:	 2 3
Counter Freq:	 2096 2096 (Mhz)
    001 (us):	 0 0
    002 (us):	 4435029947 4435031233
    003 (us):	 1 0
    004 (us):	 136 137
    005 (us):	 49 49
    006 (us):	 5 4
    007 (us):	 17 18
    008 (us):	 16 16
    009 (us):	 4 4
    010 (us):	 4 3
    011 (us):	 1 1
    012 (us):	 0 0
    013 (us):	 0 0
    014 (us):	 0 0
    015 (us):	 0 0
    016 (us):	 0 0
    017 (us):	 0 0
    018 (us):	 0 0
    019 (us):	 0 0
    020 (us):	 0 0
    021 (us):	 0 0
    022 (us):	 0 0
    023 (us):	 0 0
    024 (us):	 0 0
    025 (us):	 0 0
    026 (us):	 0 0
    027 (us):	 0 0
    028 (us):	 0 0
    029 (us):	 0 0
    030 (us):	 0 0
    031 (us):	 0 0
    032 (us):	 0 0 (including overflows)
     Minimum:	 1 1 (us)
     Average:	 2.000 2.000 (us)
     Maximum:	 10 10 (us)
     Max-Min:	 9 9 (us)
    Duration:	 599.736 599.736 (sec)

[root@realtime-vmi-under-test-xp2nr cloud-user]# 
2024/01/01 13:07:17 Max Oslat Latency measured: 10µs
2024/01/01 13:07:17 Trying to delete VMI: "realtime-checkup-1/realtime-vmi-under-test-xp2nr"
2024/01/01 13:07:17 Waiting for VMI "realtime-checkup-1/realtime-vmi-under-test-xp2nr" to be deleted...
2024/01/01 13:07:23 VMI "realtime-checkup-1/realtime-vmi-under-test-xp2nr" was deleted successfully

RamLavi · 2024-01-01T13:08:29Z

@orelmisan let's hold this PR until we have a cluster with the HCO alignCPUs enabler

Now ready for review.

orelmisan · 2024-01-01T15:59:45Z

We should consider using spec.domain.cpu.realtime.mask to mark only CPUs 2 and 3 as realtime, leaving the other two as non-realtime.
https://kubevirt.io/user-guide/virtual_machines/numa/#running-real-time-workloads

Currently the vmi-under-test uses 3 CPUs, in order to be able to run the checkup while avoiding the CNV-31584 [0] Jira bug. However, this configuration is sub-optimal, as the CPUs currently running Oslat are siblings of the non-isolated CPU 0. Now that CNV-31584 is resolved, optimize the checkup's performance by requesting a full core to run the Oslat test: set the number of requested CPUs to four.

[0] https://issues.redhat.com/browse/CNV-31584

Signed-off-by: Ram Lavi <ralavi@redhat.com>


RamLavi · 2024-01-02T11:56:25Z

Change: Set the guest SMT to disabled.

RamLavi · 2024-01-02T11:57:23Z

We need to consider masking CPUs 2 and 3 to be realtime, while the other two are not using spec.domain.cpu.realtime.mask. https://kubevirt.io/user-guide/virtual_machines/numa/#running-real-time-workloads

I am not against it, but let's consult with @matosatti and add it in a separate PR.

RamLavi · 2024-01-02T12:27:45Z

Passed on CNV 4.15 cluster:

make e2e-test
mkdir -p /home/ralavi/go/src/github.com/kiagnose/kubevirt-realtime-checkup/_go-cache
podman run --rm \
	-v /home/ralavi/go/src/github.com/kiagnose/kubevirt-realtime-checkup:/go/src/github.com/kiagnose/kubevirt-realtime-checkup:Z \
	-v /home/ralavi/go/src/github.com/kiagnose/kubevirt-realtime-checkup/_go-cache:/root/.cache/go-build:Z \
	-v /home/ralavi/.kube/sno03-cnvqe2-rdu2:/root/.kube:Z,ro \
	--workdir /go/src/github.com/kiagnose/kubevirt-realtime-checkup \
	-e KUBECONFIG=/root/.kube/kubeconfig \
	-e TEST_NAMESPACE=realtime-checkup-1 \
	-e TEST_CHECKUP_IMAGE=quay.io/ramlavi/kubevirt-realtime-checkup:devel \
	-e VM_UNDER_TEST_CONTAINER_DISK_IMAGE=quay.io/ramlavi/kubevirt-realtime-checkup-vm:latest \
	docker.io/library/golang:1.20.12 \
	go test -v ./tests/... -test.v -test.timeout=1h -ginkgo.v -ginkgo.timeout=1h 
=== RUN   TestTests
Running Suite: Tests Suite - /go/src/github.com/kiagnose/kubevirt-realtime-checkup/tests
========================================================================================
Random Seed: 1704197442

Will run 1 of 1 specs
------------------------------
[BeforeSuite] 
/go/src/github.com/kiagnose/kubevirt-realtime-checkup/tests/tests_suite_test.go:39
[BeforeSuite] PASSED [0.001 seconds]
------------------------------
Checkup execution should complete successfully
/go/src/github.com/kiagnose/kubevirt-realtime-checkup/tests/checkup_test.go:80
• [132.941 seconds]
------------------------------

Ran 1 of 1 Specs in 132.943 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestTests (132.94s)
PASS
ok  	github.com/kiagnose/kubevirt-realtime-checkup/tests	132.953s

orelmisan

Thank you

matosatti · 2024-01-08T12:44:17Z

We need to consider masking CPUs 2 and 3 to be realtime, while the other two are not using spec.domain.cpu.realtime.mask. https://kubevirt.io/user-guide/virtual_machines/numa/#running-real-time-workloads

I am not against, but let's consult with @matosatti , and add it in a separate PR.

I don't think this is necessary: only setting all vCPUs as realtime is officially supported (because there are known problems when not doing so).
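
For readers unfamiliar with the field under discussion, the following Go sketch renders the two shapes of `spec.domain.cpu.realtime` being compared. The struct types are local stand-ins written for this illustration, not the KubeVirt API types, and the mask value "2-3" is an assumed example of the cpuset-style syntax described in the linked user guide. An omitted mask is assumed here to mean that all vCPUs are realtime, which is the configuration described above as the officially supported one.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local stand-in types mirroring the path spec.domain.cpu.realtime.mask.
// They are illustrative only and are NOT imported from KubeVirt.
type realtime struct {
	Mask string `json:"mask,omitempty"`
}

type cpu struct {
	Realtime *realtime `json:"realtime,omitempty"`
}

type domain struct {
	CPU cpu `json:"cpu"`
}

type vmiSpec struct {
	Domain domain `json:"domain"`
}

type vmi struct {
	Spec vmiSpec `json:"spec"`
}

// renderRealtime returns the JSON fragment for a given mask.
// An empty mask leaves the field out entirely.
func renderRealtime(mask string) string {
	v := vmi{Spec: vmiSpec{Domain: domain{CPU: cpu{Realtime: &realtime{Mask: mask}}}}}
	out, err := json.Marshal(v)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	fmt.Println("masked (proposal): ", renderRealtime("2-3"))
	fmt.Println("unmasked (all vCPUs):", renderRealtime(""))
}
```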

RamLavi requested a review from orelmisan December 31, 2023 12:48

orelmisan reviewed Dec 31, 2023

RamLavi force-pushed the cpus_even branch from c812812 to c6a9cde December 31, 2023 13:05

RamLavi added 3 commits January 2, 2024 12:23

vms/vm-under-test: Update isolated cores list

172c71e

Signed-off-by: Ram Lavi <ralavi@redhat.com>

oslat/client: Update the CPUs list

d641320

Setting the CPUs that will run the Oslat test to be from the same core (=siblings). Signed-off-by: Ram Lavi <ralavi@redhat.com>

RamLavi force-pushed the cpus_even branch from c6a9cde to d641320 January 2, 2024 10:39

orelmisan approved these changes Jan 2, 2024

orelmisan merged commit 97b5fbe into kiagnose:main Jan 2, 2024
6 checks passed
