Setting the VMI-Under-Test CPUs to be four #55

Merged: 3 commits, Jan 2, 2024

RamLavi (Collaborator) commented Dec 31, 2023

This PR changes the number of CPUs requested to four.
This allows the checkup to run the Oslat test on a full core (in this checkup, guest CPUs 2-3).
As a result, the Oslat measurement is not disturbed by noise from a neighboring (sibling) CPU.
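
A minimal sketch of the "full core" reasoning, not the checkup's actual code. It assumes two hardware threads per core and that sibling CPUs are numbered consecutively (0-1, 2-3); both are assumptions for illustration, as are the helper names `coreOf` and `sharesCore` and the three-CPU layout used for comparison.

```go
package main

import "fmt"

// coreOf maps a CPU index to a core index, assuming siblings are numbered
// consecutively (threadsPerCore CPUs per core). This numbering is an
// assumption made for illustration only.
func coreOf(cpu, threadsPerCore int) int {
	return cpu / threadsPerCore
}

// sharesCore reports whether any CPU in a sits on the same core as any CPU in b.
func sharesCore(a, b []int, threadsPerCore int) bool {
	cores := map[int]bool{}
	for _, cpu := range a {
		cores[coreOf(cpu, threadsPerCore)] = true
	}
	for _, cpu := range b {
		if cores[coreOf(cpu, threadsPerCore)] {
			return true
		}
	}
	return false
}

func main() {
	const threadsPerCore = 2 // assumed SMT-2 layout

	// Hypothetical three-CPU layout: housekeeping on CPU 0, Oslat on CPUs 1-2.
	fmt.Println("3 CPUs, Oslat shares a core with housekeeping:",
		sharesCore([]int{0}, []int{1, 2}, threadsPerCore))

	// Four-CPU layout from this PR: housekeeping on CPUs 0-1, Oslat on CPUs 2-3.
	fmt.Println("4 CPUs, Oslat shares a core with housekeeping:",
		sharesCore([]int{0, 1}, []int{2, 3}, threadsPerCore))
}
```

Under these assumptions, only the four-CPU layout keeps the Oslat CPUs on a core of their own.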

RamLavi (Collaborator, Author) commented Dec 31, 2023

@orelmisan let's hold this PR until we have a cluster with the HCO alignCPUs enabler

Comment on lines 201 to 202
CPUCoresCount = 2
CPUTreadsCount = 2
Member

Currently we have a guest with SMT disabled (CPUTreadsCount = 1).
I'm not sure we need to change that.
@matosatti WDYT?

RamLavi (Collaborator, Author), Dec 31, 2023

My reasons for adding this:

  • This is how we did it in the DPDK repo.
  • Our stakeholders wanted the checkup to run in an SMT-enabled environment, so I guess this is what they meant.

Member

In https://access.redhat.com/solutions/7007632 (which serves as our reference), SMT is disabled in the guest.

Collaborator Author

OK then let's wait for @matosatti before we proceed

Collaborator Author

Talked offline: changed to CPUTreadsCount = 1.
The question remains open, though; it will be addressed in a separate PR if needed.

It should not matter whether SMT is disabled or not in the guest vCPUs.
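
To make the thread above concrete, here is a minimal sketch of how the two guest topologies discussed both yield four vCPUs. The `guestTopology` type is hypothetical, not the checkup's actual type. The single socket and the cores = 4 value for the SMT-disabled variant are assumptions inferred from the PR title, not values quoted in this conversation.

```go
package main

import "fmt"

// guestTopology is a hypothetical stand-in for the guest CPU topology
// (sockets x cores x threads); it is not the checkup's actual type.
type guestTopology struct {
	Sockets uint32
	Cores   uint32
	Threads uint32
}

// vcpus returns the total number of guest vCPUs implied by the topology.
func (t guestTopology) vcpus() uint32 {
	return t.Sockets * t.Cores * t.Threads
}

func main() {
	// First proposal in the review: CPUCoresCount = 2, CPUTreadsCount = 2.
	smtEnabled := guestTopology{Sockets: 1, Cores: 2, Threads: 2}

	// After the offline discussion: CPUTreadsCount = 1.
	// Cores = 4 is assumed here so the guest still has four vCPUs.
	smtDisabled := guestTopology{Sockets: 1, Cores: 4, Threads: 1}

	fmt.Println("guest SMT enabled, vCPUs: ", smtEnabled.vcpus())
	fmt.Println("guest SMT disabled, vCPUs:", smtDisabled.vcpus())
}
```

Either way the guest sees four vCPUs; the two variants differ only in whether the guest is told that some of them are SMT siblings.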

RamLavi (Collaborator, Author) commented Jan 1, 2024

Passed on a CNV 4.15 cluster:

make e2e-test
mkdir -p /home/ralavi/go/src/github.com/kiagnose/kubevirt-realtime-checkup/_go-cache
podman run --rm \
	-v /home/ralavi/go/src/github.com/kiagnose/kubevirt-realtime-checkup:/go/src/github.com/kiagnose/kubevirt-realtime-checkup:Z \
	-v /home/ralavi/go/src/github.com/kiagnose/kubevirt-realtime-checkup/_go-cache:/root/.cache/go-build:Z \
	-v /home/ralavi/.kube/sno03-cnvqe2-rdu2:/root/.kube:Z,ro \
	--workdir /go/src/github.com/kiagnose/kubevirt-realtime-checkup \
	-e KUBECONFIG=/root/.kube/kubeconfig \
	-e TEST_NAMESPACE=realtime-checkup-1 \
	-e TEST_CHECKUP_IMAGE=quay.io/ramlavi/kubevirt-realtime-checkup:devel \
	-e VM_UNDER_TEST_CONTAINER_DISK_IMAGE=quay.io/ramlavi/kubevirt-realtime-checkup-vm:latest \
	docker.io/library/golang:1.20.12 \
	go test -v ./tests/... -test.v -test.timeout=1h -ginkgo.v -ginkgo.timeout=1h 
=== RUN   TestTests
Running Suite: Tests Suite - /go/src/github.com/kiagnose/kubevirt-realtime-checkup/tests
========================================================================================
Random Seed: 1704113663

Will run 1 of 1 specs
------------------------------
[BeforeSuite] 
/go/src/github.com/kiagnose/kubevirt-realtime-checkup/tests/tests_suite_test.go:39
[BeforeSuite] PASSED [0.002 seconds]
------------------------------
Checkup execution should complete successfully
/go/src/github.com/kiagnose/kubevirt-realtime-checkup/tests/checkup_test.go:80
• [784.826 seconds]
------------------------------

Ran 1 of 1 Specs in 784.828 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestTests (784.83s)
PASS
ok  	github.com/kiagnose/kubevirt-realtime-checkup

logs:

$ oc logs -f job/realtime-checkup
2024/01/01 12:54:28 kubevirt-realtime-checkup starting...
2024/01/01 12:54:28 Using the following config:
2024/01/01 12:54:28 	"vmUnderTestTargetNodeName": ""
2024/01/01 12:54:28 	"vmUnderTestContainerDiskImage": "quay.io/ramlavi/kubevirt-realtime-checkup-vm:latest"
2024/01/01 12:54:28 	"oslatDuration": "10m0s"
2024/01/01 12:54:28 	"oslatLatencyThresholdMicroSeconds": "45µs"
2024/01/01 12:54:28 Waiting for VMI "realtime-checkup-1/realtime-vmi-under-test-xp2nr" to boot...
2024/01/01 12:56:23 VMI "realtime-checkup-1/realtime-vmi-under-test-xp2nr" had successfully booted
2024/01/01 12:56:23 Login to VMI under test...
2024/01/01 12:57:13 Running Oslat test on VMI under test for 10m0s...
2024/01/01 13:07:17 Oslat test completed:
taskset -c 2-3 oslat --cpu-list 2-3 --rtprio 1 --duration 10m0s --workload memmove --workload-mem 4K 
oslat V 2.60
Total runtime: 		600 seconds
Thread priority: 	SCHED_FIFO:1
CPU list: 		2-3
CPU for main thread: 	0
Workload: 		memmove
Workload mem: 		4 (KiB)
Preheat cores: 		2

Pre-heat for 1 seconds...
Test starts...
Test completed.

        Core:	 2 3
Counter Freq:	 2096 2096 (Mhz)
    001 (us):	 0 0
    002 (us):	 4435029947 4435031233
    003 (us):	 1 0
    004 (us):	 136 137
    005 (us):	 49 49
    006 (us):	 5 4
    007 (us):	 17 18
    008 (us):	 16 16
    009 (us):	 4 4
    010 (us):	 4 3
    011 (us):	 1 1
    012 (us):	 0 0
    013 (us):	 0 0
    014 (us):	 0 0
    015 (us):	 0 0
    016 (us):	 0 0
    017 (us):	 0 0
    018 (us):	 0 0
    019 (us):	 0 0
    020 (us):	 0 0
    021 (us):	 0 0
    022 (us):	 0 0
    023 (us):	 0 0
    024 (us):	 0 0
    025 (us):	 0 0
    026 (us):	 0 0
    027 (us):	 0 0
    028 (us):	 0 0
    029 (us):	 0 0
    030 (us):	 0 0
    031 (us):	 0 0
    032 (us):	 0 0 (including overflows)
     Minimum:	 1 1 (us)
     Average:	 2.000 2.000 (us)
     Maximum:	 10 10 (us)
     Max-Min:	 9 9 (us)
    Duration:	 599.736 599.736 (sec)

[root@realtime-vmi-under-test-xp2nr cloud-user]# 
2024/01/01 13:07:17 Max Oslat Latency measured: 10µs
2024/01/01 13:07:17 Trying to delete VMI: "realtime-checkup-1/realtime-vmi-under-test-xp2nr"
2024/01/01 13:07:17 Waiting for VMI "realtime-checkup-1/realtime-vmi-under-test-xp2nr" to be deleted...
2024/01/01 13:07:23 VMI "realtime-checkup-1/realtime-vmi-under-test-xp2nr" was deleted successfully
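
The log above ends with "Max Oslat Latency measured: 10µs", which corresponds to the `Maximum:` row of the Oslat output. The following is a minimal sketch of how such a value could be extracted and compared against the configured threshold. It is not the checkup's actual parser: `parseMaxLatency` is a hypothetical helper, and the less-than-or-equal comparison against the threshold is an assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseMaxLatency scans Oslat output for the "Maximum:" row and returns the
// largest per-core value, interpreted as microseconds.
func parseMaxLatency(output string) (time.Duration, error) {
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if !strings.HasPrefix(line, "Maximum:") {
			continue
		}
		var maxLatency time.Duration
		found := false
		for _, field := range strings.Fields(strings.TrimPrefix(line, "Maximum:")) {
			value, err := strconv.Atoi(field)
			if err != nil {
				break // stop at the trailing unit, e.g. "(us)"
			}
			latency := time.Duration(value) * time.Microsecond
			if !found || latency > maxLatency {
				maxLatency = latency
				found = true
			}
		}
		if !found {
			return 0, fmt.Errorf("no values in maximum row: %q", line)
		}
		return maxLatency, nil
	}
	if err := scanner.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("maximum row not found")
}

func main() {
	// Sample rows copied from the Oslat output above.
	output := "     Minimum:\t 1 1 (us)\n     Maximum:\t 10 10 (us)\n"
	// Threshold taken from the logged config ("oslatLatencyThresholdMicroSeconds").
	threshold := 45 * time.Microsecond

	maxLatency, err := parseMaxLatency(output)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("max latency %v, within threshold: %t\n", maxLatency, maxLatency <= threshold)
}
```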

RamLavi (Collaborator, Author) commented Jan 1, 2024

> @orelmisan let's hold this PR until we have a cluster with the HCO alignCPUs enabler

Now ready for review.

orelmisan (Member)

We need to consider marking CPUs 2 and 3 as realtime, and the other two as non-realtime, using spec.domain.cpu.realtime.mask.
https://kubevirt.io/user-guide/virtual_machines/numa/#running-real-time-workloads

Currently the vmi-under-test uses 3 CPUs, in order to be able to run
the checkup and avoid hitting the CNV-31584 [0] Jira bug.
However, this configuration is sub-optimal, as the CPUs currently
running Oslat are siblings of the non-isolated CPU 0.
Now that CNV-31584 is resolved, optimize the checkup's
performance by requesting a full core to run the Oslat test:
set the number of CPUs requested to four.

[0] https://issues.redhat.com/browse/CNV-31584

Signed-off-by: Ram Lavi <ralavi@redhat.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
Setting the CPUs that will run the Oslat test to be from the same core
(=siblings).

Signed-off-by: Ram Lavi <ralavi@redhat.com>

RamLavi (Collaborator, Author) commented Jan 2, 2024

Change: Set the guest SMT to disabled.

RamLavi (Collaborator, Author) commented Jan 2, 2024

> We need to consider marking CPUs 2 and 3 as realtime, and the other two as non-realtime, using spec.domain.cpu.realtime.mask. https://kubevirt.io/user-guide/virtual_machines/numa/#running-real-time-workloads

I am not against it, but let's consult with @matosatti and add it in a separate PR.

RamLavi (Collaborator, Author) commented Jan 2, 2024

Passed on CNV 4.15 cluster:

make e2e-test
mkdir -p /home/ralavi/go/src/github.com/kiagnose/kubevirt-realtime-checkup/_go-cache
podman run --rm \
	-v /home/ralavi/go/src/github.com/kiagnose/kubevirt-realtime-checkup:/go/src/github.com/kiagnose/kubevirt-realtime-checkup:Z \
	-v /home/ralavi/go/src/github.com/kiagnose/kubevirt-realtime-checkup/_go-cache:/root/.cache/go-build:Z \
	-v /home/ralavi/.kube/sno03-cnvqe2-rdu2:/root/.kube:Z,ro \
	--workdir /go/src/github.com/kiagnose/kubevirt-realtime-checkup \
	-e KUBECONFIG=/root/.kube/kubeconfig \
	-e TEST_NAMESPACE=realtime-checkup-1 \
	-e TEST_CHECKUP_IMAGE=quay.io/ramlavi/kubevirt-realtime-checkup:devel \
	-e VM_UNDER_TEST_CONTAINER_DISK_IMAGE=quay.io/ramlavi/kubevirt-realtime-checkup-vm:latest \
	docker.io/library/golang:1.20.12 \
	go test -v ./tests/... -test.v -test.timeout=1h -ginkgo.v -ginkgo.timeout=1h 
=== RUN   TestTests
Running Suite: Tests Suite - /go/src/github.com/kiagnose/kubevirt-realtime-checkup/tests
========================================================================================
Random Seed: 1704197442

Will run 1 of 1 specs
------------------------------
[BeforeSuite] 
/go/src/github.com/kiagnose/kubevirt-realtime-checkup/tests/tests_suite_test.go:39
[BeforeSuite] PASSED [0.001 seconds]
------------------------------
Checkup execution should complete successfully
/go/src/github.com/kiagnose/kubevirt-realtime-checkup/tests/checkup_test.go:80
• [132.941 seconds]
------------------------------

Ran 1 of 1 Specs in 132.943 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestTests (132.94s)
PASS
ok  	github.com/kiagnose/kubevirt-realtime-checkup/tests	132.953s

orelmisan (Member) left a comment

Thank you

@orelmisan orelmisan merged commit 97b5fbe into kiagnose:main Jan 2, 2024
6 checks passed
matosatti commented

> > We need to consider marking CPUs 2 and 3 as realtime, and the other two as non-realtime, using spec.domain.cpu.realtime.mask. https://kubevirt.io/user-guide/virtual_machines/numa/#running-real-time-workloads
>
> I am not against it, but let's consult with @matosatti and add it in a separate PR.

I don't think this is necessary: officially, only setting all vCPUs as realtime is supported (because there are known problems otherwise).
