Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ovspinning: Set affinity of each thread #4470

Merged
merged 1 commit into from
Jul 1, 2024
Merged

Conversation

zeeke
Copy link
Contributor

@zeeke zeeke commented Jun 25, 2024

OVS CPU pinning ensures the CPU consumed by the processes ovsdb-server and ovs-vswitchd are the same as the OVNkube container. ovs-vswitchd process has threads and calling SchedSetaffinity() on the main PID only does not guarantee that the already spawned threads inherit that affinity. The following is an example of such a situation:

sh-4.4# taskset -apc $(pgrep ovs-vswitchd)
pid 4059's current affinity list: 0-2,6,9,15-66,69,70,73,79-127
pid 4335's current affinity list: 0-2,32,33,64-66,96,97
pid 2715051's current affinity list: 0-2,5,6,15-66,69,70,73,79-127
pid 2715052's current affinity list: 0-2,5,6,15-66,69,70,73,79-127
pid 2715053's current affinity list: 0-2,5,6,15-66,69,70,73,79-127
...

This changes makes the CPU align routine loop over all the threads of the given PID and invoke SchedSetaffinity on all of them.

To test these changes, a simple sleep 10 process is not enough, as it doesn't spawn any thread. A go version of the sleep is therefore provided for testing purpose, in the form of a simple main.

What this PR does and why is it needed

Which issue(s) this PR fixes

Special notes for reviewers

How to verify it

Details to documentation updates

Description for the changelog

Does this PR introduce a user-facing change?


cc @MarSik, @ricky-rav

@zeeke zeeke requested a review from a team as a code owner June 25, 2024 10:54
@zeeke zeeke requested a review from girishmg June 25, 2024 10:54
@github-actions github-actions bot added the area/unit-testing Issues related to adding/updating unit tests label Jun 25, 2024
@coveralls
Copy link

Coverage Status

Changes unknown
when pulling 6135c21 on zeeke:OCPBUGS-35347
into ** on ovn-org:master**.

for _, taskID := range taskIDs {
err = unix.SchedSetaffinity(taskID, &currentProcessCPUs)
if err != nil {
return fmt.Errorf("can't set CPU affinity of task(%d) PID(%d) to %s: %w", taskID, targetPID, printCPUSet(currentProcessCPUs), err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be a bit resilient. Imagine OVS is reconfiguring its threads at this very time and one of the threads disappears while you are in the loop. The error will end the processing and the rest of the threads will not be configured.

Maybe that is not an issue since atm OVS always recreates all its threads, but it is still a hidden race.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point! going to fix this race by not breaking the loop.

@MarSik
Copy link
Contributor

MarSik commented Jun 25, 2024

Just one concurrency related comment, but otherwise the logic looks good to me. I would add the bug id to the title, but I see OVN-k is not following that pattern.

@zeeke zeeke force-pushed the OCPBUGS-35347 branch 2 times, most recently from e656ca1 to 9652299 Compare June 26, 2024 11:06
@zeeke
Copy link
Contributor Author

zeeke commented Jun 26, 2024

Just one concurrency related comment, but otherwise the logic looks good to me. I would add the bug id to the title, but I see OVN-k is not following that pattern.

Yes, I'll take care of linking the issue when this fix lands downstream

OVS CPU pinning ensures the CPU consumed by the processes `ovsdb-server`
and `ovs-vswitchd` are the same as the OVNkube container. `ovs-vswitchd` process
has threads and calling `SchedSetaffinity()` on the main PID only is does not
guarantee that the already spawned threads inherit that affinity. The following
is an example of such a situation:

```
sh-4.4# taskset -apc $(pgrep ovs-vswitchd)
pid 4059's current affinity list: 0-2,6,9,15-66,69,70,73,79-127
pid 4335's current affinity list: 0-2,32,33,64-66,96,97
pid 2715051's current affinity list: 0-2,5,6,15-66,69,70,73,79-127
pid 2715052's current affinity list: 0-2,5,6,15-66,69,70,73,79-127
pid 2715053's current affinity list: 0-2,5,6,15-66,69,70,73,79-127
...
```

This changes makes the CPU align routine loop over all the threads of the given PID
and invoke `SchedSetaffinity` on all of them.

To test these changes, a simple `sleep 10` process is not enough, as it doesn't spawn
any thread. A go version of the sleep is therefore provided for testing purpose, in the
form of a simple `main`.

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
@coveralls
Copy link

Coverage Status

Changes unknown
when pulling 91400d2 on zeeke:OCPBUGS-35347
into ** on ovn-org:master**.

@tssurya tssurya requested review from ricky-rav and removed request for girishmg June 28, 2024 13:54
@ricky-rav
Copy link
Contributor

/lgtm
Thanks, @zeeke!

@tssurya
Copy link
Member

tssurya commented Jun 29, 2024

Has two +1's merging this.
known failure in CP lane:

Summarizing 1 Failure:
  [FAIL] e2e egress IP validation [OVN network] Using different methods to disable a node's availability for egress Should validate the egress IP functionality against remote hosts [It] disabling egress nodes impeding GRCP health check

@tssurya
Copy link
Member

tssurya commented Jun 29, 2024

@ricky-rav : please use the GUI to add the lgtm so that I can merge this without overriding the branch protections

@tssurya tssurya merged commit b34fa57 into ovn-org:master Jul 1, 2024
34 of 36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/unit-testing Issues related to adding/updating unit tests
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

5 participants