Bug 1967317: Do not create systemd update-rps service for veth devices #659

cynepco3hahue · 2021-06-15T17:43:14Z

Starting and shutting down a large number of pods triggers creation of the systemd
service that updates the rps_cpus mask of each new interface, which can put additional
CPU load on the cluster.

This PR introduces two changes that should prevent this:

1. The OCI hook will update the RPS mask of the pod's virtual interfaces on the node.
2. veth devices are excluded from the udev rule.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>

openshift-ci · 2021-06-15T17:43:19Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cynepco3hahue

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [cynepco3hahue]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coveralls · 2021-06-15T17:49:20Z

Pull Request Test Coverage Report for Build 1563

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 75.991%

Totals
Change from base Build 1561:	0.0%
Covered Lines:	1399
Relevant Lines:	1841

💛 - Coveralls

openshift-ci · 2021-06-16T13:05:35Z

@cynepco3hahue: This pull request references Bugzilla bug 1967317, which is invalid:

expected the bug to target the "4.9.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1967317: Do not create systemd update-rps service for veth devices

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

cynepco3hahue · 2021-06-16T14:24:44Z

/bugzilla refresh

openshift-ci · 2021-06-16T14:25:00Z

@cynepco3hahue: This pull request references Bugzilla bug 1967317, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target release (4.9.0) matches configured target release for branch (4.9.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @gsr-shanks

In response to this:

/bugzilla refresh


yanirq · 2021-06-17T07:46:58Z

build/assets/scripts/low-latency-hooks.sh

+netns_link_indexes=$(ip netns exec "${ns}" ip -j link | jq ".[] | select(.link_index != null) | .link_index")
+for link_index in ${netns_link_indexes}; do
+  container_veth=$(ip -j link | jq ".[] | select(.ifindex == ${link_index}) | .ifname" | tr -d '"')
+  echo ${mask} > /sys/devices/virtual/net/${container_veth}/queues/rx-0/rps_cpus


Is that the only path to be added? Only rx-0?
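For reference, here is a minimal sketch of what covering every receive queue could look like if a device exposed more than `rx-0`. It is not taken from the PR: the function name `set_rps_all_queues` and the optional `sysfs_root` argument are hypothetical and exist only so the logic can be tried against a scratch directory.

```shell
# Sketch only: set_rps_all_queues and sysfs_root are hypothetical names,
# not part of low-latency-hooks.sh.
# Writes the same RPS mask to every rx-* queue of one virtual device.
set_rps_all_queues() {
  local mask="$1"
  local dev="$2"
  local sysfs_root="${3:-/sys/devices/virtual/net}"
  local queue
  for queue in "${sysfs_root}/${dev}"/queues/rx-*; do
    # When the glob matches nothing, the pattern stays literal; skip it.
    [ -e "${queue}/rps_cpus" ] || continue
    echo "${mask}" > "${queue}/rps_cpus"
  done
}
```

Calling `set_rps_all_queues "${mask}" "${container_veth}"` would then replace the single write to `rx-0`. For a device with one queue, the loop runs once and behaves like the direct write.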

yanirq · 2021-06-17T07:57:34Z

/lgtm
/hold - in case others want to have a second look

cynepco3hahue · 2021-06-17T08:19:56Z

@browsell @MarSik I think we want this fix for 4.8?

MarSik · 2021-06-17T09:10:53Z

Yes, I think we want to backport it. Has it been tested yet?

cynepco3hahue · 2021-06-17T10:48:17Z

> Yes, I think we want to backport it. Has it been tested yet?

I tested it on my cluster. @browsell Did you have a chance to run it on top of the SNO cluster?


yanirq · 2021-06-23T09:48:20Z

/lgtm

cynepco3hahue · 2021-06-23T09:54:54Z

We cannot rely on ID_NET_DRIVER because another udev rule sets it using ethtool and sed, so there could be races, but I can rely on ENV{DEVPATH} because it is not affected by the renaming.
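The DEVPATH-based exclusion described above can be illustrated with a small shell sketch. The udev rule itself is not shown in this thread, so the helper below only mimics a glob match on the device path. The function names and the `/devices/virtual/net/veth*` pattern are assumptions made for illustration.

```shell
# Sketch only: is_veth_devpath and should_start_update_rps are hypothetical
# helpers. The /devices/virtual/net/veth* pattern is an assumption about
# how veth devices appear in DEVPATH; it is not copied from the udev rule.
is_veth_devpath() {
  case "$1" in
    /devices/virtual/net/veth*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds only for devices that should still get the update-rps service.
should_start_update_rps() {
  ! is_veth_devpath "$1"
}
```

Matching on the device path rather than on a property populated by a different rule avoids depending on the order in which udev rules run, which is the race mentioned in the comment.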

cynepco3hahue · 2021-06-23T10:09:45Z

/hold cancel

openshift-ci · 2021-06-23T13:07:15Z

@cynepco3hahue: All pull requests linked via external trackers have merged:

openshift-kni/performance-addon-operators#659

Bugzilla bug 1967317 has been moved to the MODIFIED state.

In response to this:

Bug 1967317: Do not create systemd update-rps service for veth devices


cynepco3hahue · 2021-06-23T13:15:12Z

/cherry-pick release-4.8

openshift-cherrypick-robot · 2021-06-23T13:15:46Z

@cynepco3hahue: #659 failed to apply on top of branch "release-4.8":

Applying: Do not create systemd update-rps service for veth devices
Using index info to reconstruct a base tree...
A	testdata/render-expected-output/manual_machineconfig.yaml
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): testdata/render-expected-output/manual_machineconfig.yaml deleted in HEAD and modified in Do not create systemd update-rps service for veth devices. Version Do not create systemd update-rps service for veth devices of testdata/render-expected-output/manual_machineconfig.yaml left in tree.
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Do not create systemd update-rps service for veth devices
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-4.8


…i_hook Bug 1967317: Do not create systemd update-rps service for veth devices (cherry picked from commit 33b9640) Signed-off-by: Artyom Lukianov <alukiano@redhat.com>

RPS handling at the pod container level using crio-hooks causes long delays when running the low latency script to set the RPS mask (https://bugzilla.redhat.com/show_bug.cgi?id=2109965).

For the RAN low latency solution, it might be sufficient to set RPS only at the host level and avoid setting it at the container level while utilizing RSS behavior.

In the past, the low latency hook was extended with additional RPS settings on virtual devices because starting and shutting down a large number of pods would trigger creation of the systemd service that updates the rps_cpus mask of the new interfaces, which could put additional CPU load on the cluster (openshift-kni/performance-addon-operators#659). This might no longer be the case, so we need to examine how reverting the aforementioned PR behaves now.

Co-authored-by: Yanir Quinn <yquinn@redhat.com>
Signed-off-by: Talor Itzhak <titzhak@redhat.com>

* set RPS for veth on host level only

  RPS handling at the pod container level using crio-hooks causes long delays when running the low latency script to set the RPS mask (https://bugzilla.redhat.com/show_bug.cgi?id=2109965).

  For the RAN low latency solution, it might be sufficient to set RPS only at the host level and avoid setting it at the container level while utilizing RSS behavior.

  In the past, the low latency hook was extended with additional RPS settings on virtual devices because starting and shutting down a large number of pods would trigger creation of the systemd service that updates the rps_cpus mask of the new interfaces, which could put additional CPU load on the cluster (openshift-kni/performance-addon-operators#659). This might no longer be the case, so we need to examine how reverting the aforementioned PR behaves now.

  Co-authored-by: Yanir Quinn <yquinn@redhat.com>
  Signed-off-by: Talor Itzhak <titzhak@redhat.com>

* set-rps-mask: remove `find`

  Since we're dealing only with virtual devices here, which have a single queue, we can set the RPS mask directly.

  Signed-off-by: Talor Itzhak <titzhak@redhat.com>

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Co-authored-by: Yanir Quinn <yquinn@redhat.com>


openshift-ci bot requested review from MarSik and yanirq June 15, 2021 17:43

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 15, 2021

cynepco3hahue force-pushed the update_rps_oci_hook branch from 39b8960 to ab0e39e Compare June 16, 2021 10:09

cynepco3hahue changed the title ~~Do not create systemd update-rps service for veth devices~~ Bug 1967317: Do not create systemd update-rps service for veth devices Jun 16, 2021

openshift-ci bot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jun 16, 2021

openshift-ci bot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jun 16, 2021

openshift-ci bot requested a review from gsr-shanks June 16, 2021 14:25

yanirq reviewed Jun 17, 2021


openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 17, 2021

openshift-ci bot assigned yanirq Jun 17, 2021

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 17, 2021

cynepco3hahue force-pushed the update_rps_oci_hook branch from ab0e39e to 09d666c Compare June 23, 2021 08:20

openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 23, 2021

cynepco3hahue force-pushed the update_rps_oci_hook branch from 09d666c to 9fecd2a Compare June 23, 2021 08:50

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 23, 2021

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 23, 2021

openshift-merge-robot merged commit 33b9640 into openshift-kni:master Jun 23, 2021

openshift-ci bot mentioned this pull request Jun 23, 2021

[release-4.8] Bug 1975348: Do not create systemd update-rps service for veth devices #667

Merged

yanirq mentioned this pull request Aug 30, 2022

POC: Set RPS for veth on host level only openshift/cluster-node-tuning-operator#451

Closed

Tal-or mentioned this pull request Oct 6, 2022

OCPBUGSM-47141: set RPS for veth only at host level openshift/cluster-node-tuning-operator#479

Merged

Tal-or mentioned this pull request Nov 6, 2022

[release-4.11] [manual] OCPBUGS-3182: set RPS for veth on host level only openshift/cluster-node-tuning-operator#508

Merged

Tal-or mentioned this pull request Nov 23, 2022

[release-4.10] [manual] OCPBUGS-4033: set RPS for veth on host level only #953

Merged
