Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1967317: Do not create systemd update-rps service for veth devices #659

Merged

Conversation

cynepco3hahue
Copy link
Contributor

The start and shutdown of big amount of pods will initiate the creation of the systemd
service that should update the new interfaces rps_cpus mask and can create an additional
CPU load under the cluster.

The PR introduces to changes that should prevent it:

  1. The OCI hook will update the pod virtual interfaces RPS mask under the node.
  2. Exclude veth devices from the udev rule.

Signed-off-by: Artyom Lukianov alukiano@redhat.com

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 15, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cynepco3hahue

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot requested review from MarSik and yanirq June 15, 2021 17:43
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 15, 2021
@coveralls
Copy link

coveralls commented Jun 15, 2021

Pull Request Test Coverage Report for Build 1563

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 75.991%

Totals Coverage Status
Change from base Build 1561: 0.0%
Covered Lines: 1399
Relevant Lines: 1841

💛 - Coveralls

@cynepco3hahue cynepco3hahue changed the title Do not create systemd update-rps service for veth devices Bug 1967317: Do not create systemd update-rps service for veth devices Jun 16, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 16, 2021

@cynepco3hahue: This pull request references Bugzilla bug 1967317, which is invalid:

  • expected the bug to target the "4.9.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1967317: Do not create systemd update-rps service for veth devices

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jun 16, 2021
@cynepco3hahue
Copy link
Contributor Author

/bugzilla refresh

@openshift-ci openshift-ci bot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jun 16, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 16, 2021

@cynepco3hahue: This pull request references Bugzilla bug 1967317, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.9.0) matches configured target release for branch (4.9.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @gsr-shanks

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from gsr-shanks June 16, 2021 14:25
netns_link_indexes=$(ip netns exec "${ns}" ip -j link | jq ".[] | select(.link_index != null) | .link_index")
for link_index in ${netns_link_indexes}; do
container_veth=$(ip -j link | jq ".[] | select(.ifindex == ${link_index}) | .ifname" | tr -d '"')
echo ${mask} > /sys/devices/virtual/net/${container_veth}/queues/rx-0/rps_cpus
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that the only path to be added ? rx-0 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

@yanirq
Copy link
Member

yanirq commented Jun 17, 2021

/lgtm
/hold - in case others want to have a second look

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 17, 2021
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 17, 2021
@cynepco3hahue
Copy link
Contributor Author

@browsell @MarSik I think we want this fix for 4.8?

@MarSik
Copy link
Member

MarSik commented Jun 17, 2021

Yes, I think we want to backport it. Has it been tested yet?

@cynepco3hahue
Copy link
Contributor Author

Yes, I think we want to backport it. Has it been tested yet?

I tested it on my cluster. @browsell Did you have a chance to run it on top of the SNO cluster?

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 23, 2021
The start and shutdown of big amount of pods will initiate creation of the systemd
service that should update the new interfaces `rps_cpus` mask and can create an additional
CPU load under the cluster.

The PR introduces to changes that should prevent it:
1. The OCI hook will update the pod virtual interfaces RPS mask under the node.
2. Exclude veth devices from the udev rule.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
@yanirq
Copy link
Member

yanirq commented Jun 23, 2021

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 23, 2021
@cynepco3hahue
Copy link
Contributor Author

We can not really on ID_NET_DRIVER because another udev rule set it by using ethtool and sed so probably it can be some races, but I can rely on ENV{DEVPATH} because it not affected by the renaming.

@cynepco3hahue
Copy link
Contributor Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 23, 2021
@openshift-merge-robot openshift-merge-robot merged commit 33b9640 into openshift-kni:master Jun 23, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 23, 2021

@cynepco3hahue: All pull requests linked via external trackers have merged:

Bugzilla bug 1967317 has been moved to the MODIFIED state.

In response to this:

Bug 1967317: Do not create systemd update-rps service for veth devices

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cynepco3hahue
Copy link
Contributor Author

/cherry-pick release-4.8

@openshift-cherrypick-robot

@cynepco3hahue: #659 failed to apply on top of branch "release-4.8":

Applying: Do not create systemd update-rps service for veth devices
Using index info to reconstruct a base tree...
A	testdata/render-expected-output/manual_machineconfig.yaml
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): testdata/render-expected-output/manual_machineconfig.yaml deleted in HEAD and modified in Do not create systemd update-rps service for veth devices. Version Do not create systemd update-rps service for veth devices of testdata/render-expected-output/manual_machineconfig.yaml left in tree.
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Do not create systemd update-rps service for veth devices
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-4.8

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

cynepco3hahue pushed a commit to cynepco3hahue/performance-addon-operators that referenced this pull request Jun 23, 2021
…i_hook

Bug 1967317: Do not create systemd update-rps service for veth devices

(cherry picked from commit 33b9640)
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/performance-addon-operators that referenced this pull request Nov 5, 2021
…i_hook

Bug 1967317: Do not create systemd update-rps service for veth devices

(cherry picked from commit 33b9640)
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
Tal-or added a commit to Tal-or/cluster-node-tuning-operator that referenced this pull request Oct 6, 2022
RPS handling on pod container level using crio-hooks causes long delay times when running the low latency script to set the RPS mask (https://bugzilla.redhat.com/show_bug.cgi?id=2109965)

For RAN low latency solution it might be sufficient only to set the RPS on the host level and avoid setting it on the container level while utilizing RSS behavior.

In the past the low latency hook was added with RPS additional settings on virtual devices since there was an issue where the start and shutdown of big amount of pods will initiate the creation of the systemd service that should update the new interfaces rps_cpus mask and can create an additional CPU load under the cluster (openshift-kni/performance-addon-operators#659)
This might not be the case any more thus we need to examine how the revert of the aforementioned PR will behave now.

Co-authored-by: Yanir Quinn <yquinn@redhat.com>
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Tal-or added a commit to Tal-or/cluster-node-tuning-operator that referenced this pull request Oct 6, 2022
RPS handling on pod container level using crio-hooks causes long delay times when running the low latency script to set the RPS mask (https://bugzilla.redhat.com/show_bug.cgi?id=2109965)

For RAN low latency solution it might be sufficient only to set the RPS on the host level and avoid setting it on the container level while utilizing RSS behavior.

In the past the low latency hook was added with RPS additional settings on virtual devices since there was an issue where the start and shutdown of big amount of pods will initiate the creation of the systemd service that should update the new interfaces rps_cpus mask and can create an additional CPU load under the cluster (openshift-kni/performance-addon-operators#659)
This might not be the case any more thus we need to examine how the revert of the aforementioned PR will behave now.

Co-authored-by: Yanir Quinn <yquinn@redhat.com>
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Tal-or added a commit to Tal-or/cluster-node-tuning-operator that referenced this pull request Oct 6, 2022
RPS handling on pod container level using crio-hooks causes long delay times when running the low latency script to set the RPS mask (https://bugzilla.redhat.com/show_bug.cgi?id=2109965)

For RAN low latency solution it might be sufficient only to set the RPS on the host level and avoid setting it on the container level while utilizing RSS behavior.

In the past the low latency hook was added with RPS additional settings on virtual devices since there was an issue where the start and shutdown of big amount of pods will initiate the creation of the systemd service that should update the new interfaces rps_cpus mask and can create an additional CPU load under the cluster (openshift-kni/performance-addon-operators#659)
This might not be the case any more thus we need to examine how the revert of the aforementioned PR will behave now.

Co-authored-by: Yanir Quinn <yquinn@redhat.com>
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
openshift-merge-robot pushed a commit to openshift/cluster-node-tuning-operator that referenced this pull request Oct 10, 2022
* set RPS for veth on host level only

RPS handling on pod container level using crio-hooks causes long delay times when running the low latency script to set the RPS mask (https://bugzilla.redhat.com/show_bug.cgi?id=2109965)

For RAN low latency solution it might be sufficient only to set the RPS on the host level and avoid setting it on the container level while utilizing RSS behavior.

In the past the low latency hook was added with RPS additional settings on virtual devices since there was an issue where the start and shutdown of big amount of pods will initiate the creation of the systemd service that should update the new interfaces rps_cpus mask and can create an additional CPU load under the cluster (openshift-kni/performance-addon-operators#659)
This might not be the case any more thus we need to examine how the revert of the aforementioned PR will behave now.

Co-authored-by: Yanir Quinn <yquinn@redhat.com>
Signed-off-by: Talor Itzhak <titzhak@redhat.com>

* set-rps-mask: remove `find`

Since we're dealing only with virtual devices here which has a single queue, we can set the RPS mask directly.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Co-authored-by: Yanir Quinn <yquinn@redhat.com>
Tal-or added a commit to Tal-or/cluster-node-tuning-operator that referenced this pull request Nov 6, 2022
RPS handling on pod container level using crio-hooks causes long delay times when running the low latency script to set the RPS mask (https://bugzilla.redhat.com/show_bug.cgi?id=2109965)

For RAN low latency solution it might be sufficient only to set the RPS on the host level and avoid setting it on the container level while utilizing RSS behavior.

In the past the low latency hook was added with RPS additional settings on virtual devices since there was an issue where the start and shutdown of big amount of pods will initiate the creation of the systemd service that should update the new interfaces rps_cpus mask and can create an additional CPU load under the cluster (openshift-kni/performance-addon-operators#659)
This might not be the case any more thus we need to examine how the revert of the aforementioned PR will behave now.

Co-authored-by: Yanir Quinn <yquinn@redhat.com>
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Tal-or added a commit to Tal-or/performance-addon-operators that referenced this pull request Nov 23, 2022
RPS handling on pod container level using crio-hooks causes long delay
times when running the low latency script to set the RPS mask
(https://bugzilla.redhat.com/show_bug.cgi?id=2109965)

For RAN low latency solution it might be sufficient
only to set the RPS on the host level and avoid
setting it on the container level while utilizing RSS behavior.

In the past the low latency hook was added with RPS additional
settings on virtual devices since there was an issue where the start
and shutdown of big amount of pods will initiate the creation
of the systemd service that should update the new interfaces
rps_cpus mask and can create an additional CPU load
under the cluster (openshift-kni#659)

This might not be the case any more thus we need to examine
how the revert of the aforementioned PR will behave now.

Co-authored-by: Yanir Quinn <yquinn@redhat.com>
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Tal-or added a commit to Tal-or/performance-addon-operators that referenced this pull request Nov 23, 2022
RPS handling on pod container level using crio-hooks causes long delay
times when running the low latency script to set the RPS mask
(https://bugzilla.redhat.com/show_bug.cgi?id=2109965)

For RAN low latency solution it might be sufficient
only to set the RPS on the host level and avoid
setting it on the container level while utilizing RSS behavior.

In the past the low latency hook was added with RPS additional
settings on virtual devices since there was an issue where the start
and shutdown of big amount of pods will initiate the creation
of the systemd service that should update the new interfaces
rps_cpus mask and can create an additional CPU load
under the cluster (openshift-kni#659)

This might not be the case any more thus we need to examine
how the revert of the aforementioned PR will behave now.

Co-authored-by: Yanir Quinn <yquinn@redhat.com>
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Tal-or added a commit to Tal-or/cluster-node-tuning-operator that referenced this pull request Nov 23, 2022
RPS handling on pod container level using crio-hooks causes long delay times when running the low latency script to set the RPS mask (https://bugzilla.redhat.com/show_bug.cgi?id=2109965)

For RAN low latency solution it might be sufficient only to set the RPS on the host level and avoid setting it on the container level while utilizing RSS behavior.

In the past the low latency hook was added with RPS additional settings on virtual devices since there was an issue where the start and shutdown of big amount of pods will initiate the creation of the systemd service that should update the new interfaces rps_cpus mask and can create an additional CPU load under the cluster (openshift-kni/performance-addon-operators#659)
This might not be the case any more thus we need to examine how the revert of the aforementioned PR will behave now.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Tal-or added a commit to Tal-or/performance-addon-operators that referenced this pull request Nov 23, 2022
RPS handling on pod container level using crio-hooks causes long delay
times when running the low latency script to set the RPS mask
(https://bugzilla.redhat.com/show_bug.cgi?id=2109965)

For RAN low latency solution it might be sufficient
only to set the RPS on the host level and avoid
setting it on the container level while utilizing RSS behavior.

In the past the low latency hook was added with RPS additional
settings on virtual devices since there was an issue where the start
and shutdown of big amount of pods will initiate the creation
of the systemd service that should update the new interfaces
rps_cpus mask and can create an additional CPU load
under the cluster (openshift-kni#659)

This might not be the case any more thus we need to examine
how the revert of the aforementioned PR will behave now.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
openshift-merge-robot pushed a commit to openshift/cluster-node-tuning-operator that referenced this pull request Nov 24, 2022
RPS handling on pod container level using crio-hooks causes long delay times when running the low latency script to set the RPS mask (https://bugzilla.redhat.com/show_bug.cgi?id=2109965)

For RAN low latency solution it might be sufficient only to set the RPS on the host level and avoid setting it on the container level while utilizing RSS behavior.

In the past the low latency hook was added with RPS additional settings on virtual devices since there was an issue where the start and shutdown of big amount of pods will initiate the creation of the systemd service that should update the new interfaces rps_cpus mask and can create an additional CPU load under the cluster (openshift-kni/performance-addon-operators#659)
This might not be the case any more thus we need to examine how the revert of the aforementioned PR will behave now.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
IlyaTyomkin pushed a commit to IlyaTyomkin/cluster-node-tuning-operator that referenced this pull request May 23, 2023
* set RPS for veth on host level only

RPS handling on pod container level using crio-hooks causes long delay times when running the low latency script to set the RPS mask (https://bugzilla.redhat.com/show_bug.cgi?id=2109965)

For RAN low latency solution it might be sufficient only to set the RPS on the host level and avoid setting it on the container level while utilizing RSS behavior.

In the past the low latency hook was added with RPS additional settings on virtual devices since there was an issue where the start and shutdown of big amount of pods will initiate the creation of the systemd service that should update the new interfaces rps_cpus mask and can create an additional CPU load under the cluster (openshift-kni/performance-addon-operators#659)
This might not be the case any more thus we need to examine how the revert of the aforementioned PR will behave now.

Co-authored-by: Yanir Quinn <yquinn@redhat.com>
Signed-off-by: Talor Itzhak <titzhak@redhat.com>

* set-rps-mask: remove `find`

Since we're dealing only with virtual devices here which has a single queue, we can set the RPS mask directly.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Co-authored-by: Yanir Quinn <yquinn@redhat.com>
IlyaTyomkin pushed a commit to IlyaTyomkin/cluster-node-tuning-operator that referenced this pull request Jun 13, 2023
* set RPS for veth on host level only

RPS handling on pod container level using crio-hooks causes long delay times when running the low latency script to set the RPS mask (https://bugzilla.redhat.com/show_bug.cgi?id=2109965)

For RAN low latency solution it might be sufficient only to set the RPS on the host level and avoid setting it on the container level while utilizing RSS behavior.

In the past the low latency hook was added with RPS additional settings on virtual devices since there was an issue where the start and shutdown of big amount of pods will initiate the creation of the systemd service that should update the new interfaces rps_cpus mask and can create an additional CPU load under the cluster (openshift-kni/performance-addon-operators#659)
This might not be the case any more thus we need to examine how the revert of the aforementioned PR will behave now.

Co-authored-by: Yanir Quinn <yquinn@redhat.com>
Signed-off-by: Talor Itzhak <titzhak@redhat.com>

* set-rps-mask: remove `find`

Since we're dealing only with virtual devices here which has a single queue, we can set the RPS mask directly.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Co-authored-by: Yanir Quinn <yquinn@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants