OCPBUGS-27316: configure-ovs: generate profiles directly in /run #4042

jcaamano · 2023-11-21T12:00:39Z

Up until now, configure-ovs would use nmcli to generate the profiles in /etc/NetworkManager/system-connections and then at the very end it would move them to /run/NetworkManager/system-connections.

The problem with this is that if configure-ovs was killed due to a hard power off, it could leave half-baked profiles in /etc, preventing a subsequent reboot from reaching a network-online state and running configure-ovs again.

This change makes use of the --temporary and --offline flags of nmcli to generate the profiles directly in /run to prevent that problem.

If configure-ovs is explicitly configured to generate profiles in /etc, it then moves them to /etc at the very end. This is subject to the problem stated above, but since that is not the default, let's not worry too much about it.
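
For readers unfamiliar with the offline mode, the idea can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the function name `gen_nm_conn_in_run`, the `NMCLI` override variable, and the `<name>.nmconnection` file naming are assumptions made for the example. It relies only on `nmcli --offline connection add` printing the resulting keyfile to stdout instead of talking to the NetworkManager daemon.

```shell
#!/bin/bash
# Hypothetical sketch: write a NetworkManager profile straight into /run,
# so an interrupted run never leaves a partial profile behind in /etc.

NM_CONN_RUN_PATH="${NM_CONN_RUN_PATH:-/run/NetworkManager/system-connections}"
NMCLI="${NMCLI:-nmcli}"   # overridable for testing; an assumption of this sketch

# gen_nm_conn_in_run <name> <nmcli connection add arguments...>
gen_nm_conn_in_run() {
  local name="$1"
  shift
  local dst_path="${NM_CONN_RUN_PATH}/${name}.nmconnection"
  mkdir -p "${NM_CONN_RUN_PATH}"
  # In offline mode nmcli only emits the keyfile on stdout. The subshell
  # umask makes the new file readable and writable by its owner only.
  if ! ( umask 077; "$NMCLI" --offline connection add con-name "$name" "$@" > "$dst_path" ); then
    rm -f "$dst_path"   # do not leave an empty or partial profile behind
    return 1
  fi
}

# Example call (not executed here):
#   gen_nm_conn_in_run br-ex type ovs-bridge conn.interface br-ex
```

Because /run is a tmpfs, anything written there disappears on reboot, which is what makes a hard power off harmless in this layout. NetworkManager would still need to be told about the new file, for example with `nmcli connection load <path>`.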

jcaamano · 2023-11-21T15:07:37Z

/retest

jcaamano · 2023-11-21T15:09:09Z

/test ?

openshift-ci · 2023-11-21T15:09:49Z

@jcaamano: The following commands are available to trigger required jobs:

/test 4.12-upgrade-from-stable-4.11-images
/test cluster-bootimages
/test e2e-aws-ovn
/test e2e-aws-ovn-upgrade
/test e2e-gcp-op
/test e2e-gcp-op-single-node
/test e2e-hypershift
/test images
/test okd-scos-images
/test unit
/test verify

The following commands are available to trigger optional jobs:

/test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade
/test bootstrap-unit
/test e2e-alibabacloud-ovn
/test e2e-aws-disruptive
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-fips-op
/test e2e-aws-ovn-workers-rhel8
/test e2e-aws-proxy
/test e2e-aws-serial
/test e2e-aws-single-node
/test e2e-aws-upgrade-single-node
/test e2e-aws-workers-rhel8
/test e2e-azure
/test e2e-azure-ovn-upgrade
/test e2e-azure-upgrade
/test e2e-gcp-op-layering
/test e2e-gcp-ovn-rt-upgrade
/test e2e-gcp-rt
/test e2e-gcp-rt-op
/test e2e-gcp-single-node
/test e2e-gcp-upgrade
/test e2e-metal-assisted
/test e2e-metal-ipi
/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-ipv6
/test e2e-openstack
/test e2e-openstack-dualstack
/test e2e-openstack-externallb
/test e2e-openstack-parallel
/test e2e-ovirt
/test e2e-ovirt-upgrade
/test e2e-ovn-step-registry
/test e2e-vsphere
/test e2e-vsphere-upgrade
/test e2e-vsphere-upi
/test e2e-vsphere-upi-zones
/test e2e-vsphere-zones
/test okd-e2e-aws
/test okd-e2e-gcp-op
/test okd-e2e-upgrade
/test okd-e2e-vsphere
/test okd-images
/test okd-scos-e2e-aws-ovn
/test okd-scos-e2e-gcp-op
/test okd-scos-e2e-gcp-ovn-upgrade
/test okd-scos-e2e-vsphere

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-machine-config-operator-master-bootstrap-unit
pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn
pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn-upgrade
pull-ci-openshift-machine-config-operator-master-e2e-gcp-op
pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-layering
pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-single-node
pull-ci-openshift-machine-config-operator-master-e2e-hypershift
pull-ci-openshift-machine-config-operator-master-images
pull-ci-openshift-machine-config-operator-master-okd-images
pull-ci-openshift-machine-config-operator-master-okd-scos-e2e-aws-ovn
pull-ci-openshift-machine-config-operator-master-okd-scos-images
pull-ci-openshift-machine-config-operator-master-unit
pull-ci-openshift-machine-config-operator-master-verify

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jcaamano · 2023-11-21T15:11:47Z

/test e2e-metal-ipi
/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-ipv6
/test e2e-metal-assisted
/test e2e-openstack
/test e2e-vsphere

openshift-bot · 2024-03-04T01:00:18Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

jcaamano · 2024-03-11T11:36:35Z

/test e2e-metal-ipi
/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-ipv6
/test e2e-metal-assisted
/test e2e-openstack
/test e2e-vsphere

Up until now, configure-ovs would use nmcli to generate the profiles in /etc/NetworkManager/system-connections and then at the very end it would move them to /run/NetworkManager/system-connections.

The problem with this is that if configure-ovs was killed due to a hard power off, it could leave half-baked profiles in /etc, preventing a subsequent reboot from reaching a network-online state and running configure-ovs again.

This change makes use of the --temporary and --offline flags of nmcli to generate the profiles directly in /run to prevent that problem.

If configure-ovs is explicitly configured to generate profiles in /etc, it then moves them to /etc at the very end. This is subject to the problem stated above, but since that is not the default, let's not worry too much about it.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>

jcaamano · 2024-03-11T11:43:27Z

/title OCPBUGS-27316: configure-ovs: generate profiles directly in /run

jcaamano · 2024-03-11T11:43:44Z

/retitle OCPBUGS-27316: configure-ovs: generate profiles directly in /run

jcaamano · 2024-03-11T11:44:05Z

/test e2e-metal-ipi
/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-ipv6
/test e2e-metal-assisted
/test e2e-openstack
/test e2e-vsphere

openshift-ci-robot · 2024-03-11T11:44:10Z

@jcaamano: This pull request references Jira Issue OCPBUGS-27316, which is invalid:

expected the bug to target the "4.16.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

jcaamano · 2024-03-11T11:46:56Z

/jira refresh

openshift-ci-robot · 2024-03-11T11:47:03Z

@jcaamano: This pull request references Jira Issue OCPBUGS-27316, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.16.0) matches configured target version for branch (4.16.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @anuragthehatter

In response to this:

/jira refresh

jcaamano · 2024-03-11T17:17:47Z

/retest

jcaamano · 2024-03-12T09:33:49Z

/retest

jcaamano · 2024-03-12T15:26:21Z

@cybertron PTAL

cybertron · 2024-03-15T21:42:19Z

/lgtm

Makes sense to me and seems like ci is mostly happy with it. The dual stack failures don't particularly look related, but since that job runs so rarely I pushed #4265 to test it.

jcaamano · 2024-03-18T08:58:15Z

/test e2e-metal-ipi-ovn-dualstack

jcaamano · 2024-03-25T12:01:28Z

/test e2e-metal-ipi-ovn-dualstack

jcaamano · 2024-03-25T15:17:51Z

/assign @yuqi-zhang

yuqi-zhang

Changes seem fine, and CI looks good

openshift-ci · 2024-03-25T15:59:40Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cybertron, jcaamano, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [yuqi-zhang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2024-03-25T16:11:12Z

/retest-required

Remaining retests: 0 against base HEAD 4328697 and 2 for PR HEAD 2003da5 in total

openshift-ci · 2024-03-25T19:06:10Z

@jcaamano: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-scos-e2e-aws-ovn | `2003da5` | link | false | `/test okd-scos-e2e-aws-ovn` |

openshift-ci-robot · 2024-03-25T20:06:13Z

/retest-required

Remaining retests: 0 against base HEAD caeefea and 1 for PR HEAD 2003da5 in total

openshift-ci-robot · 2024-03-26T01:06:16Z

/retest-required

Remaining retests: 0 against base HEAD f398f28 and 0 for PR HEAD 2003da5 in total

openshift-ci-robot · 2024-03-26T01:49:15Z

@jcaamano: Jira Issue OCPBUGS-27316: All pull requests linked via external trackers have merged:

openshift/machine-config-operator#4042

Jira Issue OCPBUGS-27316 has been moved to the MODIFIED state.

openshift-merge-robot · 2024-03-29T05:29:10Z

Fix included in accepted release 4.16.0-0.nightly-2024-03-28-223620

rbbratta · 2024-04-03T18:34:27Z

templates/common/_base/files/configure-ovs-network.yaml

+      local src_path
+      src_path=$(mktemp)
+      shift
+      cat "$dst_path" > "$src_path"


I assume the cat vs. cp vs. mv is due to the selinux issue?

just because I used mktemp I think

rbbratta · 2024-04-03T18:35:59Z

templates/common/_base/files/configure-ovs-network.yaml

+      cat "$dst_path" > "$src_path"
+      rm -f "$dst_path"
+      nmcli --offline c mod "$@" < "$src_path" > "$dst_path"
+      rm -f "$src_path"


Do we need to rm -f "$src_path" in a trap handler in case nmcli --offline c mod fails?

maybe handle_exit cleans it up? The chmod 600 implies some secrecy in .nmconnection files, so we don't want them lying around after failure?

Do we need to chmod 600 "${src_path}" ?

good point, will take a look and fix if needed
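
One way the cleanup concern raised here could be addressed is sketched below. This is not the fix that landed: `mod_nm_conn_safe`, the `NMCLI` override, and the restore-on-failure step are assumptions made for illustration. The sketch runs the function body in a subshell so that an `EXIT` trap removes the scratch copy on every exit path. GNU `mktemp` already creates its file readable and writable by the owner only, so a separate `chmod 600` on the scratch copy is probably unnecessary.

```shell
#!/bin/bash
# Hypothetical variant of mod_nm_conn that always removes its scratch copy.

NMCLI="${NMCLI:-nmcli}"   # overridable for testing; an assumption of this sketch

# mod_nm_conn_safe <profile path> <nmcli connection modify arguments...>
# The body is a subshell, so the EXIT trap is scoped to a single call.
mod_nm_conn_safe() (
  umask 077                       # any file created below is owner-only
  dst_path="$1"
  shift
  src_path=$(mktemp) || exit 1
  trap 'rm -f "$src_path"' EXIT   # runs on success and on failure
  cat "$dst_path" > "$src_path" || exit 1
  rm -f "$dst_path"
  # Offline modify reads the keyfile on stdin and writes the result to stdout.
  if ! "$NMCLI" --offline connection modify "$@" < "$src_path" > "$dst_path"; then
    cat "$src_path" > "$dst_path" # put the original profile back
    exit 1
  fi
)
```

The subshell form avoids touching whatever `EXIT` trap the calling script has already installed, at the cost of not being able to set variables in the caller.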

rbbratta · 2024-04-03T18:38:07Z

templates/common/_base/files/configure-ovs-network.yaml

+    mod_nm_conn() {
+      # the easiest thing to do here would be to use `nmcli c mod --temporary`
+      # but there is a bug in selinux profiles that denies NM from performing
+      # the operation


do we need to track this bug here?

I should file one yeah, will do.

rbbratta · 2024-04-03T19:13:50Z

templates/common/_base/files/configure-ovs-network.yaml

          shopt -s nullglob
-          new_conn_files=(${NM_CONN_CONF_PATH}/"${ovs_interface}"*)
+          new_conn_files=(${NM_CONN_RUN_PATH}/"${ovs_interface}"*)


quotes "${NM_CONN_RUN_PATH}" ?

constant with no spaces so there shouldn't be a problem for now I guess
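
To illustrate what the quoting question is guarding against, here is a small standalone demonstration. The directory name with a space is hypothetical and chosen only to trigger the problem; the real constant contains no spaces, so the script as merged is unaffected.

```shell
#!/bin/bash
# Shows how the unquoted form misbehaves once the directory contains a space.

demo_dir=$(mktemp -d)
conn_dir="${demo_dir}/system connections"   # hypothetical path with a space
mkdir -p "$conn_dir"
touch "${conn_dir}/br-ex.nmconnection"
ovs_interface="br-ex"

shopt -s nullglob
# Unquoted: the path is word-split at the space before globbing,
# so the real file is never matched.
unquoted=(${conn_dir}/"${ovs_interface}"*)
# Quoted directory: the path stays one word and the trailing * still expands.
quoted=("${conn_dir}"/"${ovs_interface}"*)

printf 'unquoted: %s\n' "${unquoted[@]}"
printf 'quoted:   %s\n' "${quoted[@]}"
```

Quoting the variable while leaving the `*` outside the quotes keeps the glob active, so the quoted form costs nothing and stays correct if the constant ever changes.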

openshift-ci bot requested review from djoshy and yuqi-zhang November 21, 2023 12:03

openshift-ci bot added the lifecycle/stale label Mar 4, 2024

jcaamano force-pushed the configure-ovs-run branch from 75d8b16 to 224b230 Compare March 11, 2024 11:35

jcaamano force-pushed the configure-ovs-run branch from 224b230 to 2003da5 Compare March 11, 2024 11:43

openshift-ci bot changed the title ~~configure-ovs: generate profiles directly in /run~~ OCPBUGS-27316: configure-ovs: generate profiles directly in /run Mar 11, 2024

openshift-ci-robot added the jira/valid-bug label and removed the jira/invalid-bug label Mar 11, 2024

openshift-ci bot requested a review from anuragthehatter March 11, 2024 11:47

openshift-ci bot assigned cybertron Mar 15, 2024

openshift-ci bot added the lgtm label Mar 15, 2024

openshift-ci bot assigned yuqi-zhang Mar 25, 2024

yuqi-zhang approved these changes Mar 25, 2024

openshift-ci bot added the approved label Mar 25, 2024

openshift-merge-bot bot merged commit 3493ce2 into openshift:master Mar 26, 2024
20 of 23 checks passed

rbbratta reviewed Apr 3, 2024
