OCPBUGS-27316: configure-ovs: generate profiles directly in /run #4042

Merged
merged 1 commit into from Mar 26, 2024

Conversation

Contributor

@jcaamano jcaamano commented Nov 21, 2023

Up until now, configure-ovs used nmcli to generate the profiles in /etc/NetworkManager/system-connections and only moved them to /run/NetworkManager/system-connections at the very end.

The problem with this is that if configure-ovs was killed by a hard power off, it could leave half-baked profiles in /etc, preventing a subsequent reboot from reaching a network-online state and running configure-ovs again.

This change makes use of the --temporary and --offline flags of nmcli to generate the profiles directly in /run to prevent that problem.

If configure-ovs is explicitly configured to generate profiles in /etc, it then moves them to /etc at the very end. This is subject to the problem stated above, but since it is not the default let's not worry too much about it.
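
For illustration, the approach described above could be sketched roughly as follows. This is a minimal sketch under assumptions, not the code in this PR: the helper name `add_conn_in_run` and the file naming are hypothetical, and it assumes an nmcli version that supports `--offline`, which prints the resulting keyfile on stdout instead of contacting the daemon.

```shell
#!/bin/bash
# Hypothetical helper: the function name and file naming are illustrative
# assumptions, not the functions used in the actual change.
NM_CONN_RUN_PATH="${NM_CONN_RUN_PATH:-/run/NetworkManager/system-connections}"

# Render a profile offline and write it straight into the runtime directory.
# Usage: add_conn_in_run <file-name> <nmcli connection add arguments...>
add_conn_in_run() {
  local name="$1"
  shift
  local dst_path="${NM_CONN_RUN_PATH}/${name}.nmconnection"
  (
    # Create the file without group/other access from the start
    umask 077
    # Offline mode: nothing is sent to the daemon, the keyfile goes to stdout
    nmcli --offline connection add "$@" > "$dst_path"
  ) || { rm -f "$dst_path"; return 1; }
  chmod 600 "$dst_path"
}

# Example call (arguments are illustrative):
#   add_conn_in_run br-ex type ovs-bridge con-name br-ex
```

Because the file only ever exists under /run, a hard power off leaves nothing behind for the next boot. NetworkManager would still need to be told about the new file, for example with `nmcli connection load`.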

@jcaamano
Contributor Author

/retest

@jcaamano
Contributor Author

/test ?

Contributor

openshift-ci bot commented Nov 21, 2023

@jcaamano: The following commands are available to trigger required jobs:

  • /test 4.12-upgrade-from-stable-4.11-images
  • /test cluster-bootimages
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-upgrade
  • /test e2e-gcp-op
  • /test e2e-gcp-op-single-node
  • /test e2e-hypershift
  • /test images
  • /test okd-scos-images
  • /test unit
  • /test verify

The following commands are available to trigger optional jobs:

  • /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade
  • /test bootstrap-unit
  • /test e2e-alibabacloud-ovn
  • /test e2e-aws-disruptive
  • /test e2e-aws-ovn-fips
  • /test e2e-aws-ovn-fips-op
  • /test e2e-aws-ovn-workers-rhel8
  • /test e2e-aws-proxy
  • /test e2e-aws-serial
  • /test e2e-aws-single-node
  • /test e2e-aws-upgrade-single-node
  • /test e2e-aws-workers-rhel8
  • /test e2e-azure
  • /test e2e-azure-ovn-upgrade
  • /test e2e-azure-upgrade
  • /test e2e-gcp-op-layering
  • /test e2e-gcp-ovn-rt-upgrade
  • /test e2e-gcp-rt
  • /test e2e-gcp-rt-op
  • /test e2e-gcp-single-node
  • /test e2e-gcp-upgrade
  • /test e2e-metal-assisted
  • /test e2e-metal-ipi
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-ipv6
  • /test e2e-openstack
  • /test e2e-openstack-dualstack
  • /test e2e-openstack-externallb
  • /test e2e-openstack-parallel
  • /test e2e-ovirt
  • /test e2e-ovirt-upgrade
  • /test e2e-ovn-step-registry
  • /test e2e-vsphere
  • /test e2e-vsphere-upgrade
  • /test e2e-vsphere-upi
  • /test e2e-vsphere-upi-zones
  • /test e2e-vsphere-zones
  • /test okd-e2e-aws
  • /test okd-e2e-gcp-op
  • /test okd-e2e-upgrade
  • /test okd-e2e-vsphere
  • /test okd-images
  • /test okd-scos-e2e-aws-ovn
  • /test okd-scos-e2e-gcp-op
  • /test okd-scos-e2e-gcp-ovn-upgrade
  • /test okd-scos-e2e-vsphere

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-machine-config-operator-master-bootstrap-unit
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn-upgrade
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-layering
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-single-node
  • pull-ci-openshift-machine-config-operator-master-e2e-hypershift
  • pull-ci-openshift-machine-config-operator-master-images
  • pull-ci-openshift-machine-config-operator-master-okd-images
  • pull-ci-openshift-machine-config-operator-master-okd-scos-e2e-aws-ovn
  • pull-ci-openshift-machine-config-operator-master-okd-scos-images
  • pull-ci-openshift-machine-config-operator-master-unit
  • pull-ci-openshift-machine-config-operator-master-verify

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jcaamano
Contributor Author

/test e2e-metal-ipi
/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-ipv6
/test e2e-metal-assisted
/test e2e-openstack
/test e2e-vsphere

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 4, 2024
@jcaamano
Contributor Author

/test e2e-metal-ipi
/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-ipv6
/test e2e-metal-assisted
/test e2e-openstack
/test e2e-vsphere

Up until now, configure-ovs would use nmcli to generate the profiles in
/etc/NetworkManager/system-connections and then at the very end it would
move them to /run/NetworkManager/system-connections

The problem with this is that if configure-ovs was killed due to a hard
power off, it could leave half baked profiles in /etc preventing a
subsequent reboot from reaching a network-online state and running
configure-ovs again.

This change makes use of the --temporary and --offline flags of nmcli
to generate the profiles directly in /run to prevent that problem.

If configure-ovs is explicitly configured to generate profiles in /etc
it would then move them at the very end to /etc. This is subject to the
problem stated above but since it is not the default let's not worry too
much about it.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
@jcaamano
Contributor Author

/title OCPBUGS-27316: configure-ovs: generate profiles directly in /run

@jcaamano
Contributor Author

/retitle OCPBUGS-27316: configure-ovs: generate profiles directly in /run

@openshift-ci openshift-ci bot changed the title configure-ovs: generate profiles directly in /run OCPBUGS-27316: configure-ovs: generate profiles directly in /run Mar 11, 2024
@jcaamano
Contributor Author

/test e2e-metal-ipi
/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-ipv6
/test e2e-metal-assisted
/test e2e-openstack
/test e2e-vsphere

@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 11, 2024
@openshift-ci-robot
Contributor

@jcaamano: This pull request references Jira Issue OCPBUGS-27316, which is invalid:

  • expected the bug to target the "4.16.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jcaamano
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 11, 2024
@openshift-ci-robot
Contributor

@jcaamano: This pull request references Jira Issue OCPBUGS-27316, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @anuragthehatter

In response to this:

/jira refresh

@jcaamano
Contributor Author

/retest

@jcaamano
Contributor Author

@cybertron PTAL

@cybertron
Member

/lgtm

Makes sense to me and seems like ci is mostly happy with it. The dual stack failures don't particularly look related, but since that job runs so rarely I pushed #4265 to test it.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 15, 2024
@jcaamano
Contributor Author

/test e2e-metal-ipi-ovn-dualstack

@jcaamano
Contributor Author

/assign @yuqi-zhang

Contributor

@yuqi-zhang yuqi-zhang left a comment

Changes seem fine, and CI looks good

Contributor

openshift-ci bot commented Mar 25, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cybertron, jcaamano, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 25, 2024
@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 4328697 and 2 for PR HEAD 2003da5 in total

Contributor

openshift-ci bot commented Mar 25, 2024

@jcaamano: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/okd-scos-e2e-aws-ovn | 2003da5 | link | false | /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD caeefea and 1 for PR HEAD 2003da5 in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD f398f28 and 0 for PR HEAD 2003da5 in total

@openshift-merge-bot openshift-merge-bot bot merged commit 3493ce2 into openshift:master Mar 26, 2024
20 of 23 checks passed
@openshift-ci-robot
Contributor

@jcaamano: Jira Issue OCPBUGS-27316: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-27316 has been moved to the MODIFIED state.

@openshift-merge-robot
Contributor

Fix included in accepted release 4.16.0-0.nightly-2024-03-28-223620

local src_path
src_path=$(mktemp)
shift
cat "$dst_path" > "$src_path"
Contributor

I assume the cat vs. cp vs. mv is due to the selinux issue?

Contributor Author

just because I used mktemp I think

cat "$dst_path" > "$src_path"
rm -f "$dst_path"
nmcli --offline c mod "$@" < "$src_path" > "$dst_path"
rm -f "$src_path"
Contributor

Do we need to rm -f "$src_path" in a trap handler in case nmcli --offline c mod fails?

Contributor Author

hmm why?

Contributor

maybe handle_exit cleans it up? The chmod 600 implies some secrecy in .nmconnection files, so we don't want them lying around after failure?

Do we need to chmod 600 "${src_path}" ?

Contributor Author

good point, will take a look and fix if needed
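
One way the cleanup concern raised in this thread could be handled is sketched below. This is an assumption-laden illustration, not the merged code: `mod_nm_conn_safe` is a hypothetical name, and the sketch renders the modified profile into a second temporary file so the real profile is only overwritten once nmcli has succeeded. It relies on `mktemp` creating files that are readable and writable by the owner only, which is the documented behaviour of GNU mktemp.

```shell
#!/bin/bash
# Hypothetical variant of the helper under review; a sketch, not the merged code.
# Usage: mod_nm_conn_safe <profile-path> <nmcli connection modify arguments...>
mod_nm_conn_safe() {
  local dst_path="$1"
  shift
  local src_path new_path rc=0
  # GNU mktemp creates files readable and writable by the owner only
  src_path=$(mktemp) || return 1
  new_path=$(mktemp) || { rm -f "$src_path"; return 1; }
  if cat "$dst_path" > "$src_path"; then
    # Offline mode: read the profile on stdin, print the modified one on stdout
    nmcli --offline connection modify "$@" < "$src_path" > "$new_path" || rc=$?
  else
    rc=1
  fi
  if [ "$rc" -eq 0 ]; then
    # Only touch the real profile once nmcli has succeeded
    cat "$new_path" > "$dst_path" && chmod 600 "$dst_path" || rc=$?
  fi
  # Temporary copies are removed on success and on failure
  rm -f "$src_path" "$new_path"
  return "$rc"
}
```

With this shape no trap handler is needed for the nmcli failure case, because both temporary files are removed on every return path after they are created; a trap would still be required to cover the script being killed mid-call.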

mod_nm_conn() {
# the easiest thing to do here would be to use `nmcli c mod --temporary`
# but there is a bug in selinux profiles that denies NM from performing
# the operation
Contributor

do we need to track this bug here?

Contributor Author

@jcaamano jcaamano Apr 4, 2024

I should file one yeah, will do.

  shopt -s nullglob
- new_conn_files=(${NM_CONN_CONF_PATH}/"${ovs_interface}"*)
+ new_conn_files=(${NM_CONN_RUN_PATH}/"${ovs_interface}"*)
Contributor

quotes "${NM_CONN_RUN_PATH}" ?

Contributor Author

constant with no spaces so there shouldn't be a problem for now I guess
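
For reference, quoting the directory while keeping the glob active could look like the sketch below. The function name `list_conn_files` and the global `conn_files` array are hypothetical and used only for illustration; the sketch assumes bash.

```shell
#!/bin/bash
# With nullglob set, a pattern that matches nothing expands to nothing
# instead of being kept as a literal string.
shopt -s nullglob

# Hypothetical helper: collect the files in directory $1 whose names start
# with prefix $2 into the global array conn_files.
list_conn_files() {
  # The variable parts are quoted; the * stays outside the quotes so it still globs
  conn_files=("$1"/"$2"*)
}
```

Quoted this way, the expansion stays correct even if the directory constant ever gains a space, and it behaves identically to the unquoted form when it does not.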

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/severity-moderate: Referenced Jira bug's severity is moderate for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
7 participants