Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-16035: daemon: create /etc/systemd/network directory on node #3883

Merged
merged 1 commit into from Aug 25, 2023

Conversation

sinnykumari
Copy link
Contributor

OCPBUGS-16035: On some system, nmstatectl fails to persists nic due to missing /etc/systemd/network. Until we use fixed version of nmstate in MCO, this workaround will ensure that the process happens successfully.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 24, 2023
@sinnykumari sinnykumari changed the title daemon: create /etc/systemd/network directory on node OCPBUGS-16035: daemon: create /etc/systemd/network directory on node Aug 24, 2023
@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Aug 24, 2023
@openshift-ci-robot
Copy link
Contributor

@sinnykumari: This pull request references Jira Issue OCPBUGS-16035, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

OCPBUGS-16035: On some system, nmstatectl fails to persists nic due to missing /etc/systemd/network. Until we use fixed version of nmstate in MCO, this workaround will ensure that the process happens successfully.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from sergiordlr August 24, 2023 13:43
@sinnykumari sinnykumari requested review from cgwalters and removed request for sergiordlr August 24, 2023 13:43
@@ -1764,6 +1764,27 @@ func fileExists(path string) (bool, error) {
return false, fmt.Errorf("cannot stat file: %w", err)
}

// Determines if a directory exists by checking the returned error when we stat the file.
// Also, check that it is a directory.
func directoryExists(path string) (bool, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not objecting to this, but I don't personally think it's worth carrying over just using os.MkdirAll() from the standard library like we do in other places (a lot of other places really).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, doing all these additional check isn't adding much value. Replaced with using MkdirAll()

OCPBUGS-16035: On some system, nmstatectl fails to persists
nic due to missing /etc/systemd/network. Until we use fixed
version of nmstate in MCO, this workaround will ensure that
the process happens successfully.
@cgwalters
Copy link
Member

/lgtm

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 24, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, sinnykumari

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [cgwalters,sinnykumari]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 24, 2023
@sinnykumari
Copy link
Contributor Author

Putting hold for qe testing
/hold
/assign @sergiordlr

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 24, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 24, 2023

@sinnykumari: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn 1212a76 link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sergiordlr
Copy link

sergiordlr commented Aug 25, 2023

Verified using UPI on AWS

We don't have an environment to reproduce the original problem.

After talking with @sinnykumari we executed the following verification steps.

  1. Deploy a 4.12 UPI on AWS cluster using OVNKubernetes network
  2. Pause worker MCP
  3. Upgrade to 4.13.10-x86_64
  4. Remove the /etc/systemd/network directory
$ oc debug  node/$(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}")  -- chroot /host rm -rf  /etc/systemd/network
Starting pod/ip-10-0-48-155-debug-q5pns ...
To use host binaries, run `chroot /host`

Removing debug pod ...
  1. Upgrade to a 4.14 version containing our fix
    We can see that the new daemon pods are deployed and they create the /etc/systemd/network directory.
I0825 17:30:34.760522  214254 update.go:1962] Persisting NIC names for RHEL8 host system
I0825 17:30:34.763356  214254 daemon.go:1630] Running: /run/machine-config-daemon-bin/nmstatectl persist-nic-names --root /
[2023-08-25T17:30:34Z WARN  nmstate::ovsdb::show] Unknown OVS interface type 
[2023-08-25T17:30:34Z WARN  nmstate::ovsdb::show] Unknown OVS interface type geneve
[2023-08-25T17:30:34Z WARN  nmstate::ovsdb::show] Unknown OVS interface type geneve
[2023-08-25T17:30:34Z WARN  nmstate::ovsdb::show] Unknown OVS interface type geneve
[2023-08-25T17:30:34Z WARN  nmstate::ovsdb::show] Unknown OVS interface type 
[2023-08-25T17:30:34Z WARN  nmstate::ovsdb::show] Unknown OVS interface type 
[2023-08-25T17:30:34Z WARN  nmstate::ovsdb::show] Unknown OVS interface type geneve
[2023-08-25T17:30:34Z WARN  nmstate::ovsdb::show] Unknown OVS interface type 
[2023-08-25T17:30:34Z WARN  nmstate::ovsdb::show] Unknown OVS interface type 
[2023-08-25T17:30:34Z WARN  nmstate::ovsdb::show] Unknown OVS interface type geneve
[2023-08-25T17:30:34Z WARN  nmstate::ovsdb::show] Unknown OVS interface type 
[2023-08-25T17:30:34Z WARN  nmstate::ovsdb::show] Unknown OVS interface type 
[2023-08-25T17:30:34Z WARN  nmstate::ovsdb::show] Unknown OVS interface type 
[2023-08-25T17:30:34Z INFO  nmstatectl::persist_nic] Skipping interface genev_sys_6081
[2023-08-25T17:30:34Z INFO  nmstatectl::persist_nic] Device ens5 is subordinate
[2023-08-25T17:30:34Z INFO  nmstatectl::persist_nic] Persisting the interface with MAC 02:5F:B8:28:AF:D3 to interface name ens5

  1. Unpause worker MCP

Note that this fix only works when the host OS is RHEL8. If the directory is not present in a RHEL9 host OS then this fix will not create the directory before running the nmstatectl command.

We can add the qe-approved label

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Aug 25, 2023
@sinnykumari
Copy link
Contributor Author

Thanks Sergio! Removing hold as this is pre-merge tested
/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 25, 2023
@openshift-merge-robot openshift-merge-robot merged commit 046d218 into openshift:master Aug 25, 2023
12 of 13 checks passed
@openshift-ci-robot
Copy link
Contributor

@sinnykumari: Jira Issue OCPBUGS-16035: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-16035 has been moved to the MODIFIED state.

In response to this:

OCPBUGS-16035: On some system, nmstatectl fails to persists nic due to missing /etc/systemd/network. Until we use fixed version of nmstate in MCO, this workaround will ensure that the process happens successfully.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants