Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MGMT-15339: Run network config before NetworkManager #5375

Conversation

jhernand
Copy link
Contributor

@jhernand jhernand commented Jul 21, 2023

Currently the systemd service that we use to apply static network configuration from the non-minimal discovery environment wants to run after dracut-cmdline.service and before dracut-initqueue.service. But that isn't possible because this service is created by an ignition configuration, and services created that way only run after the root has pivoted, and at that point the cmdline and initqueue steps of Dracut have already been completed.

The net result is that this ordering dependencies are just ignored, and the service will run as part of multi-user.target without any special ordering restriction. It may even run after NetworkManager. If that happens then the configuration files that it generates will be ignored. That is unlikely, but it has been observed when there are many network interfaces with static configuration. To avoid that the service should be explicitly configured to run before NetworkManager.

Note that there is another diferent systemd service that is used to generate the network config when using the minimal ISO. That is added directly to the initrd, not via ignition. It runs before the root pivots because the network is needed to fetch the root file system. This patch doesn't change that service.

Related: https://issues.redhat.com/browse/MGMT-15339
Related: https://issues.redhat.com//browse/OCPBUGS-16219
Related: openshift/installer#7355

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 21, 2023
@openshift-ci-robot
Copy link

@jhernand: This pull request explicitly references no jira issue.

In response to this:

Currently the systemd service that we use to apply static network configuration from the non-minimal discovery environment wants to run after dracut-cmdline.service and before dracut-initqueue.service. But that isn't possible because this service is created by an ignition configuration, and services created that way only run after the root has pivoted, and at that point the cmdline and initqueue steps of Dracut have already been completed.

The net result is that this ordering dependencies are just ignored, and the service will run as part of multi-user.target without any special ordering restriction. It may even run before NetworkManager. If that happens then the configuration files that it generates will be ignored. That is unlikely, but it have been observed when there are many network interfaces with static configuration. To avoid that the service should be explicitly configured to run before NetworkManager.

Note that there is another diferent systemd service that is used to generate the network config when using the minimal ISO. That is added directly to the initrd, not via ignition. It runs before the root pivots because the network is needed to fetch the root file system. This patch doesn't change that service.

Related: https://issues.redhat.com//browse/OCPBUGS-16219
Related: openshift/installer#7355

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jul 21, 2023
@openshift-ci
Copy link

openshift-ci bot commented Jul 21, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jhernand

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 21, 2023
@zaneb
Copy link
Member

zaneb commented Jul 21, 2023

typo:

It may even run before NetworkManager. If that happens then the configuration files that it generates will be ignored.

s/before/after/

Also, I think this is worth opening a bug for, because this has been wrong for a while and might need to be backported.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 21, 2023
@codecov
Copy link

codecov bot commented Jul 21, 2023

Codecov Report

Merging #5375 (ea545d1) into master (3756e7f) will increase coverage by 0.35%.
Report is 13 commits behind head on master.
The diff coverage is n/a.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5375      +/-   ##
==========================================
+ Coverage   67.59%   67.95%   +0.35%     
==========================================
  Files         226      226              
  Lines       33304    34020     +716     
==========================================
+ Hits        22513    23119     +606     
- Misses       8758     8852      +94     
- Partials     2033     2049      +16     

see 13 files with indirect coverage changes

Currently the systemd service that we use to apply static network
configuration from the non-minimal discovery environment wants to run
after `dracut-cmdline.service` and before `dracut-initqueue.service`.
But that isn't possible because this service is created by an ignition
configuration, and services created that way only run after the root has
pivoted, and at that point the `cmdline` and `initqueue` steps of Dracut
have already been completed.

The net result is that this ordering dependencies are just ignored, and
the service will run as part of `multi-user.target` without any special
ordering restriction. It may even run after NetworkManager. If that
happens then the configuration files that it generates will be ignored.
That is unlikely, but it has been observed when there are many network
interfaces with static configuration. To avoid that the service should
be explicitly configured to run before NetworkManager.

Note that there is another diferent systemd service that is used to
generate the network config when using the minimal ISO. That is added
directly to the initrd, not via ignition. It runs before the root pivots
because the network is needed to fetch the root file system. This patch
doesn't change that service.

Related: https://issues.redhat.com/browse/MGMT-15339
Related: https://issues.redhat.com//browse/OCPBUGS-16219
Related: openshift/installer#7355
Signed-off-by: Juan Hernandez <juan.hernandez@redhat.com>
@jhernand jhernand force-pushed the ensure_that_pre_network_configuration_script_runs_before_network_manager branch from 4eb3b84 to ea545d1 Compare July 21, 2023 11:46
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 21, 2023
@jhernand
Copy link
Contributor Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 21, 2023
@jhernand jhernand changed the title NO-ISSUE: Run network config before NetworkManager MGMT-15339: Run network config before NetworkManager Jul 21, 2023
@openshift-ci-robot
Copy link

openshift-ci-robot commented Jul 21, 2023

@jhernand: This pull request references MGMT-15339 which is a valid jira issue.

In response to this:

Currently the systemd service that we use to apply static network configuration from the non-minimal discovery environment wants to run after dracut-cmdline.service and before dracut-initqueue.service. But that isn't possible because this service is created by an ignition configuration, and services created that way only run after the root has pivoted, and at that point the cmdline and initqueue steps of Dracut have already been completed.

The net result is that this ordering dependencies are just ignored, and the service will run as part of multi-user.target without any special ordering restriction. It may even run before NetworkManager. If that happens then the configuration files that it generates will be ignored. That is unlikely, but it have been observed when there are many network interfaces with static configuration. To avoid that the service should be explicitly configured to run before NetworkManager.

Note that there is another diferent systemd service that is used to generate the network config when using the minimal ISO. That is added directly to the initrd, not via ignition. It runs before the root pivots because the network is needed to fetch the root file system. This patch doesn't change that service.

Related: https://issues.redhat.com//browse/OCPBUGS-16219
Related: openshift/installer#7355

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Jul 21, 2023

@jhernand: This pull request references MGMT-15339 which is a valid jira issue.

In response to this:

Currently the systemd service that we use to apply static network configuration from the non-minimal discovery environment wants to run after dracut-cmdline.service and before dracut-initqueue.service. But that isn't possible because this service is created by an ignition configuration, and services created that way only run after the root has pivoted, and at that point the cmdline and initqueue steps of Dracut have already been completed.

The net result is that this ordering dependencies are just ignored, and the service will run as part of multi-user.target without any special ordering restriction. It may even run after NetworkManager. If that happens then the configuration files that it generates will be ignored. That is unlikely, but it has been observed when there are many network interfaces with static configuration. To avoid that the service should be explicitly configured to run before NetworkManager.

Note that there is another diferent systemd service that is used to generate the network config when using the minimal ISO. That is added directly to the initrd, not via ignition. It runs before the root pivots because the network is needed to fetch the root file system. This patch doesn't change that service.

Related: https://issues.redhat.com//browse/OCPBUGS-16219
Related: openshift/installer#7355

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jhernand
Copy link
Contributor Author

typo:

It may even run before NetworkManager. If that happens then the configuration files that it generates will be ignored.

s/before/after/

Done.

Also, I think this is worth opening a bug for, because this has been wrong for a while and might need to be backported.

Here is the bug: https://issues.redhat.com/browse/MGMT-15339

@openshift-ci-robot
Copy link

openshift-ci-robot commented Jul 21, 2023

@jhernand: This pull request references MGMT-15339 which is a valid jira issue.

In response to this:

Currently the systemd service that we use to apply static network configuration from the non-minimal discovery environment wants to run after dracut-cmdline.service and before dracut-initqueue.service. But that isn't possible because this service is created by an ignition configuration, and services created that way only run after the root has pivoted, and at that point the cmdline and initqueue steps of Dracut have already been completed.

The net result is that this ordering dependencies are just ignored, and the service will run as part of multi-user.target without any special ordering restriction. It may even run after NetworkManager. If that happens then the configuration files that it generates will be ignored. That is unlikely, but it has been observed when there are many network interfaces with static configuration. To avoid that the service should be explicitly configured to run before NetworkManager.

Note that there is another diferent systemd service that is used to generate the network config when using the minimal ISO. That is added directly to the initrd, not via ignition. It runs before the root pivots because the network is needed to fetch the root file system. This patch doesn't change that service.

Related: https://issues.redhat.com/browse/MGMT-15339
Related: https://issues.redhat.com//browse/OCPBUGS-16219
Related: openshift/installer#7355

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bfournie
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 21, 2023
@romfreiman
Copy link
Contributor

@tsorya @eranco74 ptal

@eranco74
Copy link
Contributor

/lgtm

@eranco74
Copy link
Contributor

/lgtm

@zaneb
Copy link
Member

zaneb commented Jul 23, 2023

/retest-required

@bfournie
Copy link
Contributor

One minor clarification - the problem that this fixes isn't really that this service runs after NetworkManager, its that that the script doesn't complete writing out the files in /etc/NetworkManager/system-connections before NetworkManager sets up interfaces using these files. That's what makes the bug susceptible to the time that pre-network-manager-config.sh takes to run, which is longer when there are more interfaces. This fixes that timing issue.

@jhernand
Copy link
Contributor Author

To verify this I introduced an artificial delay in the script that generates the NetworkManager configuration, something like this:

diff --git a/internal/constants/scripts.go b/internal/constants/scripts.go
index c118f254c..89be73284 100644
--- a/internal/constants/scripts.go
+++ b/internal/constants/scripts.go
@@ -30,6 +30,8 @@ package constants
 //    '/etc/coreos-firstboot-network' when the script is part of the minimal ISO.
 const PreNetworkConfigScript = `#!/bin/bash
 
+sleep 60
+
 PATH_PREFIX=${PATH_PREFIX:=''}
 
 # The directory that contains nmconnection files of all nodes

Then I tried to install a cluster without this patch and using static network configuration. I observed that the static network configuration was ignored and instead the network interfaces got the IP addresses from the DHCP server.

I repeated the same test with this patch and then the network interfaces got the IP addresses specified in the configuration.

@jhernand
Copy link
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 28, 2023
@openshift-ci
Copy link

openshift-ci bot commented Jul 28, 2023

@jhernand: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 5ad5976 into openshift:master Jul 28, 2023
14 checks passed
paul-maidment pushed a commit to paul-maidment/assisted-service that referenced this pull request Jul 28, 2023
Currently the systemd service that we use to apply static network
configuration from the non-minimal discovery environment wants to run
after `dracut-cmdline.service` and before `dracut-initqueue.service`.
But that isn't possible because this service is created by an ignition
configuration, and services created that way only run after the root has
pivoted, and at that point the `cmdline` and `initqueue` steps of Dracut
have already been completed.

The net result is that this ordering dependencies are just ignored, and
the service will run as part of `multi-user.target` without any special
ordering restriction. It may even run after NetworkManager. If that
happens then the configuration files that it generates will be ignored.
That is unlikely, but it has been observed when there are many network
interfaces with static configuration. To avoid that the service should
be explicitly configured to run before NetworkManager.

Note that there is another diferent systemd service that is used to
generate the network config when using the minimal ISO. That is added
directly to the initrd, not via ignition. It runs before the root pivots
because the network is needed to fetch the root file system. This patch
doesn't change that service.

Related: https://issues.redhat.com/browse/MGMT-15339
Related: https://issues.redhat.com//browse/OCPBUGS-16219
Related: openshift/installer#7355

Signed-off-by: Juan Hernandez <juan.hernandez@redhat.com>
CrystalChun pushed a commit to CrystalChun/assisted-service that referenced this pull request Aug 25, 2023
Currently the systemd service that we use to apply static network
configuration from the non-minimal discovery environment wants to run
after `dracut-cmdline.service` and before `dracut-initqueue.service`.
But that isn't possible because this service is created by an ignition
configuration, and services created that way only run after the root has
pivoted, and at that point the `cmdline` and `initqueue` steps of Dracut
have already been completed.

The net result is that this ordering dependencies are just ignored, and
the service will run as part of `multi-user.target` without any special
ordering restriction. It may even run after NetworkManager. If that
happens then the configuration files that it generates will be ignored.
That is unlikely, but it has been observed when there are many network
interfaces with static configuration. To avoid that the service should
be explicitly configured to run before NetworkManager.

Note that there is another diferent systemd service that is used to
generate the network config when using the minimal ISO. That is added
directly to the initrd, not via ignition. It runs before the root pivots
because the network is needed to fetch the root file system. This patch
doesn't change that service.

Related: https://issues.redhat.com/browse/MGMT-15339
Related: https://issues.redhat.com//browse/OCPBUGS-16219
Related: openshift/installer#7355

Signed-off-by: Juan Hernandez <juan.hernandez@redhat.com>
danielerez pushed a commit to danielerez/assisted-service that referenced this pull request Oct 15, 2023
Currently the systemd service that we use to apply static network
configuration from the non-minimal discovery environment wants to run
after `dracut-cmdline.service` and before `dracut-initqueue.service`.
But that isn't possible because this service is created by an ignition
configuration, and services created that way only run after the root has
pivoted, and at that point the `cmdline` and `initqueue` steps of Dracut
have already been completed.

The net result is that this ordering dependencies are just ignored, and
the service will run as part of `multi-user.target` without any special
ordering restriction. It may even run after NetworkManager. If that
happens then the configuration files that it generates will be ignored.
That is unlikely, but it has been observed when there are many network
interfaces with static configuration. To avoid that the service should
be explicitly configured to run before NetworkManager.

Note that there is another diferent systemd service that is used to
generate the network config when using the minimal ISO. That is added
directly to the initrd, not via ignition. It runs before the root pivots
because the network is needed to fetch the root file system. This patch
doesn't change that service.

Related: https://issues.redhat.com/browse/MGMT-15339
Related: https://issues.redhat.com//browse/OCPBUGS-16219
Related: openshift/installer#7355

Signed-off-by: Juan Hernandez <juan.hernandez@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants