Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a hidden persist-nic-names command #2301

Merged
merged 3 commits into from Apr 11, 2023

Conversation

cgwalters
Copy link
Contributor

@cgwalters cgwalters commented Mar 30, 2023

The goal

Provide a CLI tool that can be executed as part of a privileged container to persist network NIC names. The tool should also support --dry-run (for ease of debugging) and crucially --inspect that can be invoked at any time afterwards to help us understand what it did.

Also we want to support --root for the case where the host is mounted separately from the container image.

The interface

podman run --privileged -v /:/run/host --net=host --rm -ti <image> nmstatectl persist-nic-names --root /run/host <options> 

Which NIC names are persisted

It'd be nice to avoid persisting in the default case of DHCP (particularly in cloud) but I'm open to debate.

Dusty suggested below that we only persist NIC names that are mentioned in NM connection files.

Overall design: one-shot persistence

Original PoC code required a "double reboot", doing a diff between the detected NICs pre and post OS upgrade. This is possible but will be somewhat problematic for existing OCP flows.

Test scenarios: Should not persist

Default cloud with DHCP

e.g. AWS, GCP, qemu, etc.

Test scenarios: May or may not persist

  • Metal with DHCP
  • bonding over NICs using DHCP

Test scenarios: Must persist

  • Anything where the current NIC configuration is at risk of potentially changing across OS updates. Particularly (but probably not exclusively) static IP addresses.

Original PR description

This builds on top of #2299 and #2300 and adds a new interface:

nmstatectl pin-nic-names as well as a corresponding systemd unit nmstatectl-pin-nic-names.service that can be enabled.

This is aiming to implement #2299 (comment)

@kubevirt-bot
Copy link
Collaborator

Hi @cgwalters. Thanks for your PR.

I'm waiting for a nmstate member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tyll
Copy link
Member

tyll commented Mar 30, 2023

IMHO, this command is not in scope of the nmstatectl tool since Nmstate is about defining a state and applying it.
Therefore, this should be a standalone tool and most likely a systemd generator (this is the term used for software that generates systemd units such as link files), similar to https://www.freedesktop.org/software/systemd/man/systemd-network-generator.service.html except that it does not parse the kernel cmdline but the current network state.

In the past, there also used to be a persistent name support in - see for example https://github.com/lu-zero/udev/blob/master/src/rule_generator/write_net_rules - which created udev rules files and was hooked into udev. This could also be a good reason to keep calling this "persistent names" instead of "pinning names" btw.

[Unit]
Description=Pin active network interfaces
Documentation=man:nmstate.service(8) https://www.nmstate.io
After=NetworkManager.service
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be too early; maybe use network-online.target instead?

@cgwalters
Copy link
Contributor Author

IMHO, this command is not in scope of the nmstatectl tool

I think everyone would agree this isn't a perfect fit, but creating a new tool implies some overhead.

@cgwalters
Copy link
Contributor Author

cgwalters commented Mar 30, 2023

We should debate the overall architecture; I mentioned this here #2299 (comment)

One important thing here is all current designs call for running this process once (or at least, by default). We're trying to create something that can assist the admin in creating .link files retroactively.

Also at the current time in order to aid bootstrap problems my plan is to run this from a privileged container, not the host. That's the idea behind having this be runnable from nmstatectl directly and not the systemd unit, and also the --root arg commit.

This could also be a good reason to keep calling this "persistent names" instead of "pinning names" btw.

Yeah, makes sense.

So combining these threads we could split this off as a separate binary not branded nmstate; maybe something boring like nic-name-persist-helper.

Also, we could for now make this an opt-in Rust feature.

Again the thought here is to fast track this as a one-off for OCP rhel8->rhel9 upgrades and it will likely only need to be supported in that context only at least initially.

@cgwalters
Copy link
Contributor Author

(Added an off-by-default feature and split into a separate module)

@cgwalters
Copy link
Contributor Author

OK a few more changes here, most crucially switching to skipping interfaces using DHCP; I'm 87.3% sure that persisting them is unnecessary in the OCP context.

I also added a container build of this code pushed at ghcr.io/cgwalters/nmstate:latest.

@cgwalters
Copy link
Contributor Author

One point @dustymabe raised with the "only static IP address" filtering is that it will break e.g. bonds over DHCP.

@cgwalters
Copy link
Contributor Author

I was also worried about ip= on the kernel cmdline, but Dusty also notes that because of the common coreos workflow of propagating network kargs into NM configs via coreos-copy-firstboot-network.service we may mainly be dealing with NM configs.

@cgwalters
Copy link
Contributor Author

Also chatted about Tang and ip= as well as anyone today using rpm-ostree initramfs-etc --track /etc/NetworkManager/system-connections - basically if we detect that we'll need to also rpm-ostree initramfs --track /etc/systemd/network

@dustymabe
Copy link

dustymabe commented Mar 31, 2023

@cgwalters and I were discussing this topic (note I haven't reviewed the code in this PR so this is just comments on the general problem and the approach for a solution).

I wonder if we can generalize the solution to this to not worry about static IP/DHCP or not. Can we just look for nm connections that are of ethernet type that are matched based on an interface name. For all that are we add the .link file.

Here's an example (taken from the FCOS docs):



variant: fcos
version: 1.4.0
storage:
  files:
    - path: /etc/NetworkManager/system-connections/bond0.nmconnection
      mode: 0600
      contents:
        inline: |
          [connection]
          id=bond0
          type=bond
          interface-name=bond0
          [bond]
          miimon=100
          mode=active-backup
          [ipv4]
          address1=10.10.10.10/24,10.10.10.1
          dhcp-hostname=myhostname
          dns=8.8.8.8;
          dns-search=
          may-fail=false
          method=manual
    - path: /etc/NetworkManager/system-connections/bond0-slave-ens2.nmconnection
      mode: 0600
      contents:
        inline: |
          [connection]
          id=bond0-slave-ens2
          type=ethernet
          interface-name=ens2
          master=bond0
          slave-type=bond
    - path: /etc/NetworkManager/system-connections/bond0-slave-ens3.nmconnection
      mode: 0600
      contents:
        inline: |
          [connection]
          id=bond0-slave-ens3
          type=ethernet
          interface-name=ens3
          master=bond0
          slave-type=bond

In this case we just loop over the connections.. Here's the flow:

  • iteration bond0.nmconnection
    • is connection.type == ethernet
      • "no" -> continue
  • iteration bond0-slave-ens2.nmconnection
    • is connection.type == ethernet
      • "yes"
        • is connection.interface-name != ""
          • "yes"
            • write out .link file persisting mac addr with ens2
  • iteration bond0-slave-ens3.nmconnection
    • is connection.type == ethernet
      • "yes"
        • is connection.interface-name != ""
          • "yes"
            • write out .link file persisting mac addr with ens3

In this flow we don't have to worry about DHCP or static IP and also ignore any device that isn't a physical NIC.

Would this work? What are the flaws here (I'm sure there are some)?

@cgwalters
Copy link
Contributor Author

Tentatively, I think matching on the connection information makes sense...it looks like to me we could use the NmApi::active_connections_get() API instead/in addition?

@cathay4t
Copy link
Member

cathay4t commented Apr 3, 2023

Tentatively, I think matching on the connection information makes sense...it looks like to me we could use the NmApi::active_connections_get() API instead/in addition?

We try to hide backend detail in the top API. We have state: ignore for those interface been marked as unmanaged in network manager. So I guess we can just filter state: up and type: ethernet here.

@cgwalters
Copy link
Contributor Author

xref openshift/machine-config-operator#3650 (comment) for skipping some OVS types

@cgwalters
Copy link
Contributor Author

cgwalters commented Apr 3, 2023

I provisioned a rhel8 system c3.small.x86 in Equinix. One interesting discovery is that they actually set up a bond over two NICs set up by DHCP by default!

[root@walters-testing-1 network-scripts]# podman run --privileged -v /:/run/host --net=host --rm -ti ghcr.io/cgwalters/nmstate nmstatectl persist-nic-names --root /run/host --dry-run
[2023-04-03T21:51:01Z INFO  nmstatectl::persist_nic] Skipping interface eno1 as no static IP addressing was found
[2023-04-03T21:51:01Z INFO  nmstatectl::persist_nic] Skipping interface eno2 as no static IP addressing was found
[2023-04-03T21:51:01Z INFO  nmstatectl::persist_nic] Would pin the interface with MAC EA:FE:2B:75:6F:3A to interface name enp0s20f0u5u2c2
[2023-04-03T21:51:01Z INFO  nmstatectl::persist_nic] No changes.

[root@walters-testing-1 network-scripts]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s20f0u5u2c2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether ea:fe:2b:75:6f:3a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::479e:6fd1:a2bb:e0bd/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether b4:96:91:84:3c:68 brd ff:ff:ff:ff:ff:ff
    altname enp1s0f0
4: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether b4:96:91:84:3c:68 brd ff:ff:ff:ff:ff:ff permaddr b4:96:91:84:3c:69
    altname enp1s0f1
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether b4:96:91:84:3c:68 brd ff:ff:ff:ff:ff:ff
    inet 147.28.149.75/31 scope global noprefixroute bond0
       valid_lft forever preferred_lft forever
    inet 10.65.41.1/31 scope global noprefixroute bond0:0
       valid_lft forever preferred_lft forever
    inet6 2604:1380:4643:c00::1/127 scope global noprefixroute 
       valid_lft forever preferred_lft forever
    inet6 fe80::b696:91ff:fe84:3c68/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
[root@walters-testing-1 network-scripts]# ls -al /sys/class/net/eno1/device/driver/module
lrwxrwxrwx. 1 root root 0 Apr  3 17:52 /sys/class/net/eno1/device/driver/module -> ../../../../module/i40e
[root@walters-testing-1 network-scripts]# ls -al /sys/class/net/eno2/device/driver/module
lrwxrwxrwx. 1 root root 0 Apr  3 17:52 /sys/class/net/eno2/device/driver/module -> ../../../../module/i40e  
[root@walters-testing-1 network-scripts]# ls -al /sys/class/net/enp0s20f0u5u2c2/device/driver/module
lrwxrwxrwx. 1 root root 0 Apr  3 17:52 /sys/class/net/enp0s20f0u5u2c2/device/driver/module -> ../../../../module/cdc_ether
[root@walters-testing-1 network-scripts]# 

Also...there seems to be this virtual/emulated USB Ethernet device that we're trying to do DHCP on and not getting a reply. And this shows another bug - we're apparently trying to persist this NIC just because we didn't succeed in DHCP on it. It's using the default/virtual "Wired connection 1".

EDIT: Also of interest, Equinix defaults to injecting net.ifnames=1 net.naming-scheme=rhel-8.4 by default.

Equinix and rhel9: They don't have official RHEL9, but AlmaLinux 9. I noticed that the Ethernet device names are the same eno1 and eno2. They don't inject net.naming-scheme, but at least the names for i40e appear to be the same across RHEL8 and 9.

Equinix: It looks like c3.large.arm64 uses mlx5_core; with rhel8 I see e.g. enP1p1s0f0 (using 4.18.0-425.13.1.el8_7.aarch64).

Equinix: m3.large.x86 uses i40e.

We want to document the Rust library.  This fixes invoking
`cargo doc`.

Signed-off-by: Colin Walters <walters@verbum.org>
@cgwalters
Copy link
Contributor Author

cgwalters commented Apr 4, 2023

  • Per discussion I dropped the previous code to persist via "difference" and managed by nmstate.service; the new approach is that we execute once while in rhel8
  • I've squashed all the commits together
  • More s/pin/persist/
  • The persist-nic-names CLI entrypoint is now hidden

I'd like to change to requiring a special --i-know-this-is-unstable CLI argument but that requiries ratcheting with the MCO code. (Hmm, maybe an env variable instead...)

@cgwalters cgwalters marked this pull request as ready for review April 4, 2023 16:43
@cgwalters cgwalters changed the title Pin nic command2 Add a hidden persist-nic-names command Apr 4, 2023
query_apply = ["nmstate/query_apply", "ctrlc"]
gen_conf = ["nmstate/gen_conf"]
persist_nic = []
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess now that the CLI option is hidden, we can drop the feature flag?

@@ -0,0 +1,42 @@
name: Create and publish a container image
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is optional, but I found it useful

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have https://quay.io/organization/nmstate , how about we place it to quay and autobuild on every merge?

@cgwalters
Copy link
Contributor Author

OK we did some testing and actually in OpenShift OVN setups move the primary NIC into the OVS bridge, and we definitely need to pin the NIC name in this case.

@cgwalters
Copy link
Contributor Author

Changelog:

  • We now persist devices that are subordinate (e.g. part of a bond, or the ovs-system case for OCP)
  • Skip anything that's under devices/virtual, which avoids persisting the genev_sys generated by OCP

cgwalters added a commit to cgwalters/machine-config-operator that referenced this pull request Apr 6, 2023
When upgrading rhel8 -> rhel9, some NIC names may change; e.g.
the `mlx5` driver started emitting port information.

This breaks static IP addressing which references those interfaces.

We're going to land code in nmstate to address this; see
nmstate/nmstate#2301

- For a few reasons; one is that this isn't an OCP specific problem
- nmstate is in Rust
- nmstate already parses network interface information
- Adding more glue into the MCO is undesirable

However...to solve the bootstrap problem here in that we don't
have this nmstate code on 4.12, add it to the MCD binary.
Copy link
Member

@cathay4t cathay4t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Please fix cargo flippy warnnings.
  • No need to use .github/workflows/container.yml, nmstate has quay repo. I will set it up once this PR merged.

xref https://issues.redhat.com//browse/OCPBUGS-10787

In RHEL8 to RHEL9 upgrades, some NIC names may change; we saw
this in the wild with `mlx5` based interfaces at least.
RHEL Leapp has logic to handle this, but adapting it to work
with rpm-ostree/coreos systems would be a heavy lift.

This logic is designed to be executed from a privileged container
*before* the host upgrades, which fits in well with current
OCP logic in the Machine Config Operator.

Specifically, openshift/machine-config-operator#3650
will run this command.

Co-authored-by: Gris Ge <fge@redhat.com>
Signed-off-by: Colin Walters <walters@verbum.org>
@cgwalters
Copy link
Contributor Author

OK, did both of those. Dealing with conditional compilation and unused variables is a bit annoying.

I also dropped the feature and just made it gate on query_apply.

openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/machine-config-operator that referenced this pull request Apr 6, 2023
When upgrading rhel8 -> rhel9, some NIC names may change; e.g.
the `mlx5` driver started emitting port information.

This breaks static IP addressing which references those interfaces.

We're going to land code in nmstate to address this; see
nmstate/nmstate#2301

- For a few reasons; one is that this isn't an OCP specific problem
- nmstate is in Rust
- nmstate already parses network interface information
- Adding more glue into the MCO is undesirable

However...to solve the bootstrap problem here in that we don't
have this nmstate code on 4.12, add it to the MCD binary.
Copy link
Member

@cathay4t cathay4t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks good to me.
I will use another PR to improve this with a clean up process.

@cathay4t
Copy link
Member

I am intend to merge this PR. Do we need more discussion here?

@cgwalters
Copy link
Contributor Author

I will use another PR to improve this with a clean up process.

Yep, SGTM!

@cathay4t cathay4t merged commit 6f95011 into nmstate:base Apr 11, 2023
35 of 36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants