Move 'hooks' to their respective backends #3392

Merged
merged 7 commits into from
Oct 7, 2022

Conversation

jvonau
Contributor

@jvonau jvonau commented Oct 5, 2022

Fixes bug:

With NetworkManager active, systemd-networkd/networkd-dispatcher are used for br0/ap0 support, but the hooks were not being installed on the first pass through network because systemd_networkd_active was not detected when Ansible started.

Description of changes proposed in this pull request:

Move 'hooks' to their respective backends

Smoke-tested on which OS or OS's:

RasPiOS

@holta holta added the bug label Oct 5, 2022
@holta holta added this to the 8.0 milestone Oct 5, 2022
@holta
Member

holta commented Oct 5, 2022

Should this PR be revised to take into account...?

Or conversely, should this PR be merged just as it is?

@jvonau
Contributor Author

jvonau commented Oct 6, 2022

It's actually also an enhancement, extending RasPiOS support when NetworkManager is used. The 'bug' would be confined to Ubuntu on RPi hardware, and a second pass through networking would install the hooks.

@jvonau jvonau changed the title WIP: Move 'hooks' to their respective backends Move 'hooks' to their respective backends Oct 6, 2022
@holta
Member

holta commented Oct 6, 2022

use ap0 hack only on RPi hardware

Question: if people running non-RPi hardware want AP+STA mode (e.g. using ap0) — shouldn't we allow them to explore that possibility? (So as to enable wifi_up_down to see if that works on their hardware.) Particularly given the severe shortage of Raspberry Pi hardware over the past 14 months ~ exploring other hardware options has become important to a lot of communities in 2022.

Certainly wifi_up_down: True doesn't need to be the default for all hardware, which would neither be safe nor wise!

CAVEAT: If I'm misunderstanding certain aspects of this commit, I apologize!

ASIDE: This commit has a small syntax error on the end of Line 72 (dangling double-quote) just fyi.

@holta
Member

holta commented Oct 6, 2022

What kinds of testing are likely most useful?

(For RasPiOS and otherwise!)

@jvonau
Contributor Author

jvonau commented Oct 7, 2022

use ap0 hack only on RPi hardware

Question: if people running non-RPi hardware want AP+STA mode (e.g. using ap0) — shouldn't we allow them to explore that possibility? (So as to enable wifi_up_down to see if that works on their hardware.) Particularly given the severe shortage of Raspberry Pi hardware over the past 14 months ~ exploring other hardware options has become important to a lot of communities in 2022.

Think you misunderstand the purpose of the 'hooks', which are used solely to work around limitations and quirks of the WiFi firmware used on RPis. I have documented where others have needed to use similar code to achieve satisfactory results within the PRs that gave rise to wifi_up_down; please go back and read them.

Certainly wifi_up_down: True doesn't need to be the default for all hardware, which would neither be safe nor wise!

Currently wifi_up_down is the default for all detected WiFi hardware; the hooks are firmware-specific, as used on the RPis. 'can_be_ap' is more useful for ferreting out WiFi hardware that cannot be used as an AP, which is the primary use of the WiFi device; that is the starting point for a forced opt-out of wifi_up_down mode, by not installing the support for hostapd. Without feedback from the user base about hardware that doesn't support cloning for ap0, there is nothing to write code from. (I think there has been one issue, since wifi_up_down was introduced, where can_be_ap was true but the hardware didn't support wifi_up_down.) Given that I review all diagnostics, I can point to cases where 'ap0 just works', such as #3330 and #3385 from recent memory; otherwise the installer might blow up before admin-console gets installed, given the point where the cloning occurs. I would expect more issues to be reported if the cloning failed, but I hear crickets about any failures related to that.

CAVEAT: If I'm misunderstanding certain aspects of this commit, I apologize!

You decided RPis/RasPiOS were the primary target hardware/distro to be supported, which I found rather short-sighted, and I kept Ubuntu network support alive anyway over the years. Without that piece in place Mint would not have become possible. Now there has been a shift back to Ubuntu, given the hardware requirements of some of the roles and/or the use of docker with them.

ASIDE: This commit has a small syntax error on the end of Line 72 (dangling double-quote) just fyi.

Fixed.

@jvonau
Contributor Author

jvonau commented Oct 7, 2022

Can this line (when: not is_linuxmint) now be removed, as discussed during the call (http://minutes.iiab.io/) earlier today?

Removed via a revert/rebase, grouped with the other 11 packages now.

@jvonau
Contributor Author

jvonau commented Oct 7, 2022

What kinds of testing are likely most useful?

(For RasPiOS and otherwise!)

In the end the same files are installed on RasPiOS, just in a slightly different order, with dhcpcd running the show. On Ubuntu Server it's the same thing: same files, different order, with systemd-networkd running the show. With NetworkManager in the mix, the diagnostics hooks should now be installed correctly on the first pass through network, as systemd-networkd is really what is used for br0 support.

@holta
Member

holta commented Oct 7, 2022

A MEDIUM-sized IIAB test install (of this PR) has begun on 10.8.0.26 = 181-rpi4-64lite-PR3392, after I ran:

curl iiab.io/fast.txt | bash -s 3392
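
For readers unfamiliar with the pattern above: `bash -s` reads the script from standard input and passes any following words to it as positional parameters, so `3392` arrives as `$1`. Presumably the installer uses it to select this PR, but that script is not shown in this thread. A minimal offline sketch, where `fake_installer` and `run_with_pr` are hypothetical stand-ins and nothing is downloaded:

```shell
# Hypothetical stand-in for the downloaded script: it only echoes its first argument.
fake_installer() {
    printf '%s\n' 'echo "PR number received: $1"'
}

# Same shape as `curl ... | bash -s 3392`, but without any network access.
run_with_pr() {
    fake_installer | bash -s "$1"
}

run_with_pr 3392   # prints: PR number received: 3392
```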

@holta
Member

holta commented Oct 7, 2022

MEDIUM-sized IIAB test install (of this PR) has begun on 10.8.0.26 = 181-rpi4-64lite-PR3392

Install is complete. iiab-diagnostics output:

http://sprunge.us/rkyWWS?bash

@holta
Member

holta commented Oct 7, 2022

10.8.0.26 = 181-rpi4-64lite-PR3392

FYI the RPi 4's internal WiFi hotspot works, when tested with a phone's browser (over WiFi) with the usual URL's:

@holta
Member

holta commented Oct 7, 2022

purpose of the 'hooks' [is] solely to workaround limitations and [quirks] with the wifi firmware used on RPis

Thanks much for explaining.

Without feedback from the user base about hardware that doesn't support cloning for ap0, think there has been one issue where can_be_ap was true but didn't support wifi_up_down since introduced there is nothing to write code from.

I agree, this is a pattern but not all have been written up for analysis.

So definitely something to monitor going forward.

'can_be_ap' is more useful to ferret out WiFi hardware that cannot be used with AP, which is the primary use of the WiFi device, that is the starting point for a forced opt out of wifi_up_down mode by not installing the support for hostapd

👍

shift back to Ubuntu

FWIW Raspberry Pi OS is the dominant OS by far, even despite the ongoing Raspberry Pi supply chain problems.

Regardless, we should strengthen our Ubuntu support where that's possible — acknowledging that both OS's are moving targets — so supporting the most common use cases is indeed the best we can do!

Debian 12 (Bookworm) release "freezes" are expected in Q1 2023, which is not far away, so that might add a few quirks 😉

@holta
Member

holta commented Oct 7, 2022

Quick IIAB install tested on a Mint 21 VM using:

curl iiab.io/risky.txt | bash -s 3392

iiab-diagnostics after reboot: http://sprunge.us/RGQDuV?bash

Any further tests needed?

@jvonau
Contributor Author

jvonau commented Oct 7, 2022

Nothing is triggering the need to configure br0 without a second network adapter. As a quick workaround/test, could you try adding to local_vars:

iiab_wired_lan_iface: eth1
iiab_lan_iface: br0

then run sudo iiab-network
br0 should then be present in ip a; if not, reboot and recheck. Please also post an iiab-diagnostics run after a reboot.
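
The br0 check can also be scripted. This is a minimal sketch, assuming the usual `ip a` header format ("4: br0: <...>"); the helper name `iface_listed` is hypothetical and not part of IIAB:

```shell
# Succeeds if the named interface appears in `ip a`-style text on stdin.
# Matches header lines such as "4: br0: <NO-CARRIER,BROADCAST,...>",
# including names with an "@parent" suffix (e.g. "eth0.10@eth0:").
iface_listed() {
    grep -Eq "^[0-9]+: $1(@[^:]+)?: "
}

# Typical use on a live system:
#   ip a | iface_listed br0 && echo "br0 present" || echo "br0 missing"
```

The function only reads text, so it can be tried against saved `ip a` output as well as on a running machine.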

@holta
Member

holta commented Oct 7, 2022

iiab_wired_lan_iface: eth1
iiab_lan_iface: br0

Done. Prior to reboot, br0 appeared:

root@box:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:26:9e:0c brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.196/24 brd 192.168.0.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever
    inet6 fe80::b31:8c88:dacd:89c9/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 500
    link/none
    inet 10.8.0.22 peer 10.8.0.21/32 scope global tun0
       valid_lft forever preferred_lft forever
4: br0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether fa:f0:96:c1:ee:eb brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.10/24 brd 10.10.10.255 scope global br0
       valid_lft forever preferred_lft forever

Right after reboot, ssh connections were being dropped (the very moment after typing in a correct password), which is behavior I'd not seen before. Quite strange.

Likewise when attempting to ssh in as root, it was a bit similar...but with this twist:

Pre-authentication banner message from server:
| "System is booting up. Unprivileged users are not permitted to log in yet. Pl
> ease come back later. For technical details, see pam_nologin(8)."
End of banner message from server
"System is booting up. Unprivileged users are not permitted to log in yet. Please come back later. For technical details, see pam_nologin(8)."

Published password in use by user 'iiab-admin'.
THIS IS A SECURITY RISK - please run 'sudo passwd iiab-admin' to change it.

root@box:~#

In any case, a couple minutes later, ssh logins started to work again properly.

Finally, iiab-diagnostics was run, and its output is: http://sprunge.us/axrXFI?bash

@jvonau
Contributor Author

jvonau commented Oct 7, 2022

Thanks, looks like on Mint we are good to go..

1025 =IIAB==========================================================================
1026 COMMAND: /usr/bin/sudo journalctl -b 0 -u networkd-dispatcher    # networkd-dispatcher log
1027 
1028 Oct 07 15:41:15 box systemd[1]: Starting Dispatcher daemon for systemd-networkd...
1029 Oct 07 15:41:16 box networkd-dispatcher[702]: NET-DISP-unmanaged lo carrier
1030 Oct 07 15:41:16 box networkd-dispatcher[704]: NET-DISP-unmanaged enp0s3 off
1031 Oct 07 15:41:16 box networkd-dispatcher[706]: NET-DISP-configured br0 no-carrier
1032 Oct 07 15:41:16 box systemd[1]: Started Dispatcher daemon for systemd-networkd.
1033 Oct 07 15:41:16 box networkd-dispatcher[716]: NET-DISP-unmanaged enp0s3 no-carrier
1034 Oct 07 15:41:16 box networkd-dispatcher[724]: NET-DISP-unmanaged enp0s3 carrier
1035 Oct 07 15:41:16 box networkd-dispatcher[736]: NET-DISP-unmanaged enp0s3 routable
1036 Oct 07 15:43:17 box networkd-dispatcher[564]: WARNING:Unknown index 4 seen, reloading interface list
1037 Oct 07 15:43:17 box networkd-dispatcher[1280]: NET-DISP-pending tun0 off
1038 Oct 07 15:43:17 box networkd-dispatcher[1291]: NET-DISP-pending tun0 carrier
1038 Oct 07 15:43:17 box networkd-dispatcher[1303]: NET-DISP-pending tun0 routable

1529 2022-10-07 15:39:45,657 p=2518 u=root n=ansible | TASK [network : Create networkd-dispatcher diagnostic hook for recording network events] ***
1530 2022-10-07 15:39:46,358 p=2518 u=root n=ansible | changed: [127.0.0.1] => (item={'src': 'hostapd/00-iiab-debug', 'dest': '/etc/networkd-dispatcher/carrier.d/00-iiab-debug'})
1531 2022-10-07 15:39:47,008 p=2518 u=root n=ansible | changed: [127.0.0.1] => (item={'src': 'hostapd/00-iiab-debug', 'dest': '/etc/networkd-dispatcher/degraded.d/00-iiab-debug'})
1532 2022-10-07 15:39:47,650 p=2518 u=root n=ansible | changed: [127.0.0.1] => (item={'src': 'hostapd/00-iiab-debug', 'dest': '/etc/networkd-dispatcher/dormant.d/00-iiab-debug'})
1533 2022-10-07 15:39:48,289 p=2518 u=root n=ansible | changed: [127.0.0.1] => (item={'src': 'hostapd/00-iiab-debug', 'dest': '/etc/networkd-dispatcher/no-carrier.d/00-iiab-debug'})
1534 2022-10-07 15:39:48,925 p=2518 u=root n=ansible | changed: [127.0.0.1] => (item={'src': 'hostapd/00-iiab-debug', 'dest': '/etc/networkd-dispatcher/off.d/00-iiab-debug'})
1535 2022-10-07 15:39:49,568 p=2518 u=root n=ansible | changed: [127.0.0.1] => (item={'src': 'hostapd/00-iiab-debug', 'dest': '/etc/networkd-dispatcher/routable.d/00-iiab-debug'})
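
The contents of 00-iiab-debug are not shown in this thread. As a rough sketch only: networkd-dispatcher exports IFACE, AdministrativeState and OperationalState to the hooks it runs, and the journal lines above look like those three values printed in that order. The function names and the `unknown` fallback below are assumptions for illustration:

```shell
# Hypothetical stand-in for a debug hook; NOT the real 00-iiab-debug.
# Prints one line in the same shape as the journal entries above,
# e.g. "NET-DISP-configured br0 no-carrier".
net_disp_line() {
    printf 'NET-DISP-%s %s %s\n' \
        "${AdministrativeState:-unknown}" \
        "${IFACE:-unknown}" \
        "${OperationalState:-unknown}"
}

# The six state directories the Ansible task above copied the hook into.
hook_destinations() {
    for state in carrier degraded dormant no-carrier off routable; do
        printf '/etc/networkd-dispatcher/%s.d/00-iiab-debug\n' "$state"
    done
}
```

Because the same file sits in every state directory, each operational-state transition of an interface would produce one such line, which matches the sequence seen for enp0s3 and tun0.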

@holta
Member

holta commented Oct 7, 2022

Thanks, looks like on Mint we are good to go..

Great. Uncomment line 59 of roles/network/tasks/sysd-netd-debian.yml so this can be merged?

(Any test scenarios we should emphasize after this is merged?)

@jvonau
Contributor Author

jvonau commented Oct 7, 2022

I was going to leave that for the next while as a placeholder, as greater detail can be gathered on simple single-interface machines, which helps to promote learning how networkd-dispatcher transitions between the 'states'.
