networkd-test.py test_transient_hostname is flaky #4753

martinpitt · 2016-11-27T21:48:07Z

Sometimes the recently introduced test for setting the transient host name fails:

======================================================================
FAIL: test_transient_hostname (__main__.DnsmasqClientTest)
networkd sets transient hostname from DHCP
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/adttmp/autopkgtest-virt-lxc.shared.xmf9l35n/downtmp/build.leB/systemd-debian/test/networkd-test.py", line 442, in test_transient_hostname
    self.assertIn(b'testgreen', subprocess.check_output(['hostnamectl']))
AssertionError: b'testgreen' not found in b'   Static hostname: n/a\nTransient hostname: autopkgtest-lxc-ppfmgn\n         Icon name: computer-container\n           Chassis: container\n        Machine ID: 3b5423839f2c4e83924a044a361eb013\n           Boot ID: 1743ea26733040018c15774418e0d2c5\n    Virtualization: lxc\n  Operating System: Ubuntu 16.04.1 LTS\n            Kernel: Linux 4.4.0-38-generic\n      Architecture: s390x\n'

full log, also seen on other architectures.

In the above case the transient host name did get set, but is wrong (it's the real host name).

It requires further information to debug this as this is hard to reproduce. It could be a bug in the test or some race condition between networkd and hostnamed. As a starter I'll make the test show journal output on failure.

The text was updated successfully, but these errors were encountered:

This test fails sometimes but it is hard to reproduce, so we need more information what happens. Set journal log level to "debug" for the entirety of the test script, and show networkd's and hostnamed's journals on failure of test_transient_hostname(). This should help with tracking down issue systemd#4753.

This test fails sometimes but it is hard to reproduce, so we need more information what happens. Set journal log level to "debug" for the entirety of networkd-test.py, and show networkd's and hostnamed's journals on failure of the two test_transient_hostname* tests. This should help with tracking down issue systemd#4753.

This test fails sometimes but it is hard to reproduce, so we need more information what happens. Set journal log level to "debug" for the entirety of networkd-test.py, and show networkd's and hostnamed's journals and the DHCP server log on failure of the two test_transient_hostname* tests. Also sync the journal before querying it to get more precise output. This should help with tracking down issue systemd#4753.

This test fails sometimes but it is hard to reproduce, so we need more information what happens. Set journal log level to "debug" for the entirety of networkd-test.py, and show networkd's and hostnamed's journals and the DHCP server log on failure of the two test_transient_hostname* tests. Also sync the journal before querying it to get more precise output. This should help with tracking down issue #4753.

martinpitt · 2016-11-28T11:39:47Z

PR #4754 landed now, so the next time we see this failure we should get some clue what goes wrong.

evverx · 2016-11-29T21:13:44Z

@martinpitt ,
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-xenial-pitti-systemd-semaphore/xenial/s390x/s/systemd-upstream/20161129_202059_5d48c@/log.gz

test_transient_hostname (__main__.DnsmasqClientTest)
networkd sets transient hostname from DHCP ... ---- systemd-networkd.service ----
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-networkd.service: Trying to enqueue job systemd-networkd.service/start/replace
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-networkd.service: Installed new job systemd-networkd.service/start as 1208
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-networkd.service: Enqueued job systemd-networkd.service/start as 1208
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-networkd.service: ConditionCapability=CAP_NET_ADMIN succeeded.
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-networkd.service: Failed to reset devices.list: Operation not permitted
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-networkd.service: Passing 0 fds to service
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-networkd.service: About to execute: /lib/systemd/systemd-networkd
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-networkd.service: Forked /lib/systemd/systemd-networkd as 6032
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-networkd.service: Changed dead -> start
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: Starting Network Service...
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[6032]: systemd-networkd.service: Executing: /lib/systemd/systemd-networkd
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-networkd.service: Got notification message from PID 6032 (WATCHDOG=1)
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd-networkd[6032]: test_eth42: IPv6 enabled for interface: Success
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd-networkd[6032]: eth0: Gained IPv6LL
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd-networkd[6032]: Enumeration completed
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-networkd.service: Got notification message from PID 6032 (READY=1, STATUS=Processing requests...)
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-networkd.service: Changed start -> running
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-networkd.service: Job systemd-networkd.service/start finished, result=done
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: Started Network Service.
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd-networkd[6032]: test_eth42: Gained carrier
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd-networkd[6032]: router_eth42: Gained carrier
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd-networkd[6032]: test_eth42: DHCPv4 address 192.168.5.210/24 via 192.168.5.1
Nov 29 20:18:26 autopkgtest-lxc-hbhvat systemd-networkd[6032]: router_eth42: Gained IPv6LL
Nov 29 20:18:26 autopkgtest-lxc-hbhvat systemd-networkd[6032]: test_eth42: Gained IPv6LL
Nov 29 20:18:26 autopkgtest-lxc-hbhvat systemd-networkd[6032]: test_eth42: Configured
---- systemd-hostnamed.service ----
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Trying to enqueue job systemd-hostnamed.service/stop/replace
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Installed new job systemd-hostnamed.service/stop as 1163
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Enqueued job systemd-hostnamed.service/stop as 1163
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Changed running -> stop-sigterm
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: Stopping Hostname Service...
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Child 5776 belongs to systemd-hostnamed.service
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Main process exited, code=killed, status=15/TERM
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Changed stop-sigterm -> dead
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Job systemd-hostnamed.service/stop finished, result=done
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: Stopped Hostname Service.
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: cgroup is empty
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Collecting.
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Collecting.
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Collecting.
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Trying to enqueue job systemd-hostnamed.service/start/replace
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Installed new job systemd-hostnamed.service/start as 1213
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Enqueued job systemd-hostnamed.service/start as 1213
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Passing 0 fds to service
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: About to execute: /lib/systemd/systemd-hostnamed
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Forked /lib/systemd/systemd-hostnamed as 6035
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Changed dead -> start
Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: Starting Hostname Service...
Nov 29 20:18:26 autopkgtest-lxc-hbhvat systemd[6035]: systemd-hostnamed.service: Executing: /lib/systemd/systemd-hostnamed
Nov 29 20:18:26 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Got notification message from PID 6035 (WATCHDOG=1)
Nov 29 20:18:26 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: D-Bus name org.freedesktop.hostname1 changed owner from  to :1.85
Nov 29 20:18:26 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Changed start -> running
Nov 29 20:18:26 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Job systemd-hostnamed.service/start finished, result=done
Nov 29 20:18:26 autopkgtest-lxc-hbhvat systemd[1]: Started Hostname Service.
Nov 29 20:18:26 testgreen systemd-hostnamed[6035]: Changed host name to 'testgreen'


---- dnsmasq log ----
Nov 29 20:18:25 dnsmasq[6029]: started, version 2.75 cachesize 150
Nov 29 20:18:25 dnsmasq[6029]: compile time options: IPv6 GNU-getopt DBus i18n IDN DHCP DHCPv6 no-Lua TFTP conntrack ipset auth DNSSEC loop-detect inotify
Nov 29 20:18:25 dnsmasq-dhcp[6029]: DHCP, IP range 192.168.5.10 -- 192.168.5.200, lease time 1h
Nov 29 20:18:25 dnsmasq-dhcp[6029]: DHCP, sockets bound exclusively to interface router_eth42
Nov 29 20:18:25 dnsmasq[6029]: reading /etc/resolv.conf
Nov 29 20:18:25 dnsmasq[6029]: using nameserver 10.0.3.1#53
Nov 29 20:18:25 dnsmasq[6029]: using nameserver 127.0.0.53#53
Nov 29 20:18:25 dnsmasq[6029]: read /etc/hosts - 7 addresses
Nov 29 20:18:25 dnsmasq-dhcp[6029]: DHCPDISCOVER(router_eth42) de:ad:be:ef:47:11 
Nov 29 20:18:25 dnsmasq-dhcp[6029]: DHCPOFFER(router_eth42) 192.168.5.210 de:ad:be:ef:47:11 
Nov 29 20:18:25 dnsmasq-dhcp[6029]: DHCPREQUEST(router_eth42) 192.168.5.210 de:ad:be:ef:47:11 
Nov 29 20:18:25 dnsmasq-dhcp[6029]: DHCPACK(router_eth42) 192.168.5.210 de:ad:be:ef:47:11 testgreen

------

FAIL

martinpitt · 2016-11-29T21:44:24Z

yay, hit from PR #4694, the s390x one from @evverx above and the amd64 one too.

Unfortunately the networkd journal is definitively cut short in both logs, and the amd64 one even has a truncated hostnamed log. But at least we know that in both cases dnsmasq actually handed out the expected IP and host name, and in the s390x log hostnamed confirmed the Changed host name to 'testgreen'. Apparently there is some async propagation delay involved here? The weird thing is that in both cases hostnamectl does show a transient hostname (normally it doesn't), but it's the wrong (original) one.

So as the next step I could add some timeout loop (say, 5 seconds?) which waits for hostnamectl to show the expected transient hostname. We would then at least see if this is just a race condition (we look too early) or a permanently wrong one. What I'm not sure about is whether the retry loop is actually justified, or would just hide some bug.

Sometimes setting the transient hostname does not happen synchronously, so retry up to five times. It is not yet clear whether this is legitimate behaviour or an underlying bug, but this will at least show whether the wrong transient hostname is just a race condition or permanently wrong. See issue systemd#4753

evverx · 2016-11-29T22:05:02Z

@martinpitt I think this

Nov 29 20:18:25 autopkgtest-lxc-hbhvat systemd[1]: systemd-hostnamed.service: Trying to enqueue job systemd-hostnamed.service/start/replace

means that the systemd-hostnamed was activated by hostnamectl at 20:18:25
And this

Nov 29 20:18:26 testgreen systemd-hostnamed[6035]: Changed host name to 'testgreen'

means that the hostname was changed at 20:18:26

So, yes

So as the next step I could add some timeout loop (say, 5 seconds?) which waits for hostnamectl to show the expected transient hostname

makes sense

orodbhen · 2017-05-30T14:58:37Z

Do you know if this fix has been backported to RHEL/CentOS 7?

I'm using networkd on a set of tftpboot systems, and it occasionally fails to set the hostname from DHCP. It get's the IP address, and establishes an active LAN connection, but the hostname remains localhost.

Most of the time, everything works, but not always.

martinpitt added dhcp hostname network labels Nov 27, 2016

martinpitt self-assigned this Nov 27, 2016

martinpitt mentioned this issue Nov 27, 2016

device: Remove maybe-delete logic in device_setup_unit #4748

Merged

martinpitt mentioned this issue Nov 27, 2016

test: make transient hostname tests fail verbosely #4754

Merged

martinpitt mentioned this issue Nov 29, 2016

Improvements for notify services (including #4212) #4745

Merged

evverx mentioned this issue Nov 29, 2016

tree-wide: stop using canonicalize_file_name(), use chase_symlinks() … #4694

Merged

martinpitt mentioned this issue Nov 29, 2016

test: retry checking for transient hostname in hostnamectl #4769

Merged

martinpitt closed this as completed in 2926b13 Nov 30, 2016

martinpitt mentioned this issue Nov 30, 2016

networkd: Allow bridge forward delay to be set as 0 #4765

Merged

cerebro1 mentioned this issue Apr 19, 2022

hostnamed: display firmware version #23112

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

networkd-test.py test_transient_hostname is flaky #4753

networkd-test.py test_transient_hostname is flaky #4753

martinpitt commented Nov 27, 2016

martinpitt commented Nov 28, 2016

evverx commented Nov 29, 2016

martinpitt commented Nov 29, 2016

evverx commented Nov 29, 2016

orodbhen commented May 30, 2017

networkd-test.py test_transient_hostname is flaky #4753

networkd-test.py test_transient_hostname is flaky #4753

Comments

martinpitt commented Nov 27, 2016

martinpitt commented Nov 28, 2016

evverx commented Nov 29, 2016

martinpitt commented Nov 29, 2016

evverx commented Nov 29, 2016

orodbhen commented May 30, 2017