Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test/test-network/systemd-networkd-tests.py fails on CentOS 7.5 #10617

Closed
mrc0mmand opened this issue Nov 1, 2018 · 19 comments
Closed

test/test-network/systemd-networkd-tests.py fails on CentOS 7.5 #10617

mrc0mmand opened this issue Nov 1, 2018 · 19 comments
Labels

Comments

@mrc0mmand
Copy link
Member

mrc0mmand commented Nov 1, 2018

systemd version the issue has been seen with

239 (latest master)

Used distribution

CentOS 7.5

Right now the test/test-network/systemd-networkd-tests.py has four failing test cases on CentOS:

======================================================================
FAIL: test_ipvlan (__main__.NetworkdNetDevTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test/test-network/systemd-networkd-tests.py", line 218, in test_ipvlan
    self.assertTrue(self.link_exits('ipvlan99'))
AssertionError: False is not true

======================================================================
FAIL: test_vcan (__main__.NetworkdNetDevTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test/test-network/systemd-networkd-tests.py", line 260, in test_vcan
    self.assertTrue(self.link_exits('vcan99'))
AssertionError: False is not true

======================================================================
FAIL: test_vrf (__main__.NetworkdNetDevTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test/test-network/systemd-networkd-tests.py", line 253, in test_vrf
    self.assertTrue(self.link_exits('vrf99'))
AssertionError: False is not true

======================================================================
FAIL: test_routing_policy_rule (__main__.NetworkdNetWorkTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test/test-network/systemd-networkd-tests.py", line 425, in test_routing_policy_rule
    self.assertRegex(output, 'tos 0x08')
AssertionError: Regex didn't match: 'tos 0x08' not found in '0:\tfrom all lookup local \n111:\tfrom 192.168.100.18 tos throughput iif test1 oif test1 lookup 7 \n32766:\tfrom all lookup main \n32767:\tfrom all lookup default'

----------------------------------------------------------------------
Ran 53 tests in 500.610s

FAILED (failures=4)

Full log: https://paste.fedoraproject.org/paste/wFNPam5PpMMfIbzKGALCDg/

I'm sorry for the noise in #10612 - that was actually mistake on my side, as the test probably doesn't like being run on a machine with an active DHCP same subnet as used in the test.

@ssahani could you please a look into it?

@evverx
Copy link
Member

evverx commented Nov 1, 2018

that was actually mistake on my side, as the test probably doesn't like being run on a machine with an active DHCP same subnet as used in the test

I think it'd be better if the test failed to start in this case. I mean, only the test knows in what environment it's supposed to work so it can check whether all the requirements are met before it tries to do anything.

Regarding systemd/systemd-centos-ci#21 (comment), it's basically the same issue in the sense that the test expects some modules to be installed on a machine. It should skip the tests if the necessary modules are absent.

@evverx evverx added the tests label Nov 1, 2018
@yuwata
Copy link
Member

yuwata commented Nov 1, 2018

@mrc0mmand Could you give us corresponding journal logs?

@evverx
Copy link
Member

evverx commented Nov 1, 2018

In theory the output of the test should be enough to figure out what's going on given that generally almost nobody will have access to a machine where the test is run so I'd say that the test should be clever enough to dump the logs when it fails for us to look at.

@mrc0mmand
Copy link
Member Author

@yuwata https://paste.fedoraproject.org/paste/torYK6GqJ9YuWuG9rAxxOg/

I guess the CI could dump the journal along with the test results for easier debugging. Just an idea, though.

@ssahani
Copy link
Contributor

ssahani commented Nov 1, 2018

vrf ,vcan , ipvlan networkd not able to create links which is missing kernel modules.

AssertionError: Regex didn't match: 'tos 0x08' not found in '0:\tfrom all lookup local \n111:\tfrom 192.168.100.18 tos throughput iif test1 oif test1 lookup 7 \n32766:\tfrom all lookup main \n32767:\tfrom all lookup default'

tos throughput 'throughput, delay, reliability and cost.

include/uapi/linux/ip.h:#define	IPTOS_THROUGHPUT	0x08

umm try with some other value

@evverx
Copy link
Member

evverx commented Nov 1, 2018

@ssahani regarding the modules, it's CentOS 7 where the kernel is far from the latest (3.10.0-862.14.4.el7.x86_64 to be precise). It's not easy to get the modules working there and nobody is going to update the kernel to just make the test pass there. If the test is supposed to be kept upstream, it should support different environments (not only the latest release of Fedora) by detecting which features are available (I guess modprobe should suffice in this particular case). If it isn't, then, well, it shouldn't have been merged in the first place.

Regarding the throughput, I'm not sure I understand what you meant by saying that some other value should be used. Could you elaborate?

@ssahani
Copy link
Contributor

ssahani commented Nov 1, 2018

@ssahani regarding the modules, it's CentOS 7 where the kernel is far from the latest (3.10.0-862.14.4.el7.x86_64 to be precise). It's not easy to get the modules working there and nobody is going to update the kernel to just make the test pass there. If the test is supposed to be kept upstream, it should support different environments (not only the latest release of Fedora) by detecting which features are available (I guess modprobe should suffice in this particular case). If it isn't, then, well, it shouldn't have been merged in the first place.

I agree we should do that . Specially cloud platform we don't have all modules.

Regarding the throughput, I'm not sure I understand what you meant by saying that some other value should be used. Could you elaborate?

I guess 0x08 is interpreted as throughput . I found this from kernel source . So whenever 0x08 is there ip rule interpreting as throughput. That is actually valid.

@evverx
Copy link
Member

evverx commented Nov 1, 2018

I changed TypeOfService from 0x08 to 253 (which is valid according to the documentation) and networkd started complaining that it couldn't add the routing policy rule:

test1: Could not add routing policy rule: Invalid argument

Judging by the output of strace, the messages networkd sends aren't valid on CentOS at least:

2863  sendto(3<NETLINK:[ROUTE:1]>, "D\0\0\0 \0\5\6\n\0\0\0\0\0\0\0\2\0 \375\7\3\0\1\0\0\0\0\10\0\2\0\300\250d\22\10\0\6\0o\0\0\0\n\0\3\0test1\0\0\0\n\0\21\0test1\0\0\0", 68, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 16) = 68
...
2863  recvmsg(3<NETLINK:[ROUTE:1]>, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"X\0\0\0\2\0\0\0\n\0\0\0\1\0\0\0\352\377\377\377D\0\0\0 \0\5\6\n\0\0\0\0\0\0\0\2\0 \375\7\3\0\1\0\0\0\0\10\0\2\0\300\250d\22\10\0\6\0o\0\0\0\n\0\3\0test1\0\0\0\n\0\21\0test1\0\0\0", 7160}], msg_controllen=24, [{cmsg_len=20, cmsg_level=SOL_NETLINK, cmsg_type=3}], msg_flags=MSG_CTRUNC}, MSG_TRUNC) = 88
2863  sendmsg(4<UNIX:[29945->7195]>, {msg_name(0)=NULL, msg_iov(4)=[{"PRIORITY=4\nSYSLOG_FACILITY=3\nCODE_FILE=../src/network/networkd-routing-policy-rule.c\nCODE_LINE=439\nCODE_FUNC=link_routing_policy_rule_handler\nERRNO=22\nINTERFACE=test1\nSYSLOG_IDENTIFIER=systemd-networkd\n", 202}, {"MESSAGE=", 8}, {"test1: Could not add routing policy rule: Invalid argument", 58}, {"\n", 1}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 269

Apparently, 0x08 indeed makes the messages valid but I wouldn't say it's expected. From my perspective, it just means that TypeOfService doesn't seem to be working on CentOS.

@evverx
Copy link
Member

evverx commented Nov 1, 2018

By the way, 253 doesn't seem to be working even on Fedora 28 so I guess either https://www.freedesktop.org/software/systemd/man/systemd.network.html#TypeOfService= or the code is incorrect (or both :-))

@evverx
Copy link
Member

evverx commented Nov 1, 2018

I was wondering where throughput comes from and it turns out that the culprit is the following excerpt from /etc/iproute2/rt_dsfield:

# Deprecated values dropped upstream
# Kept in RHEL for backwards-compatibility
0x00   default
0x10   lowdelay
0x08   throughput
0x04   reliability
# This value overlap with ECT, do not use it!
0x02   mincost
# These values seems do not want to die, Cisco likes them by a strange reason.
0x20   priority
0x40   immediate
0x60   flash
0x80   flash-override
0xa0   critical
0xc0   internet
0xe0   network

The simplest way to fix it would be to just modify the regular expression used in the test :-) It would be even better if ip could print "raw" values but I haven't managed to find an option that would do exactly that.

@evverx
Copy link
Member

evverx commented Nov 1, 2018

Just to sum up, TypeOfService works fine; the documentation led me astray (I'm not sure why I even started reading it instead of https://github.com/torvalds/linux/blob/master/include/uapi/linux/ip.h :-)); the test could be fixed by adding a few modprobes and changing the regular expression a bit.

@yuwata
Copy link
Member

yuwata commented Nov 2, 2018

it's CentOS 7 where the kernel is far from the latest (3.10.0-862.14.4.el7.x86_64 to be precise)

BTW, we require kernel 3.13...

systemd/README

Lines 35 to 36 in 99f68ef

Linux kernel >= 3.13
Linux kernel >= 4.2 for unified cgroup hierarchy support

@evverx
Copy link
Member

evverx commented Nov 2, 2018

@yuwata as we saw yesterday, the bonding options added in 4.2 are backported to CentOS so it's really hard to tell what 3.10 means exactly :-) I personally think that 3.13 and 4.2 don't make much sense per se and it's just another example of documentation leading everybody astray :-), because these numbers refer to the "vanilla" kernels, which, frankly, are rarely seen in the wild. Anyway, it's just my interpretation and I may be wrong, of course, but, frankly, I'd be very surprised to find out that Red Hat Enterprise Linux 7 and CentOS 7 aren't on the list :-)

@ssahani
Copy link
Contributor

ssahani commented Nov 2, 2018

@evverx exactly.

evverx added a commit to evverx/systemd that referenced this issue Nov 2, 2018
Depending on the content of /etc/iproute2/rt_dsfield, ip can print either
`0x08` or `throughput` as was shown in systemd#10617 (comment).
evverx added a commit to evverx/systemd that referenced this issue Nov 2, 2018
Depending on the content of /etc/iproute2/rt_dsfield, ip can print either
`0x08` or `throughput` as was shown in systemd#10617 (comment).
@evverx
Copy link
Member

evverx commented Nov 2, 2018

I've just opened #10625 addressing the "throughput" issue.

I'll leave the "modules" stuff to someone else who knows how the modules are called :-)

@ssahani
Copy link
Contributor

ssahani commented Nov 2, 2018

@evverx thanks how about something like this and we skip the test

    try:                                                                                                                                                                                       
        subprocess.check_call('modprobe ipvlan', shell=True)                                                                                                                                    
    except subprocess.CalledProcessError as error:                                                                                                                                             
        pass       

@evverx
Copy link
Member

evverx commented Nov 2, 2018

@ssahani I considered something like this. More precisely, I was going to send a kludgy patch using @unittest.skipIf(not subprocess.call[...], '...'), but in the end decided that it would be better to run systemd-networkd even when the modules aren't available to see whether it crashes or not. So the final version of my kludgy patch looks like the following:

diff --git a/test/test-network/systemd-networkd-tests.py b/test/test-network/systemd-networkd-tests.py
index 82d10ddc4..adb372d60 100755
--- a/test/test-network/systemd-networkd-tests.py
+++ b/test/test-network/systemd-networkd-tests.py
@@ -22,6 +22,9 @@ dnsmasq_config_file='/var/run/networkd-ci/test-dnsmasq.conf'
 dnsmasq_pid_file='/var/run/networkd-ci/test-test-dnsmasq.pid'
 dnsmasq_log_file='/var/run/networkd-ci/test-dnsmasq-log-file'

+def is_module_available(module_name):
+    return not subprocess.call(["modprobe", module_name])
+
 def setUpModule():

     os.makedirs(network_unit_file_path, exist_ok=True)
@@ -218,7 +221,7 @@ class NetworkdNetDevTests(unittest.TestCase, Utilities):

         self.start_networkd()

-        self.assertTrue(self.link_exits('ipvlan99'))
+        self.assertTrue(self.link_exits('ipvlan99') or not is_module_available('ipvlan'))

     def test_veth(self):
         self.copy_unit_to_networkd_unit_path('25-veth.netdev')
@@ -253,14 +256,14 @@ class NetworkdNetDevTests(unittest.TestCase, Utilities):

         self.start_networkd()

-        self.assertTrue(self.link_exits('vrf99'))
+        self.assertTrue(self.link_exits('vrf99') or not is_module_available('vrf'))

     def test_vcan(self):
         self.copy_unit_to_networkd_unit_path('25-vcan.netdev')

         self.start_networkd()

-        self.assertTrue(self.link_exits('vcan99'))
+        self.assertTrue(self.link_exits('vcan99') or not is_module_available('vcan'))

     def test_geneve(self):
         self.copy_unit_to_networkd_unit_path('25-geneve.netdev')

I don't like it, I have to say, but it fixes the issue in hand just fine.

@evverx
Copy link
Member

evverx commented Nov 2, 2018

@mrc0mmand I've just updated #10625. It would be great if you could give it a try.

@yuwata yuwata closed this as completed in 7a0a37b Nov 3, 2018
@ssahani
Copy link
Contributor

ssahani commented Nov 3, 2018

looks nice !

mrc0mmand pushed a commit to mrc0mmand/systemd-rhel that referenced this issue Feb 6, 2019
Depending on the content of /etc/iproute2/rt_dsfield, ip can print either
`0x08` or `throughput` as was shown in systemd/systemd#10617 (comment).

(cherry picked from commit f7bdd56)
mrc0mmand pushed a commit to mrc0mmand/systemd-rhel that referenced this issue Feb 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

4 participants