IPv6 Compliance Summary: 83 failures of IPv6 Core Conformance Tests #28502

Closed
LiveFreeAndRoam opened this issue Jul 24, 2023 · 31 comments
Labels
bug 🐛 Programming errors, that need preferential fixing network

Comments

@LiveFreeAndRoam
Contributor

LiveFreeAndRoam commented Jul 24, 2023

systemd version the issue has been seen with

250

Used distribution

Embedded Linux - Debian-based

Linux kernel version used

5.15.71-rt51+gc36e774d0d9a

CPU architectures issue was seen on

aarch64

Component

systemd-networkd

Expected behaviour you didn't see

Pass IPv6 Core Conformance tests.

Unexpected behaviour you saw

Encountered 83 IPv6 Core Conformance failures.

Please find attached a zip file containing:

  1. IPv6 Core Conformance specification
  2. tc_legacy_failures_summary.txt: 6 failing cases
  3. tc_networkd_failures_summary.txt: 83 failing cases

The legacy file was captured when our host was configured using Debian's legacy /etc/network/interfaces file and networkd was disabled and stopped. The networkd file was captured after configuring networkd to manage the interface.

The txt files just list the failing test case numbers. To decode those, you will need to look them up in the test specification.

failures.zip
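Since the two summary files only list test case numbers, the cases that fail solely under networkd can be isolated by comparing them. The following is a minimal sketch, assuming each file holds one test case identifier per line (the exact layout of the attached files is not shown here); the function name `networkd_only_failures` is made up for illustration.

```shell
# Print test cases that fail with networkd but not with the legacy setup.
# Assumption: both files contain one test case identifier per line.
networkd_only_failures() {
    legacy_file=$1
    networkd_file=$2
    tmp_dir=$(mktemp -d) || return 1
    # comm(1) needs sorted input; sort -u also drops duplicate entries.
    sort -u "$legacy_file" > "$tmp_dir/legacy"
    sort -u "$networkd_file" > "$tmp_dir/networkd"
    # -1 hides lines unique to the legacy file, -3 hides lines common to both.
    comm -13 "$tmp_dir/legacy" "$tmp_dir/networkd"
    rm -rf "$tmp_dir"
}

# Usage:
#   networkd_only_failures tc_legacy_failures_summary.txt tc_networkd_failures_summary.txt
```

The remaining identifiers still have to be looked up in the test specification to see what each case checks.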

Steps to reproduce the problem

Run one of the conformance test suites referenced on the IPv6 Ready Logo page.

References:

Additional program output to the terminal or log subsystem illustrating the issue

No response

@LiveFreeAndRoam
Contributor Author

In the past week, I've managed to upgrade our embedded system from systemd v250 to v253. I have rerun a subset of the tests (the "Base" tests) against systemd v253. The results have improved a little:

"Base" Test Failures:

  • v250: 69 failures
  • v253: 44 failures

There are still too many failures to make it practical to raise individual issues.

I've attached a zip file that lists the failing "Base" tests for v250 and v253.

tc_networkd-v253_base_failures_summary.zip

@shemminger
Contributor

Most of this comes from Linux kernel implementation choices.
It is not systemd-related. You probably have to raise the issue with the Linux IPv6 maintainers.

@paulocoghi

On July 13, Matt (the OP) posted on the mailing list:

> I am noticing networkd has a number of IPv6 compliance issues, where it is not meeting various RFCs' "must" requirements. For comparison, when I stop networkd and configure the network via legacy methods, the protocol exchanges comply with the RFCs.

If his tests are correct, it doesn't seem to be a kernel issue. But, of course, it would be interesting if he could provide more details about the test.

@LiveFreeAndRoam
Contributor Author

Correct, this is not a kernel issue. Evidently, networkd is part of the IPv6 protocol exchange. It intercepts the IPv6 RAs and disables the kernel's sysctl accept_ra, so the kernel is removed from the RA conversation. Unfortunately, networkd is not handling the RAs in a compliant fashion.
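Whether the kernel is still handling RAs on a given interface can be checked through the `accept_ra` sysctl. The following is a small sketch, assuming the standard `/proc/sys/net/ipv6/conf/<interface>/accept_ra` path; the function name and the `PROC_ROOT` override are hypothetical conveniences so the function can be tried against a fake tree.

```shell
# Describe the kernel's accept_ra setting for one interface.
# PROC_ROOT is a hypothetical override (default: /proc) used only for dry runs.
describe_accept_ra() {
    iface=$1
    value=$(cat "${PROC_ROOT:-/proc}/sys/net/ipv6/conf/$iface/accept_ra") || return 1
    case $value in
        0) echo "$iface: accept_ra=0, the kernel does not accept RAs" ;;
        1) echo "$iface: accept_ra=1, the kernel accepts RAs unless forwarding is enabled" ;;
        2) echo "$iface: accept_ra=2, the kernel accepts RAs even with forwarding enabled" ;;
        *) echo "$iface: accept_ra=$value (unexpected value)" ;;
    esac
}

# Usage: describe_accept_ra eth0
```

On an interface managed by networkd, a value of 0 would be consistent with the behaviour described above, where RA handling moves from the kernel to networkd.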

I reported 7 issues (#28421, #28434, #28435, #28437, #28438, #28439, #28449) before realising there were so many more and it was not practical to keep reporting these one by one.

> it would be interesting if he could provide more details about the test.

The IPv6 Forum created the test specifications for IPv6 conformance and interoperability testing. More can be read about it at IPv6 Ready Logo Program.

The test suite has been around since the early days of IPv6 development. It is used by IPv6 Protocol developers to ensure they have delivered a compliant implementation of the IPv6 protocol. Over the years, I have worked with Linux kernel developers, who also use the test suite, to address a small number of issues that arise from time to time. Product developers use the test suite for certification, as various customers (understandably) demand the IPv6 protocol is compliant.

Now that networkd has its hands in the IPv6 protocol, it too must be developed with IPv6 compliance as a goal. This test suite is the tool that will ensure that goal is met.

@crrodriguez
Contributor

@LiveFreeAndRoam thanks for taking the time to do this. I'm afraid that if you want each issue fixed and not lost in the sea of bug reports, you will have to file one issue per problem. Otherwise things might never be corrected.

@evverx
Member

evverx commented Jul 29, 2023

I don't think it makes much sense to report those issues (and fix them) one by one without integrating that test suite into the CI one way or another. That part of networkd is undertested and tends to break every time it changes so with no new tests all those one-off fixes are likely to make things even worse.

> This test suite is the tool that will ensure that goal is met.

The problem is that the commercial test suite can't be used upstream. The TAHI project doesn't seem to be actively maintained any more, and it also comes with a custom licence containing what seems to be a non-commercial clause, so it can't easily be integrated into the upstream CI either.

@paulocoghi
Copy link

A little history/context that might be useful.

Hideaki Yoshifuji @yoshfuji worked on the USAGI Project, which added IPv6 conformance tests to the Linux kernel using the TAHI tests. Unfortunately, he doesn't seem to be active anymore; his last activity on GitHub was in February 2020 [1].

In his publication with Keio University, he emphasized that "to maintain the stack stable, we developed an automatic testing system, which greatly helps us saving our time."

It would be interesting to know how the USAGI project circumvented those custom license problems, considering that the full work was mainlined into the Linux kernel.

@evverx
Member

evverx commented Jul 29, 2023

Looking at

No merchantable use may be permitted without prior written
notification to the copyrighters. However, using this software for the
purpose of testing or evaluating any products including merchantable
products may be permitted without any notification to the copyrighters.

I think it should be fine to run the test suite downstream somewhere without including the code of the test anywhere or changing it. The systemd test suites are kept and maintained upstream in the repository though and I think that clause would prevent that.

@evverx
Member

evverx commented Jul 29, 2023

That licence was in v6eval. Looks like there is another license there covering the test suite itself:

  1. No merchantable use may be permitted without prior written
    notification to the copyrighters.
  1. The copyrighters, the project and the contributors may prohibit
    the use of this software at any time.

and that's not very reassuring. Either way it would be interesting to know how the USAGI project circumvented those custom license problems. It seems to me that the safest way would be to use the spec to implement that stuff from scratch but it's a huge undertaking.

@yuwata yuwata added the network label Aug 14, 2023
@ssahani
Contributor

ssahani commented Aug 28, 2023

@evverx I failed to run the TAHI test suite. If we get the pcap files for all the issues, we will be able to proceed further; otherwise I am stuck here.

@evverx
Member

evverx commented Aug 28, 2023

@ssahani it isn't up to me :-) Personally I don't think it makes sense to manually file a lot of issues, verify all the patches and catch all sorts of regressions in the process manually but it's just my opinion.

@ssahani
Contributor

ssahani commented Aug 28, 2023

> @ssahani it isn't up to me :-) Personally I don't think it makes sense to manually file a lot of issues, verify all the patches and catch all sorts of regressions in the process manually but it's just my opinion.

Yes, I hope @LiveFreeAndRoam will be able to attach them.

@evverx
Member

evverx commented Aug 28, 2023

Before I forget, #28969 seems to be somewhat related to this issue.

@ssahani
Contributor

ssahani commented Aug 28, 2023

#28969

It talks about RFC 6334. I think it only concerns the RA's M and O bits.

@LiveFreeAndRoam
Contributor Author

I am willing to help. I feel we can get into a rhythm after we get through a few of these cases.

Is there a way we can set up a meeting to coordinate and plan an approach? @ssahani, I sent you an email.

@ssahani
Contributor

ssahani commented Aug 31, 2023

> I am willing to help. I feel we can get into a rhythm after we get through a few of these cases.
>
> Is there a way we can set up a meeting to coordinate and plan an approach? @ssahani, I sent you an email.

I did not receive any mail. Please add @yuwata and @keszybz to the list too.

@LiveFreeAndRoam
Contributor Author

Hi @ssahani, I have sent an email to the three of you.

In the attached zip file, there are two tar files:

  1. tc_results_networkd.tar: test results with networkd running.
  2. tc_results_legacy.tar: networkd has been stopped.

These tar files contain my logs and pcapng files for each of the test cases in the IPv6 Logo certification test suite.

In the cases where networkd fails, the legacy version can be examined for the correct behaviour.

tc_results.zip

Let me know if you need more info or otherwise how I can help address these.

@yuwata
Member

yuwata commented Apr 17, 2024

@LiveFreeAndRoam So, the individual issues have already been fixed and closed, except for #31624. Could you re-run the whole test suite with the current git main? If no new issues show up, please close this.

Unfortunately, I still have no idea about #31624. I am still investigating.

@LiveFreeAndRoam
Contributor Author

Agreed. With just one test that needs to be resolved, I am closing this issue.

For the record...

From the IPv6 Core Host Conformance Test Suite (PDF), there were initially 105 failing tests out of a total of 416 tests. As an aside, this issue lists just 83 failing tests because the results of the other tests were still pending.

Thanks to the great work by @yuwata, we now have just 1 failing test. The full list of tests that failed is below:

Test Title
v6LC_1_1_07a Unrecognized Next Header in IPv6 Header (Multiple Values)
v6LC_1_2_03a Unrecognized Next Header in Extension Header (Multiple Values)
v6LC_2_1_01b On-link Determination
v6LC_2_1_02b Resolution Wait Queue
v6LC_2_1_03 Prefix Information Option Processing, On-link Flag (Hosts Only)
v6LC_2_1_04a Host Prefix List (Hosts Only)
v6LC_2_1_05a Neighbor Solicitation Origination, Address Resolution (Local Target)
v6LC_2_1_05b Neighbor Solicitation Origination, Address Resolution (Global Target)
v6LC_2_1_06a Neighbor Solicitation Origination, Reachability Confirmation (LLA>LLA)
v6LC_2_1_06b Neighbor Solicitation Origination, Reachability Confirmation (GA>GA)
v6LC_2_1_06c Neighbor Solicitation Origination, Reachability Confirmation (LLA>GA)
v6LC_2_1_06d Neighbor Solicitation Origination, Reachability Confirmation (GA>LLA)
v6LC_2_1_11a Neighbor Solicitation Processing, NCE State STALE (Unicast same SLLA)
v6LC_2_1_11c Neighbor Solicitation Processing, NCE State STALE (Multicast same SLLA)
v6LC_2_1_19a Neighbor Advertisement Processing, NCE State STALE
v6LC_2_1_19b Neighbor Advertisement Processing, NCE State STALE
v6LC_2_1_19e Neighbor Advertisement Processing, NCE State STALE
v6LC_2_1_19f Neighbor Advertisement Processing, NCE State STALE
v6LC_2_1_19m Neighbor Advertisement Processing, NCE State STALE
v6LC_2_1_19n Neighbor Advertisement Processing, NCE State STALE
v6LC_2_1_19q Neighbor Advertisement Processing, NCE State STALE
v6LC_2_1_19r Neighbor Advertisement Processing, NCE State STALE
v6LC_2_1_21a Neighbor Advertisement Processing, R-bit Change (Hosts Only) (R:0 S:1 O:1)
v6LC_2_1_21b Neighbor Advertisement Processing, R-bit Change (Hosts Only) (R:0 S:0 O:0)
v6LC_2_1_21c Neighbor Advertisement Processing, R-bit Change (Hosts Only) (R:0 S:0 O:1)
v6LC_2_1_21d Neighbor Advertisement Processing, R-bit Change (Hosts Only) (R:0 S:1 O:0)
v6LC_2_1_21e Neighbor Advertisement Processing, R-bit Change (Hosts Only) (R:0 S:1 O:1, TLL)
v6LC_2_1_21f Neighbor Advertisement Processing, R-bit Change (Hosts Only) (R:0 S:0 O:0, TLL)
v6LC_2_1_21g Neighbor Advertisement Processing, R-bit Change (Hosts Only) (R:0 S:0 O:1, TLL)
v6LC_2_1_21h Neighbor Advertisement Processing, R-bit Change (Hosts Only) (R:0 S:1 O:0, TLL)
v6LC_2_2_02f Router Solicitations, Solicited Router Advertisement (Hosts Only) (RA bad code)
v6LC_2_2_11 Default Router Switch (Hosts Only)
v6LC_2_2_13b Router Advertisement Processing, Cur Hop Limit (Non-zero)
v6LC_2_2_14a Router Advertisement Processing, Router Lifetime (Hosts Only) (Updated with same life)
v6LC_2_2_15a Router Advertisement Processing, Reachable Time (Host only)
v6LC_2_2_16f Router Advertisement Processing, Neighbor Cache (Hosts Only) (SLLA change, NCE Probe)
v6LC_2_2_16g Router Advertisement Processing, Neighbor Cache (Hosts Only) (SLLA nochange, NCE Probe)
v6LC_2_2_16h Router Advertisement Processing, Neighbor Cache (Hosts Only) (SLLA omitted, NCE Probe)
v6LC_2_2_16j Router Advertisement Processing, Neighbor Cache (Hosts Only) (SLLA nochange, NCE Stale)
v6LC_2_2_16k Router Advertisement Processing, Neighbor Cache (Hosts Only) (SLLA omitted, NCE Stale)
v6LC_2_2_19 Router Advertisement Processing, On-link determination (Host Only)
v6LC_2_2_22a Processing Router Advertisements with Router Preference (Host Only) (High Route pref)
v6LC_2_2_22b Processing Router Advertisements with Router Preference (Host Only) (Low Route pref)
v6LC_2_2_22c Processing Router Advertisements with Router Preference (Host Only) (Reserved pref)
v6LC_2_2_22d Processing Router Advertisements with Router Preference (Host Only) (Lower pref)
v6LC_2_2_22e Processing Router Advertisements with Router Preference (Host Only) (Higher pref)
v6LC_2_2_23c Processing Router Advertisement with Route Information Option (Host Only) (Reserved pref)
v6LC_2_2_23f Processing Router Advertisement with Route Information Option (Host Only) (Change pref)
v6LC_2_2_23g Processing Router Advertisement with Route Information Option (Host Only) (Len 0, pref high)
v6LC_2_2_23h Processing Router Advertisement with Route Information Option (Host Only) (Len 0, low pref)
v6LC_2_2_23j Processing Router Advertisement with Route Information Option (Host Only) (Len 0, life 0)
v6LC_2_2_25a Processing Router Advertisement DNS (Host Only) (Recursive DNS option)
v6LC_2_2_25c Processing Router Advertisement DNS (Host Only) (Recursive DNS expired)
v6LC_2_2_25d Processing Router Advertisement DNS (Host Only) (Search List Option)
v6LC_2_2_25f Processing Router Advertisement DNS (Host Only) (Search List Option expired)
v6LC_2_3_01a Redirected On-link: Valid (Hosts Only) (no TLLA or Redirect Pkt Option)
v6LC_2_3_01b Redirected On-link: Valid (Hosts Only) (no TLLA Option)
v6LC_2_3_01c Redirected On-link: Valid (Hosts Only) (no Redirect Pkt Option)
v6LC_2_3_01d Redirected On-link: Valid (Hosts Only) (TLLA and Redirect Pkt Option)
v6LC_2_3_02a Redirected On-link: Suspicious (Hosts Only) (Option unrecognized)
v6LC_2_3_02b Redirected On-link: Suspicious (Hosts Only) (Reserved field is non-zero)
v6LC_2_3_02c Redirected On-link: Suspicious (Hosts Only) (Target Address not Covered by On-link Prefix)
v6LC_2_3_04a Redirected to Alternate Router: Valid (Hosts Only) (No TLLA or Redirect Pkt Option)
v6LC_2_3_04b Redirected to Alternate Router: Valid (Hosts Only) (no TLLA Option)
v6LC_2_3_04c Redirected to Alternate Router: Valid (Hosts Only) (No Redirect Option)
v6LC_2_3_04d Redirected to Alternate Router: Valid (Hosts Only) (TLLA and Redirected Packet Option)
v6LC_2_3_05a Redirected to Alternate Router: Valid (Hosts Only) (Option Unrecognized)
v6LC_2_3_05b Redirected to Alternate Router: Valid (Hosts Only) (Reserved field is non-zero)
v6LC_2_3_07 Redirected Twice (Hosts Only)
v6LC_2_3_08a Invalid Option (Hosts Only) (Path MTU Option)
v6LC_2_3_08b Invalid Option (Hosts Only) (Prefix Information Option)
v6LC_2_3_08c Invalid Option (Hosts Only) (Source Link-layer Address Option)
v6LC_2_3_09 No Destination Cache Entry (Hosts Only)
v6LC_2_3_13a Neighbor Cache Updated from State STALE (Hosts Only) (No TLLA, No Redirect, LLA unchanged)
v6LC_2_3_13b Neighbor Cache Updated from State STALE (Hosts Only) (TLLA, No Redirect, LLA unchanged)
v6LC_2_3_14a Neighbor Cache Updated from State PROBE (Hosts Only) (No TLLA, No Redirect, LLA unchanged)
v6LC_2_3_14b Neighbor Cache Updated from State PROBE (Hosts Only) (TLLA, No Redirect, LLA unchanged)
v6LC_2_3_14c Neighbor Cache Updated from State PROBE (Hosts Only) (TLLA, No Redirect, LLA updated)
v6LC_2_3_14d Neighbor Cache Updated from State PROBE (Hosts Only) (TLLA, Redirect, LLA updated)
v6LC_2_3_14e Neighbor Cache Updated from State PROBE (Hosts Only) (TLLA, Oversized Redirect, LLA updated)
v6LC_3_1_02b Receiving DAD Neighbor Solicitations and Advertisements (NUT receives DAD NS (target == NUT)
v6LC_3_1_02c Receiving DAD Neighbor Solicitations and Advertisements (NUT receives DAD NA (target != NUT)
v6LC_3_1_02d Receiving DAD Neighbor Solicitations and Advertisements (NUT receives DAD NA (target == NUT)
v6LC_3_1_03a Validation of DAD Neighbor Solicitations (NUT receives invalid DAD NS, ICMP len < 24 octets)
v6LC_3_1_03b Validation of DAD Neighbor Solicitations (NUT receives invalid DAD NS, HopLimit != 255)
v6LC_3_1_03c Validation of DAD Neighbor Solicitations (NUT receives invalid DAD NS, Dst=NUT's tentative addr)
v6LC_3_1_03d Validation of DAD Neighbor Solicitations (NUT receives invalid DAD NS, Dst=allnodes)
v6LC_3_1_03e Validation of DAD Neighbor Solicitations (NUT receives invalid DAD NS, ICMP code != 0)
v6LC_3_1_03i Validation of DAD Neighbor Solicitations (NUT receives valid DAD NS, Reserved field)
v6LC_3_1_03j Validation of DAD Neighbor Solicitations (NUT receives valid DAD NS, Contains TLL)
v6LC_3_1_04h Validation of DAD Neighbor Advertisements (NUT receives valid DAD NA, Contains Reserved)
v6LC_3_1_04i Validation of DAD Neighbor Advertisements (NUT receives valid DAD NA, Contains SLL)
v6LC_3_2_03 Multiple Prefixes and Network Renumbering (Hosts only)
v6LC_3_2_05c Prefix-Information Option Processing, Lifetime (Hosts Only) (prefix life < remaining && life < 2h)
v6LC_3_2_05d Prefix-Information Option Processing, Lifetime (Hosts Only) (prefix life < 2h && life > 2h)
v6LC_3_2_06a Stable addresses (Host Only) (LLA vs GA)
v6LC_3_2_06b Stable addresses (Host Only) (Reboot)
v6LC_3_2_07a Resolving DAD Conflicts (Host Only) (LLA)
v6LC_3_2_07b Resolving DAD Conflicts (Host Only) (GA)
v6LC_4_1_08 Router Advertisement with MTU Option (Hosts Only)
v6LC_4_1_10 Multicast Destination – One Router
v6LC_4_1_11 Multicast Destination – Two Router
v6LC_4_1_12 Validate Packet Too Big
v6LC_5_1_01 Transmitting Echo Requests
v6LC_5_1_07 Unrecognized Next Header (Parameter Problem Generation)

@evverx
Member

evverx commented Apr 22, 2024

I wonder if it would be possible to run this test suite with networkd under ASan/UBSan? There were already heap-use-after-frees like #31485 and crashes introduced in the process, so it would be great to run the whole test suite just in case.

@LiveFreeAndRoam
Contributor Author

I've read a little about ASan and UBSan. Do you have a particular set of configuration options you'd like to use when compiling and running under UBSan and ASan?

Periodically, I run the full test suite. It takes about 7 hours to run. So, knowing UBSan config options ahead of time will be prudent.

@evverx
Member

evverx commented Apr 22, 2024

I think it should be documented in https://github.com/systemd/systemd/blob/main/docs/TESTING_WITH_SANITIZERS.md. (The "clang" paragraph targets the systemd test suite though where it's necessary to pass -shared-libsan so it should be possible to omit it when networkd built with clang is run outside of that systemd testsuite).

The unit file should be adjusted too to be compatible with ASan/UBSan. The upstream test suite comments out MemoryDeny* and SystemCall*

sed -i 's/^\(MemoryDeny\|SystemCall\)/#\1/'

because they interfere with the sanitizers.
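For illustration, the `sed` expression above could be applied to a private copy of the unit file. This is only a sketch: the function name and the paths in the usage comment are assumptions, and it relies on GNU `sed` (`-i` and `\|` are GNU extensions).

```shell
# Comment out MemoryDeny* and SystemCall* settings in the given unit file,
# using the same expression as the upstream test suite. Requires GNU sed.
relax_unit_for_sanitizers() {
    unit_file=$1
    sed -i 's/^\(MemoryDeny\|SystemCall\)/#\1/' "$unit_file"
}

# Usage (paths are assumptions; edit a copy rather than the packaged file):
#   cp /usr/lib/systemd/system/systemd-networkd.service /etc/systemd/system/
#   relax_unit_for_sanitizers /etc/systemd/system/systemd-networkd.service
#   systemctl daemon-reload
```

Lines that do not start with one of the two prefixes are left untouched, so the rest of the unit keeps its settings.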

@yuwata
Member

yuwata commented Apr 22, 2024

> Agreed. With just one test that needs to be resolved, I am closing this issue.
>
> For the record...
>
> From the IPv6 Core Host Conformance Test Suite (PDF), there were initially 105 failing tests out of a total of 416 tests. As an aside, this issue lists just 83 failing tests because the results of the other tests were still pending.
>
> Thanks to the great work by @yuwata, we now have just 1 failing test. The full list of tests that failed is below:

@LiveFreeAndRoam Wow! Thank you! Your help is much appreciated.

@LiveFreeAndRoam
Contributor Author

LiveFreeAndRoam commented Apr 23, 2024

A heads-up... I have started testing with UBSan and ASan, and now quite a few tests are failing, currently 56. The systemd-networkd process has been running uninterrupted the whole time, so I presume no sanitizer problems are being detected.

I'm using:

$ git log -n 1
commit 6bd675a659a508cd1df987f90b633ed1c4b12cb3 (HEAD -> main)
Author: Lennart Poettering <lennart@poettering.net>
Date:   Mon Apr 22 17:30:58 2024 +0200

I'm launching systemd-networkd directly via VSCode's debugger. To capture a few things in case it matters...

launch.json file

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "(gdb) Launch systemd-networkd",
            "type": "cppdbg",
            "request": "launch",
            "program": "${workspaceFolder}/build/systemd-networkd",
            "envFile": "${workspaceFolder}/.env",
            "args": [],
            "stopAtEntry": false,
            "cwd": "${fileDirname}",
            "environment": [],
            "externalConsole": false,
            "MIMode": "gdb",
            "miDebuggerPath": "/home/matt/bin/sudo-gdb",
            "setupCommands": [
                {
                    "description": "Enable pretty-printing for gdb",
                    "text": "-enable-pretty-printing",
                    "ignoreFailures": true
                },
                {
                    "description": "Set Disassembly Flavor to Intel",
                    "text": "-gdb-set disassembly-flavor intel",
                    "ignoreFailures": true
                }
            ],
            "preLaunchTask": "Make systemd",
        }

    ]
}

.env file

SYSTEMD_LOG_LEVEL=debug
ASAN_OPTIONS=strict_string_checks=1:detect_stack_use_after_return=1:check_initialization_order=1:strict_init_order=1
UBSAN_OPTIONS=print_stacktrace=1:print_summary=1:halt_on_error=1

diff of meson.build to incorporate sanitize feature

$ git diff meson.build
diff --git a/meson.build b/meson.build
index e2de148095..423e0f41f8 100644
--- a/meson.build
+++ b/meson.build
@@ -410,6 +410,7 @@ possible_common_link_flags = [
 ]
 
 c_args = get_option('c_args')
+c_args += '-Db_sanitize=address,undefined'
 
 # Our json library does not support -ffinite-math-only, which is enabled by -Ofast or -ffast-math.
 if (('-Ofast' in c_args or '-ffast-math' in c_args or '-ffinite-math-only' in c_args) and '-fno-finite-math-only' not in c_args)

systemd-networkd.service changes for sanitize

$ systemctl  cat systemd-networkd | egrep "#Memory|#System"
#MemoryDenyWriteExecute=yes
#SystemCallArchitectures=native
#SystemCallErrorNumber=EPERM
#SystemCallFilter=@system-service

Once this test run is complete, I'll disable the UBSan and ASan and rerun the tests.

@evverx
Member

evverx commented Apr 23, 2024

> systemd-networkd directly via VSCode's debugger

Debuggers are usually incompatible with the leak sanitizer so it usually complains when networkd is stopped:

==3857==LeakSanitizer has encountered a fatal error.
==3857==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==3857==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)

Even when networkd is launched directly without debuggers the leak sanitizer is usually unhappy because networkd drops privileges. In terms of making it work it would be better to run it using the unit file or systemd-run with the AmbientCapabilities, CapabilityBoundingSet and User properties copied from the unit file.
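One way to approach the `systemd-run` variant is to read the three properties out of the unit file and assemble the command line from them. The sketch below only prints the command for review instead of running it. The function name and the transient unit name `networkd-asan` are made up, and it assumes each setting sits on a single `Key=Value` line.

```shell
# Print (not run) a systemd-run command line that copies the privilege-related
# settings from a unit file. The transient unit name "networkd-asan" is made up.
# Assumption: each setting sits on a single line of the form Key=Value.
print_systemd_run_cmd() {
    unit_file=$1
    binary=$2
    printf 'systemd-run --unit=networkd-asan'
    grep -E '^(AmbientCapabilities|CapabilityBoundingSet|User)=' "$unit_file" |
        while IFS= read -r prop; do
            # Quote each property because capability lists contain spaces.
            printf " -p '%s'" "$prop"
        done
    printf ' %s\n' "$binary"
}

# Usage:
#   print_systemd_run_cmd /usr/lib/systemd/system/systemd-networkd.service "$PWD/build/systemd-networkd"
```

Printing first makes it easy to check the copied capabilities before anything is started with them.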

@evverx
Member

evverx commented Apr 23, 2024

> +c_args += '-Db_sanitize=address,undefined'

I'm not sure it works. Assuming it's built with gcc could you try running ldd build/systemd-networkd | grep san. If the build is sanitized it should show something like

libasan.so.8 => /lib64/libasan.so.8 (0x00007fdbd9600000)
libubsan.so.1 => /lib64/libubsan.so.1 (0x00007fdbd6600000)
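The same check can be wrapped in a small helper that reports each runtime separately, which makes a half-sanitized build easier to spot. This is a sketch with a made-up function name. It only applies to builds where the sanitizer runtimes are linked dynamically, as gcc does here; clang links them statically by default, so they would not show up in `ldd`.

```shell
# Read `ldd` output on stdin and report whether the ASan and UBSan runtimes
# appear among the shared library dependencies.
report_sanitizers() {
    ldd_output=$(cat)
    for lib in libasan libubsan; do
        if printf '%s\n' "$ldd_output" | grep -q "$lib\.so"; then
            echo "$lib: linked"
        else
            echo "$lib: missing"
        fi
    done
}

# Usage: ldd build/systemd-networkd | report_sanitizers
```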

@evverx
Member

evverx commented Apr 23, 2024

Something like

--- a/meson.build
+++ b/meson.build
@@ -9,6 +9,7 @@ project('systemd', 'c',
                 'sysconfdir=/etc',
                 'localstatedir=/var',
                 'warning_level=2',
+                'b_sanitize=address,undefined',
         ],
         meson_version : '>= 0.60.0',
        )

(without c_args) should get it to work. b_sanitize is a builtin meson thing that automagically adds all the flags to the compiler/linker flags.
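Because `b_sanitize` is a built-in option, it can also be passed on the meson command line instead of patching meson.build. The wrapper below is a sketch: the function name and the default directory name `build` are assumptions, and it uses the presence of `meson-private` to decide whether the build directory has already been set up.

```shell
# Enable ASan and UBSan through meson's built-in b_sanitize option,
# without editing meson.build. The default directory "build" is an assumption.
configure_sanitized_build() {
    build_dir=${1:-build}
    if [ -d "$build_dir/meson-private" ]; then
        # Existing build directory: change the option in place.
        meson configure "$build_dir" -Db_sanitize=address,undefined
    else
        # Fresh build directory.
        meson setup "$build_dir" -Db_sanitize=address,undefined
    fi
}

# Usage: configure_sanitized_build build
```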

@LiveFreeAndRoam
Contributor Author

LiveFreeAndRoam commented Apr 23, 2024

Thank you @evverx. As you suspected, my systemd-networkd was missing the sanitizer. I have addressed that as per your instructions and it is now built with the sanitizer.

$ ldd build/systemd-networkd | grep san
        libasan.so.8 => /lib/x86_64-linux-gnu/libasan.so.8 (0x00007b8a94e00000)
        libubsan.so.1 => /lib/x86_64-linux-gnu/libubsan.so.1 (0x00007b8a91e00000)

Perhaps your diff can make it into the meson.build but commented out, along with your ldd command? It would make the process less error-prone.

@yuwata, can you confirm the commit that I should be testing? Yesterday's testing with systemd/main returned quite a few failures, e.g. all the tests related to Redirect failed. The version I used was:

$ git log -n 1
commit 6bd675a659a508cd1df987f90b633ed1c4b12cb3 (HEAD -> main)
Date:   Mon Apr 22 17:30:58 2024 +0200

# Edit: I also got the same result with: 9506269, Date:   Mon Apr 15 14:43:12 2024 +0900

$ git remote -v
origin  https://github.com/systemd/systemd (fetch)
origin  https://github.com/systemd/systemd (push)

I retested the Redirect failure using your latest fork and the test passed.

$ git log -n 1
commit e75ca3fe101885379b92a6511633ce065f59fecf (HEAD -> network-next, origin/network-next)
Author: Yu Watanabe <watanabe.yu+github@gmail.com>
Date:   Tue Apr 23 13:15:49 2024 +0900

    sd-radv: drop sd_radv_prefix and friends, and use sd_ndisc_option to manage NDisc options
    
    No effective functional change, just refactoring.

$ git remote -v
origin  https://github.com/yuwata/systemd.git (fetch)
origin  https://github.com/yuwata/systemd.git (push)
upstream        https://github.com/systemd/systemd.git (fetch)
upstream        https://github.com/systemd/systemd.git (push)

For my reference, the previous successful test run used:

$ git log -n 1
commit 766495b2c4e33f3246d0e84d884018a69f1fe19c (HEAD -> network-next)
Author: Yu Watanabe <watanabe.yu+github@gmail.com>
Date:   Mon Apr 15 11:22:56 2024 +0900

    sd-radv: drop sd_radv_prefix and friends, and use sd_ndisc_option to manage NDisc options
    
    No effective functional change, just refactoring.

@yuwata
Member

yuwata commented Apr 24, 2024

@LiveFreeAndRoam Please test with the main branch of systemd/systemd. I have not changed the client side recently, and no pending client-side change is waiting in the network-next branch.
I frequently push to the network-next branch, so I cannot track its commit hashes, sorry.

The simple test case for the Redirect message in our test suite is fine, IIRC.
Note that Redirect message support has a knob, UseRedirect=, so please check that it is not disabled. It is enabled by default, so unless you explicitly disable it, Redirect messages should be handled.
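A quick way to rule out an explicit override is to search the network configuration for the setting. This is a sketch with a made-up function name; the directories in the usage comment are the usual networkd configuration locations, and the search does not check which section the setting appears in.

```shell
# List every UseRedirect= assignment found under the given directories,
# so an explicit "no" can be spotted. Prints file:line:content for each match.
find_use_redirect() {
    grep -rnE '^[[:space:]]*UseRedirect[[:space:]]*=' "$@" 2>/dev/null
}

# Usage:
#   find_use_redirect /etc/systemd/network /run/systemd/network /usr/lib/systemd/network
```

No output means the setting is never assigned in those directories, so the default applies.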

@evverx
Member

evverx commented Apr 24, 2024

> Perhaps your diff can make it into the meson.build but commented out, along with your ldd command?

I'm not sure it should go to meson.build, but I agree that the documentation is far from perfect: I reread it the other day and I have to admit it isn't clear how to build and run separate components like networkd under ASan/UBSan. Ideally there should probably be scripts that could do that automatically by building systemd, making sure builds are sanitized, running separate components with the right unit files pointing to the newly built components and then looking for ASan/UBSan backtraces in the journal. The upstream test suite kind of does that and the documentation refers to the bash scripts, but I don't think it should be necessary to go over bash scripts and so on just to run networkd under ASan/UBSan.

@LiveFreeAndRoam
Contributor Author

LiveFreeAndRoam commented Apr 28, 2024

@yuwata, I have finally gotten to the bottom of these most recent test case failures. Please see #32527.


7 participants