Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

assertion failure in route_handler() #7797

Closed
Bluewind opened this issue Jan 3, 2018 · 17 comments · Fixed by #7815
Closed

assertion failure in route_handler() #7797

Bluewind opened this issue Jan 3, 2018 · 17 comments · Fixed by #7815
Labels
network regression ⚠️ A bug in something that used to work correctly and broke through some recent commit
Milestone

Comments

@Bluewind
Copy link

Bluewind commented Jan 3, 2018

Submission type

  • Bug report

systemd version the issue has been seen with

236 and current git master
235 worked fine and downgrading fixes the issue

Used distribution

Arch Linux

In case of bug report: Expected behaviour you didn't see

network working after suspend/resume

In case of bug report: Unexpected behaviour you saw

networkd crashes after suspend and continues to crash leaving the network mostly down

In case of bug report: Steps to reproduce the problem

suspend machine with networkd or simply start networkd once the machine is running

I've bisected it to commit 7715629 (networkd: Fix race condition in [RoutingPolicyRule] handling (#7615)).
I'm not sure what's wrong with the assertion and I don't know which configuration files may affect this. If you want, I can upload all my networkd configs, but I have a couple (including vlans and bridges).

The problem does not appear on system boot, but only once the system has been suspended. I didn't check if it also crashes if networkd is restarted prior to suspend.

@Bluewind
Copy link
Author

Bluewind commented Jan 3, 2018

Oops, forgot to include the error:
Assertion 'IN_SET(link->state, LINK_STATE_SETTING_ADDRESSES, LINK_STATE_SETTING_ROUTES, LINK_STATE_FAILED, LINK_STATE_LINGER)' failed at ../systemd-stable/src/network/networkd-link.c:824, function route_handler(). Aborting.

@poettering
Copy link
Member

cc @ssahani

@poettering poettering added network regression ⚠️ A bug in something that used to work correctly and broke through some recent commit labels Jan 3, 2018
@poettering poettering added this to the v237 milestone Jan 3, 2018
@ssahani
Copy link
Contributor

ssahani commented Jan 4, 2018

Could you please paste the backtrace.

@Bluewind
Copy link
Author

Bluewind commented Jan 4, 2018

0x00007ffff7837860 in raise () from /usr/lib/libc.so.6
#0  0x00007ffff7837860 in raise () from /usr/lib/libc.so.6
#1  0x00007ffff7838ec9 in abort () from /usr/lib/libc.so.6
#2  0x00007ffff74c01f6 in log_assert_failed_realm () from /home/flo/git/systemd/build/src/shared/libsystemd-shared-236.so
#3  0x00005555555831f8 in route_handler ()
#4  0x00007ffff751176a in sd_netlink_process () from /home/flo/git/systemd/build/src/shared/libsystemd-shared-236.so
#5  0x00007ffff7511b53 in io_callback () from /home/flo/git/systemd/build/src/shared/libsystemd-shared-236.so
#6  0x00007ffff7479620 in source_dispatch () from /home/flo/git/systemd/build/src/shared/libsystemd-shared-236.so
#7  0x00007ffff747990b in sd_event_dispatch () from /home/flo/git/systemd/build/src/shared/libsystemd-shared-236.so                    
#8  0x00007ffff747a8e7 in sd_event_run () from /home/flo/git/systemd/build/src/shared/libsystemd-shared-236.so                         
#9  0x00007ffff747aacb in sd_event_loop () from /home/flo/git/systemd/build/src/shared/libsystemd-shared-236.so                        
#10 0x00005555555656f4 in main ()

@ssahani
Copy link
Contributor

ssahani commented Jan 4, 2018

sorry backtrace full

@keszybz
Copy link
Member

keszybz commented Jan 4, 2018

It's arch linux, getting a (useful) backtrace is a royal pita.

@Bluewind
Copy link
Author

Bluewind commented Jan 4, 2018

If you base that on me not responding immediately then please know that I build from git, but used the same options as the package which includes a release build. But hey, sure let's make fun of distros for no reason at all. Just because users are dicks and mock systemd whenever they get a chance, you don't have to be.

BTT:
Sorry I didn't include all of that, but I had some other things to take care of and figured a simple trace might already be enough. Got a proper debug build now so here you go:

#0  0x00007ffff7383860 in raise () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007ffff7384ec9 in abort () from /usr/lib/libc.so.6
No symbol table info available.
#2  0x00007ffff79fbdba in log_assert_failed_realm (realm=LOG_REALM_SYSTEMD, text=0x5555555eb848 "IN_SET(link->state, LINK_STATE_SETTING_ADDRESSES, LINK_STATE_SETTING_ROUTES, LINK_STATE_FAILED, LINK_STATE_LINGER)", file=0x5555555eb300 "../src/network/networkd-link.c", line=824, func=0x5555555ed840 <__PRETTY_FUNCTION__.16068> "route_handler") at ../src/basic/log.c:818                                       
No locals.                                                                                                                             
#3  0x00005555555959f2 in route_handler (rtnl=0x555555814c60, m=0x555555847290, userdata=0x555555830670) at ../src/network/networkd-link.c:822                                                                                                                                 
        link = 0x555555830670                                                                                                          
        r = 32767                                                                                                                      
        __PRETTY_FUNCTION__ = "route_handler"                                                                                          
        __func__ = "route_handler"                                                                                                     
#4  0x00007ffff7acafc3 in process_reply (rtnl=0x555555814c60, m=0x555555847290) at ../src/libsystemd/sd-netlink/sd-netlink.c:352       
        c = 0x555555838200                                                                                                             
        serial = 40                                                                                                                    
        type = 2                                                                                                                       
        r = 0                                                                                                                          
        __PRETTY_FUNCTION__ = "process_reply"                                                                                          
        __func__ = "process_reply"                                                                                                     
#5  0x00007ffff7acb2ad in process_running (rtnl=0x555555814c60, ret=0x0) at ../src/libsystemd/sd-netlink/sd-netlink.c:407              
        m = 0x555555847290                                                                                                             
        r = 1                                                                                                                          
        __PRETTY_FUNCTION__ = "process_running"                                                                                        
#6  0x00007ffff7acb46e in sd_netlink_process (rtnl=0x555555814c60, ret=0x0) at ../src/libsystemd/sd-netlink/sd-netlink.c:437           
        _dont_destroy_rtnl = 0x555555814c60                                                                                            
        r = 32767                                                                                                                      
        __PRETTY_FUNCTION__ = "sd_netlink_process"                                                                                     
#7  0x00007ffff7acc261 in io_callback (s=0x555555814d20, fd=5, revents=1, userdata=0x555555814c60) at ../src/libsystemd/sd-netlink/sd-netlink.c:731                                                                                                                            
        rtnl = 0x555555814c60                                                                                                          
        r = 32767                                                                                                                      
        __PRETTY_FUNCTION__ = "io_callback"                                                                                            
#8  0x00007ffff7ab9707 in source_dispatch (s=0x555555814d20) at ../src/libsystemd/sd-event/sd-event.c:2270                             
        saved_type = SOURCE_IO                                                                                                         
        r = 0                                                                                                                          
        __PRETTY_FUNCTION__ = "source_dispatch"                                                                                        
        __func__ = "source_dispatch"                                                                                                   
#9  0x00007ffff7aba9e4 in sd_event_dispatch (e=0x555555813250) at ../src/libsystemd/sd-event/sd-event.c:2627                           
        p = 0x555555814d20                                                                                                             
        r = 1                                                                                                                          
        __PRETTY_FUNCTION__ = "sd_event_dispatch"                                                                                      
#10 0x00007ffff7abae02 in sd_event_run (e=0x555555813250, timeout=18446744073709551615) at ../src/libsystemd/sd-event/sd-event.c:2686  
        r = 1                                                                                                                          
        __PRETTY_FUNCTION__ = "sd_event_run"                                                                                           
#11 0x00007ffff7abaf3e in sd_event_loop (e=0x555555813250) at ../src/libsystemd/sd-event/sd-event.c:2706                               
        r = 1                                                                                                                          
        __PRETTY_FUNCTION__ = "sd_event_loop"                                                                                          
#12 0x0000555555561d94 in main (argc=1, argv=0x7fffffffdf28) at ../src/network/networkd.c:154                                          
        event = 0x555555813250                                                                                                         
        m = 0x555555814a10                                                                                                             
        user = 0x5555555e0248 "systemd-network"                                                                                        
        uid = 193                                                                                                                      
        gid = 193                                                                                                                      
        r = 0                                                                                                                          
        __func__ = "main"                                                                                                              
        __PRETTY_FUNCTION__ = "main"         

@ssahani
Copy link
Contributor

ssahani commented Jan 4, 2018

I am trying to reproduce it no luck . Could you paste your conf or reproducer .

@Bluewind
Copy link
Author

Bluewind commented Jan 4, 2018

I have a couple configs, so here are all of them: https://paste.xinu.at/m-9JI/
I can't tell which file/device triggers the error. Maybe you can from this output?

/etc/systemd/network/br0.2.netdev:5: Unknown section 'VLAN'. Ignoring.
br0: netdev ready
br1002: netdev ready
br0: Gained IPv6LL
br1002: Gained IPv6LL
Enumeration completed
br1002: netdev exists, using existing without changing its parameters
br0: netdev exists, using existing without changing its parameters
br1002: Configured
br0.2: Lost carrier
br1002: Lost carrier
br0.2: IPv6 successfully disabled
guest0: IPv6 successfully disabled
intern0: Lost carrier
intern0: IPv6 successfully disabled
br0: Configured
intern0: Gained carrier
intern0: Configured
intern0: Lost carrier
br0: Lost carrier
intern0: Gained carrier
intern0: Configured
br0: Gained carrier
br0.2: Gained carrier
br0.2: Configured
br1002: Gained carrier
br1002: Configured
br0: Configured
Assertion 'IN_SET(link->state, LINK_STATE_SETTING_ADDRESSES, LINK_STATE_SETTING_ROUTES, LINK_STATE_FAILED, LINK_STATE_LINGER)' failed at ../src/network/networkd-link.c:824, function route_handler(). Aborting.

In case you want to download the configs: https://paste.xinu.at/m-9JI/tar

@Bluewind
Copy link
Author

Bluewind commented Jan 4, 2018

I've tried to reduce the test case and came up with the following simple setup having only once real device and one bridge: https://paste.xinu.at/m-mSz/ (tarball: https://paste.xinu.at/m-mSz/tar)

It works if either, br0 does not exist (ip l delete dev br0 is enough) or if I remove the IPForward=yes from br0.network and crashes otherwise. Can you reproduce with this config?

@ssahani
Copy link
Contributor

ssahani commented Jan 4, 2018

I will try it tomorrow Sorry it's late now . Thank you :)

@Bluewind
Copy link
Author

Bluewind commented Jan 4, 2018

No problem. Thanks for looking into it.

@ssahani
Copy link
Contributor

ssahani commented Jan 5, 2018

I have reproduced it working on this fix.

We don;'t expect link state to be configured while entering into route_handler
LINK_STATE_CONFIGURED


(gdb) bt
#0  0x00007ffff7381860 in raise () from /usr/lib/libc.so.6
#1  0x00007ffff7382ec9 in abort () from /usr/lib/libc.so.6
#2  0x00007ffff7a02978 in log_assert_failed_realm (realm=LOG_REALM_SYSTEMD, 
    text=0x5555555eb888 "IN_SET(link->state, LINK_STATE_SETTING_ADDRESSES, LINK_STATE_SETTING_ROUTES, LINK_STATE_FAILED, LINK_STATE_LINGER)", 
    file=0x5555555eb340 "../src/network/networkd-link.c", line=824, func=0x5555555ed880 <__PRETTY_FUNCTION__.16093> "route_handler") at ../src/basic/log.c:818
#3  0x00005555555959f2 in route_handler (rtnl=0x555555813790, m=0x55555582c450, userdata=0x55555581cd80) at ../src/network/networkd-link.c:822
#4  0x00007ffff7aa3b8c in process_reply (rtnl=0x555555813790, m=0x55555582c450) at ../src/libsystemd/sd-netlink/sd-netlink.c:352
#5  0x00007ffff7aa3e76 in process_running (rtnl=0x555555813790, ret=0x0) at ../src/libsystemd/sd-netlink/sd-netlink.c:407
#6  0x00007ffff7aa4037 in sd_netlink_process (rtnl=0x555555813790, ret=0x0) at ../src/libsystemd/sd-netlink/sd-netlink.c:437
#7  0x00007ffff7aa4e2a in io_callback (s=0x555555813850, fd=5, revents=1, userdata=0x555555813790) at ../src/libsystemd/sd-netlink/sd-netlink.c:731
#8  0x00007ffff7a922d0 in source_dispatch (s=0x555555813850) at ../src/libsystemd/sd-event/sd-event.c:2270
#9  0x00007ffff7a935ad in sd_event_dispatch (e=0x555555811720) at ../src/libsystemd/sd-event/sd-event.c:2627
#10 0x00007ffff7a939cb in sd_event_run (e=0x555555811720, timeout=18446744073709551615) at ../src/libsystemd/sd-event/sd-event.c:2686
#11 0x00007ffff7a93b07 in sd_event_loop (e=0x555555811720) at ../src/libsystemd/sd-event/sd-event.c:2706
#12 0x0000555555561d94 in main (argc=1, argv=0x7fffffffeb28) at ../src/network/networkd.c:154
(gdb) f 2
#2  0x00007ffff7a02978 in log_assert_failed_realm (realm=LOG_REALM_SYSTEMD, 
    text=0x5555555eb888 "IN_SET(link->state, LINK_STATE_SETTING_ADDRESSES, LINK_STATE_SETTING_ROUTES, LINK_STATE_FAILED, LINK_STATE_LINGER)", 
    file=0x5555555eb340 "../src/network/networkd-link.c", line=824, func=0x5555555ed880 <__PRETTY_FUNCTION__.16093> "route_handler") at ../src/basic/log.c:818
818	        abort();
(gdb) f 3
#3  0x00005555555959f2 in route_handler (rtnl=0x555555813790, m=0x55555582c450, userdata=0x55555581cd80) at ../src/network/networkd-link.c:822
822	        assert(IN_SET(link->state, LINK_STATE_SETTING_ADDRESSES,
(gdb) p link->state
$1 = LINK_STATE_CONFIGURED
(gdb) f 3

@keszybz
Copy link
Member

keszybz commented Jan 5, 2018

@Bluewind don't take my comment as offensive: it was a simple statement of fact. Getting a usable backtrace on arch is more complicated, when using distro packages, because debug symbols are not readily available. E.g. in Fedora I can do dnf debuginfo-install systemd (gdb will even tell me this if I try to make a backtrace without having the debuginfo package installed), and I don't need to recompile anything. Now I see you were running from git, so this didn't really apply to you. Nevertheless, we have users reporting incomplete backtraces from arch all the time, so I didn't look close enough to notice that you're not using the distro package.

@Bluewind
Copy link
Author

Bluewind commented Jan 5, 2018

Fair enough. We are working on improving that, but I can't tell you a timeframe.

ssahani pushed a commit to ssahani/systemd that referenced this issue Jan 5, 2018
Now we don't update the link state while the carrier is lost.
For example in if we are setting the routes and the carrier goes down
then the call back route_hadler think that everyting is ok and updates
the state. Again when the carrier comes back we find ourself LINK_STATE_CONFIGURED
which is not the right state.

Closes systemd#7797
@ssahani
Copy link
Contributor

ssahani commented Jan 5, 2018

@Bluewind could you please try with #7815.

@Bluewind
Copy link
Author

Bluewind commented Jan 5, 2018

Looks like that solves the problem. Thanks!

@yuwata yuwata added the has-pr label Jan 5, 2018
ssahani pushed a commit to ssahani/systemd that referenced this issue Jan 8, 2018
commit 7715629 (networkd: Fix race condition in [RoutingPolicyRule] handling (systemd#7615)).
Does not fix race. Still there is a race in case of bride because the
bride goes down and up .

calling route_configure then link_set_routing_policy_rule and the
link_check_ready makes a race between routing_policy_rule_messages and route_messages.
While bride comes up and we call the call again route_configure if finds
it self in the callback function LINK_STATE_CONFIGURED networkd dies.
Let's handle first routing policy rules then route_configure. This fixes
the crash.
Closes systemd#7797
keszybz pushed a commit that referenced this issue Jan 9, 2018
)

commit 7715629 (networkd: Fix race condition in [RoutingPolicyRule] handling (#7615)).
Does not fix race. Still there is a race in case of bride because the
bride goes down and up .

calling route_configure then link_set_routing_policy_rule and the
link_check_ready makes a race between routing_policy_rule_messages and route_messages.
While bride comes up and we call the call again route_configure if finds
it self in the callback function LINK_STATE_CONFIGURED networkd dies.
Let's handle first routing policy rules then route_configure. This fixes
the crash.
Closes #7797
floppym pushed a commit to gentoo/systemd that referenced this issue Jan 13, 2018
…stemd#7815)

commit 7715629 (networkd: Fix race condition in [RoutingPolicyRule] handling (systemd#7615)).
Does not fix race. Still there is a race in case of bride because the
bride goes down and up .

calling route_configure then link_set_routing_policy_rule and the
link_check_ready makes a race between routing_policy_rule_messages and route_messages.
While bride comes up and we call the call again route_configure if finds
it self in the callback function LINK_STATE_CONFIGURED networkd dies.
Let's handle first routing policy rules then route_configure. This fixes
the crash.
Closes systemd#7797

(cherry picked from commit 27c34f7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
network regression ⚠️ A bug in something that used to work correctly and broke through some recent commit
Development

Successfully merging a pull request may close this issue.

5 participants