mwan3: opkg remove or stop on interface up .lock > lock... #13704
Comments
to clarify and unless opkg is involved I think there might be two issues here...
and / or
It appears that there is a deadlock somewhere. Can you turn on debug logging in the mwan3 common.sh, uncomment the log lines in mwan3_lock and mwan3_unlock, and then report the mwan3 lines from the system log? Also, could you install the full ps command from
I have made mwan3 much more modular in the past couple of PRs, so we can likely get rid of a lot of the locks, but it still shouldn't deadlock under the current setup.
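To illustrate the kind of lock tracing being asked for, here is a minimal sketch. It is not the code from mwan3's common.sh: `traced_lock`, `traced_unlock`, `trace`, and the `LOCK_DIR` path are hypothetical names chosen for this example, and the lock itself is a plain `mkdir` lock rather than whatever mwan3 uses.

```shell
#!/bin/sh
# Hypothetical tracing wrappers. These are NOT mwan3_lock/mwan3_unlock;
# they only show the "log before waiting, after acquiring, after releasing" idea.
LOCK_DIR="${LOCK_DIR:-/tmp/demo-trace.lock}"   # assumed path, for illustration only

trace() {
	# Log to stderr; on a router one would more likely use `logger -t <tag>`.
	echo "demo-lock[$$]: $*" >&2
}

traced_lock() {
	# $1: short description of the caller
	trace "$1 waiting for $LOCK_DIR"
	# mkdir is atomic, so only one caller at a time can create the directory
	while ! mkdir "$LOCK_DIR" 2>/dev/null; do
		sleep 1
	done
	trace "$1 acquired $LOCK_DIR"
}

traced_unlock() {
	# $1: short description of the caller
	rmdir "$LOCK_DIR" 2>/dev/null
	trace "$1 released $LOCK_DIR"
}
```

With wrappers like these around each critical section, a hang shows up in the log as a "waiting" line with no matching "acquired" line, and the last "acquired" without a "released" points at the holder.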
thanks for the debug tips... modding confs is tricky as the lock occurs post ipk install... having said that... i've attempted a few non-install time remove-install / stop / disable whilst calling the previous firewall-ipset script... and it seems i'm unable to re-trigger this behavior... one too many variables at play ... and considering the pushed version has newer code... best to close this report down me thinks... some other points in case something like this crops up again...
EDIT: just came across the cleanup pr ( #13853 ) which looks like you've already tracked this down / resolved... feel free to close this... thankyou!

ok... we've triggered this baby again... this time with alternate conditions so debugging was somewhat simpler...

this time: again firstboot... + current master... issued stop early... lock deadlock...

r14863-4a976beff4@mwan3-2.10.1-1

seems the issue is with 16-mwan user... and/or the 'running' logix ( [ -d /var/run/mwan3 ] ? )

```
[root@dca632 /usbstick 41°]# strace -p 5746
[root@dca632 /usbstick 41°]# strace -p 5756
[root@dca632 /usbstick 41°]# strace -p 5755
[root@dca632 /usbstick 42°]# strace -p 5784
[root@dca632 /usbstick 41°]# cat /proc/5784/cmdline
[root@dca632 /usbstick 41°]# cat /proc/5784/stat
[root@dca632 /usbstick 42°]# cat /proc/5746/cmdline
[root@dca632 /usbstick 41°]# cat /proc/5756/cmdline
[root@dca632 /usbstick 41°]# cat /proc/5756/stat
[root@dca632 /usbstick 41°]# cat /proc/5744/cmdline
[root@dca632 /usbstick 41°]# ps abcde
root 4306 0.0 0.0 1308 1048 ? S 03:26 0:00 /bin/sh /etc/rc.common /etc/rc.d/S95done boot
```
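The per-PID `/proc` lookups above can be wrapped in a small helper. This is a sketch assuming a Linux `/proc`; `describe_pid` is a name made up for this example and is not part of mwan3 or OpenWrt.

```shell
#!/bin/sh
# Print the scheduler state and command line of each PID given on the
# command line. Hypothetical helper, Linux /proc only.
describe_pid() {
	pid=$1
	[ -r "/proc/$pid/status" ] || { echo "$pid: no such process"; return 1; }
	# The "State:" line of /proc/PID/status holds the state letter and its name
	state=$(sed -n 's/^State:[[:space:]]*//p' "/proc/$pid/status")
	# /proc/PID/cmdline separates arguments with NUL bytes; show them as spaces
	cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline")
	echo "$pid [$state] $cmd"
}

for pid in "$@"; do
	describe_pid "$pid"
done
```

Saved as a script, it could be called with the suspect PIDs as arguments, for example the four PIDs passed to `strace -p` in the session above.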
The cleanup commit that you mention is just code cleanup, and wouldn't change behavior. (@feckert, correct me if I'm wrong here.)

With the information provided, I can reproduce the problem, and it is with the line you say. Basically, there is a deadlock where procd has the lock on

The purpose of the lock is to make sure that hotplug state transitions and mwan3 state transitions do not interfere with each other. Now that everything is on procd, we don't need separate locking mechanisms, so we can replace the locks on /var/run/mwan3.lock with procd API calls to

Thanks for identifying the issue. I'll work on a fix for this, and post again when I have a potential solution to test.
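The general shape of such a deadlock, two parties each holding one lock while waiting for the other's, can be reproduced with a toy script. This is only a sketch of the pattern, not the mwan3 or procd code: the two lock directories are stand-ins for the procd-held lock and /var/run/mwan3.lock, the `stop` and `hotplug` labels are just names for the two workers, and the timeout exists only so the demo terminates.

```shell
#!/bin/sh
# Toy lock-order inversion. All names here are illustrative assumptions.

# try_lock DIR TRIES: atomic mkdir lock; give up after TRIES failed attempts
try_lock() {
	n=0
	while ! mkdir "$1" 2>/dev/null; do
		n=$((n + 1))
		[ "$n" -ge "$2" ] && return 1
		sleep 1
	done
	return 0
}

# worker NAME FIRST SECOND: take FIRST, pause, then try to take SECOND
worker() {
	try_lock "$2" 3 || { echo "$1: could not take first lock"; return; }
	sleep 1                       # give the peer time to take its own first lock
	if try_lock "$3" 3; then
		echo "$1: ok"
		rmdir "$3"
	else
		echo "$1: stuck waiting"  # without the timeout this would hang forever
		sleep 2                   # keep holding so the peer also sees the stall
	fi
	rmdir "$2"
}

run_demo() {
	work=$(mktemp -d)
	# The two workers take the same two locks in opposite order
	worker stop    "$work/a.lock" "$work/b.lock" &
	worker hotplug "$work/b.lock" "$work/a.lock" &
	wait
	rm -rf "$work"
}

run_demo | sort
```

Both workers report `stuck waiting`, because each holds the lock the other needs. If both took the locks in the same order, or if there were only one lock, the second worker would simply wait its turn, which is the motivation for moving to a single locking mechanism.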
Replace locks on /var/run/mwan3.lock with locks via procd. This fixes a deadlock issue where mwan3 stop would have a procd lock, but a hotplug script would have the /var/run/mwan3.lock.

Locking can be removed from mwan3rtmon since:
1) procd will have sent the KILL signal to the process during shutdown, so it will not add routes to already removed interfaces on mwan3 shutdown, and
2) mwan3rtmon checks if an interface is active based on the mwan3_iface_in_<IFACE> entry in iptables, and the hotplug script always adds this before creating the route table and removes it before deleting the route table.

Fixes github issue openwrt#13704 (openwrt#13704)

Signed-off-by: Aaron Goodman <aaronjg@stanford.edu>
closing as issue has been addressed, thankyou...
Maintainer: @feckert
Environment: master (and or PR), aarch64
Description: opkg remove or stop on interface up .lock > lock...
not a big issue... but worth sending through FYI...
i've had a firstboot script that successfully removes current master ( and installs the PR ) which ran ok...
a recent change to the logic... performs some iptables / firewall / interface up / ipset logic just prior to the above commands... and the system hangs... ( mwan3 stop? )
seems killing the first lock gets the system 'unhanged'... so something in the opkg install / mwan3 stop / startup process... blocks the stop command... guessing that would be within the PR code...
ps w ( at hang )