base-files: sysupgrade: Add 2 sec sleep into process KILL loop #13402
This change slows down the … Wouldn't it be better to wait between the loop rounds rather than between the process kills (as in your first proposal)? Or, hopefully, I read the code wrong! 😅
You are right. My first proposal, discussed in the other PR, was to place a "sleep 2" at the end of the while loop. But that caused a 2 sec delay also for the first TERM round, and then another 2 sec at the end of the first KILL round even if there was nothing to be killed. That sounded bad to me, so I changed to the current approach. If we want to avoid the (probably rare) possibility of multiple processes-to-be-KILLed causing a long delay inside the for statement, we might again place the "sleep 2" as the last item of the while loop, but make it conditional on $run=true, which only gets set to true if we are on the KILL round and there has been something to kill.
Then there would be just one possible sleep per while loop round.
(Still, the simplest option would be to add the plain "sleep 2" there, like I initially did, but that would always prolong sysupgrade by 4 secs.)
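The conditional variant described above (one sleep per loop round, and only when something actually had to be killed) can be sketched roughly as follows. This is a simplified illustration, not the sysupgrade code: the function name `kill_rounds_with_delay`, its arguments, and the explicit PID list are assumptions made so the example is self-contained, whereas the real `kill_remaining` walks `/proc`.

```sh
#!/bin/sh
# Hypothetical sketch: KILL the given PIDs in rounds, sleeping once per round
# and only if at least one process was still alive in that round.
# Usage: kill_rounds_with_delay <delay_seconds> <max_rounds> <pid>...
kill_rounds_with_delay() {
	delay="$1"; max_rounds="$2"; shift 2
	round=0
	while [ "$round" -lt "$max_rounds" ]; do
		round=$((round + 1))
		found=false
		for pid in "$@"; do
			# kill -0 sends no signal, it only checks that the PID exists
			kill -0 "$pid" 2>/dev/null || continue
			kill -KILL "$pid" 2>/dev/null
			found=true
		done
		# Nothing left to kill: finish without any extra delay
		$found || return 0
		# Single delay per round, giving processes time to really exit
		sleep "$delay"
	done
	return 1	# some process survived every round
}
```

With this placement a round that finds nothing to kill adds no delay at all, and the worst case is bounded by `delay * max_rounds` regardless of how many processes are stuck.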
I've tried it on nbg7815 (ipq8074) and it works even while running iperf3 from a wifi client to a wired server. PS: I've also tried it with 23.05 and it works there too. I think it is a must to have it there as well.
Yeah, all three approaches work. 1 )
2 )
3 )
This PR's current commit reflects alternative 2). All three approaches have so far given me a 100% success rate (in 20-30 test sysupgrades done in total).
Is 1 second not enough? Or combine that with increasing the number of loops before ignoring it?
Without a delay these loops will go through extremely fast, so this LGTM.
@robimarko
@hnyman, may I ask to improve the commit description? Current description (see below) suggests the 1st approach.
I edited the commit message to reflect the behaviour more accurately. @robimarko @Ansuel
I don't have commit rights, so somebody else has to merge it.
Will take care of merging this and backport to stable
On Fri, 8 Sep 2023, 18:51 Hannu Nyman ***@***.***> wrote:
… may I ask to improve the commit description? Current description (see below) suggests the 1st approach.
I edited the commit message to reflect the behaviour more accurately.
@robimarko @Ansuel
Somebody to merge it?
Add 2 seconds sleep after each forcibly killed/tried-to-kill process in the final process termination loop in sysupgrade stage2.

This is needed especially for qualcommax/ipq807x, where ath11k wireless driver may have a long 10-20 seconds delay after termination before actually getting killed. This often breaks sysupgrade.

The current KILL loop in kill_remaining does all 10 kill attempts consecutively without any delay, as evidenced here in a failing sysupgrade. It does not allow any time for the process to finalize its internal termination.

Sat Sep 2 19:05:56 EEST 2023 upgrade: Sending TERM to remaining processes ...
Sat Sep 2 19:05:56 EEST 2023 upgrade: Sending signal TERM to hostapd (2122)
Sat Sep 2 19:05:56 EEST 2023 upgrade: Sending signal TERM to hostapd (2138)
Sat Sep 2 19:06:00 EEST 2023 upgrade: Sending KILL to remaining processes ...
Sat Sep 2 19:06:00 EEST 2023 upgrade: Sending signal KILL to hostapd (2122)
Sat Sep 2 19:06:00 EEST 2023 upgrade: Sending signal KILL to hostapd (2138)
Sat Sep 2 19:06:00 EEST 2023 upgrade: Sending signal KILL to hostapd (2138)
Sat Sep 2 19:06:00 EEST 2023 upgrade: Sending signal KILL to hostapd (2138)
Sat Sep 2 19:06:00 EEST 2023 upgrade: Sending signal KILL to hostapd (2138)
Sat Sep 2 19:06:00 EEST 2023 upgrade: Sending signal KILL to hostapd (2138)
Sat Sep 2 19:06:00 EEST 2023 upgrade: Sending signal KILL to hostapd (2138)
Sat Sep 2 19:06:00 EEST 2023 upgrade: Sending signal KILL to hostapd (2138)
Sat Sep 2 19:06:00 EEST 2023 upgrade: Sending signal KILL to hostapd (2138)
Sat Sep 2 19:06:00 EEST 2023 upgrade: Sending signal KILL to hostapd (2138)
Sat Sep 2 19:06:00 EEST 2023 upgrade: Sending signal KILL to hostapd (2138)
Sat Sep 2 19:06:00 EEST 2023 upgrade: Failed to kill all processes.
sysupgrade aborted with return code: 256

The change in this commit adds a 2 seconds delay after each kill attempt in order to allow some processes to more gracefully handle their internal termination.

The result is like this:

Sun Sep 3 11:15:10 EEST 2023 upgrade: Sending TERM to remaining processes ...
Sun Sep 3 11:15:10 EEST 2023 upgrade: Sending signal TERM to hostapd (2309)
Sun Sep 3 11:15:10 EEST 2023 upgrade: Sending signal TERM to hostapd (2324)
Sun Sep 3 11:15:14 EEST 2023 upgrade: Sending KILL to remaining processes ...
Sun Sep 3 11:15:14 EEST 2023 upgrade: Sending signal KILL to hostapd (2309)
[ 699.827521] br-lan: port 7(hn5wpa2r) entered disabled state
[ 699.908673] device hn5wpa2r left promiscuous mode
[ 699.908721] br-lan: port 7(hn5wpa2r) entered disabled state
[ 701.038029] br-lan: port 6(hn5wpa3) entered disabled state
Sun Sep 3 11:15:16 EEST 2023 upgrade: Sending signal KILL to hostapd (2324)
[ 702.058256] br-lan: port 5(hn2wlan) entered disabled state
[ 709.250063] stage2 (8237): drop_caches: 3
Sun Sep 3 11:15:25 EEST 2023 upgrade: Switching to ramdisk...

The delay introduced here only kicks in if there is some process that does not get terminated by the first TERM call. Then there is at least one 2 sec wait after the first KILL loop round.

This commit is related to discussion in PRs openwrt#12235 and openwrt#12632

Signed-off-by: Hannu Nyman <hannu.nyman@iki.fi>
Reviewed-by: Robert Marko <robimarko@gmail.com>
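The behaviour the commit message describes, a delay after each KILL attempt plus another check round, can be illustrated with a minimal sketch. It is an assumption-laden stand-in rather than the merged code: `kill_each_with_delay`, its parameters, and the explicit PID list are hypothetical, and the log wording only imitates the messages quoted above.

```sh
#!/bin/sh
# Hypothetical sketch: sleep after every KILL that is actually sent, then
# loop again to verify, giving up after a bounded number of rounds.
# Usage: kill_each_with_delay <delay_seconds> <max_rounds> <pid>...
kill_each_with_delay() {
	delay="$1"; max_rounds="$2"; shift 2
	round=0
	while [ "$round" -lt "$max_rounds" ]; do
		round=$((round + 1))
		again=false
		for pid in "$@"; do
			# Skip processes that have already exited
			kill -0 "$pid" 2>/dev/null || continue
			echo "Sending signal KILL to $pid (round $round)"
			kill -KILL "$pid" 2>/dev/null
			sleep "$delay"	# let this process finish its termination
			again=true	# something was killed, so re-check in a new round
		done
		$again || return 0
	done
	echo "Failed to kill all processes."
	return 1
}
```

Here the total wait grows with the number of processes that need a KILL, which is the trade-off raised in the review comments; in exchange, no delay is added when the TERM round has already cleaned everything up.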
@hnyman thanks, merged and backported to 23.05.
cc @robimarko @Ansuel
Add 2 seconds sleep between loop rounds in the final process termination loop in sysupgrade stage2.
This is needed especially for qualcommax/ipq807x, where the ath11k wireless driver may have a long 10-20 second delay after termination before actually getting killed. This often breaks sysupgrade.
The current KILL loop in kill_remaining does all 10 kill attempts consecutively without any delay, as evidenced here in a failing sysupgrade. It does not allow any time for the process to finalize its internal termination.
The change in this commit adds a 2 seconds delay after each kill loop round in order to allow some processes to more gracefully handle their internal termination.
The result is like this:
The delay introduced here only kicks in if there is some process that does not get terminated by the first TERM call. Then there is at least one 2 sec wait after the first KILL loop round.
This commit is related to discussion in PRs #12235 and #12632
Compile & run-tested for qualcommax/ipq807x DL-WRX36