Reboot held up by Crowdsec refusing to stop #4262
Comments
|
Thanks for creating this issue. Same Problem here. It also happens if you try to stop or restart Crowdsec via the WebUI. IMHO there are two Problems:
|
|
I don’t agree with no. 2. This is a sporadic issue that has no overall operational impact (it does not break the running operation of the firewall). Cheers, |
|
Having the same issue here too. Update: |
|
Facing the same issue. Anything you wanna test? EDIT 1 EDIT2: |
|
Crowdsec made a barely visible effort here https://www.reddit.com/r/opnsense/comments/1fpt7xn/comment/lp3lixx/ -- but it would be nice to have someone on GitHub like in the old days ;) |
|
@fichtner I have been coping with this sporadic issue every upgrade. Today again. bye, Rudi |
|
I agree and hope CrowdSec will fix the issue soon in their code. |
|
I too am facing this issue. Current workaround is to manually stop crowdsec BEFORE rebooting. |
|
I am facing this issue too. I'm not sure how long it has been happening, since I do not reboot my OPNsense very often. |
|
Same issue here too. In fact, I have been experiencing it for some time now. |
|
Hi, could you test this and try start/stop. Thanks Edit: added the kill command |
Sorry for the lack of communication, I was not responsive enough across the board but you are right, github is the place I prefer as well. The issue page in crowdsecurity/crowdsec ensures attention from all the team. By the way I'm thinking that a beta-test process is in order, because the plugin needs an overhaul and its user base has doubled in a few months. |
|
@mmetc I only observed the Crowdsec service (processes) hanging during system updates. I have installed the hotfix. So, status unknown for now. I'll give it a go tomorrow. Today the users can't handle more downtime. Rudi |
|
I would like to share my testing related to the original issue. I have found the below during testing: -
|
|
As I said before, I was also having this issue, but I have another one. I am not sure whether it is related to crowdsec, but I started having random shutdowns of my OPNsense VM in Proxmox as well as the reboot problem. I had only recently installed the crowdsec plugin and updated OPNsense from 24.1 to 24.7. I rolled back to an old backup and have been running OPNsense 24.7 without crowdsec with no issues for 2 days. If there are no random crashes, I will install crowdsec after a week and observe whether it causes this issue again. |
|
@nhatlinh1982 Sounds like the same issue to me. |
Thanks for testing! You can update the script with this command until 1.6.3-2 is out. Edit: added killall |
Thanks @mmetc - Great news, initial testing is showing that this fix has corrected the issue. Any chance you can have a look at the other issue with Crowdsec? Stop/Start button issue - #4280. The system will reboot. Do you want to proceed? [y/N]: y
*** FINAL System shutdown message from root@xxx *** System going down IMMEDIATELY |
Yes, I expect the start/stop button to work after the fetch. If it doesn't, I can inspect the crowdsec logs for any other issue. Run "cscli support dump" and send the file to support@crowdsec.net (partial configuration is included, without password/api keys of course) |
|
Thanks again @mmetc. Just letting you know that the 'Play' button remains green after CrowdSec is stopped - still an issue I am afraid. I'll send the logs to that email address. |
|
Seen and replied. As I wrote in the other issue, run "killall crowdsec" after the fetch to make sure there's no orphan process. |
|
@mmetc I have reproduced the issue I still have after installing the hotfix. Hotfix installed Status just after boot Status after STOP from Services page (won't go to STOPPED state.) Status after I then used It seems there's still an issue after the hotfix. Rudi |
|
@RudiKlein It does awfully sound the same as what I was experiencing before the hotfix. These are the steps I followed:-
I then was able to stop/restart Crowdsec via services and perform reboots without any issues so far. |
I changed the recommendation to "killall -9", that should do it, thanks! And 24.7.6 is coming with the proper package. |
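The manual cleanup discussed in this thread can be sketched as a small script. This is only an illustration, not the shipped fix: the `pkill` patterns are the ones quoted in the issue report, `killall -9 crowdsec` is the recommendation above, and the `run` helper and `DRY_RUN` variable are hypothetical names added so the script prints the commands by default instead of executing them.

```shell
#!/bin/sh
# Sketch: clear leftover CrowdSec processes before an update or reboot.
# DRY_RUN is a hypothetical switch; it defaults to 1 (print only).
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

kill_crowdsec() {
    # Patterns quoted in the issue report; -9 sends SIGKILL.
    run pkill -9 -f 'daemon: crowdsec'
    run pkill -9 -f 'crowdsec -c'
    # Recommendation from this thread to catch any orphan process.
    run killall -9 crowdsec
}

kill_crowdsec
```

Setting `DRY_RUN=0` in the environment makes the script actually send the signals. `pkill` and `killall` return a non-zero status when nothing matches, which is harmless here.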
|
To keep thread updated refs: |
|
@mmetc @LaurenceJJones thanks guys, closing this then :) |
|
(if that was too soon we can reopen. getting mixed signals on reddit.) |
I believe the issue is that some processes are still not getting killed, because the patch only takes effect once the currently running PID is killed. That is why our thread mentions that manual intervention is needed. Once the original PID is killed, the new patch takes effect and there should be no further issues. If you are pre-patch, run these commands: then update your OPNsense deployment. If you are stuck mid-patch and waiting for the PID to be killed, then run just This will allow OPNsense to update and reboot as normal. (The previous fetch contents are included in the CrowdSec update package, so there is no need to run it again.) |
|
Judging purely by opnsense/ports@4bd513d58 which gets installed prior to reboot it should fix the impending reboot as the updated script is called to stop it. That is, of course, under the assumption the fix is accurate. Cheers, |
|
OK, so the issue is that users who attempted the 24.7.5 update have the process stuck, and it will remain stuck if they attempt to go to 24.7.6 directly from the bad 24.7.5 state. But any forced reboot or the mentioned workaround will unstick it. Understood now. |
|
Hmm... I had it during the last updates and have it again now. Kinda annoying having to manually intervene each time. |
Yes, this patch fixes it, so you will not have to intervene manually going forward, but to apply it you need to intervene one more time 😓 |
|
For what it's worth, I just had the same issue when upgrading from 24.5_3 to 24.6. I ssh'ed and killed the PID which was stuck. When I did that the install process on the GUI continued on as normal. |
|
Thank you for fixing this! |
|
Nice work folks! I just ran an upgrade from Of course it did hang on waiting for Crowdsec to quit, and I had to Looking forward to future unfettered reboots. |
|
@mmetc - Some not so good news I am afraid. After upgrading to OPNsense 24.7.6 and rebooting a number of times the same issue is happening :( |
|
@daygle can you please run "cscli support dump" and send the output to support@crowdsec.net? Thanks |
Thanks @mmetc - sent you an email and support logs. |
|
same problem still every update... |
|
I upgraded to 24.7.8 from 24.7.7 and the update worked without a problem. |
|
Oh okay, did you also have the crowdsec firewall bouncer installed? |
|
Yes. Crowdsec was installed and the bouncer was running when I did the update. What version of OPNsense were you upgrading from/to? You need to have been running at least 24.7.6 for the fix to be active (i.e., if you are upgrading from a version before 24.7.6 to a version >= 24.7.6, you will still get a hang on the first update). Subsequent upgrades should work just fine. |
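To make the version rule concrete, here is a rough sketch of the comparison in plain `sh`. `version_ge` is a hypothetical helper, not part of OPNsense or CrowdSec. It ignores a trailing `_N` hotfix suffix (as in `24.7.4_1`) and compares the dotted fields numerically.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 >= version $2.
# A trailing hotfix suffix such as _1 is ignored.
version_ge() {
    v1=${1%%_*}
    v2=${2%%_*}
    while [ -n "$v1" ] || [ -n "$v2" ]; do
        # Take the first dotted field of each version.
        f1=${v1%%.*}
        f2=${v2%%.*}
        # Drop that field; an exhausted version becomes empty.
        if [ "$v1" = "$f1" ]; then v1=""; else v1=${v1#*.}; fi
        if [ "$v2" = "$f2" ]; then v2=""; else v2=${v2#*.}; fi
        # Missing fields count as 0.
        f1=${f1:-0}
        f2=${f2:-0}
        if [ "$f1" -gt "$f2" ]; then return 0; fi
        if [ "$f1" -lt "$f2" ]; then return 1; fi
    done
    return 0
}

# The version you are upgrading FROM decides whether the fix is active.
for from in 24.7.4_1 24.7.6 24.7.8; do
    if version_ge "$from" 24.7.6; then
        echo "$from: fix should already be active"
    else
        echo "$from: expect one more hang on this update"
    fi
done
```

The function uses plain global variables because POSIX `sh` has no `local`, so avoid reusing `v1`, `v2`, `f1`, or `f2` around calls to it.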
|
I also updated from 24.7.6 to a version >= 24.7.6
Describe the bug
I've lost track of which release this started with (for me). I regret not bringing this up when it began. But the trouble is that rebooting OPNsense is held up by Crowdsec "refusing" to quit. Others have faced this as well.
Specifically (and for me, most commonly), this occurs for updates requiring reboot. The update log reported to the webUI ends with something like:
It hangs there until I ssh in and kill that PID, after which it proceeds with the updates and the automated reboot.
To Reproduce
Steps to reproduce the behavior:
pkill -9 -f 'daemon: crowdsec'
pkill -9 -f 'crowdsec -c' (this PID may also need to be whacked, but not always)
reboot from that same ssh session
If reboot is attempted prior to killing Crowdsec, that will fail for waiting for Crowdsec to quit nicely, requiring step 3 above
Expected behavior
A triggered reboot should have no problem killing or quitting all running processes, and should finally reboot the system.
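For illustration only, the usual way to get this behavior is to ask a process to exit and force it after a timeout. The sketch below shows that generic pattern in `sh`. It is not how the OPNsense or CrowdSec scripts are implemented, and `stop_with_timeout` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: send SIGTERM, wait up to N seconds, then fall back to SIGKILL.
# Usage: stop_with_timeout <pid> [seconds]
stop_with_timeout() {
    pid=$1
    timeout=${2:-10}
    # If the signal cannot be delivered, the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$timeout" -gt 0 ]; do
        # kill -0 only checks whether the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    # Still running after the grace period: force it.
    kill -KILL "$pid" 2>/dev/null
    return 0
}
```

With a bounded wait like this, a process that ignores the polite request delays shutdown by at most the chosen number of seconds instead of blocking it indefinitely.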
Screenshots
n/a
Relevant log files
(none that I could find, aside from update log in webUI pasted above)
Additional context
Once this behavior started, it has been consistently like this, every time.
Environment
OPNsense 24.7.4_1-amd64
FreeBSD 14.1-RELEASE-p4
OpenSSL 3.0.15
System: Supermicro SYS-E302-9D
CPU: Intel(R) Xeon(R) D-2123IT CPU @ 2.20GHz
Network: Intel I350-AM4 & Intel X557