Boulder container continually restarting due to healthcheck #62
I think I have the same issue; here are my logs:
I have labca running on a Proxmox VM with Debian 11 and nothing else on it. Since this error appeared my NUC draws a lot more power: just this VM adds about 10W, where the whole system with 5 LXCs and 3 other VMs usually uses 10W.
I have this issue too, on a VM with Debian 11 installed and no other services on it: [AUDIT] timed out waiting for sa1.boulder:9095 health check 13 seconds. Thought it was just me; thankfully not, but now it's all broken. I restored my VM from a backup and that failed to work as well, with some other error regarding an IP of 10.88.88.88 (assuming the version of boulder on that VM is too old or something). I haven't updated it since install.
Glad to see I'm not alone! I've already got... something on the order of a hundred certificates already issued with automated processes. Reinstalling isn't really an option for me, but I'm pleased at least to see it's not just me. Fortunately, no other container is causing issues so far. Although I am getting tired of my certcheck emailing me every hour for a certificate that needs renewing. 🤣
I tried re-installing and restored one of the weekly backups, but I still have the same issue.
I did a fresh install in a new Debian 11 VM. All went well until I restored the backup data .tgz file I had downloaded from my old instance and it went to perform a restart. Then I got this in the boulder container logs at the end when it crashed: 01 health check
Interesting. Well, glad to know trying to redo this whole shebang would have resulted in the same. Did the service change every once in a while? It's sometimes the same service, then sometimes it's not.
It does change, I think on each restart.
That would be what I've observed too. Sometimes it's the same service, more often than not it's something different. |
There was indeed an issue in generating the config files, resulting in the constant restart loop. |
Thank you!! I re-ran the install to update and it's working fine now. Much appreciated.
Can confirm, everything seems peachy now. I'm going to see if I can force-run my update process and make sure all's well. EDIT: Standing corrected, I'm continually restarting again. One sec while I get the logs and see if just doing a… SUPER EDIT: Standing further corrected! It seems I just needed to reboot the container, and I stopped getting my 502. I'm going to make sure at least one of my services can pull its new cert.
Flawless! Everyone who tried to renew their cert on the hour got it renewed. Thanks for your assistance, @hakwerk
Seems it started happening again, although this time it's a little more consistent, just @hakwerk Any chance this is the same issue? I can drop another log if needed, but it otherwise looks like the one I dropped when I first opened this. Edit: Not as consistent as I'd hoped, this last time it was
So it had been working and then suddenly started restarting? Then it should be something else. I haven't had time yet to create a new release, so nothing has changed in the code.
All other containers are running correctly, yeah. It's kind of odd; after the update it was fine, but it just started happening again recently. The only way I noticed was because my certificate-updating processes started dropping tons of emails saying the gateway was 502ing again, and I haven't touched or otherwise done much with it. I think the most that's happened is the underlying VM rebooted once for updates. If you need some additional logging, I'd be happy to provide it, or at least open a new issue.
I have no way of reproducing this issue, so I would need the logs from around the time it transitions from working to not working. Hopefully there is something in the logs then to explain what triggers it.
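A minimal sketch of one way to capture such logs, assuming the container is named boulder-boulder-1 (as mentioned elsewhere in this thread) and that plain docker commands are available; adjust as needed for docker compose setups:

```bash
# Keep a timestamped copy of recent container output (last hour here),
# so the first failing health check is visible with its exact time.
docker logs --timestamps --since 1h boulder-boulder-1 > boulder.log 2>&1

# Or follow the logs live and keep a copy while waiting for the failure.
docker logs --follow --timestamps boulder-boulder-1 2>&1 | tee boulder.log
```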
nginx.log |
I just did a fresh install, still the same problem.
EDIT: Wait a minute, I didn't clear my databases. Let me try this again.
...Huh. I don't know what happened, but after clearing my databases and reinstalling from scratch, it works just fine. The timeout takes a bit, but it does eventually connect. I've minorly inconvenienced myself by completely clearing out the databases, but since it works now, with any luck it should stay working. It won't take too long to get it back up to a working state anyway. 😄
I just updated recently and discovered after a few days, thanks to an automated process I use for my internal renewals, that the boulder container wasn't responding to requests.
On the VM in question, a script runs every hour checking the certificate's expiration time relative to the current time. If there are fewer than 30 days left, the script calls dehydrated -c. This is the current return:
Where mydomain is a real domain with proper CAA, and it has worked just fine up until recently.
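For reference, a minimal sketch of what such an hourly check might look like; the certificate path and the 30-day threshold are assumptions based on the description above, not the actual script from this setup:

```bash
#!/usr/bin/env bash
# Hypothetical path to the currently deployed certificate.
CERT=/etc/dehydrated/certs/mydomain.example/cert.pem

# openssl's -checkend exits non-zero if the certificate expires within the
# given number of seconds (30 days = 2592000 seconds).
if ! openssl x509 -checkend 2592000 -noout -in "$CERT"; then
    # Fewer than 30 days left: ask dehydrated to renew as configured.
    dehydrated -c
fi
```

Run from cron every hour, this only invokes dehydrated when the certificate is actually close to expiring.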
After inspecting the boulder container's logs further, I discovered there's a healthcheck that seems to keep causing a restart, but the strange part is it's a different service every time.
boulder.log
I've included the boulder.log as a snippet from earlier today. Sometimes it's the same service twice in a row, sometimes it's others. No other container appears to be having issues, just the boulder-boulder-1 container. I don't think I've missed anything, but I've reached the point where maybe I've been staring at the problem too long! :)
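For anyone hitting the same thing, a hedged sketch of docker inspect queries that can show the health check status and restart behaviour (container name boulder-boulder-1 as above; the exact fields may vary between Docker versions):

```bash
# Current health status (healthy / unhealthy / starting).
docker inspect --format '{{.State.Health.Status}}' boulder-boulder-1

# The most recent health check probes and their output (jq is optional,
# just for readability).
docker inspect --format '{{json .State.Health.Log}}' boulder-boulder-1 | jq .

# How many times Docker has restarted the container, and its last exit code.
docker inspect --format '{{.RestartCount}} {{.State.ExitCode}}' boulder-boulder-1
```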
Other than this one minor hitch, this has been working perfectly up until recently, and it's been super fun to try and figure out what else I can bolt onto using self-signed ACME certs!