Unreliability of HA after a few days #40471
Comments
Do you have the Samba or SSH add-ons installed? What you're looking for is a file called home-assistant.log in the configuration directory. Home Assistant wipes it when it restarts, so you need to grab it before restarting the system. |
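Because the log is wiped on restart, one option is to copy it somewhere else at regular intervals so that a copy from just before a crash survives. The following is a minimal sketch, not an official Home Assistant tool. The function name `snapshot_log` and both paths in the usage comment are assumptions for illustration; adjust them to wherever your configuration directory is actually mounted.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot_log(log_path, backup_dir, keep=10):
    """Copy the log into backup_dir under a timestamped name.

    Only the newest `keep` copies are retained. Returns the path of the
    new copy, or None if the log file does not exist yet.
    """
    log_path = Path(log_path)
    backup_dir = Path(backup_dir)
    if not log_path.is_file():
        return None  # nothing to copy yet

    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    target = backup_dir / f"{log_path.name}.{stamp}"
    shutil.copy2(log_path, target)

    # Timestamped names sort chronologically, so the oldest copies come first.
    snapshots = sorted(backup_dir.glob(f"{log_path.name}.*"))
    for old in snapshots[: max(len(snapshots) - keep, 0)]:
        old.unlink()
    return target


# Hypothetical usage (both paths are assumptions):
# snapshot_log("/config/home-assistant.log", "/share/log-snapshots")
```

Run every few minutes from whatever scheduler you have available, this leaves a recent copy of the log on disk even if the system restarts before you can reach it.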
Are you restarting it for it to come online, or does it come back on its own? If it comes back on its own, is the last-changed time for your entities recent, or from sometime before you lost HA? |
It does come back up on its own sometimes. Other times it just freezes and won't even let me in over SSH. Am I right to say that when HA has crashed and the SSH add-on is unresponsive, I should still be able to SSH into the host over port 22222? If I do the latter, can I still somehow grab the file you referred to earlier? |
I have several add-ons installed. I also have HACS. Today it seems to be working fine, and this is what the CPU load looks like on a normal day. I have several automations in Node-RED, and Node-RED has been hanging a lot lately. I'm sure my automations could be a lot more efficient, but I don't have the skills to refine them further. There is definitely something wrong with some of my configurations that kills the Pi. |
When it comes back on its own, try to grab home-assistant.log via either Samba or SSH. |
I'll see to it! |
@Spartan-II-117 for the first time since I got your instructions above, my system hung today. When I logged in over SSH, all I found was the file you referred to above, whose logged activity seems to start right after the system recovered today.
Is there any other way you can think of to access the information we need? Thanks! |
Unfortunately not. Have you tried disabling the custom components for node-red and HACS? |
Hi again, no, I have not disabled the custom component. I could do that, but I really don't have a clear indication that it is the culprit. I installed that component long ago and it has been working flawlessly for months. But yes, I guess I could try disabling it. I'm open to doing so. Look, today for example I caught the system during one of those glitches. Fortunately I was able to SSH into HA and saw this:
I'm wondering if my ONVIF cameras could be the culprit. I have one camera that is mostly disabled because that is my preference while I'm at home. Although I asked earlier, at the risk of sounding a bit tiresome (apologies) I will ask again: if I SSH into the host over port 22222, is there any chance I can still log into HA and grab the file? I'm sure that's the way forward. Otherwise I will need to keep enabling and disabling things, and the worst part is that the system won't hang immediately after any of these changes; one needs to give it some time until it glitches. I'm going to give this a shot: https://developers.home-assistant.io/docs/operating-system/debugging/#checking-the-logs |
Ok. I think I have it all set. I've enabled access over ssh on port 22222. Once you do that, you need to type
I really don't know if this will make any difference when home assistant is toasted. If it won't allow me in over ssh, I'm not that confident going through the underlying OS is going to make a difference but we will see. |
The onvif stuff mostly seemed to be warnings, but try disabling that camera and see how it goes. |
If you are unable to get into the host OS, the problem is likely with either your particular installation or your hardware. You could try running a virtual machine with HA from your desktop.
On Tue, Sep 29, 2020, 12:22 marcgarciamarti wrote:
Hi, my system is currently not responding. I've tried ssh-ing into the HA
container and it won't respond. Tried to ssh over port 22222 and it won't
react either. I have not disabled the onvif integration just yet (did not
get around to doing so). Any idea would be helpful... thanks a million!
|
@Spartan-II-117 I seem to have managed to log into the HA container over ssh during one of those blackouts and managed to capture some errors. See the two referenced cases. What do you think? |
That Tado error looks more serious than the Efergy one. As a test, try disabling Tado for a few days. If that clears up your instability, put it back, enable debug logging for Tado, and add the relevant logs to your Tado issue. |
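To judge which integration is logging the most trouble, it can help to tally warnings and errors per logger name. This is a rough sketch that assumes each log entry starts with a date, a time, a level, a thread name in parentheses, and a logger name in square brackets; check that against your own file before relying on it. The function name `count_by_logger` is made up for illustration.

```python
import re
from collections import Counter

# Assumed entry shape: "<date> <time> LEVEL (thread) [logger.name] message"
LOG_LINE = re.compile(
    r"^\S+ \S+ (?P<level>[A-Z]+) \((?P<thread>[^)]*)\) \[(?P<logger>[^\]]+)\]"
)


def count_by_logger(lines, levels=("ERROR", "WARNING")):
    """Count entries per (level, logger) pair for the given levels."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Lines that do not match (for example traceback bodies) are skipped.
        if match and match.group("level") in levels:
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts


# Hypothetical usage (the path is an assumption):
# with open("/config/home-assistant.log", encoding="utf-8") as handle:
#     for (level, logger), n in count_by_logger(handle).most_common(10):
#         print(n, level, logger)
```

A high count for one logger is only a hint about where to look first, not proof that the integration is causing the instability.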
ok. Will do as suggested. |
Yeah, you can use SMART tools to view the drive controller statistics. |
I'm also seeing "random" restarts of HA since 0.116.0bX. An hour ago I downgraded my Docker image from 0.116.0b3 to 0.115.6 to see if 0.115.6 is stable with my current config. |
Hello again @Spartan-II-117 and thanks for your continued help throughout this. I've so far disabled the Efergy and Tado integrations, yet this morning the system went down again, even with those components out of the picture. I just collected the logs and this is what I see
The system stopped responding to my monitoring at around 6:12am and started responding again at around 6:18am. I can't see any error message in home-assistant.log that may help narrow down what is going on. I've logged into the host on port 22222 and checked the folder /var/log but couldn't see any relevant file. Do you happen to know where I should look for evidence once I'm in the host? I'm really a beginner when it comes to Docker systems. Any information to help me narrow down what is going on would be very much appreciated. One last thing: even though I don't have evidence pointing towards it, yesterday I jumped the gun and purchased a new enclosure, prompted by this: https://community.home-assistant.io/t/hass-io-transfer-from-sd-card-to-ssd-or-usb/97452/518?u=kitus Thanks |
I'm still running version 0.115.x, but these random lockups have been happening to me for several weeks or months, I would say (I can't really put my finger on it though). Do you happen to know how to check logging on the host? Thanks! |
I'm afraid that I can't help you any further. It appears that all your integration issues are likely being caused by your hardware. Upgrading to an SSD (or a NUC) will hopefully solve your issues. If you continue having issues, I would suggest starting from scratch and adding one component at a time to see where the problem starts occurring. |
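When a freeze produces no error message, the freeze itself may still show up as a stretch of silence in the log. The sketch below looks for unusually long gaps between consecutive timestamped entries, which could be compared with an outage window such as the 6:12am to 6:18am one described above. It assumes every entry begins with a `YYYY-MM-DD HH:MM:SS` timestamp; the function name `find_gaps` and the two-minute threshold are illustrative choices, not anything Home Assistant provides.

```python
import re
from datetime import datetime, timedelta

# Assumed: each log entry starts with "YYYY-MM-DD HH:MM:SS".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def find_gaps(lines, min_gap=timedelta(minutes=2)):
    """Return (last_entry_before, first_entry_after) pairs for quiet periods."""
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # continuation lines such as tracebacks carry no timestamp
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if previous is not None and current - previous >= min_gap:
            gaps.append((previous, current))
        previous = current
    return gaps


# Hypothetical usage (the path is an assumption):
# with open("/config/home-assistant.log", encoding="utf-8") as handle:
#     for before, after in find_gaps(handle):
#         print("quiet from", before, "to", after)
```

The entries just before each gap are the ones worth reading closely. A quiet log can also simply mean nothing was happening, so treat a gap as a lead rather than a diagnosis.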
Please try to get a https://community.home-assistant.io/t/python3-high-cpu-usage/160012 |
Hello, quick update: by disabling both InfluxDB and Grafana, my setup has gone back to normal. No more issues. The only downside is that I've lost my ability to have insights into my environment. It has been 2 weeks so far with a solid environment. I arrived at this after a lot of trial and error. I would have preferred to reach that conclusion from evidence, but I am really not aware of how to do so. None of the logs that I reviewed pointed towards those two add-ons. I could now close this case, but if I did that, nobody would evaluate whether there is any room for improving the logging of the system. Regards |
There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. |
My HA instance seems to be collapsing randomly after some days of running. There is no clear pattern to when that happens. I have no idea where to look for logs to provide.
Environment
Problem-relevant configuration.yaml
I've been running HA OS on an SSD with an RPi 3B+ for 9 months or more (I can't remember when I enabled that). My db is limited to 2 days with auto-purge.
In addition to the USB-powered SSD, I also have a C2351, a Z-Wave dongle, and an RFXtrx dongle.
My gut feeling points me to my hardware but I don't have evidence just yet to prove my hunch right or wrong.
Traceback/Error logs
I wish I knew what log to provide, but when it collapses I just can't log back in, hence no log can be provided. If anybody could provide guidance on how and where to retrieve logs, I would happily get them.
Additional information
See attached picture with today's behavior