Random Docker Container Restarts. #20829

Open
aptonline opened this Issue Feb 7, 2019 · 32 comments

aptonline commented Feb 7, 2019

Home Assistant release with the issue:
0.87.0 (and previous versions from at least 0.85.X)

Last working Home Assistant release (if known):
0.84.X

Operating environment (Hass.io/Docker/Windows/etc.):
Docker (Synology DiskStation)

Component/platform:
unknown component

Description of problem:
My Docker install of Home Assistant is crashing at random, and no single component or platform seems to be at fault. It started a few versions ago and has gradually got worse, to the point where it restarts multiple times a day. Based on other issues raised on GitHub, I thought it was related to the way Docker logging works (causing memory leaks), but the issue still happens even with logging disabled. Recently I have been seeing the following traceback in the container logs just before the container restarts; I can't find a reference to it anywhere else in the HA issues.

Problem-relevant configuration.yaml entries (fill out even if it seems unimportant):
n/a

Traceback (if applicable):

python: src/unix/core.c:898: uv__io_stop: Assertion `loop->watchers[w->fd] == w' failed.

Additional information:
I am also seeing the following in my logs, which may or may not be related.

Traceback (most recent call last):
15:40:49	  File "/usr/local/lib/python3.6/socket.py", line 713, in create_connection
15:40:49	    sock.connect(sa)
15:40:49	OSError: [Errno 9] Bad file descriptor
15:40:49	During handling of the above exception, another exception occurred:
15:40:49	Traceback (most recent call last):
15:40:49	  File "/usr/local/lib/python3.6/urllib/request.py", line 1318, in do_open
15:40:49	    encode_chunked=req.has_header('Transfer-encoding'))
15:40:49	  File "/usr/local/lib/python3.6/http/client.py", line 1239, in request
15:40:49	    self._send_request(method, url, body, headers, encode_chunked)
15:40:49	  File "/usr/local/lib/python3.6/http/client.py", line 1285, in _send_request
15:40:49	    self.endheaders(body, encode_chunked=encode_chunked)
15:40:49	  File "/usr/local/lib/python3.6/http/client.py", line 1234, in endheaders
15:40:49	    self._send_output(message_body, encode_chunked=encode_chunked)
15:40:49	  File "/usr/local/lib/python3.6/http/client.py", line 1026, in _send_output
15:40:49	    self.send(msg)
15:40:49	  File "/usr/local/lib/python3.6/http/client.py", line 964, in send
15:40:49	    self.connect()
15:40:49	  File "/usr/local/lib/python3.6/http/client.py", line 1392, in connect
15:40:49	    super().connect()
15:40:49	  File "/usr/local/lib/python3.6/http/client.py", line 936, in connect
15:40:49	    (self.host,self.port), self.timeout, self.source_address)
15:40:49	  File "/usr/local/lib/python3.6/socket.py", line 721, in create_connection
15:40:49	    sock.close()
15:40:49	  File "/usr/local/lib/python3.6/socket.py", line 417, in close
15:40:49	    self._real_close()
15:40:49	  File "/usr/local/lib/python3.6/socket.py", line 411, in _real_close
15:40:49	    _ss.close(self)
15:40:49	OSError: [Errno 9] Bad file descriptor
15:40:49	During handling of the above exception, another exception occurred:
15:40:49	Traceback (most recent call last):
15:40:49	  File "/usr/local/lib/python3.6/site-packages/smart_home/__init__.py", line 30, in postRequest
15:40:49	    resp = urllib.request.urlopen(req, params, timeout=timeout) if params else urllib.request.urlopen(req, timeout=timeout)
15:40:49	  File "/usr/local/lib/python3.6/urllib/request.py", line 223, in urlopen
15:40:49	    return opener.open(url, data, timeout)
15:40:49	  File "/usr/local/lib/python3.6/urllib/request.py", line 526, in open
15:40:49	    response = self._open(req, data)
15:40:49	  File "/usr/local/lib/python3.6/urllib/request.py", line 544, in _open
15:40:49	    '_open', req)
15:40:49	  File "/usr/local/lib/python3.6/urllib/request.py", line 504, in _call_chain
15:40:49	    result = func(*args)
15:40:49	  File "/usr/local/lib/python3.6/urllib/request.py", line 1361, in https_open
15:40:49	    context=self._context, check_hostname=self._check_hostname)
15:40:49	  File "/usr/local/lib/python3.6/urllib/request.py", line 1320, in do_open
15:40:49	    raise URLError(err)
15:40:49	urllib.error.URLError: <urlopen error [Errno 9] Bad file descriptor>
Author

aptonline commented Feb 7, 2019

The CPU spikes seen here seem to correspond to the restarts:
[Image: screenshot, 2019-02-07 at 17:54:41]

Contributor

awarecan commented Feb 7, 2019

There is a similar issue reported in MagicStack/uvloop#125

It is supposed to be fixed in uvloop 0.11.1

Could you run pip freeze | grep uvloop to check your uvloop version?

As another workaround, you can uninstall uvloop with pip uninstall uvloop. HA will fall back to the default asyncio implementation.
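
For readers who want to check this from inside the container's Python rather than through pip, here is a minimal sketch. The helper names `installed_version` and `new_event_loop` are hypothetical, and this is not how Home Assistant itself selects its event loop; it only illustrates "use uvloop if it is installed, otherwise fall back to stock asyncio". It assumes Python 3.8+ for `importlib.metadata`.

```python
import asyncio
from importlib import metadata  # standard library from Python 3.8


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def new_event_loop():
    """Create a uvloop loop when uvloop is importable, else a stock asyncio loop."""
    try:
        import uvloop
    except ImportError:
        # uvloop was uninstalled (or never installed): use the default loop.
        return asyncio.new_event_loop()
    return uvloop.new_event_loop()


if __name__ == "__main__":
    print("uvloop version:", installed_version("uvloop"))
    loop = new_event_loop()
    try:
        print("loop class:", type(loop).__module__, type(loop).__name__)
    finally:
        loop.close()
```

After `pip uninstall uvloop`, the first line should print `None` and the loop class should come from the `asyncio` package.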

Author

aptonline commented Feb 7, 2019

OK, so it wouldn’t let me check the version with the command given, but when I attempted to uninstall, it reported uvloop 0.12.0.

Contributor

awarecan commented Feb 7, 2019

Then maybe you can report back to uvloop that the issue has not been fixed 😄

Author

aptonline commented Feb 7, 2019

I’ll remove and test. If my restarts are fixed I’ll 100% know.

Author

aptonline commented Feb 7, 2019

Will the fallback cause me any issues/speed problems?

Contributor

awarecan commented Feb 7, 2019

asyncio is fast, uvloop is super fast.

Author

aptonline commented Feb 7, 2019

Out of interest, is the second traceback related to the first?

Contributor

awarecan commented Feb 7, 2019

Very likely. Errno 9 (EBADF) means an operation was attempted on a closed file or socket descriptor.
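
A small sketch of what Errno 9 looks like in isolation, using a pipe instead of a network socket so it runs anywhere. The function name `read_after_close` is made up for illustration. In the traceback above, both `sock.connect` and the follow-up `sock.close` fail with the same error, which is consistent with the descriptor already being closed by the time urllib used it; the thread does not establish what closed it.

```python
import errno
import os


def read_after_close():
    """Close a pipe's read end, then try to read from the stale descriptor."""
    read_fd, write_fd = os.pipe()
    os.close(write_fd)
    os.close(read_fd)            # the descriptor number is now invalid
    try:
        os.read(read_fd, 1)      # operate on the closed descriptor
    except OSError as exc:
        return exc.errno         # errno.EBADF, i.e. "Bad file descriptor"
    return None


if __name__ == "__main__":
    code = read_after_close()
    print(code == errno.EBADF, os.strerror(code))
```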

Author

aptonline commented Feb 8, 2019

I think I can safely say that uvloop was the issue, as since uninstalling it I haven’t had a single crash/restart.

How would you suggest I get this investigated with the uvloop team? Is it something that the home assistant developer community can put some ‘weight’ behind?

Contributor

awarecan commented Feb 8, 2019

cc @pvizeli

hass.io is one project in the HA universe that depends on uvloop.

Author

aptonline commented Feb 11, 2019

My install is still rock solid since removing uvloop. Really hoping someone can take a look at this. Cc: @pvizeli

Member

pvizeli commented Feb 11, 2019

:) I never use the latest uvloop on Hass.io because new releases are unstable every time. @balloob should know that

Author

aptonline commented Feb 11, 2019

@pvizeli thanks for the comment. @balloob
Could this be removed or rolled back to a stable build for future versions of HA?

Member

balloob commented Feb 11, 2019

Yes, we should track whatever Hass.io does. PR welcome. Make sure to add a comment to the code to skip .0 releases.
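
As an illustration of the "skip .0 releases" rule, here is a hedged sketch that picks the newest version whose last component is not 0. The function names are hypothetical, the version lists are examples only, and it assumes plain dotted numeric versions (no pre-release tags). The actual change referenced in this issue simply pinned a version rather than running any such logic.

```python
def parse(version: str):
    """Turn '0.11.3' into (0, 11, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def newest_non_dot_zero(versions):
    """Return the newest version whose last component is not 0, or None."""
    # Skip .0 releases: in this thread they are considered unstable.
    candidates = [v for v in versions if parse(v)[-1] != 0]
    return max(candidates, key=parse, default=None)


if __name__ == "__main__":
    print(newest_non_dot_zero(["0.11.3", "0.12.0"]))
```

With only 0.11.3 and 0.12.0 available, the rule selects 0.11.3, which matches the version Hass.io was using at the time.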

Author

aptonline commented Feb 12, 2019

@awarecan is that something you can do or assist me with?

@wafflebot wafflebot bot added the in progress label Feb 14, 2019

@wafflebot wafflebot bot removed the in progress label Feb 15, 2019

balloob added a commit that referenced this issue Feb 15, 2019

Set uvloop version consistent with hass.io (#21080)
This sets the uvloop version in Docker containers to 0.11.3, which is the
same version that hass.io uses.

uvloop might be causing issues with some Docker containers on some host
systems, as reported in #20829

Author

aptonline commented Feb 26, 2019

I'm sorry to say that since updating to 0.88.1 my random restarts are happening again, and it's still pointing to an issue with uvloop:

python: src/unix/core.c:898: uv__io_stop: Assertion `loop->watchers[w->fd] == w' failed.

I will let my Docker container run as is today for testing, then remove uvloop as before and compare.

Could this be re-opened @balloob @pvizeli ?

@balloob balloob reopened this Feb 27, 2019

Member

balloob commented Feb 27, 2019

If the problem remains after a downgrade of uvloop, it might be related to the Docker container switching to Python 3.7.2

Member

balloob commented Feb 27, 2019

Wait a second. The original issue was on Python 3.6 and uvloop 0.12.0.

In 0.88 we run on Python 3.7.2 and uvloop 0.11.3, and the issue still persists? That is weird, as it would mean the same issue is introduced by upgrading either Python or uvloop.

Author

aptonline commented Feb 27, 2019

I can hear the cogs whirring from here 🤗

BTW I did remove uvloop again and all has been stable since.

Member

balloob commented Feb 27, 2019

On 0.88, do you see the same stack traces?

So it looks like smart_home is the package causing the trouble. That seems to be imported by Netatmo. Can you disable Netatmo and see if it persists on 0.88?

Member

balloob commented Feb 27, 2019

Also, what is the host you run this on? Any upgrades to SSL recently?

Traceback (most recent call last):
15:40:49	  File "/usr/local/lib/python3.6/socket.py", line 713, in create_connection
15:40:49	    sock.connect(sa)
15:40:49	OSError: [Errno 9] Bad file descriptor

I am starting to think that this has to do with your host machine.

@aptonline aptonline closed this Feb 28, 2019

Author

aptonline commented Feb 28, 2019

Oops wrong button

@aptonline aptonline reopened this Feb 28, 2019

Author

aptonline commented Feb 28, 2019

I’m running HA via Docker on a Synology DS918+, with no SSL in place at the moment. As I’ve removed uvloop from 0.88.1, is it best to wait for a new beta/release and test with Netatmo disabled, or to re-install uvloop and test?

Author

aptonline commented Feb 28, 2019

Ok just noticed 0.88.2 is out so disabled Netatmo in config and upgraded. Will run today and report.

Author

aptonline commented Feb 28, 2019

Ok so just had my first restart not 1 hr into testing with the above changes 🤔

Author

aptonline commented Feb 28, 2019

FYI I have multiple docker containers running with no issues, not had a single one restart unexpectedly apart from Home Assistant.

Author

aptonline commented Mar 1, 2019

Due to all the random restarts my database is locked/corrupted (again), which then has a knock-on effect on the recorder, history and logger components at restart. The only way to resolve it is to delete the DB and start again 🤦🏼‍♂️
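
Before deleting the database, it may be worth checking whether it is actually corrupted. The sketch below assumes the recorder is using a SQLite file, which this thread does not state; the function name `integrity_report` is hypothetical. It runs SQLite's built-in `PRAGMA integrity_check`, which returns a single `ok` row for a healthy database and a list of problems otherwise. On a database that is locked rather than corrupted, the call may raise `sqlite3.OperationalError` instead.

```python
import sqlite3


def integrity_report(db_path: str):
    """Run SQLite's built-in integrity check and return the result rows."""
    conn = sqlite3.connect(db_path)
    try:
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    finally:
        conn.close()


if __name__ == "__main__":
    # ":memory:" keeps the demo self-contained; substitute the real file path
    # of the recorder database (an assumption, not given in this thread).
    print(integrity_report(":memory:"))
```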

Member

balloob commented Mar 2, 2019

I'm starting to think more and more that it's related to your system. In 3 weeks, there have been no other reports. I don't know what it is about your system that is breaking it, and it would be good to find out. However, I have no further leads to follow.

Author

aptonline commented Mar 2, 2019

I understand. I will keep investigating. I’m thinking of automating the removal process of uvloop via home assistant itself after every update 😂


sunfang1cn commented Apr 4, 2019

I have also hit this issue on 0.91.0 with Docker on DSM.

Author

aptonline commented Apr 10, 2019

I don’t seem to be experiencing the issue anymore (fingers crossed) since 0.91.x... has anything changed regarding uvloop in these releases?
