
Python 3.6 + 0.45.1 - Segmentation Faults #7752

Closed · arraylabs opened this issue May 24, 2017 · 34 comments · 10 participants

@arraylabs (Contributor) commented May 24, 2017


Home Assistant release (hass --version): 0.45.1

Python release (python3 --version): 3.6.1

Component/platform:

Description of problem:
Docker install as explained in the documentation. Getting segmentation faults after a couple of hours of running. There are no messages in the HA error log when it segfaults; the only place I have seen it logged is in the messages log via dmesg, which contains the following:

May 23 23:02:46 hass kern.info kernel: [100410.601161] python[20759]: segfault at 8 ip 000076dfbbae97b2 sp 000076dfa8837ea0 error 6 in libpython3.6m.so.1.0[76dfbba45000+29a000]
May 23 23:02:46 hass kern.alert kernel: [100410.601196] grsec: Segmentation fault occurred at 0000000000000008 in /usr/local/bin/python3.6[python:20759] uid/euid:0/0 gid/egid:0/0, parent /usr/bin/docker-containerd-shim[docker-containe:4467] uid/euid:0/0 gid/egid:0/0
May 23 23:02:46 hass kern.alert kernel: [100410.601455] grsec: denied resource overstep by requesting 4096 for RLIMIT_CORE against limit 0 for /usr/local/bin/python3.6[python:20759] uid/euid:0/0 gid/egid:0/0, parent /usr/bin/docker-containerd-shim[docker-containe:4467
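The third grsec line means the kernel refused to write a core dump because the process's core-size limit (RLIMIT_CORE) was 0. Whether a crash could leave a core file at all can be checked from inside Python itself; this is a general diagnostic sketch, not Home Assistant code:

```python
import resource

# Query the core-file size limit for the current process; a soft limit of 0
# (as in the grsec log above) means no core dump is written on a segfault.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
print("core dumps enabled:", soft != 0)
```

Raising the limit (e.g. `ulimit -c unlimited`, or Docker's `--ulimit` option) before starting the process would let a crash produce a core file for later analysis.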

Switching back to 0.44.2, everything works without issue; 0.45.0/0.45.1 is the first time I have experienced any segfaults.

Further info: I installed 0.45.0 (via docker rm, pull, run, etc.) late Saturday or early Sunday on an Ubuntu 16 ESXi VM (my normal environment, which has run HA without issue for the last year or so), and within a few hours it died with a segfault. I restarted HA, it ran fine for a few hours, then segfaulted again. At that point I moved back to 0.44.2 via Docker and all ran fine. On Monday I moved my HA Docker install to a physical box running the latest Alpine Linux with Docker and version 0.44.2 (turning off my VM install); it came up without issue and ran fine until I decided to try 0.45.1 again yesterday. On the physical Alpine Linux box I again used Docker to rm, pull, run, etc. and move to 0.45.1. All came up fine, then a few hours later the segfault occurred again. I restarted the container, all was again fine for a few hours, and again it segfaulted. I moved back to 0.44.2 and it's been running without issue since.

Seg faults on two different "machines" with two different operating systems concern me.

Expected:
No seg faults

@balloob (Member) commented May 25, 2017

HASS 0.45 Docker started to be based on Python 3.6.1.

Please list the components and platforms that you use.

@mattsch (Contributor) commented May 25, 2017

I was having the same issue with the standard Docker container; I switched over to the hass.io container (running Python 3.5.2) and haven't had any issues. My components list:

homeassistant
http
notify
joaoapps_join
frontend
discovery
logbook
config
logger
sensor
binary_sensor
camera
remote
sun
conversation
device_tracker
updater
recorder
history
zeroconf
ecobee
influxdb
mqtt
light
switch
media_player (plex, kodi)
input_select
input_boolean
group
automation
zone
zwave
zha
script
alarm_control_panel
nest
tts
twilio
ring

@arraylabs (Contributor, Author) commented May 25, 2017

Figured I couldn't be the only one, confirms I'm not losing my mind :)

Components:

homeassistant
http
frontend
updater
ios
logger
history
logbook
sun
automation
script
group
scene
shell_command
recorder
sensor (template, rest, zoneminder,darksky)
switch (zoneminder, template)
remote (harmony x2)
device_tracker (unifi)
input_boolean
cover (myq)
zwave
zha
emulated_hue
input_select
zone
notify (smtp)
zoneminder

The only component added recently (but added at 0.44.x) is the ZHA component, removing the Hue hub in the process.

Thanks for any help/information!

@balloob (Member) commented May 26, 2017

So I wonder which of the components is causing this, as I have been running Home Assistant under the Python 3.6 Docker image just fine.

Could you experiment with turning the following components off one by one to see if the segfaults stop?

shell_command
recorder
remote (harmony x2)
device_tracker (unifi)
zwave
zha

@philhawthorne (Contributor) commented May 26, 2017

Just chiming in to say I'm also having Docker crashing issues on a Synology NAS. I have Docker set to automatically restart, so by the time I notice HASS has ticked over, the logs are gone.

Going to disable the auto-restart for now and upgrade to 0.45.1 just for good measure. Unfortunately the reboots for me at least are random; it can take over 24 hours for the crash to occur.

My component list if it helps

  • Hue
  • Hue Bridge (Alexa)
  • Z-wave
  • LimitlessLED
  • Bluetooth Device Tracker
  • Asus Device Tracker
  • Ping
  • Flux
  • Transmission
  • Harmony
  • TTS
  • Sonos
  • Plex
  • Notify (Facebook)
  • Notify (Pushbullet)
  • input_select
  • Template Sensors
  • Shell Command
  • Recorder (to MySQL DB)
  • Moon
  • MiFlora
  • Google Travel Time
  • MQTT

As @balloob suggests I'm going to disable

  • Recorder
  • Shell Command
  • Remote

I need Z-wave (as all my sensors run off that), so if crashes happen again, then we can eliminate those.

@philhawthorne (Contributor) commented May 26, 2017

Had another crash with those three components disabled.

From some Googling, I think this is a known issue with the 3.6 Docker image. See docker-library/python#190 and docker-library/python#160

If hass.io is still using 3.5, are there any new components/features that require 3.6? If not, perhaps we should consider downgrading back to 3.5 for the time being?

@arraylabs (Contributor, Author) commented May 26, 2017

@balloob I will try turning off as many of the components as I can tomorrow, will have to be tomorrow so I can manage my wife's unhappiness with stuff not working. :)

@balloob (Member) commented May 26, 2017

I would be down to go back to Python 3.5 for now. Sad, but we can't be running around segfaulting either.

Wish we had a good way to reproduce it.

@arraylabs (Contributor, Author) commented May 26, 2017

I have a "spare" zwave stick so I may try passing that through esxi to a fresh vm with HA docker 0.45.1 installed and see if that dies out.

@mezz64 (Contributor) commented May 27, 2017

Just to add another data point, I've been using the 3.6 image without issue on an unRaid box. My config does use:
shell_command
recorder
zwave

@arraylabs (Contributor, Author) commented May 27, 2017

@balloob So the clean 0.45.1 install in Docker with only the Z-Wave stick configured ran without issue (though with no real device traffic on it) for more than 16 hours, which seems to confirm what @mezz64 commented regarding his use of zwave without issue. I just removed zha from my production install and moved it back to 0.45.1. Will report back late today or tomorrow morning.

@pvizeli (Member) commented May 27, 2017

Hass.io will switch back to Python 3.6 with the next stable Alpine Linux release. Previously we ran Python 3.6 from Resin, and that had too many issues for us.

@arraylabs (Contributor, Author) commented May 28, 2017

The segmentation fault has returned with zha disabled. Nearly the same error in the message log; the difference is the grsec info (172.30.1.70 is a different Docker host, not the one currently running HA).

May 28 02:48:50 hass kern.info kernel: [22393.123461] python[2658]: segfault at 8 ip 000076e9bad467b2 sp 000076e98b8be0a0 error 6 in libpython3.6m.so.1.0[76baca2000+29a000]
May 28 02:48:50 hass kern.alert kernel: [22393.123496] grsec: From 172.30.1.70: Segmentation fault occurred at 0000000000000008 in /usr/local/bin/python3.6[thon:2658] uid/euid:0/0 gid/egid:0/0, parent /usr/bin/docker-containerd-shim[docker-containe:2551] uid/euid:0/
May 28 02:48:50 hass kern.alert kernel: [22393.123729] grsec: From 172.30.1.70: denied resource overstep by requesting 4096 for RLIMIT_CORE against limit 0 r /usr/local/bin/python3.6[python:2658] uid/euid:0/0 gid/egid:0/0, parent /usr/bin/docker-containerd-shim[dock

@MartyTremblay (Contributor) commented May 28, 2017

I'm having the same issue without z-wave enabled as well.

@balloob (Member) commented May 30, 2017

For the people experiencing segmentation faults, what OS and architecture is your host?

@arraylabs (Contributor, Author) commented May 30, 2017

System 1:
Ubuntu 16.04.2 LTS x64
Docker version 17.03.0-ce, build 3a232c8

System 2:
Ubuntu 16.04.2 LTS x64
Docker version 17.03.0-ce, build 60ccb22

System 3:
Alpine Linux 3.5.2 x64
Docker version 17.05.0-ce, build v17.05.0-ce

Experienced the seg fault on all three systems with the 0.45.x versions.

@AlexMekkering (Contributor) commented May 30, 2017

I'm not running Docker, but I'm also seeing segmentation faults in a (freshly set up) venv with HA version 0.45.1 on Python 3.6.1 on Arch Linux on a Raspberry Pi 2 Model B (armv7l). HA runtimes before a segmentation fault occurs vary from around 3 hours up to 34.5 hours.

Components:

  • mqtt
  • recorder
  • logger
  • http
  • api
  • websocket_api
  • frontend
  • switch
  • input_select
  • sensor
  • scene
  • updater
  • script
  • zwave
  • input_boolean
  • sun
  • group
  • notify
  • discovery
  • influxdb
  • automation
  • zone
  • config
  • media_player
  • zeroconf
  • device_tracker
  • tradfri

@pvizeli (Member) commented May 30, 2017

@AlexMekkering do you run edge, or did you compile Python 3.6 yourself?

@AlexMekkering (Contributor) commented May 30, 2017

I run Arch Linux (for ARM) with the most recent Python (3.6.1) package (https://archlinuxarm.org/packages/arm/python); looking at its PKGBUILD, it was compiled from the upstream https://www.python.org/ftp/python/3.6.1/Python-3.6.1.tar.xz. It only contains a patch for Lib/test/test_socket.py and installs libpython as read-write, but these shouldn't have any impact. It also ensures that the libraries (expat, zlib, libffi, and libmpdec) are used from the system instead of being included in the build.
The build was configured with:

  ./configure --prefix=/usr \
              --enable-shared \
              --with-threads \
              --with-computed-gotos \
              --enable-optimizations \
              --without-lto \
              --enable-ipv6 \
              --with-system-expat \
              --with-dbmliborder=gdbm:ndbm \
              --with-system-ffi \
              --with-system-libmpdec \
              --enable-loadable-sqlite-extensions \
              --without-ensurepip

The virtualenv was freshly created (as user homeassistant) with:

  python -m venv /srv/homeassistant
  source /srv/homeassistant/bin/activate
  pip install homeassistant

@pvizeli (Member) commented May 30, 2017

Do you also have the right LC_CTYPE set on the running instance? I know there has been a bug like this since Python 3.3: docker-library/python#13

@AlexMekkering (Contributor) commented May 30, 2017

I have LANG=en_US.UTF-8, which is right I guess?
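For reference, the values Python actually derives from the locale can be inspected directly; this is a quick diagnostic sketch, not Home Assistant code:

```python
import locale
import sys

# With LANG/LC_CTYPE set to a UTF-8 locale, both of these should report
# UTF-8; an ASCII/ANSI_X3.4-1968 result is the symptom of the
# misconfigured-locale bug referenced above.
preferred = locale.getpreferredencoding(False)
filesystem = sys.getfilesystemencoding()
print(preferred, filesystem)
```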

@balloob (Member) commented May 31, 2017

Which event loop are you using: uvloop or one of the built-in loops?
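A quick way to see which loop implementation is actually in use (a diagnostic sketch; with uvloop installed and its policy set, the class name reported would differ):

```python
import asyncio

# Without an alternative event-loop policy installed, this creates the
# stdlib loop (SelectorEventLoop on Unix); uvloop would report uvloop.Loop.
loop = asyncio.new_event_loop()
print(type(loop).__module__, type(loop).__name__)
loop.close()
```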

@AlexMekkering (Contributor) commented May 31, 2017

pip list doesn't list uvloop, so I must be running the default event loop.

I don't know if it's of any use, but I managed to debug one of the core dumps and it seems to be related to garbage collection (Thread 1, LWP 1904, was the culprit):

Thread 19 (Thread 0x6b901470 (LWP 1933)):
#0  0x76b1aff8 in select () from /usr/lib/libc.so.6
#1  0x6c812984 in OpenZWave::SerialControllerImpl::Read() ()
   from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#2  0x6c8129c0 in OpenZWave::SerialControllerImpl::ReadThreadProc(OpenZWave::Event*) ()
   from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#3  0x6c8122ac in OpenZWave::ThreadImpl::Run() ()
   from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#4  0x6c8122c8 in OpenZWave::ThreadImpl::ThreadProc(void*) ()
   from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#5  0x76b98e9c in start_thread () from /usr/lib/libpthread.so.0
#6  0x76b21fc8 in ?? () from /usr/lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 18 (Thread 0x6d070470 (LWP 1930)):
#0  0x76ba2e40 in do_futex_wait () from /usr/lib/libpthread.so.0
#1  0x76ba30c4 in __new_sem_wait_slow () from /usr/lib/libpthread.so.0
#2  0x76dbf154 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3  0x76dc60c0 in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 17 (Thread 0x7023e470 (LWP 1921)):
#0  0x76b9fc14 in pthread_cond_wait@@GLIBC_2.4 () from /usr/lib/libpthread.so.0
#1  0x76d57c1c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 16 (Thread 0x6d870470 (LWP 1929)):
#0  0x76b1aff8 in select () from /usr/lib/libc.so.6
Backtrace stopped: Cannot access memory at address 0x13f00

Thread 15 (Thread 0x6f23e470 (LWP 1923)):
#0  0x76ba00c8 in pthread_cond_timedwait@@GLIBC_2.4 () from /usr/lib/libpthread.so.0
#1  0x76d5871c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 14 (Thread 0x6b101470 (LWP 1934)):
#0  0x76ba00c8 in pthread_cond_timedwait@@GLIBC_2.4 () from /usr/lib/libpthread.so.0
#1  0x6c7ba6ac in OpenZWave::EventImpl::Wait(int) ()
   from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#2  0x6c7b7fd4 in OpenZWave::Wait::Multiple(OpenZWave::Wait**, unsigned int, int) ()
   from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#3  0x6c7be950 in OpenZWave::Driver::PollThreadProc(OpenZWave::Event*) ()
   from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#4  0x6c8122ac in OpenZWave::ThreadImpl::Run() ()
   from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#5  0x6c8122c8 in OpenZWave::ThreadImpl::ThreadProc(void*) ()
   from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#6  0x76b98e9c in start_thread () from /usr/lib/libpthread.so.0
#7  0x76b21fc8 in ?? () from /usr/lib/libc.so.6
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 13 (Thread 0x71cff470 (LWP 1918)):
#0  0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1  0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2  0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3  0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 12 (Thread 0x6e9fe470 (LWP 1924)):
#0  0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1  0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2  0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3  0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 11 (Thread 0x712ff470 (LWP 1919)):
#0  0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1  0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2  0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3  0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 10 (Thread 0x6c101470 (LWP 1932)):
#0  0x76b9fc10 in pthread_cond_wait@@GLIBC_2.4 () from /usr/lib/libpthread.so.0
#1  0x6c7ba758 in OpenZWave::EventImpl::Wait(int) ()
   from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#2  0x6c7b7fd4 in OpenZWave::Wait::Multiple(OpenZWave::Wait**, unsigned int, int) ()
   from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#3  0x6c7cb8e0 in OpenZWave::Driver::DriverThreadProc(OpenZWave::Event*) ()
   from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#4  0x6c8122ac in OpenZWave::ThreadImpl::Run() ()
   from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#5  0x6c8122c8 in OpenZWave::ThreadImpl::ThreadProc(void*) ()
   from /home/homeassistant/.homeassistant/deps/libopenzwave.cpython-36m-arm-linux-gnueabihf.so
#6  0x76b98e9c in start_thread () from /usr/lib/libpthread.so.0
#7  0x76b21fc8 in ?? () from /usr/lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 9 (Thread 0x72eff470 (LWP 1916)):
#0  0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1  0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2  0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3  0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 8 (Thread 0x724ff470 (LWP 1917)):
#0  0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1  0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2  0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3  0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 7 (Thread 0x6fa3e470 (LWP 1922)):
#0  0x76ba00c8 in pthread_cond_timedwait@@GLIBC_2.4 () from /usr/lib/libpthread.so.0
#1  0x76d5871c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 6 (Thread 0x70a3e470 (LWP 1920)):
#0  0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1  0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2  0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3  0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 5 (Thread 0x7527d470 (LWP 1908)):
#0  0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1  0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2  0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3  0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 4 (Thread 0x74a7d470 (LWP 1909)):
#0  0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1  0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2  0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3  0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 3 (Thread 0x740ff470 (LWP 1910)):
#0  0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1  0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2  0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3  0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 2 (Thread 0x736ff470 (LWP 1915)):
#0  0x76ba2b98 in do_futex_wait.constprop () from /usr/lib/libpthread.so.0
#1  0x76ba2d04 in __new_sem_wait_slow.constprop.1 () from /usr/lib/libpthread.so.0
#2  0x76dbf220 in PyThread_acquire_lock_timed () from /usr/lib/libpython3.6m.so.1.0
#3  0x76dc619c in ?? () from /usr/lib/libpython3.6m.so.1.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (Thread 0x76f34010 (LWP 1904)):
#0  0x76dc5298 in PyObject_GC_Del () from /usr/lib/libpython3.6m.so.1.0
#1  0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
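For crashes like this, the stdlib faulthandler module can dump the Python-level stack of every thread at the moment of the segfault to stderr, without needing gdb at all. A general diagnostic sketch (not something the thread says Home Assistant enables itself):

```python
import faulthandler
import sys

# Install handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL so that a
# hard crash prints the Python traceback of all threads before the process dies.
faulthandler.enable(file=sys.stderr, all_threads=True)
print(faulthandler.is_enabled())  # True
```

Running `python -X faulthandler hass ...` or setting `PYTHONFAULTHANDLER=1` achieves the same without code changes.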

balloob added a commit that referenced this issue on May 31, 2017:

Downgrade Docker to Python 3.5 to solve Segmentation Faults (#7799)
Downgrades the Dockerfiles used by Home Assistant to Python 3.5, after the Python 3.6 base image was causing segmentation faults.

See #7752

@balloob changed the title from "0.45.1 - Segmentation Faults - Docker" to "Python 3.6 + 0.45.1 - Segmentation Faults" on May 31, 2017

@balloob (Member) commented May 31, 2017

@AlexMekkering this is great info. Please keep those stack traces coming.

For other people: please run Python under gdb with the Python extensions: https://wiki.python.org/moin/DebuggingWithGdb

In the meantime, I have merged #7799 to run Hass under Python 3.5 again in Docker. Note that our monkey patch for asyncio is only applied for Python < 3.5.3.

@balloob (Member) commented Jun 1, 2017

@AlexMekkering could you check out the branch from #7848 and see if running it with the monkeypatch fixes your issue?

@AlexMekkering (Contributor) commented Jun 1, 2017

Of course! I'll try that this evening...

@AlexMekkering (Contributor) commented Jun 4, 2017

I've tested the monkeypatch for three days now and haven't seen any segmentation faults since, so I think the monkeypatch fixes this issue.

@balloob (Member) commented Jun 5, 2017

Alright, I merged the monkey patch. If you launch Home Assistant with HASS_MONKEYPATCH_ASYNCIO=1 hass, it will apply the monkey patch on 3.6.
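The opt-in boils down to checking an environment variable at startup. A minimal sketch of that gating pattern (the helper name here is hypothetical; the real check lives in Home Assistant's source):

```python
import os

def monkeypatch_requested(environ=os.environ):
    # Hypothetical helper mirroring the HASS_MONKEYPATCH_ASYNCIO=1 opt-in:
    # apply the asyncio workaround only when explicitly requested.
    return environ.get("HASS_MONKEYPATCH_ASYNCIO") == "1"

print(monkeypatch_requested({"HASS_MONKEYPATCH_ASYNCIO": "1"}))  # True
print(monkeypatch_requested({}))                                 # False
```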

@ray0711 (Contributor) commented Jun 5, 2017

Upgraded the Docker image to 0.46 today, but it seems like the issue is still there. Home Assistant died and dmesg shows this:
[Jun 5 19:59] traps: python[7852] general protection ip:7f82b7ecec46 sp:7f82879d30c0 error:0 in libpython3.6m.so.1.0[7f82b7e0c000+29a000]

@balloob (Member) commented Jun 5, 2017

Did you add the environment variable to apply the monkey patch?

@ray0711 (Contributor) commented Jun 6, 2017

I didn't; I somehow expected the 0.46 release to have the fix activated by default, my mistake.
I've added the environment variable now and will report back.

@balloob (Member) commented Jun 17, 2017

Starting with 0.47, we have enabled the monkey patch by default.

@balloob closed this on Jun 17, 2017

@thehesiod commented Jul 27, 2017

Did anyone log a bug against Python 3.6? I'm seeing something similar in a project of ours with 3.6.2.

@thehesiod commented Jul 27, 2017

I've opened a Python bug for the 3.6 issue: https://bugs.python.org/issue31061, as I couldn't find anything related. If anyone can help with more information, that would be great!

@home-assistant locked and limited conversation to collaborators on Dec 11, 2017
