Resetting hci0 in Docker container is causing zombie processes #157

Closed
iicky opened this issue Mar 22, 2020 · 11 comments

Comments

@iicky

iicky commented Mar 22, 2020

Describe the bug
I am getting a lot of zombie processes as a result of my room-assistant Docker container. It appears that whenever a BluetoothClassicService query takes too long and hci0 is reset, the hcitool process that ran the query is left behind as a zombie. The zombies accumulate over time until I restart the room-assistant container, at which point they are all destroyed.

This issue is specifically impacting my deployment on my Intel NUC and is not occurring on my Raspberry Pi deploys.

To reproduce
Deploy using the docker-compose file and config below.

Relevant logs

Room Assistant logs.

docker logs -f room-assistant

[Nest] 1   - 03/22/2020, 11:44:54 AM   [BluetoothClassicService] Query of xx:xx:xx:xx:xx:xx took too long, resetting hci0
[Nest] 1   - 03/22/2020, 11:46:24 AM   [BluetoothClassicService] Query of xx:xx:xx:xx:xx:xx took too long, resetting hci0
[Nest] 1   - 03/22/2020, 11:46:35 AM   [BluetoothClassicService] Query of xx:xx:xx:xx:xx:xx took too long, resetting hci0
[Nest] 1   - 03/22/2020, 11:46:36 AM   [BluetoothClassicService] Query of xx:xx:xx:xx:xx:xx took too long, resetting hci0
[Nest] 1   - 03/22/2020, 11:55:06 AM   [BluetoothClassicService] Query of xx:xx:xx:xx:xx:xx took too long, resetting hci0

Zombie processes.

user@core:~$ ps auxwww | grep ' Z '
4 Z root      5859 28012  0  80   0 -     0 -      13:44 ?        00:00:00 [hcitool] <defunct>
4 Z root      7486 28012  0  80   0 -     0 -      13:46 ?        00:00:00 [hcitool] <defunct>
4 Z root      7672 28012  0  80   0 -     0 -      13:46 ?        00:00:00 [hcitool] <defunct>
4 Z root      7713 28012  0  80   0 -     0 -      13:46 ?        00:00:00 [hcitool] <defunct>
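
To check which parent the defunct entries above belong to, the listing can be grouped by parent PID. The following is a minimal sketch in Python, assuming a procps-style `ps` on the host; the function name `zombies_by_parent` and the column selection `pid,ppid,stat,comm` are choices made for this illustration and are not part of room-assistant.

```python
from collections import Counter


def zombies_by_parent(ps_output: str) -> Counter:
    """Count defunct processes per parent PID.

    Expects the text printed by `ps -eo pid,ppid,stat,comm`
    (hypothetical helper, written for this illustration only).
    """
    counts: Counter = Counter()
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        # Skip the header line and anything that is not a full process row.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        _pid, ppid, stat, _comm = fields
        # A state starting with "Z" marks a zombie (defunct) process.
        if stat.startswith("Z"):
            counts[int(ppid)] += 1
    return counts
```

Feeding it the output of `ps -eo pid,ppid,stat,comm` and printing `zombies_by_parent(output).most_common()` shows whether all zombies share one parent, as the PPID column (28012) in the listing above suggests.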

Relevant configuration

Docker Compose docker-compose.yaml

version: "3.1"
services:

  # Room Assistant-----------------------------------
  room-assistant:
    container_name: room-assistant
    image: mkerix/room-assistant
    network_mode: host
    ports:
      - 6425:6425
    cap_add:
      - NET_ADMIN
    volumes:
      - /var/run/dbus:/var/run/dbus
      - ./room-assistant/config:/room-assistant/config
      - /etc/localtime:/etc/localtime:ro
    restart: always

Room Assistant config local.yml

global:
  instanceName: office
  integrations:
    - homeAssistant
    - bluetoothClassic
homeAssistant:
  mqttUrl: 'mqtt://XXX.XXX.XXX.XXX:1883'
  mqttOptions:
    username: <username>
    password: <password>
cluster:
  networkInterface: eno1
  port: 6425
  peerAddresses:
    - <raspberry_pi_zero_1>:6425 
    - <raspberry_pi_zero_1>:6425
bluetoothClassic:
  addresses:
    - '<mac_address_1>'
    - '<mac_address_2>' 

Expected behavior
I expect the zombie processes to not appear.

Environment

  • room-assistant version: 2.1.1
  • installation type: Docker
  • hardware: Intel NUC8i5BEK1
  • OS: Ubuntu 18.04.4 LTS (GNU/Linux 4.15.0-91-generic x86_64)

Additional context

Zombie processes are not appearing on either of the Raspberry Pis that have room-assistant installed.

@iicky iicky added the bug label Mar 22, 2020
@mKeRix
Owner

mKeRix commented Mar 28, 2020

Thanks for the bug report - I'll try to reproduce it with my Ubuntu NUC at home. In theory the processes should be hard-killed by NodeJS, which is also what I observed when I tried it. I didn't try it with this specific setup though, so maybe we need some small adaptations in the Dockerfile.

@iicky
Author

iicky commented Mar 29, 2020

@mKeRix Awesome, thanks! Let me know if you need any additional info.

@mwasowski

I have the same issue.
Docker logs

[Nest] 1   - 04/03/2020, 5:00:53 PM   [BluetoothClassicService] Query of XX:XX:XX:XX:XX:XX took too long, resetting hci0
[Nest] 1   - 04/03/2020, 5:01:23 PM   [BluetoothClassicService] Query of XX:XX:XX:XX:XX:XX took too long, resetting hci0
[Nest] 1   - 04/03/2020, 5:01:29 PM   [BluetoothClassicService] Query of XX:XX:XX:XX:XX:XX took too long, resetting hci0
[Nest] 1   - 04/03/2020, 5:01:35 PM   [BluetoothClassicService] Query of XX:XX:XX:XX:XX:XX took too long, resetting hci0

Zombie processes

root      3779  0.0  0.0      0     0 ?        Z    19:00   0:00 [hcitool] <defunct>
root      3782  0.0  0.0      0     0 ?        Z    19:00   0:00 [hcitool] <defunct>
root      3799  0.0  0.0      0     0 ?        Z    19:01   0:00 [hcitool] <defunct>
root      3802  0.0  0.0      0     0 ?        Z    19:01   0:00 [hcitool] <defunct>
root      3805  0.0  0.0      0     0 ?        Z    19:01   0:00 [hcitool] <defunct>

Environment

room-assistant version: 2.2.0
installation type: Docker
hardware: ProxmoxVE
OS: Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux

Config
Docker

version: '3'
services:
  room-assistant:
    container_name: room-assistant
    image: mkerix/room-assistant
    restart: unless-stopped
    network_mode: host
    cap_add:
      - NET_ADMIN
    volumes:
      - /var/run/dbus:/var/run/dbus
      - /etc/room-assistant/config:/room-assistant/config

Room Assistant

global:
  instanceName: Home
  integrations:
    - homeAssistant
    - bluetoothClassic
homeAssistant:
  mqttUrl: 'mqtt://XXX.XXX.XXX.XXX:1883'
  mqttOptions:
    username: <user>
    password: <pass>
bluetoothClassic:
  addresses:
    - 'XX:XX:XX:XX:XX:XX'

@iicky
Author

iicky commented Apr 8, 2020

I just tested today using a USB Bluetooth dongle on the NUC and changing the device to hci1. The result was even more zombie processes, so I don't think it is the Bluetooth device.

@mwasowski

Agreed. Also, the fact that we all use different dongles/built-in chips should theoretically rule out a hardware issue per se.

@mKeRix
Owner

mKeRix commented Apr 9, 2020

I also think it's unlikely that this is related to hardware. My current idea is that hcitool on Alpine Linux (which is used for the Docker images) doesn't handle SIGKILL correctly (or NodeJS doesn't send the signal correctly). As Alpine Linux is rarely used outside of Docker, that would explain why we are only seeing the issue there.
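
A note on the SIGKILL hypothesis: on Linux, SIGKILL cannot be caught, blocked, or ignored by the target process, so the kill itself does not depend on how a program is built. What remains after the kill is a process-table entry that disappears only once the parent waits for the child. The sketch below illustrates this in Python (used purely for illustration; room-assistant is NodeJS). The helper name `kill_and_collect` and the sleeping child are assumptions made for the example.

```python
import signal
import subprocess
import sys


def kill_and_collect(timeout_s: float = 5.0) -> int:
    """Start a long-running child, SIGKILL it, then wait() for its status.

    Hypothetical helper for illustration; the child simply sleeps.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    # SIGKILL is delivered by the kernel; the child cannot handle or ignore it.
    child.send_signal(signal.SIGKILL)
    # Until wait() runs, the killed child stays listed as <defunct>.
    # wait() collects the exit status and frees the process-table entry.
    return child.wait(timeout=timeout_s)
```

On POSIX systems `subprocess` reports death by signal as the negated signal number, so the returned value corresponds to `-signal.SIGKILL`.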

@iicky
Author

iicky commented Apr 12, 2020

I tried out a Debian image using the following Dockerfile and I am still getting the same zombie processes. I did my best to find matching or similar packages so I'm not 100% sure the image is a complete Debian replacement, but I can confirm that the zombie processes are still appearing with Debian.

FROM node:12-slim as build
ARG ROOM_ASSISTANT_VERSION=latest

RUN apt-get update && apt-get upgrade -y
RUN apt-get install -y python g++ make libusb-dev avahi-utils libavahi-compat-libdnssd-dev

RUN npm install -g --unsafe-perm room-assistant@$ROOM_ASSISTANT_VERSION

FROM node:12-slim

WORKDIR /room-assistant

RUN apt-get update && apt-get install -y bluez libusb-dev avahi-utils dmidecode libavahi-compat-libdnssd1

RUN ln -s /usr/local/lib/node_modules/room-assistant/bin/room-assistant.js /usr/local/bin/room-assistant
COPY --from=build /usr/local/lib/node_modules/room-assistant /usr/local/lib/node_modules/room-assistant

ENTRYPOINT ["room-assistant"]
CMD ["--digResolver"]

@mKeRix
Owner

mKeRix commented Apr 13, 2020

Thanks for checking that already @iicky - saved me some work. I reproduced the issue on a Raspi 3 with Docker today and found a fix. Expect it to be released sometime later today.

The issue arose from the way Docker manages processes, or rather from the fact that NodeJS wasn't designed to run as PID 1. There is some more information in this article.
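
For background on the PID 1 point: a process that exits stays in the process table as a zombie until its parent waits for it, and orphaned processes are re-parented to PID 1, which is expected to wait for them. An init such as Tini (named in the release notes for this fix) does that reaping. The following is a minimal sketch of a reaping loop in Python, written only to illustrate the idea; it is not Tini's implementation, and the function names are hypothetical.

```python
import os
import time
from typing import List


def spawn_short_lived_child() -> int:
    """Fork a child that exits immediately; returns the child's PID."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit at once without running cleanup handlers
    return pid


def reap_children() -> List[int]:
    """Collect every already-exited child without blocking."""
    reaped: List[int] = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def demo() -> None:
    """Show a child becoming a zombie and then being reaped."""
    child = spawn_short_lived_child()
    time.sleep(0.2)  # child has exited but is not yet waited for: a zombie
    print("spawned:", child, "reaped:", reap_children())
```

An init process typically runs a loop like `reap_children()` whenever it is notified that a child has exited. For images that do not ship an init, Docker's `--init` flag for `docker run` starts a small init process as PID 1 in the container.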

@mKeRix mKeRix closed this as completed in 932d603 Apr 13, 2020
github-actions bot pushed a commit that referenced this issue Apr 13, 2020
# [2.4.0](v2.3.0...v2.4.0) (2020-04-13)

### Bug Fixes

* include distances in API for room presence sensors ([567327d](567327d))
* use Tini for process management in Docker ([932d603](932d603)), closes [#157](#157)

### Features

* simplify log format ([0f90eaa](0f90eaa)), closes [#170](#170)
* **bluetooth-classic:** allow device-specific minRssi values ([cf3ddc5](cf3ddc5)), closes [#168](#168)
* **bluetooth-low-energy:** allow the update frequency to be throttled ([0143309](0143309)), closes [#125](#125)
@github-actions

🎉 This issue has been resolved in version 2.4.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@iicky
Author

iicky commented Apr 14, 2020

Perfect - I can confirm that there are no more zombie processes after the update. Thanks so much!

@mwasowski

Same here, works like a charm, fantastic work! Thanks for the fix and the link to give us a bit more background.
