Occasional incorrect not_homes #142

Alfiegerner · 2020-03-12T08:24:33Z

Describe the bug
Occasional incorrect not_homes.

I have provided an example in the logs, narrowed down to the device and time: the not_home was sent at 8:17:18. I included the leader log and the parents_bedroom node log; the device was very close to the parents_bedroom node.

To reproduce
I haven't been able to reproduce it at will, but it happens a few times a day.

Relevant logs
logs

Relevant configuration

I don't think config is relevant, will add if you want me to.

Expected behavior
room-assistant should not send not_home if a node has received a response from the device within the last cycle.

Environment

room-assistant version: 2.1.1
installation type: NodeJS
hardware: leader is Pi 3B, node is Raspberry Pi Zero W
OS: Linux

Additional context
N/A

Alfiegerner · 2020-03-12T08:28:17Z

Any idea what might be causing this? The not_home is always corrected quickly (in the example above, 1 second after the not_home, and I think always within 10 seconds), so this isn't super serious and is easy to handle with delays in HA.

Thanks!

mKeRix · 2020-03-12T21:43:39Z

My instinct tells me that maybe the timeout is still just a tiny bit too short for BT Classic sensors, but I'm not entirely sure. I think a good first step would be making that timeout configurable, then you can increase it slightly and check if that takes care of it. The logs that you provided look normal to me at first glance.

Alfiegerner · 2020-03-13T01:02:19Z

Thanks for looking at it. Happy to try configurable timeout, but no rush. Thanks 🤙

The behavior of Bluetooth Classic scanning can now be customized a bit more. This potentially addresses #147 and #142.

# [2.2.0](v2.1.1...v2.2.0) (2020-03-22)

### Bug Fixes

* **bluetooth-classic:** ignore requests for undefined addresses ([ed733a3](ed733a3)), closes [#136](#136)

### Features

* **bluetooth-classic:** add options for interval and timeout ([c5b181f](c5b181f)), closes [#147](#147) [#142](#142)
* **bluetooth-low-energy:** add blacklist option ([0f6eac4](0f6eac4)), closes [#99](#99)

mKeRix · 2020-03-22T17:32:32Z

Can you try upping timeoutCycles slightly after upgrading to 2.2.0? :)

doublej0 · 2020-03-26T18:44:32Z

I'm seeing the same thing. It started only recently, with the version prior to 2.2.0, and continues with 2.2.0. I am using an iPhone X and an Apple Watch 3, and both report "not_home" at the same time. I have 4 Raspberry Pi Zero Ws.

I shut down all 4 Pis and changed my config to test a Mi Band 4 over BLE, thinking BT Classic was the cause, but it shows the same behavior.

mKeRix · 2020-03-28T14:16:13Z

@doublej0 did you try raising timeoutCycles? Sometimes you can also run into this sort of behavior if your cluster connections are acting up, in that case it might help to manually define peerAddresses and a quorum in the cluster settings.

doublej0 · 2020-03-31T15:14:22Z

> @doublej0 did you try raising timeoutCycles? Sometimes you can also run into this sort of behavior if your cluster connections are acting up, in that case it might help to manually define peerAddresses and a quorum in the cluster settings.

I raised timeoutCycles from 5 (the default) to 10 and then 15 for my iPhone and my Apple Watch using bluetoothClassic, but I still got incorrect "not_home" states.

For the Mi Band 4 using BLE, setting the timeoutCycles to 15 seems to have done the trick.

I will continue testing bluetoothClassic by adjusting the cluster options to see if that works. If one Pi goes offline, that should not impact the other Pis if the devices are within their range, correct?

mKeRix · 2020-03-31T20:46:32Z

timeoutCycles is only valid for BT Classic, so if you used BLE for the Mi Band 4 that setting shouldn't have made a difference.

And yeah, the clustering is built so that things still work even if one node is temporarily unavailable. The node that lost connection might think that it is in its own new cluster now though and will then start overriding distributed entities (like the BT sensors). The quorum gets rid of this "split brain" issue, as it only allows a cluster that contains the majority to make the decision (which means only one cluster in the network is controlling the distributed entities).
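For reference, a cluster section with manually defined peers and a quorum might look like the sketch below. It assumes three instances, so a majority is floor(3 / 2) + 1 = 2. It also assumes that `quorum` is the cluster option mentioned above and that it takes a minimum instance count; check the room-assistant cluster documentation for the exact semantics. The IP addresses are placeholders.

```yaml
cluster:
  # Placeholder addresses: the same list of all instances (including this one)
  # on every instance, each with the cluster port.
  peerAddresses:
    - '192.168.0.1:6425'
    - '192.168.0.2:6425'
    - '192.168.0.3:6425'
  # Assumed semantics: a partition must see at least this many instances
  # before it may control distributed entities.
  # Majority of 3 instances = floor(3 / 2) + 1 = 2.
  quorum: 2
```

With a quorum of 2, a single node that loses its connection is left in a partition of one and, under this assumption, can no longer override the BT sensors on its own.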

dimmanramone · 2020-04-06T19:26:40Z

I have the same problem. There are 3 devices in the cluster:

- One NUC as the leading instance (Home Assistant and MQTT run on it too)
- One Raspberry Pi Zero
- One Raspberry Pi 3

My configuration looks like the following for almost all devices in the cluster:

```yaml
global:
  instanceName: room1
  integrations:
    - homeAssistant
    - bluetoothClassic
cluster:
  networkInterface: eno1
  port: 6425
  weight: 15 # 15 for the NUC, 10 for the Pi 3 and 5 for the Pi Zero
  peerAddresses:
    - '192.168.0.1:6425'
    - '192.168.0.2:6425'
    - '192.168.0.3:6425'
homeAssistant:
  mqttUrl: 'mqtt://192.168.0.1:1883'
bluetoothClassic:
  minRssi: -20
  interval: 6
  timeoutCycles: 10
  addresses:
    - '3x:2x:6x:1x:1x:fx'
    - '4y:4y:3y:by:9y:9y'
```

Any ideas? Is the quorum going to help and if so which value is the right one? 3?

Alfiegerner · 2020-04-06T21:42:41Z

@mKeRix, apologies for the delay in getting back to you.

Setting timeoutCycles to 3 worked for me.

Keeping timeoutCycles at 2 and bumping interval to 8 also works (I haven't tried 7 yet). That suits me better, since with 9 nodes each additional cycle adds a bigger chunk of time.

I've also found that rebooting the Pi Zeros every night seems to have an impact: nightly reboots plus an interval of 8 have worked well for 48 hours so far.
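A minimal sketch of a `bluetoothClassic` section using the interval/timeoutCycles combination described in this comment. The address is a placeholder, and the comments on what each option means are assumptions based on this thread, not the official documentation.

```yaml
bluetoothClassic:
  # Placeholder MAC address; replace with your own devices.
  addresses:
    - '00:00:00:00:00:00'
  # Query interval, assumed to be in seconds (raised to 8 in this comment).
  interval: 8
  # Assumed: number of cycles without a response before a device is
  # reported as not_home.
  timeoutCycles: 2
```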

dimmanramone · 2020-04-07T18:43:35Z

@Alfiegerner @mKeRix I guess I have to try some combinations. It seems that interval 6 and timeoutCycles 10 work a little bit better (I'm still getting incorrect not_homes), but not for all devices :/ Also, my Pi 3 seems to have crashed. I have occasionally seen high CPU usage from room-assistant, even on my NUC.

mKeRix · 2020-04-09T07:33:02Z

@dimmanramone room-assistant shouldn't use a lot of CPU and didn't in my testing, but to be fair I don't have permanent monitoring on my Pis (yet). I also haven't seen crashes, just Pis dropping off the WiFi when running BT Classic, presumably because the shared chip that handles both Bluetooth and WiFi got into a bad state. Either way, Bluetooth is finicky and devices implement it differently, so unfortunately a lot of this comes down to trying things out. I'm happy to take on feature requests, though, if you have other ideas for solving these issues!

For now I'll close this, but feel free to re-open the ticket if the problem comes back.

dimmanramone · 2020-04-13T10:47:41Z

@mKeRix Well, it shouldn't, but unfortunately it does in some cases. See the screenshot, for example, from my NUC running Hass.io: you can see that room-assistant uses a lot of CPU and RAM.

As for my Raspberry Pi Zero and Pi 3, it seems that Bluetooth hangs while WiFi keeps working. I'll try using Ethernet on both instead and see if that helps.

Alfiegerner added the bug label Mar 12, 2020

mKeRix added a commit that referenced this issue Mar 22, 2020

feat(bluetooth-classic): add options for interval and timeout

c5b181f


mKeRix closed this as completed Apr 9, 2020
