Occasional incorrect not_homes #142

Closed
Alfiegerner opened this issue Mar 12, 2020 · 13 comments

@Alfiegerner
Contributor

Describe the bug
Occasional incorrect not_homes.

I have provided an example in the logs, narrowed down to the device and time; the not_home was sent at 8:17:18. I included the leader log and the parents_bedroom node logs; the device was very close to the parents_bedroom node.

To reproduce
Have not been able to replicate at will, but happens a few times during the day.

Relevant logs
logs

Relevant configuration

I don't think the config is relevant; I will add it if you want me to.

Expected behavior
Not to send not_home if a response has been received from a node within the last cycle.
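The expected behavior can be sketched as a per-device "last seen" timestamp: the device only flips to not_home once no node has answered for longer than the timeout. This is a minimal sketch under stated assumptions, not room-assistant's actual implementation; the class name `PresenceTracker`, the 60-second timeout and the MAC address are all hypothetical.

```python
import time
from typing import Dict, Optional


class PresenceTracker:
    """Hypothetical sketch: a device is reported not_home only after
    `timeout` seconds pass without any node answering for it."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def record_response(self, address: str, now: Optional[float] = None) -> None:
        # A response from any node in the cluster refreshes the timestamp.
        self.last_seen[address] = time.monotonic() if now is None else now

    def state(self, address: str, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        seen = self.last_seen.get(address)
        # Unknown devices, or devices silent for longer than the timeout, are away.
        if seen is None or now - seen > self.timeout:
            return "not_home"
        return "home"


tracker = PresenceTracker(timeout=60.0)
tracker.record_response("AA:BB:CC:DD:EE:FF", now=0.0)
print(tracker.state("AA:BB:CC:DD:EE:FF", now=45.0))  # home
print(tracker.state("AA:BB:CC:DD:EE:FF", now=61.0))  # not_home
```

Under this model, a spurious not_home means the timeout elapsed before the next successful response arrived, which is why the discussion below focuses on lengthening the timeout.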

Environment

  • room-assistant version: 2.1.1
  • installation type: NodeJS
  • hardware: leader is Pi 3B, node is Raspberry Pi Zero W
  • OS: Linux

Additional context
N/A

@Alfiegerner
Contributor Author

Alfiegerner commented Mar 12, 2020

Any idea what might be causing this? The not_home is always corrected quickly (in the example above, 1 second after the not_home, and I think always within 10 seconds), so this isn't very serious and is easy to handle with delays in HA.

Thanks!
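The "delays in HA" workaround amounts to debouncing: pass home through immediately, but only report not_home once it has persisted for a hold period. In Home Assistant itself, a state trigger with a `for:` duration behaves roughly like this. The sketch below is hypothetical (the class `NotHomeDebouncer` is illustrative), and the 15-second hold is an assumption chosen because the spurious not_home is reportedly corrected within about 10 seconds.

```python
from typing import Optional


class NotHomeDebouncer:
    """Hypothetical consumer-side filter: 'home' passes through at once,
    'not_home' is only reported after persisting for `hold` seconds."""

    def __init__(self, hold: float) -> None:
        self.hold = hold
        self.reported = "home"
        self.pending_since: Optional[float] = None

    def update(self, raw_state: str, now: float) -> str:
        if raw_state == "home":
            # A fresh home state cancels any pending not_home.
            self.pending_since = None
            self.reported = "home"
        else:
            if self.pending_since is None:
                self.pending_since = now
            if now - self.pending_since >= self.hold:
                self.reported = "not_home"
        return self.reported


d = NotHomeDebouncer(hold=15.0)
print(d.update("not_home", now=0.0))   # home: not_home has only just appeared
print(d.update("home", now=1.0))       # home: the short blip is suppressed
print(d.update("not_home", now=10.0))  # home
print(d.update("not_home", now=26.0))  # not_home: persisted for 16 s
```

The trade-off is latency: a real departure is also reported `hold` seconds late.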

@mKeRix
Owner

mKeRix commented Mar 12, 2020

My instinct tells me that maybe the timeout is still just a tiny bit too short for BT Classic sensors, but I'm not entirely sure. I think a good first step would be making that timeout configurable, then you can increase it slightly and check if that takes care of it. The logs that you provided look normal to me at first glance.

@Alfiegerner
Contributor Author

Thanks for looking at it. Happy to try a configurable timeout, but no rush. Thanks 🤙

mKeRix added a commit that referenced this issue Mar 22, 2020
The behavior of Bluetooth Classic scanning can now be customized a bit
more. This potentially addresses #147 and #142.
github-actions bot pushed a commit that referenced this issue Mar 22, 2020
# [2.2.0](v2.1.1...v2.2.0) (2020-03-22)

### Bug Fixes

* **bluetooth-classic:** ignore requests for undefined addresses ([ed733a3](ed733a3)), closes [#136](#136)

### Features

* **bluetooth-classic:** add options for interval and timeout ([c5b181f](c5b181f)), closes [#147](#147) [#142](#142)
* **bluetooth-low-energy:** add blacklist option ([0f6eac4](0f6eac4)), closes [#99](#99)
@mKeRix
Owner

mKeRix commented Mar 22, 2020

Can you try upping timeoutCycles slightly after upgrading to 2.2.0? :)

@doublej0

I have the same experience. It started recently with the version prior to 2.2.0 and continues with 2.2.0. I am using an iPhone X and an Apple Watch 3, and both report "not_home" at the same time. I have 4 rPi Zero Ws.

I turned off all 4 rPis and changed my config to test a Mi Band 4 over BLE, thinking BT Classic was the cause, but it shows the same behavior.

@mKeRix
Owner

mKeRix commented Mar 28, 2020

@doublej0 did you try raising timeoutCycles? Sometimes you can also run into this sort of behavior if your cluster connections are acting up, in that case it might help to manually define peerAddresses and a quorum in the cluster settings.

@doublej0

> @doublej0 did you try raising timeoutCycles? Sometimes you can also run into this sort of behavior if your cluster connections are acting up, in that case it might help to manually define peerAddresses and a quorum in the cluster settings.

I updated the timeoutCycles from 5 (default) to 10 and then 15 for my iPhone & my Apple Watch using bluetoothClassic. I continued to have the "not_home" experience.

For the Mi Band 4 using BLE, setting the timeoutCycles to 15 seems to have done the trick.

I will continue testing the bluetoothClassic by adjusting the cluster options to see if that works. If one rPi goes offline, that should not impact the other rPi if the devices are within range, correct?

@mKeRix
Owner

mKeRix commented Mar 31, 2020

timeoutCycles is only valid for BT Classic, so if you used BLE for the Mi Band 4 that setting shouldn't have made a difference.

And yeah, the clustering is built so that things still work even if one node is temporarily unavailable. The node that lost connection might think that it is in its own new cluster now though and will then start overriding distributed entities (like the BT sensors). The quorum gets rid of this "split brain" issue, as it only allows a cluster that contains the majority to make the decision (which means only one cluster in the network is controlling the distributed entities).
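The majority rule behind the quorum can be sketched in a few lines. This assumes the quorum is a node count that a partition (including itself) must be able to see before it may control distributed entities; the function names are hypothetical and room-assistant's exact semantics may differ. The standard strict majority of n nodes is floor(n/2) + 1, and because two disjoint partitions cannot both hold a strict majority, at most one side keeps control.

```python
def majority(cluster_size: int) -> int:
    """Smallest node count that is a strict majority of the cluster."""
    return cluster_size // 2 + 1


def may_control_entities(reachable_nodes: int, quorum: int) -> bool:
    """Hypothetical check: a partition may update distributed entities
    only if it sees at least `quorum` nodes (counting itself)."""
    return reachable_nodes >= quorum


size = 3
q = majority(size)
print(q)                               # 2
print(may_control_entities(2, q))      # True: the connected pair keeps control
print(may_control_entities(1, q))      # False: the isolated node stays passive

# No split of the cluster lets both sides reach quorum, so no split brain.
for left in range(size + 1):
    right = size - left
    assert not (may_control_entities(left, q) and may_control_entities(right, q))
```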

@dimmanramone

dimmanramone commented Apr 6, 2020

I have the same problem. 3 devices in the cluster:

  • One NUC as the leader instance (Home Assistant and MQTT run on it too)
  • One Raspberry Pi Zero
  • One Raspberry Pi 3

My configuration looks like the following, almost identical on all the devices in the cluster:

global:
  instanceName: room1
  integrations:
    - homeAssistant
    - bluetoothClassic
cluster:
  networkInterface: eno1
  port: 6425
  weight: 15 # 10 for the Pi 3, 5 for the Pi Zero
  peerAddresses:
    - '192.168.0.1:6425'
    - '192.168.0.2:6425'
    - '192.168.0.3:6425'
homeAssistant:
  mqttUrl: 'mqtt://192.168.0.1:1883'
bluetoothClassic:
  minRssi: -20
  interval: 6
  timeoutCycles: 10
  addresses:
    - '3x:2x:6x:1x:1x:fx'
    - '4y:4y:3y:by:9y:9y'

Any ideas? Is the quorum going to help and if so which value is the right one? 3?

@Alfiegerner
Contributor Author

@mKeRix - apologies for the delay in getting back to you.

Adjusting timeoutCycles to 3 worked for me.

Keeping timeoutCycles at 2 and bumping interval to 8 also works (I haven't tried 7 yet), which is better for me: with 9 nodes, each additional cycle adds a bigger chunk of time.

I've also found that rebooting the Pi Zeros every night seems to have an impact; nightly reboots plus an interval of 8 have worked well for 48 hours for me.
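The trade-off between interval and timeoutCycles can be made concrete with a little arithmetic. This sketch assumes that one cycle queries every node once, one node per interval, so a cycle lasts interval × nodes seconds and the timeout is that times timeoutCycles. That model is inferred from the remark that more nodes make each extra cycle longer; it is not confirmed against room-assistant's code. Taking 6 s as the baseline interval is likewise an assumption, since the original value is not stated.

```python
def timeout_seconds(interval: float, nodes: int, timeout_cycles: int) -> float:
    """Assumed model: a cycle asks each node once, one node per interval,
    and a device goes not_home after `timeout_cycles` silent cycles."""
    cycle = interval * nodes
    return cycle * timeout_cycles


# A 9-node cluster under this assumed model
for interval, cycles in [(6, 2), (6, 3), (8, 2)]:
    print(f"interval={interval}s timeoutCycles={cycles}: "
          f"{timeout_seconds(interval, 9, cycles)}s")
```

Under these assumptions, interval 6 with 2 cycles gives 108 s, raising to 3 cycles gives 162 s, and interval 8 with 2 cycles gives 144 s. The last option lengthens the timeout enough to help while adding less delay than a third cycle.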

@dimmanramone

dimmanramone commented Apr 7, 2020

@Alfiegerner @mKeRix I guess I have to try some combinations. It seems that interval 6 and timeoutCycles 10 work a little better (I'm still getting incorrect not_homes), but not for all devices :/ Also, my Pi 3 seems to have crashed, and I have occasionally seen high CPU usage from room-assistant even on my NUC.

@mKeRix
Owner

mKeRix commented Apr 9, 2020

@dimmanramone room-assistant shouldn't use a lot of CPU and didn't in my testing, but to be fair I don't have permanent monitoring on my Pis (yet). I also haven't seen crashes yet, just Pis dropping off the WiFi when running BT Classic, presumably because the shared chip that handles both messed up. Either way, Bluetooth is finicky and devices implement it differently, so unfortunately a lot of this comes down to trying things out. I'm happy to take on feature requests if you have other ideas to solve these issues!

For now I'll close this, but feel free to re-open the ticket if the problem comes back.

@mKeRix mKeRix closed this as completed Apr 9, 2020
@dimmanramone

@mKeRix Well, it shouldn't, but unfortunately it does in some cases. See the screenshot, for example, from my NUC running Hass.io: room-assistant uses a lot of CPU and RAM.

[Screenshot 2020-04-13 at 12:37:10]

As for my Raspberry Pi Zero and Pi 3, it seems that Bluetooth hangs while the WiFi still works. I'll try using Ethernet on both instead and see if it helps.
