Incorrect leader elected #196

Open
domoritz opened this issue May 12, 2020 · 8 comments
Comments

@domoritz
Contributor

Describe the bug
I have a cluster with two nodes and the wrong node gets elected as leader.

The main node is livingroom with weight 100. I also have a second node bedroom that should not be the leader. However, as you can see in the logs below, the bedroom got elected as the leader.

Relevant logs

[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] done.
[services.d] starting services
[services.d] done.
[20:29:44] INFO: Setting up Home Assistant configuration
[20:29:44] INFO: Starting room-assistant
*** WARNING *** The program 'node' uses the Apple Bonjour compatibility layer of Avahi.
*** WARNING *** Please fix your application to use the native API of Avahi!
*** WARNING *** For more information see <http://0pointer.de/blog/projects/avahi-compat.html>
*** WARNING *** The program 'node' called 'DNSServiceRegister()' which is not supported (or only supported partially) in the Apple Bonjour compatibility layer of Avahi.
*** WARNING *** Please fix your application to use the native API of Avahi!
*** WARNING *** For more information see <http://0pointer.de/blog/projects/avahi-compat.html>
5/11/2020, 8:29:44 PM - info - IntegrationsModule: Loading integrations: home-assistant, bluetooth-classic
5/11/2020, 8:29:44 PM - info - NestFactory: Starting Nest application...
5/11/2020, 8:29:44 PM - info - InstanceLoader: AppModule dependencies initialized
5/11/2020, 8:29:44 PM - info - InstanceLoader: ConfigModule dependencies initialized
5/11/2020, 8:29:44 PM - info - InstanceLoader: NestEmitterModule dependencies initialized
5/11/2020, 8:29:44 PM - info - InstanceLoader: IntegrationsModule dependencies initialized
5/11/2020, 8:29:44 PM - info - InstanceLoader: DiscoveryModule dependencies initialized
5/11/2020, 8:29:44 PM - info - InstanceLoader: HomeAssistantModule dependencies initialized
5/11/2020, 8:29:44 PM - info - InstanceLoader: ClusterModule dependencies initialized
5/11/2020, 8:29:44 PM - info - InstanceLoader: ScheduleModule dependencies initialized
5/11/2020, 8:29:44 PM - info - InstanceLoader: BluetoothClassicModule dependencies initialized
5/11/2020, 8:29:44 PM - info - InstanceLoader: EntitiesModule dependencies initialized
5/11/2020, 8:29:44 PM - info - InstanceLoader: StatusModule dependencies initialized
5/11/2020, 8:29:44 PM - info - RoutesResolver: EntitiesController {/entities}:
5/11/2020, 8:29:44 PM - info - RouterExplorer: Mapped {/, GET} route
5/11/2020, 8:29:44 PM - info - RoutesResolver: StatusController {/status}:
5/11/2020, 8:29:44 PM - info - RouterExplorer: Mapped {/, GET} route
5/11/2020, 8:29:44 PM - info - HomeAssistantService: Successfully connected to MQTT broker at mqtt://core-mosquitto:1883
5/11/2020, 8:29:44 PM - info - ConfigService: Loading configuration from /usr/lib/node_modules/room-assistant/dist/config/definitions/default.js, config/default.json, config/local.json
5/11/2020, 8:29:44 PM - info - ClusterService: Starting mDNS advertisements and discovery
5/11/2020, 8:29:44 PM - info - NestApplication: Nest application successfully started
5/11/2020, 8:29:45 PM - info - ClusterService: Added 192.168.0.???:6425 to the cluster with id bedroom
5/11/2020, 8:29:46 PM - info - EntitiesService: Refreshing entity states
5/11/2020, 8:30:45 PM - info - HomeAssistantService: Device tracker requires manual setup in Home Assistant with topic: room-assistant/device_tracker/bluetooth-classic-??????????-tracker/state
5/11/2020, 8:30:45 PM - info - HomeAssistantService: Device tracker requires manual setup in Home Assistant with topic: room-assistant/device_tracker/bluetooth-classic-??????????-tracker/state
5/11/2020, 8:32:45 PM - info - HomeAssistantService: Device tracker requires manual setup in Home Assistant with topic: room-assistant/device_tracker/bluetooth-classic-??????????-tracker/state
5/11/2020, 8:36:50 PM - info - HomeAssistantService: Device tracker requires manual setup in Home Assistant with topic: room-assistant/device_tracker/bluetooth-classic-??????????-tracker/state
5/11/2020, 8:51:13 PM - info - ClusterService: bedroom has been elected as leader
5/11/2020, 9:34:45 PM - info - HomeAssistantService: Device tracker requires manual setup in Home Assistant with topic: room-assistant/device_tracker/bluetooth-classic-??????????-tracker/state
5/12/2020, 1:35:30 AM - info - ClusterService: bedroom has been elected as leader
5/12/2020, 2:28:01 AM - info - ClusterService: Removed 192.168.0.???:6425 from the cluster with id bedroom
5/12/2020, 2:28:04 AM - info - ClusterService: Added 192.168.0.???:6425 to the cluster with id bedroom
5/12/2020, 3:59:41 AM - info - ClusterService: bedroom has been elected as leader
5/12/2020, 4:06:40 AM - info - ClusterService: bedroom has been elected as leader
5/12/2020, 9:04:09 AM - info - ClusterService: bedroom has been elected as leader
5/12/2020, 9:39:30 AM - info - ClusterService: bedroom has been elected as leader

Relevant configuration

living room

global:
  instanceName: livingroom
  integrations:
    - homeAssistant
    - bluetoothClassic
  cluster:
    weight: 100
bluetoothClassic:
  interval: 60
  addresses:
    ??
    ??
    ??
    ??
    ??

bedroom

global:
  instanceName: bedroom
  integrations:
    - homeAssistant
    - bluetoothClassic
  cluster:
    quorum: 2
    weight: 1
homeAssistant:
  mqttUrl: 'mqtt://??????:1883'
  mqttOptions:
    username: mqtt
    password:?????????
bluetoothClassic:
  interval: 60
  addresses:
    ??
    ??
    ??
    ??
    ??

Expected behavior
I would expect the livingroom node to be the leader.

Environment

  • room-assistant version: 2.6.0
  • installation type: Hass.io in livingroom, and RPI in bedroom.
  • hardware: Docker and RPI 3
  • OS: Ubuntu Server and Raspbian
@domoritz domoritz added the bug label May 12, 2020
@mKeRix
Owner

mKeRix commented May 14, 2020

The weights for the leader election are more guidelines than hard rules. When connecting instances together the leader is chosen by the following logic:

  • instance boots, no other nodes to connect to -> elects itself as leader after a short timeout
  • cluster with leader exists, instance boots, connects without having elected itself yet -> accepts whatever leader is set in the cluster
  • cluster with leader exists, instance with different leader already set connects -> new election is held

During an election each instance just submits a vote for the node that has the highest weight from the ones that it knows of locally. Applying this to your scenario, I suspect that your bedroom node was already running and had elected itself as leader when livingroom connected. As a quick fix you can try to shut down both instances, then start livingroom. Once that's done you can start bedroom. Both should now have livingroom as the leader.
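
The three cases and the weight-based vote described above can be sketched roughly as follows. This is only an illustration: `ClusterNode`, `vote` and `resolveLeaderOnJoin` are hypothetical names, not room-assistant's actual code, and the sketch assumes every existing cluster member agrees on a single leader and knows about every other node.

```typescript
// Sketch only: ClusterNode, vote and resolveLeaderOnJoin are hypothetical
// names, not room-assistant's actual implementation.
interface ClusterNode {
  id: string;
  weight: number;
  leaderId?: string; // leader this node currently recognizes, if any
}

type JoinOutcome = {
  kind: "self-elected" | "accepted-existing" | "new-election";
  leaderId: string;
};

// Each instance votes for the highest-weight node it knows of locally.
// Expects a non-empty list.
function vote(knownNodes: ClusterNode[]): string {
  return knownNodes.reduce((best, n) => (n.weight > best.weight ? n : best)).id;
}

function resolveLeaderOnJoin(joining: ClusterNode, cluster: ClusterNode[]): JoinOutcome {
  // Case 1: nobody else to connect to, so the instance elects itself
  // (the short timeout before doing so is omitted here).
  if (cluster.length === 0) {
    return { kind: "self-elected", leaderId: joining.id };
  }

  // Assumption: all existing members agree on one leader.
  const clusterLeader = cluster[0].leaderId;

  // Case 2: the joining instance has not elected itself yet (or already
  // agrees with the cluster), so it accepts the cluster's leader.
  if (
    clusterLeader !== undefined &&
    (joining.leaderId === undefined || joining.leaderId === clusterLeader)
  ) {
    return { kind: "accepted-existing", leaderId: clusterLeader };
  }

  // Case 3: the leaders differ, so a new election is held by weight.
  return { kind: "new-election", leaderId: vote([...cluster, joining]) };
}

// The scenario from this issue: bedroom is already up and leads itself,
// then livingroom (weight 100) boots and connects.
const bedroom: ClusterNode = { id: "bedroom", weight: 1, leaderId: "bedroom" };
const livingroom: ClusterNode = { id: "livingroom", weight: 100 };
console.log(resolveLeaderOnJoin(livingroom, [bedroom]));
```

In this sketch the weight only matters in the third case, which is why a freshly booted livingroom ends up following bedroom even though its weight is higher.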

@domoritz
Contributor Author

Ahh, thanks for the explanation. The issue for me is that the raspberry pi doesn't have the best connection (ssh is really slow) so I suspect that I'm not getting updates when it's the leader. I can force livingroom to be the leader by restarting the pi but my livingroom server also restarts from time to time (updates) so it would be nice if it didn't lose its leadership position because of that.

Would it make sense to change the protocol and elect a leader every time a node joins a cluster?

@mKeRix
Owner

mKeRix commented May 15, 2020

The issue with that would likely be random state changes, as an instance starts with an empty state. Once an instance is elected as leader it will force the entities to match its own local state - if the state hasn't regenerated to the right level yet on the instance you will see random blips of wrong states with the restarts.

Aside from that, if your instance reconnects within cluster.timeout the cluster should not change leaders.
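
To illustrate the timeout point, here is a minimal sketch of how a membership check of that kind might work. The names `PeerRecord`, `isStillMember` and `leaderAfterCheck` are hypothetical, and the use of milliseconds is an assumption; the only thing taken from the comment above is that a node reconnecting within `cluster.timeout` does not trigger a leader change.

```typescript
// Sketch only: hypothetical names, not room-assistant's real API.
interface PeerRecord {
  id: string;
  lastSeenMs: number; // timestamp of the last message received from the peer
}

// A peer still counts as a member as long as it was heard from
// within the timeout window.
function isStillMember(peer: PeerRecord, nowMs: number, timeoutMs: number): boolean {
  return nowMs - peer.lastSeenMs <= timeoutMs;
}

// Returns the current leader id if the leader is still a member,
// or undefined if the cluster would need a new election.
function leaderAfterCheck(
  leaderId: string,
  peers: PeerRecord[],
  nowMs: number,
  timeoutMs: number
): string | undefined {
  const leader = peers.find((p) => p.id === leaderId);
  return leader !== undefined && isStillMember(leader, nowMs, timeoutMs)
    ? leaderId
    : undefined;
}

// Example: the leader was last seen 30 s ago and the assumed timeout is 60 s,
// so leadership is kept.
const peers: PeerRecord[] = [{ id: "livingroom", lastSeenMs: 0 }];
console.log(leaderAfterCheck("livingroom", peers, 30_000, 60_000));
```

Under these assumptions a restart of livingroom that finishes within the timeout leaves it as leader, while a longer outage returns `undefined` and opens the door to a new leader.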

@domoritz
Contributor Author

Couldn't there be some initialization protocol where a new node that becomes the new leader initializes its state before taking over as the leader?

The issue in my case is that the bedroom node has bad wifi and so I don't want it to be the leader.

I understand that my use case is maybe not the target use case so feel free to close this issue as wontfix but maybe my feedback is useful for future versions of room assistant.
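
For what it's worth, the initialization idea could look roughly like the sketch below: a newly elected node refuses to take over until its local state covers every entity it is expected to report. Everything here is hypothetical (`LeaderCandidate`, `expectedEntities`, the notion of "warmed up"); room-assistant does not necessarily track state this way.

```typescript
// Sketch only: a hypothetical warm-up gate, not an existing room-assistant feature.
class LeaderCandidate {
  private readonly localState = new Map<string, string>();
  private leading = false;

  // expectedEntities: ids the node must have a state for before leading.
  constructor(private readonly expectedEntities: string[]) {}

  // Record a freshly measured state for one entity.
  updateState(entityId: string, state: string): void {
    this.localState.set(entityId, state);
  }

  // The node is warmed up once every expected entity has a local state.
  isWarmedUp(): boolean {
    return this.expectedEntities.every((id) => this.localState.has(id));
  }

  // Take over leadership only when warmed up; otherwise keep waiting.
  tryTakeOver(): boolean {
    if (!this.isWarmedUp()) {
      return false;
    }
    this.leading = true;
    return true;
  }

  get isLeading(): boolean {
    return this.leading;
  }
}

// Example: the node cannot lead until both trackers have a state.
const candidate = new LeaderCandidate(["tracker-a", "tracker-b"]);
candidate.updateState("tracker-a", "livingroom");
console.log(candidate.tryTakeOver()); // false: tracker-b has no state yet
candidate.updateState("tracker-b", "not_home");
console.log(candidate.tryTakeOver()); // true: all expected entities are known
```

The old leader would have to keep serving until `tryTakeOver` succeeds, which is the part that would need real protocol support.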

@mKeRix
Owner

mKeRix commented May 26, 2020

There probably could be - and at the very least we should handle these kinda scenarios better. I'll keep this open for tracking. Maybe I can think of a good solution!

@github-actions

github-actions bot commented May 5, 2021

There hasn't been any activity on this issue recently. In an effort to provide a better overview of current issues we automatically clean some of the old ones. Many of them may already be resolved in newer versions of room-assistant.
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label May 5, 2021
@mKeRix mKeRix added no_stale and removed stale labels May 5, 2021
@ghzgod

ghzgod commented May 10, 2021

Please don't close this issue.

@Nathan-Schwartz

I believe the enableStrictWeightMode option introduced in goldfire/democracy.js#18 could resolve this issue
