This issue is just to put together everything we found out so far about the problems surrounding our current implementation of leader election, and to help track the tasks we need to accomplish before calling it solved.
This issue was first reported here: #34998
Description
We can see from the logs that the lease renewal is failing:

E0403 16:29:14.243178 13 leaderelection.go:330] error retrieving resource lock kube-system/metricbeat-cluster-leader: leases.coordination.k8s.io "metricbeat-cluster-leader" is forbidden: User "system:serviceaccount:kube-system:metricbeat" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
...
I0403 16:29:16.238161 13 leaderelection.go:283] failed to renew lease kube-system/metricbeat-cluster-leader: timed out waiting for the condition
E0403 16:29:16.238229 13 leaderelection.go:306] Failed to release lock: resource name may not be empty
..."message":"leader election lock LOST, id beats-leader-kind-worker"...
However, this should not be a reason for the previous metricbeat lease holder to keep reporting metrics. The expected behavior is: as soon as the holder loses the lock, whether or not a renewal was attempted, that metricbeat instance should stop reporting metrics.
Since we can see the message `leader election lock LOST, id beats-leader-kind-worker` in the logs, we know at least that this function is being called correctly: beats/libbeat/autodiscover/providers/kubernetes/kubernetes.go, lines 301 to 305 in 10ff992.
Why is this problem happening then?
The reason for the duplicated metrics is actually quite simple.
The leader, once it starts, emits an event with the flag `start` set to true: beats/libbeat/autodiscover/providers/kubernetes/kubernetes.go, lines 208 to 213 in 10ff992.
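To make the flow concrete, here is a simplified sketch of that emission. The `Event` type and the exact field set are reduced stand-ins for libbeat's bus event, not copied from the source:

```go
// Simplified sketch of the start-event emission; Event stands in for
// libbeat's bus event type, and the field set here is an assumption.
package main

import "fmt"

type Event map[string]interface{}

// startLeading: the newly elected leader emits an event with "start" set
// to true, keyed by an event ID derived from its identity.
func startLeading(publish func(Event), eventID string) {
	publish(Event{
		"start": true,    // tells autodiscover to load the leader config
		"id":    eventID, // the key under which the config will be saved
	})
}

func main() {
	startLeading(func(e Event) { fmt.Println(e) }, "beats-leader-kind-worker")
}
```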
This event is then captured in this part of the code: beats/libbeat/autodiscover/autodiscover.go, lines 141 to 146 in 10ff992.
And this handle-start function initializes the right configuration for our `a.configs` here: beats/libbeat/autodiscover/autodiscover.go, lines 264 to 266 in 10ff992.
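Roughly, the start path can be pictured like this (a sketch with made-up types; `a.configs` in beats is much richer than a flat map):

```go
// Sketch of the handle-start path: the config carried by the event is
// saved under the *start* event's ID, and updated=true triggers a reload.
package main

import "fmt"

type autodiscover struct {
	configs map[string]string // event ID -> config, heavily simplified
}

func (a *autodiscover) handleStart(eventID, config string) (updated bool) {
	a.configs[eventID] = config
	return true // reload the autodiscover configuration
}

func main() {
	a := &autodiscover{configs: map[string]string{}}
	fmt.Println(a.handleStart("beats-leader-kind-worker", "cluster-metrics"))
}
```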
So now we know `updated` is set to true and we need to reload our autodiscover configuration. Once we handle stop events we should do the same. However, we have a problem. When dealing with `stopLeading()`, we use a new event ID: beats/libbeat/autodiscover/providers/kubernetes/kubernetes.go, lines 301 to 305 in 10ff992.
And it was the start event's ID that was used to save the configuration in autodiscover... So once we start handling the stop event, we check whether we have the configuration there and update the autodiscover settings: beats/libbeat/autodiscover/autodiscover.go, lines 281 to 284 in 10ff992.
Because this is a new event id, nothing is found there, and our metricbeat instance never stops reporting metrics...
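The failure boils down to a map lookup with the wrong key. A minimal illustration, with made-up IDs:

```go
// Sketch of why the stop event is a no-op: stopLeading generates a fresh
// event ID, so the config saved under the start event's ID is never found
// and therefore never removed. (IDs below are made up for illustration.)
package main

import "fmt"

func main() {
	configs := map[string]string{}

	startID := "leader-event-id-1"       // ID used by startLeading
	configs[startID] = "cluster-metrics" // config saved on start

	stopID := "leader-event-id-2" // new ID created by stopLeading
	if _, ok := configs[stopID]; !ok {
		// Nothing found: nothing is marked as updated, the running
		// config stays in place, and the old leader keeps reporting.
		fmt.Println("stop event ignored; duplicated documents follow")
	}
}
```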
Originally posted by @constanca-m in #34998 (comment)
Consequences
Currently, we use `Run` for leader election: beats/libbeat/autodiscover/providers/kubernetes/kubernetes.go, line 355 in 4525d78. And it says this:

// Run starts the leader election loop. Run will not return
// before leader election loop is stopped by ctx or it has
// stopped holding the leader lease
This means we will never have the same leader twice, because once instances stop running they are never reelected!
So we have a problem. Example:
We have two nodes, `node-1` and `node-2`.
`node-1` is the first leader.
`node-1` loses the lock, so it stops running.
`node-2` gets elected.
A lease renewal fails with a timeout (for example, the `rolebinding` gets deleted). `node-2` loses the lease. It stops running.
Who's going to be leader now? There are no more instances running to report the metrics...

I tried this with a unit test, trying to renew around 20 times, and I could see the leader election had stopped like in the example above.
So I believe the implementation as of now (the official one, not the one from this branch) has two problems:

1. A metricbeat instance never stops reporting metrics, even after losing the lease! It causes duplicated documents.
2. A metricbeat instance can never be reelected as the leader. This doesn't necessarily cause a problem in the current implementation, since our previous leader instances never stop.
I think we need to consider other alternatives to leader election, or find a way to make it run again, because as it is we will be forcing users to delete pods so they can start again.
Originally posted by @constanca-m in #38471 (comment)

I checked in elastic-agent and the behaviour seems a bit different. If the leader loses the lease, then it stops collecting the cluster metrics. But it is also removed from the leader election process, and then it cannot take the lease again.
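For reference, client-go's `LeaderElector.Run` returns once the lease is lost, so being reelected would require campaigning again. A minimal sketch of that direction, assuming we wrap the election in a loop (this is not the current beats code):

```go
// Sketch: rejoin the election every time leadership is lost, instead of
// letting the provider stop for good after the first loss.
package sketch

import (
	"context"

	"k8s.io/client-go/tools/leaderelection"
)

func runForever(ctx context.Context, lec leaderelection.LeaderElectionConfig) {
	for {
		le, err := leaderelection.NewLeaderElector(lec)
		if err != nil {
			return // config error; retrying will not help
		}
		le.Run(ctx) // blocks until leadership is lost or ctx is cancelled
		if ctx.Err() != nil {
			return // shutting down
		}
		// Leadership lost: loop and campaign again, so this instance
		// can become leader a second time.
	}
}
```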
How to test
Issue #34998 already mentioned how to reproduce.
Here is an alternative way to do it:
1. Create a two node cluster. You can do it with kind and a `kind-config.yaml` (see the example config after this list).
2. Deploy metricbeat on Kubernetes.
3. Update the lease object so that a lease renewal failure occurs (see the sample edit after this list). Depending on your current holder, you might have to update `holderIdentity`.

You should see both hosts reporting metrics now.
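As an illustration of the two snippets referenced above (assumed equivalents, since the exact files are not reproduced here), a typical two-node `kind-config.yaml`, used with `kind create cluster --config kind-config.yaml`:

```yaml
# Assumed two-node kind configuration (stand-in for the original file):
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
```

And an assumed shape for the lease edit (the lease name comes from the logs above; the holder value is illustrative), for example via `kubectl edit lease metricbeat-cluster-leader -n kube-system`:

```yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: metricbeat-cluster-leader
  namespace: kube-system
spec:
  # Point the lease at a different holder so the current holder's renewal
  # fails; adjust the value to differ from your current holderIdentity.
  holderIdentity: some-other-holder
```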
Tasks