
[Autodiscover Kubernetes] Leader election never stops instances after lease lock is lost #38543

Closed
constanca-m opened this issue Mar 22, 2024 · 1 comment · Fixed by elastic/elastic-agent#4542
Assignees: constanca-m
Labels: bug, Team:Cloudnative-Monitoring



constanca-m commented Mar 22, 2024

This issue is just to put together everything we found out so far about the problems surrounding our current implementation of leader election, and to help track the tasks we need to accomplish before calling it solved.

This issue was first reported here: #34998

Description

We can see from the logs that the lease renewal is failing:

E0403 16:29:14.243178      13 leaderelection.go:330] error retrieving resource lock kube-system/metricbeat-cluster-leader: leases.coordination.k8s.io "metricbeat-cluster-leader" is forbidden: User "system:serviceaccount:kube-system:metricbeat" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
...
I0403 16:29:16.238161      13 leaderelection.go:283] failed to renew lease kube-system/metricbeat-cluster-leader: timed out waiting for the condition
E0403 16:29:16.238229      13 leaderelection.go:306] Failed to release lock: resource name may not be empty
..."message":"leader election lock LOST, id beats-leader-kind-worker"...

However, this should not be a reason for the previous metricbeat lease holder to keep reporting metrics. The expected behavior is: as soon as the holder loses the lock, whether or not the renewal succeeded, that metricbeat instance should stop reporting metrics.

Since we can see the message leader election lock LOST, id beats-leader-kind-worker in the logs, we know that at least this callback is being called correctly:

OnStoppedLeading: func() {
    logger.Debugf("leader election lock LOST, id %v", id)
    eventID := fmt.Sprintf("%v-%v", metaUID, time.Now().UnixNano())
    stopLeading(uuid.String(), eventID)
},
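
For context, this callback is registered with client-go's leaderelection package (the same package that produced the leaderelection.go lines in the logs above). A minimal sketch of how such a setup is wired, with illustrative timing values and helper names rather than the exact Beats code:

import (
    "context"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/leaderelection"
    "k8s.io/client-go/tools/leaderelection/resourcelock"
)

// runLeaderElection is a sketch only, not the actual Beats implementation.
func runLeaderElection(ctx context.Context, client kubernetes.Interface, id string) error {
    lock := &resourcelock.LeaseLock{
        LeaseMeta: metav1.ObjectMeta{
            Name:      "metricbeat-cluster-leader", // lease name seen in the logs above
            Namespace: "kube-system",
        },
        Client:     client.CoordinationV1(),
        LockConfig: resourcelock.ResourceLockConfig{Identity: id},
    }

    cfg := leaderelection.LeaderElectionConfig{
        Lock:          lock,
        LeaseDuration: 15 * time.Second, // illustrative values, not the Beats defaults necessarily
        RenewDeadline: 10 * time.Second,
        RetryPeriod:   2 * time.Second,
        Callbacks: leaderelection.LeaderCallbacks{
            OnStartedLeading: func(ctx context.Context) {
                // emit the autodiscover "start" event here
            },
            OnStoppedLeading: func() {
                // emit the autodiscover "stop" event here
            },
        },
    }

    le, err := leaderelection.NewLeaderElector(cfg)
    if err != nil {
        return err
    }

    // Run blocks until ctx is cancelled or the lease is lost; it does not rejoin the election.
    le.Run(ctx)
    return nil
}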

Why is this problem happening then?

The reason for the duplicated metrics is actually quite simple.

The leader, once it starts, emits an event with the flag start set to true:

event := bus.Event{
    "start":    true,
    "provider": uuid,
    "id":       eventID,
    "unique":   "true",
}

This event is then captured in this part of the code:

if _, ok := event["start"]; ok {
    // if updated is true, we don't want to set it back to false
    if a.handleStart(event) {
        updated = true
    }
}

And this handleStart function initializes the right configuration in a.configs here:

if updated {
    a.configs[eventID] = newCfg
}

So now we know updated is set to true and we need to reload our autodiscover configuration.

When we handle stop events we should do the same. However, there is a problem: when stopLeading() is called, we generate a new event ID:

OnStoppedLeading: func() {
    logger.Debugf("leader election lock LOST, id %v", id)
    eventID := fmt.Sprintf("%v-%v", metaUID, time.Now().UnixNano())
    stopLeading(uuid.String(), eventID)
},
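
For reference, stopLeading presumably emits the mirror of the start event shown above; a sketch, assuming the same bus.Event shape (the exact field set in Beats may differ):

// Sketch: the stop event emitted by stopLeading, assumed symmetric to the start event.
event := bus.Event{
    "stop":     true,
    "provider": uuid,
    "id":       eventID,
    "unique":   "true",
}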

But the configuration was saved in autodiscover under the original start event ID. So when we handle the stop event, we check whether a configuration is stored under the (new) event ID before updating the autodiscover settings:

if len(a.configs[eventID]) > 0 {
    a.logger.Debugf("Stopping %d configs", len(a.configs[eventID]))
    updated = true
}

Because the stop event carries a brand new event ID, nothing is found there, updated stays false, and our metricbeat instance never stops reporting metrics...

Originally posted by @constanca-m in #34998 (comment)
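
One possible direction for a fix (a sketch only, not necessarily what elastic/elastic-agent#4542 ended up doing) is to generate the event ID once, when leadership is gained, and reuse it when leadership is lost, so that the stop handler finds the entry in a.configs. This assumes a startLeading counterpart to the stopLeading call shown above:

// Sketch: share the event ID between the start and stop callbacks.
var leaderEventID string

callbacks := leaderelection.LeaderCallbacks{
    OnStartedLeading: func(ctx context.Context) {
        leaderEventID = fmt.Sprintf("%v-%v", metaUID, time.Now().UnixNano())
        logger.Debugf("leader election lock GAINED, id %v", id)
        startLeading(uuid.String(), leaderEventID)
    },
    OnStoppedLeading: func() {
        logger.Debugf("leader election lock LOST, id %v", id)
        // Reusing the ID from OnStartedLeading means a.configs[eventID]
        // is found and the running configuration is actually stopped.
        stopLeading(uuid.String(), leaderEventID)
    },
}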

Consequences

Currently, we use the client-go leaderelection Run function for leader election, and its documentation says this:

// Run starts the leader election loop. Run will not return
// before leader election loop is stopped by ctx or it has
// stopped holding the leader lease

This means we will never have the same leader twice, because once an instance loses the lease it stops running the election loop and is never reelected!

So we have a problem. Example:

  1. We have two nodes, node-1 and node-2.
  2. node-1 is the first leader.
  3. node-1 loses the lock, so it stops running.
  4. node-2 gets elected.
  5. A lease renewal fails with a timeout (for example, because the RoleBinding gets deleted). node-2 loses the lease and stops running.
  6. Who's going to be leader now? There are no more instances running to report the metrics...

I tried this with a unit test, attempting to renew the lease around 20 times, and I could see that leader election had stopped, just like in the example above.

So I believe the current implementation (the official one, not the one from this branch) has two problems:

  1. A metricbeat instance never stops reporting metrics, even after losing the lease! This causes duplicated documents.
  2. A metricbeat instance can never be reelected as the leader.
    • This doesn't necessarily cause a problem in the current implementation, since our previous leader instances never stop.

I think we need to consider alternatives to leader election, or find a way to make it run again (one possible approach is sketched below), because otherwise we are forcing users to delete pods so that the instances start again.

Originally posted by @constanca-m in #38471 (comment)
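
On the "make it run again" point: one common pattern (again only a sketch, reusing the cfg from the earlier sketch, and not necessarily the approach the fix takes) is to recreate the elector and call Run again whenever the lease is lost, until the surrounding context is cancelled:

// Sketch: rejoin the leader election after losing the lease.
for {
    le, err := leaderelection.NewLeaderElector(cfg)
    if err != nil {
        return err
    }
    le.Run(ctx) // returns when the lease is lost or ctx is done
    if ctx.Err() != nil {
        return ctx.Err()
    }
    logger.Debugf("lost leadership, rejoining leader election")
}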

How to test

Issue #34998 already mentioned how to reproduce.

Here is an alternative way to do it:

Create a two-node cluster. You can do it this way:
$ kind create cluster --config kind-config.yaml

And kind-config.yaml:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: KubeProxyConfiguration
        metricsBindAddress: "0.0.0.0"
  - role: worker

Deploy metricbeat on Kubernetes.

Update the lease object like this, so that a lease renewal failure occurs.

Depending on your current holder, you might have to update holderIdentity:

apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: metricbeat-cluster-leader
  namespace: kube-system
spec:
  holderIdentity: beats-leader-metricbeat-worker
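
One way to apply that change, assuming the manifest above is saved as lease.yaml (kubectl edit or kubectl patch on the Lease works just as well):

$ kubectl -n kube-system apply -f lease.yaml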

You should see both hosts reporting metrics now.

Tasks

constanca-m self-assigned this Mar 22, 2024
constanca-m added the Team:Cloudnative-Monitoring label Mar 22, 2024
MichaelKatsoulis commented:

I checked elastic-agent and the behaviour seems a bit different: if the leader loses the lease, it stops collecting the cluster metrics, but it is also removed from the leader election process, and then it cannot take the lease again.
