
Fabric8 leader election #1658

Open
wants to merge 193 commits into main

Conversation

@wind57 (Contributor) commented May 28, 2024

No description provided.

wind57 and others added 30 commits December 4, 2021 07:59
*/
// @formatter:off
@ConfigurationProperties("spring.cloud.kubernetes.leader.election")
public record LeaderElectionProperties(
@wind57 (Contributor, Author) May 29, 2024

The properties needed to configure the new leader election (a sketch of how they might map onto the record is shown after this list):

  • waitForPodReady - should we wait for the pod to be ready before we even trigger the leader election process
  • publishEvents - should we publish events (ApplicationEvent) when the state of leaders changes. We do this in the current implementation, so I added it here also
  • leaseDuration - TTL of the lease. If, for example, the leader dies, no other leader candidate can acquire the lease until this one expires.
  • lockNamespace - where to create the "lock" (either a lease or a config map)
  • lockName - the name of the lease or configmap
  • renewDeadline - once the lock is acquired and we are the current leader, we try to "extend" the lease; we must extend it within this deadline
  • retryPeriod - how often to retry when trying to get the lock in order to become the leader.
    In our current code, this is what we use in LeaderInitiator::start, more exactly in scheduleAtFixedRate
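
A minimal sketch of how these could map onto the record shown above (types, ordering and comments are my assumptions; the actual record in this PR may differ):

	import java.time.Duration;
	import org.springframework.boot.context.properties.ConfigurationProperties;

	@ConfigurationProperties("spring.cloud.kubernetes.leader.election")
	public record LeaderElectionProperties(
			boolean waitForPodReady,    // wait for pod readiness before starting the election
			boolean publishEvents,      // publish an ApplicationEvent when leadership changes
			Duration leaseDuration,     // TTL of the lease
			String lockNamespace,       // namespace where the lease/configmap lock lives
			String lockName,            // name of the lease/configmap
			Duration renewDeadline,     // deadline within which the current leader must renew
			Duration retryPeriod        // how often candidates retry acquiring the lock
	) {
	}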

I'll try to explain this in a bit more detail.

The process internally in fabric8 is something like this:

  • first try to acquire the lock (the lock is either a configmap or a lease); by "acquire" I mean write to it (or to its annotations, for a configmap). Whoever writes first becomes the leader (all others get a 409).
  • All leader candidates that are not the leader will continue to spin forever until they get a chance to become the leader. They retry every retryPeriod.
  • The current leader, after it establishes itself as one, will spin forever too, but will try to extend its leadership. It extends it by updating the entries in the lease; specifically, the one we care about is renewTime. It is updated every retryPeriod. For example, every 2 seconds (retryPeriod), it will update its renewTime to "now".
  • All other, non-leader candidates keep spinning and check a few things in each cycle:
  1. "am I the leader?" If the answer is no, they go to (2)
  2. "can I become the leader?" This is answered by looking at:
now().isAfter(leaderElectionRecord.getRenewTime().plus(leaderElectionConfig.getLeaseDuration()))

So they can only try to acquire leadership once renewTime (when the last renewal happened) plus leaseDuration (basically a TTL) has passed.

As such, leaseDuration acts as a TTL, if that makes sense. That means no one will even be able to try to acquire the lock until that leaseDuration expires, which is OK for the cases when the pod dies or is killed.

But in case of a graceful shutdown (and I implemented this via CompletableFuture::cancel because this is what fabric8 expects), there is code that fabric8 will trigger to "reset" the lease: it will set the renewTime to "now" and leaseDuration to 1 second.


@Bean
@ConditionalOnMissingBean
LeaderElectionConfig fabric8LeaderElectionConfig(LeaderElectionProperties properties, Lock lock,
@wind57 (Contributor, Author):

this is the configuration that must be provided to fabric8, populated with reasonable defaults
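
A rough sketch of how that configuration might be assembled with fabric8's LeaderElectionConfigBuilder (the name and the releaseOnCancel flag are my assumptions, not necessarily what this PR does):

	LeaderElectionConfig config = new LeaderElectionConfigBuilder()
			.withName(properties.lockName())                 // assumed: reuse the lock name
			.withLock(lock)                                  // LeaseLock or ConfigMapLock, see below
			.withLeaseDuration(properties.leaseDuration())
			.withRenewDeadline(properties.renewDeadline())
			.withRetryPeriod(properties.retryPeriod())
			.withLeaderCallbacks(callbacks)                  // Fabric8LeaderElectionCallbacks, see below
			.withReleaseOnCancel(true)                       // assumed: release the lease when cancelled
			.build();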


@Bean
@ConditionalOnMissingBean
Lock lock(KubernetesClient fabric8KubernetesClient, LeaderElectionProperties properties, String holderIdentity) {
@wind57 (Contributor, Author):

if a lease is available on the cluster, use it as the lock implementation; if not, use a configmap
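
One possible shape of that choice (the supports(...) check is just one way to detect the Lease API; the PR may detect it differently):

	Lock lock(KubernetesClient client, LeaderElectionProperties properties, String holderIdentity) {
		// io.fabric8.kubernetes.api.model.coordination.v1.Lease
		boolean leaseSupported = client.supports(Lease.class);
		return leaseSupported
				? new LeaseLock(properties.lockNamespace(), properties.lockName(), holderIdentity)
				: new ConfigMapLock(properties.lockNamespace(), properties.lockName(), holderIdentity);
	}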


@Bean
@ConditionalOnMissingBean
Fabric8LeaderElectionInitiator fabric8LeaderElectionInitiator(String holderIdentity, String podNamespace,
@wind57 (Contributor, Author):

the "initiator" of leader election

*/
final class Fabric8LeaderElectionCallbacks extends LeaderCallbacks {

Fabric8LeaderElectionCallbacks(Runnable onStartLeading, Runnable onStopLeading, Consumer<String> onNewLeader) {
@wind57 (Contributor, Author):

encapsulates the callbacks that fabric8 offers to us
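
For illustration, the three callbacks could be wired like this (logging only; the real class presumably publishes ApplicationEvents when publishEvents is enabled):

	Fabric8LeaderElectionCallbacks callbacks = new Fabric8LeaderElectionCallbacks(
			() -> System.out.println("started leading"),                    // onStartLeading
			() -> System.out.println("stopped leading"),                    // onStopLeading
			newLeader -> System.out.println("new leader: " + newLeader));   // onNewLeader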


@Bean
String holderIdentity() throws UnknownHostException {
String podHostName = LeaderUtils.hostName();
@wind57 (Contributor, Author):

we use the pod name as the holder identity
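
Inside a container the hostname equals the pod name, so this presumably boils down to something like the following (a guess, suggested by the throws UnknownHostException in the signature):

	String podHostName = InetAddress.getLocalHost().getHostName();   // java.net.InetAddress; the pod name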

/**
* @author wind57
*/
@ExtendWith(OutputCaptureExtension.class)
@wind57 (Contributor, Author):

I know that we can't really add more integration tests due to the already long build time, so this is as close as I can get to an integration test.

CompletableFuture<Void> podReadyFuture = new CompletableFuture<>();

// wait until pod is ready
if (leaderElectionProperties.waitForPodReady()) {
@wind57 (Contributor, Author):

first we wait for the pod to be ready. For that, we schedule a Runnable via scheduler.scheduleWithFixedDelay, where scheduler is a CachedSingleThreadScheduler, which I borrowed from fabric8 (it is used in leader election in their implementation).

This CachedSingleThreadScheduler is pretty interesting: it has a single daemon thread, but in our case two Runnables that it executes. The first one is the one we submit; the second one is an internal one that checks if there is anything else pending to do (by looking at its inner queue).

So at any point in time, the queue that this executor uses has either two tasks (the one we submit + the internal one), or just one (if either of the two is currently executing). What it does is this: in its own internal Runnable, it checks how many items its internal queue has. If it's one, it means something was submitted by end users; if it's zero, it means nothing was submitted (or it finished/was cancelled) and it can shut itself down.


How I use it: I submit a Runnable to such an executor that checks every second whether the pod is ready. When it is, it completes the future:

CompletableFuture<Void> podReadyFuture = new CompletableFuture<>();
....
podReadyFuture.complete(null);
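
Roughly, the scheduled Runnable could look like this (the use of Readiness.isPodReady and the exact pod lookup are my assumptions about the surrounding code):

	scheduledFuture.set(scheduler.scheduleWithFixedDelay(() -> {
		// io.fabric8.kubernetes.client.readiness.Readiness
		Pod pod = fabric8KubernetesClient.pods().inNamespace(podNamespace).withName(holderIdentity).get();
		if (pod != null && Readiness.isPodReady(pod)) {
			podReadyFuture.complete(null);
		}
	}, 0, 1, TimeUnit.SECONDS));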

This completion matters because later in the code, I do this:

		// wait in a different thread until the pod is ready
		// and in the same thread start the leader election
		executorService.get().submit(() -> {
			try {
				if (leaderElectionProperties.waitForPodReady()) {
					CompletableFuture<?> ready = podReadyFuture
							.whenComplete((x, y) -> scheduledFuture.get().cancel(true));
					ready.get();
				}
				leaderFuture.set(leaderElector(leaderElectionConfig, fabric8KubernetesClient).start());
				leaderFuture.get();
			}
			catch (Exception e) {
				if (e instanceof CancellationException) {
					LOG.warn(() -> "leaderFuture was canceled");
				}
				throw new RuntimeException(e);
			}
		});

Let's break it down a bit:

  • CompletableFuture<?> ready = podReadyFuture.whenComplete((x, y) -> scheduledFuture.get().cancel(true));

When the pod is ready, cancel the Runnable that was scheduled. This means the executor will also stop, since only the internal "shutdown" task would be left inside.

  • Block until the pod is ready: ready.get()

  • Once the pod is ready, kick off leader election:

leaderFuture.set(leaderElector(leaderElectionConfig, fabric8KubernetesClient).start());
leaderFuture.get();

This will "hang" for as long as we are the leader or try to acquire the leadership.


@PreDestroy
void preDestroy() {
LOG.info(() -> "preDestroy called in the leader initiator");
@wind57 (Contributor, Author):

start cleaning up after a graceful shutdown
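
Presumably the cleanup continues along these lines (my guess at the method body, based on the fields seen in the snippets above; cancelling the leader future is what makes fabric8 release/"reset" the lease, as described earlier):

	if (scheduledFuture.get() != null) {
		scheduledFuture.get().cancel(true);      // stop the pod-readiness polling, if it is still running
	}
	if (leaderFuture.get() != null) {
		leaderFuture.get().cancel(true);         // fabric8 reacts to the cancellation by resetting the lease
	}
	executorService.get().shutdownNow();         // stop the thread that runs the election loop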

@wind57 (Contributor, Author) commented May 29, 2024

I've added some explanation on the code, but I might be biased since I "took apart" the entire fabric8 leader election and tried to understand it in its entirety. I've also contributed some minor PRs there, so that I stay well connected with the code.

Though we might not add integration tests for obvious reasons, I did test lots of scenarios in a separate project that I will post on my github page, so that in the future it will be easy to debug any issues.

Documentation is still pending, but in order to write it properly, I need to know what direction you see this going. I'll keep this one in sync with future merges from main, until a decision is made (if any, of course :) )

}

@Bean
String podNamespace() {
(reviewer):

How about adding one more namespace-inferring rule, from the service account path, before falling back to the ENV?
In most use cases the pod has it injected by the default configuration.
The Fabric8 KubernetesClient#getNamespace() already does this, see https://github.com/fabric8io/kubernetes-client/blob/main/kubernetes-client-api/src/main/java/io/fabric8/kubernetes/client/Config.java#L915.

public static final String KUBERNETES_NAMESPACE_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/namespace";
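
For illustration, the suggested order could look roughly like this (the env var name is just a placeholder for whatever the current implementation falls back to):

	String podNamespace() throws IOException {
		Path saNamespace = Path.of("/var/run/secrets/kubernetes.io/serviceaccount/namespace");
		if (Files.exists(saNamespace)) {
			return Files.readString(saNamespace).trim();
		}
		return System.getenv("POD_NAMESPACE");   // placeholder for the existing ENV-based fallback
	}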

@wind57 (Contributor, Author):

indeed! good to have you around Vic!

@wind57 wind57 marked this pull request as ready for review July 18, 2024 07:05
@wind57 (Contributor, Author) commented Jul 18, 2024

@ryanjbaxter I'm proposing a new implementation of fabric8 leader election, one that is native to the library itself. For the time being I think users should be left to opt in, and then in the long run maybe have this one as the only implementation... thank you for looking into it


[source]
----
spring.cloud.kubernetes.leader.enabled=false
(reviewer, Contributor):

Could we disable this if the new one is enabled, so folks don't need to set 2 properties?

spring.cloud.kubernetes.leader.election.lockName=other-name
----

The namespace can also be set (`default` being used if no explicit one exists):
(reviewer, Contributor):

Shouldn't the namespace be the namespace the pod is running in?


Once a certain pod establishes itself as the leader (by acquiring the lock), it will continuously (every `spring.cloud.kubernetes.leader.election.retryPeriod`) try to renew its lease, or in other words: it will try to extend its leadership. When a renewal happens, the "record" that is stored inside the lock is updated. For example, `renewTime` is updated inside the record to denote when the last renewal happened. (You can always peek at these fields by using `kubectl describe lease ...`, for example.)

Renewal must happen within a certain interval, specified by `spring.cloud.kubernetes.leader.election.renewDeadline`. By default it is equal to 10 seconds, which means that the leader pod has a maximum of 10 seconds to renew its leadership. If that does not happen, this pod is taken out of the leader election process and never participates again (unless you refresh the Spring context or restart the pod).
(reviewer, Contributor):

It can never participate again, meaning the pod can no longer become the leader? Why?
