
Change leadership mechanism #182

Merged

Conversation

RamLavi
Member

@RamLavi RamLavi commented Jun 9, 2020

Currently our backup-active implementation makes it so that only one kubemacpool pod is in the ready state. This causes a problem: the deployment status is never ready.
In order to fix this, we need to change the leader election to occur via the controller-runtime API, and not externally as implemented now.
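A minimal sketch of that direction, using controller-runtime's built-in leader election so every replica runs the manager and can report ready; the lock name and namespace below are illustrative, not the PR's actual values:

package main

import (
	"log"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Every replica starts the manager; controller-runtime runs the leader
	// election internally, so all pods can become Ready while only the
	// elected one performs leader-only work.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:          true,
		LeaderElectionID:        "kubemacpool-leader-election", // illustrative lock name
		LeaderElectionNamespace: "kubemacpool-system",          // illustrative namespace
	})
	if err != nil {
		log.Fatal(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		log.Fatal(err)
	}
}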

What this PR does / why we need it:

Special notes for your reviewer:

Release note:

NONE

Collaborator

@ormergi ormergi left a comment

Looks good, these are big changes!
Would you please fix the typos and wrap the commit titles and bodies; they get cut off on GitHub.

I see that we keep using retryOnConflict when interacting with kube-api; can you please elaborate on why we need it?

pkg/manager/manager.go (Outdated)
pkg/manager/leaderelection.go
pkg/manager/manager.go (Outdated)
pkg/manager/manager.go (Outdated)
pkg/manager/manager.go
pkg/manager/manager.go (Outdated)
@@ -13,13 +14,18 @@ func (k *KubeMacPoolManager) waitToStartLeading() error {
}

func (k *KubeMacPoolManager) markPodAsLeader() error {
Collaborator

Can you change this function name to something like labelLeaderElectedPod?

Member Author

Changed to AddLeaderLabelToElectedPod. Is that OK?

Collaborator

perfect
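For context, a minimal sketch of what such a relabeling helper could look like, wrapped in client-go's RetryOnConflict (which also relates to the retryOnConflict question above); the receiver fields, label key, and surrounding types are assumptions, not the PR's actual code:

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/util/retry"
)

// AddLeaderLabelToElectedPod labels this pod once it wins the election; the
// retry re-reads the pod and re-applies the change whenever the update hits a conflict.
func (k *KubeMacPoolManager) AddLeaderLabelToElectedPod() error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		pod := &corev1.Pod{}
		key := types.NamespacedName{Namespace: k.podNamespace, Name: k.podName}
		if err := k.client.Get(context.TODO(), key, pod); err != nil {
			return err
		}
		if pod.Labels == nil {
			pod.Labels = map[string]string{}
		}
		pod.Labels["kubemacpool-leader"] = "true" // illustrative label key
		return k.client.Update(context.TODO(), pod)
	})
}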

pkg/manager/leaderelection.go (Outdated)
pkg/pool-manager/virtualmachine_pool.go (Outdated)
tests/tests.go
@RamLavi
Member Author

RamLavi commented Jun 10, 2020

/hold

The newly elected leader does not acquire the old cache.

Member

@qinqon qinqon left a comment

Some error wrapping is missing. Also, we may want to do the Elected check outside the pool manager, so the pool manager is agnostic of leader election; for that we have to make a poolManager Start method public.
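Roughly, the idea is to gate the pool manager on the manager's Elected() channel from the outside, e.g. something like the following sketch (poolManager.Start and the logr-style log variable are illustrative names):

// Run next to mgr.Start(); mgr.Elected() is closed once this replica wins the
// leader election, so the pool manager stays agnostic of how the election works.
go func() {
	<-mgr.Elected()
	if err := poolManager.Start(); err != nil {
		log.Error(err, "failed to start pool manager after being elected")
	}
}()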

pkg/manager/leaderelection.go (Outdated)
pkg/manager/leaderelection.go (Outdated)
pkg/manager/leaderelection.go (Outdated)
pkg/manager/leaderelection.go
pkg/manager/manager.go (Outdated)
pkg/pool-manager/pool.go (Outdated)
tests/tests.go (Outdated)
Comment on lines 333 to 340
numberOfReadyPods := int32(0)
for _, podObject := range podsList.Items {
	if podObject.Status.Phase != corev1.PodRunning {
		return false
	}
	for _, condition := range podObject.Status.Conditions {
		if condition.Type == corev1.PodReady && condition.Status == corev1.ConditionTrue {
			numberOfReadyPods += 1
		}
	}
}
if numberOfReadyPods < numOfReplica {
	return false
}
Member

As @ormergi pointed out, just checking managerDeployment.Status.ReadyReplicas != numOfReplica is enough, since it's more or less the same as this check.
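In other words, something like this sketch, with managerDeployment and numOfReplica already in scope as in the original test helper:

// Rely on the Deployment's own readiness accounting instead of counting
// pod conditions by hand.
if managerDeployment.Status.ReadyReplicas != numOfReplica {
	return false
}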

Member

@RamLavi this is not resolved

Member Author

Sorry, I missed it.

Currently we use an external leader election where only the elected pod actually runs the controller-runtime manager.
We want to change this behavior so that all pods run the controller-runtime manager and are thus shown as ready.
To do that we remove the old code of the leader election.

Signed-off-by: Ram Lavi <ralavi@redhat.com>
…elected leader pod

This also means we can remove the use of leaderElectionChannel.

Signed-off-by: Ram Lavi <ralavi@redhat.com>
…er()

Signed-off-by: Ram Lavi <ralavi@redhat.com>
In this commit we move waitToStartLeading() to a separate goroutine, so that both pods will start the controller-runtime manager.
We then add the leader election inside the controller-runtime manager.

Signed-off-by: Ram Lavi <ralavi@redhat.com>
Since the webhook service operates only on pods with the kubemacpool leader label, we can't start the webhooks until the election is complete and a leader has been chosen.
In order to do that, we add a new status to the pod, to prevent the pods from becoming ready and the webhooks from starting until the label is set.

Signed-off-by: Ram Lavi <ralavi@redhat.com>
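A rough sketch of the status side of such a readiness gate, assuming a controller-runtime client; the condition type and function name are illustrative, not the PR's actual identifiers:

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// setLeaderReadyCondition adds a custom True condition to the pod's status so
// its readinessGate passes only after the leader label has been applied.
func setLeaderReadyCondition(ctx context.Context, c client.Client, namespace, podName string) error {
	pod := &corev1.Pod{}
	if err := c.Get(ctx, types.NamespacedName{Namespace: namespace, Name: podName}, pod); err != nil {
		return err
	}
	pod.Status.Conditions = append(pod.Status.Conditions, corev1.PodCondition{
		Type:   corev1.PodConditionType("kubemacpool.io/leader-ready"), // illustrative condition type
		Status: corev1.ConditionTrue,
	})
	return c.Status().Update(ctx, pod)
}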
…appropriate rbacs.

Signed-off-by: Ram Lavi <ralavi@redhat.com>
Signed-off-by: Ram Lavi <ralavi@redhat.com>
…y will initiate after the election is complete

Since we don't want both pods to perform pool-manager related operations, we start the pool-manager routines only after the kubemacpool leader is selected.

Signed-off-by: Ram Lavi <ralavi@redhat.com>
…oyment replicas, as well as making the needed changes in the tests to wait for both kubemacpool pods instead of only the leader.

Signed-off-by: Ram Lavi <ralavi@redhat.com>
@RamLavi RamLavi force-pushed the change_leadership_mechanism branch from 537cb74 to a445b80 Compare June 10, 2020 11:37
@RamLavi
Member Author

RamLavi commented Jun 10, 2020

/unhold

Fixed the issues and also added a test for it in another PR: #183.

pkg/manager/manager.go (Outdated)
pkg/manager/manager.go (Outdated)
pkg/names/names.go (Outdated)
pkg/pool-manager/pool.go (Outdated)
pkg/pool-manager/pool.go
tests/tests.go (Outdated)
Comment on lines 333 to 340
numberOfReadyPods := int32(0)
for _, podObject := range podsList.Items {
	if podObject.Status.Phase != corev1.PodRunning {
		return false
	}
	for _, condition := range podObject.Status.Conditions {
		if condition.Type == corev1.PodReady && condition.Status == corev1.ConditionTrue {
			numberOfReadyPods += 1
		}
	}
}
if numberOfReadyPods < numOfReplica {
	return false
}
Member

@RamLavi this is not resolved

pkg/manager/leaderelection.go
@RamLavi RamLavi force-pushed the change_leadership_mechanism branch from e4b432a to 0325252 Compare June 10, 2020 12:33
Member

@qinqon qinqon left a comment

Last nit.

pkg/pool-manager/pool.go (Outdated)
pkg/manager/leaderelection.go (Outdated)
Signed-off-by: Ram Lavi <ralavi@redhat.com>
@RamLavi RamLavi force-pushed the change_leadership_mechanism branch from 0325252 to a3708fe Compare June 10, 2020 13:09
@qinqon
Member

qinqon commented Jun 10, 2020

/lgtm
/approved

@qinqon
Member

qinqon commented Jun 10, 2020

/approve

@kubevirt-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qinqon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot merged commit c1a31b3 into k8snetworkplumbingwg:master Jun 10, 2020
@RamLavi
Member Author

RamLavi commented Jun 16, 2020

/cherry-pick release-v0.14.0

@kubevirt-bot
Collaborator

@RamLavi: #182 failed to apply on top of branch "release-v0.14.0":

Applying: add mgr to KubeMacPoolManager and use mgr.Elected() to mark only the elected leader pod This also means we can remove the use of leaderElectionChannel
Applying: add retryOnConflict to better handle future conflict in markPodAsLeader()
Applying: move leader election into the controller-runtime manager.
Applying: adding a new readinessGate status to kubemacpool pods.
Applying: adding the new readinessGate status to the manifest. also adding the appropriate rbacs.
Applying: changing sync script to wait for deployment ready
Applying: move pool-manager starting routines to waitToStartLeading so that they will initiate after the election is complete
error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	pkg/pool-manager/pool_test.go
M	pkg/pool-manager/virtualmachine_pool.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/pool-manager/virtualmachine_pool.go
Auto-merging pkg/pool-manager/pool_test.go
CONFLICT (content): Merge conflict in pkg/pool-manager/pool_test.go
Patch failed at 0008 move pool-manager starting routines to waitToStartLeading so that they will initiate after the election is complete

In response to this:

/cherry-pick release-v0.14.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
