Etcd controller shouldn't removes the scale-up annotation until scale-up succeeds #587

Merged
merged 3 commits into from
May 5, 2023

Conversation

Member

@ishan16696 ishan16696 commented May 2, 2023

What this PR does / why we need it:
The etcd controller shouldn't remove the scale-up annotation from the etcd statefulset until the scale-up succeeds.

Which issue(s) this PR fixes:
Fixes #582

Special notes for your reviewer:

Release note:

Added check to ensure that the scale up annotation is removed from the etcd statefulset only when scale-up succeeds

@ishan16696 ishan16696 requested a review from a team as a code owner May 2, 2023 03:36
@gardener-robot gardener-robot added needs/review Needs review size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py) labels May 2, 2023
@gardener-robot-ci-3 gardener-robot-ci-3 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels May 2, 2023
Contributor

@abdasgupta abdasgupta left a comment

Just one comment.

pkg/component/etcd/statefulset/statefulset.go (outdated)
@ishan16696
Member Author

Druid e2e tests passed with this PR ✅

@abdasgupta
Contributor

/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/review Needs review labels May 2, 2023
@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 2, 2023
// If `scaleToMultiNodeAnnotationKey` annotation is already present in etcd statefulset
// then it is better to check scale-up was successful or not before removing `scaleToMultiNodeAnnotationKey` annotation.
if metav1.HasAnnotation(sts.ObjectMeta, scaleToMultiNodeAnnotationKey) {
if (sts.Spec.Replicas != nil && sts.Status.UpdatedReplicas < *sts.Spec.Replicas) || sts.Status.Replicas > sts.Status.UpdatedReplicas {
Contributor

Minor point:
sts.Spec.Replicas will never be nil. So you can safely remove this check.
Below is the docstring from its source code:

// replicas is the desired number of replicas of the given Template.
// These are replicas in the sense that they are instantiations of the
// same Template, but individual replicas also have a consistent identity.
// If unspecified, defaults to 1.
// TODO: Consider a rename of this field.
// +optional
Replicas *int32 `json:"replicas,omitempty" protobuf:"varint,1,opt,name=replicas"`

Member Author

What if DoD/SREs scale the statefulset down to 0? Will it still be fine, since the pointer then points to zero?

Contributor

Zero is still a valid value, so that is fine. I created an STS and scaled it down to 0, and replicas was 0. Then I removed replicas from the spec completely, and as soon as I saved it, replicas became 1, as indicated in the docstring. So there is no reason to have a nil check.
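
The nil-versus-zero distinction above can be illustrated with a minimal sketch. The `spec` type and the `desiredReplicas` helper are hypothetical stand-ins (they are not part of etcd-druid or the Kubernetes API); the fallback to 1 only mirrors what the docstring and the experiment above describe.

```go
package main

import "fmt"

// spec is a hypothetical stand-in for appsv1.StatefulSetSpec; only the
// Replicas field is modelled here.
type spec struct {
	Replicas *int32
}

// desiredReplicas returns the effective replica count. A nil pointer means
// "unspecified" and falls back to 1, as the docstring describes; a pointer
// to 0 is a valid, explicit value and is returned unchanged.
func desiredReplicas(s spec) int32 {
	if s.Replicas == nil {
		return 1
	}
	return *s.Replicas
}

func main() {
	zero := int32(0)
	fmt.Println(desiredReplicas(spec{Replicas: &zero})) // explicit scale-down to 0
	fmt.Println(desiredReplicas(spec{}))                // replicas left unspecified
}
```

In a live cluster the defaulting is done by the API server, so an object read back from it would already carry a non-nil pointer; the helper above only makes that behaviour explicit for illustration.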

// If `scaleToMultiNodeAnnotationKey` annotation is already present in etcd statefulset
// then it is better to check scale-up was successful or not before removing `scaleToMultiNodeAnnotationKey` annotation.
if metav1.HasAnnotation(sts.ObjectMeta, scaleToMultiNodeAnnotationKey) {
if (sts.Spec.Replicas != nil && sts.Status.UpdatedReplicas < *sts.Spec.Replicas) || sts.Status.Replicas > sts.Status.UpdatedReplicas {
Contributor

This condition is not complete.
I tested this and captured some states. I deployed a single-member etcd on a KIND cluster and then scaled it directly with kubectl scale sts etcd-test --replicas=3. I saw the following.
Before the scale-out, the status looked like:

status:
  availableReplicas: 1
  collisionCount: 0
  currentReplicas: 1
  currentRevision: etcd-test-68496b5c76
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updateRevision: etcd-test-68496b5c76
  updatedReplicas: 1

Just after scale up the status looked like:

status:
  availableReplicas: 1
  collisionCount: 0
  currentReplicas: 2
  currentRevision: etcd-test-68496b5c76
  observedGeneration: 2
  readyReplicas: 1
  replicas: 2
  updateRevision: etcd-test-68496b5c76
  updatedReplicas: 2

When all replicas were launched but they were not ready, the status was:

status:
  availableReplicas: 1
  collisionCount: 0
  currentReplicas: 3
  currentRevision: etcd-test-68496b5c76
  observedGeneration: 2
  readyReplicas: 1
  replicas: 3
  updateRevision: etcd-test-68496b5c76
  updatedReplicas: 3

In the last status, sts.Status.Replicas = 3 and sts.Status.UpdatedReplicas = 3, but 2 of the 3 pods are neither ready nor available, which means that k8s will keep restarting these containers. So, as per your code comment, if the intent is to check that the scale-up was successful, the current condition will not suffice. However, if the intent is only to check that the scale-up to 3 has happened and there are therefore 3 pods (irrespective of their status), then the condition suffices.

Contributor

If you wish to ensure that the scale-up has resulted in all pods being ready, then you must check availableReplicas/readyReplicas.
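
To make the observation concrete, here is a small sketch that evaluates the condition under review against two of the captured statuses. The `stsStatus` type and the `scaleUpPending` function are hypothetical names introduced for illustration; only the boolean expression is taken from the diff, with the nil check dropped.

```go
package main

import "fmt"

// stsStatus is a hypothetical stand-in for the appsv1.StatefulSetStatus
// fields that matter in this discussion.
type stsStatus struct {
	Replicas        int32
	UpdatedReplicas int32
	ReadyReplicas   int32
}

// scaleUpPending mirrors the condition under review: it looks only at
// Replicas and UpdatedReplicas and never consults readiness.
func scaleUpPending(specReplicas int32, st stsStatus) bool {
	return st.UpdatedReplicas < specReplicas || st.Replicas > st.UpdatedReplicas
}

func main() {
	const specReplicas = 3 // after `kubectl scale sts etcd-test --replicas=3`
	snapshots := []stsStatus{
		{Replicas: 2, UpdatedReplicas: 2, ReadyReplicas: 1}, // just after scale-up
		{Replicas: 3, UpdatedReplicas: 3, ReadyReplicas: 1}, // all launched, 2 not ready
	}
	for _, st := range snapshots {
		fmt.Printf("replicas=%d ready=%d pending=%t\n",
			st.Replicas, st.ReadyReplicas, scaleUpPending(specReplicas, st))
	}
}
```

For the second snapshot the function reports that nothing is pending even though only one replica is ready, which is the gap described above.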

Member Author

There is a problem with using availableReplicas or readyReplicas when the scale-up was successful but one pod goes into CrashLoopBackOff just after the scale-up. Let me explain:
When we mark the cluster for scale-up --> druid adds an annotation to the sts --> backup-restore detects the scale-up and provides the config to its corresponding etcd.
So, when the other 2 pods start running fine, the 0th pod will always be restarted in the scale-up scenario.
In this case the status of the etcd sts is:

status:
  availableReplicas: 2
  collisionCount: 0
  currentRevision: etcd-main-55c8f5489c
  observedGeneration: 2
  readyReplicas: 2
  replicas: 3
  updateRevision: etcd-main-75989cd994
  updatedReplicas: 3

Although the scale-up was successful and this is just a pod restart, if I include availableReplicas or readyReplicas, druid will still think the scale-up is not yet complete and won't remove the annotation. Hence backup-restore will provide the wrong config to etcd.
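
The trade-off can be sketched by evaluating both variants of the condition against the status shown above. All names here (`stsStatus`, `pendingByUpdated`, `pendingWithAvailability`) are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// stsStatus is a hypothetical stand-in for the relevant
// appsv1.StatefulSetStatus fields.
type stsStatus struct {
	Replicas          int32
	UpdatedReplicas   int32
	AvailableReplicas int32
}

// pendingByUpdated is the condition without any availability check.
func pendingByUpdated(specReplicas int32, st stsStatus) bool {
	return st.UpdatedReplicas < specReplicas || st.Replicas > st.UpdatedReplicas
}

// pendingWithAvailability additionally requires every updated replica to be
// available before the scale-up counts as complete.
func pendingWithAvailability(specReplicas int32, st stsStatus) bool {
	return pendingByUpdated(specReplicas, st) || st.AvailableReplicas < st.UpdatedReplicas
}

func main() {
	// Status from above: 3 replicas updated, pod 0 restarting, so 2 available.
	st := stsStatus{Replicas: 3, UpdatedReplicas: 3, AvailableReplicas: 2}
	fmt.Println("without availability check:", pendingByUpdated(3, st))
	fmt.Println("with availability check:   ", pendingWithAvailability(3, st))
}
```

With the availability check, a single restarting pod keeps the scale-up marked as pending, so the annotation would stay in place for as long as that pod is unavailable.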

Member Author

I will try to simulate your test.

Contributor

@unmarshall unmarshall May 3, 2023

The scale-up annotation is currently used in 2 places in backup-restore.

  • Initializer - here your approach will work.
  • httpAPI.go, which calls GetInitialClusterStateIfScaleup. For discussion, I will put the code here:
func GetInitialClusterStateIfScaleup(ctx context.Context, logger logrus.Entry, clientSet client.Client, podName string, podNS string) (*string, error) {
	// Read etcd statefulset to check annotation or updated replicas to toggle `initial-cluster-state`
	etcdSts, err := GetStatefulSet(ctx, clientSet, podNS, podName)
	if err != nil {
		logger.Errorf("unable to fetch statefulset {namespace: %s, name: %s} %v", podNS, podName[:strings.LastIndex(podName, "-")], err)
		return nil, err
	}

	if IsAnnotationPresent(etcdSts, ScaledToMultiNodeAnnotationKey) {
		return pointer.StringPtr(ClusterStateExisting), nil
	}

	if *etcdSts.Spec.Replicas > 1 && *etcdSts.Spec.Replicas > etcdSts.Status.UpdatedReplicas {
		return pointer.StringPtr(ClusterStateExisting), nil
	}
	return nil, nil
}

If the annotation is removed, it evaluates the second check, which is also not satisfied, and thus returns nil.
In the caller we have the following code:

	if state == nil {
		// Not a Scale-up scenario.
		// Either a multi-node bootstrap or a restoration of single member in multi-node.
		m := member.NewMemberControl(h.EtcdConnectionConfig)

		// check whether a learner is present in the cluster
		// if a learner is present then return `ClusterStateExisting` else `ClusterStateNew`.
		if present, err := m.IsLearnerPresent(ctx); present && err == nil {
			return miscellaneous.ClusterStateExisting, nil
		}
		return miscellaneous.ClusterStateNew, nil
	}

If I read the code correctly, this will result in ClusterStateNew instead of ClusterStateExisting, right?
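
The combined decision flow of the two snippets above can be condensed into one function, as a rough sketch. The `input` type, the `initialClusterState` function and the string values "new" and "existing" are assumptions made for this illustration; the real code queries the statefulset and the etcd cluster, and the learner lookup can also fail with an error, which is folded into `learnerPresent` here.

```go
package main

import "fmt"

// Assumed values, chosen to match etcd's --initial-cluster-state flag.
const (
	clusterStateNew      = "new"
	clusterStateExisting = "existing"
)

// input collects the facts that the two quoted snippets consult.
// All field names are hypothetical.
type input struct {
	annotationPresent bool
	specReplicas      int32
	updatedReplicas   int32
	learnerPresent    bool // true only if the lookup succeeded and found a learner
}

func initialClusterState(in input) string {
	// First snippet: the annotation or an incomplete rollout signals a scale-up.
	if in.annotationPresent {
		return clusterStateExisting
	}
	if in.specReplicas > 1 && in.specReplicas > in.updatedReplicas {
		return clusterStateExisting
	}
	// Caller: not treated as a scale-up, so decide on learner presence.
	if in.learnerPresent {
		return clusterStateExisting
	}
	return clusterStateNew
}

func main() {
	// Annotation already removed, all replicas updated, no learner present.
	fmt.Println(initialClusterState(input{specReplicas: 3, updatedReplicas: 3}))
	// Same status, but the annotation is still present.
	fmt.Println(initialClusterState(input{annotationPresent: true, specReplicas: 3, updatedReplicas: 3}))
}
```

The first call falls through every scale-up check, which is the path the question above is asking about.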

Member Author

yes

Member Author

If the PVC is intact, then I guess etcd doesn't need the config, as it will take the config from its db. So maybe we can include a check on availableReplicas then. I will confirm this by trying to run a scale-up.

Member Author

Unfortunately, it didn't work; we can't use availableReplicas.

Member Author

if sts != nil && metav1.HasAnnotation(sts.ObjectMeta, scaleToMultiNodeAnnotationKey) {
	if sts.Status.UpdatedReplicas < *sts.Spec.Replicas || sts.Status.Replicas > sts.Status.UpdatedReplicas || sts.Status.AvailableReplicas < sts.Status.UpdatedReplicas {
		annotations[scaleToMultiNodeAnnotationKey] = ""
		return annotations
	}
}

I made these changes but, as expected, the druid e2e tests fail due to this condition: sts.Status.AvailableReplicas < sts.Status.UpdatedReplicas


// If `scaleToMultiNodeAnnotationKey` annotation is already present in etcd statefulset
// then it is better to check scale-up was successful or not before removing `scaleToMultiNodeAnnotationKey` annotation.
if metav1.HasAnnotation(sts.ObjectMeta, scaleToMultiNodeAnnotationKey) {
Contributor

There is a chance that sts is nil as a result of the getExistingSts() call, which could lead to a nil pointer dereference.
Please add an sts != nil check here.

@gardener-robot gardener-robot added needs/changes Needs (more) changes and removed reviewed/lgtm Has approval for merging labels May 2, 2023
@aaronfern
Contributor

Can you update the release note to something like the following?

Added additional check to ensure that the scale up annotation is removed from the statefulset only when scale-up succeeds

@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 3, 2023
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 4, 2023
@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 4, 2023
@ishan16696
Member Author

/invite @unmarshall @aaronfern

Contributor

@aaronfern aaronfern left a comment

/lgtm

@gardener-robot gardener-robot added the reviewed/lgtm Has approval for merging label May 5, 2023
@gardener-robot gardener-robot removed the needs/changes Needs (more) changes label May 5, 2023
@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 5, 2023
@@ -286,7 +286,8 @@ func immutableFieldUpdate(sts *appsv1.StatefulSet, val Values) bool {

 func clusterScaledUpToMultiNode(val *Values, sts *appsv1.StatefulSet) bool {
 	if sts != nil && sts.Spec.Replicas != nil {
-		return val.Replicas > 1 && *sts.Spec.Replicas == 1
+		return (val.Replicas > 1 && *sts.Spec.Replicas == 1) ||
Contributor

Can you enhance the unit tests to include all cases?

Member Author

I don't think there are any existing unit tests for this.

Contributor

@unmarshall unmarshall left a comment

/lgtm

@ishan16696 ishan16696 merged commit b07ebcd into gardener:master May 5, 2023
1 check passed
@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label May 5, 2023
@ishan16696 ishan16696 deleted the bugFix/Scale-upAnnotation branch May 5, 2023 11:47
Labels
needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) reviewed/lgtm Has approval for merging reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py) status/closed Issue is closed (either delivered or triaged)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Etcd-druid removes the scale-up annotation even if scale-up didn't succeed.
9 participants