Etcd controller shouldn't remove the scale-up annotation until scale-up succeeds #587
Conversation
Just one comment.
Druid e2e tests passed with this PR ✅
/lgtm
// If `scaleToMultiNodeAnnotationKey` annotation is already present in etcd statefulset
// then it is better to check scale-up was successful or not before removing `scaleToMultiNodeAnnotationKey` annotation.
if metav1.HasAnnotation(sts.ObjectMeta, scaleToMultiNodeAnnotationKey) {
	if (sts.Spec.Replicas != nil && sts.Status.UpdatedReplicas < *sts.Spec.Replicas) || sts.Status.Replicas > sts.Status.UpdatedReplicas {
Minor point: sts.Spec.Replicas will never be nil, so you can safely remove this check.
Below is the docstring from its source code:
// replicas is the desired number of replicas of the given Template.
// These are replicas in the sense that they are instantiations of the
// same Template, but individual replicas also have a consistent identity.
// If unspecified, defaults to 1.
// TODO: Consider a rename of this field.
// +optional
Replicas *int32 `json:"replicas,omitempty" protobuf:"varint,1,opt,name=replicas"`
What if DoD/SREs scale the statefulset down to 0? Will it be fine, since the pointer then points to zero?
A value of zero is still a valid value and that is fine. I created an STS, scaled it down to 0, and the replicas were 0. Then I removed replicas from the spec completely, and as soon as I saved it, the value of replicas became 1 as indicated in the docstring. So there is no reason to have a nil check.
// If `scaleToMultiNodeAnnotationKey` annotation is already present in etcd statefulset
// then it is better to check scale-up was successful or not before removing `scaleToMultiNodeAnnotationKey` annotation.
if metav1.HasAnnotation(sts.ObjectMeta, scaleToMultiNodeAnnotationKey) {
	if (sts.Spec.Replicas != nil && sts.Status.UpdatedReplicas < *sts.Spec.Replicas) || sts.Status.Replicas > sts.Status.UpdatedReplicas {
This condition is not complete.
I tested and captured some states. I took a single etcd, deployed it on a KIND cluster, and then scaled it directly using kubectl scale sts etcd-test --replicas=3, and I saw the following:
Before the scale-out status looked like:
status:
availableReplicas: 1
collisionCount: 0
currentReplicas: 1
currentRevision: etcd-test-68496b5c76
observedGeneration: 1
readyReplicas: 1
replicas: 1
updateRevision: etcd-test-68496b5c76
updatedReplicas: 1
Just after scale up the status looked like:
status:
availableReplicas: 1
collisionCount: 0
currentReplicas: 2
currentRevision: etcd-test-68496b5c76
observedGeneration: 2
readyReplicas: 1
replicas: 2
updateRevision: etcd-test-68496b5c76
updatedReplicas: 2
When all replicas were launched but they were not ready, the status was:
status:
availableReplicas: 1
collisionCount: 0
currentReplicas: 3
currentRevision: etcd-test-68496b5c76
observedGeneration: 2
readyReplicas: 1
replicas: 3
updateRevision: etcd-test-68496b5c76
updatedReplicas: 3
If you look at the last status, sts.Status.Replicas = 3 and sts.Status.UpdatedReplicas = 3, but 2/3 pods are not ready and not available, which means that k8s will keep restarting those containers. So, as per your comment, if the intent is to check that the scale-up was successful, then the current conditions will not suffice. However, if the intent is to check that scale-up has been done to 3 and therefore there are 3 pods (irrespective of their status), then the condition would suffice.
If you wish to ensure that scale has resulted in all pods being ready then you must check availableReplicas/readyReplicas.
There is a problem with using availableReplicas or readyReplicas when the scale-up was successful but, just after scale-up, one pod goes into CrashLoopBackOff. Let me explain:
When we mark the cluster for scale-up --> druid adds an sts annotation --> backup-restore detects the scale-up and provides config to its corresponding etcd.
So, while the other 2 pods start running fine, the 0th pod will always be restarted in the scale-up scenario; in this case the status of the etcd sts is:
status:
availableReplicas: 2
collisionCount: 0
currentRevision: etcd-main-55c8f5489c
observedGeneration: 2
readyReplicas: 2
replicas: 3
updateRevision: etcd-main-75989cd994
updatedReplicas: 3
Although the scale-up was successful and it is just a pod restart, as you can see, if I include availableReplicas or readyReplicas then druid will still think the scale-up is not yet complete and won't remove the annotation; hence the wrong config will get provided by backup-restore to etcd.
I will try to simulate your test.
The scale-up annotation is currently used in 2 places in backup-restore:
- Initializer - here your approach will work.
- httpAPI.go, which calls GetInitialClusterStateIfScaleup - for discussion I will put the code here:
func GetInitialClusterStateIfScaleup(ctx context.Context, logger logrus.Entry, clientSet client.Client, podName string, podNS string) (*string, error) {
	// Read etcd statefulset to check annotation or updated replicas to toggle `initial-cluster-state`
	etcdSts, err := GetStatefulSet(ctx, clientSet, podNS, podName)
	if err != nil {
		logger.Errorf("unable to fetch statefulset {namespace: %s, name: %s} %v", podNS, podName[:strings.LastIndex(podName, "-")], err)
		return nil, err
	}
	if IsAnnotationPresent(etcdSts, ScaledToMultiNodeAnnotationKey) {
		return pointer.StringPtr(ClusterStateExisting), nil
	}
	if *etcdSts.Spec.Replicas > 1 && *etcdSts.Spec.Replicas > etcdSts.Status.UpdatedReplicas {
		return pointer.StringPtr(ClusterStateExisting), nil
	}
	return nil, nil
}
If the annotation is removed, then it evaluates the second check, which it also does not enter, and thus returns nil.
In the caller we have the following code:
if state == nil {
	// Not a Scale-up scenario.
	// Either a multi-node bootstrap or a restoration of single member in multi-node.
	m := member.NewMemberControl(h.EtcdConnectionConfig)
	// check whether a learner is present in the cluster
	// if a learner is present then return `ClusterStateExisting` else `ClusterStateNew`.
	if present, err := m.IsLearnerPresent(ctx); present && err == nil {
		return miscellaneous.ClusterStateExisting, nil
	}
	return miscellaneous.ClusterStateNew, nil
}
If I read the code correctly, this will result in a ClusterStateNew instead of returning Existing, right?
yes
If the PVC is intact, then I guess etcd doesn't need the config, as it will take the config from the db, so maybe we can include a check on availableReplicas then. I will confirm this by trying to run a scale-up.
Unfortunately, it didn't work; we can't use availableReplicas.
if sts != nil && metav1.HasAnnotation(sts.ObjectMeta, scaleToMultiNodeAnnotationKey) {
	if sts.Status.UpdatedReplicas < *sts.Spec.Replicas || sts.Status.Replicas > sts.Status.UpdatedReplicas || sts.Status.AvailableReplicas < sts.Status.UpdatedReplicas {
		annotations[scaleToMultiNodeAnnotationKey] = ""
		return annotations
	}
}
I made these changes but, as expected, the druid e2e fails due to the condition sts.Status.AvailableReplicas < sts.Status.UpdatedReplicas.
// If `scaleToMultiNodeAnnotationKey` annotation is already present in etcd statefulset
// then it is better to check scale-up was successful or not before removing `scaleToMultiNodeAnnotationKey` annotation.
if metav1.HasAnnotation(sts.ObjectMeta, scaleToMultiNodeAnnotationKey) {
There is a chance that sts could be nil due to a getExistingSts() call, which could result in a nil pointer dereference. Please add a sts != nil check here.
Can you update the release note to something like the following?
/invite @unmarshall @aaronfern
/lgtm
@@ -286,7 +286,8 @@ func immutableFieldUpdate(sts *appsv1.StatefulSet, val Values) bool {
 func clusterScaledUpToMultiNode(val *Values, sts *appsv1.StatefulSet) bool {
 	if sts != nil && sts.Spec.Replicas != nil {
-		return val.Replicas > 1 && *sts.Spec.Replicas == 1
+		return (val.Replicas > 1 && *sts.Spec.Replicas == 1) ||
Can you enhance the unit tests to include all cases?
I don't think there are already existing unit tests for this.
/lgtm
What this PR does / why we need it:
Etcd controller shouldn't remove the scale-up annotation from the etcd statefulset until scale-up succeeds.
Which issue(s) this PR fixes:
Fixes #582
Special notes for your reviewer:
Release note: