CLOUDP-69096: Implement correct scale down logic #190
Conversation
…ing reconciliation to keep track of the value correctly
```go
func (m MongoDB) DesiredReplicas() int {
	return m.Spec.Members
}

func (m MongoDB) CurrentReplicas() int {
	return m.Status.CurrentStatefulSetReplicas
}

func (m *MongoDB) StatefulSetReplicasThisReconciliation() int {
	return scale.ReplicasThisReconciliation(m)
}

type automationConfigReplicasScaler struct {
	current, desired int
}

func (a automationConfigReplicasScaler) DesiredReplicas() int {
	return a.desired
}

func (a automationConfigReplicasScaler) CurrentReplicas() int {
	return a.current
}
```
Re-use the existing interface to allow us to get the correct number of automation config members each reconciliation (we will get all of them on first creation; otherwise we scale up/down one at a time).
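For context, a minimal sketch of what such a one-at-a-time scaler could look like; the `Replicas` interface name here is an assumption for illustration, not the operator's exact implementation:

```go
// Package scale sketches the behaviour described above: return every member
// on first creation, otherwise move one member per reconciliation.
package scale

// Replicas is satisfied by both MongoDB and automationConfigReplicasScaler
// in the hunk above (hypothetical interface name).
type Replicas interface {
	DesiredReplicas() int
	CurrentReplicas() int
}

func ReplicasThisReconciliation(r Replicas) int {
	current, desired := r.CurrentReplicas(), r.DesiredReplicas()
	switch {
	case current == 0: // first creation: deploy every member at once
		return desired
	case current < desired: // scaling up: one member per reconciliation
		return current + 1
	case current > desired: // scaling down: one member per reconciliation
		return current - 1
	default:
		return current
	}
}
```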
| value: "mongodb-kubernetes-operator" | ||
| - name: AGENT_IMAGE # The MongoDB Agent the operator will deploy to manage MongoDB deployments | ||
| value: quay.io/mongodb/mongodb-agent:10.19.0.6562-1 | ||
| value: quay.io/mongodb/mongodb-agent-dev:scaledown |
This image is the same as the previous one, just with a new readiness probe. I will create a separate PR once this is no longer a dev image.
```diff
 RUN mkdir -p /var/lib/mongodb-mms-automation/probes/ \
-    && curl --retry 3 https://readinessprobe.s3-us-west-1.amazonaws.com/readiness -o /var/lib/mongodb-mms-automation/probes/readinessprobe \
+    # && curl --retry 3 https://readinessprobe.s3-us-west-1.amazonaws.com/readiness -o /var/lib/mongodb-mms-automation/probes/readinessprobe \
+    && curl --retry 3 https://readiness-probe-scale-test.s3.amazonaws.com/readiness -o /var/lib/mongodb-mms-automation/probes/readinessprobe \
```
Will update this again once I've moved the readiness probe to the correct bucket.
bznein left a comment:
LGTM! Massive work, just a very small comment!
```go
	makeStatefulSetReady(t, mgr.GetClient(), mdb)

	makeStatefulSetReady(t, mgr.GetClient(), mdb)
```
Duplicated by mistake?
rodrigovalin left a comment:
Great work!
My only concern is related to the confusing status attributes: one relates to MongoDB replica set members and the other to Kubernetes StatefulSet members. On top of that, the API to change these attributes uses a different set of names (automation config members vs. StatefulSet replicas).
My proposal is to change:
- status.replicaSetMembers -> status.mongoDBMembers
- withAutomationConfigMembers -> withMongoDBMembers
This is completely subjective, so I would like to hear @bznein's opinion on this.
pkg/apis/mongodb/v1/mongodb_types.go (outdated)
```go
	Message string `json:"message,omitempty"`

	CurrentStatefulSetReplicas int `json:"currentStatefulSetReplicas"`
	CurrentReplicaSetMembers   int `json:"currentReplicaSetMembers"`
```
Unfortunately there's a name clash between MongoDB's ReplicaSet and Kubernetes' ReplicaSet, and this one is actually Mongo's version of it, not StatefulSet, which corresponds to the Kubernetes object.
This will be super confusing for people looking at the status at any given time...
I would definitely vote for:

```go
CurrentStatefulSetReplicas int `json:"currentStatefulSetReplicas"`
CurrentMongoDBMembers      int `json:"currentMongoDBMembers"`
```
I don't have a strong opinion on this but I agree this sounds clearer 👍🏻
```go
if err := r.resetStatefulSetUpdateStrategy(mdb); err != nil {
	return status.Update(r.client.Status(), &mdb,
		statusOptions().
			withAutomationConfigMembers(mdb.AutomationConfigMembersThisReconciliation()).
```
Currently in the status we have members for "statefulset" (Kubernetes) and "replicaset" (MongoDB), but this API for the statusOptions builder accepts a `withAutomationConfigMembers`, and I'm not sure which one that corresponds to.
```go
	return result.OK()
}

func (o *optionBuilder) withAutomationConfigMembers(members int) *optionBuilder {
```
Although this API seems super ergonomic, it is not :)
There's no (easy) way to see which attribute on the status this function will end up modifying.
If I call `withAutomationConfigMembers(n)`, will it end up doing `status.currentStatefulSetReplicas = n` or `status.currentReplicaSetMembers = n`?
From a developer's perspective, it would be enough to explicitly document which status attribute this function is supposed to change:

```go
// withAutomationConfigMembers sets `status.someImportantAttribute`.
func withAutomationConfigMembers() {}
```
All Submissions:
Put `closes #XXXX` in your comment to auto-close the issue that your PR fixes (if such).

closes #136
This PR updates the operator to correctly scale down the deployment. Previously we updated the automation config and the StatefulSet at the same time, which reduced the StatefulSet size while the agents were still trying to find members of the replica set that no longer existed.
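Roughly, reconciliation now performs the two updates in separate passes. A simplified, self-contained sketch of that ordering; all type and function names here are hypothetical stand-ins, not the operator's actual ones:

```go
package main

import "fmt"

// deployment is a hypothetical stand-in for the operator's state.
type deployment struct {
	automationConfigMembers int
	statefulSetReplicas     int
	agentsReachedGoalState  bool
}

// reconcileScaleDown sketches the ordering this PR introduces: remove the
// member from the automation config first, and only shrink the StatefulSet
// once the agents have converged on the new config.
func reconcileScaleDown(d *deployment) {
	if d.automationConfigMembers == d.statefulSetReplicas {
		d.automationConfigMembers-- // step 1: update the automation config
		d.agentsReachedGoalState = false
		return
	}
	if !d.agentsReachedGoalState {
		return // step 2: requeue until the agents reach goal state
	}
	d.statefulSetReplicas-- // step 3: now the pod can be removed safely
}

func main() {
	d := &deployment{automationConfigMembers: 5, statefulSetReplicas: 5}
	reconcileScaleDown(d) // removes one member from the automation config only
	d.agentsReachedGoalState = true
	reconcileScaleDown(d) // only now shrinks the StatefulSet
	fmt.Printf("%+v\n", *d)
}
```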
The readiness probe was updated to check whether the host has been removed from the automation config. If so, it is no longer ready. We now rely on Kubernetes to kill the pod once it has been removed from the replica set.
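A hedged sketch of what such a readiness check could look like; the config mount path and JSON shape are assumptions for illustration, not the probe's actual implementation:

```go
package main

import (
	"encoding/json"
	"os"
)

// automationConfig models only the fields this sketch needs; the real
// automation config contains much more.
type automationConfig struct {
	Processes []struct {
		HostName string `json:"hostname"`
	} `json:"processes"`
}

func main() {
	hostname, err := os.Hostname()
	if err != nil {
		os.Exit(1)
	}
	// Assumed mount path for the automation config inside the pod.
	data, err := os.ReadFile("/var/lib/automation/config/cluster-config.json")
	if err != nil {
		os.Exit(1)
	}
	var ac automationConfig
	if err := json.Unmarshal(data, &ac); err != nil {
		os.Exit(1)
	}
	for _, p := range ac.Processes {
		if p.HostName == hostname {
			os.Exit(0) // host is still a member: report ready
		}
	}
	os.Exit(1) // host was removed from the automation config: not ready
}
```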
Unfortunately, this current implementation takes a long time, as we need to wait for `failureThreshold` probe failures before the pods start being killed. Because it takes so long, for now I've commented out the scale-down code in our e2e tests and added a new test that only performs a scale down, to reduce test time while still ensuring we test scaling down.