Conversation

@chatton (Contributor) commented Sep 10, 2020

All Submissions:

  • Have you opened an Issue before filing this PR?
  • Have you signed our CLA?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Put closes #XXXX in your comment to auto-close the issue that your PR fixes (if applicable).

closes #136

This PR updates the operator to correctly scale down the deployment. Previously we updated the automation config and the StatefulSet at the same time, which reduced the StatefulSet size while the agents were still trying to reach replica set members that no longer existed.

The readiness probe was updated to check whether the host has been removed from the automation config. If so, the pod is no longer ready. We now rely on Kubernetes to kill the pod once it has been removed from the replica set.

Unfortunately, the current implementation takes a long time, as we need to wait for failureThreshold consecutive probe failures before the pods start being killed.

Because it takes so long, for now I've commented out the scale-down code in our e2e tests and added a new test that only performs a scale down. This reduces test time while still ensuring that scaling down is tested.

@chatton chatton added the wip label Sep 10, 2020
Comment on lines +301 to +323
```go
func (m MongoDB) DesiredReplicas() int {
	return m.Spec.Members
}

func (m MongoDB) CurrentReplicas() int {
	return m.Status.CurrentStatefulSetReplicas
}

func (m *MongoDB) StatefulSetReplicasThisReconciliation() int {
	return scale.ReplicasThisReconciliation(m)
}

type automationConfigReplicasScaler struct {
	current, desired int
}

func (a automationConfigReplicasScaler) DesiredReplicas() int {
	return a.desired
}

func (a automationConfigReplicasScaler) CurrentReplicas() int {
	return a.current
}
```
@chatton (author):

re-use existing interface to allow us to get the correct number of automation config members each reconciliation (we will get all of them on first creation, otherwise we scale up/down one at a time)

```diff
   value: "mongodb-kubernetes-operator"
 - name: AGENT_IMAGE # The MongoDB Agent the operator will deploy to manage MongoDB deployments
-  value: quay.io/mongodb/mongodb-agent:10.19.0.6562-1
+  value: quay.io/mongodb/mongodb-agent-dev:scaledown
```
@chatton (author):

This image is the same as the previous one, just with the new readiness probe. I will create a separate PR once this is no longer a dev image.

@chatton chatton removed the wip label Sep 21, 2020
```diff
 RUN mkdir -p /var/lib/mongodb-mms-automation/probes/ \
-    && curl --retry 3 https://readinessprobe.s3-us-west-1.amazonaws.com/readiness -o /var/lib/mongodb-mms-automation/probes/readinessprobe \
+    # && curl --retry 3 https://readinessprobe.s3-us-west-1.amazonaws.com/readiness -o /var/lib/mongodb-mms-automation/probes/readinessprobe \
+    && curl --retry 3 https://readiness-probe-scale-test.s3.amazonaws.com/readiness -o /var/lib/mongodb-mms-automation/probes/readinessprobe \
```
@chatton (author):

Will update this again once I've moved the readiness probe to the correct bucket.

@bznein (Contributor) left a comment:

LGTM! Massive work, just a very small comment!

Comment on lines 436 to 438
```go
makeStatefulSetReady(t, mgr.GetClient(), mdb)

makeStatefulSetReady(t, mgr.GetClient(), mdb)
```

Duplicated by mistake?

Comment on lines +480 to 482
```go
makeStatefulSetReady(t, mgr.GetClient(), mdb)

makeStatefulSetReady(t, mgr.GetClient(), mdb)
```

Duplicated by mistake?

@rodrigovalin (Contributor) left a comment:

Great work!

My only concern is the confusing status attributes: one refers to MongoDB replica set members, the other to Kubernetes StatefulSet replicas. On top of that, the API for changing these attributes uses yet another set of names (automation config members vs. StatefulSet replicas).

My proposal is to change:

  • status.replicaSetMembers -> status.mongoDBMembers
  • withAutomationConfigMembers -> withMongoDBMembers

This is completely subjective, so I would like to hear @bznein's opinion on this.

```go
	Message string `json:"message,omitempty"`

	CurrentStatefulSetReplicas int `json:"currentStatefulSetReplicas"`
	CurrentReplicaSetMembers   int `json:"currentReplicaSetMembers"`
```

Unfortunately there's a name clash between MongoDB's replica set and Kubernetes' ReplicaSet, and this attribute is actually MongoDB's version of it, unlike StatefulSet, which corresponds to the Kubernetes object.

This will be super confusing for people looking at the status at any given time...

I would definitely vote for:

```go
	CurrentStatefulSetReplicas int `json:"currentStatefulSetReplicas"`
	CurrentMongoDBMembers      int `json:"currentMongoDBMembers"`
```


I don't have a strong opinion on this but I agree this sounds clearer 👍🏻

```go
if err := r.resetStatefulSetUpdateStrategy(mdb); err != nil {
	return status.Update(r.client.Status(), &mdb,
		statusOptions().
			withAutomationConfigMembers(mdb.AutomationConfigMembersThisReconciliation()).
```

Currently in the status we have members for "statefulset" (Kubernetes) and "replicaset" (MongoDB), but this statusOptions builder API accepts withAutomationConfigMembers, and I'm not sure which of the two it corresponds to.

```go
	return result.OK()
}

func (o *optionBuilder) withAutomationConfigMembers(members int) *optionBuilder {
```

Although this API seems super ergonomic, it is not :)

There's no (easy) way to see which attribute on the status this function will end up modifying.

If I call withAutomationConfigMembers(n), will it end up doing status.currentStatefulSetReplicas = n or status.currentReplicaSetMembers = n?


From a developer's perspective, it would be as easy as to explicitly describe which status attribute this function is supposed to change.

```go
// withAutomationConfigMembers sets `status.someImportantAttribute`.
func withAutomationConfigMembers() {}
```

@chatton chatton merged commit 83b011f into master Sep 24, 2020
@chatton chatton deleted the CLOUDP-69096_fix_scale_down branch September 24, 2020 08:28


Development

Successfully merging this pull request may close these issues.

Pods crashed and can't start back with replicas number from 3 to 1

4 participants