Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow statefulsets to be scaled up and down based on each other #62

Merged
merged 2 commits into from
Jun 1, 2023

Conversation

56quarters
Copy link
Contributor

Allow statefulsets within a rollout group to declare a "leader" and set their number of replicas based on the number of replicas in the leader statefulset. Scaling up is allowed immediately while scaling down must respect the minimum amount of time between scale downs. For example at minimum, 12 hours needs to pass between scale downs of Mimir ingester zones.

Allow statefulsets within a rollout group to declare a "leader" and set
their number of replicas based on the number of replicas in the leader
statefulset. Scaling up is allowed immediately while scaling down must
respect the minimum amount of time between scale downs. For example at
minimum, 12 hours needs to pass between scale downs of Mimir ingester
zones.

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
@@ -198,6 +199,10 @@ func (c *RolloutController) reconcile(ctx context.Context) error {
groups := groupStatefulSetsByLabel(sets, RolloutGroupLabel)
var reconcileErrs error
for groupName, groupSets := range groups {
if err := c.reconcileStatefulSetsGroupReplicas(ctx, groupName, groupSets); err != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like the way this functionality integrates with the existing reconcile loop, it feels "bolted on" to me. However, I did it like this to minimize changes to the existing logic and make it easier to develop. WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you're calling this function here? I would feel more natural to call it inside reconcileStatefulSetsGroup(). Few reasons (not in order of importance):

  1. The logic here should accounted by the metrics tracked by reconcileStatefulSetsGroup()
  2. sortStatefulSets() is already called by reconcileStatefulSetsGroup()
  3. I think it's safer that if we do any change to replicas, we skip the rest of "rollout" reconciliation in this loop and we'll re-reconcile in the next iteration. This allows us to avoid to take in account "what happens if in the same reconciliation loop we both do changes to replicas and delete pods to rollout changes"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thank you. I'll refactor a bit to move the replica reconciliation around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 47883c7


- For `ingester-zone-a`, add the following:
- Labels:
- `grafana.com/min-time-between-zones-downscale=12h` (change the value here to an appropriate duration)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to make both of these labels into annotations because they don't need to be labels but wanted to get this functionality rolled out first. We'll have to coordinate with Loki for that change.

README.md Show resolved Hide resolved
@56quarters 56quarters marked this pull request as ready for review May 30, 2023 18:49
@56quarters 56quarters requested a review from a team as a code owner May 30, 2023 18:49
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good job! The code is clean and every easy to follow. I just left 1 major comment about skipping the rest of the reconcile if we do any change to replicas. Other comments are nits / non blocking. Thanks!

README.md Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@@ -198,6 +199,10 @@ func (c *RolloutController) reconcile(ctx context.Context) error {
groups := groupStatefulSetsByLabel(sets, RolloutGroupLabel)
var reconcileErrs error
for groupName, groupSets := range groups {
if err := c.reconcileStatefulSetsGroupReplicas(ctx, groupName, groupSets); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you're calling this function here? I would feel more natural to call it inside reconcileStatefulSetsGroup(). Few reasons (not in order of importance):

  1. The logic here should accounted by the metrics tracked by reconcileStatefulSetsGroup()
  2. sortStatefulSets() is already called by reconcileStatefulSetsGroup()
  3. I think it's safer that if we do any change to replicas, we skip the rest of "rollout" reconciliation in this loop and we'll re-reconcile in the next iteration. This allows us to avoid to take in account "what happens if in the same reconciliation loop we both do changes to replicas and delete pods to rollout changes"?

pkg/controller/controller.go Outdated Show resolved Hide resolved
pkg/controller/controller_test.go Show resolved Hide resolved
pkg/controller/replicas.go Outdated Show resolved Hide resolved
pkg/controller/replicas.go Outdated Show resolved Hide resolved
pkg/controller/replicas.go Outdated Show resolved Hide resolved
@colega
Copy link
Contributor

colega commented Jun 1, 2023

Very clean work! Agree with Marco's comments, especially about changing replicas when reconciling each statefulset group.

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, LGTM, thanks!

@andyasp andyasp mentioned this pull request Jun 1, 2023
@56quarters 56quarters merged commit 0cbea1a into main Jun 1, 2023
5 checks passed
@56quarters 56quarters deleted the 56quarters-replicas branch June 1, 2023 21:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants