Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generated grafana deployment cannot be upgraded without downtime #381

Closed
mariusstaicu opened this issue Mar 30, 2021 · 2 comments · Fixed by #383
Closed

Generated grafana deployment cannot be upgraded without downtime #381

mariusstaicu opened this issue Mar 30, 2021 · 2 comments · Fixed by #383
Labels
bug Something isn't working good first issue Good for newcomers triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@mariusstaicu
Copy link

Is your feature request related to a problem? Please describe.
Whenever I change something to the grafana.yml spec file and redeploy, the operator upgrades all the replicas at once, hence there's no deployment update possible without downtime.
I suspect this is because the deployment is generated with RollingUpdate strategy with max surge / max unavailable 25. This causes all the replicas to be updated at once, which defeats the purpose of having multiple replicas.

strategy:
    rollingUpdate:
      maxSurge: 25
      maxUnavailable: 25
    type: RollingUpdate

(If applicable)If your feature request solves a bug please provide a link to the community issue

didn't find any issue reported on this matter before

Describe the solution you'd like

Leave the update strategy default - type RollingUpdate with maxSurge 25% and maxUnavailable 25% or implement an update strategy that doesn't cause all pods to be updated at once.

Describe alternatives you've considered
n/a

Additional context
Add any other context or screenshots about the feature request here.

Existing solutions
If applicable please provide a link to an existing solution from a different project

@mariusstaicu mariusstaicu added enhancement New feature or request needs triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 30, 2021
@mariusstaicu
Copy link
Author

mariusstaicu commented Mar 30, 2021

This should be relabeled as a bug.
I suspect this is a typo kind of bug because instead of operator generating update strategy like:

strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate

the operator generates it instead like this, which translates to 25 replicas updating at once, so it's not possible to have no downtime if running grafana with less than 26 replicas

strategy:
    rollingUpdate:
      maxSurge: 25
      maxUnavailable: 25
    type: RollingUpdate

Extract from k8s documentation, the value can be either absolute or percentace (int or string):

strategy.rollingUpdate.maxSurge (IntOrString)

The maximum number of pods that can be scheduled above the desired number of pods. Value can be an absolute number (ex: 5) or a percentage of desired pods (ex: 10%). This can not be 0 if MaxUnavailable is 0. Absolute number is calculated from percentage by rounding up. Defaults to 25%. Example: when this is set to 30%, the new ReplicaSet can be scaled up immediately when the rolling update starts, such that the total number of old and new pods do not exceed 130% of desired pods. Once old pods have been killed, new ReplicaSet can be scaled up further, ensuring that total number of pods running at any time during the update is at most 130% of desired pods.

IntOrString is a type that can hold an int32 or a string. When used in JSON or YAML marshalling and unmarshalling, it produces or consumes the inner type. This allows you to have, for example, a JSON field that can accept a name or number.

strategy.rollingUpdate.maxUnavailable (IntOrString)

The maximum number of pods that can be unavailable during the update. Value can be an absolute number (ex: 5) or a percentage of desired pods (ex: 10%). Absolute number is calculated from percentage by rounding down. This can not be 0 if MaxSurge is 0. Defaults to 25%. Example: when this is set to 30%, the old ReplicaSet can be scaled down to 70% of desired pods immediately when the rolling update starts. Once new pods are ready, old ReplicaSet can be scaled down further, followed by scaling up the new ReplicaSet, ensuring that the total number of pods available at all times during the update is at least 70% of desired pods.

IntOrString is a type that can hold an int32 or a string. When used in JSON or YAML marshalling and unmarshalling, it produces or consumes the inner type. This allows you to have, for example, a JSON field that can accept a name or number.

@HVBE
Copy link
Collaborator

HVBE commented Mar 30, 2021

Hi @mariusstaicu , Thanks for bringing this up, I've done a bit of testing locally and it seems like switching
https://github.com/integr8ly/grafana-operator/blob/master/pkg/controller/model/grafanaDeployment.go#L118-L125

func getRollingUpdateStrategy() *v1.RollingUpdateDeployment {
	var maxUnaval intstr.IntOrString = intstr.FromInt(25)
	var maxSurge intstr.IntOrString = intstr.FromInt(25)
	return &v1.RollingUpdateDeployment{
		MaxUnavailable: &maxUnaval,
		MaxSurge:       &maxSurge,
	}
}

to use intstr.FromString("25%") should get the job done. We can implement this

@HVBE HVBE added bug Something isn't working good first issue Good for newcomers triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs triage Indicates an issue or PR lacks a `triage/foo` label and requires one. enhancement New feature or request labels Mar 30, 2021
@HVBE HVBE linked a pull request Mar 31, 2021 that will close this issue
7 tasks
@pb82 pb82 closed this as completed in #383 Apr 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants