
Removing spec.replicas of the Deployment resets replicas count to single replica #67135

Closed
kirs opened this Issue Aug 8, 2018 · 4 comments

kirs commented Aug 8, 2018

/kind bug

What happened:

A Deployment spec with a hardcoded replicas parameter was applied:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    name: critical-service
  name: critical-service
spec:
  replicas: 100
  ...

Later, we decided to make the replicas value dynamic so that we could manage it with kubectl scale, without changing the YAML every time. So we applied another spec without the replicas key:

diff --git a/deployment.yml b/deployment.yml
index 339531d..f5a3e5f 100644
--- a/deployment.yml
+++ b/deployment.yml
@@ -5,5 +5,4 @@ metadata:
     name: critical-service
   name: critical-service
 spec:
-  replicas: 100
   ...
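
Once replicas is out of the manifest, the intended day-to-day scaling would be done imperatively, e.g. (the count here is just illustrative):

# scale the Deployment without editing the YAML
kubectl scale deployment critical-service --replicas=150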

After the updated spec was applied, Kubernetes ignored the existing replicas count and scaled the ReplicaSet down to 1 replica. Scaling down an existing Deployment like this was very unexpected behaviour.

What you expected to happen:

We expected that omitting the replicas line in the YAML would preserve the existing replicas count, but instead it reset the count to the default value of 1 replica. For a service that had > 1000 replicas, this effectively caused a loss of capacity and a severe service disruption.

How to reproduce it (as minimally and precisely as possible):

  1. Create a Deployment from a YAML file with a hardcoded replicas count greater than 1
  2. Remove the replicas line from the YAML and apply the file
  3. See the ReplicaSet being scaled down to 1 replica instead of preserving the existing value (see the sketch below)
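
A minimal sketch of the reproduction (the file and Deployment names are illustrative):

# deploy.yaml initially contains "replicas: 5"
kubectl apply -f deploy.yaml
kubectl get deployment my-app      # READY 5/5

# delete the "replicas: 5" line from deploy.yaml, then re-apply the same file
kubectl apply -f deploy.yaml
kubectl get deployment my-app      # READY 1/1, scaled down to the default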

Anything else we need to know?:

To us, scaling down by default feels like an extremely dangerous behaviour. It makes sense to use the default of 1 replica for new resources, but not for existing resources that have thousands of replicas running.

Environment:

  • Kubernetes version: v1.10.5-gke.3
  • Cloud provider or hardware configuration: GKE

k8s-ci-robot (Contributor) commented Aug 8, 2018

@kirs: There are no sig labels on this issue. Please add a sig label.

A sig label can be added by either:

  1. mentioning a sig: @kubernetes/sig-<group-name>-<group-suffix>
    e.g., @kubernetes/sig-contributor-experience-<group-suffix> to notify the contributor experience sig, OR

  2. specifying the label manually: /sig <group-name>
    e.g., /sig scalability to apply the sig/scalability label

Note: Method 1 will trigger an email to the group. See the group list.
The <group-suffix> in method 1 has to be replaced with one of these: bugs, feature-requests, pr-reviews, test-failures, proposals

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

liggitt (Member) commented Aug 8, 2018

defaults are applied for missing values on both create and update.

to let the replicas field be ignored by apply, omit it from the initial applied manifest

to stop replicas from being managed by apply on an existing object, use kubectl apply edit-last-applied (or kubectl apply set-last-applied) to remove the replicas field from the saved "last applied" state before applying the new manifest; otherwise apply removes the replicas field from the live object, which makes the apiserver default it back to 1
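
A sketch of that workaround, assuming the replicas-less manifest is saved as deployment.yml (names as in the report above):

# rewrite the saved "last applied" state so it no longer contains replicas;
# apply then stops managing the field instead of deleting it from the live object
kubectl apply set-last-applied -f deployment.yml

# or edit the annotation interactively and delete the replicas line by hand
kubectl apply edit-last-applied deployment/critical-service

# subsequent applies of the replicas-less manifest leave the live replica count alone
kubectl apply -f deployment.yml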

/close

kirs (Author) commented Aug 25, 2018

@liggitt thanks for the tip! apply set-last-applied is indeed helpful in this case.

I understand that this comes from the default value of 1 being applied because the replicas line is missing.

My intention in opening this issue was more to start a conversation about the best default behaviour for Kubernetes.

To me and my colleagues, scaling down by default seems like an extremely dangerous behaviour. It makes sense to use the default of 1 replica for new resources, but Kubernetes should not scale down existing resources that have thousands of replicas live.

I'm looking forward to making a change to the ReplicaSet controller so that this behaviour is slightly safer and a bit less unexpected. Let me know what you think!

ravihugo commented Feb 23, 2019

Just got hit by this too. Our CI environment does not currently set the replica value because we would rather manage it separately, so now it looks like we'll need to start managing it in the CI environment. Also, the suggested workaround seems a bit strange: is that equivalent to the first apply setting the replica count to 1 and then re-applying it back to 100 (for example)?

Wouldn't that massively affect what the rolling update would look like?

In case anyone needs our fix it basically looks like this:

REPLICA_COUNT=$(kubectl -n "$ENV" get deployment "${DEPLOYMENT_NAME}" -o=jsonpath='{$.spec.replicas}')
# Substitute $CI_COMMIT_SHA and the #REPLICA_COUNT placeholder anywhere in the template
# (defaulting to 1 replica if the deployment does not exist yet) and pipe the result
# straight into kubectl apply
sed -e "s/\$CI_COMMIT_SHA/$CI_COMMIT_SHA/" -e "s/\#REPLICA_COUNT/replicas: ${REPLICA_COUNT:-1}/" kube-deploy.template.yaml | kubectl -n "$ENV" apply -f -

And our template kube-deploy.template.yaml has #REPLICA_COUNT in the proper place:

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: taco-service
spec:
  #REPLICA_COUNT
  selector:
    matchLabels:

The reason I chose to put this in as #REPLICA_COUNT was basically to avoid error reports from VSCode, since my YAML files get red underlines if I try to get too fancy.

Oh - forgot to mention: the primary reason we opted not to use kubectl rolling-update and went with kubectl apply instead is that rolling-update can really only affect one attribute, and its primary use case is just the image name. We like the ability to control mounts, RAM, variables, etc. with each git push. This makes our CI environment easier when we do have to make such changes.
