
Allow horizontally scaling statefulsets #29

Closed
stuart-warren opened this issue May 11, 2017 · 34 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@stuart-warren

Hi,

We'd like to add the ability to proportionally scale stateful sets. Is there a particular reason this is a really bad idea?

Our use case is Docker registry mirror pods and Prometheus instances, where we use a shared Git repo of Kubernetes manifests. On minikube/tiny clusters we run out of resources that we would otherwise have in production.

We might need to be a little more careful, but it should be doable: https://kubernetes.io/docs/tasks/run-application/scale-stateful-set

@stuart-warren
Author

I assume this would require a newer version of the client library too
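
For reference, a minimal sketch of what scaling a StatefulSet through the client library's scale subresource could look like. This assumes a reasonably recent client-go (the context arguments and UpdateScale call below exist in current releases, not in the client versions vendored back in 2017); the namespace and the name "registry-mirror" are placeholders.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// scaleStatefulSet sets the replica count of a StatefulSet through the
// scale subresource, the same mechanism `kubectl scale statefulset` uses.
func scaleStatefulSet(ctx context.Context, client kubernetes.Interface, namespace, name string, replicas int32) error {
	// Read the current scale so the update carries a valid resourceVersion.
	scale, err := client.AppsV1().StatefulSets(namespace).GetScale(ctx, name, metav1.GetOptions{})
	if err != nil {
		return fmt.Errorf("get scale for %s/%s: %w", namespace, name, err)
	}
	scale.Spec.Replicas = replicas
	_, err = client.AppsV1().StatefulSets(namespace).UpdateScale(ctx, name, scale, metav1.UpdateOptions{})
	return err
}

func main() {
	// In-cluster config; use clientcmd instead for out-of-cluster development.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	// "default" / "registry-mirror" are placeholder values for illustration.
	if err := scaleStatefulSet(context.Background(), client, "default", "registry-mirror", 3); err != nil {
		panic(err)
	}
}
```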

@MrHohn MrHohn added the kind/feature Categorizes issue or PR as related to a new feature. label May 11, 2017
@MrHohn
Member

MrHohn commented May 11, 2017

Sounds good to me. Would the idea also be to scale StatefulSets proportionally based on cluster size?

/cc @foxish @janetkuo for more insight :)


@fgrzadkowski

What would you be scaling based on?

@fgrzadkowski

Would it be safe to just remove some instances? Would it be safe to add new ones?

@stuart-warren
Author

We'd likely be scaling on the number of nodes, but perhaps also on node size if #19 is completed.

I see it as: you have to know in advance that you want to scale a stateful set, so it's up to you to know it is safe to do so and whether a min/max setting is required. I haven't looked too deeply at the code yet to see if it would handle not being able to scale down due to a node being unavailable.
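
To make the node-count-based idea concrete, here is a rough sketch of a linear node-to-replica mapping with a min/max clamp. The parameter names are illustrative, not the autoscaler's actual configuration keys.

```go
package main

import (
	"fmt"
	"math"
)

// linearParams mirrors the shape of a linear scaling policy: one replica per
// NodesPerReplica schedulable nodes, clamped to [Min, Max].
type linearParams struct {
	NodesPerReplica float64
	Min             int32
	Max             int32
}

// desiredReplicas returns the replica count for a given number of schedulable nodes.
func (p linearParams) desiredReplicas(schedulableNodes int) int32 {
	replicas := int32(math.Ceil(float64(schedulableNodes) / p.NodesPerReplica))
	if replicas < p.Min {
		replicas = p.Min
	}
	if p.Max > 0 && replicas > p.Max {
		replicas = p.Max
	}
	return replicas
}

func main() {
	p := linearParams{NodesPerReplica: 4, Min: 1, Max: 10}
	for _, nodes := range []int{1, 3, 8, 100} {
		fmt.Printf("%d nodes -> %d replicas\n", nodes, p.desiredReplicas(nodes))
	}
}
```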

@mwielgus

What would be the main use-case for this?

@stuart-warren
Author

I said in the initial post :)

We'd like to have a common set of kubernetes manifest files that set up a base cluster and any necessary services for dev, test and prod. Currently we struggle to run this on very small clusters/minikube because the manifests carry the larger instance counts of prometheus/etcd/docker registry mirrors/etc. that we want on bigger clusters.

@mwielgus

Right, I must have missed that. Sorry :)

@MrHohn
Member

MrHohn commented May 12, 2017

I see it as: you have to know in advance that you want to scale a stateful set, so it's up to you to know it is safe to do so and whether a min/max setting is required. I haven't looked too deeply at the code yet to see if it would handle not being able to scale down due to a node being unavailable.

@stuart-warren The autoscaler only takes available nodes into account.

It is totally true that it's up to the users to know whether it is safe to scale a statefulset. Though I'm not sure how we'd make our generic controllers application-aware.
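
As a rough illustration of what "only takes available nodes into account" could mean in practice (not the autoscaler's actual implementation): skip unschedulable nodes and require the Ready condition.

```go
package autoscaler

import (
	corev1 "k8s.io/api/core/v1"
)

// countAvailableNodes counts nodes that are schedulable and report Ready,
// which is roughly what "available" means in this discussion.
func countAvailableNodes(nodes []corev1.Node) int {
	available := 0
	for _, node := range nodes {
		// Cordoned nodes are excluded from the count.
		if node.Spec.Unschedulable {
			continue
		}
		for _, cond := range node.Status.Conditions {
			if cond.Type == corev1.NodeReady && cond.Status == corev1.ConditionTrue {
				available++
				break
			}
		}
	}
	return available
}
```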

@Tedezed

Tedezed commented May 22, 2017

Related issue in kubernetes/kubernetes: kubernetes/kubernetes#44033

@foxish

foxish commented Jul 7, 2017

@gyliu513
The doc talks about kubectl scale, which is CLI shorthand for modifying the number of replicas in the StatefulSet spec. kubectl autoscale creates an HPA resource for you (described in kubernetes/kubernetes#48591), and that's different. This thread is about cluster-proportional-autoscaler, which would scale the statefulset according to the number of nodes in the cluster.

For most stateful applications (zookeeper, mysql, etc.), the scale needs deliberate thought and is likely not something we want to vary with the size of the cluster. @stuart-warren, for the docker registry mirror pods, why does that use-case require a StatefulSet?

/cc @kow3ns

@rdsubhas

rdsubhas commented Jul 7, 2017

Hi folks, reading the docs, node autoscaling says it won't scale down a node if there is a pod not backed by a replicationcontroller. Similarly, looking at the API docs, it looks like the horizontal pod autoscaler currently supports only pods backed by a replicationcontroller.

So, a tangential question: are there any plans to back StatefulSets with a replicationcontroller? Right now I don't see any RS behind a SS. Having the pod ID and ordering guarantees is really nice (reduced metrics explosion, better kubectl dev UX, etc.), so keeping that while still having the same functionality as Deployments would be cool 👍

@stuart-warren
Author

@foxish technically yes, this app doesn't need to be a statefulset, but we'd still like to be able to control the size of a zookeeper/cassandra cluster based on the number of nodes in the cluster.

Ideally we'd have "everything" run both in minikube, with reduced resource requests and single instances, and in a massive production cluster, with many instances and increased resource requests.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 21, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 21, 2019
@stuart-warren
Author

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 22, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 20, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 19, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@salavessa

salavessa commented Nov 6, 2019

Hi, I would also like to see this functionality implemented.

Looking at the source code, the change seems super simple and straightforward (but I may be terribly wrong).

The use case is related to the unique ability of a statefulset to use volumeClaimTemplates, which provision PVs/PVCs automatically when necessary (https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#components).
I also use podManagementPolicy: Parallel, so there's no issue if one of the lower-index pods is erroring or can't be scheduled.
There could also be a big fat WARNING in the docs to highlight the less obvious issues (also discussed in some comments above) of using a statefulset instead of a deployment, but the functionality would be there if you really want it.

Thanks!
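
As a sketch of the spec shape being described, here is a hypothetical StatefulSet built with volumeClaimTemplates and podManagementPolicy: Parallel using the k8s.io/api Go types. The names, image and storage size are placeholders, and it assumes an API release where PersistentVolumeClaimSpec.Resources is a corev1.ResourceRequirements (newer releases renamed that type to VolumeResourceRequirements).

```go
package manifests

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// registryMirror builds a StatefulSet that relies on volumeClaimTemplates for
// per-pod storage and Parallel pod management, so replicas can come and go
// without waiting on lower ordinals. Names and sizes are placeholders.
func registryMirror(replicas int32) *appsv1.StatefulSet {
	labels := map[string]string{"app": "registry-mirror"}
	return &appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{Name: "registry-mirror"},
		Spec: appsv1.StatefulSetSpec{
			Replicas:            &replicas,
			ServiceName:         "registry-mirror",
			PodManagementPolicy: appsv1.ParallelPodManagement,
			Selector:            &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "registry",
						Image: "registry:2",
						VolumeMounts: []corev1.VolumeMount{{
							Name:      "data",
							MountPath: "/var/lib/registry",
						}},
					}},
				},
			},
			// Each replica gets its own PVC provisioned from this template.
			VolumeClaimTemplates: []corev1.PersistentVolumeClaim{{
				ObjectMeta: metav1.ObjectMeta{Name: "data"},
				Spec: corev1.PersistentVolumeClaimSpec{
					AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
					Resources: corev1.ResourceRequirements{
						Requests: corev1.ResourceList{
							corev1.ResourceStorage: resource.MustParse("10Gi"),
						},
					},
				},
			}},
		},
	}
}
```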

@salavessa

salavessa commented Nov 6, 2019

/remove-lifecycle rotten
/reopen

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Nov 6, 2019
@salavessa

/reopen

@k8s-ci-robot
Contributor

@salavessa: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@MrHohn MrHohn reopened this Nov 6, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 4, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 5, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@djjayeeta

A use case for this feature: kube-state-metrics has automated horizontal sharding (https://github.com/kubernetes/kube-state-metrics#horizontal-scaling-sharding), which is based on a statefulset without PVCs. The metric the community recommends scaling on is latency, but that requires an HPA with custom metrics (which takes some extra work to fit into our architecture). An approximate alternative would be to scale the statefulset based on the number of nodes (I may be wrong in this scenario).

What I would like to say is that statefulset PVs can be left to users to handle, and some statefulsets don't use PVs at all (as in this case), so it may be safe to use this feature in those cases.
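
As an aside on how that kind of ordinal-based sharding typically works (a generic sketch, not kube-state-metrics' actual code): each replica derives its shard index from its pod name's ordinal suffix and handles the slice of objects that hash to that index.

```go
package sharding

import (
	"fmt"
	"strconv"
	"strings"
)

// shardFromPodName extracts the StatefulSet ordinal from a pod name such as
// "kube-state-metrics-2" and returns it as the shard index. totalShards is
// the StatefulSet's replica count; each shard then handles the objects whose
// hash modulo totalShards equals its index.
func shardFromPodName(podName string, totalShards int) (int, error) {
	idx := strings.LastIndex(podName, "-")
	if idx < 0 {
		return 0, fmt.Errorf("pod name %q has no ordinal suffix", podName)
	}
	ordinal, err := strconv.Atoi(podName[idx+1:])
	if err != nil {
		return 0, fmt.Errorf("parse ordinal from %q: %w", podName, err)
	}
	if ordinal >= totalShards {
		return 0, fmt.Errorf("ordinal %d out of range for %d shards", ordinal, totalShards)
	}
	return ordinal, nil
}
```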

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@diranged

/reopen

This is a really obvious feature to add IMO... Would really like to see it worked on.

@k8s-ci-robot
Contributor

@diranged: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

This is a really obvious feature to add IMO... Would really like to see it worked on.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
