Limitation of node number per provisioner #732

Open
sergkondr opened this issue Jan 12, 2023 · 24 comments
Labels
cost-optimization kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@sergkondr
Contributor

Tell us about your request

It would be nice to have the ability to limit the number of nodes created by a certain provisioner. For example:

  limits:
    nodes: "6"

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Let's say our application spawns pods for some tasks, and each pod requires a separate node. Currently this is limited by the max size of the node group. All these nodes are Spot instances, and sometimes there are no instances of the current family available in the region, so we use different instance families: m4, m5, m5a, r5, r5a, etc. These instances can have different amounts of CPU and memory.

It would be nice to have the ability to limit nodes by their count, not by their resources. It is clear that we have a max number of pods in our app, but that is a bit of a synthetic example.

Are you currently working around this issue?

We use limits.resources.cpu with an approximate number of CPUs, but it is not accurate and not very transparent.
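
A minimal sketch of that workaround, assuming roughly 8 vCPUs per desired node and a target of about 6 nodes (the values are illustrative, not from the original report):

  limits:
    resources:
      cpu: "48" # ~6 nodes * ~8 vCPU each; only approximates a node count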

Additional Context

No response

Attachments

No response

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@sergkondr sergkondr added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 12, 2023
@ellistarn
Contributor

Can you explain the use case of why you care about the number of nodes? It's a tricky metric, since nodes are vastly different sizes. You could potentially pay way more with a small number of massive nodes.

@njtran njtran added kind/support Categorizes issue or PR as a support question. and removed kind/feature Categorizes issue or PR as related to a new feature. labels Jan 17, 2023
@sergkondr
Contributor Author

Can you explain the use case of why you care about the number of nodes? It's a tricky metric, since nodes are vastly different sizes. You could potentially pay way more with a small number of massive nodes.

I've been thinking for a while, and it looks like you are right. The only reason to think in the node category is a habit of thinking about servers in data centers or on-prem environments.

But anyway, I think it is a nice-to-have feature, maybe someone will implement it in the future.

@cest-pas-faux

cest-pas-faux commented Jan 18, 2023

I have a use case for this: if I buy a specific number of reserved instances and want a provisioner limited to the related instance type and a set number of nodes, that would be easier to read and write than having to compute CPU amount * node count.

That's more of a nice-to-have than a really fundamental feature.

@ellistarn
Contributor

I'm curious -- is there a reason you don't use an EKS managed node group for your RI? If you're already paying for the instances, is there a reason to not have them online and ready to go?

@cest-pas-faux

Sorry for the phrasing; we went for savings plans instead a few weeks back, but at that time we were considering RIs, and I thought this would be an easy way to write limits.nodes: X directly in the provisioner.

We are a big company with multiple accounts, so if an RI is not used in one account, another account will benefit from the cost reduction; having all of them up is not that important.

@runningman84

One use case for limiting the number of nodes could be licensing: maybe you only paid for a maximum number of nodes running some specific agent, for example for monitoring.

@sidewinder12s

Also, if you have large-scale or complicated IP addressing, like custom networking with secondary subnets, you may want to limit node count per host/primary AZ to ensure you always have IP addresses available.

@gazal-k

gazal-k commented Feb 5, 2023

Just my 2 cents: considering Karpenter can be used to provision a wide range of instance types of various sizes and resource ratios, I think specifying a number of nodes could be somewhat counterintuitive. We of course used to specify min and max node numbers with node groups / ASGs, where it made sense. To leverage RIs or capacity reservations, would adding comments to indicate the number of nodes help?

  requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - on-demand
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - m6i.2xlarge

  limits:
    resources:
      cpu: "80" # 10 instances * 8 vCPU
      memory: 320Gi # 10 instances * 32Gi

@sidewinder12s

Combining restricted requirements with a node count limit might be easy enough to manage.

At scale, using comments to denote instance sizes/classes/resource sets breaks down quickly, is immediately out of date, and is generally a pain to maintain. Also, many of the reasons for wanting to restrict based on node count have nothing to do with resourcing and everything to do with the physical node count and/or IP/networking limits, which are disconnected from CPU or memory sizing.
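
For illustration, a hedged sketch of what restricted requirements combined with the node-count limit proposed in this issue could look like; limits.nodes is the hypothetical field requested here, not an existing Karpenter API, and the values are placeholders:

  requirements:
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - m6i.2xlarge

  limits:
    nodes: "10" # hypothetical: hard cap on instances from this provisioner
    resources:
      cpu: "80" # existing resource-based limit kept as a backstop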

@gazal-k

gazal-k commented Feb 6, 2023

Ah, yes. Apologies, I hadn't noticed your earlier comment about IP addressing. I concede that it's not a problem that's addressed by existing resource requirements.

@github-actions

Labeled for closure due to inactivity in 10 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 27, 2023
@cest-pas-faux

Up

@tzneal tzneal added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/support Categorizes issue or PR as a support question. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 27, 2023
@jonathan-innis
Member

you may want to limit node count per host/primary AZ to ensure you always have IP addresses available

@sidewinder12s How would you achieve this if each instance type can have a different number of ENIs and a different number of IPs would be allocated for each?

@sidewinder12s

sidewinder12s commented May 24, 2023

At least in our case, our issues were in a large batch environment where pod density per node was not too bad, so we could generally allocate X IPs per node (vs. lots of small pods, where we might hit the per-ENI IP assignment limits).

We're also using custom networking settings with the aws-vpc-cni to control IP usage, though this increases API calls against the AWS EC2 APIs.
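
For context, custom networking with the aws-vpc-cni is typically enabled by setting AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true on the aws-node daemonset and creating one ENIConfig per availability zone; a rough sketch, with placeholder subnet and security group IDs:

  apiVersion: crd.k8s.amazonaws.com/v1alpha1
  kind: ENIConfig
  metadata:
    name: us-east-1a # named after the AZ (placeholder)
  spec:
    subnet: subnet-0123456789abcdef0 # placeholder secondary subnet for pods
    securityGroups:
    - sg-0123456789abcdef0 # placeholder security group for pod ENIs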

@zmpeg

zmpeg commented Aug 1, 2023

From a licensing standpoint this would be an extremely useful feature. For example: an org purchases 3 licenses; it would be nice if Karpenter could scale the cluster by replacing a smaller node with a larger one when workloads are added, keeping to the 3-license limit. In our case the cost of the licenses heavily outweighs the cost of the nodes.

@njtran njtran transferred this issue from aws/karpenter-provider-aws Nov 2, 2023
@yr8sdk

yr8sdk commented Dec 11, 2023

A max-nodes limit could benefit total cluster resource utilization. Ideally, I would like to schedule most of my pods on the fewest possible nodes (depending on HA, of course) to have simple control over the pods' memory limits, and since there is no CPU limit already, more pods could use the unutilized CPU.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 10, 2024
@cest-pas-faux

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 10, 2024
@cest-pas-faux

Better explanation here: #745

@myaser

myaser commented Mar 12, 2024

We have a similar need (but as a global limit), which is to limit the max number of nodes per cluster.
In our setup, we only ever assign a single IP to each node, and we have a limited pool of IPs, so the most straightforward way to do this is a global limit on the number of nodes, similar to the --max-nodes-total flag on cluster autoscaler.

This was also explained here: aws/karpenter-provider-aws#4462
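
For reference, the cluster-autoscaler flag mentioned above is set on its deployment; a rough sketch of the relevant container spec, with illustrative values:

  containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0 # illustrative tag
    command:
    - ./cluster-autoscaler
    - --max-nodes-total=50 # hard cap on total nodes across the whole cluster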

@jukie

jukie commented Apr 2, 2024

I've started on adding a global limit with #1151 but still need to test.

@Bryce-Soghigian
Member

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Apr 2, 2024
@Bryce-Soghigian
Member

/assign @jukie

@jukie

jukie commented Apr 2, 2024

RFC here: #1160
