Add VMSize validation to reject VM Sizes with < 2 CPU #765

CecileRobertMichon · 2020-07-08T00:48:44Z

As documented at https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#before-you-begin, kubeadm requires 2 CPUs. To improve UX, we should fail fast when the user specifies an Azure VM size that has fewer than 2 CPUs, namely:

Standard_A1_v2
Standard_B1ls
Standard_B1s
Standard_B1ms
Standard_DC1s_v2
Standard_D1_v2
Standard_DS1_v2
Standard_F1s
Standard_F1

(source: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/compute-benchmark-scores)

This validation should be added to azuremachine_validation.go
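
A minimal sketch of what the static deny-list check described above could look like. The names `singleCPUSizes` and `validateVMSize` are hypothetical and do not come from the repository, and the list is only the point-in-time snapshot given in this issue:

```go
package main

import "fmt"

// singleCPUSizes is the static deny list from this issue. It is a snapshot
// and is not kept in sync with Azure.
var singleCPUSizes = map[string]struct{}{
	"Standard_A1_v2":   {},
	"Standard_B1ls":    {},
	"Standard_B1s":     {},
	"Standard_B1ms":    {},
	"Standard_DC1s_v2": {},
	"Standard_D1_v2":   {},
	"Standard_DS1_v2":  {},
	"Standard_F1s":     {},
	"Standard_F1":      {},
}

// validateVMSize is a hypothetical helper that rejects deny-listed sizes
// so the user gets an error before any VM is created.
func validateVMSize(vmSize string) error {
	if _, denied := singleCPUSizes[vmSize]; denied {
		return fmt.Errorf("vmSize %q has fewer than 2 CPUs, which kubeadm requires", vmSize)
	}
	return nil
}

func main() {
	for _, size := range []string{"Standard_B1s", "Standard_D2s_v3"} {
		if err := validateVMSize(size); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", size)
	}
}
```

Any size missing from the map passes, so this can only fail fast for sizes that are already known.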

/help
/good-first-issue

CecileRobertMichon · 2020-07-08T00:49:12Z

@rsmitty does the same apply to Talos?

rsmitty · 2020-07-08T01:30:38Z

It’s not strictly required but we do mention 2 CPUs as a minimum for control plane nodes in our docs, so I’m cool with this.

alexeldeib · 2020-07-10T03:37:58Z

It's more accurate to call the SKU API for this than to maintain a static list; the API exposes vCPU as a capability. We could cache the known sizes at startup to avoid extra HTTP calls inside the webhook. Maybe with some background refresh or refresh-on-miss logic?

mboersma · 2020-07-10T14:57:09Z

Agree that we should make an API call for the SKU rather than hard-code a list. We do already have a resourceskus client that is used to check the acceleratedNetworking capability pre-flight, and it should not be too hard to extend that to check the vCPU count.

alexeldeib · 2020-07-10T16:08:04Z

Thoughts on breaking out the skus client into a standalone cache? It might make sense not to issue calls for every VM, every time. We can query once per subscription and cache the result. If we see that a SKU is restricted, we can treat that as a cache "miss" and refresh (to avoid getting hit by restrictions that shift over time).
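
One way the refresh-on-miss idea could be sketched. The `SKU`, `FetchFunc`, and `Cache` types are hypothetical, not the repository's resourceskus client; `FetchFunc` stands in for whatever call lists the SKUs for a subscription:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// SKU is a simplified, hypothetical stand-in for a compute resource SKU.
type SKU struct {
	Name       string
	VCPUs      int
	Restricted bool
}

// FetchFunc lists every SKU for one subscription (assumed to wrap the SKU API).
type FetchFunc func() ([]SKU, error)

// ErrNotFound is returned when a SKU is unknown even after a refresh.
var ErrNotFound = errors.New("sku not found")

// Cache holds the SKU list for one subscription.
type Cache struct {
	mu    sync.Mutex
	fetch FetchFunc
	skus  map[string]SKU
}

func NewCache(fetch FetchFunc) *Cache { return &Cache{fetch: fetch} }

// refresh replaces the cached list; the caller must hold c.mu.
func (c *Cache) refresh() error {
	list, err := c.fetch()
	if err != nil {
		return err
	}
	skus := make(map[string]SKU, len(list))
	for _, s := range list {
		skus[s.Name] = s
	}
	c.skus = skus
	return nil
}

// Get serves from the cache and refreshes once when the entry is missing
// or marked restricted, treating both cases as a cache miss.
func (c *Cache) Get(name string) (SKU, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if sku, ok := c.skus[name]; ok && !sku.Restricted {
		return sku, nil
	}
	if err := c.refresh(); err != nil {
		return SKU{}, err
	}
	sku, ok := c.skus[name]
	if !ok {
		return SKU{}, fmt.Errorf("%w: %s", ErrNotFound, name)
	}
	return sku, nil
}

func main() {
	calls := 0
	cache := NewCache(func() ([]SKU, error) {
		calls++
		return []SKU{{Name: "Standard_D2s_v3", VCPUs: 2}}, nil
	})
	for i := 0; i < 3; i++ {
		sku, err := cache.Get("Standard_D2s_v3")
		fmt.Println(sku.Name, sku.VCPUs, err)
	}
	fmt.Println("fetch calls:", calls)
}
```

In this sketch a SKU that stays restricted triggers a fetch on every lookup, so a real implementation would probably want to rate-limit refreshes.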

mboersma · 2020-07-10T20:54:54Z

I think a cache would be a good optimization for something like this. The results for such API calls are basically static for a given subscription. I intended to memoize the resourceskusclient.HasAcceleratedNetworking call in #645 but didn't get there.

CecileRobertMichon · 2020-07-10T23:42:31Z

I'm going to remove good-first-issue, given that requirements aren't well defined at the moment and that adding a cache/making calls to the sku client would require a little more work.

/remove-good-first-issue

My initial thinking was that a static list was acceptable for something like this, since it would only be used to improve UX when that SKU is going to fail down the line anyway, and the deny list would basically be a superset of each individual subscription's list of VM sizes with 1 CPU. So it would be a good incremental value add and low-hanging fruit (very easy to implement).

I agree that dynamic calls are the best option for accuracy, but I'm concerned that in this situation they might be overkill, since the alternative of not catching the wrong SKU does result in a failure, just not a fast one. Having to fetch the SKUs from Azure on every create would definitely add latency and extra API calls, so I would prefer staying away from that. If there is a simple-ish way to implement a cache so that the SKUs don't have to be fetched on every reconcile and the added latency is not significant, I could get behind that.

Another idea: we could look into just using a regex, since 1 CPU SKUs are named in a pretty predictable way (something like Standard_[A-Z]*1[^1-9]*.* should work). That can be easily unit tested and would avoid needing to make calls to the SKU API.
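
A small sketch for trying the regex idea against the sizes listed in this issue. `proposed` is the pattern from the comment, anchored to the whole name; `stricter` is an assumed tightening that is not part of the original proposal. It is included because the trailing `.*` in the proposed pattern also accepts names such as Standard_D11_v2, which is not on the list:

```go
package main

import (
	"fmt"
	"regexp"
)

// listed holds the 1 CPU sizes named in this issue.
var listed = []string{
	"Standard_A1_v2", "Standard_B1ls", "Standard_B1s", "Standard_B1ms",
	"Standard_DC1s_v2", "Standard_D1_v2", "Standard_DS1_v2",
	"Standard_F1s", "Standard_F1",
}

// proposed is the pattern from the comment, anchored at both ends.
var proposed = regexp.MustCompile(`^Standard_[A-Z]*1[^1-9]*.*$`)

// stricter is an assumption for illustration: family letters, the digit 1,
// optional lowercase feature letters, and an optional _v<N> suffix.
var stricter = regexp.MustCompile(`^Standard_[A-Z]+1[a-z]*(_v\d+)?$`)

func main() {
	names := append(append([]string{}, listed...), "Standard_D2_v2", "Standard_D11_v2")
	for _, n := range names {
		fmt.Printf("%-18s proposed=%-5t stricter=%t\n",
			n, proposed.MatchString(n), stricter.MatchString(n))
	}
}
```

Either pattern depends on Azure's naming staying predictable, so it would still need unit tests against the known size names.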

alexeldeib · 2020-07-11T00:30:32Z

I did some work recently with the SKU API for https://github.com/alexeldeib/throttled/blob/ace/dev/src/resource.rs and some other capacity work. I can take a shot at this after I tie up IP and ephemeral OS.

I'll leave it unassigned in case someone beats me to it :)

alexeldeib · 2020-07-22T01:34:02Z

After #783 merges this will be really simple, but I'll probably tackle a few other pieces before circling back. If anyone wants to work on this, feel free.

The compute SKUs API exposes a capability called vCPUs (or CPUs for e.g. bare metal dedicated). It should be at least 2 for the VM SKU the user requested. Look at ephemeral OS' use of HasCapability.
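
A rough sketch of the capability check described above. The `SKU` and `Capability` types are simplified stand-ins, and `cpuCount` and `validateCPU` are hypothetical names. The thread does not show the signature of the repository's HasCapability helper, so the sketch does not use it. It assumes capabilities arrive as name/value string pairs:

```go
package main

import (
	"fmt"
	"strconv"
)

// Capability is a name/value pair as assumed to be reported by the SKU API.
type Capability struct {
	Name  string
	Value string
}

// SKU is a simplified, hypothetical stand-in for a compute resource SKU.
type SKU struct {
	Name         string
	Capabilities []Capability
}

// minCPUs is the kubeadm minimum cited in this issue.
const minCPUs = 2

// cpuCount reads the vCPUs capability, or CPUs when vCPUs is absent.
func cpuCount(sku SKU) (int, error) {
	for _, name := range []string{"vCPUs", "CPUs"} {
		for _, c := range sku.Capabilities {
			if c.Name != name {
				continue
			}
			n, err := strconv.Atoi(c.Value)
			if err != nil {
				return 0, fmt.Errorf("sku %s: parsing %s=%q: %w", sku.Name, c.Name, c.Value, err)
			}
			return n, nil
		}
	}
	return 0, fmt.Errorf("sku %s: no vCPUs or CPUs capability", sku.Name)
}

// validateCPU rejects SKUs that report fewer than minCPUs.
func validateCPU(sku SKU) error {
	n, err := cpuCount(sku)
	if err != nil {
		return err
	}
	if n < minCPUs {
		return fmt.Errorf("sku %s has %d CPU(s), need at least %d", sku.Name, n, minCPUs)
	}
	return nil
}

func main() {
	small := SKU{Name: "Standard_B1s", Capabilities: []Capability{{Name: "vCPUs", Value: "1"}}}
	large := SKU{Name: "Standard_D2s_v3", Capabilities: []Capability{{Name: "vCPUs", Value: "2"}}}
	fmt.Println(validateCPU(small))
	fmt.Println(validateCPU(large))
}
```

Unlike a static list, this check follows whatever the API reports for the subscription, at the cost of needing the SKU data at validation time.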

/good-first-issue

cpanato · 2020-08-06T15:43:25Z

can i try this one? @CecileRobertMichon @alexeldeib

devigned · 2020-08-06T15:44:23Z

/assign @cpanato

Go for it!

cpanato · 2020-08-10T14:47:58Z

/remove-help

cpanato · 2020-08-15T16:26:21Z

should we validate the memory as well?

cloud/vm/vmss: validate if vCPUs and Memory matched the minimum required

k8s-ci-robot added the good first issue and help wanted labels Jul 8, 2020

CecileRobertMichon mentioned this issue Jul 9, 2020

💚 add test to validate accelerated networking for VMs #764

Merged

3 tasks

CecileRobertMichon added this to the next milestone Jul 10, 2020

k8s-ci-robot removed the good first issue label Jul 10, 2020

alexeldeib mentioned this issue Jul 12, 2020

💎 refactor skus client #783

Merged

3 tasks

k8s-ci-robot added the good first issue label Jul 22, 2020

CecileRobertMichon added this to Backlog in Cluster API Azure Jul 22, 2020

k8s-ci-robot assigned cpanato Aug 6, 2020

k8s-ci-robot removed the help wanted and good first issue labels Aug 10, 2020

cpanato moved this from Backlog to In progress in Cluster API Azure Aug 16, 2020

cpanato mentioned this issue Aug 16, 2020

cloud/vm/vmss: validate if vCPUs and Memory matched the minimum required #884

Merged

3 tasks

k8s-ci-robot closed this as completed in #884 Aug 21, 2020

Cluster API Azure automation moved this from In progress to Done (2020 - Q2) Aug 21, 2020

k8s-ci-robot added a commit that referenced this issue Aug 21, 2020

Merge pull request #884 from cpanato/GH-765

d1d3090

cloud/vm/vmss: validate if vCPUs and Memory matched the minimum required

CecileRobertMichon removed this from the next milestone May 4, 2023
