-
Notifications
You must be signed in to change notification settings - Fork 413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add VMSize validation to reject VM Sizes with < 2 CPU #765
Comments
@rsmitty does the same apply to Talos? |
It’s not strictly required but we do mention 2 CPUs as a minimum for control plane nodes in our docs, so I’m cool with this. |
It's more accurate to call the SKU API for this to avoid generating a static list, the API exposes vCPU as a capability. We could cache the known sizes at start up to avoid extra http calls inside the webhook? Maybe with some background refresh or refresh-on-miss logic? |
Agree that we should make an API call for the SKU rather than hard-code a list. We do already have a |
thoughts on breaking out skus client to a standalone cache? it might make sense not to issue calls for every VM, every time. We can query once for a subscription and cache the result. If we see we're restricted we can use that as a cache "miss" to refresh (to avoid getting hit by shifting point in time restrictions). |
I think a cache would be an good optimization for something like this. The results for such API calls are basically static for a given subscription. I intended to memoize the |
I'm going to remove good-first-issue, given that requirements aren't well defined at the moment and that adding a cache/making calls to the sku client would require a little more work. /remove-good-first-issue My initial thinking was that a static list was acceptable for something like this since it would only be used to improve UX when that SKU is going to fail down the line anyways and the deny list would basically be a superset of each individual subscription's list of VM sizes with 1 CPU. So it be a good incremental value add and a low hanging fruit (very easy to implement). I agree that dynamic calls is the best option for accuracy but I'm concerned that in this situation it might be overkill since the alternative of not catching the wrong SKU does result in a failure, just not a fast one. Having to fetch the SKUs from Azure every create would definitely add latency + extra API calls so I would prefer staying away from that. If there is a simple-ish way to implement a cache so it doesn't have to be fetched every reconcile and that the added latency is not significant, I coud get behind that. Another idea: we could look into just doing regex since 1 CPU SKUs are named in a pretty predictable way (something like |
I did some work recently with the SKU API for https://github.com/alexeldeib/throttled/blob/ace/dev/src/resource.rs and some other capacity work. I can take a shot at this after I tie up IP and ephemeral OS. I'll leave it unassigned in case someone beats me to it :) |
After #783 merges this will be really simple, but i'll probably tackle a few other pieces before circling back. If anyone wants to work on this feel free. compute SKUs API exposes a capability called /good-first-issue |
can i try this one? @CecileRobertMichon @alexeldeib |
/assign @cpanato Go for it! |
/remove-help |
should we validate the memory as well? |
cloud/vm/vmss: validate if vCPUs and Memory matched the minimum required
As documented at https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#before-you-begin, kubeadm requires 2 CPUs. To improve UX, we should fail fast when the user specifies an Azure VM size that has less than 2 CPUs, that is:
(source: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/compute-benchmark-scores)
This validation should be added to azuremachine_validation.go
/help
/good-first-issue
The text was updated successfully, but these errors were encountered: