Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support GCP spot machines #905

Open
williamflynt opened this issue Jun 3, 2022 · 0 comments
Open

Support GCP spot machines #905

williamflynt opened this issue Jun 3, 2022 · 0 comments

Comments

@williamflynt
Copy link

williamflynt commented Jun 3, 2022

What would you like to be added:

Google Cloud recently rolled out spot VMs. Unlike the existing preemptible machine type, a spot VM has a notionally unlimited lifetime. Pricing for spot VMs is the same as for preemptible VMs, typically about 30% of the cost of a dedicated VM.

Why is this needed:

Imagine running a cluster using relatively expensive machines, like Tau 2D instances. Your pods are part of a big data serving platform with automatic shard management - great! That means you can use ephemeral VMs because the cluster will automatically reflow data away from non-operational nodes, and redistribute when new nodes come up.

Of course, this takes time, and loading the indices to memory also takes more time. The tradeoff is that we can run at a fraction of the cost!!

Overall, this use case is well-served by spot instances. It is poorly served by preemptible instances. When using preemtible VMs, the data shuffle and index build/load requires ~15% of life just for reshuffle data. Spot is more like 4% in practice. Overall impact on query latency exceeds that level of improvement.

Extra info (e.g. existing slack convo link):

The SPOT provisioning_model is supported in Terraform 4.23 as a beta feature.

(Optional, Beta) Describe the type of preemptible VM. This field accepts the value STANDARD or SPOT. If the value is STANDARD, there will be no discount. If this is set to SPOT, preemptible should be true and auto_restart should be false.

https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#provisioning_model

Slack link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant