
Does priority expander do fallbacks? #2075

Closed
TarekAS opened this issue May 30, 2019 · 12 comments
Labels
area/cluster-autoscaler · area/provider/aws (Issues or PRs related to aws provider) · lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed.)

Comments

@TarekAS

TarekAS commented May 30, 2019

If the ASG with the highest priority fails to launch instances for some reason, does the expander fall back to lower-priority ASGs?

My use case is falling back from Spot ASGs to OnDemand ASGs in case no Spot instances are available.

If not, I would suggest something like a timeout on each priority.
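For reference, the priority expander is configured through a ConfigMap named `cluster-autoscaler-priority-expander` in `kube-system`: each priority value (higher means more preferred) maps to a list of regexes matched against node-group (ASG) names. A minimal sketch of the Spot-over-OnDemand preference described here (the ASG name patterns are placeholders):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    # higher value = higher priority; entries are regexes matched against ASG names
    20:
      - .*-spot-.*
    10:
      - .*-on-demand-.*
```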

@aleksandra-malinowska aleksandra-malinowska added area/provider/aws Issues or PRs related to aws provider area/cluster-autoscaler labels May 30, 2019
@aleksandra-malinowska
Contributor

cc @Jeffwan

@Jeffwan
Contributor

Jeffwan commented May 30, 2019

If a Spot request cannot be fulfilled within max-node-provision-time (15 minutes by default), CA should stop considering that node group in simulations and will attempt to scale up a different group. I have not tried the priority expander yet, since it was merged only recently. I can test this case, since many users want the OnDemand fallback and this is one way to achieve it.
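The timeout mentioned above corresponds to the `--max-node-provision-time` flag on the cluster-autoscaler binary; a sketch of the relevant container args (a Deployment excerpt, image version illustrative):

```yaml
# excerpt from a cluster-autoscaler Deployment spec
containers:
  - name: cluster-autoscaler
    image: k8s.gcr.io/cluster-autoscaler:v1.15.0   # version illustrative
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --expander=priority
      - --max-node-provision-time=15m   # the default; after this, the group is backed off
```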

As you said, the corresponding node group should only be removed from the priority list with a timeout, because Spot capacity may recover after a while.

I'll come back to you later.

@Jeffwan
Contributor

Jeffwan commented Jun 7, 2019

@TarekAS I noticed Spot may have some issues if the request cannot be fulfilled. The community has a PR tracking this issue: #2008

It depends on whether you use a LaunchConfiguration or a LaunchTemplate. Taking a mixed instances policy as an example: if you don't set a fixed price (i.e. you let the ASG manage the price), you may not see this issue.

It seems the timer doesn't start if the request cannot be fulfilled.

@TarekAS
Author

TarekAS commented Jun 10, 2019

We have a very simple use case. For each AZ, we have one Spot ASG and one OnDemand ASG that are otherwise identical. How can we effectively prefer scaling the Spot ASGs over the OnDemand ones?

We do not want to use MixedInstancesPolicy due to the following concerns:

  1. It requires at least 2 instance types, but CA requires both types to have identical CPU/memory, and such pairs are rare.
  2. We think that using a specific percentage of OnDemand instances is arbitrary and wasteful. Does specifying 0% OnDemand allow for fallback in case Spot requests cannot be fulfilled?
  3. Some workloads actually require OnDemand instances, so the OnDemand ASG is not just for fallbacks. We cannot rely on a percentage chance of getting an OnDemand instance.

How are other people doing this? I've done some research and all I could find was using a Spot Rescheduler to mitigate this issue. I hope the priority expander can help solve it.

In the meantime, there should be a basic guide on how to set this up.

Thanks!

@Jeffwan
Contributor

Jeffwan commented Jun 11, 2019

  2. We think that using a specific percentage of OnDemand instances is arbitrary and wasteful. Does specifying 0% OnDemand allow for fallback in case Spot requests cannot be fulfilled?

I confirmed with the EC2 ASG team: no, fallback is not supported in MixedInstancesPolicy.

  3. Some workloads actually require OnDemand instances, so the OnDemand ASG is not just for fallbacks. We cannot rely on a percentage chance of getting an OnDemand instance.

I think you'd be better off using a separate ASG for those kinds of jobs and using node affinity to schedule the pods onto it.
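A minimal sketch of that suggestion, assuming the OnDemand ASG's nodes carry a `lifecycle: OnDemand` label (the label key and value are assumptions; you would set them yourself, e.g. via kubelet `--node-labels`):

```yaml
# Pod pinned to OnDemand nodes via required node affinity
apiVersion: v1
kind: Pod
metadata:
  name: ondemand-only-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: lifecycle           # assumed label set on the OnDemand ASG's nodes
                operator: In
                values:
                  - OnDemand
  containers:
    - name: app
      image: nginx
```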

How are other people doing this? I've done some research and all I could find was using a Spot Rescheduler to mitigate this issue. I hope the priority expander can help solve it.

I will check it out. I've been pretty busy recently, so if you have ideas, please contribute or discuss them with me. I think https://spotinst.com has this feature, and I think it's achievable in CA. We need to come up with a solution that covers most cases.

The priority expander can fall back to OnDemand, but there's no logic to fall back to Spot once Spot instances become cheaper again.

In the meantime, there should be a basic guide on how to set this up.

Thanks. If we find a limitation on Spot, we can reopen a closed PR and make ASG Spot available there (but that solution won't guarantee the lowest price; the pricing model is somewhat hacky). Guidance will be provided once these problems are resolved.

@TarekAS
Author

TarekAS commented Jun 12, 2019

To me, I think you may better to use different ASG for these kind of jobs and use node affinity to schedule pods on that ASG?

Definitely, that's what we're doing right now for "OnDemand" workloads. Therefore, even with MixedInstancePolicy, one would still need to create a dedicated ASG for on-demand workloads to guarantee availability. We just keep things simpler by creating separate ASGs for Spot and OnDemand.

If only CA supported the preferredDuringSchedulingIgnoredDuringExecution node affinity, it would be able to scale spot instances while supporting fallback.
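For illustration, a preferred (soft) affinity would look like the sketch below; note this only influences the scheduler's placement among existing nodes, since CA does not act on preferred affinity when choosing which group to scale. The `lifecycle: Ec2Spot` label is an assumption:

```yaml
# soft preference for Spot nodes; pods still schedule elsewhere if no Spot node fits
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: lifecycle       # assumed label set on Spot nodes
              operator: In
              values:
                - Ec2Spot
```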

Priority expander can fallback to OnDemand, but there's no logic to fallback to Spot once Spot instance becomes cheaper.

We use Launch Templates (automatic Spot pricing). We're more concerned about Spot instances becoming completely unavailable than about them becoming expensive (this actually happened to us for 30 minutes). If Spot capacity becomes available again, would the priority expander be able to switch back to the higher-priority Spot ASGs?

@Jeffwan
Contributor

Jeffwan commented Jun 12, 2019

We use Launch Templates (automatic Spot pricing). We're more concerned about Spot instances becoming completely unavailable than about them becoming expensive (this actually happened to us for 30 minutes). If Spot capacity becomes available again, would the priority expander be able to switch back to the higher-priority Spot ASGs?

No, the priority expander only applies when there's a decision to be made about which node group to scale up. It doesn't actively move existing nodes or workloads.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 10, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 10, 2019
@Jeffwan
Contributor

Jeffwan commented Oct 11, 2019

If a user is looking for fallback options, here's one example that I think can be used to move workloads back to Spot once it's available:

https://github.com/pusher/k8s-spot-rescheduler

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
