[autoscaler] Avoid launching GPU nodes when the workload only has CPU tasks. #13776

ericl · 2021-01-29T01:31:47Z

This issue was reported by @robertnishihara dogfooding a numpy workload.

AmeerHajAli

Check out the comment.

AmeerHajAli · 2021-01-29T01:46:55Z

python/ray/autoscaler/_private/resource_demand_scheduler.py

+
+    # Avoid launching GPU nodes if there aren't any GPU tasks at all. Note that
+    # if there *is* a GPU task, then CPU tasks can be scheduled as well.
+    if is_gpu_node and not any_gpu_task:


One issue here is that if all the available nodes are GPU nodes, you will never scale up when you have CPU tasks only. Instead, the user will see the "The autoscaler could not find a node type to satisfy the ..." message, which is bad.
Please fix that before merging this PR.

I thought about that, and think in this case this is the right behavior. The user should be adding some CPU nodes to their config, rather than Ray auto launching expensive GPU nodes.
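For readers following the thread, the behavior being debated can be illustrated with a minimal, standalone sketch. It is not the code from `resource_demand_scheduler.py`; only the condition `is_gpu_node and not any_gpu_task` comes from the diff above. The helper names `has_gpu` and `launchable_node_types`, the resource dict shape, and the example node types are assumptions made for illustration.

```python
from typing import Dict, List

ResourceDict = Dict[str, float]


def has_gpu(resources: ResourceDict) -> bool:
    """True if a resource dict (node or task) includes at least one GPU."""
    return resources.get("GPU", 0) > 0


def launchable_node_types(
    node_types: Dict[str, ResourceDict], demands: List[ResourceDict]
) -> List[str]:
    """Return the node types this policy would still consider launching."""
    any_gpu_task = any(has_gpu(d) for d in demands)
    launchable = []
    for name, resources in node_types.items():
        is_gpu_node = has_gpu(resources)
        # Same condition as the diff: skip GPU nodes when no task needs a GPU.
        if is_gpu_node and not any_gpu_task:
            continue
        launchable.append(name)
    return launchable


# Hypothetical configs and demands, for illustration only.
cpu_only_demands = [{"CPU": 1}, {"CPU": 2}]
mixed_config = {"cpu_node": {"CPU": 16}, "gpu_node": {"CPU": 16, "GPU": 1}}
gpu_only_config = {"gpu_node": {"CPU": 16, "GPU": 1}}

print(launchable_node_types(mixed_config, cpu_only_demands))
print(launchable_node_types(gpu_only_config, cpu_only_demands))
```

With a mixed config, only the CPU node type remains a candidate. With a GPU-only config, nothing remains, which is the case raised in the review comment.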

wuisawesome

I don't agree with this policy. If there are no CPU-only nodes available, launching a node with a GPU seems reasonable. For example, an r5dn.16xlarge costs $5.20/hr while a g3.16xlarge costs $4.56/hr for the same number of CPUs, so it would actually be cheaper to get a GPU in this case.

I obviously cherry-picked instances here, but the point is that I don't think the price difference is extreme enough to warrant this policy.

If we do implement this, can we feature flag it? My example is at least one good use case where you'd want to turn this policy off.
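The price comparison in the comment above works out as follows. This is only the arithmetic on the two hourly prices quoted in the comment (as of the time of this thread); the variable names are illustrative, and no other pricing data is assumed.

```python
from decimal import Decimal

# Hourly prices exactly as quoted in the comment above.
r5dn_16xlarge_hourly = Decimal("5.20")  # CPU-only instance
g3_16xlarge_hourly = Decimal("4.56")    # GPU instance, same CPU count per the comment

# How much cheaper the GPU instance is per hour, in dollars and in percent.
saving = r5dn_16xlarge_hourly - g3_16xlarge_hourly
saving_percent = float(saving / r5dn_16xlarge_hourly * 100)

print(f"GPU instance is ${saving}/hr cheaper ({saving_percent:.1f}%)")
```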

ericl · 2021-01-29T02:36:22Z

Sure, I added a feature flag. However, I don't think this will be a popular flag.
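As a rough sketch of how such a flag could gate the policy: the thread does not name the flag, so the environment variable `EXAMPLE_CONSERVE_GPU_NODES` and both helper functions below are hypothetical, and the default of "enabled" is an assumption rather than something stated here.

```python
import os
from typing import Mapping, Optional

# Hypothetical flag name; the thread does not say what the real flag is called.
FLAG_NAME = "EXAMPLE_CONSERVE_GPU_NODES"


def conserve_gpu_nodes(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read the flag; assumed to be on unless explicitly set to "0"."""
    env = os.environ if environ is None else environ
    return env.get(FLAG_NAME, "1") != "0"


def should_skip_node_type(
    is_gpu_node: bool, any_gpu_task: bool, conserve: bool
) -> bool:
    """Apply the condition from the diff only when the flag is enabled."""
    return conserve and is_gpu_node and not any_gpu_task


# Flag on (assumed default): GPU nodes are skipped for CPU-only workloads.
print(should_skip_node_type(True, False, conserve_gpu_nodes({})))
# Flag off: GPU nodes may be launched even when only CPU tasks are queued.
print(should_skip_node_type(True, False, conserve_gpu_nodes({FLAG_NAME: "0"})))
```

Turning the flag off covers the use case described earlier in the thread, where a GPU instance is the cheapest way to get the required CPUs.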

wuisawesome

Still against this policy, but this seems like a reasonable way to implement it.

Done

ericl added 3 commits January 28, 2021 17:24

wip

7c0f6b6

avoid gpus

562ba2b

update

d200e61

ericl assigned wuisawesome and AmeerHajAli Jan 29, 2021

AmeerHajAli previously requested changes Jan 29, 2021

wuisawesome requested changes Jan 29, 2021

update

3a1900a

wuisawesome approved these changes Jan 29, 2021

ericl merged commit b20a38f into ray-project:master Jan 29, 2021

fishbone pushed a commit to fishbone/ray that referenced this pull request Feb 16, 2021

[autoscaler] Avoid launching GPU nodes when the workload only has CPU…

7ad571e

… tasks. (ray-project#13776) * wip * avoid gpus * update * update

fishbone added a commit to fishbone/ray that referenced this pull request Feb 16, 2021

Revert "[autoscaler] Avoid launching GPU nodes when the workload only…

59b3bce

… has CPU tasks. (ray-project#13776)" This reverts commit 7ad571e.
