
[autoscaler] Request resources behaves inconsistently #12443

Closed
2 tasks done
PidgeyBE opened this issue Nov 26, 2020 · 3 comments · Fixed by #12465
Labels: bug, triage

Comments

PidgeyBE (Contributor) commented Nov 26, 2020

What is the problem?

  • Ray version: nightly

Reproduction (REQUIRED)

  1. Set up a k8s autoscaling cluster.
  2. Run the following (I did it in ipython on the ray head):
import ray
from ray.autoscaler.sdk import request_resources

# Connect to the existing cluster from the head node.
ray.init(address="auto")

# An actor that creates a {'CPU': 0.2} resource demand.
@ray.remote(num_cpus=0.2)
class ActorA:
    def __init__(self):
        pass

a = ActorA.remote()

# Request capacity for two 0.1-CPU bundles via the autoscaler SDK.
request_resources(bundles=[{"CPU": 0.1}, {"CPU": 0.1}])

-> Output of the autoscaling monitor is:

2020-11-26 10:37:16,419 INFO resource_demand_scheduler.py:193 -- Resource demands: [{'CPU': 0.2}]
...
2020-11-26 10:37:16,425 INFO autoscaler.py:612 -- StandardAutoscaler: resource_requests=[{'CPU': 0.1}, {'CPU': 0.1}]
...
2020-11-26 10:37:26,588 INFO resource_demand_scheduler.py:193 -- Resource demands: [{'CPU': 0.2}, {'CPU': 0.1}]

-> The expected output is [{'CPU': 0.2}, {'CPU': 0.1}, {'CPU': 0.1}]
In other tests I did, the requested resources were ignored entirely.

  3. If I now do ray.kill(a), the output shows:
    Resource demands: [{'CPU': 0.2}, {'CPU': 0.1}, {'CPU': 0.1}]
    So the missing request shows up, but the request related to the actor is not cleaned up ([autoscaler] Actor resource demands are not cleared after actor is scheduled #12441)
  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.
PidgeyBE added the bug and triage labels on Nov 26, 2020
AmeerHajAli (Contributor) commented Nov 27, 2020

Hi @PidgeyBE,
when you call request_resources(bundles=[{"CPU": 0.1}, {"CPU": 0.1}]) you get these resources "immediately", but they do not add on top of what you already have.
So if your resource demands are {"CPU": 0.2}, the demands become [{"CPU": 0.2}, {"CPU": 0.1}]: the requested [{"CPU": 0.1}, {"CPU": 0.1}] becomes available immediately, but the remaining {"CPU": 0.1} might take more time to become available.

Check out how request_resources() works here.
Does that make sense?
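
For reference, a minimal usage sketch of the SDK call (assuming a running autoscaling cluster): request_resources() takes either a list of bundles, as in this issue, or an aggregate num_cpus count (the value 4 below is just an illustrative choice).

from ray.autoscaler.sdk import request_resources

# Hint the autoscaler to hold capacity for two 0.1-CPU bundles
# (the bundle form used in the reproduction above).
request_resources(bundles=[{"CPU": 0.1}, {"CPU": 0.1}])

# Alternative aggregate form: hold capacity for this many CPUs in total.
request_resources(num_cpus=4)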

PidgeyBE (Contributor, Author) commented:

So if I do:

ray.remote(num_cpus=0.2)(ActorA).remote()
request_resources(bundles=[{"CPU": 0.1}, {"CPU": 0.1}])

the total Resource Demands become [{'CPU': 0.2}, {'CPU': 0.1}], because one requested bundle {'CPU': 0.1} fits into the currently deployed task with {'CPU': 0.2}.
But if I do

request_resources(bundles=[{"CPU": 0.2}, {"CPU": 0.2}])
ray.remote(num_cpus=0.1)(ActorA).remote()

the total Resource Demand becomes [{'CPU': 0.1}, {'CPU': 0.2}, {'CPU': 0.2}], because none of the requested bundles fits into any of the already deployed tasks?

So the Resource Demands are basically the requested resources, minus the bundles that are smaller than or equal to running tasks?
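
To make that rule concrete, here is a simplified illustration (not Ray's actual bin-packing code, just a sketch of the behaviour observed above): a requested bundle is dropped from the displayed demand when it fits entirely inside an existing demand that has not already absorbed another request.

# Simplified model of the observed behaviour; the real
# resource_demand_scheduler does per-node-type bin packing.
def combined_demand(existing, requested):
    used = [False] * len(existing)
    extra = []
    for bundle in requested:
        for i, demand in enumerate(existing):
            if not used[i] and all(bundle.get(k, 0) <= demand.get(k, 0) for k in bundle):
                used[i] = True  # request covered by an existing demand
                break
        else:
            extra.append(bundle)  # not covered, so it adds to the demand
    return existing + extra

print(combined_demand([{"CPU": 0.2}], [{"CPU": 0.1}, {"CPU": 0.1}]))
# -> [{'CPU': 0.2}, {'CPU': 0.1}]
print(combined_demand([{"CPU": 0.1}], [{"CPU": 0.2}, {"CPU": 0.2}]))
# -> [{'CPU': 0.1}, {'CPU': 0.2}, {'CPU': 0.2}]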

ericl (Contributor) commented Nov 28, 2020

@PidgeyBE that's right. Note that what you're seeing here is an artifact of the implementation of request_resources() (the internal bin packing algorithm). If only {"CPU": 1} bundles were used instead of different shapes, this artifact would disappear. We could improve the algorithm in the future to fix this edge case.

The intended use of request_resources() is as a hint to scale the cluster to accommodate the requests, ignoring any existing utilization of the cluster; the resulting cluster size might be slightly larger than necessary due to sub-optimal packing.
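
As a usage sketch of that intended pattern (the bundle count here is just an example): requesting uniform {"CPU": 1} shapes as a pure scaling hint avoids the packing artifact discussed above.

from ray.autoscaler.sdk import request_resources

# Hint the autoscaler to keep enough nodes for 8 single-CPU bundles,
# regardless of current utilization; sub-optimal packing may still make
# the resulting cluster slightly larger than strictly necessary.
request_resources(bundles=[{"CPU": 1}] * 8)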
