[Core] Non Unit Instance fractional value fix #39293

Merged on Sep 26, 2023 (14 commits).
Changes from 7 commits
doc/source/ray-core/scheduling/resources.rst (4 additions, 0 deletions)
@@ -194,3 +194,7 @@ The precision of the fractional resource requirement is 0.0001 so you should avo

Besides resource requirements, you can also specify an environment for a task or actor to run in,
which can include Python packages, local files, environment variables, and more---see :ref:`Runtime Environments <runtime-environments>` for details.

.. note::
Contributor comment: Can you move it to be above the tip?

   For any unit resource (a resource that the scheduler assigns by instance ID, such as GPU, TPU, or Neuron Core), a requirement greater than 1
   must be a whole number (e.g. ``num_gpus=1.5`` is invalid).
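The rule in this note can be sketched in plain Python. This is a minimal sketch under stated assumptions, not Ray's implementation: the function name `check_resource_request` and the hard-coded set `UNIT_RESOURCES` are hypothetical (Ray derives the real list from its configuration, as the `_raylet.pyx` change below shows).

```python
# Hypothetical set of unit instance resources; Ray builds the real list
# from RayConfig (predefined plus custom unit instance resources).
UNIT_RESOURCES = frozenset({"GPU", "TPU", "neuron_cores"})


def check_resource_request(resources: dict) -> None:
    """Raise ValueError if a unit resource asks for a fractional amount > 1.

    Fractions below 1 (e.g. 0.5 GPU) are allowed for every resource, and
    non-unit resources such as CPU may use any non-negative quantity.
    """
    for name, value in resources.items():
        if value < 0:
            raise ValueError("Resource quantities may not be negative.")
        if (
            name in UNIT_RESOURCES
            and isinstance(value, float)
            and value >= 1
            and not value.is_integer()
        ):
            raise ValueError(
                f"{name} resource quantities >1 must be whole numbers. "
                f"The specified quantity {value} is invalid."
            )


check_resource_request({"CPU": 1.5, "GPU": 0.5})  # accepted
check_resource_request({"GPU": 2.0})              # accepted: 2.0 is whole
try:
    check_resource_request({"GPU": 1.5})
except ValueError as err:
    # GPU resource quantities >1 must be whole numbers. The specified quantity 1.5 is invalid.
    print(err)
```

Only unit resources are checked for whole numbers, which is why `num_cpus=1.5` or a custom resource of `2.7` passes while `num_gpus=1.5` is rejected.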
python/ray/_raylet.pyx (6 additions, 2 deletions)
@@ -640,10 +640,14 @@ cdef int prepare_resources(
         if value < 0:
             raise ValueError("Resource quantities may not be negative.")
         if value > 0:
+            unit_resources = f"{RayConfig.instance().predefined_unit_instance_resources()},\
+{RayConfig.instance().custom_unit_instance_resources()}"
Contributor comment: Let's split by comma (,), convert the result to a set, and then check. Otherwise we may get false matches: e.g. if we have a unit resource called Foo_Bar, you will think Foo is also a unit resource.


             if (value >= 1 and isinstance(value, float)
-                    and not value.is_integer()):
+                    and not value.is_integer() and str(key) in unit_resources):
                 raise ValueError(
-                    "Resource quantities >1 must be whole numbers.")
+                    "Unit instance resource (GPU, TPU, Neuron Core) quantities >1 must",
+                    f" be whole numbers. {key} resource with value {value} is invalid.")
Contributor comment, suggested change:
-                    "Unit instance resource (GPU, TPU, Neuron Core) quantities >1 must",
-                    f" be whole numbers. {key} resource with value {value} is invalid.")
+                    f"{key} resource quantities >1 must",
+                    f" be whole numbers. The specified quantity {value} is invalid.")

             resource_map[0][key.encode("ascii")] = float(value)
     return 0
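One detail in both the committed message and the suggested change: the two string literals are separated by a comma, so `ValueError` receives two positional arguments and `str(err)` renders a tuple rather than one sentence. A small plain-Python illustration (the variable names are hypothetical):

```python
key, value = "GPU", 1.5

# Comma between the literals: ValueError gets two separate arguments.
two_args = ValueError(
    f"{key} resource quantities >1 must",
    f" be whole numbers. The specified quantity {value} is invalid.")

# No comma: adjacent literals are concatenated into a single message.
one_arg = ValueError(
    f"{key} resource quantities >1 must"
    f" be whole numbers. The specified quantity {value} is invalid.")

print(len(two_args.args))  # 2
print(str(one_arg))  # GPU resource quantities >1 must be whole numbers. The specified quantity 1.5 is invalid.
```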

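The reviewer's concern about `str(key) in unit_resources` can be shown in a few lines. With a substring test on the joined string, a hypothetical custom unit resource named `Foo_Bar` makes `"Foo"` look like a unit resource too; splitting on commas into a set gives exact matches. This is a sketch: `parse_unit_resources` is a hypothetical helper, and the literal strings stand in for whatever `predefined_unit_instance_resources()` and `custom_unit_instance_resources()` return.

```python
def parse_unit_resources(*config_values: str) -> frozenset:
    """Split comma-separated config strings into a set of resource names."""
    names = set()
    for value in config_values:
        for part in value.split(","):
            part = part.strip()  # drop whitespace picked up when joining
            if part:             # skip empty entries, e.g. an empty custom list
                names.add(part)
    return frozenset(names)


# Made-up stand-ins for the two RayConfig accessors.
predefined = "GPU,TPU,neuron_cores"
custom = "Foo_Bar"

joined = f"{predefined},{custom}"
print("Foo" in joined)  # True: substring match, a false positive

unit_resources = parse_unit_resources(predefined, custom)
print("Foo" in unit_resources)      # False: exact set membership
print("Foo_Bar" in unit_resources)  # True
```

Set membership also makes the check O(1) per key instead of a scan of the joined string, although with a handful of resource names that difference is negligible.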
python/ray/includes/ray_config.pxd (4 additions, 0 deletions)
@@ -92,3 +92,7 @@ cdef extern from "ray/common/ray_config.h" nogil:
         int64_t grpc_client_keepalive_timeout_ms() const
 
         c_bool enable_autoscaler_v2() const
+
+        c_string predefined_unit_instance_resources() const
+
+        c_string custom_unit_instance_resources() const
python/ray/tests/test_advanced_2.py (15 additions, 8 deletions)
@@ -147,7 +147,7 @@ def method(self):
 
 
 def test_fractional_resources(shutdown_only):
-    ray.init(num_cpus=6, num_gpus=3, resources={"Custom": 1})
+    ray.init(num_cpus=6, num_gpus=3, resources={"Custom": 3, "TPU": 3})
 
     @ray.remote(num_gpus=0.5)
     class Foo1:
@@ -168,7 +168,7 @@ def method(self):
             pass
 
     # Create an actor that requires 0.7 of the custom resource.
-    f1 = Foo2._remote([], {}, resources={"Custom": 0.7})
+    f1 = Foo2._remote([], {}, resources={"Custom": 2.7})
     ray.get(f1.method.remote())
     # Make sure that we cannot create an actor that requires 0.7 of the
     # custom resource. TODO(rkn): Re-enable this once ray.wait is
@@ -183,18 +183,25 @@ def method(self):
 
     del f1, f3
 
-    # Make sure that we get exceptions if we submit tasks that require a
-    # fractional number of resources greater than 1.
+    # Non unit resources (e.g. CPU) allow fractional
+    # number of resources greater than 1.
+    @ray.remote(num_cpus=1.5, resources={"Custom": 2.5})
+    def test_frac_cpu():
+        return True
 
-    @ray.remote(num_cpus=1.5)
-    def test():
+    assert ray.get(test_frac_cpu.remote())
+
+    # Unit instance resources (GPU, TPU, neuron_core) throw exceptions
+    # for fractional number of resources greater than 1.
+    @ray.remote(num_gpus=1.5)
+    def test_frac_gpu():
         pass
 
     with pytest.raises(ValueError):
-        test.remote()
+        test_frac_gpu.remote()
 
     with pytest.raises(ValueError):
-        Foo2._remote([], {}, resources={"Custom": 1.5})
+        Foo2._remote([], {}, resources={"TPU": 2.5})
 
 
 def test_fractional_memory_round_down(shutdown_only):