
[Core] It is not allowed to specify both num_cpus and num_gpus for map tasks #33908

Open
v4if opened this issue Mar 30, 2023 · 9 comments

Labels: data (Ray Data-related issues), enhancement (Request for new feature and/or capability), P1 (Issue that should be fixed within a few weeks)

v4if commented Mar 30, 2023

What happened + What you expected to happen

It is not allowed to specify both num_cpus and num_gpus for map tasks. When only num_gpus is specified, num_cpus appears to default to 1, so actors stay pending due to insufficient CPU resources. However, GPU compute is often the performance bottleneck of the pipeline. How can actor concurrency be increased while GPU resources are still available?
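
For context, a minimal sketch of the call that gets rejected (adapted from the reproduction script below; the values are illustrative, and the exact exception type is not shown in this issue, only the error message in the title):

import ray


class Identity:
    def __call__(self, batch):
        return batch


ds = ray.data.range_table(100)
# Combining both resource arguments in one map_batches call fails with
# "It is not allowed to specify both num_cpus and num_gpus for map tasks".
ds = ds.map_batches(
    Identity,
    num_cpus=0.5,
    num_gpus=0.01,
    compute="actors",
)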

ray status

 {'CPU': 1.0, 'GPU': 0.01}: 4+ pending tasks/actors

run log

Resource usage vs limits: 16.0/16.0 CPU, 0.2/1.0 GPU, 0.0 MiB/13.49 GiB object_store_memory 0:   0%|                                    | 0/1 [14:11<?, ?it/s]
ReadRange: 16 active, 8598 queued 1:  14%|██████████▊                                                                   | 1386/10000 [14:11<01:06, 130.05it/s]
MapBatches(ModelPredict): 30 active, 0 queued, 16 actors (4 pending) [0 locality hits, 1386 misses] 2:  14%|█▍         | 1356/10000 [14:25<1:08:53,  2.09it/s]
output: 0 queued 3:  14%|████████████▋                                                                                 | 1356/10000 [14:25<1:08:56,  2.09it/s]

Versions / Dependencies

ray, version 3.0.0.dev0

cluster_resources

{'memory': 256000000000.0, 'node:172.18.0.196': 1.0, 'object_store_memory': 57921323827.0, 'GPU': 1.0, 'accelerator_type:T4': 1.0, 'node:172.16.1.16': 1.0, 'CPU': 16.0}

Reproduction script

import ray
import time


class ModelPredict:
    def __call__(self, df):
        # Stand-in for GPU inference.
        time.sleep(10)
        return df


ds = ray.data.range_table(10000, parallelism=10000)
ds = ds.map_batches(
    ModelPredict,
    # num_cpus=0.5,  # uncommenting this together with num_gpus triggers the error in the title
    num_gpus=0.01,
    compute="actors",
    batch_size=1,
)
for batch in ds.iterator().iter_batches(batch_size=1):
    ...

Issue Severity

High: It blocks me from completing my task.

v4if added the bug (Something that is supposed to be working; but isn't) and triage (Needs triage: priority, bug/not-bug, and owning component) labels on Mar 30, 2023
hora-anyscale added the core (Issues that should be addressed in Ray Core) label on Mar 31, 2023
hora-anyscale changed the title from "[Datasets] It is not allowed to specify both num_cpus and num_gpus for map tasks" to "[Core] It is not allowed to specify both num_cpus and num_gpus for map tasks" on Mar 31, 2023
hora-anyscale added the P1 (Issue that should be fixed within a few weeks) label and removed the triage label on Mar 31, 2023
clarng (Contributor) commented Mar 31, 2023

This seems to be a Ray Data issue, since the resources are specified through the Datasets API, which uses Ray Core internally.

clarng added the data (Ray Data-related issues) label and removed the core (Issues that should be addressed in Ray Core) label on Mar 31, 2023
choiikkyu commented Apr 28, 2023

Same issue for me. Did you solve it?

@msminhas93

Any update on this @hora-anyscale @clarng?

@hora-anyscale (Contributor)

cc: @xieus

anyscalesam added the enhancement (Request for new feature and/or capability) label and removed the bug (Something that is supposed to be working; but isn't) label on Nov 8, 2023
sdcope3 commented Jan 12, 2024

Any progress on this issue? Is the implication that if num_gpus is defined, the associated task is constrained to 1 CPU?

@seastar105

@raulchen Any progress on this issue? Or is there an alternative way to map workers to a fractional GPU plus several CPUs?

@danickzhu

@raulchen GPU utilization is bottlenecked by num_cpus (currently 1) for the mapper task. Do you have any suggestions?

Superskyyy (Contributor) commented Jul 5, 2024

I believe this is intentional behavior to avoid deadlocks, but there could be workarounds. I'm planning to look into it in July.
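
While this is unresolved, a minimal sketch of one possible direction (not an endorsed workaround; the pool size and resource fractions below are hypothetical) is to run the GPU stage through Ray Core actors directly, since ray.remote does allow requesting both a CPU and a GPU fraction on one actor:

import ray

# Unlike map_batches at the time of this issue, ray.remote accepts both
# a CPU and a GPU fraction for a single actor.
@ray.remote(num_cpus=0.5, num_gpus=0.01)
class Predictor:
    def predict(self, batch):
        # Placeholder for real GPU inference.
        return batch

actors = [Predictor.remote() for _ in range(8)]  # hypothetical pool size
futures = [actors[i % len(actors)].predict.remote({"x": i}) for i in range(32)]
results = ray.get(futures)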

pzdkn commented Aug 13, 2024

Is it possible to use placement_groups here?

I tried:

from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

pg = ray.util.placement_group([{"CPU": 1}, {"GPU": 1}] * num_workers, strategy="PACK")
predictions = ds_val.map_batches(predictor_cls, scheduling_strategy=PlacementGroupSchedulingStrategy(pg, placement_group_capture_child_tasks=True))

It seems, however, that the resources are not available to the actor.
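
One variant that might be worth trying (a sketch only, not something confirmed in this thread; num_workers, ds_val, and predictor_cls are the names from the snippet above, and the 0.01 GPU fraction is taken from the reproduction script) is to put the CPU and the GPU fraction into the same bundle, so that a single actor's request can fit inside one bundle, and to wait for the group to be ready before mapping:

import ray
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

# One bundle per worker, with the CPU and the GPU fraction together.
pg = ray.util.placement_group([{"CPU": 1, "GPU": 0.01}] * num_workers, strategy="PACK")
ray.get(pg.ready())  # wait until the bundles are reserved

predictions = ds_val.map_batches(
    predictor_cls,
    num_gpus=0.01,  # per-actor GPU fraction, matching the bundle above
    scheduling_strategy=PlacementGroupSchedulingStrategy(
        pg, placement_group_capture_child_tasks=True
    ),
)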
