[Data] Ray fails when using map_groups function and Gpus as Actors #42340

Closed
shivraj95 opened this issue Jan 11, 2024 · 3 comments · Fixed by #45305
Labels: bug, data, P1

Comments


shivraj95 commented Jan 11, 2024

Description

Calling map_groups with num_gpus > 0 raises the following exception:

ValueError(
"batch_size must be provided to map_batches when requesting GPUs. "
"The optimal batch size depends on the model, data, and GPU used. "
"It is recommended to use the largest batch size that doesn't result "
"in your GPU device running out of memory. You can view the GPU memory "
"usage via the Ray dashboard."
)

from this code block.
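One plausible reading of the error, which the issue itself does not confirm, is that map_groups hands each group to the same batch-level code path as map_batches without a batch size, and the GPU check in that path rejects the call. The sketch below is a toy model in plain Python: the functions `map_batches` and `map_groups` here are hypothetical stand-ins written for illustration, not Ray's source code, and the abbreviated error text is only a placeholder for the message quoted above.

```python
# Toy model only: hypothetical stand-ins, NOT Ray's implementation.

# Abbreviated placeholder for the error message quoted in the issue.
GPU_BATCH_SIZE_ERROR = (
    "batch_size must be provided to map_batches when requesting GPUs."
)


def map_batches(fn, rows, *, batch_size=None, num_gpus=0):
    """Apply fn to fixed-size batches; demand an explicit batch_size for GPU work."""
    if num_gpus > 0 and batch_size is None:
        raise ValueError(GPU_BATCH_SIZE_ERROR)
    size = batch_size or len(rows) or 1
    return [fn(rows[i:i + size]) for i in range(0, len(rows), size)]


def map_groups(fn, rows, key, *, num_gpus=0):
    """Apply fn once per group; a group is a whole batch, so no batch_size is forwarded."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    results = []
    for group in groups.values():
        # Assumed failure mode: the caller has no way to supply batch_size here.
        results.extend(map_batches(fn, group, batch_size=None, num_gpus=num_gpus))
    return results


def first_value(group):
    """Return the first value seen in a group."""
    return {"result": group[0]["value"]}


rows = [
    {"group_id": 1, "value": 1},
    {"group_id": 1, "value": 2},
    {"group_id": 2, "value": 3},
    {"group_id": 2, "value": 4},
]

print(map_groups(first_value, rows, "group_id"))  # CPU path succeeds

try:
    map_groups(first_value, rows, "group_id", num_gpus=1)
except ValueError as exc:
    print("raised:", exc)
```

Under this assumption the user cannot avoid the error, because map_groups in the toy model exposes no batch_size argument to pass through.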

Versions / Dependencies

This bug occurs on all Python and Ray versions as of the posting date.

Reproduction script

This must be run on a cluster with GPU workers.

import numpy as np
import ray

ds = ray.data.from_items([
    {"group_id": 1, "value": 1},
    {"group_id": 1, "value": 2},
    {"group_id": 2, "value": 3},
    {"group_id": 2, "value": 4}])

# Requesting a GPU (num_gpus=1) is what triggers the ValueError.
ds.groupby("group_id").map_groups(lambda g: {"result": np.array([g["value"][0]])}, num_gpus=1)

Issue Severity

High: It blocks me from completing my task.

shivraj95 added the bug and triage labels on Jan 11, 2024
bveeramani added the P1, data, and ray 2.10 labels and removed the triage label on Jan 12, 2024
bveeramani self-assigned this on Jan 12, 2024
bveeramani (Member) commented

Ah, yeah. I was able to reproduce this.

Shouldn't be too hard to fix.

bveeramani added the P0 label and removed the P1 label on Jan 16, 2024
shivraj95 (Author) commented

Great, thanks! Look forward to the fix :)

bveeramani removed their assignment on Jan 19, 2024
anyscalesam (Collaborator) commented

@bveeramani ETA on fix? cc @c21

bveeramani added the P1 label and removed the P0 and ray 2.10 labels on Feb 6, 2024
bveeramani removed their assignment on Feb 6, 2024
bveeramani self-assigned this on May 13, 2024