[Data] Ray can't reconstruct inputs if Python garbage collects input references #46282

Closed
bveeramani opened this issue Jun 26, 2024 · 0 comments · Fixed by #46191
Labels: bug (Something that is supposed to be working; but isn't), data (Ray Data-related issues), P0 (Issues that should be fixed in short order)

Comments

@bveeramani
Member

What happened + What you expected to happen

I ran a batch inference job on spot instances. When GCP interrupted some instances, I expected Ray Data to recover, but instead my program errored with a message saying that the input objects are missing.

The issue might be caused by the InputDataBuffer physical operator dropping its references to input objects:

    def _get_next_inner(self) -> RefBundle:
        return self._input_data.pop(0)

When we pop the object reference, the reference goes out of scope, and Ray might garbage collect the object. So, when the object is later needed to reconstruct an output, Ray can't find the input object.
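To illustrate the suspected mechanism without a cluster, here is a minimal pure-Python sketch. It does not use Ray: `FakeRef`, `PoppingBuffer`, `RetainingBuffer`, and `survives_after_consume` are hypothetical names made up for this example, and the retaining variant is only one possible way to keep references alive, not necessarily what #46191 does. It also assumes CPython's reference counting.

```python
import gc
import weakref


class FakeRef:
    """Hypothetical stand-in for an object reference (not a Ray class)."""

    def __init__(self, name):
        self.name = name


class PoppingBuffer:
    """Mirrors the pattern above: pop(0) gives up the buffer's reference."""

    def __init__(self, refs):
        self._input_data = list(refs)

    def get_next(self):
        return self._input_data.pop(0)


class RetainingBuffer:
    """Hypothetical alternative: advance an index and keep every reference."""

    def __init__(self, refs):
        self._input_data = list(refs)
        self._next = 0

    def get_next(self):
        ref = self._input_data[self._next]
        self._next += 1
        return ref


def survives_after_consume(buffer_cls):
    """Return True if the input is still alive after downstream drops it."""
    ref = FakeRef("block-0")
    tracker = weakref.ref(ref)  # observes the object without keeping it alive
    buf = buffer_cls([ref])
    del ref                     # the buffer now holds the only strong reference
    out = buf.get_next()
    del out                     # downstream consumer is done with the reference
    gc.collect()
    return tracker() is not None


print(survives_after_consume(PoppingBuffer))
print(survives_after_consume(RetainingBuffer))
```

Under these assumptions, the popping buffer lets the input be collected as soon as the consumer drops it, while the retaining buffer keeps it reachable for as long as the buffer itself lives. The trade-off is that retained references hold on to their objects longer.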

Versions / Dependencies

bd9dc16

Reproduction script

Difficult to reproduce.

Issue Severity

High: It blocks me from completing my task.

bveeramani added the bug, P0, and data labels on Jun 26, 2024
bveeramani self-assigned this on Jun 26, 2024
bveeramani added a commit that referenced this issue Jun 26, 2024
See #46282.

Signed-off-by: Balaji Veeramani <balaji@anyscale.com>