[core] Tasks with duplicate args sometimes don't get scheduled #16556

Closed
2 tasks
stephanie-wang opened this issue Jun 19, 2021 · 0 comments · Fixed by #16365
Labels
bug Something that is supposed to be working; but isn't


What is the problem?

Ray version and other system information (Python version, TensorFlow version, OS): 1.4 and earlier

The raylet overestimates the number of missing args for tasks that have duplicate args. This can lead to the task never being scheduled.
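To illustrate how overcounting missing args could leave a task stuck, here is a minimal sketch. It is a simplified model and not the raylet's actual code: the class names `NaiveArgTracker` and `DedupArgTracker` are hypothetical, and the assumed failure mode (counting one missing arg per list entry, but decrementing once per distinct object) is a guess at the kind of mismatch described above.

```python
class NaiveArgTracker:
    """Hypothetical tracker that counts one missing arg per list entry."""

    def __init__(self, args):
        # Duplicate args are counted once per occurrence (the overestimate).
        self.num_missing = len(args)
        self.waiting_on = set(args)

    def object_local(self, object_id):
        # Each distinct object only becomes local once, so the counter is
        # decremented once per object, not once per occurrence.
        if object_id in self.waiting_on:
            self.waiting_on.remove(object_id)
            self.num_missing -= 1

    def ready(self):
        return self.num_missing == 0


class DedupArgTracker:
    """Hypothetical tracker that waits on the set of distinct objects."""

    def __init__(self, args):
        self.waiting_on = set(args)

    def object_local(self, object_id):
        self.waiting_on.discard(object_id)

    def ready(self):
        return not self.waiting_on


def run(tracker_cls, args):
    """Mark every distinct arg as local and report whether the task is ready."""
    tracker = tracker_cls(args)
    for object_id in set(args):
        tracker.object_local(object_id)
    return tracker.ready()
```

With args such as `["a", "b", "a"]`, the naive tracker never reaches zero missing args even after both objects are local, while the deduplicating tracker becomes ready.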

Reproduction (REQUIRED)

Please provide a short code snippet (less than 50 lines if possible) that can be copy-pasted to reproduce the issue. The snippet should have no external library dependencies (i.e., use fake or mock data / environments):

If the code snippet cannot be run by itself, the issue will be closed with "needs-repro-script".

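import numpy as np
import ray


def test_many_args(ray_start_cluster):
    # Reproduces the hang: each task takes 25 args drawn at random from 100
    # objects, so most tasks receive the same object more than once.
    # ray_start_cluster is assumed to be the pytest fixture from Ray's tests.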
    cluster = ray_start_cluster
    object_size = int(1e6)

    # Add four single-CPU nodes; each object store has room for roughly 100
    # objects of object_size bytes.
    for _ in range(4):
        cluster.add_node(
            num_cpus=1, object_store_memory=(4 * object_size * 25))
    ray.init(address=cluster.address)

    @ray.remote
    def f(i, *args):
        print(i)
        return

    @ray.remote
    def put():
        return np.zeros(object_size, dtype=np.uint8)

    xs = [put.remote() for _ in range(100)]
    ray.wait(xs, num_returns=len(xs), fetch_local=False)
    tasks = []
    for i in range(100):
        args = [np.random.choice(xs) for _ in range(25)]
        tasks.append(f.remote(i, *args))
    ray.get(tasks)
  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.
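As a rough check on why the script above exercises duplicate args so reliably, the following sketch estimates how often a single task receives at least one repeated object. It assumes each of the 25 args is drawn independently and uniformly from the 100 objects, which matches the repeated `np.random.choice(xs)` calls; the function name `prob_duplicate_args` is hypothetical.

```python
def prob_duplicate_args(num_objects=100, num_args=25):
    """Probability that num_args uniform draws with replacement from
    num_objects items contain at least one repeated item."""
    prob_all_distinct = 1.0
    for k in range(num_args):
        # The (k+1)-th draw must avoid the k items already drawn.
        prob_all_distinct *= (num_objects - k) / num_objects
    return 1.0 - prob_all_distinct


if __name__ == "__main__":
    print(f"P(task has duplicate args) = {prob_duplicate_args():.3f}")
```

Under these assumptions the large majority of the 100 submitted tasks carry at least one duplicate arg, so the script is very likely to hit the affected code path.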
stephanie-wang added the bug, triage, and 1.4.1 labels and removed the triage label on Jun 19, 2021