[Data] Estimate object store memory from in-flight tasks (#42504)
Ray Data's streaming executor launches as many as 50 tasks in a single scheduling step. If the executor doesn't account for the potential output of in-flight tasks, it launches too many tasks (since tasks don't immediately output data) and causes spilling. This PR fixes the issue by including data buffered at the Ray Core level in the computation of topology resource usage.

Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Co-authored-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>
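The idea in the commit message can be illustrated with a minimal sketch. This is not Ray Data's implementation: `OpState`, `estimated_usage`, and `tasks_to_launch` are hypothetical names, and the per-task output estimate is an assumed input. The sketch only shows why counting the projected output of in-flight tasks caps how many tasks a single scheduling step launches.

```python
from dataclasses import dataclass


@dataclass
class OpState:
    """Hypothetical per-operator bookkeeping (not Ray Data's actual classes)."""

    buffered_bytes: int  # output already sitting in the object store
    in_flight_tasks: int  # tasks launched but not yet finished
    avg_output_bytes_per_task: int  # assumed estimate of one task's output


def estimated_usage(op: OpState) -> int:
    """Buffered output plus the projected output of in-flight tasks."""
    return op.buffered_bytes + op.in_flight_tasks * op.avg_output_bytes_per_task


def tasks_to_launch(
    op: OpState, budget_bytes: int, pending: int, max_per_step: int = 50
) -> int:
    """Launch tasks in one scheduling step until the projected usage would
    exceed the budget, the step limit is hit, or no tasks remain."""
    launched = 0
    while launched < min(pending, max_per_step):
        projected = estimated_usage(op) + op.avg_output_bytes_per_task
        if projected > budget_bytes:
            break
        op.in_flight_tasks += 1  # the new task now counts toward the estimate
        launched += 1
    return launched
```

If the estimate ignored in-flight tasks (equivalent to `avg_output_bytes_per_task = 0` here), the loop would launch every pending task in one step, which is the over-scheduling behavior the commit describes.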
1 parent fae8d2f, commit 0c0ed96
Showing 9 changed files with 109 additions and 14 deletions.
@@ -0,0 +1,43 @@

    import time

    import numpy as np
    import pytest

    import ray
    from ray._private.internal_api import memory_summary
    from ray.data._internal.execution.backpressure_policy import (
        ENABLED_BACKPRESSURE_POLICIES_CONFIG_KEY,
        StreamingOutputBackpressurePolicy,
    )


    def test_scheduler_accounts_for_in_flight_tasks(shutdown_only, restore_data_context):
        # The executor launches multiple tasks in each scheduling step. If it doesn't
        # account for the potential output of in flight tasks, it may launch too many tasks
        # and cause spilling.
        ctx = ray.init(object_store_memory=100 * 1024**2)

        ray.data.DataContext.get_current().use_runtime_metrics_scheduling = True
        ray.data.DataContext.get_current().set_config(
            ENABLED_BACKPRESSURE_POLICIES_CONFIG_KEY, [StreamingOutputBackpressurePolicy]
        )

        def f(batch):
            time.sleep(0.1)
            return {"data": np.zeros(24 * 1024**2, dtype=np.uint8)}

        # If the executor doesn't account for the potential output of in flight tasks, it
        # will launch all 8 tasks at once, producing 8 * 24MiB = 192MiB > 100MiB of data.
        ds = ray.data.range(8, parallelism=8).map_batches(f, batch_size=None)

        for _ in ds.iter_batches(batch_size=None, batch_format="pyarrow"):
            pass

        meminfo = memory_summary(ctx.address_info["address"], stats_only=True)
        assert "Spilled" not in meminfo, meminfo


    if __name__ == "__main__":
        import sys

        sys.exit(pytest.main(["-v", __file__]))
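The sizes used in the test can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the variable names are illustrative, and it assumes each task's output is exactly the 24 MiB array, ignoring any serialization or object store overhead.

```python
MiB = 1024**2

object_store_bytes = 100 * MiB  # value passed to ray.init in the test
output_bytes_per_task = 24 * MiB  # size of the np.uint8 array each task returns
num_tasks = 8  # one task per block in the test

# Worst case: every task is launched at once and all outputs coexist.
worst_case_bytes = num_tasks * output_bytes_per_task

# Largest number of task outputs that fit in the object store together,
# under the no-overhead assumption stated above.
max_outputs_in_memory = object_store_bytes // output_bytes_per_task

print(f"worst case: {worst_case_bytes // MiB} MiB")
print(f"outputs that fit at once: {max_outputs_in_memory}")
```

Under these assumptions, launching all tasks at once needs nearly twice the configured object store, so the test can only pass without spilling if the executor holds some tasks back.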