
[pull] master from ray-project:master #826

Merged
pull[bot] merged 7 commits into garymm:master from ray-project:master
Mar 14, 2026

Conversation


@pull pull bot commented Mar 14, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

Sparks0219 and others added 7 commits March 13, 2026 14:14
…61663)

#61210 changed some log lines that were used to detect when the memory
pressure monitor killed a worker, causing the memory pressure test to
consistently fail, since the test conditions waited for those lines to
appear. This updates the memory pressure test to wait on the new log
lines instead.
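
The wait-on-log-lines pattern the test relies on can be sketched as a small polling helper (hypothetical names; not Ray's actual test utilities, and the log message below is illustrative only):

```python
import time


def wait_for_log_line(read_log, substring, timeout_s=30.0, poll_s=0.5):
    """Poll read_log() until substring appears in its output, or time out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if substring in read_log():
            return True
        time.sleep(poll_s)
    raise TimeoutError(f"log line {substring!r} never appeared")


# Example with a fake log source that already contains the expected line.
log = ["worker killed: node running low on memory"]
assert wait_for_log_line(lambda: "\n".join(log), "low on memory", timeout_s=1.0)
```

The point of the fix is that the `substring` argument must track whatever the monitor actually emits; when the log wording changes, a test built this way starts timing out instead of failing loudly.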

Signed-off-by: Joshua Lee <joshlee@anyscale.com>
…ests (#61668)

java_test targets do not produce a _deploy.jar in Bazel 7+. Add a
companion java_binary (all_tests_bin) that produces
all_tests_bin_deploy.jar in both Bazel 6 and Bazel 7, and update all
references accordingly.

The java_binary includes //cpp:counter.so and //cpp:plus.so as resources
so that CrossLanguageInvocationTest.getResourceAsStream("/cpp/counter.so")
finds them in the deploy jar classpath.

Signed-off-by: andrew <andrew@anyscale.com>
## Description

1. Inline `ActorPoolResizingPolicy`
2. Rebase `_ActorPool` to compute utilization based on all actors, not
just running ones
3. Allow the autoscaler to scale up while pending actors are still
starting
4. Update tests
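
A rough sketch of the utilization change in item 2 (hypothetical field names; the real `_ActorPool` differs): counting pending actors in the denominator keeps reported utilization below saturation while actors are starting, which is what lets the autoscaler in item 3 keep scaling up.

```python
from dataclasses import dataclass


@dataclass
class PoolSnapshot:
    running_actors: int
    pending_actors: int           # scheduled but not yet ready
    active_tasks: int
    max_tasks_per_actor: int = 1


def utilization(s: PoolSnapshot) -> float:
    # Old behavior: divide by running actors only.
    # New behavior (sketched): divide by all actors, running + pending.
    total = s.running_actors + s.pending_actors
    if total == 0:
        return 0.0
    return s.active_tasks / (total * s.max_tasks_per_actor)


# 4 busy running actors with 4 more pending: utilization is 0.5, not 1.0,
# so the pool does not look saturated while actors are still starting.
print(utilization(PoolSnapshot(running_actors=4, pending_actors=4, active_tasks=4)))
```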



---------

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
`test_data_parallel_trainer::test_config_accelerator_type` has been
timing out in CI ([Buildkite
#61885](https://buildkite.com/ray-project/premerge/builds/61885#019ce2b6-7daa-4dfb-932c-d687cc33edac)).
This PR deflakes the test by replacing the expensive 6-node
heterogeneous cluster with a single-node `ray.init` cluster and reducing
the parameter space from 6 cases to 2. This cuts runtime significantly
while preserving the core coverage of the `accelerator_type` scheduling
constraint.

Signed-off-by: JasonLi1909 <jasli1909@gmail.com>
… store budget with outputs (#61605)" (#61729)

Reverts #61605. That PR, which more strictly caps the object store
memory budget per operator, caused regressions in the batch inference
benchmark `image_classification_fixed_size` and the training ingest
`preserve_order=True` benchmarks.
…#61374)

## Description
This PR reduces the number of calls to `_try_schedule_one` that were
causing the autoscaler to hang. It lowers the time complexity of fitting
resource requests onto in-flight nodes by grouping requests by their
shape. Currently, the v2 scheduler evaluates every individual request
against every node, for a time complexity of approximately O(N^2*M).

By using `SerializeToString(deterministic=True)` to generate a
deterministic hash, we cache infeasible request shapes per node. If a
shape fails to fit on a given node, the scheduler now skips the
expensive `_try_schedule_one` check for all subsequent identical
requests on that node.
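
A simplified sketch of the per-node infeasible-shape cache (hypothetical names; the real scheduler hashes protobufs with `SerializeToString(deterministic=True)`, for which a sorted-tuple key stands in here):

```python
def schedule(requests, nodes, try_schedule_one):
    """Greedily place requests, caching shapes known not to fit per node."""
    infeasible = [set() for _ in nodes]  # per-node cache of failed shape keys
    placements = []
    for req in requests:
        # Stand-in for a deterministic serialized hash of the request shape.
        shape_key = tuple(sorted(req.items()))
        for i, node in enumerate(nodes):
            if shape_key in infeasible[i]:
                continue  # identical shape already failed here: skip the check
            if try_schedule_one(req, node):
                placements.append((req, i))
                break
            infeasible[i].add(shape_key)
    return placements


calls = [0]


def try_schedule_one(req, node):
    calls[0] += 1
    if all(node.get(k, 0) >= v for k, v in req.items()):
        for k, v in req.items():
            node[k] -= v
        return True
    return False


# One node with 2 free CPUs; 100 identical infeasible requests cost a
# single probe instead of 100.
node = {"CPU": 2}
requests = [{"CPU": 1}, {"CPU": 1}] + [{"CPU": 4}] * 100
placed = schedule(requests, [node], try_schedule_one)
print(len(placed), calls[0])  # 2 placements, 3 probes
```

The cache is sound in this loop because node resources only shrink as requests are placed: a shape that failed to fit once can never fit later on the same node.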

This PR includes a unit test in `test_scheduler.py` to verify that the
caching logic correctly short-circuits redundant evaluations; a manual
test is included under Additional information.

## Related issues
[#3794](ray-project/kuberay#3794)

## Additional information
Can verify the optimization by running the below test on a RayCluster
with Autoscaler V2 enabled:
```python
import ray
import time
import logging

logging.getLogger("ray").setLevel(logging.DEBUG)


@ray.remote
def ten_minute_task(task_id):
    # Busy-loop for ~5 minutes (300 s) to keep the cluster saturated.
    start = time.time()
    while time.time() - start < 300:
        _ = sum(i * i for i in range(10000))
        time.sleep(0.1)
    return task_id


def main():
    # 4000 concurrent tasks force the autoscaler to evaluate many
    # identical resource requests against in-flight nodes.
    tasks = [ten_minute_task.remote(i) for i in range(4000)]
    ray.get(tasks)


if __name__ == "__main__":
    main()
```

---------

Signed-off-by: ryanaoleary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: Rueian <rueiancsie@gmail.com>
#61731)

When a deployment starts up and replicas are scheduled but not yet
RUNNING (`current_num_replicas=0`), the autoscaling policy runs with
`total_num_requests=0`. The cold start fast path returns `None` (no
traffic), so the core policy returns `target_num_replicas` and it flows
into `_apply_scaling_factors`.

The scaling formula is: `ceil(current + factor * (desired - current))`

When `current=0`, this becomes `ceil(factor * desired)`, which amplifies
the entire target as if it were growth. Combined with the delay bypass
for `current==0`, this compounds every tick:

| Tick | target\_in | formula | target\_out |
|------|-----------|---------|------------|
| 0 | 2 | `ceil(2.0 × 2)` | **4** |
| 1 | 4 | `ceil(2.0 × 4)` | **8** |
| 2 | 8 | `ceil(2.0 × 8) → 16, clamped` | **10 (max)** |

In 3 ticks, with zero traffic, a `min_replicas=2, max_replicas=10,
upscaling_factor=2.0` deployment scales to `max_replicas`.
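
The amplification in the table above can be reproduced in a few lines (illustrative only; not Serve's actual code):

```python
import math


def apply_scaling_factor(current, desired, factor):
    # The formula from above: ceil(current + factor * (desired - current)).
    return math.ceil(current + factor * (desired - current))


targets = []
target, max_replicas, factor = 2, 10, 2.0
for _ in range(3):
    # With current_num_replicas == 0 the formula degenerates to
    # ceil(factor * target), doubling the target every tick.
    target = min(apply_scaling_factor(0, target, factor), max_replicas)
    targets.append(target)
print(targets)  # [4, 8, 10]
```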

This was introduced in #60851 which removed the cold start fallback
(`return ctx.target_num_replicas` when `current==0` and no traffic) so
that custom policies like `AsyncInferenceAutoscalingPolicy` could detect
queue work. That change was correct for custom policies but exposed the
default policy to the amplification loop.

## Fix

Skip scaling factor amplification when `current_num_replicas == 0` in
`_apply_scaling_factors`. Scaling factors control the *rate of change
from a baseline* — when there is no baseline, amplifying the full target
as delta is incorrect. The cold start fast path already handles the
`current==0` with-traffic case separately (applying `upscaling_factor`
once), so this is consistent.
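
The guard can be sketched as (hypothetical standalone function; the real `_apply_scaling_factors` lives inside Serve's autoscaling policy):

```python
import math


def apply_scaling_factors(current, desired, factor):
    # Fix: with no baseline (current == 0) there is no rate of change
    # to amplify, so pass the policy's target through unchanged.
    if current == 0:
        return desired
    return math.ceil(current + factor * (desired - current))


# Cold start with no traffic: the target is not amplified.
assert apply_scaling_factors(0, 2, 2.0) == 2
# With a baseline, the factor scales the delta as before.
assert apply_scaling_factors(2, 4, 2.0) == 6
```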

This preserves the async inference scale-from-zero behavior: custom
policies still run, return their desired value (e.g. `1` for queue work,
`0` for idle), and the delay bypass lets legitimate scale-ups through
immediately.

Signed-off-by: abrar <abrar@anyscale.com>
@pull pull bot locked and limited conversation to collaborators Mar 14, 2026
@pull pull bot added the ⤵️ pull label Mar 14, 2026
@pull pull bot merged commit 495220a into garymm:master Mar 14, 2026