[DNM] macos py14 wheel 2.55.1 #62777
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Code Review
This pull request updates the Ray version to 2.55.1 across the repository and introduces an experimental centralized capacity queue request router for Ray Serve. It also refactors the Ray Serve autoscaling delay logic to use wall-clock timestamps instead of iteration counts, ensuring consistent behavior regardless of control loop duration. Additionally, exponential backoff is implemented for ACTOR_UNAVAILABLE task errors, and potential overflow issues in the backoff utility are fixed. CI and Docker changes address OpenSSL version mismatches and support one-off builds. Review feedback identifies performance concerns in the experimental router regarding expensive GCS queries and O(N) list operations, and recommends avoiding non-deterministic apt-get upgrades in Dockerfiles.
I am having trouble creating individual review comments. My feedback follows.
docker/base-slim/Dockerfile (41)
Using apt-get upgrade in a Dockerfile is generally discouraged as it makes the build non-deterministic and can lead to unexpected breakages when upstream packages are updated. It is better to pin specific package versions or rely on updated base images to ensure reproducible builds.
docker/base-slim/Dockerfile (151)
Using apt-get upgrade in a Dockerfile is generally discouraged as it makes the build non-deterministic. It is better to pin specific package versions or rely on updated base images.
python/ray/serve/experimental/capacity_queue.py (114)
If token_ttl_s is set to a very small value, this will result in a very frequent background task execution, potentially impacting performance. Consider enforcing a minimum interval for the TTL reaper.
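One way to enforce such a minimum is to clamp the reaper interval before scheduling the background task. The sketch below is not taken from capacity_queue.py: the helper name, the 0.5 s floor, and the assumption that the interval is derived as half the TTL are all hypothetical.

```python
# Hypothetical floor for the reaper interval; the real value would need tuning.
MIN_REAP_INTERVAL_S = 0.5


def reaper_interval_s(token_ttl_s: float,
                      min_interval_s: float = MIN_REAP_INTERVAL_S) -> float:
    """Return how often the TTL reaper should run.

    Assumes (hypothetically) that the reaper runs at half the token TTL,
    but never more often than min_interval_s.
    """
    if token_ttl_s <= 0:
        raise ValueError("token_ttl_s must be positive")
    return max(token_ttl_s / 2, min_interval_s)
```

With this clamp, a TTL of 0.01 s would still reap only every 0.5 s, at the cost of tokens living slightly longer than their nominal TTL.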
python/ray/serve/experimental/capacity_queue.py (247)
Reconstructing the _waiters deque using a generator expression is an O(N) operation. Since _fulfill_waiters already checks if future.done(): continue, you might consider relying on lazy cleanup or using a data structure that supports more efficient removal to avoid blocking the actor.
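The lazy-cleanup option could look roughly like the following. This is a minimal sketch, not the actual CapacityQueue code: the WaiterQueue class and its method names are hypothetical, and it only assumes that waiters are future-like objects stored in a deque.

```python
from collections import deque


class WaiterQueue:
    """Hypothetical stand-in for the actor's waiter bookkeeping."""

    def __init__(self):
        self._waiters = deque()

    def add(self, future):
        self._waiters.append(future)

    def fulfill(self, tokens: int) -> int:
        """Grant up to `tokens` waiters and return how many were granted.

        Cancelled or timed-out futures are not removed eagerly. They are
        skipped and dropped here when they reach the head of the deque,
        so no full rebuild of the deque is needed.
        """
        granted = 0
        while granted < tokens and self._waiters:
            future = self._waiters.popleft()
            if future.done():
                continue  # stale waiter, discarded lazily
            future.set_result(True)
            granted += 1
        return granted
```

The trade-off is that stale futures stay in memory until they reach the front of the queue, which is usually acceptable when waiters are short-lived.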
python/ray/serve/experimental/capacity_queue_router.py (100)
ray.util.list_named_actors(all_namespaces=True) is an expensive operation, especially in large clusters, as it queries the GCS for all named actors across all namespaces. Since the CapacityQueue actor name follows a predictable pattern, it would be more efficient to construct the expected name and use ray.get_actor directly. If the exact name is not known, consider a more efficient discovery mechanism.
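A direct lookup might be sketched as below. ray.get_actor(name, namespace=...) is an existing Ray API that raises ValueError when no such named actor exists. The actor name pattern "CAPACITY_QUEUE::<deployment>" and the default "serve" namespace are assumptions for illustration, not values confirmed by this PR, and the get_actor parameter is a hypothetical injection point added so the function can be exercised without a running cluster.

```python
from typing import Any, Callable, Optional


def find_capacity_queue(deployment_name: str,
                        namespace: str = "serve",
                        get_actor: Optional[Callable[..., Any]] = None) -> Optional[Any]:
    """Look up the CapacityQueue actor by its expected name.

    Returns the actor handle, or None if the actor does not exist.
    """
    if get_actor is None:
        import ray  # imported lazily so the sketch loads without Ray installed
        get_actor = ray.get_actor

    # Hypothetical naming pattern; the real one lives in the router code.
    actor_name = f"CAPACITY_QUEUE::{deployment_name}"
    try:
        return get_actor(actor_name, namespace=namespace)
    except ValueError:
        # ray.get_actor raises ValueError when the named actor is not found.
        return None
```

This replaces a cluster-wide listing with a single keyed lookup, provided the name can be constructed deterministically.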
No description provided.