[Feature] KubeRay Scalability Benchmarking #2069

Open
2 tasks done
andrewsykim opened this issue Apr 5, 2024 · 3 comments
Labels: enhancement (New feature or request)

Comments

andrewsykim commented Apr 5, 2024

Search before asking

  • I searched the issues and found no similar feature request.

Description

In last week's KubeRay community meeting, we discussed kicking off work to benchmark KubeRay and Ray on different aspects of scalability.

The end result should be something like:

  1. A simple tool that creates Kubernetes clusters and RayClusters and runs benchmarking tests against them
  2. Published benchmark results from those test runs

As a bonus step, it would be great to set up periodic runs of the scalability tests to catch possible performance regressions.

As a starting point I would like to propose the following metrics to measure:

  • Max total Ray nodes in a single Kubernetes cluster (over X Ray clusters)
  • Time to scale up a Ray cluster to Xk nodes
  • Max RayJob resources (should be on the order of hundreds to a thousand, depending on the size of the job)
  • Some latency / QPS benchmarks for inference with Ray Serve
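
To make the "time to scale up" metric concrete, here is a minimal sketch of how it could be measured. It is not an existing KubeRay tool: `time_to_scale` and its `get_ready_nodes` callback are hypothetical names, and how "ready" is defined (Ready worker pods, alive Ray nodes, etc.) is an assumption left to the caller.

```python
import time
from typing import Callable, Optional


def time_to_scale(
    get_ready_nodes: Callable[[], int],   # hypothetical callback: current ready node count
    target_nodes: int,
    timeout_s: float = 3600.0,
    poll_interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until the ready node count reaches target_nodes.

    Returns the elapsed seconds, or None if timeout_s passes first.
    The result is only accurate to within poll_interval_s.
    """
    start = clock()
    while True:
        if get_ready_nodes() >= target_nodes:
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(poll_interval_s)
```

In a real run, `get_ready_nodes` might count Ready worker pods through the Kubernetes API or count alive entries returned by `ray.nodes()`; the `clock` and `sleep` parameters exist only so the loop can be exercised without waiting.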

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
andrewsykim added the enhancement and triage labels on Apr 5, 2024
andrewsykim commented Apr 5, 2024

For reference, some work has already been done in this area, but it primarily focuses on memory scalability: https://docs.ray.io/en/latest/cluster/kubernetes/benchmarks/memory-scalability-benchmark.html#kuberay-mem-scalability

@kevin85421
cc @morhidi

@andrewsykim
This week is Google Cloud NEXT, but @kevin85421, @morhidi, and I plan to meet sometime next week to kick off this work.

If you have any ideas or feedback on what areas of scalability you would like us to test, please leave a note in this issue.
