[REP] Ray on spark autoscaling #43
Conversation
@jjyao --- LGTM, no high level concerns.
Will continue later.
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
How do we ensure the head node's availability? Does spark terminate the worker node?

We don't. This is out of the autoscaling scope, but it is in the scope of head node HA.

Yes. We launch / terminate Ray worker nodes by starting / canceling spark jobs. Internally, a script starts the Ray worker node as a subprocess and monitors for its parent process dying; if the parent process dies, the script kills the Ray worker node process and all subprocesses of the Ray worker node.
Looks pretty good to me. Just some questions for my own learning.
Integrate the autoscaling feature into the existing `ray.util.spark.cluster_init.setup_ray_cluster` API. The following new arguments are added:

- autoscale: bool (default False)
Would we ever want to start an autoscaling cluster with a non-zero worker count?
In my test, launching a Ray worker node is very fast, so I think starting with zero worker nodes should fulfill most use-cases. It also gives optimal resource utilization. If a user requires a minimum worker count > 0, we can support that in the future.
We launch a spark job with one spark task to hold each Ray worker node, and we set a unique spark job group ID on that spark job. When we need to terminate the Ray worker node, we cancel the corresponding spark job by canceling its spark job group, so that the spark job and its spark task are killed, which triggers the Ray worker node termination.
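The launch/terminate flow above can be sketched with PySpark's job-group API. This is a minimal sketch, not the actual Ray on spark implementation: the function names, and running the worker launcher via `subprocess`, are illustrative assumptions.

```python
import subprocess
import threading


def launch_ray_worker(spark, job_group_id, start_cmd):
    """Launch one Ray worker node as a single-task Spark job tagged with a
    unique job group ID. (Illustrative sketch, not the real implementation.)"""
    def _run():
        sc = spark.sparkContext
        # Job groups are thread-local in Spark, so set the group inside the
        # thread that actually submits the job.
        sc.setJobGroup(job_group_id, "Ray worker node", interruptOnCancel=True)
        # A one-partition RDD yields exactly one Spark task; that task execs
        # the start_ray_node.py-style launcher command.
        sc.parallelize([0], numSlices=1).foreach(
            lambda _: subprocess.check_call(start_cmd))

    t = threading.Thread(target=_run, daemon=True)
    t.start()
    return t


def terminate_ray_worker(spark, job_group_id):
    # Cancelling the job group kills the Spark task, which in turn triggers
    # the Ray worker node's parent-death cleanup (see below).
    spark.sparkContext.cancelJobGroup(job_group_id)
```

Each worker gets its own job group ID, so workers can be terminated individually without touching other spark jobs in the same session.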
When a Ray worker node is terminated, how does that work? Does it kill the raylet, or does it go through something like `ray stop`?
See #43 (comment)
You can see this script: https://github.com/WeichenXu123/ray/blob/autoscale-prototyping/python/ray/util/spark/start_ray_node.py

We launch a Ray worker node by:

- launching a spark job with only one spark task
- the spark task launches the Ray worker node by executing the `start_ray_node.py` script
- when we need to kill the Ray worker node, we kill the spark job, so the spark task is killed; the `start_ray_node.py` script has a `check_parent_alive` thread, and once it detects that its parent process (i.e. the spark task process) has died, it triggers the Ray worker kill routine (killing all processes related to the Ray worker node, cleaning the temp directory if this is the last Ray worker node killed on that spark worker node, etc.)
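The `check_parent_alive` idea can be sketched in plain Python. This is a minimal sketch under an assumption: teardown is modeled as signaling the worker's process group, while the real `start_ray_node.py` does more cleanup (temp directories, per-node bookkeeping).

```python
import os
import signal
import threading
import time


def start_parent_watchdog(poll_interval=1.0):
    """Sketch of a check_parent_alive thread: poll the parent PID and tear
    the Ray worker node down once the parent (the Spark task process) dies."""
    original_ppid = os.getppid()

    def _watch():
        while True:
            # When the parent dies, this process is re-parented (e.g. to
            # PID 1), so getppid() no longer matches the Spark task's PID.
            if os.getppid() != original_ppid:
                # Hypothetical teardown: signal the whole process group so
                # the raylet and its children exit, then exit ourselves.
                os.killpg(os.getpgrp(), signal.SIGTERM)
                os._exit(1)
            time.sleep(poll_interval)

    watcher = threading.Thread(target=_watch, daemon=True,
                               name="check_parent_alive")
    watcher.start()
    return watcher
```

Polling the parent PID rather than relying on `ray stop` means the cleanup fires even if the spark task is killed abruptly, which matches the cancel-the-job-group termination path described above.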
Gotcha, thanks!
we have to have one spark job for each Ray worker node.

One critical point is that the spark node provider runs in the autoscaler process, which is a different process than the one that executes the `setup_ray_cluster` API. The user calls `setup_ray_cluster` in
Is it possible to run this spark application in the autoscaler node provider process?
For Databricks users, we can't: once a Databricks notebook is attached to a spark cluster, the spark session is created in the notebook REPL, the user has to call the `setup_ray_cluster` API in that REPL, and the Ray autoscaler has to reuse the existing spark session.
#### We can use the `ray up` command to set up a Ray cluster with autoscaling. Why don't we call the `ray up` command in the Ray on spark autoscaling implementation?

In Ray on spark, we only provide a python API, `setup_ray_cluster`, and it does not have a CLI. So in the `setup_ray_cluster` implementation, we need to generate the autoscaling config YAML file from the `setup_ray_cluster` argument values, and then launch the Ray head node with the `--autoscaling-config` option. This way, the Ray on spark code can manage the Ray head node process and the Ray worker nodes more easily.
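The config-generation step could look roughly like this. It is a sketch: the field names follow the Ray autoscaler cluster-config schema, but the provider module path is a hypothetical placeholder, and the file is written as JSON (a subset of YAML) to keep the sketch dependency-free.

```python
import json
import os
import tempfile


def generate_autoscaling_config(max_workers, worker_cpus, worker_gpus,
                                path=None):
    """Sketch of building an autoscaler config from setup_ray_cluster
    arguments. The provider "module" below is a placeholder name."""
    config = {
        "cluster_name": "ray-on-spark",
        "max_workers": max_workers,
        "provider": {
            "type": "external",
            "module": "ray.util.spark.SparkNodeProvider",  # placeholder
        },
        "available_node_types": {
            "ray.head": {
                "resources": {"CPU": 0},  # keep tasks off the head node
                "node_config": {},
                "max_workers": 0,
            },
            "ray.worker": {
                "resources": {"CPU": worker_cpus, "GPU": worker_gpus},
                "node_config": {},
                "min_workers": 0,  # proposed default: scale from zero
                "max_workers": max_workers,
            },
        },
        "head_node_type": "ray.head",
    }
    if path is None:
        fd, path = tempfile.mkstemp(suffix=".json")
        os.close(fd)
    # JSON is valid YAML, so this file can be passed to the head node
    # as the "--autoscaling-config" argument.
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return path, config
```

The generated path would then be passed to `ray start --head --autoscaling-config=<path>`, which is the step the answer above describes.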
So with these 2 constraints, it is still possible to support `ray up` with some config checks and extra work in the `ray up` config generation routine, right? Is there a long-term plan to eventually support the `ray up` CLI?
Yes, we can consider migrating the code to use `ray up` in the future, if there are strong reasons and benefits.
#### How do we make the `NodeProvider` backend support multiple Ray worker nodes running on the same virtual machine?

By default, `NodeProvider` implementations implement the `internal_ip` and `external_ip` methods by converting a `node_id` to an IP, and different nodes must have different IP addresses,
Yeah, this is something we are factoring away, likely in Ray 2.9 / 2.10 by the end of this year. The changed semantics will use an instance ID that can be defined by the node provider.
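One way a provider could decouple node ids from IPs, so several Ray worker nodes can share one VM, is a composite node id. This is a sketch only: the `<ip>-<index>` encoding is an assumption for illustration, not the actual Ray on spark scheme.

```python
# Hypothetical node-id scheme: "<ip>-<index>" identifies the index-th Ray
# worker node on the Spark worker VM at that IP, so several nodes can map
# back to one shared host IP.
def internal_ip(node_id: str) -> str:
    """Recover the shared host IP from a composite node id."""
    ip, _, _index = node_id.rpartition("-")
    return ip
```

With this kind of encoding, two workers on one VM ("10.0.0.5-0" and "10.0.0.5-1") remain distinct node ids while `internal_ip` still returns a valid address for both.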
Good!
#### The Ray autoscaler supports setting multiple Ray worker groups; each Ray worker group has its own CPU / GPU / memory resource configuration and its own minimum / maximum worker counts for autoscaling. Shall we support this feature in Ray on spark autoscaling?

Current use-cases only require all Ray worker nodes to have the same shape,
Curious, what would the extra work or limitations be to support multiple node types?
We can support it by adding some code to Ray on spark (extending the autoscaler config file generation to emit multiple worker groups), but we haven't received related requests from our Databricks customers, so we are not implementing the feature for now.
#### What default Ray on spark minimum worker number should we use?

I propose to set it to zero. Ray worker node launching is very quick, and setting it to
Curious, how quick will this be?
In my test on the Databricks platform, it only took several seconds to start a Ray worker node.
Gotcha. I guess that's because it's just starting another process plus some RPC calls on the same node?
Merging since the vote has passed. Thanks @WeichenXu123 for the contribution!