Support updating RayServices using the KubeRay API Server #633

Merged

Collaborator

@scarlet25151 scarlet25151 commented Oct 13, 2022

Why are these changes needed?

Support updating a RayService through the current API server.

curl -XPUT localhost:8888/apis/v1alpha2/namespaces/ray-system/services/test-1
{
  "name": "test-1",
  "namespace": "ray-system",
  "serveConfigs": [
      {
        "deploymentName": "PearStand",
        "replicas": 2,
        "userConfig": "price: 1",
        "actorOptions": {
          "cpusPerActor": 1.0
        }
      }
  ],
  "workerGroupUpdateSpec": [
    {
      "groupName": "small-wg",
      "replicas": 2,
      "minReplicas": 1,
      "maxReplicas": 5
    }]
}
// Response:
{
    "name": "test-1",
    "namespace": "ray-system",
    "user": "scarlet25151",
    "serveDeploymentGraphSpec": {
        "importPath": "https://github.com/ray-project/test_dag/archive/c620251044717ace0a4c19d766d43c5099af8a77.zip\"\n",
        "serveConfigs": [
            {
                "deploymentName": "RayServeModel",
                "replicas": 1,
                "routePrefix": "/model",
                "actorOptions": {
                    "cpusPerActor": 0.4
                }
            },
            {
                "deploymentName": "OrangeStand",
                "replicas": 1,
                "userConfig": "price: 2",
                "actorOptions": {
                    "cpusPerActor": 0.1
                }
            },
            {
                "deploymentName": "PearStand", // updated by the request
                "replicas": 2,
                "userConfig": "price: 1",
                "actorOptions": {
                    "cpusPerActor": 1
                }
            },
            {
                "deploymentName": "FruitMarket",
                "replicas": 1,
                "actorOptions": {
                    "cpusPerActor": 0.1
                }
            },
            {
                "deploymentName": "DAGDriver",
                "replicas": 1,
                "routePrefix": "/",
                "actorOptions": {
                    "cpusPerActor": 0.1
                }
            }
        ]
    },
    "clusterSpec": {
        "headGroupSpec": {
            "computeTemplate": "default-template",
            "image": "hub.byted.org/kuberay/ray:2.0.0",
            "serviceType": "NodePort",
            "rayStartParams": {
                "dashboard-host": "0.0.0.0",
                "metrics-export-port": "8080",
                "node-ip-address": "$MY_POD_IP",
                "port": "6379"
            }
        },
        "workerGroupSpec": [
            {
                "groupName": "small-wg",
                "computeTemplate": "default-template",
                "image": "hub.byted.org/kuberay/ray:2.0.0",
                "replicas": 2, // updated by the request
                "minReplicas": 5,
                "maxReplicas": 1,
                "rayStartParams": {
                    "node-ip-address": "$MY_POD_IP"
                }
            }
        ]
    },
    "rayServiceStatus": {
//...
}

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@scarlet25151 scarlet25151 marked this pull request as draft October 13, 2022 22:22
Collaborator

@Jeffwan Jeffwan left a comment

/cc @brucez-anyscale @simon-mo

Let's ask the Anyscale team to help review it and see if the PR makes sense.

@@ -67,6 +73,19 @@ message CreateRayServiceRequest {
string namespace = 2;
}

message UpdateRayServiceRequest {
// The ray service to be updated.
RayService service = 1;
Collaborator

Why do we need the entire object here? Is name + namespace good enough?

Collaborator Author

Updated: removed the service from the body.

Contributor

Is the service removed?

// The namespace of the ray service to be updated.
string namespace = 3;
// specify the worker group to be update
repeated WorkerGroupUpdateSpec worker_group_update_spec = 4;
Collaborator

Seems we want to use patch mode (partial update)?

Collaborator

How will you make sure it only finds the right spec and updates it? Do you plan to use the name for consistency?

Collaborator Author

Here I would like to use patch mode. The logic in util/service.go uses the worker group name and the serve config name as keys to find the matching entries, and updates an entry only if it needs to change.
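
A minimal sketch of the name-keyed partial update described in this comment, for illustration only. The types `WorkerGroupSpec` and `WorkerGroupUpdate` and the function `applyWorkerGroupUpdates` are hypothetical stand-ins, not the KubeRay types or the code in util/service.go. The sketch assumes an update overwrites only the replica fields and that updates naming unknown groups are reported rather than added.

```go
package main

import "fmt"

// WorkerGroupSpec is a simplified, hypothetical stand-in for the worker group
// fields shown in the PR description; it is not the KubeRay type.
type WorkerGroupSpec struct {
	GroupName   string
	Replicas    int32
	MinReplicas int32
	MaxReplicas int32
}

// WorkerGroupUpdate carries the fields a patch request may change (assumed).
type WorkerGroupUpdate struct {
	GroupName   string
	Replicas    int32
	MinReplicas int32
	MaxReplicas int32
}

// applyWorkerGroupUpdates returns a copy of specs in which every group whose
// name matches an update has its replica settings overwritten. Names of
// updates that match no existing group are returned separately, in request
// order, instead of being silently added.
func applyWorkerGroupUpdates(specs []WorkerGroupSpec, updates []WorkerGroupUpdate) ([]WorkerGroupSpec, []string) {
	byName := make(map[string]WorkerGroupUpdate, len(updates))
	for _, u := range updates {
		byName[u.GroupName] = u
	}

	out := make([]WorkerGroupSpec, len(specs))
	copy(out, specs) // leave the caller's slice untouched
	matched := make(map[string]bool, len(updates))
	for i, spec := range out {
		if u, ok := byName[spec.GroupName]; ok {
			out[i].Replicas = u.Replicas
			out[i].MinReplicas = u.MinReplicas
			out[i].MaxReplicas = u.MaxReplicas
			matched[spec.GroupName] = true
		}
	}

	var unmatched []string
	for _, u := range updates {
		if !matched[u.GroupName] {
			unmatched = append(unmatched, u.GroupName)
			matched[u.GroupName] = true // report each name only once
		}
	}
	return out, unmatched
}

func main() {
	specs := []WorkerGroupSpec{{GroupName: "small-wg", Replicas: 1, MinReplicas: 1, MaxReplicas: 1}}
	updates := []WorkerGroupUpdate{
		{GroupName: "small-wg", Replicas: 2, MinReplicas: 1, MaxReplicas: 5},
		{GroupName: "unknown-wg", Replicas: 1},
	}
	updated, unmatched := applyWorkerGroupUpdates(specs, updates)
	fmt.Printf("updated=%+v unmatched=%v\n", updated, unmatched)
}
```

Returning the unmatched names makes the add-versus-ignore decision explicit for the caller; whether such groups should be created is a design question this sketch does not settle.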

@scarlet25151 scarlet25151 marked this pull request as ready for review October 20, 2022 00:05
service.Spec.ServeDeploymentGraphSpec.ServeConfigSpecs = newServeConfigSpecs
}

newService, err := client.Update(ctx, service, metav1.UpdateOptions{})
Contributor

So if the service's resourceVersion changes, the update could fail?

Collaborator Author

We just leverage client-go's Update. It puts the update event directly onto the k8s work queue and waits for the previous resourceVersion to change?

Contributor

Yeah, but k8s does updates in a reconcile loop, so if one fails it is retried in the next loop. This API call is a one-time call. Do we want to retry here, or should the caller retry on their side?

}
}
for i, spec := range workerGroupSpecs {
if updateSpec, ok := specMap[spec.GroupName]; ok {
Contributor

This is update logic. What if updateSpecs contains new worker groups? Do we support adding new worker groups?

Collaborator Author

Good question. For now, if a request adds or deletes a worker group by name, is it better to actually add/delete it or just keep the old one? I think we need to support adding/deleting worker groups. WDYT, @Jeffwan?

Contributor

RayService should be able to handle any spec update. So here we just need a design for the HTTP request: how to support add/delete/update.

}
for i, spec := range serveConfigSpecs {
if updateSpec, ok := specMap[spec.Name]; ok {
newSpec := updateServeConfigSpec(updateSpec, spec)
Contributor

Same here: do we support adding a serve config? And how do we support deleting one?

@brucez-anyscale
Contributor

Thanks for the PR. I left some comments.

@scarlet25151
Collaborator Author

Hi @Jeffwan @brucez-anyscale, I've made a new modification that supports two types of APIs for updating the RayService:

  1. curl -XPUT {BaseURL}/apis/v1alpha2/namespaces/{namespace}/services/{name}
    In this case, we can make any change to the RayService spec, including adding to or deleting from the RayCluster spec. The request body is the complete RayService definition, and we update the service with it.
  2. curl -XPATCH {BaseURL}/apis/v1alpha2/namespaces/{namespace}/services/{name}
    This case is for our downstream users who only modify the serveConfig sections and the RayCluster worker replicas, without a new RayCluster being started under the hood.

This may be tricky, because RayService currently does only a simple check of whether the RayCluster spec should be updated and, if so, recreates the cluster. I will create an issue to discuss this in detail.

@DmitriGekhtman
Collaborator

@brucez-anyscale if you have a spare moment to re-review this, that would be great
(I imagine you might not have a spare moment these days...)

@brucez-anyscale
Contributor

It is fine. I think we can merge and improve on it from here.

@DmitriGekhtman DmitriGekhtman merged commit 599e74b into ray-project:master Nov 5, 2022
@DmitriGekhtman DmitriGekhtman changed the title from "support update rayservice" to "Support updating RayServices using the KubeRay API Server" Nov 5, 2022
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
Support updating Ray Services using the KubeRay ApiServer.

Co-authored-by: chenyu.jiang <chenyu.jiang@bytedance.com>