Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug][apiserver] fix apiserver create rayservice missing serve port #734

Conversation

scarlet25151
Copy link
Collaborator

Why are these changes needed?

The rayservice need indicate the serve port for exposing the ray serve service, in the default raycluster template we are missing the serve port, here we need add it at the service part.

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@scarlet25151 scarlet25151 self-assigned this Nov 17, 2022
@scarlet25151 scarlet25151 added the bug Something isn't working label Nov 17, 2022
@scarlet25151 scarlet25151 added this to the v0.4.0 release milestone Nov 17, 2022
Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank @scarlet25151 for your contribution! In addition, do you mind adding some unit tests or describing more details about the manual tests? Thank you!

newRayClusterSpec := *buildRayClusterSpec(rayServiceDefaultVersion, nil, apiService.ClusterSpec, computeTemplateMap)
newRayClusterSpec.HeadGroupSpec.Template.Spec.Containers[0].Ports = append(newRayClusterSpec.HeadGroupSpec.Template.Spec.Containers[0].Ports, v1.ContainerPort{
Name: defaultServePortName,
ContainerPort: defaultServePort,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with the API server. Are there any use cases in that users want to use another port (e.g. 9000 in #570)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RayCluster controller's behavior includes the serve port in the head service by default.

For consistency with the RayCluster controller's behavior, I think we should include this with the rest of the ports in buildRayClusterSpec.
Alternatively, don't include the ports in buildRayClusterSpec. The RayCluster controller will configure them automatically.

func getServicePorts(cluster rayiov1alpha1.RayCluster) map[string]int32 {

Copy link
Collaborator

@DmitriGekhtman DmitriGekhtman Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevin85421 I think configurability of the port via the KubeRay API server is a separate issue.
(Though as you've pointed out, some users do need to configure the port, so that configuration should indeed be exposed at some point.)

The point of this PR is just to make it possible to configure a working RayService with the KubeRay API server.

Copy link
Collaborator Author

@scarlet25151 scarlet25151 Nov 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DmitriGekhtman Thank you for your feedback and it's a great discussion, I was originally thinking we should add the serve port in the buildRayClusterSpec because in kuberay-apiserver the create server api will leverage the buildRayClusterSpec to populate the template of a ray service. and for now the missing of serve port will lead to an error that the rayservice cannot expose the serve. in rayservice-controller it will create a ray service with no serve port and we cannot access the rayservice that create by kuberay-apiserver.
image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The things is that we may not only use the buildRayClusterSpec in building a ray service, but also in creating jobs:

clusterSpec = buildRayClusterSpec(rayJobDefaultVersion, nil, apiJob.ClusterSpec, computeTemplateMap)

or building ray cluster,
Spec: *buildRayClusterSpec(apiCluster.Version, apiCluster.Envs, apiCluster.ClusterSpec, computeTemplateMap),

in those cases, the serve port may not be necessary and it may conflict with the configuration there. So I add it in the build service layer and add a comment in apiserver/pkg/util/cluster.go:L138

@DmitriGekhtman
Copy link
Collaborator

If the KubeRay API server has any unit tests, it would be good to add a unit test for this PR.
(If there are no unit tests, you should track adding unit tests.)

@scarlet25151 scarlet25151 force-pushed the fix/apiserver-create-rayservice-missing-serve-port branch from 103826d to be5894f Compare November 22, 2022 06:16
@scarlet25151
Copy link
Collaborator Author

Thank @scarlet25151 for your contribution! In addition, do you mind adding some unit tests or describing more details about the manual tests? Thank you!

@kevin85421 Thank you for your advise, I think as creating ray cluster there is a UT, but for creating rayjob and rayservice, we do not have UT yet, I think we can create an issue to add the UT and I'm willing to contribute on that part.

Talking back to the manually test we can follow the instruction in https://github.com/ray-project/kuberay/tree/master/apiserver#create-ray-service-in-a-given-namespace and I use that case to create a rayservice,

curl --location --request POST '10.203.96.39:31888/apis/v1alpha2/namespaces/ray-system/services' \
--header 'Content-Type: application/json' \
--data'{
  "name": "chenyu-test-1",
  "namespace": "ray-dev",
  "user": "chenyu.jiang",
  "serveDeploymentGraphSpec": {
      "importPath": "serve.ray_serve_model",
      "runtimeEnv": "https://github.com/ray-project/test_dag/archive/c620251044717ace0a4c19d766d43c5099af8a77.zip\"\n",
      "serveConfigs": [
      {
        "deploymentName": "OrangeStand",
        "replicas": 1,
        "userConfig": "price: 2",
        "actorOptions": {
          "cpusPerActor": 0.1
        }
      },
      {
        "deploymentName": "PearStand",
        "replicas": 1,
        "userConfig": "price: 1",
        "actorOptions": {
          "cpusPerActor": 0.1
        }
      },
      {
        "deploymentName": "FruitMarket",
        "replicas": 1,
        "actorOptions": {
          "cpusPerActor": 0.1
        }
      },{
        "deploymentName": "DAGDriver",
        "replicas": 1,
        "routePrefix": "/",
        "actorOptions": {
          "cpusPerActor": 0.1
        }
      }]
  },
  "clusterSpec": {
    "headGroupSpec": {
      "computeTemplate": "default-template",
      "image": "hub.byted.org/kuberay/ray:2.0.0",
      "serviceType": "NodePort",
      "rayStartParams": {
            "port": "6379",
            "node-ip-address": "$MY_POD_IP",
            "dashboard-host": "0.0.0.0",
            "metrics-export-port": "8080"
        },
       "volumes": [] 
    },
    "workerGroupSpec": [
      {
        "groupName": "small-wg",
        "computeTemplate": "default-template",
        "image": "hub.byted.org/kuberay/ray:2.0.0",
        "replicas": 1,
        "minReplicas": 0,
        "maxReplicas": 5,
        "rayStartParams": {
                "node-ip-address": "$MY_POD_IP"
            }
      }
    ]
  }
}'

Copy link
Collaborator

@DmitriGekhtman DmitriGekhtman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me.

Please open an issue tracking unit testing for RayService configuration.

Co-authored-by: Dmitri Gekhtman <62982571+DmitriGekhtman@users.noreply.github.com>
Signed-off-by: Chenyu Jiang <38214590+scarlet25151@users.noreply.github.com>
@Jeffwan
Copy link
Collaborator

Jeffwan commented Nov 23, 2022

@scarlet25151 Can you create an issue to track UT per Dmitri's request?

@scarlet25151
Copy link
Collaborator Author

@scarlet25151 Can you create an issue to track UT per Dmitri's request?

Yes, already created #754

@wilsonwang371
Copy link
Collaborator

wilsonwang371 commented Nov 23, 2022

overall, this LGTM.

Discussed with @scarlet25151 offline. regarding @DmitriGekhtman 's comment, @scarlet25151 will add test cases in a separate patch.

@wilsonwang371 wilsonwang371 merged commit c9802e9 into ray-project:master Nov 23, 2022
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
…ay-project#734)

* fix apiserver create rayservice missing serve port

* Update apiserver/pkg/util/cluster.go

Co-authored-by: Dmitri Gekhtman <62982571+DmitriGekhtman@users.noreply.github.com>
Signed-off-by: Chenyu Jiang <38214590+scarlet25151@users.noreply.github.com>

Signed-off-by: Chenyu Jiang <38214590+scarlet25151@users.noreply.github.com>
Co-authored-by: chenyu.jiang <chenyu.jiang@bytedance.com>
Co-authored-by: Dmitri Gekhtman <62982571+DmitriGekhtman@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants