add support for rayserve in apiserver #456

scarlet25151 · 2022-08-10T23:33:17Z

Why are these changes needed?

As with RayJob, we need to add support for CRUD operations on RayService in the apiserver.

Related issue number

Checks

I've made sure the tests are passing.
Testing Strategy
- Unit tests
- Manual tests
- This PR is not tested :(

scarlet25151 · 2022-08-16T21:16:20Z

Testing in my local environment has succeeded, and this is ready for review.

brucez-anyscale

thanks. lgtm. Please make sure the tests pass and let @Jeffwan approve

Jeffwan · 2022-08-17T18:15:18Z

proto/serve.proto

+  double gpu = 3;
+  int32 memory = 4;
+  int32 object_store_memory = 5;
+  string resource = 6;


what's the resource field for?

This part is aligned with the RayService API types:

kuberay/ray-operator/apis/ray/v1alpha1/rayservice_types.go

Lines 56 to 63 in fdee883

type RayActorOptionSpec struct {
	RuntimeEnv string `json:"runtimeEnv,omitempty"`
	NumCpus *float64 `json:"numCpus,omitempty"`
	NumGpus *float64 `json:"numGpus,omitempty"`
	Memory *int32 `json:"memory,omitempty"`
	ObjectStoreMemory *int32 `json:"objectStoreMemory,omitempty"`
	Resources string `json:"resources,omitempty"`
	AcceleratorType string `json:"acceleratorType,omitempty"`

The resource field here is the custom-defined resource from the Ray Serve schema.

Minor: is this actually the custom resource? I feel Ray Serve's definition is not that clear. As a user, I may be confused by cpu, memory, gpu, and resources here, and I cannot figure out what to put where. Do you have the same feeling? If we can use different names to clarify that, that would be great.

Yes, I've replaced the names with cpu_per_actor ... and added more comments for this field.
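
To make the renaming concrete, here is a minimal Go sketch of how per-actor options from an API request might be converted into the operator's RayActorOptionSpec. Only cpu_per_actor is named in this thread; the other request field names, the ActorOptions type, and the toRayActorOptionSpec helper are assumptions for illustration, and the spec struct is a local copy of the fields quoted above rather than an import of the real package. Treating a zero value as "not set" is also an assumption.

```go
package main

import "fmt"

// RayActorOptionSpec is a local copy of a subset of the fields quoted above.
type RayActorOptionSpec struct {
	NumCpus           *float64
	NumGpus           *float64
	Memory            *int32
	ObjectStoreMemory *int32
	Resources         string
}

// ActorOptions is a hypothetical stand-in for the request message.
// Only the cpu_per_actor name comes from the thread; the rest are guesses.
type ActorOptions struct {
	CpuPerActor               float64
	GpuPerActor               float64
	MemoryPerActor            int32
	ObjectStoreMemoryPerActor int32
	CustomResource            string
}

// toRayActorOptionSpec copies request values into the spec, leaving the
// pointer fields nil when the request value is zero (assumed to mean unset).
func toRayActorOptionSpec(in ActorOptions) RayActorOptionSpec {
	out := RayActorOptionSpec{Resources: in.CustomResource}
	if in.CpuPerActor != 0 {
		v := in.CpuPerActor
		out.NumCpus = &v
	}
	if in.GpuPerActor != 0 {
		v := in.GpuPerActor
		out.NumGpus = &v
	}
	if in.MemoryPerActor != 0 {
		v := in.MemoryPerActor
		out.Memory = &v
	}
	if in.ObjectStoreMemoryPerActor != 0 {
		v := in.ObjectStoreMemoryPerActor
		out.ObjectStoreMemory = &v
	}
	return out
}

func main() {
	spec := toRayActorOptionSpec(ActorOptions{CpuPerActor: 0.1})
	fmt.Println(*spec.NumCpus, spec.NumGpus == nil)
}
```

Keeping the custom resource as a separate string field mirrors the Resources string in the quoted struct, which is what makes a distinct, clearly named request field useful.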

proto/serve.proto

apiserver/pkg/model/converter.go

apiserver/pkg/util/service.go

Jeffwan · 2022-08-17T18:21:16Z

@scarlet25151

Could you put the RayServe YAML and the HTTP POST payload side by side? That would make them easier to compare.
Let's also make sure the docs are updated.
I do see a few TODO items there. Are they blockers for the RayServe endpoint? If so, let's add a status summary to the PR description.

scarlet25151 · 2022-08-17T19:18:30Z

@Jeffwan

Could you put the RayServe YAML and the HTTP POST payload side by side? That would make them easier to compare.

Yes, I will add it to the top of the conversation.

Let's also make sure the docs are updated.

This part is not updated yet; I will update it in the third commit.

I do see a few TODO items there. Are they blockers for the RayServe endpoint? If so, let's add a status summary to the PR description.

Here is one thing I would like to discuss: should we use the RayCluster's ingress or the RayService's ingress here?

kuberay/ray-operator/controllers/ray/rayservice_controller.go

Lines 601 to 603 in fdee883

// TODO: When start Ingress in RayService, we can disable the Ingress from RayCluster.
func (r *RayServiceReconciler) reconcileIngress(ctx context.Context, rayServiceInstance *rayv1alpha1.RayService, rayClusterInstance *rayv1alpha1.RayCluster) error {
	if rayClusterInstance.Spec.HeadGroupSpec.EnableIngress == nil || !*rayClusterInstance.Spec.HeadGroupSpec.EnableIngress {
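
The condition on the last quoted line is the usual nil-safe check on an optional *bool: it is true whenever ingress is not explicitly enabled. A minimal sketch of the same pattern, using a hypothetical headGroupSpec type and ingressEnabled helper rather than the operator's real types:

```go
package main

import "fmt"

// headGroupSpec is a hypothetical stand-in that keeps only the optional flag.
type headGroupSpec struct {
	EnableIngress *bool
}

// ingressEnabled reports true only when the flag is present and set to true.
// It is the negation of the condition quoted above.
func ingressEnabled(spec headGroupSpec) bool {
	return spec.EnableIngress != nil && *spec.EnableIngress
}

func main() {
	on := true
	fmt.Println(ingressEnabled(headGroupSpec{}))                    // flag unset
	fmt.Println(ingressEnabled(headGroupSpec{EnableIngress: &on})) // flag set
}
```

Checking for nil before dereferencing matters because an omitted optional field is decoded as a nil pointer, not as false.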

scarlet25151 · 2022-08-19T19:06:31Z

The JSON body of the request would look like this:

{
  "name": "chenyu-test",
  "user": "chenyu.jiang",
  "serveDeploymentGraphSpec": {
    "importPath": "fruit.deployment_graph",
    "runtimeEnv": "working_dir: \"https://github.com/ray-project/test_dag/archive/c620251044717ace0a4c19d766d43c5099af8a77.zip\"",
    "serveConfigs": [
      {
        "deploymentName": "MangoStand",
        "replicas": 1,
        "userConfig": "price: 3",
        "actorOptions": {
          "cpus": 0.1
        }
      },
      {
        "deploymentName": "OrangeStand",
        "replicas": 1,
        "userConfig": "price: 2",
        "actorOptions": {
          "cpus": 0.1
        }
      },
      {
        "deploymentName": "PearStand",
        "replicas": 1,
        "userConfig": "price: 1",
        "actorOptions": {
          "cpus": 0.1
        }
      },
      {
        "deploymentName": "FruitMarket",
        "replicas": 1,
        "actorOptions": {
          "cpus": 0.1
        }
      },
      {
        "deploymentName": "DAGDriver",
        "replicas": 1,
        "routePrefix": "/",
        "actorOptions": {
          "cpus": 0.1
        }
      }
    ]
  },
  "clusterSpec": {
    // ...
  }
}

which corresponds to the YAML example:

kuberay/ray-operator/config/samples/ray_v1alpha1_rayservice.yaml

Lines 11 to 43 in 40ea21c

  deploymentUnhealthySecondThreshold: 300 # Config for the health check threshold for deployments. Default value is 60.
  serveConfig:
    importPath: fruit.deployment_graph
    runtimeEnv: |
      working_dir: "https://github.com/ray-project/test_dag/archive/c620251044717ace0a4c19d766d43c5099af8a77.zip"
    deployments:
      - name: MangoStand
        numReplicas: 1
        userConfig: |
          price: 3
        rayActorOptions:
          numCpus: 0.1
      - name: OrangeStand
        numReplicas: 1
        userConfig: |
          price: 2
        rayActorOptions:
          numCpus: 0.1
      - name: PearStand
        numReplicas: 1
        userConfig: |
          price: 1
        rayActorOptions:
          numCpus: 0.1
      - name: FruitMarket
        numReplicas: 1
        rayActorOptions:
          numCpus: 0.1
      - name: DAGDriver
        numReplicas: 1
        routePrefix: "/"
        rayActorOptions:
          numCpus: 0.1
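
The field correspondence between the request body and the sample YAML (deploymentName to name, replicas to numReplicas, actorOptions.cpus to rayActorOptions.numCpus) can be sketched in Go as follows. This is an illustration only: the serveConfig and deployment types and the toDeployments helper are hypothetical and are not the apiserver's actual converter; only the JSON keys and YAML field names are taken from the two examples above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serveConfig mirrors one entry of "serveConfigs" in the request body.
type serveConfig struct {
	DeploymentName string `json:"deploymentName"`
	Replicas       int32  `json:"replicas"`
	RoutePrefix    string `json:"routePrefix,omitempty"`
	UserConfig     string `json:"userConfig,omitempty"`
	ActorOptions   struct {
		Cpus float64 `json:"cpus"`
	} `json:"actorOptions"`
}

// deployment mirrors one entry of "deployments" in the sample YAML.
type deployment struct {
	Name        string
	NumReplicas int32
	RoutePrefix string
	UserConfig  string
	NumCpus     float64
}

// toDeployments decodes the request body and renames each field to its
// YAML counterpart.
func toDeployments(body []byte) ([]deployment, error) {
	var req struct {
		Graph struct {
			ServeConfigs []serveConfig `json:"serveConfigs"`
		} `json:"serveDeploymentGraphSpec"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	out := make([]deployment, 0, len(req.Graph.ServeConfigs))
	for _, c := range req.Graph.ServeConfigs {
		out = append(out, deployment{
			Name:        c.DeploymentName,
			NumReplicas: c.Replicas,
			RoutePrefix: c.RoutePrefix,
			UserConfig:  c.UserConfig,
			NumCpus:     c.ActorOptions.Cpus,
		})
	}
	return out, nil
}

func main() {
	body := []byte(`{"serveDeploymentGraphSpec": {"serveConfigs": [
		{"deploymentName": "MangoStand", "replicas": 1, "userConfig": "price: 3", "actorOptions": {"cpus": 0.1}}
	]}}`)
	ds, err := toDeployments(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", ds)
}
```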

Jeffwan · 2022-08-20T23:31:32Z

Here is one thing I would like to discuss: should we use the RayCluster's ingress or the RayService's ingress here?

Do you think rayserve should leverage raycluster's ingress to accept external traffic?

scarlet25151 · 2022-08-25T19:21:26Z

Here is one thing I would like to discuss: should we use the RayCluster's ingress or the RayService's ingress here?

Do you think rayserve should leverage raycluster's ingress to accept external traffic?

I see. That is an outdated comment; for now I've implemented it so that the response is populated with the active RayService cluster's ingress and endpoints.

* add support for rayserve in apiserver
* add implementation for rayservice in apiserver
* rename the actor config

Co-authored-by: chenyu.jiang <chenyu.jiang@bytedance.com>

scarlet25151 marked this pull request as draft August 10, 2022 23:33

scarlet25151 force-pushed the apiserver/support-ray-serve branch 3 times, most recently from 65d153d to cf76d08 Compare August 16, 2022 21:06

scarlet25151 marked this pull request as ready for review August 16, 2022 21:08

scarlet25151 requested review from Jeffwan and brucez-anyscale August 16, 2022 21:08

scarlet25151 force-pushed the apiserver/support-ray-serve branch from cf76d08 to c108120 Compare August 16, 2022 21:12

brucez-anyscale approved these changes Aug 16, 2022

Jeffwan reviewed Aug 17, 2022

scarlet25151 force-pushed the apiserver/support-ray-serve branch from c108120 to 5053de6 Compare August 19, 2022 19:00

scarlet25151 force-pushed the apiserver/support-ray-serve branch 2 times, most recently from e89323d to 54d1d46 Compare August 19, 2022 20:58

scarlet25151 mentioned this pull request Aug 19, 2022

apiserver add new api docs #498

Merged

4 tasks

scarlet25151 force-pushed the apiserver/support-ray-serve branch from 54d1d46 to 71a75f3 Compare August 19, 2022 22:53

scarlet25151 self-assigned this Aug 19, 2022

chenyu.jiang added 2 commits August 22, 2022 14:02

add support for rayserve in apiserver

7723e3b

add implementation for rayservice in apiserver

386776f

scarlet25151 force-pushed the apiserver/support-ray-serve branch from 71a75f3 to 5ddf134 Compare August 23, 2022 00:09

rename the actor config

527dbdd

scarlet25151 force-pushed the apiserver/support-ray-serve branch from 5ddf134 to 527dbdd Compare August 25, 2022 19:11

Jeffwan approved these changes Aug 26, 2022

Jeffwan merged commit c478669 into ray-project:master Aug 26, 2022
