Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for rayserve in apiserver #456

Merged
merged 3 commits into from
Aug 26, 2022

Conversation

scarlet25151
Copy link
Collaborator

@scarlet25151 scarlet25151 commented Aug 10, 2022

Why are these changes needed?

Same with ray job, we need to add support for CRUD of ray service in apiserver.

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@scarlet25151 scarlet25151 marked this pull request as draft August 10, 2022 23:33
@scarlet25151 scarlet25151 force-pushed the apiserver/support-ray-serve branch 3 times, most recently from 65d153d to cf76d08 Compare August 16, 2022 21:06
@scarlet25151 scarlet25151 marked this pull request as ready for review August 16, 2022 21:08
@scarlet25151
Copy link
Collaborator Author

For now, test on local environment has success and ready for review.

Copy link
Contributor

@brucez-anyscale brucez-anyscale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. lgtm. Please make sure the tests pass and let @Jeffwan approve

proto/serve.proto Show resolved Hide resolved
proto/serve.proto Outdated Show resolved Hide resolved
double gpu = 3;
int32 memory = 4;
int32 object_store_memory = 5;
string resource = 6;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the resource field for?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this part here is aligned with the apis of services

type RayActorOptionSpec struct {
RuntimeEnv string `json:"runtimeEnv,omitempty"`
NumCpus *float64 `json:"numCpus,omitempty"`
NumGpus *float64 `json:"numGpus,omitempty"`
Memory *int32 `json:"memory,omitempty"`
ObjectStoreMemory *int32 `json:"objectStoreMemory,omitempty"`
Resources string `json:"resources,omitempty"`
AcceleratorType string `json:"acceleratorType,omitempty"`

the resource definition here is custom define resource in the ray serve schema

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: This is actually the custom resource? I feel ray serve's definition is not that clear. As a user, I may be confused on cpu,memory,gpu and resources here. I can not figure out what to put. Do you have the same feeling? If we can use different name to clarify that, that would be great.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I've replace the name with cpu_per_actor ... and add more comments for this field.

proto/serve.proto Outdated Show resolved Hide resolved
apiserver/pkg/model/converter.go Show resolved Hide resolved
apiserver/pkg/util/service.go Outdated Show resolved Hide resolved
@Jeffwan
Copy link
Collaborator

Jeffwan commented Aug 17, 2022

@scarlet25151

  1. Could you give a comparison between RayServe yaml and HTTP POST payload and that's easier to compare.
  2. Let's also make sure docs are updated.
  3. I do see few TODO items there, are they blockers for RayServe endpoint? If so, let's add the status summary in the PR description

@scarlet25151
Copy link
Collaborator Author

@Jeffwan

  1. Could you give a comparison between RayServe yaml and HTTP POST payload and that's easier to compare.

yes, I will add it to the top of the conversation.

  1. Let's also make sure docs are updated.

this part is not updated yet, I will update in the third commit.

  1. I do see few TODO items there, are they blockers for RayServe endpoint? If so, let's add the status summary in
    the PR description

here is one things I would like to discuss, should we use the raycluster's ingress or rayservice ingress here?

// TODO: When start Ingress in RayService, we can disable the Ingress from RayCluster.
func (r *RayServiceReconciler) reconcileIngress(ctx context.Context, rayServiceInstance *rayv1alpha1.RayService, rayClusterInstance *rayv1alpha1.RayCluster) error {
if rayClusterInstance.Spec.HeadGroupSpec.EnableIngress == nil || !*rayClusterInstance.Spec.HeadGroupSpec.EnableIngress {

@scarlet25151
Copy link
Collaborator Author

The json body of the request would be like:

{
  "name": "chenyu-test",
  "user": "chenyu.jiang",
  "serveDeploymentGraphSpec": {
      "importPath": "fruit.deployment_graph",
      "runtimeEnv": "working_dir: \"https://github.com/ray-project/test_dag/archive/c620251044717ace0a4c19d766d43c5099af8a77.zip\"",
      "serveConfigs": [{
        "deploymentName": "MangoStand",
        "replicas": 1,
        "userConfig": "price: 3",
        "actorOptions": {
          "cpus": 0.1
        }
      },
      {
        "deploymentName": "OrangeStand",
        "replicas": 1,
        "userConfig": "price: 2",
        "actorOptions": {
          "cpus": 0.1
        }
      },
      {
        "deploymentName": "PearStand",
        "replicas": 1,
        "userConfig": "price: 1",
        "actorOptions": {
          "cpus": 0.1
        }
      },
      {
        "deploymentName": "FruitMarket",
        "replicas": 1,
        "actorOptions": {
          "cpus": 0.1
        }
      },{
        "deploymentName": "DAGDriver",
        "replicas": 1,
        "routePrefix": "/",
        "actorOptions": {
          "cpus": 0.1
        }
      }]
  },
  "clusterSpec": {
  // ...
  }
}

which is compatible to the example of the yaml:

deploymentUnhealthySecondThreshold: 300 # Config for the health check threshold for deployments. Default value is 60.
serveConfig:
importPath: fruit.deployment_graph
runtimeEnv: |
working_dir: "https://github.com/ray-project/test_dag/archive/c620251044717ace0a4c19d766d43c5099af8a77.zip"
deployments:
- name: MangoStand
numReplicas: 1
userConfig: |
price: 3
rayActorOptions:
numCpus: 0.1
- name: OrangeStand
numReplicas: 1
userConfig: |
price: 2
rayActorOptions:
numCpus: 0.1
- name: PearStand
numReplicas: 1
userConfig: |
price: 1
rayActorOptions:
numCpus: 0.1
- name: FruitMarket
numReplicas: 1
rayActorOptions:
numCpus: 0.1
- name: DAGDriver
numReplicas: 1
routePrefix: "/"
rayActorOptions:
numCpus: 0.1

@scarlet25151 scarlet25151 force-pushed the apiserver/support-ray-serve branch 2 times, most recently from e89323d to 54d1d46 Compare August 19, 2022 20:58
@scarlet25151 scarlet25151 mentioned this pull request Aug 19, 2022
4 tasks
@scarlet25151 scarlet25151 self-assigned this Aug 19, 2022
@Jeffwan
Copy link
Collaborator

Jeffwan commented Aug 20, 2022

here is one things I would like to discuss, should we use the raycluster's ingress or rayservice ingress here?

Do you think rayserve should leverage raycluster's ingress to accept external traffic?

@scarlet25151
Copy link
Collaborator Author

here is one things I would like to discuss, should we use the raycluster's ingress or rayservice ingress here?

Do you think rayserve should leverage raycluster's ingress to accept external traffic?

I see, here is an outdate comment, for now I've implement in the way to poulate response with active rayservice cluster's ingress and endpoints .

@Jeffwan Jeffwan merged commit c478669 into ray-project:master Aug 26, 2022
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
* add support for rayserve in apiserver

* add implementation for rayservice in apiserver

* rename the actor config

Co-authored-by: chenyu.jiang <chenyu.jiang@bytedance.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants