-
Notifications
You must be signed in to change notification settings - Fork 347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RayJob CRD and controller logic #303
Conversation
f190e01
to
8473077
Compare
Thanks for the contribution! I will have a check today |
8473077
to
10248a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems some core logic in job controller is not ready yet. I left some comments and I will come back later for another round of review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Please update the naming in job controller and do some manual integration test to ensure it works.
@@ -0,0 +1,7 @@ | |||
# The following patch adds a directive for certmanager to inject CA into the CRD | |||
apiVersion: apiextensions.k8s.io/v1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@harryge00 Did you get a chance to update the PR and address the comments? We are close to v0.3.0 release and we plan to include this feature in the release. |
@Jeffwan @brucez-anyscale I have just updated this PR, could u take another review? |
@harryge00 Could you help fix the linter and build issue? |
434f624
to
591246c
Compare
591246c
to
b53dce1
Compare
I think the polished version is close to complete. Great work! Can you help address my above comments @harryge00 |
ray-operator/config/crd/patches/cainjection_in_ray_rayjobs.yaml
Outdated
Show resolved
Hide resolved
We should at least add unit tests. Also please manually test it in EKS or GKE or kind cluster. |
1272c3d
to
9c724da
Compare
9c724da
to
9b7970d
Compare
@harryge00 Please help fix the build issue and lint issues. The logics look good to me and we can file follow up PRs to address potential issue. |
|
||
// Set rayClusterName and rayJobId first, to avoid duplicate submission | ||
err = r.setRayJobIdAndRayClusterNameIfNeed(ctx, rayJobInstance) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brucez-anyscale
#303 (comment)
I now save cluster name and job Id before reconciling rayJob. In this case, rayCluster will not be created duplicately because only the first updating of status will succeed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you update status in setRayJobIdAndRayClusterNameIfNeed
, you should better just return this reconcile loop.
Or you should get rayJobInstance
again, since the resource version
of rayJobInstance has changed.
0d4a827
to
85ca646
Compare
85ca646
to
d715ffc
Compare
9b668f2
to
6fd234e
Compare
@Jeffwan @brucez-anyscale Could you take another review? I have address the remaining issues |
@harryge00 Awesome! There're few issues need to be improved later, I can help on it
|
dashboardAgentService.Name, | ||
dashboardAgentService.Namespace, | ||
dashboardPort) | ||
log.V(1).Info("fetchDashboardAgentURL ", "dashboardURL", dashboardAgentURL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.V(1).Info("fetchDashboardAgentURL ", "dashboardAgentURL", dashboardAgentURL)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brucez-anyscale Seem this is the key change dashboardURL -> dashboardAgentURL? I can help update it in follow up PRs and let's merge this one. (We need this change in downstream to unblock other stories..)
Great! Generally looks good. Pls address my comments and then feel free to merge. |
* Generated RayJob CRD * Add the RayJob controller * Update vendor * Update generated code * Add unit tests * Refactor rayJob controller * Update to pass CI
Why are these changes needed?
Add RayJob CRD and its controller.
Includes:
This PR borrows a lot from @brucez-anyscale , thanks so much.
Related issue number
#106
Checks