Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add enableIngress validation check in RayCluster webhook #26

Merged

Conversation

ChristianZaccaria
Copy link

@ChristianZaccaria ChristianZaccaria commented Apr 10, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR adds validation to check if the RayCluster headGroupSpec contains the enableIngress field OR if it is set to true, and if conditions are met, the webhook will forbid the creation of the RayCluster resource.

Acceptance criteria:

  • As a data scientist, I cannot create a RayCluster resource with EnableIngress set to true
  • As a data scientist, I cannot create a RayCluster resource when EnableIngress is not specified
  • As a data scientist, I can create a Ray cluster via the CodeFlare SDK

Which issue(s) this PR fixes:

Jira: https://issues.redhat.com/browse/RHOAIENG-5116

Special notes for your reviewer:

Verification Steps:

  1. Login to your OpenShift cluster.
  2. Run KubeRay v1.1.0 - This could work
  3. Run the CFO locally against the cluster with make run NAMESPACE=somenamespace
  4. Run Kueue - To test this change:
    • You would need to build and push a new Kueue image i.e., :
    IMAGE_REGISTRY=quay.io/yourusername make image-local-push
    
    • Make that new quay repository public.
    • Then replace the params.env file with your new image i.e.: quay.io/yourusername/kueue:v0.6.0-rhoai-2.9-tests-1-gdb3e81af-dirty
    • Deploy Kueue with kubectl apply --server-side -k config/rhoai
  5. Create ClusterQueue, LocalQueue, and ResourceFlavor if not done already.
  6. Install the SDK from this PR: Add enableIngress and set to false by default project-codeflare/codeflare-sdk#506
  7. Attempt to create a RayCluster in a notebook with:
cluster = Cluster(ClusterConfiguration(
    name='jobtest',
    namespace='default', # or localqueue namespace
    num_workers=1,
    min_cpus=1,
    max_cpus=1,
    min_memory=2,
    max_memory=2,
    num_gpus=0,
    write_to_file=True,
    image="quay.io/project-codeflare/ray:latest-py39-cu118"
))
  1. In terminal, you could try vi .codeflare/resources/jobtest.yaml and modify enableIngress field under headGroupSpec to be True/False or remove the field entirely.
  2. If enableIngress is not False, the RayCluster will not be created.

Does this PR introduce a user-facing change?

The user will not be allowed to create RayCluster resource if the enableIngress field is true or unset.

Copy link

openshift-ci bot commented Apr 10, 2024

@ChristianZaccaria: The label(s) kind/validation/bug-fix cannot be applied, because the repository doesn't have them.

In response to this:

What type of PR is this?

/kind validation/bug-fix

What this PR does / why we need it:

This PR adds validation to check if the RayCluster headGroupSpec contains the enableIngress field OR if it is set to true, and if conditions are met, the webhook will forbid the creation of the RayCluster resource.

Acceptance criteria:

  • As a data scientist, I cannot create a RayCluster resource with EnableIngress set to true
  • As a data scientist, I cannot create a RayCluster resource when EnableIngress is not specified
  • As a data scientist, I can create a Ray cluster via the CodeFlare SDK

Which issue(s) this PR fixes:

Jira: https://issues.redhat.com/browse/RHOAIENG-5116

Special notes for your reviewer:

Verification Steps:

  1. Login to your OpenShift cluster.
  2. Run the CFO locally against the cluster with make install and make run NAMESPACE=somenamespace
  3. Run KubeRay v1.1.0 - This could work
  4. Run Kueue - To test this change:
    • You would need to build and push a new Kueue image i.e., :
    IMAGE_REGISTRY=quay.io/yourusername make image-local-push
    
    • Make that new quay repository public.
    • Then replace the params.env file with your new image i.e.: quay.io/yourusername/kueue:v0.6.0-rhoai-2.9-tests-1-gdb3e81af-dirty
    • Deploy Kueue with kubectl apply --server-side -k config/rhoai
  5. Create ClusterQueue and LocalQueue if not done already.
  6. Install the SDK from this PR.
  7. Attempt to create a RayCluster in a notebook with:
cluster = Cluster(ClusterConfiguration(
   name='jobtest',
   namespace='default', # or localqueue namespace
   num_workers=1,
   min_cpus=1,
   max_cpus=1,
   min_memory=2,
   max_memory=2,
   num_gpus=0,
   write_to_file=True,
   image="quay.io/project-codeflare/ray:latest-py39-cu118"
))
  1. In terminal, you could try vi .codeflare/resources/jobtest.yaml and modify enableIngress field under headGroupSpec to be True/False or remove the field entirely.
  2. If enableIngress is not False, the RayCluster will not be created.

Does this PR introduce a user-facing change?


The user will not be allowed to create RayCluster resource if the enableIngress field is true or unset.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link

@Fiona-Waters Fiona-Waters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this alongside project-codeflare/codeflare-sdk#506 and it works as expected.
/lgtm

Copy link

@Ygnas Ygnas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works as expected.

@ChristianZaccaria
Copy link
Author

ChristianZaccaria commented Apr 12, 2024

Hey @sutaakar, CI is expected to fail here right?

Copy link

openshift-ci bot commented Apr 12, 2024

@ChristianZaccaria: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-odh-kueue 7dddbff link true /test e2e-odh-kueue

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sutaakar
Copy link

@ChristianZaccaria Yes, CI is broken at this moment

@Fiona-Waters
Copy link

Re-reviewed with line removed. Works as expected. Think we are good to merge it now.
/lgtm

@openshift-ci openshift-ci bot added the lgtm label Apr 12, 2024
@zdtsw
Copy link
Member

zdtsw commented Apr 12, 2024

took me bit time to find out EnableIngress is a bool pointer.... nice!

/lgtm

@sutaakar
Copy link

/lgtm

Copy link
Member

@anishasthana anishasthana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved as per offline discussion.

Copy link

openshift-ci bot commented Apr 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anishasthana, ChristianZaccaria

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@anishasthana anishasthana merged commit 71f413c into opendatahub-io:dev Apr 12, 2024
0 of 2 checks passed
@astefanutti
Copy link

This needs to be reverted / removed as the logic does not belong to Kueue, and should work independently with Kuberay, whether Kueue is installed or not. This logic should be added to the DW RayCluster webhook.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants