
[UPSTREAM] Add check for cluster gpus before apply pod gpu config #54

Merged

Conversation


@LaVLaS LaVLaS commented Mar 16, 2022

Add a cluster GPU check before applying the GPU config to the notebook pod spec

Testing instructions:

  1. Deploy RHODS with no GPU nodes present
  2. Log in to JH and spawn a notebook to create a JH user profile ConfigMap that stores the last selected spawner values
  3. Shut down the user notebook
  4. Manually edit the JH user profile ConfigMap to add a GPU value >0, replicating the scenario where the previous user notebook pod had GPUs attached
  5. Log in to the JH Spawner UI, confirm that there is no UI element for selecting GPUs, and attempt to spawn a notebook
  6. Confirm that the user notebook pod is scheduled and does not request any GPUs. .spec.containers[0].resources.limits['nvidia.com/gpu'] should not exist (see the sketch after this list)
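
For reference, the behavior exercised by steps 4 and 6 boils down to gating the GPU request on what the cluster actually reports. The sketch below is only an illustration of that logic, not the code in this PR; it assumes the kubernetes Python client is available, and the function names are hypothetical.

  # Illustration only (not the PR's code): sum allocatable nvidia.com/gpu across
  # nodes and skip the pod GPU config when the cluster has none.
  from kubernetes import client, config

  def cluster_gpu_count():
      config.load_incluster_config()  # use config.load_kube_config() outside the cluster
      nodes = client.CoreV1Api().list_node().items
      return sum(int(n.status.allocatable.get("nvidia.com/gpu", 0)) for n in nodes)

  def apply_gpu_config(resources, requested_gpus):
      # Only attach the GPU limit/request when a GPU actually exists in the cluster
      if requested_gpus > 0 and cluster_gpu_count() > 0:
          resources.setdefault("limits", {})["nvidia.com/gpu"] = str(requested_gpus)
          resources.setdefault("requests", {})["nvidia.com/gpu"] = str(requested_gpus)
      return resources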

You can verify that this does not break existing functionality with the following steps.

  1. Deploy RHODS + GPU Operator
  2. Add a GPU node to the cluster and wait for the GPU Operator to enable the GPUs (the GPU node has the label nvidia.com/gpu: X)
  3. Log in to JH and spawn a notebook that requests a GPU
  4. Wait for the notebook to spawn successfully and shut it down
  5. Remove the GPU node from the cluster and wait until there are no GPUs in the cluster
  6. Go to the JH spawner, verify that there is no UI element for requesting GPUs, and start any notebook
  7. Confirm that the notebook pod spawns successfully with no GPUs requested (a verification sketch follows this list)
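
To double-check step 7 (and step 6 of the first list) programmatically rather than by eyeballing the pod YAML, something like the following works; the pod name and namespace are placeholders, not values from this PR.

  # Illustration only: assert the spawned notebook pod carries no GPU limit.
  from kubernetes import client, config

  config.load_kube_config()
  pod = client.CoreV1Api().read_namespaced_pod(
      name="jupyter-nb-testuser",           # placeholder pod name
      namespace="redhat-ods-applications",  # placeholder namespace
  )
  limits = pod.spec.containers[0].resources.limits or {}
  assert "nvidia.com/gpu" not in limits, "notebook pod still requests GPUs"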

Signed-off-by: Landon LaSmith <LLaSmith@redhat.com>

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • For commits that came from upstream, [UPSTREAM] has been prepended to the commit message
  • JIRA link(s): RHODS-3069
  • The Jira story is acked
  • An entry has been added to the latest build document in Build Announcements Folder.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious)

@LaVLaS LaVLaS force-pushed the rhods/fix/check_gpu_nodes_exist branch 3 times, most recently from b3aee27 to 7def3c2 Compare March 22, 2022 02:14
@LaVLaS LaVLaS force-pushed the rhods/fix/check_gpu_nodes_exist branch from 7def3c2 to 5c795c1 Compare March 24, 2022 13:40
@LaVLaS LaVLaS changed the title WIP: Add check for cluster gpus before apply pod gpu config Add check for cluster gpus before apply pod gpu config Mar 24, 2022
@LaVLaS LaVLaS changed the title Add check for cluster gpus before apply pod gpu config [UPSTREAM] Add check for cluster gpus before apply pod gpu config Mar 28, 2022
@VaishnaviHire

@LaVLaS Can you share the link to the live build?

@VaishnaviHire

This did not work for me with the quay.io/modh/rhods-operator-live-catalog:1.9.0-rhods-3059 live build. I was unable to schedule a notebook pod after updating the user profile ConfigMap.

data:
  profile: |
    gpu: '1'
    last_selected_image: s2i-minimal-notebook:py3.8-rhodsversion
    last_selected_size: Default

Notebook pod:

  containers:
    - resources:
        limits:
          cpu: '2'
          memory: 8Gi
          nvidia.com/gpu: '1'
        requests:
          cpu: '1'
          memory: 4Gi
          nvidia.com/gpu: '1'

@LaVLaS
Author

LaVLaS commented Apr 1, 2022

Live build available at quay.io/llasmith/rhods-operator-live-catalog:1.9.0-rhods-3069

@VaishnaviHire

/lgtm

Worked well with the updated build.

@LaVLaS LaVLaS merged commit 3140ca4 into red-hat-data-services:master Apr 4, 2022
@LaVLaS LaVLaS deleted the rhods/fix/check_gpu_nodes_exist branch April 4, 2022 13:20