
[UPSTREAM] Add check for cluster gpus before apply pod gpu config #54

Merged

Conversation


@LaVLaS LaVLaS commented Mar 16, 2022

Add a cluster GPU check before applying the GPU config to the notebook pod spec

Testing instructions:

  1. Deploy RHODS with no GPU nodes present
  2. Log in to JH and spawn a notebook to create a JH user profile ConfigMap that stores the last selected spawner values
  3. Shut down the user notebook
  4. Manually edit the JH user profile ConfigMap to add a GPU value >0, replicating the scenario where the previous user notebook pod had GPUs attached
  5. Log in to the JH Spawner UI, confirm that there is no UI element for selecting GPUs, and attempt to spawn a notebook
  6. Confirm that the user notebook pod is scheduled and does not request any GPUs. .spec.containers[0].resources.limits['nvidia.com/gpu'] should not exist (see the sketch after this list)
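
For reference, the behavior exercised by steps 4 and 6 boils down to gating the GPU request on what the cluster actually reports. The sketch below is only an illustration of that logic, not the code in this PR; it assumes the kubernetes Python client is available, and the function names are hypothetical.

  # Illustration only (not the PR's code): sum allocatable nvidia.com/gpu across
  # nodes and skip the pod GPU config when the cluster has none.
  from kubernetes import client, config

  def cluster_gpu_count():
      config.load_incluster_config()  # use config.load_kube_config() outside the cluster
      nodes = client.CoreV1Api().list_node().items
      return sum(int(n.status.allocatable.get("nvidia.com/gpu", 0)) for n in nodes)

  def apply_gpu_config(resources, requested_gpus):
      # Only attach the GPU limit/request when a GPU actually exists in the cluster
      if requested_gpus > 0 and cluster_gpu_count() > 0:
          resources.setdefault("limits", {})["nvidia.com/gpu"] = str(requested_gpus)
          resources.setdefault("requests", {})["nvidia.com/gpu"] = str(requested_gpus)
      return resources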

You can verify that this does not break existing functionality with the following steps.

  1. Deploy RHODS + GPU Operator
  2. Add a GPU node to the cluster and wait for the GPU Operator to enable the GPUs (the GPU node has the label nvidia.com/gpu: X)
  3. Log in to JH and spawn a notebook that requests a GPU
  4. Wait for the notebook to spawn successfully and shut it down
  5. Remove the GPU node from the cluster and wait until there are no GPUs in the cluster
  6. Go to the JH spawner, verify that there is no UI element for requesting GPUs, and start any notebook
  7. Confirm that the notebook pod spawns successfully with no GPUs requested (a verification sketch follows this list)
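
To double-check step 7 (and step 6 of the first list) programmatically rather than by eyeballing the pod YAML, something like the following works; the pod name and namespace are placeholders, not values from this PR.

  # Illustration only: assert the spawned notebook pod carries no GPU limit.
  from kubernetes import client, config

  config.load_kube_config()
  pod = client.CoreV1Api().read_namespaced_pod(
      name="jupyter-nb-testuser",           # placeholder pod name
      namespace="redhat-ods-applications",  # placeholder namespace
  )
  limits = pod.spec.containers[0].resources.limits or {}
  assert "nvidia.com/gpu" not in limits, "notebook pod still requests GPUs"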

Signed-off-by: Landon LaSmith <LLaSmith@redhat.com>

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • For commits that came from upstream, [UPSTREAM] has been prepended to the commit message
  • JIRA link(s): RHODS-3069
  • The Jira story is acked
  • An entry has been added to the latest build document in Build Announcements Folder.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious)

@LaVLaS LaVLaS force-pushed the rhods/fix/check_gpu_nodes_exist branch 3 times, most recently from b3aee27 to 7def3c2 Compare March 22, 2022 02:14
@LaVLaS LaVLaS force-pushed the rhods/fix/check_gpu_nodes_exist branch from 7def3c2 to 5c795c1 Compare March 24, 2022 13:40
@LaVLaS LaVLaS changed the title WIP: Add check for cluster gpus before apply pod gpu config Add check for cluster gpus before apply pod gpu config Mar 24, 2022
@LaVLaS LaVLaS changed the title Add check for cluster gpus before apply pod gpu config [UPSTREAM] Add check for cluster gpus before apply pod gpu config Mar 28, 2022
@VaishnaviHire

@LaVLaS Can you share the link to the live build?

@VaishnaviHire

This did not work for me with the quay.io/modh/rhods-operator-live-catalog:1.9.0-rhods-3059 live build. I was unable to schedule a notebook pod after updating the user profile ConfigMap.

data:
  profile: |
    gpu: '1'
    last_selected_image: s2i-minimal-notebook:py3.8-rhodsversion
    last_selected_size: Default

Notebook pod:

  containers:
    - resources:
        limits:
          cpu: '2'
          memory: 8Gi
          nvidia.com/gpu: '1'
        requests:
          cpu: '1'
          memory: 4Gi
          nvidia.com/gpu: '1'

@LaVLaS
Author

LaVLaS commented Apr 1, 2022

Live build available at quay.io/llasmith/rhods-operator-live-catalog:1.9.0-rhods-3069

@VaishnaviHire

/lgtm

Worked well with the updated build.

@LaVLaS LaVLaS merged commit 3140ca4 into red-hat-data-services:master Apr 4, 2022
@LaVLaS LaVLaS deleted the rhods/fix/check_gpu_nodes_exist branch April 4, 2022 13:20