[UPSTREAM] Get Maximum Available GPU using Prometheus Data #57

Merged
merged 1 commit into from Apr 20, 2022

Conversation

@VaishnaviHire commented Apr 7, 2022

(cherry picked from commit 5a20fb9)

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • For commits that came from upstream, [UPSTREAM] has been prepended to the commit message
  • JIRA link(s): https://issues.redhat.com/browse/RHODS-3182
  • The Jira story is acked
  • An entry has been added to the latest build document in Build Announcements Folder.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious)

@VaishnaviHire (Author)

/hold

until upstream PR is merged.

Live Build: quay.io/modh/rhods-operator-live-catalog:1.10.0-rhods-3182

@VaishnaviHire VaishnaviHire changed the title Get Maximum Available GPU using Prometheus Data [UPSTREAM] Get Maximum Available GPU using Prometheus Data Apr 11, 2022
@VaishnaviHire (Author)

@lugi0 Tagging early for test implementation. This is not merged in upstream yet.

@lugi0 commented Apr 11, 2022

@VaishnaviHire thanks. Is this going to return the maximum number of GPUs in a single node? I remember there were discussions around having separate counts for separate nodes and letting the user pick and choose.

@VaishnaviHire (Author)

> @VaishnaviHire thanks. Is this going to return the maximum number of GPUs in a single node? I remember there were discussions around having separate counts for separate nodes and letting the user pick and choose.

This is just for the maximum number of GPUs in a node. The difference from the previous implementation is that the maximum GPU count will now be updated to show only the GPUs that are currently available.
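
For illustration, the per-node availability described here could be computed along these lines (a minimal sketch; the function name and data shapes are assumptions for illustration, not code from this PR):

```python
# Sketch: pick the largest count of free GPUs on any single node, given
# per-node capacity and per-node usage (e.g. aggregated from Prometheus).
# Names and data shapes here are illustrative only.

def max_available_gpus(capacity_by_node, used_by_node):
    """Return the highest number of unused GPUs across all nodes."""
    available = [
        capacity_by_node[node] - used_by_node.get(node, 0)
        for node in capacity_by_node
    ]
    return max(available, default=0)

# Example: node-a has 4 GPUs with 1 in use, node-b has 2 GPUs with 2 in use.
print(max_available_gpus({"node-a": 4, "node-b": 2}, {"node-a": 1, "node-b": 2}))  # -> 3
```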

@lugi0 commented Apr 11, 2022

Understood, so the dropdown will be dynamically updated to reflect how many GPUs are still available for use?
Does this also take into account the situation in which no taint has been applied to the node and CPU pods can schedule on it? Will running the Prometheus query have any effect on the loading of the spawner page?

@VaishnaviHire (Author)

> Understood, so the dropdown will be dynamically updated to reflect how many GPUs are still available for use? Does this also take into account the situation in which no taint has been applied to the node and CPU pods can schedule on it? Will running the Prometheus query have any effect on the loading of the spawner page?

Yes. It should not have any effect in the no-taint case. I have added a check here https://github.com/red-hat-data-services/jupyterhub-singleuser-profiles/pull/57/files#diff-98a012f258f1bc42077e3b4bd199d163917d2a7d7e9a864ea3ac9f0ea3accc84R202 so that Prometheus is only queried when the GPU operator namespace is present.
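
A rough sketch of that gating logic, assuming the `kubernetes` and `requests` Python clients; the namespace name, Prometheus endpoint, and PromQL expression below are placeholders, not values taken from this PR:

```python
# Sketch: only hit Prometheus when the GPU operator namespace exists.
# GPU_OPERATOR_NAMESPACE, PROMETHEUS_URL and the PromQL query are assumed
# placeholders, not the actual values used by jupyterhub-singleuser-profiles.
import requests
from kubernetes import client, config

GPU_OPERATOR_NAMESPACE = "nvidia-gpu-operator"     # assumed namespace name
PROMETHEUS_URL = "http://prometheus.example:9090"  # assumed endpoint

def gpu_operator_namespace_exists():
    """Return True if the (assumed) GPU operator namespace is present."""
    config.load_incluster_config()
    v1 = client.CoreV1Api()
    try:
        v1.read_namespace(GPU_OPERATOR_NAMESPACE)
        return True
    except client.exceptions.ApiException:
        return False

def query_available_gpus():
    # Skip the Prometheus round trip entirely when there is no GPU operator.
    if not gpu_operator_namespace_exists():
        return []
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query",
        params={"query": "count(nvidia_gpu_duty_cycle) by (node)"},  # placeholder query
    )
    resp.raise_for_status()
    return resp.json()["data"]["result"]
```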

@lugi0 commented Apr 11, 2022

What I had in mind was slightly different, i.e. the GPU operator is present but no taint was applied to the GPU node when provisioning it on the cluster. I think it's a corner case we don't even want to explicitly support, so it should be fine either way.

@LaVLaS left a comment

/lgtm

This worked perfectly from the live build quay.io/modh/rhods-operator-live-catalog:1.10.0-rhods-3182

@LaVLaS LaVLaS merged commit 4858fb9 into red-hat-data-services:master Apr 20, 2022