[UPSTREAM] Get Maximum Available GPU using Prometheus Data #57

Merged
merged 1 commit into from Apr 20, 2022

Conversation

@VaishnaviHire commented Apr 7, 2022

(cherry picked from commit 5a20fb9)

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • For commits that came from upstream, [UPSTREAM] has been prepended to the commit message
  • JIRA link(s): https://issues.redhat.com/browse/RHODS-3182
  • The Jira story is acked
  • An entry has been added to the latest build document in Build Announcements Folder.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious)

@VaishnaviHire (Author)

/hold

until upstream PR is merged.

Live Build: quay.io/modh/rhods-operator-live-catalog:1.10.0-rhods-3182

@VaishnaviHire VaishnaviHire changed the title Get Maximum Available GPU using Prometheus Data [UPSTREAM] Get Maximum Available GPU using Prometheus Data Apr 11, 2022
@VaishnaviHire (Author)

@lugi0 Tagging early for test implementation. This is not merged in upstream yet.

@lugi0 commented Apr 11, 2022

@VaishnaviHire thanks. Is this going to return the maximum number of GPUs in a single node? I remember there were discussions around having separate counts for separate nodes and letting the user pick and choose.

@VaishnaviHire (Author)

> @VaishnaviHire thanks. Is this going to return the maximum number of GPUs in a single node? I remember there were discussions around having separate counts for separate nodes and letting the user pick and choose.

This is just for the maximum number of GPUs in a node. The difference from the previous implementation is that the maximum GPU count will now be updated to show only the GPUs that are currently available.
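
For illustration, the per-node availability described here could be computed along these lines (a minimal sketch; the function name and data shapes are assumptions for illustration, not code from this PR):

```python
# Sketch: pick the largest count of free GPUs on any single node, given
# per-node capacity and per-node usage (e.g. aggregated from Prometheus).
# Names and data shapes here are illustrative only.

def max_available_gpus(capacity_by_node, used_by_node):
    """Return the highest number of unused GPUs across all nodes."""
    available = [
        capacity_by_node[node] - used_by_node.get(node, 0)
        for node in capacity_by_node
    ]
    return max(available, default=0)

# Example: node-a has 4 GPUs with 1 in use, node-b has 2 GPUs with 2 in use.
print(max_available_gpus({"node-a": 4, "node-b": 2}, {"node-a": 1, "node-b": 2}))  # -> 3
```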

@lugi0 commented Apr 11, 2022

Understood, so the dropdown will be dynamically updated to reflect how many GPUs are still available for use?
Does this also take into account the situation in which no taint has been applied to the node and CPU pods can schedule on it? Will running the Prometheus query have any effect on the loading of the spawner page?

@VaishnaviHire (Author)

> Understood, so the dropdown will be dynamically updated to reflect how many GPUs are still available for use? Does this also take into account the situation in which no taint has been applied to the node and CPU pods can schedule on it? Will running the Prometheus query have any effect on the loading of the spawner page?

Yes. It should not have any effect in the no-taint case. I have added a check here https://github.com/red-hat-data-services/jupyterhub-singleuser-profiles/pull/57/files#diff-98a012f258f1bc42077e3b4bd199d163917d2a7d7e9a864ea3ac9f0ea3accc84R202 so that Prometheus is only queried when the GPU operator namespace is present.
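
A rough sketch of that gating logic, assuming the `kubernetes` and `requests` Python clients; the namespace name, Prometheus endpoint, and PromQL expression below are placeholders, not values taken from this PR:

```python
# Sketch: only hit Prometheus when the GPU operator namespace exists.
# GPU_OPERATOR_NAMESPACE, PROMETHEUS_URL and the PromQL query are assumed
# placeholders, not the actual values used by jupyterhub-singleuser-profiles.
import requests
from kubernetes import client, config

GPU_OPERATOR_NAMESPACE = "nvidia-gpu-operator"     # assumed namespace name
PROMETHEUS_URL = "http://prometheus.example:9090"  # assumed endpoint

def gpu_operator_namespace_exists():
    """Return True if the (assumed) GPU operator namespace is present."""
    config.load_incluster_config()
    v1 = client.CoreV1Api()
    try:
        v1.read_namespace(GPU_OPERATOR_NAMESPACE)
        return True
    except client.exceptions.ApiException:
        return False

def query_available_gpus():
    # Skip the Prometheus round trip entirely when there is no GPU operator.
    if not gpu_operator_namespace_exists():
        return []
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query",
        params={"query": "count(nvidia_gpu_duty_cycle) by (node)"},  # placeholder query
    )
    resp.raise_for_status()
    return resp.json()["data"]["result"]
```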

@lugi0 commented Apr 11, 2022

What I had in mind was slightly different, i.e. the GPU operator is present but no taint was applied to the GPU node when provisioning it on the cluster. I think it's a corner case we don't even want to explicitly support, so it should be fine either way.

@LaVLaS left a comment

/lgtm

This worked perfectly from the live build quay.io/modh/rhods-operator-live-catalog:1.10.0-rhods-3182

@LaVLaS LaVLaS merged commit 4858fb9 into red-hat-data-services:master Apr 20, 2022