Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KFNBC]: GPU dropdown should not be visibile if no GPUs are available #301

Closed
1 task done
lugi0 opened this issue Jul 15, 2022 · 16 comments · Fixed by #397
Closed
1 task done

[KFNBC]: GPU dropdown should not be visibile if no GPUs are available #301

lugi0 opened this issue Jul 15, 2022 · 16 comments · Fixed by #397
Assignees
Labels
feature/notebook-controller KubeFlow NoteBook Controller (KFNBC) Feature kind/enhancement New functionality request (existing augments or new additions) priority/normal An issue with the product; fix when possible

Comments

@lugi0
Copy link
Contributor

lugi0 commented Jul 15, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

In the spawner page I see the Number of GPUs dropdown, even though the only value available is 0

Expected Behavior

Dropdown should not be shown when GPUs are not available

Steps To Reproduce

  1. Visit spawner
  2. Consume all GPUs (if needed)
  3. Reload spawner

Workaround (if any)

No response

OpenShift Version

No response

Openshift Version

No response

What browsers are you seeing the problem on?

Firefox

Open Data Hub Version

No response

Relevant log output

No response

@samuelvl samuelvl added the feature/notebook-controller KubeFlow NoteBook Controller (KFNBC) Feature label Jul 15, 2022
@samuelvl samuelvl added the needs-info Further information is requested from the reporter or from another source label Jul 15, 2022
@samuelvl
Copy link
Contributor

@maroroman will review this behavior to confirm if it is a bug

@andrewballantyne
Copy link
Member

@kywalker-rh Should we hide the dropdown when it cannot select anything or should we disable it? Having it clickable serves nothing but to waste the user's time as they cannot do anything with the field anyways. Alternatively (I'm not sure if this is a pattern yet) we could print text instead of a field stating there are no GPUs available or something like that.

@andrewballantyne andrewballantyne added the priority/normal An issue with the product; fix when possible label Jul 18, 2022
@DaoDaoNoCode
Copy link
Member

@andrewballantyne I think there is a discussion here https://odh-io.slack.com/archives/C035J0KA6UB/p1658149905199269

@jkoehler-redhat
Copy link

@kywalker-rh @jedemo please comment on Andrew's question.

@jedemo
Copy link

jedemo commented Jul 22, 2022

If a cluster doesn't have any GPU nodes that can be used (either from active nodes or via nodes that can be triggered by auto-scaling), I don't think we should show the GPU drop down, as it adds unnecessary clutter. In other words, if a customer/cluster has not enabled GPUs, no reason to show the GPU drop down in the spawner.

If GPUs are enabled, I think it should be clickable if a GPU session can be scheduled. This could be from unused GPUs on active nodes or GPUs on inactive nodes that can be triggered via auto scaling. The number in the drop down should be the max available on any single node. The intent is to only present options that can be successfully scheduled. If there aren't any GPUs available, I think we would keep the drop down at 0 & disable it. That way a user doesn't think there's a bug - i.e. why don't I see the GPU field? We could add doc that if the GPU field is visible but not clickable (value = 0), it means all available GPUs are in use. Some example use cases:

  1. Cluster has 2 active nodes with 2 GPUs each. 1 of the GPUs on 1 node is in use. User starts NB server session - GPU drop down should show 2 GPUs - the max available on any single node.
  2. Cluster has 2 active nodes with 2 GPUs each and all GPUs are in use. User starts NB server session - GPU drop down should be displayed with 0 & disabled. The rationale is we don't want to show options that will result in an error.
  3. Cluster has 2 active nodes with 2 GPUs each and 1 inactive node (can be triggered via auto scaling) with 1 GPU. All GPUs in active nodes are in use. User starts NB server session - GPU drop down should show 1 GPU because that can be scheduled on inactive node via auto scaling.

@kywalker-rh
Copy link

I agree with @jedemo for both cases. This makes the most sense from a UX standpoint as well.

@andrewballantyne andrewballantyne added kind/enhancement New functionality request (existing augments or new additions) and removed needs-info Further information is requested from the reporter or from another source labels Jul 25, 2022
@andrewballantyne
Copy link
Member

I appreciate the breakdown @jedemo -- I wonder in case no2 if we should not show an info alert next to the disabled field stating GPUs are in use, so none are available at this time. Typically I would say not to show things to the user they cannot change, but it may have value to showcase it's just not available at this time. cc @kywalker-rh (more ux questions 🙂 )

@andrewballantyne andrewballantyne added priority/high Important issue that needs to be resolved asap. Releases should not have too many of these. priority/normal An issue with the product; fix when possible and removed priority/normal An issue with the product; fix when possible priority/high Important issue that needs to be resolved asap. Releases should not have too many of these. labels Jul 26, 2022
@kywalker-rh
Copy link

@andrewballantyne Sorry I missed this before. I see your point. I think a simple line of text below the dropdown when its disabled to say "All GPUs are currently in use, try again later." would work for this. I'll double check the wording with Katie.
Screen Shot 2022-08-04 at 2 22 47 PM
.

@andrewballantyne andrewballantyne moved this from In progress to In review in ODH Notebook Controller Aug 4, 2022
@erwangranger
Copy link

erwangranger commented Aug 5, 2022

Just a reminder that we got into the habit of "hiding" things that were not available (GPUs, L/XL container sizes) because in JupyterHub, picking something that was unavailable was very painful: it would get stuck for 10 whole minutes before timing out and letting you pick something else.

The current work is target against Kubeflow Notebook Controller, not JupyterHub. In KFNBC, if you pick a "bad" combination, you can cancel things as soon as you want, and pick something else (or so I'm told).
So displaying something that is not quite available is not as annoying for the user, because they can quickly back out of it and choose something else.
For example ... If a user pick an XL pod size and 1 GPU, it might be that no machine meets both those requirements. So the notebook would never launch. They can cancel things quickly and ask for Small with 1 GPU instead.

If we need to err on one side, it should be on the side of showing more things rather than less. If we hide/disable something, we have to be really sure that it would never work.

@andrewballantyne
Copy link
Member

It'll never be a problem to show or hide an element for the UI. Disabling an element that can never be enabled by the user is a personal gripe of mine. But in the case of GPUs, technically we can fetch a new state mid viewage of the SpawnerPage and get a different result then when they started up the page. So it can sorta auto-correct from disabled to enabled. So in this case hiding it would be the worst of the UI options we have at our finger tips.

Note: Currently we do not keep fetching GPU status, but that doesn't mean we couldn't. I'm talking with Kyle right now on what the "loading" state of our fetch call for the GPU number will look like -- hence my PR being on hold.

@bdattoma
Copy link

sorry to go back on this closed issue, I just have a comment about the text below the dropdown when GPUs are disabled. It says All GPUs are currently in use, try again later. but if no GPU nodes exist in the cluster, no GPU can be actual in use. So it could be misleading for the user. @kywalker-rh what do you think? Maybe sth like 0/No GPUs are currently available would be more generic to cover both the case of no existing GPUs and All the GPUs are busy.

@jkoehler-redhat
Copy link

@bdattoma i believe this addressed the issue? #397

@andrewballantyne
Copy link
Member

It was not @jkoehler-redhat -- Sorry, I have a todo item to reply to this. I spoke with Kyle about it, I think I implemented a partial result -- Not sure we have enough backend functionality to do what we need to do now. I need to investigate.

@bdattoma
Copy link

I think I implemented a partial result

@andrewballantyne what is missing from the current partial implementation?

@andrewballantyne
Copy link
Member

@bdattoma Ah sorry, meant to get back to this before the end of yesterday but forgot.

We do not hide the GPU dropdown when it is completely unavailable. So you noting the wording is part of the side effect of that.

I spoke with Kyle about what needs to be done. I do not know if we are technically setup to do that out of the box, but I need to log a ticket to address the gap (before we get to auto scaling work, which is already logged)

@andrewballantyne
Copy link
Member

If a cluster doesn't have any GPU nodes that can be used (either from active nodes or via nodes that can be triggered by auto-scaling), I don't think we should show the GPU drop down, as it adds unnecessary clutter. In other words, if a customer/cluster has not enabled GPUs, no reason to show the GPU drop down in the spawner.

From Jeff's comment.

This is an omission I made when coding this solution. Logged a bug to handle this: #428

@bdattoma Thanks for catching this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature/notebook-controller KubeFlow NoteBook Controller (KFNBC) Feature kind/enhancement New functionality request (existing augments or new additions) priority/normal An issue with the product; fix when possible
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.