Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance issues in Model Serving Global #2334

Merged
merged 1 commit into from
Jan 19, 2024

Conversation

lucferbux
Copy link
Contributor

Description

Closes https://issues.redhat.com/browse/RHOAIENG-549

How Has This Been Tested?

As a cluster admin user

  1. Open the network tab on Chrome
  2. Go to Model Serving Global
  3. Select "All projects"
  4. Check that it's listing InferenceServices and Servingruntimes are being listed as cluster wide fetches

As a regular admin user

  1. Open the network tab on Chrome
  2. Go to Model Serving Global
  3. Select "All projects"
  4. Check that it first requests all projects and then it does several calls to fecth all InferenceServices and ServingRuntimes in those projects

General

  1. Go to Model Serving Global
  2. Select "All projects"
  3. Check that you can cancel the request

test

Test Impact

Added e2e tests

Request review criteria:

Self checklist (all need to be checked):

  • The developer has manually tested the changes and verified that the changes work
  • Commits have been squashed into descriptive, self-contained units of work (e.g. 'WIP' and 'Implements feedback' style messages have been removed)
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has added tests or explained why testing cannot be added (unit tests & storybook for related changes)

If you have UI changes:

  • Included any necessary screenshots or gifs if it was a UI change.
  • Included tags to the UX team if it was a UI/UX change (find relevant UX in the SMEs section).

After the PR is posted & before it merges:

  • The developer has tested their solution on a cluster by using the image produced by the PR to main

@lucferbux
Copy link
Contributor Author

@vconzola let me know if this is what you had in mind to cancel the "All projects" request

@vconzola
Copy link

@lucferbux The video goes by really quickly. Is the gist that you're showing a spinner until the first pieces of data come back then you're loading the table with skeletons in cells where the data is still being fetched?

@lucferbux
Copy link
Contributor Author

@lucferbux The video goes by really quickly. Is the gist that you're showing a spinner until the first pieces of data come back then you're loading the table with skeletons in cells where the data is still being fetched?

Yes, the flow will be:

  1. You go to "All projects"
  2. You'll have the loading skeleton with the "Cancel" button
  3. If you press it, you'll be redirected to the last namespace/preferred namespace
  4. You'll get that namespace loaded

@mturley
Copy link
Contributor

mturley commented Jan 17, 2024

@vconzola here's that loading state with the cancel button (used throttling in the network dev tools to make it load slowly)

Screenshot 2024-01-17 at 5 40 14 PM

Copy link
Contributor

@mturley mturley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and works as described. I did notice with throttled dev tools that when you press the cancel button to navigate away it doesn't actually cancel the pending network requests (as far as I can tell) and they continue in the background. Maybe that's fine? I imagine it would be a pain to figure out how to actually send a cancel signal to them (I haven't looked at the fetch implementation).

That's probably out of scope here, so this looks good ✅

@vconzola
Copy link

@mturley Thanks, but not sure what that screenshot is showing me. I guess I don't understand the gist of the issue or what was done to fix it. Is the problem that the user was stuck on the loading page with no way to cancel out if fetching the data from all projects takes too long? Or is the problem that we're waiting for all data from all projects to be returned before exiting the loading page? If it's the latter we should close the loading page without canceling the fetch requests as soon as the first data is returned from a project and just show skeletons in the table cells while we're waiting for the rest of the data. Is something like that possible?

@mturley
Copy link
Contributor

mturley commented Jan 18, 2024

@vconzola I'm not sure exactly, that's a question for @lucferbux . I just figured I'd share a screenshot of a state that's hard to see in that video. From what I can tell here there's nothing that affects the loading skeleton state within the displayed table itself, this change just makes the fetches themselves more efficient and adds the cancel button to the existing spinner in case it still takes too long.

@lucferbux
Copy link
Contributor Author

@mturley Thanks, but not sure what that screenshot is showing me. I guess I don't understand the gist of the issue or what was done to fix it. Is the problem that the user was stuck on the loading page with no way to cancel out if fetching the data from all projects takes too long? Or is the problem that we're waiting for all data from all projects to be returned before exiting the loading page? If it's the latter we should close the loading page without canceling the fetch requests as soon as the first data is returned from a project and just show skeletons in the table cells while we're waiting for the rest of the data. Is something like that possible?

@vconzola I'm sorry to confuse you, don't worry about the implementation, the UI will do the following:

  1. Show you the cancel button in "All projects"
  2. If you press cancel it will navigate to the next project

If that's ok we can lgtm this pr

@lucferbux
Copy link
Contributor Author

@vconzola I'm not sure exactly, that's a question for @lucferbux . I just figured I'd share a screenshot of a state that's hard to see in that video. From what I can tell here there's nothing that affects the loading skeleton state within the displayed table itself, this change just makes the fetches themselves more efficient and adds the cancel button to the existing spinner in case it still takes too long.

Yes, you are right, I took a look at that the other day, but that's something we might wanna fix in a follow up issue, it's not related to this enhancement (more or less) and it's more of an structural issue, we are not implementing well the useFetchData so we don't abort the requests on change, I've just tested it and it seems to be working great in pipelines, but not here.

@lucferbux
Copy link
Contributor Author

I've created https://issues.redhat.com/browse/RHOAIENG-2002 as a follow up, we need to fix this asap.

@vconzola
Copy link

vconzola commented Jan 19, 2024

If you press cancel it will navigate to the next project

Not sure what you mean by 'next project". My expectation would be (1) if the "all projects" page is loading from another page link (if that's even possible) or from the main nav link, then Cancel will navigate to the first project in the dropdown list; (2) if the "all projects" page is loading from being selected in the dropdown, then Cancel will navigate to the previous project view that was displayed. But if that's not the way it works, and you need to merge today, we can get it in a follow-up PR.

So consider this my LGTM for this PR.

@andrewballantyne
Copy link
Member

@lucferbux we may want to use history back -- it's a common practice in OpenShift Console to do a history back when canceling out of a form -- but it has drawbacks -- like if someone navigates from a bookmark or copy paste of a URL -- it can be awkward to kick them outside of our app or to the empty state (or just flat out doesn't work).

But I think we can improve on this with another PR.

@lucferbux
Copy link
Contributor Author

@lucferbux we may want to use history back -- it's a common practice in OpenShift Console to do a history back when canceling out of a form -- but it has drawbacks -- like if someone navigates from a bookmark or copy paste of a URL -- it can be awkward to kick them outside of our app or to the empty state (or just flat out doesn't work).

But I think we can improve on this with another PR.

Ok, great, let me log another issue for that, I think that's the preferred behaviour, but we can improve it in the next version.

@lucferbux
Copy link
Contributor Author

/approve

Copy link
Contributor

openshift-ci bot commented Jan 19, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lucferbux, mturley

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 2dd7f08 into opendatahub-io:main Jan 19, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants