Assert edit permissions on appropriate mutations in namespaceUtils, fix promise logic when deploying a model or adding a model server, show inner error messages from the backend when present #2319

mturley · 2024-01-03T23:30:09Z

Description

When a user has "edit" permission on a data science project they don't own, they don't have permission to update that namespace. When adding the first model or model server to a project, the namespace needs to be updated by addSupportServingPlatformProject to add labels that describe which serving platform the project is using, which fails with a 403 error for users in this position (as expected). Currently, even though this namespace update fails, the ServingRuntime gets created anyway. It does not become visible in the UI because it still shows the platform-selection view (choosing single- or multi-model serving), but it is visible in the cluster.

The submitServingRuntimeResources function that is called to make these updates and create the ServingRuntime currently operates like this:

Execute a dry-run of the operations concurrently with a Promise.all.
If no error in the dry-run, execute the same operations with another Promise.all, but this time include the namespace update (because the namespace update performed by addSupportServingPlatformProject has no dry-run option).

This means that if addSupportServingPlatformProject fails, the other requests will run anyway. The fix is to add a step in between the dry-run and the actual run of these requests where we perform the namespace update before proceeding, which means if it fails the ServingRuntime will not be created. As part of the fix, we convert the submitServingRuntimeResources to an async function so we can use await for readability.

There are also a few other notable changes:

Removes the allowCreate condition on updating the namespace, which was a relic of an earlier workaround that is no longer needed (@lucferbux can provide more context on this -- thanks for your help Lucas!).
Changes the applyNamespaceChange function on the backend to perform a more accurate selfSubjectAccessReview: users with "edit" permission should still be able to create ServingRuntimes when the namespace update is not needed (a serving platform has already been chosen for the project), so we check for that permission only for the MODEL_MESH_PROMOTION and KSERVE_PROMOTION cases.
When deploying a Kserve model, there was another Promise.all for concurrently calling submitServingRuntimeResources and submitInferenceServiceResource, which means if the error described above happens here the InferenceService will also still be created. Changing this to be a sequential .then also fixes that problem. (Edit: I hadn't realized this was the separate issue RHOAIENG-556, so this PR fixes that now too :))
Adds a new throwErrorFromAxios utility and uses it in addSupportServingPlatformProject (and in createProject, the only other place we're directly calling axios()). This will ensure that if an error thrown by the axios() call has a response object containing a message from the backend, that message will be used instead of the top-level message property from Axios ("Request failed with status code X"). If the error being displayed does not contain an inner error message in this expected structure, the current behavior is retained and the top-level error.message is used.

Considering that last point above, the error message that used to look like this:

now looks like this:

@vconzola, does that error text look good to you?

How Has This Been Tested?

With @lucferbux's help, I reproduced the error in a cluster and verified the fix by using the PR image in that cluster. Testing locally gave us trouble because apparently an impersonated regular user cannot list templates in a namespace they don't own (which does not seem to match the behavior in production), so we can't get the "Models and model servers" section of the data science project page to render when trying to reproduce the bug this way.

Test Impact

New Cypress tests have been added to submit the "deploy model" and "add model server" modals and intercept their requests, ensuring that the request to create ServingRuntimes and InferenceServices are not made when there is an error updating the namespace.

Request review criteria:

Self checklist (all need to be checked):

The developer has manually tested the changes and verified that the changes work
Commits have been squashed into descriptive, self-contained units of work (e.g. 'WIP' and 'Implements feedback' style messages have been removed)
Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
The developer has added tests or explained why testing cannot be added (unit tests & storybook for related changes)

If you have UI changes:

Included any necessary screenshots or gifs if it was a UI change.
Included tags to the UX team if it was a UI/UX change (find relevant UX in the SMEs section).

After the PR is posted & before it merges:

The developer has tested their solution on a cluster by using the image produced by the PR to main

openshift-ci · 2024-01-03T23:30:20Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

lucferbux

Everything looks good to me, just take a look at the package-lock.json situation (I mean, it could be something expected but it's kinda weird).
You can undraft it if you want!

backend/src/routes/api/namespaces/namespaceUtils.ts

frontend/package-lock.json

mturley · 2024-01-09T16:16:28Z

Sorry for the delay here, I'm back from PTO. @lucferbux I only still have this in draft because I figured it should have updated unit tests before it's ready to merge. Getting back to that now.

mturley · 2024-01-09T19:22:52Z

/test all

mturley · 2024-01-09T19:23:57Z

Hey @andrewballantyne @lucferbux, quick question here: I've been poking around and reading code trying to figure out how to mock things correctly to test the submitServingRuntimeResources function in frontend/src/pages/modelServing/screens/projects/__tests__/utils.spec.ts by somehow mocking a user with and without the appropriate permissions and then calling the function and inspecting the returned promise(s). Do you know if I'm on the right track there? I'm a bit stuck on that, and it may be telling that there aren't any tests for that function yet. Maybe instead I should add an integration test for ManageServingRuntimeModal or ManageKServeModal. Or I wonder if we should just let this PR merge without new tests.

andrewballantyne · 2024-01-09T19:34:46Z

We don't have e2e tests... why do you need to mock user permissions? @christianvogt can you help out here?

EDIT: I bumped this to slack partially as well... I'm feeling there isn't a lot we can do here, but Christian should have the final say in it.

mturley · 2024-01-09T19:35:32Z

It also looks like the accessibility tests are failing to run in CI but they pass locally :(

mturley · 2024-01-09T19:37:53Z

We don't have e2e tests... why do you need to mock user permissions? @christianvogt can you help out here?

I figured since the fix here has to do with behavior being different for users with different permissions, I would want to mock each of the two cases (should create model server if user can update the namespace, should not create model server if user cannot update the namespace). My apologies for my inexperience with the repo getting in the way here.

christianvogt · 2024-01-09T20:45:04Z

@mturley as far as mocking goes. Use jest to mock whatever you need on the frontend being used by the function you're testing. It's sometimes best to mock down to the k8s apis from @openshift/dynamic-plugin-sdk-utils - eg k8sListResource, k8sGetResource, etc...

But sometimes those mocks end up being very deep in the call tree and therefore you may want to mock another function that's used directly by your function. The choice is yours.

The other option is to add a new cypress test where you'd end up mocking the network calls.

mturley · 2024-01-09T21:01:06Z

/test all

mturley · 2024-01-09T21:02:02Z

Ok thanks @christianvogt , I'll keep playing with it.

lucferbux · 2024-01-09T22:10:22Z

@mturley @christianvogt I wouldn't wast too much time trying to mock e2e within the backend, the main point i would cover here would be this call. Just try to mock it making sure it's failing so we can prove the rest of the requests are not being executed.

andrewballantyne · 2024-01-16T17:00:02Z

backend/src/routes/api/namespaces/namespaceUtils.ts

    default:
      throw createCustomError('Unknown configuration', 'Cannot apply namespace change', 400);
  }

+  const selfSubjectAccessReview = await checkPermissionsFn(fastify, request, name);
+  if (isK8sStatus(selfSubjectAccessReview)) {


It'll fall into the next if-statement after this -- but I am wondering if we want to put a "dev setup" if statement first -- that basically says "checkPermissionsFn is null -- you need to set permissions for all actions" -- basically as a "fail immediately" sorta deal with a dev message so the dev knows the new case they added needs to have a check permissions.

Just my two cents. It should fail in some fashion on the checkPermissionsFn() call but I dunno how that will look.

Ah, fair enough. Just another throw createCustomError maybe? 500 error code I guess, since it shouldn't ever happen? It would be impossible to hit with the current code but that's a good point about future switch cases. Alternatively though we could just have it fall back to checkAdminNamespacePermission which is the behavior in all cases before this PR? I'm leaning towards your idea.

Yeah, like a 500 error "invalid backend state -- dev broken workflow" or something like that. Something annoying and blunt it was a dev setup. It will be easy to search in the codebase and should prevent anything from being poorly configured.

It should never happen in production, but it won't mask when code is poorly written -- so it gets caught in the PR for the change. Or in our test infra when we have that going nightly or whatever.

Pushed with this change as well.

frontend/src/concepts/dashboard/DashboardModalFooter.tsx

...nd/src/pages/modelServing/screens/projects/ServingRuntimeModal/ManageServingRuntimeModal.tsx

vconzola · 2024-01-16T18:59:46Z

@mturley Can you make the error message more specific by replacing "labels" with "serving platform labels". A normal DS might not know what a label is, but they'll know it has something to do with a serving platform and their permissions. Also, put a period at the end.

mturley · 2024-01-16T20:44:21Z

@andrewballantyne I've rebase and pushed with changes to address your feedback. Updated the PR description to match. Let me know what you think of the new error message approach.

@vconzola you got it, updated the message and the screenshot.

mturley · 2024-01-16T20:46:23Z

Fixed some lint errors I missed. Should be ready for re-review.

…ix promise logic in serving runtime create Signed-off-by: Mike Turley <mike.turley@alum.cs.umass.edu>

mturley · 2024-01-17T16:19:42Z

Rebased and addressed remaining comments.

backend/src/routes/api/namespaces/namespaceUtils.ts

lucferbux

/lgtm

Ok, after andrew's suggestion, I think is ready to go, the message is descriptive enough to get a grasp of what's happening. Works great in both kserve and modelmesh, great work!

openshift-ci · 2024-01-17T17:27:47Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lucferbux

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [lucferbux]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Mike Turley <mike.turley@alum.cs.umass.edu>

Followup to PR #2319 - permission check typo

openshift-ci bot added the do-not-merge/work-in-progress This PR is in WIP state label Jan 3, 2024

lucferbux previously requested changes Jan 4, 2024

View reviewed changes

backend/src/routes/api/namespaces/namespaceUtils.ts Outdated Show resolved Hide resolved

frontend/package-lock.json Outdated Show resolved Hide resolved

openshift-ci bot assigned lucferbux Jan 4, 2024

mturley force-pushed the 548-deploy-model-403-error branch 4 times, most recently from 3ceb1e7 to ba6a311 Compare January 9, 2024 16:12

mturley marked this pull request as ready for review January 9, 2024 19:28

openshift-ci bot removed the do-not-merge/work-in-progress This PR is in WIP state label Jan 9, 2024

openshift-ci bot requested review from dpanshug and manaswinidas January 9, 2024 19:28

mturley requested a review from lucferbux January 9, 2024 19:29

mturley force-pushed the 548-deploy-model-403-error branch from ba6a311 to 07ab929 Compare January 11, 2024 00:33

mturley changed the title ~~Assert edit permissions on appropriate mutations in namespaceUtils, fix promise logic in serving runtime create~~ [WIP] Assert edit permissions on appropriate mutations in namespaceUtils, fix promise logic in serving runtime create Jan 11, 2024

openshift-ci bot added the do-not-merge/work-in-progress This PR is in WIP state label Jan 11, 2024

mturley force-pushed the 548-deploy-model-403-error branch 4 times, most recently from cfc7243 to 39dc5e2 Compare January 15, 2024 19:19

andrewballantyne requested changes Jan 16, 2024

View reviewed changes

openshift-ci bot assigned andrewballantyne Jan 16, 2024

mturley force-pushed the 548-deploy-model-403-error branch 4 times, most recently from c88568e to d93dbd3 Compare January 16, 2024 20:42

mturley force-pushed the 548-deploy-model-403-error branch 2 times, most recently from 8f1a68a to 3e49d71 Compare January 16, 2024 20:45

mturley force-pushed the 548-deploy-model-403-error branch from 3e49d71 to c07bcb8 Compare January 16, 2024 20:47

mturley requested a review from andrewballantyne January 16, 2024 20:49

mturley force-pushed the 548-deploy-model-403-error branch 2 times, most recently from 3ba54a7 to 1652a56 Compare January 16, 2024 21:19

Assert edit permissions on appropriate mutations in namespaceUtils, f…

739adc8

…ix promise logic in serving runtime create Signed-off-by: Mike Turley <mike.turley@alum.cs.umass.edu>

mturley force-pushed the 548-deploy-model-403-error branch from 1652a56 to 739adc8 Compare January 17, 2024 15:57

lucferbux reviewed Jan 17, 2024

View reviewed changes

backend/src/routes/api/namespaces/namespaceUtils.ts Show resolved Hide resolved

lucferbux approved these changes Jan 17, 2024

View reviewed changes

openshift-ci bot added the lgtm label Jan 17, 2024

openshift-ci bot added the approved label Jan 17, 2024

openshift-merge-bot bot merged commit e328be0 into opendatahub-io:main Jan 17, 2024
6 checks passed

mturley deleted the 548-deploy-model-403-error branch January 17, 2024 18:21

mturley added a commit to mturley/odh-dashboard that referenced this pull request Jan 18, 2024

Followup to PR opendatahub-io#2319 - permission check typo

a3253cf

Signed-off-by: Mike Turley <mike.turley@alum.cs.umass.edu>

mturley mentioned this pull request Jan 18, 2024

Followup to PR #2319 - permission check typo #2357

Merged

7 tasks

openshift-merge-bot bot added a commit that referenced this pull request Jan 18, 2024

Merge pull request #2357 from mturley/rhoaieng-548-followup

c795e54

Followup to PR #2319 - permission check typo

mturley mentioned this pull request Jan 25, 2024

[WIP] Followup to fix RHOAIENG-556 correctly - Properly dry-run all operations in KServe modal before submitting #2403

Closed

7 tasks

dpanshug mentioned this pull request Feb 7, 2024

Properly dry-run all operations in KServe modal before submitting #2470

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Assert edit permissions on appropriate mutations in namespaceUtils, fix promise logic when deploying a model or adding a model server, show inner error messages from the backend when present #2319

Assert edit permissions on appropriate mutations in namespaceUtils, fix promise logic when deploying a model or adding a model server, show inner error messages from the backend when present #2319

mturley commented Jan 3, 2024 •

edited

openshift-ci bot commented Jan 3, 2024

lucferbux left a comment

mturley commented Jan 9, 2024

mturley commented Jan 9, 2024

mturley commented Jan 9, 2024

andrewballantyne commented Jan 9, 2024 •

edited

mturley commented Jan 9, 2024

mturley commented Jan 9, 2024

christianvogt commented Jan 9, 2024

mturley commented Jan 9, 2024

mturley commented Jan 9, 2024

lucferbux commented Jan 9, 2024

andrewballantyne Jan 16, 2024

mturley Jan 16, 2024 •

edited

andrewballantyne Jan 16, 2024

mturley Jan 16, 2024

vconzola commented Jan 16, 2024

mturley commented Jan 16, 2024

mturley commented Jan 16, 2024

mturley commented Jan 17, 2024

lucferbux left a comment

openshift-ci bot commented Jan 17, 2024

Assert edit permissions on appropriate mutations in namespaceUtils, fix promise logic when deploying a model or adding a model server, show inner error messages from the backend when present #2319

Assert edit permissions on appropriate mutations in namespaceUtils, fix promise logic when deploying a model or adding a model server, show inner error messages from the backend when present #2319

Conversation

mturley commented Jan 3, 2024 • edited

Description

How Has This Been Tested?

Test Impact

Request review criteria:

openshift-ci bot commented Jan 3, 2024

lucferbux left a comment

Choose a reason for hiding this comment

mturley commented Jan 9, 2024

mturley commented Jan 9, 2024

mturley commented Jan 9, 2024

andrewballantyne commented Jan 9, 2024 • edited

mturley commented Jan 9, 2024

mturley commented Jan 9, 2024

christianvogt commented Jan 9, 2024

mturley commented Jan 9, 2024

mturley commented Jan 9, 2024

lucferbux commented Jan 9, 2024

andrewballantyne Jan 16, 2024

Choose a reason for hiding this comment

mturley Jan 16, 2024 • edited

Choose a reason for hiding this comment

andrewballantyne Jan 16, 2024

Choose a reason for hiding this comment

mturley Jan 16, 2024

Choose a reason for hiding this comment

vconzola commented Jan 16, 2024

mturley commented Jan 16, 2024

mturley commented Jan 16, 2024

mturley commented Jan 17, 2024

lucferbux left a comment

Choose a reason for hiding this comment

openshift-ci bot commented Jan 17, 2024

mturley commented Jan 3, 2024 •

edited

andrewballantyne commented Jan 9, 2024 •

edited

mturley Jan 16, 2024 •

edited