Assert edit permissions on appropriate mutations in namespaceUtils, fix promise logic when deploying a model or adding a model server, show inner error messages from the backend when present #2319
Skipping CI for Draft Pull Request.
Everything looks good to me; just take a look at the package-lock.json situation (it could be expected, but it's kind of weird).
You can undraft it if you want!
Force-pushed from 3ceb1e7 to ba6a311.
Sorry for the delay here; I'm back from PTO. @lucferbux, I only still have this in draft because I figured it should have updated unit tests before it's ready to merge. Getting back to that now.
/test all
Hey @andrewballantyne @lucferbux, quick question here: I've been poking around and reading code trying to figure out how to mock things correctly to test the
We don't have e2e tests, so why do you need to mock user permissions? @christianvogt, can you help out here? EDIT: I bumped this to Slack as well. I don't think there's a lot we can do here, but Christian should have the final say.
It also looks like the accessibility tests are failing to run in CI, but they pass locally :(
I figured that since the fix here concerns behavior that differs for users with different permissions, I would want to mock each of the two cases (a model server should be created if the user can update the namespace, and should not be created if the user cannot). My apologies for my inexperience with the repo getting in the way here.
@mturley, as far as mocking goes: use Jest to mock whatever you need on the frontend that is used by the function you're testing. It's sometimes best to mock down to the k8s APIs from But sometimes those mocks end up very deep in the call tree, so you may want to mock another function that's used directly by your function instead. The choice is yours. The other option is to add a new Cypress test, in which you'd end up mocking the network calls.
/test all
OK, thanks @christianvogt, I'll keep playing with it.
@mturley @christianvogt I wouldn't waste too much time trying to mock e2e within the backend; the main thing I would cover here is this call. Just mock it so that it fails, so we can prove the rest of the requests are not executed.
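Along the lines of that suggestion, here is a minimal, framework-free sketch of the request ordering this PR aims for, together with a fake API whose namespace update can be made to fail. All names (ServingRequests, submitServingRuntimeSketch, deployModelSketch, makeRecordingApi, createSupportingResource) are hypothetical stand-ins, not the dashboard's real functions; updateNamespaceLabels plays the role of addSupportServingPlatformProject.

```typescript
// Hypothetical request helpers; these are NOT the dashboard's real functions.
type RequestOpts = { dryRun: boolean };

interface ServingRequests {
  createServingRuntime: (opts: RequestOpts) => Promise<void>;
  createSupportingResource: (opts: RequestOpts) => Promise<void>;
  // Stand-in for addSupportServingPlatformProject, which has no dry-run option.
  updateNamespaceLabels: () => Promise<void>;
  createInferenceService: () => Promise<void>;
}

// Ordering described in the PR: dry run, then namespace update, then real requests.
const submitServingRuntimeSketch = async (api: ServingRequests): Promise<void> => {
  await Promise.all([
    api.createServingRuntime({ dryRun: true }),
    api.createSupportingResource({ dryRun: true }),
  ]);
  // If this rejects (e.g. with a 403), nothing below is executed.
  await api.updateNamespaceLabels();
  await Promise.all([
    api.createServingRuntime({ dryRun: false }),
    api.createSupportingResource({ dryRun: false }),
  ]);
};

// Deploying a model: chain the InferenceService after the runtime instead of using Promise.all.
const deployModelSketch = (api: ServingRequests): Promise<void> =>
  submitServingRuntimeSketch(api).then(() => api.createInferenceService());

// Fake API that records each call and can be told to fail the namespace update.
const makeRecordingApi = (failNamespaceUpdate: boolean) => {
  const calls: string[] = [];
  const api: ServingRequests = {
    createServingRuntime: async ({ dryRun }) => {
      calls.push(`servingRuntime:${dryRun ? 'dry' : 'real'}`);
    },
    createSupportingResource: async ({ dryRun }) => {
      calls.push(`supportingResource:${dryRun ? 'dry' : 'real'}`);
    },
    updateNamespaceLabels: async () => {
      calls.push('namespace');
      if (failNamespaceUpdate) {
        throw new Error('403: cannot update namespace');
      }
    },
    createInferenceService: async () => {
      calls.push('inferenceService');
    },
  };
  return { api, calls };
};
```

With makeRecordingApi(true), deployModelSketch rejects right after the namespace call, and the recorded calls contain only the two dry runs and the namespace update. That is the property a real Jest or Cypress test would assert.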
Force-pushed from ba6a311 to 07ab929.
Force-pushed from cfc7243 to 39dc5e2.
    default:
      throw createCustomError('Unknown configuration', 'Cannot apply namespace change', 400);
  }

  const selfSubjectAccessReview = await checkPermissionsFn(fastify, request, name);
  if (isK8sStatus(selfSubjectAccessReview)) {
It'll fall into the next if-statement after this, but I am wondering if we want to put a "dev setup" if-statement first that basically says "checkPermissionsFn is null; you need to set permissions for all actions". Basically a fail-immediately check with a dev message, so the dev knows the new case they added needs a permissions check.
Just my two cents. It should fail in some fashion on the checkPermissionsFn() call, but I don't know how that will look.
Ah, fair enough. Just another throw createCustomError, maybe? With a 500 error code, I guess, since it should never happen? It would be impossible to hit with the current code, but that's a good point about future switch cases. Alternatively, we could just fall back to checkAdminNamespacePermission, which was the behavior in all cases before this PR. I'm leaning towards your idea.
Yeah, like a 500 error "invalid backend state, dev broken workflow" or something like that. Something annoying and blunt that makes it obvious it was a dev setup problem. It will be easy to search for in the codebase and should prevent anything from being poorly configured.
It should never happen in production, but it won't mask poorly written code, so it gets caught in the PR for the change, or in our test infra once we have that running nightly.
Pushed with this change as well.
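The fail-fast guard discussed in this thread could look roughly like the sketch below. Only the (title, message, statusCode) argument order of createCustomError and the MODEL_MESH_PROMOTION and KSERVE_PROMOTION case names come from the PR; NamespaceChangeCase, NEW_UNCHECKED_CASE, CustomError, applyNamespaceChangeSketch and the label keys are hypothetical, and the boolean permission check is a simplification of the real selfSubjectAccessReview.

```typescript
// Hypothetical stand-ins for the backend pieces discussed in this thread.
enum NamespaceChangeCase {
  MODEL_MESH_PROMOTION = 'MODEL_MESH_PROMOTION',
  KSERVE_PROMOTION = 'KSERVE_PROMOTION',
  // Illustrative case a developer might add later and forget to secure.
  NEW_UNCHECKED_CASE = 'NEW_UNCHECKED_CASE',
}

// Simplified: the real code performs a selfSubjectAccessReview instead of returning a boolean.
type PermissionCheckFn = (namespace: string) => Promise<boolean>;

class CustomError extends Error {
  statusCode: number;

  constructor(message: string, statusCode: number) {
    super(message);
    this.statusCode = statusCode;
  }
}

// Local stand-in taking the same (title, message, statusCode) arguments as createCustomError.
const createCustomError = (title: string, message: string, statusCode: number): CustomError =>
  new CustomError(`${title}: ${message}`, statusCode);

const applyNamespaceChangeSketch = async (
  namespace: string,
  changeCase: NamespaceChangeCase,
  canUpdateNamespace: PermissionCheckFn,
): Promise<Record<string, string>> => {
  let labels: Record<string, string>;
  let checkPermissionsFn: PermissionCheckFn | null = null;

  switch (changeCase) {
    case NamespaceChangeCase.MODEL_MESH_PROMOTION:
      labels = { 'example.com/serving-platform': 'multi-model' }; // hypothetical label
      checkPermissionsFn = canUpdateNamespace;
      break;
    case NamespaceChangeCase.KSERVE_PROMOTION:
      labels = { 'example.com/serving-platform': 'single-model' }; // hypothetical label
      checkPermissionsFn = canUpdateNamespace;
      break;
    case NamespaceChangeCase.NEW_UNCHECKED_CASE:
      labels = { 'example.com/new-feature': 'true' };
      // checkPermissionsFn is deliberately left null to exercise the fail-fast branch.
      break;
    default:
      throw createCustomError('Unknown configuration', 'Cannot apply namespace change', 400);
  }

  // Dev-setup guard: every case must configure a permission check, so fail loudly with a 500.
  if (!checkPermissionsFn) {
    throw createCustomError(
      'Invalid backend state',
      `No permission check configured for ${changeCase}`,
      500,
    );
  }

  if (!(await checkPermissionsFn(namespace))) {
    throw createCustomError('Forbidden', `Cannot update namespace ${namespace}`, 403);
  }

  // The real code would patch the namespace here; the sketch only returns the labels.
  return labels;
};
```

The guard sits between the switch and the permission check, so a case added without a check fails on every request during development instead of silently skipping authorization.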
...nd/src/pages/modelServing/screens/projects/ServingRuntimeModal/ManageServingRuntimeModal.tsx
@mturley, can you make the error message more specific by replacing "labels" with "serving platform labels"? A typical data scientist might not know what a label is, but they'll know it has something to do with a serving platform and their permissions. Also, put a period at the end.
Force-pushed from c88568e to d93dbd3.
@andrewballantyne I've rebased and pushed with changes to address your feedback, and updated the PR description to match. Let me know what you think of the new error message approach. @vconzola, you got it: I updated the message and the screenshot.
Force-pushed from 8f1a68a to 3e49d71.
Fixed some lint errors I missed. Should be ready for re-review.
Force-pushed from 3e49d71 to c07bcb8.
Force-pushed from 3ba54a7 to 1652a56.
…ix promise logic in serving runtime create
Signed-off-by: Mike Turley <mike.turley@alum.cs.umass.edu>
Force-pushed from 1652a56 to 739adc8.
Rebased and addressed remaining comments.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: lucferbux.
Signed-off-by: Mike Turley <mike.turley@alum.cs.umass.edu>
Followup to PR #2319 - permission check typo
Resolves RHOAIENG-548 and RHOAIENG-556.
Description
When a user has "edit" permission on a data science project they don't own, they don't have permission to update that namespace. When adding the first model or model server to a project, the namespace needs to be updated by addSupportServingPlatformProject to add labels that describe which serving platform the project is using, which fails with a 403 error for users in this position (as expected). Currently, even though this namespace update fails, the ServingRuntime gets created anyway. It does not become visible in the UI, because the UI still shows the platform-selection view (choosing single- or multi-model serving), but it is visible in the cluster.
The submitServingRuntimeResources function that is called to make these updates and create the ServingRuntime currently operates like this:
1. Perform the dry-run requests concurrently using Promise.all.
2. Perform the actual requests concurrently using Promise.all, but this time include the namespace update (because the namespace update performed by addSupportServingPlatformProject has no dry-run option).
This means that if addSupportServingPlatformProject fails, the other requests run anyway. The fix is to add a step between the dry run and the actual run of these requests in which we perform the namespace update before proceeding, so that if it fails, the ServingRuntime is not created. As part of the fix, we convert submitServingRuntimeResources to an async function so we can use await for readability.
There are also a few other notable changes:
- Removes the allowCreate condition on updating the namespace, which was a relic of an earlier workaround that is no longer needed (@lucferbux can provide more context on this; thanks for your help, Lucas!).
- Updates the applyNamespaceChange function on the backend to perform a more accurate selfSubjectAccessReview: users with "edit" permission should still be able to create ServingRuntimes when the namespace update is not needed (because a serving platform has already been chosen for the project), so we check for the namespace update permission only in the MODEL_MESH_PROMOTION and KSERVE_PROMOTION cases.
- When deploying a model, we previously used Promise.all to call submitServingRuntimeResources and submitInferenceServiceResource concurrently, which means that if the error described above happens here, the InferenceService is also still created. Changing this to a sequential .then also fixes that problem. (Edit: I hadn't realized this was the separate issue RHOAIENG-556, so this PR now fixes that too :))
- Adds a throwErrorFromAxios utility and uses it in addSupportServingPlatformProject (and in createProject, the only other place we call axios() directly). This ensures that if an error thrown by the axios() call has a response object containing a message from the backend, that message is used instead of the top-level message property from Axios ("Request failed with status code X"). If the error being displayed does not contain an inner error message in this expected structure, the current behavior is retained and the top-level error.message is used.
Considering that last point, the error message that used to look like this:
now looks like this:
@vconzola, does that error text look good to you?
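The last change in the list above can be sketched as a rejection handler that prefers the backend's inner message and otherwise rethrows the original error untouched. This assumes the backend's message lives at response.data.message, which the PR only describes as "a response object containing a message"; AxiosLikeError and throwErrorFromAxiosSketch are hypothetical names, and the real throwErrorFromAxios utility may differ.

```typescript
// Only the fields of an Axios error this sketch reads (an assumption about the payload shape).
interface AxiosLikeError {
  message: string;
  response?: { data?: { message?: unknown } };
}

// Hypothetical helper in the spirit of throwErrorFromAxios: prefer the backend's message.
const throwErrorFromAxiosSketch = (error: unknown): never => {
  const innerMessage = (error as AxiosLikeError | null | undefined)?.response?.data?.message;
  if (typeof innerMessage === 'string' && innerMessage !== '') {
    throw new Error(innerMessage);
  }
  // No inner message in the expected place: rethrow unchanged, so error.message is still shown.
  throw error;
};
```

It would be attached as the rejection handler of an existing axios() promise, for example somePromise.catch(throwErrorFromAxiosSketch). Rethrowing the original object in the fallback path keeps today's "Request failed with status code X" behavior for errors without a backend message.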
How Has This Been Tested?
With @lucferbux's help, I reproduced the error in a cluster and verified the fix using the PR image in that cluster. Testing locally gave us trouble because an impersonated regular user apparently cannot list templates in a namespace they don't own (which does not seem to match the behavior in production), so we couldn't get the "Models and model servers" section of the data science project page to render when trying to reproduce the bug this way.
Test Impact
New Cypress tests have been added that submit the "deploy model" and "add model server" modals and intercept their requests, ensuring that the requests to create ServingRuntimes and InferenceServices are not made when there is an error updating the namespace.