Parse forbidden errors to return forbidden actions #1463

andresmgot · 2020-01-16T17:09:57Z

Description of the change

Follow up of #1450

With Helm 3, we are currently not doing the pre-validation to check if the user has enough permissions. Because of that, we are currently returning the raw error to the user. That's a regression in the UX since with Helm 2 we give a detailed list of roles needed to perform the requested action.

What this PR proposes is to parse the error returned by Helm and give an user-friendly error (like we do in Helm 2) with the permissions needed.

Benefits

When trying to do an operation over an application, we are parsing the error returned by the Helm client, returning all the information that we have:

The good thing about this approach is that we are not assuming anything. We are just leaving Helm do its work and forward its response.

Possible drawbacks

From the screenshot above, you can notice that we are returning that the user needs permission to create PVCs and to delete a bunch of things. This is because helm stops processing the chart once one element fails to be installed but then it tries to delete everything (even if it has not been created). This can be an issue with Helm because the error message is a bit misleading (and because we are using atomic installations).

In any case, since we are not pre-checking that the user has permissions to do everything that the chart contains, the user may need to iterate several times over the error page (if the admin gives permissions one by one).

absoludity · 2020-01-16T20:10:43Z

I do like the idea of just letting helm do the work and we just parse its error, but the experience is still quite a bit worse that we had. Details below:

Possible drawbacks

From the screenshot above, you can notice that we are returning that the user needs permission to create PVCs and to delete a bunch of things. This is because helm stops processing the chart once one element fails to be installed but then it tries to delete everything (even if it has not been created). This can be an issue with Helm because the error message is a bit misleading (and because we are using atomic installations).

I don't think it's misleading to include the required deletes. Just checking locally with helm3, what I see is the following:

# Without atomic:
~$ helm install apache bitnami/apache
Error: services is forbidden: User "system:serviceaccount:default:readonlyuser" cannot create resource "services" in API group "" in the namespace "default"

# With atomic:
~$ helm install apache bitnami/apache --atomic
Error: an error occurred while uninstalling the release. original install error: services is forbidden: User "system:serviceaccount:default:readonlyuser" cannot create resource "services" in API group "" in the namespace "default": uninstallation completed with 3 error(s): services "apache" is forbidden: User "system:serviceaccount:default:readonlyuser" cannot
delete resource "services" in API group "" in the namespace "default"; deployments.apps "apache" is forbidden: User "system:serviceaccount:default:readonlyuser" cannot delete resource "deployments" in API group "apps" in the namespace "default"; uninstall: Failed to purge the release: secrets "sh.helm.release.v1.apache.v1" is forbidden: User "system:serviceaccount:default:readonlyuser" cannot delete resource "secrets" in API group "" in the namespace "default"

So looking at the output from the --atomic command I assume we could just parse the original install error if we wanted (ignoring the uninstallation errors), but realistically, people need to be able to delete releases that they create, even if it's only so that they can iterate... see below:

In any case, since we are not pre-checking that the user has permissions to do everything that the chart contains, the user may need to iterate several times over the error page (if the admin gives permissions one by one).

This is the bigger problem from my perspective, but also the reason why it's important (not misleading) to keep the delete permissions from above. If I go off and fix the one create permission which was presented to me, then try again, I will see:

~$ helm install apache bitnami/apache
Error: cannot re-use a name that is still in use

unless the user either uses a different name for each attempt (which is bad for other reasons), or is able to delete the existing release (ie having those delete perms, including the secret one). This is required for the user to be able to iterate (helm uninstall first, then retry).

What is interesting here is that helm reports all errors during the deletion - iterating all the resources that could have been created (even though they were not). Unfortunately I don't see a way we could use that either (since RBAC can be set for deletions separately of course).

So my ideal situation would be that helm itself verifies the user can create all required resources during an install, and returns a complete error message that we can just use. We can create an issue (and potentially work to fix that, if we want). But in the mean-time, I would be keen to maintain our current behaviour: checking create for all resources + driver (ie. secret), rather than regressing. I don't feel too strongly about it, so if you're keen to push ahead with this while creating the helm issue (and potentially fixing it there later), I'm ok with that too - it's a small regression, not too important.

andresmgot · 2020-01-17T09:16:50Z

So my ideal situation would be that helm itself verifies the user can create all required resources during an install, and returns a complete error message that we can just use. We can create an issue (and potentially work to fix that, if we want). But in the mean-time, I would be keen to maintain our current behaviour: checking create for all resources + driver (ie. secret), rather than regressing. I don't feel too strongly about it, so if you're keen to push ahead with this while creating the helm issue (and potentially fixing it there later), I'm ok with that too - it's a small regression, not too important.

I'd rather start deprecating the current behavior because we are hard-coding the verbs that we check. For example, when creating a release, we check that the user has permissions to create resources, but as the Helm error reports now, it's also needed to be able to delete resources (because of the atomic installation). When upgrading, we check that the user has permissions to create, update and delete while maybe some of those permissions are not needed... In other words, our assumptions don't represent reality 100%. While this has been good enough (no issue has been filled because of that), it's not something I would promote for the future.

As you say, I'd rather open an issue in Helm to check if we can "fix" it there. I can do so once I am done checking everything in this PR.

absoludity

Great testing of the parse function, as much as I don't like parsing strings it's much nicer when it's well tested.

absoludity · 2020-01-20T23:23:04Z

pkg/auth/auth.go

+			current = append(current, v)
+		}
+	}
+	return current


This function modifies the current slice already - which might be unexpected by a call-site. Why not keep it purely functional (ie. returning a new slice).

slices get copied to the function context so modifying current has no effect for the caller: https://play.golang.org/p/NYgKBEaCIk8

In any case, if it's more readable I can create a different variable

slices get copied to the function context so modifying current has no effect for the caller: https://play.golang.org/p/NYgKBEaCIk8

Small change: https://play.golang.org/p/Gu_Ta4jM46n showing that the slice is not being copied and that modifying the slice does have an effect for the caller: it's just the slice descriptor which is copied.

But what I missed above is that in this case we're always returning a new slice descriptor anyway (append creates one), so you're right - it's safe as is :)

ah, got it, good to know

absoludity · 2020-01-20T23:30:20Z

pkg/auth/auth.go

 			resMap[req] = Action{
 				APIVersion: action.APIVersion,
 				Resource:   action.Resource,
 				Namespace:  action.Namespace,
-				Verbs:      verbs,
+				Verbs:      uniqVerbs(resMap[req].Verbs, action.Verbs),


Woah - so yes, here, it works, but the call to uniqVerbs is modifying resMap[req].Verbs to the desired value and setting it back. You could instead just do a call to uniqVerbs after you set resMap[req], and it would work currently, though it'd be simpler to reason about if it was just a pure function.

absoludity · 2020-01-20T23:31:28Z

pkg/auth/auth.go

+
+// ParseForbiddenActions parses a forbidden error returned by the Kubernetes API and return the list of forbidden actions
+func ParseForbiddenActions(message string) []Action {
+	re := regexp.MustCompile(`User "(.*?)" cannot (.*?) resource "(.*?)" in API group "(.*?)"(?: in the namespace "(.*?)")?`)


Please include a big TODO to remove this with a link to the helm bug, or similar?

I think we meant different things with what we expect from Helm. My idea is for them to return all the elements that fails when creating a release (not just the first) but we would still need to parse this message with a regexp.

Apparently, you are thinking on completely removing this regexp but what would replace it?

Yes, I assumed you meant both that helm would return all the elements that fail and that it would do so in a structured way rather than as a big string message. With the new error package it's pretty easy to have an error type which includes a json representation of the error in addition to the textual string, for example. (Or even without the new errors functionality - it's just neater using errors.As or errors.IsA - the wrapping itself wouldn't be necessary).

okay, noted but even if they do that I think some regex will be needed.

Parse forbidden errors to return forbidden actions

e215005

andresmgot requested a review from absoludity January 16, 2020 17:10

Andres Martinez Gotor added 2 commits January 20, 2020 10:48

Merge branch 'master' into parseForbiddenError

b391aa4

Clean up code and tests

f230af3

andresmgot marked this pull request as ready for review January 20, 2020 14:35

absoludity approved these changes Jan 20, 2020

View reviewed changes

Andres Martinez Gotor added 2 commits January 21, 2020 10:45

Merge branch 'master' into parseForbiddenError

f88b6a3

Review

b1e781b

andresmgot merged commit 94a9617 into master Jan 22, 2020

andresmgot deleted the parseForbiddenError branch January 22, 2020 10:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parse forbidden errors to return forbidden actions #1463

Parse forbidden errors to return forbidden actions #1463

andresmgot commented Jan 16, 2020 •

edited

Loading

absoludity commented Jan 16, 2020

Possible drawbacks

andresmgot commented Jan 17, 2020

absoludity left a comment

absoludity Jan 20, 2020

andresmgot Jan 21, 2020

absoludity Jan 21, 2020

andresmgot Jan 22, 2020

absoludity Jan 20, 2020

absoludity Jan 20, 2020

andresmgot Jan 21, 2020

absoludity Jan 21, 2020

andresmgot Jan 22, 2020

Parse forbidden errors to return forbidden actions #1463

Parse forbidden errors to return forbidden actions #1463

Conversation

andresmgot commented Jan 16, 2020 • edited Loading

Description of the change

Benefits

Possible drawbacks

absoludity commented Jan 16, 2020

Possible drawbacks

andresmgot commented Jan 17, 2020

absoludity left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andresmgot commented Jan 16, 2020 •

edited

Loading