Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse forbidden errors to return forbidden actions #1463

Merged
merged 5 commits into from
Jan 22, 2020

Conversation

andresmgot
Copy link
Contributor

@andresmgot andresmgot commented Jan 16, 2020

Description of the change

Follow up of #1450

With Helm 3, we are currently not doing the pre-validation to check if the user has enough permissions. Because of that, we are currently returning the raw error to the user. That's a regression in the UX since with Helm 2 we give a detailed list of roles needed to perform the requested action.

What this PR proposes is to parse the error returned by Helm and give an user-friendly error (like we do in Helm 2) with the permissions needed.

Benefits

When trying to do an operation over an application, we are parsing the error returned by the Helm client, returning all the information that we have:

Screenshot from 2020-01-16 18-01-16

The good thing about this approach is that we are not assuming anything. We are just leaving Helm do its work and forward its response.

Possible drawbacks

From the screenshot above, you can notice that we are returning that the user needs permission to create PVCs and to delete a bunch of things. This is because helm stops processing the chart once one element fails to be installed but then it tries to delete everything (even if it has not been created). This can be an issue with Helm because the error message is a bit misleading (and because we are using atomic installations).

In any case, since we are not pre-checking that the user has permissions to do everything that the chart contains, the user may need to iterate several times over the error page (if the admin gives permissions one by one).

@absoludity
Copy link
Contributor

I do like the idea of just letting helm do the work and we just parse its error, but the experience is still quite a bit worse that we had. Details below:

Possible drawbacks

From the screenshot above, you can notice that we are returning that the user needs permission to create PVCs and to delete a bunch of things. This is because helm stops processing the chart once one element fails to be installed but then it tries to delete everything (even if it has not been created). This can be an issue with Helm because the error message is a bit misleading (and because we are using atomic installations).

I don't think it's misleading to include the required deletes. Just checking locally with helm3, what I see is the following:

# Without atomic:
~$ helm install apache bitnami/apache
Error: services is forbidden: User "system:serviceaccount:default:readonlyuser" cannot create resource "services" in API group "" in the namespace "default"

# With atomic:
~$ helm install apache bitnami/apache --atomic
Error: an error occurred while uninstalling the release. original install error: services is forbidden: User "system:serviceaccount:default:readonlyuser" cannot create resource "services" in API group "" in the namespace "default": uninstallation completed with 3 error(s): services "apache" is forbidden: User "system:serviceaccount:default:readonlyuser" cannot
delete resource "services" in API group "" in the namespace "default"; deployments.apps "apache" is forbidden: User "system:serviceaccount:default:readonlyuser" cannot delete resource "deployments" in API group "apps" in the namespace "default"; uninstall: Failed to purge the release: secrets "sh.helm.release.v1.apache.v1" is forbidden: User "system:serviceaccount:default:readonlyuser" cannot delete resource "secrets" in API group "" in the namespace "default"

So looking at the output from the --atomic command I assume we could just parse the original install error if we wanted (ignoring the uninstallation errors), but realistically, people need to be able to delete releases that they create, even if it's only so that they can iterate... see below:

In any case, since we are not pre-checking that the user has permissions to do everything that the chart contains, the user may need to iterate several times over the error page (if the admin gives permissions one by one).

This is the bigger problem from my perspective, but also the reason why it's important (not misleading) to keep the delete permissions from above. If I go off and fix the one create permission which was presented to me, then try again, I will see:

~$ helm install apache bitnami/apache
Error: cannot re-use a name that is still in use

unless the user either uses a different name for each attempt (which is bad for other reasons), or is able to delete the existing release (ie having those delete perms, including the secret one). This is required for the user to be able to iterate (helm uninstall first, then retry).

What is interesting here is that helm reports all errors during the deletion - iterating all the resources that could have been created (even though they were not). Unfortunately I don't see a way we could use that either (since RBAC can be set for deletions separately of course).

So my ideal situation would be that helm itself verifies the user can create all required resources during an install, and returns a complete error message that we can just use. We can create an issue (and potentially work to fix that, if we want). But in the mean-time, I would be keen to maintain our current behaviour: checking create for all resources + driver (ie. secret), rather than regressing. I don't feel too strongly about it, so if you're keen to push ahead with this while creating the helm issue (and potentially fixing it there later), I'm ok with that too - it's a small regression, not too important.

@andresmgot
Copy link
Contributor Author

So my ideal situation would be that helm itself verifies the user can create all required resources during an install, and returns a complete error message that we can just use. We can create an issue (and potentially work to fix that, if we want). But in the mean-time, I would be keen to maintain our current behaviour: checking create for all resources + driver (ie. secret), rather than regressing. I don't feel too strongly about it, so if you're keen to push ahead with this while creating the helm issue (and potentially fixing it there later), I'm ok with that too - it's a small regression, not too important.

I'd rather start deprecating the current behavior because we are hard-coding the verbs that we check. For example, when creating a release, we check that the user has permissions to create resources, but as the Helm error reports now, it's also needed to be able to delete resources (because of the atomic installation). When upgrading, we check that the user has permissions to create, update and delete while maybe some of those permissions are not needed... In other words, our assumptions don't represent reality 100%. While this has been good enough (no issue has been filled because of that), it's not something I would promote for the future.

As you say, I'd rather open an issue in Helm to check if we can "fix" it there. I can do so once I am done checking everything in this PR.

@andresmgot andresmgot marked this pull request as ready for review January 20, 2020 14:35
Copy link
Contributor

@absoludity absoludity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great testing of the parse function, as much as I don't like parsing strings it's much nicer when it's well tested.

pkg/auth/auth.go Outdated
current = append(current, v)
}
}
return current
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function modifies the current slice already - which might be unexpected by a call-site. Why not keep it purely functional (ie. returning a new slice).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slices get copied to the function context so modifying current has no effect for the caller: https://play.golang.org/p/NYgKBEaCIk8

In any case, if it's more readable I can create a different variable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slices get copied to the function context so modifying current has no effect for the caller: https://play.golang.org/p/NYgKBEaCIk8

Small change: https://play.golang.org/p/Gu_Ta4jM46n showing that the slice is not being copied and that modifying the slice does have an effect for the caller: it's just the slice descriptor which is copied.

But what I missed above is that in this case we're always returning a new slice descriptor anyway (append creates one), so you're right - it's safe as is :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, got it, good to know

resMap[req] = Action{
APIVersion: action.APIVersion,
Resource: action.Resource,
Namespace: action.Namespace,
Verbs: verbs,
Verbs: uniqVerbs(resMap[req].Verbs, action.Verbs),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah - so yes, here, it works, but the call to uniqVerbs is modifying resMap[req].Verbs to the desired value and setting it back. You could instead just do a call to uniqVerbs after you set resMap[req], and it would work currently, though it'd be simpler to reason about if it was just a pure function.


// ParseForbiddenActions parses a forbidden error returned by the Kubernetes API and return the list of forbidden actions
func ParseForbiddenActions(message string) []Action {
re := regexp.MustCompile(`User "(.*?)" cannot (.*?) resource "(.*?)" in API group "(.*?)"(?: in the namespace "(.*?)")?`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include a big TODO to remove this with a link to the helm bug, or similar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we meant different things with what we expect from Helm. My idea is for them to return all the elements that fails when creating a release (not just the first) but we would still need to parse this message with a regexp.

Apparently, you are thinking on completely removing this regexp but what would replace it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I assumed you meant both that helm would return all the elements that fail and that it would do so in a structured way rather than as a big string message. With the new error package it's pretty easy to have an error type which includes a json representation of the error in addition to the textual string, for example. (Or even without the new errors functionality - it's just neater using errors.As or errors.IsA - the wrapping itself wouldn't be necessary).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, noted but even if they do that I think some regex will be needed.

@andresmgot andresmgot merged commit 94a9617 into master Jan 22, 2020
@andresmgot andresmgot deleted the parseForbiddenError branch January 22, 2020 10:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants