Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add helm revoke functionality #50

Merged

Conversation

@bwhaley
Copy link
Contributor

commented May 21, 2019

Hi team, this PR adds the kubergrunt helm revoke feature to kubergrunt. I've started by updating README.md with clarity on precisely what this feature does (and does not do). Does this seem along the lines of what we'd want for this feature?

@yorinasub17
Copy link
Contributor

left a comment

These changes and what the command should do make sense given your research! Can you also update the the corresponding section in the HELM_GUIDE too? Thanks!

README.md Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved

bwhaley and others added some commits May 21, 2019

Update README.md
Co-Authored-By: Yoriyasu Yano <430092+yorinasub17@users.noreply.github.com>
HELM_GUIDE.md Show resolved Hide resolved
Implements helm revoke
Adds the 'kubergrunt helm revoke' subcommand to delete a role, role binding,
and TLS key pair secret from a given set of RBAC entities. Adds a test to the
existing Deploy/Configure/Undeploy test since the revocation fits cleanly. Also
bumping tiller version to latest.
@yorinasub17
Copy link
Contributor

left a comment

Looks good for a first pass! Have some suggestions on improving the experience, but overall solid work!

.circleci/config.yml Outdated Show resolved Hide resolved
cmd/helm.go Show resolved Hide resolved
helm/deploy_test.go Outdated Show resolved Hide resolved
helm/revoke.go Outdated Show resolved Hide resolved
helm/revoke.go Outdated Show resolved Hide resolved

emptyOptions := &metav1.DeleteOptions{}
roleName := getTillerAccessRoleName(rbacEntity.EntityID(), tillerNamespace)
err = client.RbacV1().Roles(tillerNamespace).Delete(roleName, emptyOptions)

This comment has been minimized.

Copy link
@yorinasub17

yorinasub17 May 21, 2019

Contributor

I believe this will error if the object doesn't exist. Might be better to check if it exists, and if it doesn't, continue (logging a warning of course). This way, you can rerun the command for retryable errors (e.g rate limiting).

This comment has been minimized.

Copy link
@yorinasub17

yorinasub17 May 21, 2019

Contributor

Don't forget to do this for all the delete calls!

This comment has been minimized.

Copy link
@bwhaley

bwhaley May 21, 2019

Author Contributor

Sounds good, I will update these to log warnings and move on. E.g.

err = client.RbacV1().Roles(tillerNamespace).Delete(roleName, emptyOptions)
if err != nil {
	logger.Warningf("Unable to delete RBAC role: %s", err)
} else {
	logger.Infof("Successfully deleted RBAC role %s", roleName)
}

This comment has been minimized.

Copy link
@yorinasub17

yorinasub17 May 22, 2019

Contributor

Oh can you actually specifically check if the object exists and only skip if the object check fails? Best would be to introspect the error and only skip on the error that is thrown when the object doesn't exist. I know this isn't easy to discover because client-go isn't particularly well documented, so you might have to resort to trial and error to figure this out.

I don't think I have done this anywhere else (I use List interfaces in the other places I need it, and that works because the objects have sufficient labels). Alternatively, you can update the grant function to attach unique labels so that we can use the List function interface for existence checks. I slightly prefer the latter, since that has other, user friendly benefits.

This comment has been minimized.

Copy link
@bwhaley

bwhaley May 22, 2019

Author Contributor

Oh, I understand you now. I'll make this change.

This comment has been minimized.

Copy link
@bwhaley

bwhaley May 22, 2019

Author Contributor

@yorinasub17 this change will make revoke incompatible with roles/bindings created with earlier versions of kubergrunt that don't have labels. Are you okay with that?

This comment has been minimized.

Copy link
@yorinasub17

yorinasub17 May 22, 2019

Contributor

Yes I think that is ok. It isn't ideal, but it is not the end of the world since we don't have as many customers using this right now. We can provide migration instructions as part of the release notes, which can use kubectl commands to delete the previous certs and then regrant using the new helm grant.

helm/revoke.go Outdated Show resolved Hide resolved
helm/deploy_test.go Outdated Show resolved Hide resolved
assert.NoError(t, RevokeAccess(kubectlOptions, namespaceName, rbacGroups, rbacUsers, serviceAccounts))

// No role for service account
assert.Error(t, validateNoRole(t, kubeClient, namespaceName, testServiceAccountName))

This comment has been minimized.

Copy link
@josh-padnick

josh-padnick May 21, 2019

The one issue I see with this is that you're assuming a certain type of error will be returned: role does not exist. But if a different error occurs here (e.g. "permissions denied") the test won't distinguish between the two.

This comment has been minimized.

Copy link
@bwhaley

bwhaley May 21, 2019

Author Contributor

I'm assuming that I cannot get the role (or role binding, or secret; more or less the same approach for each of these tests). If I can get the role, then something went wrong on the revocation. If there was a permissions problem or otherwise, it would've been thrown in RevokeAccess().

This comment has been minimized.

Copy link
@yorinasub17

yorinasub17 May 22, 2019

Contributor

If you take the updated labels approach I proposed here (#50 (comment)), you can use the List interface instead for a more targeted existence check.

cmd/helm.go Outdated Show resolved Hide resolved
@robmorgan

This comment has been minimized.

Copy link
Member

commented May 22, 2019

Nice work @bwhaley 👍

@brikis98
Copy link

left a comment

Overall code and tests look great. Added some nitpicky things to make sure we're thinking through the various revoke scenarios.

cmd/helm.go Outdated Show resolved Hide resolved
assert.Error(t, validateNoRoleBinding(t, kubeClient, namespaceName, testServiceAccountName))

// No TLS keypair secret for service account
assert.Error(t, validateNoTLSSecret(t, kubectlOptions, namespaceName, serviceAccountName))

This comment has been minimized.

Copy link
@brikis98

brikis98 May 22, 2019

+1 on Josh's comment above: we should check which error comes back in all these checks, as these could be returning errors for a variety of reasons, and not all of them are necessarily what we're checking for.

This comment has been minimized.

Copy link
@brikis98

brikis98 May 22, 2019

Also, is the real way to test this to try to make requests as the user, check they succeed, revoke the user's access, and check the requests again to make sure they fail?

This comment has been minimized.

Copy link
@bwhaley

bwhaley May 24, 2019

Author Contributor

Yes, that would certainly be the more complete test. However, kubergrunt currently doesn't run any tests from the perspective of multiple users. It relies on the kubectl credentials currently available in the executing user's path. Adding the ability to switch from admin/user context is a bit more time than I can afford at the moment unfortunately and would have some downstream implications.

)

// RevokeAccess revokes access to a Tiller pod from a provided RBAC user, group, or serviceaccount in a
// provided Tiller namespace by deleting the secret, role, and rolebindings associated with said entities.

This comment has been minimized.

Copy link
@brikis98

brikis98 May 22, 2019

Sanity check: is this the comprehensive list of everything that could be associated with an entity? Could the entity get permissions any other way (e.g., perhaps indirectly through some group)? Is there a way to specify a role that explicitly denies all permissions to prevent that?

This comment has been minimized.

Copy link
@bwhaley

bwhaley May 22, 2019

Author Contributor

role, role binding, and secret is everything that kubergrunt creates with the grant subcommand. As I understand it, that's what we want to support with revoke. There are likely some edge cases that could occur. For instance, if a user is a member of a group that has been authorized via kubergrunt, and an operator runs kubergrunt helm revoke --rbac-user user, that wouldn't remove the user's access. They'd need to be removed from the group, which I'd consider out of scope for this feature.

This comment has been minimized.

Copy link
@yorinasub17

yorinasub17 May 22, 2019

Contributor

+1 to Ben's comment. Also, there is no deny permission in Kubernetes RBAC.

This comment has been minimized.

Copy link
@brikis98

brikis98 May 22, 2019

Agreed it's out of scope for this change. That said, Yori, should we file a TODO to think through the group use case? That seems pretty significant if we can't easily revoke the TLS certs...

This comment has been minimized.

Copy link
@yorinasub17

yorinasub17 May 22, 2019

Contributor

That said, Yori, should we file a TODO to think through the group use case?

I am not sure what there is to do there.

Users are added to the group at the IAM mapping level in EKS and GKE. Both are handled in our terraform modules and it is relatively easy to drop users from those groups. I don't think this is the responsibility of kubergrunt.

If we play out the scenario from an operator, the way they probably would do the groups is:

  • Create an RBAC group that represents access to Tiller
  • Run kubergrunt helm grant to grant access to Tiller to that group
  • Now, allowing users is controlled by adding users to that group.

In a revoke scenario, the user is removed from the group. Because the user is no longer in the group, they can't download new TLS certs. They also lose the ability to open port-forwards to the Tiller pod, effectively dropping network access as well. Even if they have the certs locally, they need to somehow gain access to the Tiller pod endpoint. However, the endpoint is bound on localhost in the container and they need a port-forward or exec permission to the Pod to gain that endpoint (this is what the RoleBinding grants). Without the RBAC group, they won't have this permission.

There are some ways to work around this (e.g by deploying a Pod that has enough permissions to open the port-forward and installing the certs there, though this requires an RBAC screwup that allows the user enough permissions to create a Service Account with access to the tiller namespace), which is why we still need a true revoke command (which looks like needs to be implemented via a kubergrunt helm rotate command that rotates the issued certs). This is captured in the README, and we should definitely create an issue for it. But I would argue this risk is relatively low enough (since it hinges on exploiting an RBAC loophole granted to the user somehow) that I am comfortable living without it.

The existence of the loophole, by the way, is why I designed kubergrunt to force users to treat the Tiller namespace (where the Tiller pod lives) separately from the resource namespace (the namespace where Tiller deploys things). If Tiller lives in a different namespace, you are much less likely to shoot yourself in the foot by granting broad RBAC permissions that gives you access to that namespace accidentally.

FWIW, I also explicitly recommend having a TLS cert for each unique helm client, rather than granting access to a group in the helm guide for this specific reason.

logger.Infof("Revoking access from RBAC Service Accounts")
serviceAccountInfos, err := convertToServiceAccountInfos(serviceAccounts)
if err != nil {
return err

This comment has been minimized.

Copy link
@brikis98

brikis98 May 22, 2019

If we return an error part way through, is there any chance of us ending up in some weird partial state that can't be recovered from? That is, the user's access is partially revoked, but not fully, and some possible error condition as a result?

@bwhaley

This comment has been minimized.

Copy link
Contributor Author

commented May 22, 2019

@yorinasub17 @brikis98 @josh-padnick It seems a consensus has been reached that the grant/revoke process needs more rigorous testing. To do so, I can refactor the role/rolebinding operations to mirror how secrets are handled (e.g. via functions in the kubectl package as opposed to in helm). Then I'll write tests for role/rolebindings and for grant/revoke operations as per comments above. How does this sound?

@yorinasub17

This comment has been minimized.

Copy link
Contributor

commented May 22, 2019

That sounds good to me!

[WIP] Refactor helm grant/revoke for easier tests
Moved role & rolebindings to kubectl package to set up helm grant and revoke for easier testing.
The refactor is more or less finished but still needs more tests.
@bwhaley

This comment has been minimized.

Copy link
Contributor Author

commented May 23, 2019

This most recent commit turns out to be fairly involved. Note that it's a WIP - still need to add tests and do a bit of clean up - but I'm interested in hearing if this is heading in the right direction. The tests are passing. I believe I've prevailed in an intense battle with the dependency snarl of helm, terratest, and kubernetes. Feedback on this WIP is welcome.

Gopkg.toml Outdated

[[override]]
name = "k8s.io/kubernetes"
revision = "54a352dda957bce0f88e49b65a6ee8bba8c0ba74"
version = "v1.13.6"
# revision = "721bfa751924da8d1680787490c54b9179b1fed0"

This comment has been minimized.

Copy link
@yorinasub17

yorinasub17 May 23, 2019

Contributor

I'm surprised you had to bump the version. Did you import a new variable or pkg?

This comment has been minimized.

Copy link
@yorinasub17

yorinasub17 May 23, 2019

Contributor

And I'm sorry you had to go through this. The package dependency war has scarred Kubernetes development, with client-go and k8s using godep and helm using glide, making it impossible to use dep or gomodules with them :(

I appreciate your fight through!

This comment has been minimized.

Copy link
@bwhaley

bwhaley May 24, 2019

Author Contributor

I wanted a newer version of terratest (0.13.30) that has k8s.GetRole(). From there I just muddled my way through the dependency errors until I found a winning combination. This may not be the only (or best) configuration...

@yorinasub17
Copy link
Contributor

left a comment

Thanks for your persistence in fighting through these suggestions! Left some comments. I clarified one of the suggestions because it looks like I wasn't clear in my suggestion. Hopefully the new example and explanation helps!

helm/revoke.go Outdated Show resolved Hide resolved
helm/revoke.go Outdated Show resolved Hide resolved
kubectl/helpers.go Show resolved Hide resolved
kubectl/role.go Show resolved Hide resolved
kubectl/test_helpers.go Outdated Show resolved Hide resolved
@yorinasub17

This comment has been minimized.

Copy link
Contributor

commented May 23, 2019

Also just wanted to let you know that I will be out the next two days so won't be able to check in on further updates until next Tuesday when I am back. Sorry about that, and really appreciate your contribution so far!

Feel free to hand off to me if you run out of time. I will be happy to take this further and get it merged once I'm back.

@bwhaley

This comment has been minimized.

Copy link
Contributor Author

commented May 24, 2019

Just pushed my latest changes per feedback above. Sadly I'm out of time to spend on this, but hopefully it's in a pretty decent place and can be easily merged from here. Thanks for all the feedback and review.

@yorinasub17 yorinasub17 merged commit 9bdc2e8 into gruntwork-io:master May 28, 2019

3 checks passed

ci/circleci: kubergrunt_tests Your tests passed on CircleCI!
Details
ci/circleci: setup Your tests passed on CircleCI!
Details
test-and-deploy Workflow: test-and-deploy
Details

yorinasub17 added a commit that referenced this pull request May 28, 2019

Merge pull request #51 from gruntwork-io/pull-request-50
Add helm revoke functionality: build on #50
@yorinasub17

This comment has been minimized.

Copy link
Contributor

commented May 28, 2019

Thanks for your hard work @bwhaley ! I took on the torch and made a few polishing touches to take this across the finish line. You can see my commits on #50 for the changes I made.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.