Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revision to IAM Policies created by Kops #3343

Merged

Conversation

KashifSaadat
Copy link
Contributor

Based off of the work done by @chrislovecnm in PR #2497.

This PR tightens down the IAM policies created for Master & Node instance groups. The Cluster Spec IAMSpec.Legacy flag is used to control application of stricter policy rules, which is defaulted to true for existing clusters (to limit potential regression impact), and false for new cluster creation.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 5, 2017
@KashifSaadat
Copy link
Contributor Author

Ping @gambol99 @chrislovecnm
/assign @justinsb

This will need some thorough review to make sure I've not accidentally dropped any permissions for the legacy behaviour (I think I've captured it all), also maybe some SID renaming and refactoring to make it more readable.

I've tested on a couple local clusters (creation, updates, rolling-updates, both with and without flag) and it seems fine so far. Maybe the Jenkins E2E test will pick something up!

@KashifSaadat
Copy link
Contributor Author

Not sure why the E2E tests failed, there's no log output. Will kick off again.

/test pull-kops-e2e-kubernetes-aws

@chrislovecnm
Copy link
Contributor

E2E wth kops is not happy currently. Known issue that @justinsb is workin on

@KashifSaadat
Copy link
Contributor Author

/test pull-kops-e2e-kubernetes-aws

@chrislovecnm
Copy link
Contributor

This is a biggy. Need to set aside some time to review. The conversation between @justinsb and I was to initially get this under a feature flag, or do you think the API settings that I think you are using are sufficient?

@chrislovecnm
Copy link
Contributor

/assign

@KashifSaadat
Copy link
Contributor Author

Hey @chrislovecnm! Yeah this will need some careful review, it's easy for a permission to slip by and potentially break something.

I think when we initially discussed about this back in PR #3210 the plan was to have any future policy revisions wrapped around the Cluster Spec IAM Legacy flag, so I've tried to re-use that (it was already in use for the S3 and EC2 permissions). Do you think a different flag should be used instead?

@chrislovecnm
Copy link
Contributor

chrislovecnm commented Sep 10, 2017

@KashifSaadat I think we can use your API element just fine :)

Copy link
Contributor

@chrislovecnm chrislovecnm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start. This is going to need a lot of testing.

ELB
EBS
KMS
S3

What else does k-c-m play with?

@@ -14,6 +14,13 @@ See the License for the specific language governing permissions and
limitations under the License.
*/

// TODO: We have a couple different code paths until with do lifecycles, and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are these TODOs updated??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a little unsure about the reference to using lifecycles and refactor of s3 code, so kept your comment as-is from the previous PR. I'll update the second TODO as we're now doing some more restrictive behaviour with use of resource paths and condition keys (but still room for improvement).

@@ -48,6 +55,7 @@ func (p *IAMPolicy) AsJSON() (string, error) {
}

type IAMStatementEffect string
type IAMSid string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type SID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. All of the types / constants are being picked up by golint for stuttering (iam.IAMSid etc) so I'll clean it up.

case kops.InstanceGroupRoleBastion:
p, err = b.BuildAWSIAMPolicyBastion()
if err != nil {
return nil, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fmt please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I've done that in a few places. Will update.

HostedZoneID string
S3Bucket string
ResourceARN *string
CreateECRPerms bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we wiring this in? What is setting this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pulled this in from #2497 but I don't believe anything is actually making use of it yet (aside from me, setting KMSKeys in the unit tests). Shall I take the extra fields out for now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've cleaned up the unused fields, leaving in KMSKeys, ResourceARN & CreateECRPerms (although those aren't set elsewhere on use of the struct yet) as we try to make use of these within the builder.

"ecr:DescribeRepositories",
"ecr:ListImages",
"ecr:BatchGetImage",
"ec2:AllocateAddress",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to prune these. The kops installer needs a lot of these, but the master and nodes to do not. Take a look at https://gist.github.com/chrislovecnm/0e23d11903cc36b99ccf73a013d5ae56 and https://gist.github.com/chrislovecnm/a93941b4afbbd5afe746354af0cfddcc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take this function out based on the other comment below, we can raise as a separate PR and review it independently.

"autoscaling:GetAsgForInstance"
],
"Resource": [
"*"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we limit on ASGs that are tagged?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this is read metadata for ASGs, so perhaps could have one statement with Sid "ReadAWSMetadataAllResources")

{
"Effect": "Allow",
"Action": [
"s3:List*"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we tighten and remove the *?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I'll update and test (I think I can remove this completely, will validate).

"kms:ReEncrypt*"
],
"Resource": [
"key-id-1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we test by adding a different key for a ebs volume? This may be too restricting. I am not familiar with kms perms, we need to double check if we have too many.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@faraazkhan do you know IAM kms perms?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand this is a complete set to work with (and delegate use of ) a specific key. There are some other conditions that delegate use only to aws resources (ie to EC2 so that it can mount the volumes) but I have never got good information out of AWS for exactly what is required for a minimal set here.

{
"Effect": "Allow",
"Action": [
"ec2:DescribeInstances"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DescribeRegion? Why do we need DescribeInstances?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was introduced in #2497 with a comment that Protokube makes a DescribeInstances call. If it's not needed at all we can drop the Node EC2 policy completely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check. We may only need for gossip.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This permission is still required, I got the below error when dropping it for compute nodes:

kubelet[8706]: error: failed to run Kubelet: could not init cloud provider "aws": error finding instance i-12345678: error listing AWS instances: UnauthorizedOperation: You are not authorized to perform this operation

"arn:aws:s3:::kops-tests/iam-builder-test.k8s.local/cluster.spec",
"arn:aws:s3:::kops-tests/iam-builder-test.k8s.local/config",
"arn:aws:s3:::kops-tests/iam-builder-test.k8s.local/instancegroup/*",
"arn:aws:s3:::kops-tests/iam-builder-test.k8s.local/pki/issued/*",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which pki keys do we actually need?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given there are limits on policy sizes, I would suggest you consider this role to own the whole iam-builder-test.k8s.local/* prefix - particularly for read actions. If there are specific prefixes under here that might hold secrets that a node should not be able to read, then you could use a DENY on those, or move that to a different prefix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back in PR #3158 I initially trialled this with open access and then explicit DENY rules for certain resources, however it became a bit more complex to manage. The /pki/private/ path can differ on content depending on your deployment configuration and there would be more rules to add in as the nodes shouldn't have access to most of these files, hence why I switched to this method.

The open access to /pki/issued/ was also brought up in #3158, but deemed acceptable (#3158 (review)).

@chrislovecnm
Copy link
Contributor

@KashifSaadat thanks for your patience with this PR, it is a bear. How are u testing this? We really need to figure out if we can run e2e against this so that we can get better testing coverage. Otherwise, we can install a bunch of example apps.

@chrislovecnm
Copy link
Contributor

@KashifSaadat let me know if you want to go over these changes on a video call. This is a complicated enough PR that we may want to review on a call.

"ec2:DescribeVolumes"
],
"Resource": [
"*"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Chris, Grant Gardner here as requested. I would not worry too much about the "read metadata" permissions such as ec2:Describe* in a first pass at a stricter policy. Would concentrate on permissions that can read end user data (eg S3) or can create/update/delete AWS resources.

I have just commented on the *strict permissions json, if there is anything else you'd like me to look at please let me know.

"ec2:ModifyInstanceAttribute"
],
"Resource": [
"*"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some cases you can use the aws:RequestTags condition to force the creation of a resource (Volume/Security group) to contain a specific tag value (which you use in the next statement to ensure that you are only working on your own resources)
and then move the Delete* actions into the "with tagged resources" below.
with Create/ModifyTags you can use a ForAllValues:StringEquals condition on Aws::TagKeys to limit which tags this role allows you to play with (ie only the tags associated with Kubernetes)

"autoscaling:GetAsgForInstance"
],
"Resource": [
"*"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this is read metadata for ASGs, so perhaps could have one statement with Sid "ReadAWSMetadataAllResources")

"autoscaling:UpdateAutoScalingGroup"
],
"Resource": [
"*"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AutoScaling now has resource level permissions so instead of tags you also have the option of using a prefix on your AutoScalingGroup and LaunchConfiguration names.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be good to add in. Would you be able to point me in the right direction of retrieving the autoscaling & launch configuration names and account id to populate the resources list?

"kms:ReEncrypt*"
],
"Resource": [
"key-id-1",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand this is a complete set to work with (and delegate use of ) a specific key. There are some other conditions that delegate use only to aws resources (ie to EC2 so that it can mount the volumes) but I have never got good information out of AWS for exactly what is required for a minimal set here.

"arn:aws:s3:::kops-tests/iam-builder-test.k8s.local/cluster.spec",
"arn:aws:s3:::kops-tests/iam-builder-test.k8s.local/config",
"arn:aws:s3:::kops-tests/iam-builder-test.k8s.local/instancegroup/*",
"arn:aws:s3:::kops-tests/iam-builder-test.k8s.local/pki/issued/*",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given there are limits on policy sizes, I would suggest you consider this role to own the whole iam-builder-test.k8s.local/* prefix - particularly for read actions. If there are specific prefixes under here that might hold secrets that a node should not be able to read, then you could use a DENY on those, or move that to a different prefix.

@k8s-github-robot
Copy link

@KashifSaadat PR needs rebase

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 11, 2017
@KashifSaadat
Copy link
Contributor Author

Thanks for the detailed feedback guys! I'll go through it later and try to address as much as I can :)

@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 11, 2017
@KashifSaadat
Copy link
Contributor Author

I've made some changes based on the feedback and updated tests accordingly (hopefully the E2E will still be clean). A call would be good to discuss! Maybe something to chat about on the next Friday kops call?

@KashifSaadat KashifSaadat force-pushed the iam-policy-revision branch 2 times, most recently from 5395b84 to 8c9d1af Compare September 11, 2017 15:41
@KashifSaadat KashifSaadat force-pushed the iam-policy-revision branch 2 times, most recently from 409ee5f to 7d99013 Compare September 12, 2017 15:04
@KashifSaadat
Copy link
Contributor Author

KashifSaadat commented Sep 12, 2017

/retest

@KashifSaadat
Copy link
Contributor Author

KashifSaadat commented Sep 12, 2017

@chrislovecnm do you know if there's any known issues with the E2E tests at the moment? My latest update didn't include any policy changes (just unused code cleanup) and still works locally, but getting failures in the E2E tests now. It looks like the E2E tests are missing some log files (kube-controller-manager not present).

@KashifSaadat
Copy link
Contributor Author

/test pull-kops-e2e-kubernetes-aws

@chrislovecnm
Copy link
Contributor

Looks like it passed

@chrislovecnm
Copy link
Contributor

@KashifSaadat how do we want to proceed with this? File some issues for the things we did not address and iterate? With these changes I assume this is opt in changes. We will not be changing the default behavior?

@KashifSaadat
Copy link
Contributor Author

KashifSaadat commented Sep 15, 2017

Hi @chrislovecnm, yeah it would be good if we could get this one in and then separately pick up the remaining tasks, such as optional ASG permissions, IAM SSL certs, validation around ECR permissions.

I've aimed to keep the existing permission set / behaviour under the legacy flag, which will be enabled by default for existing clusters so users shouldn't be impacted (unless they explicitly set the flag to false in their cluster spec).

@chrislovecnm
Copy link
Contributor

Can you open some issues on those remaining items? We will need release notes and probably more documentation ;)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 15, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chrislovecnm

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 15, 2017
@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants