Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump aws-sdk-go #7458

Merged
merged 1 commit into from
Dec 17, 2019
Merged

Conversation

joelthompson
Copy link
Contributor

In support of #7450.

I think that there's an additional step needed to fulfill the goal of #7450 and will do some testing to follow up and reply in that other PR.

I ran manual integration tests for the AWS secrets engine and auth method.

@joelthompson
Copy link
Contributor Author

Actually, looks like this version of the SDK might have a bug in it. Let's hold off until aws/aws-sdk-go#2828 gets addressed -- if it gets fixed, let's just update to the fixed version.

@joelthompson joelthompson changed the title Bump aws-sdk-go [WIP] Bump aws-sdk-go Sep 12, 2019
@dgozalo
Copy link
Contributor

dgozalo commented Oct 24, 2019

aws/aws-sdk-go#2828 doesn't seem to be a bug. Vault code just needs to call session.NewSession instead of New and it should be fixed.

@catsby catsby added bug Used to indicate a potential bug auth/aws secret/aws waiting-for-response labels Nov 22, 2019
@catsby
Copy link
Member

catsby commented Nov 22, 2019

Hey @joelthompson - how do you feel about this PR right now? aws/aws-sdk-go#2828 seems to have lead to aws/amazon-vpc-cni-k8s#729, which removes the session.New() usage.

@joelthompson
Copy link
Contributor Author

Hey @catsby, my thought is to just skip this upgrade and go to the latest version to support the EC2 Instance Metadata Service (IMDS) v2, as requested in #7924

To support EKS IRSA, Vault will also need to be updated to use the NewSession method instead of just New.

@joelthompson joelthompson changed the title [WIP] Bump aws-sdk-go Bump aws-sdk-go Nov 25, 2019
@joelthompson
Copy link
Contributor Author

@catsby -- OK, I updated this PR with an update to the newer version of the golang sdk which should support IRSA (with a code update) and IMDSv2.

Note my comment in #7924 about this potentially being a breaking change: beware that this can actually cause breaking changes for clients that are running in Docker containers. See, e.g., boto/botocore#1892 (comment)

@tyrannosaurus-becks
Copy link
Contributor

tyrannosaurus-becks commented Nov 26, 2019

Awesome! Looks like the rubber hits the road here on this and since the new metadata service will be enabled by default for everyone, that should do it.

I'm not sure yet if there will be any backwards compatibility issues for this. I saw this note here:

Both IMDSv1 and IMDSv2 will be available and enabled by default, and customers can choose which they will use. The IMDS can now be restricted to v2 only, or IMDS (v1 and v2) can also be disabled entirely. AWS recommends adopting v2 and restricting access to v2 only for added security. IMDSv1 remains available for customers who have tools and scripts using v1, and who are comfortable with the existing security posture of their instances.

I'm not sure yet if the client is backwards-compatible to people who actually disable v2 and only want v1. I don't know why you'd do that since it's less secure, though I could see someone doing it if maybe they were using an AWS client outside of Go that only supported v1. However, in that case we could simply advise them to not disable v2 since that's what Vault uses. So I'm not sure if we really need to worry about v1 compatibility.

@tyrannosaurus-becks tyrannosaurus-becks added this to the 1.4 milestone Nov 26, 2019
@tyrannosaurus-becks
Copy link
Contributor

I'm adding a 1.4 milestone to this because that's at least a couple of months out, and Joel pointed out that this may break clients running in Docker. Since 1.4 is at least a few months out, this'll give Boto a little time to get the fix in, and us some extra time to test it more fully.

@joelthompson
Copy link
Contributor Author

TBC, boto is really just aws-sdk-python by a different name, and it's the first project that noticed the issue since the awscli is really a wrapper around boto.

Anyway, I may have raised a bit of a false alarm. Reading through the documentation and the SDK code a bit more, I think what will happen when this is run from a docker container is that it will wait 5 seconds to fetch the IMDSv2 token, and when that times out, it will fall back to IMDSv1 and mark internally that the instance of the API client shouldn't use IMDSv2 and subsequent calls will fall back to v1. My guess is that boto just failed to fall back to IMDSv1 (though I haven't read through the boto code). Of course, all this would need to be tested. And, of course, this can happen serially, resulting in noticeable slowdowns (see, e.g., aws/aws-sdk-go#2972) and poor user experience.

The solution to avoid this 5-second timeout and enable admins to disable IMDSv1 completely is to call ec2:ModifyInstanceMetadataOptions on the instance and specify an HttpPutResponseHopLimit value of 2 for the instance. Unfortunately, there doesn't appear to be a way to disable IMDSv2 entirely via config or environment variables, which would make this more palatable (i.e., release this as a breaking change with an option to revert back to the old behavior). I've opened a feature request with the aws-sdk-go folks for this at aws/aws-sdk-go#2980 -- please comment there and/or give 👍 on what you think a good user experience would be.

Lastly, I don't believe it's possible to disable v2. You can disable v1 by specifying HttpTokens as required in ec2:ModifyInstanceMetadataOptions, but I don't think you can disable v2.

Copy link
Contributor

@tyrannosaurus-becks tyrannosaurus-becks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks for working on it @joelthompson .

I'm thinking we should probably do a second step where we audit the 4 locations in Vault that could potentially be using the instance metadata service:

  • The Vault AWS agent
  • The Vault AWS CLI handler
  • The clients created in the Vault AWS auth engine
  • The clients created in the Vault AWS secrets engine

If we go through those and write a test for each that creates a client using only creds from V1 instance metadata, and another test using only creds from V2 instance metadata, that should help us know if we need to do any additional code updates.

Definitely out of scope for this PR, and I like keeping this separate, just noting the TODO's for posterity. :-) Going to wait for one more approval on this before merging.

@joelthompson
Copy link
Contributor Author

Hey @tyrannosaurus-becks,

We also need to think about the clients created for:

  • The AWS KMS auto-unseal
  • S3 storage backend
  • DynamoDB storage backend

However, I'm not quite sure what you mean by "If we go through those and write a test for each that creates a client using only creds from V1 instance metadata." Currently, unless/until my AWS golang SDK feature request is implemented, the only way to achieve that is to run all the tests inside of a Docker container (or some other similar magic that will cause IMDSv2 to fail), And writing automated tests for these is hard because it really depends on the execution environment the test is being run in -- you really need a way to run these tests on different EC2 instances with different IMDS configurations.

That being said, I'm happy to manually run these tests manually on an EC2 instance with IMDSv1 disabled and report the results (though I'm not as familiar with the parts of Vault other than the secrets engine and auth method so might need some help). And I'm happy to continue brainstorming about how we can fully and properly test this to make sure it works properly for Vault users.

@tyrannosaurus-becks
Copy link
Contributor

I've had a chance to re-review this and look more. I see that the outward bound calls are the main change, and the responses appear the same. The version of the AWS SDK here should be good as long as we also use session.New when instantiating clients. Merging this and will make sure that's incorporated before 1.4 is released.

@tyrannosaurus-becks tyrannosaurus-becks merged commit d725f97 into hashicorp:master Dec 17, 2019
@joelthompson joelthompson deleted the bump_aws_sdk branch January 1, 2020 05:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auth/aws bug Used to indicate a potential bug secret/aws
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants