Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data/aws/vpc: Add an S3 endpoint to new VPCs #745

Merged
merged 2 commits into from
Dec 14, 2018

Conversation

wking
Copy link
Member

@wking wking commented Nov 28, 2018

As suggested by @cuppett, this allows registry <-> S3 transfers to bypass the (NAT) gateways. Traffic over the NAT gateways costs money, so the new endpoint should make S3 access from the cluster cheaper (and possibly more reliable). This also allows for additional security policy flexibility, although I'm not taking advantage of that in this commit. Docs for VPC endpoints are here, here, here, and here.

Endpoints do not currently support cross-region requests. And based on discussion with @cuppett, adding an endpoint may break access to S3 on other regions. But I can't find docs to back that up, and the docs say:

We use the most specific route that matches the traffic to determine how to route the traffic (longest prefix match). If you have an existing route in your route table for all internet traffic (0.0.0.0/0) that points to an internet gateway, the endpoint route takes precedence for all traffic destined for the service, because the IP address range for the service is more specific than 0.0.0.0/0. All other internet traffic goes to your internet gateway, including traffic that's destined for the service in other regions.

which suggests that access to S3 on other regions may be unaffected. In any case, our registry buckets, and likely any other buckets associated with the cluster, will be living in the same region.

concat is documented here.

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Nov 28, 2018
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 28, 2018
@wking wking changed the title data/data/aws/vpc/vpc: Add an S3 endpoint to new VPCs data/aws/vpc: Add an S3 endpoint to new VPCs Nov 28, 2018
@wking wking force-pushed the s3-vpc-endpoint branch 2 times, most recently from af161ff to 1a1dc49 Compare November 28, 2018 07:11
@wking
Copy link
Member Author

wking commented Nov 28, 2018

Only route_table_ids applies to Gateway mode.

@abhinavdahiya
Copy link
Contributor

Only route_table_ids applies to Gateway mode.

Oh, and Gateway is deafult https://www.terraform.io/docs/providers/aws/r/vpc_endpoint.html#vpc_endpoint_type 👍

@abhinavdahiya
Copy link
Contributor

Is there a precedent for this kind of setup in OpenShift @cuppett ?

@cuppett
Copy link
Member

cuppett commented Nov 28, 2018

Not in our hosted environments. Those and our reference architectures have historically taken a much more vanilla approach to each cloud.

Using ALBs, NAT gateways and VPC endpoints is new and applies optimizations and competencies to this cloud. Excited to bring more maturity to our offering. All are relatively new within AWS (last couple years) and is helping to catch us up.

@abhinavdahiya
Copy link
Contributor

@dgoodwin not sure who keeps track of all these unmanaged things. heads up.

/approve

@dgoodwin
Copy link
Contributor

Will this come out tagged with the expected tags?

And I assume we need to add VPC endpoint cleanup to hiveutil's code.

@wking
Copy link
Member Author

wking commented Nov 28, 2018

Will this come out tagged with the expected tags?

At least Terraform does not seem to allow tagging for this resource. It may be untaggable in AWS.

And I assume we need to add VPC endpoint cleanup to hiveutil's code.

Maybe it doesn't block VPC deletion and is automatically removed when the VPC is deleted (in which case we would not need Hive changes). I'll see if I can tell from the CI logs.

@wking
Copy link
Member Author

wking commented Nov 28, 2018

I'll see if I can tell from the CI logs.

The logs won't be particularly helpful until #730 lands. I'll circle back to this after that happens.

@abhinavdahiya
Copy link
Contributor

The logs won't be particularly helpful until #730 lands. I'll circle back to this after that happens.

deletion doesn't need that right?

@wking
Copy link
Member Author

wking commented Nov 28, 2018

The logs won't be particularly helpful until #730 lands. I'll circle back to this after that happens.

deletion doesn't need that right?

No, but we're not going to see anything about this endpoint in the deletion logs. What I want is to see creation (and ideally an ID) for the new resource, and then I'll check to make sure it's gone after CI wraps up.

@abhinavdahiya
Copy link
Contributor

/hold
need 👍 from hive team @joelddiaz

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 1, 2018
@joelddiaz
Copy link
Contributor

I looked at this a bit, and adding the endpoint does case panics in the uninstall due to some assumptions about the kinds of default fields found in a route table (easy fix).

You can't leave the endpoint hanging around as that will block the VPC from ever being deleted. Unfortunately, the endpoints don't appear to be taggable objects, so we'll need to add a dependency in the VPC deletion path to find any endpoints associated with the VPC being deleted.

joelddiaz pushed a commit to joelddiaz/hive that referenced this pull request Dec 3, 2018
installer wants to start using VPC endpoints (openshift/installer#745), so the uninstaller should find and remove them (they are untaggable objects)

also, remove the code that would remove the individual routes from a route table (it's not needed to remove routes to be able to delete RouteTables)

needed to add these permissions to get the installer/uninstaller working:
ec2:DescribePrefixLists
ec2:DescribeVpcEndpoints
ec2:CreateVpcEndpoint
ec2:DeleteVpcEndpoints
joelddiaz pushed a commit to joelddiaz/hive that referenced this pull request Dec 4, 2018
installer wants to start using VPC endpoints (openshift/installer#745), so the uninstaller should find and remove them (they are untaggable objects)

also, remove the code that would remove the individual routes from a route table (it's not needed to remove routes to be able to delete RouteTables)

needed to add these permissions to get the installer/uninstaller working:
ec2:DescribePrefixLists
ec2:DescribeVpcEndpoints
ec2:CreateVpcEndpoint
ec2:DeleteVpcEndpoints
@joelddiaz
Copy link
Contributor

FWIW, we've got the endpoint deletion merged now. so we're ready for this on the uninstall side of things openshift/hive#122

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 7, 2018
As suggested by Stephen Cuppett, this allows registry <-> S3 transfers
to bypass the (NAT) gateways.  Traffic over the NAT gateways costs
money, so the new endpoint should make S3 access from the cluster
cheaper (and possibly more reliable).  This also allows for additional
security policy flexibility, although I'm not taking advantage of that
in this commit.  Docs for VPC endpoints are in [1,2,3,4].

Endpoints do not currently support cross-region requests [1].  And
based on discussion with Stephen, adding an endpoint may *break*
access to S3 on other regions.  But I can't find docs to back that up,
and [3] has:

  We use the most specific route that matches the traffic to determine
  how to route the traffic (longest prefix match).  If you have an
  existing route in your route table for all internet traffic
  (0.0.0.0/0) that points to an internet gateway, the endpoint route
  takes precedence for all traffic destined for the service, because
  the IP address range for the service is more specific than
  0.0.0.0/0.  All other internet traffic goes to your internet
  gateway, including traffic that's destined for the service in other
  regions.

which suggests that access to S3 on other regions may be unaffected.
In any case, our registry buckets, and likely any other buckets
associated with the cluster, will be living in the same region.

concat is documented in [5].  The wrapping brackets avoid [6]:

  level=error msg="Error: module.vpc.aws_vpc_endpoint.s3: route_table_ids: should be a list"

although I think that's a Terraform bug.  See also 8a37f72
(modules/aws/bootstrap: Pull AWS bootstrap setup into a module,
2018-09-05, openshift#217), which talks about this same issue.

[1]: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints-s3.html
[2]: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html
[3]: https://docs.aws.amazon.com/vpc/latest/userguide/vpce-gateway.html
[4]: https://www.terraform.io/docs/providers/aws/r/vpc_endpoint.html
[5]: https://www.terraform.io/docs/configuration/interpolation.html#concat-list1-list2-
[6]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/745/pull-ci-openshift-installer-master-e2e-aws/1673/build-log.txt
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 13, 2018
@wking
Copy link
Member Author

wking commented Dec 13, 2018

I've rebased onto master and pulled in openshift/hive#122 with 1a1dc49 -> e205898.

Pulling in openshift/hive@b7d71518 (remove VPC endpoints, 2018-12-03,
openshift/hive#122).

Generated with:

  $ sed -i s/2349f175d3e4fc6542dec79add881a59f2d7b1b8/802db5420da6a88f034fc2501081e2ab12e8463e/ Gopkg.toml
  $ dep ensure

using:

  $ dep version
  dep:
   version     : v0.5.0
   build date  :
   git hash    : 22125cf
   go version  : go1.10.3
   go compiler : gc
   platform    : linux/amd64
   features    : ImportDuringSolve=false
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 13, 2018
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Dec 13, 2018

@wking: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-libvirt 1a1dc49 link /test e2e-libvirt

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@abhinavdahiya
Copy link
Contributor

I've rebased onto master and pulled in openshift/hive#122 with 1a1dc49 -> e205898.

/lgtm

/retest

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 13, 2018
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@abhinavdahiya
Copy link
Contributor

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 13, 2018
@openshift-merge-robot openshift-merge-robot merged commit 5813f61 into openshift:master Dec 14, 2018
@wking wking deleted the s3-vpc-endpoint branch December 14, 2018 00:26
wking added a commit to wking/openshift-installer that referenced this pull request Dec 14, 2018
Covering 08e4dbc (CHANGELOG: Document changes since v0.5.0,
2018-12-07, openshift#841) through 5813f61 (Merge pull request openshift#745 from
wking/s3-vpc-endpoint, 2018-12-13).
wking added a commit to wking/openshift-installer that referenced this pull request Dec 14, 2018
Covering 08e4dbc (CHANGELOG: Document changes since v0.5.0,
2018-12-07, openshift#841) through 5813f61 (Merge pull request openshift#745 from
wking/s3-vpc-endpoint, 2018-12-13).
wking added a commit to wking/openshift-installer that referenced this pull request Dec 15, 2018
Covering 5813f61 (Merge pull request openshift#745 from wking/s3-vpc-endpoint,
2018-12-13) through 14505f3 (Merge pull request openshift#911 from
hardys/issues/891, 2018-12-14).
wking added a commit to wking/openshift-installer that referenced this pull request Dec 15, 2018
Covering 5813f61 (Merge pull request openshift#745 from wking/s3-vpc-endpoint,
2018-12-13) through 874876e (Merge pull request openshift#711 from
staebler/improve_input_validation, 2018-12-14).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants