This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

Create etcd and workers in private subnets, controllers in public subnet #169

Merged
merged 20 commits into from Jan 24, 2017

Conversation

neoandroid
Contributor

@neoandroid neoandroid commented Dec 15, 2016

This PR belongs to issue #152

The issue describes the following use-cases, which this PR supports:

  • Place worker nodes in their own private subnet
  • Place controller nodes in their own private subnet
  • Place etcd nodes in their own private subnet
  • Place node-pools on non pre-allocated private subnets
  • Create NAT gateways to give Internet access to private subnets
  • Use pre-allocated EIP's for the NAT gateways
  • Use pre-allocated NAT gateways
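As a rough sketch of what these use-cases look like in cluster.yml, here is a minimal, hypothetical fragment. The exact key names evolved during this PR (privateSubnets was eventually renamed to subnets), so treat this as illustrative of the intent rather than the final syntax; the CIDR values are only examples.

# Illustrative only: per-role private subnets next to the existing top-level (public) subnets
subnets:                      # existing public subnets
  - availabilityZone: eu-west-1a
    instanceCIDR: "10.0.0.0/24"
etcd:
  subnets:                    # etcd nodes in their own private subnet
    - availabilityZone: eu-west-1a
      instanceCIDR: "10.0.3.0/24"
controller:
  subnets:                    # controller nodes in their own private subnet
    - availabilityZone: eu-west-1a
      instanceCIDR: "10.0.6.0/24"
worker:
  subnets:                    # worker nodes in their own private subnet
    - availabilityZone: eu-west-1a
      instanceCIDR: "10.0.9.0/24"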

@codecov-io

codecov-io commented Dec 15, 2016

Current coverage is 67.86% (diff: 85.10%)

No coverage report found for master at c538e61.

Powered by Codecov. Last update c538e61...b379650

@mumoshu
Contributor

mumoshu commented Dec 22, 2016

@neoandroid Thanks for the PR! Could I help you with anything towards #152 (comment)?

@neoandroid
Contributor Author

I'm still working on it. I plan to add the proposed changes before the end of this week.

@neoandroid neoandroid force-pushed the masters-private-subnet branch 4 times, most recently from f654e6d to 3b0562a on December 29, 2016, 20:45
@neoandroid
Contributor Author

neoandroid commented Dec 29, 2016

I've finally merged the work from @icereval into this PR. I liked the approach and made a couple of modifications so it has the same capabilities as my previous PoC.

  • Add a parameter in cluster.yml called "controllerLoadBalancerPrivate", giving the option to put the ELB in the public subnet even though the controllers are in a private subnet. This way I can reach them over the Internet without setting up a bastion host (work that could be done later, although it could also be considered out of scope for this project).
  • Add support for creating workers in private subnets that are not part of a node pool, since that was possible in the PoC and I still need it.

The rest of the changes are small fixes: a typo in cluster.yml and a wrong reference to the Internet Gateway in the cfn template.
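For illustration, a minimal sketch of how the new option combines with private controller subnets in cluster.yml (the default value of the flag is an assumption here; the subnet keys match the examples later in this thread):

# Hypothetical fragment: controllers in private subnets, API ELB kept public
controllerLoadBalancerPrivate: false    # keep the controller load balancer in the public subnet
controller:
  privateSubnets:
    - availabilityZone: eu-west-1a
      instanceCIDR: "10.0.6.0/24"

Setting controllerLoadBalancerPrivate to true would instead place the ELB in the private subnets, reachable only from inside the VPC (e.g. via a bastion host).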

@andrejvanderzee
Contributor

@neoandroid I also prefer being able to create workers in a private subnet that are not part of a node pool.

@andrejvanderzee
Contributor

andrejvanderzee commented Dec 30, 2016

@neoandroid Could you please post a working cluster.yaml? I have tried the origin/masters-private-subnet branch with this config:

clusterName: PRD
externalDNSName: xxx
createRecordSet: true
recordSetTTL: 320
hostedZoneId: "xxx
keyName: aws-vanderzee-it
region: eu-west-1
kmsKeyArn: xxx
controllerCount: 3
controller:
  privateSubnets:
  - availabilityZone: eu-west-1a
    instanceCIDR: "10.0.3.0/24"
  - availabilityZone: eu-west-1b
    instanceCIDR: "10.0.4.0/24"
  - availabilityZone: eu-west-1c
    instanceCIDR: "10.0.5.0/24"
workerCount: 3
worker:
  privateSubnets:
  - availabilityZone: eu-west-1a
    instanceCIDR: "10.0.3.0/24"
  - availabilityZone: eu-west-1b
    instanceCIDR: "10.0.4.0/24"
  - availabilityZone: eu-west-1c
    instanceCIDR: "10.0.5.0/24"
etcdCount: 3
etcd:
  privateSubnets:
  - availabilityZone: eu-west-1a
    instanceCIDR: "10.0.3.0/24"
  - availabilityZone: eu-west-1b
    instanceCIDR: "10.0.4.0/24"
  - availabilityZone: eu-west-1c
    instanceCIDR: "10.0.5.0/24"
vpcCIDR: "10.0.0.0/16"
subnets:
- availabilityZone: eu-west-1a
  instanceCIDR: "10.0.0.0/24"
- availabilityZone: eu-west-1b
  instanceCIDR: "10.0.1.0/24"
- availabilityZone: eu-west-1c
  instanceCIDR: "10.0.2.0/24"
serviceCIDR: "10.3.0.0/24"
podCIDR: "10.2.0.0/16"
dnsServiceIP: 10.3.0.10
tlsCADurationDays: 3650
tlsCertDurationDays: 365
kubernetesVersion: v1.5.1_coreos.0
hyperkubeImageRepo: quay.io/coreos/hyperkube
awsCliImageRepo: quay.io/coreos/awscli
useCalico: true
containerRuntime: docker
stackTags:
  Name: "Kubernetes"
  Environment: "Production"

But it does not create any private subnets.

Am I missing something?

@neoandroid
Contributor Author

neoandroid commented Dec 30, 2016

Here you have a full example:

clusterName: foo
externalDNSName: foo.example.tld
createRecordSet: true
hostedZoneId: "ABCDEFGHJ12345"
keyName: foobar
region: eu-west-1
kmsKeyArn: "arn:aws:kms:eu-west-1:foobar"
controller:
  # Auto Scaling Group definition for controllers. If only `controllerCount` is specified, min and max will be set to that value and `rollingUpdateMinInstancesInService` will be one less.
  autoScalingGroup:
    minSize: 3
    maxSize: 3
    rollingUpdateMinInstancesInService: 2
  privateSubnets:
    - availabilityZone: eu-west-1a
      instanceCIDR: "10.0.6.0/24"
    - availabilityZone: eu-west-1b
      instanceCIDR: "10.0.7.0/24"
    - availabilityZone: eu-west-1c
      instanceCIDR: "10.0.8.0/24"
worker:
  # Auto Scaling Group definition for workers. If only `workerCount` is specified, min and max will be set to that value and `rollingUpdateMinInstancesInService` will be one less.
  autoScalingGroup:
    minSize: 3
    maxSize: 6
    rollingUpdateMinInstancesInService: 2
  privateSubnets:
    - availabilityZone: eu-west-1a
      instanceCIDR: "10.0.9.0/24"
    - availabilityZone: eu-west-1b
      instanceCIDR: "10.0.10.0/24"
    - availabilityZone: eu-west-1c
      instanceCIDR: "10.0.11.0/24"
etcdCount: 3
etcd:
  privateSubnets:
    - availabilityZone: eu-west-1a
      instanceCIDR: "10.0.3.0/24"
    - availabilityZone: eu-west-1b
      instanceCIDR: "10.0.4.0/24"
    - availabilityZone: eu-west-1c
      instanceCIDR: "10.0.5.0/24"
subnets:
  - availabilityZone: eu-west-1a
    instanceCIDR: "10.0.0.0/24"
  - availabilityZone: eu-west-1b
    instanceCIDR: "10.0.1.0/24"
  - availabilityZone: eu-west-1c
    instanceCIDR: "10.0.2.0/24"
useCalico: true
stackTags:
  Name: "Kubernetes"
  Environment: "Production"

At first glance I see that you have repeated the same CIDRs for the worker, controller and etcd subnets. That is probably why it's failing.

@andrejvanderzee
Contributor

andrejvanderzee commented Dec 30, 2016

I tried with different subnets too; it does not help:

externalDNSName: xxx
createRecordSet: true
recordSetTTL: 320
hostedZoneId: xxx
keyName: xxx
region: eu-west-1
kmsKeyArn: xxx
controller:
  autoScalingGroup:
    minSize: 3
    maxSize: 3
    rollingUpdateMinInstancesInService: 2
  privateSubnets:
    - availabilityZone: eu-west-1a
      instanceCIDR: "10.0.6.0/24"
    - availabilityZone: eu-west-1b
      instanceCIDR: "10.0.7.0/24"
    - availabilityZone: eu-west-1c
      instanceCIDR: "10.0.8.0/24"
worker:
  autoScalingGroup:
    minSize: 3
    maxSize: 6
    rollingUpdateMinInstancesInService: 2
  privateSubnets:
    - availabilityZone: eu-west-1a
      instanceCIDR: "10.0.9.0/24"
    - availabilityZone: eu-west-1b
      instanceCIDR: "10.0.10.0/24"
    - availabilityZone: eu-west-1c
      instanceCIDR: "10.0.11.0/24"
etcdCount: 3
etcd:
  privateSubnets:
    - availabilityZone: eu-west-1a
      instanceCIDR: "10.0.3.0/24"
    - availabilityZone: eu-west-1b
      instanceCIDR: "10.0.4.0/24"
    - availabilityZone: eu-west-1c
      instanceCIDR: "10.0.5.0/24"
subnets:
  - availabilityZone: eu-west-1a
    instanceCIDR: "10.0.0.0/24"
  - availabilityZone: eu-west-1b
    instanceCIDR: "10.0.1.0/24"
  - availabilityZone: eu-west-1c
    instanceCIDR: "10.0.2.0/24"
useCalico: true
stackTags:
  Name: "Kubernetes"
  Environment: "Production"

It creates three public subnets, but no private subnets.

No time to dig into sources now, maybe in the new year :-)

@icereval
Contributor

Can you elaborate on the use case for needing more than three public subnets?

@icereval
Contributor

Also over at the KOPS project a contributor created an awesome diagram of their desired network (mostly matching) that might make it easier to discuss.

kubernetes/kops#428 (comment)

@andrejvanderzee
Contributor

andrejvanderzee commented Dec 30, 2016

For us, we require only 3 private and 3 public subnets (1 private + 1 public per AZ). Therefore we would prefer to spread etcd, workers and controllers over the 3 private subnets.

@andrejvanderzee
Contributor

@icereval What I meant is that three public subnets were created as expected, but no private subnets. I have updated the comment to make it more obvious.

@icereval
Contributor

Hmmm, I just checked out the latest code and I'm seeing the correct results. Let me get my config together and share it.

@icereval
Contributor

icereval commented Dec 30, 2016

I'm running an environment setup script via CloudFormation which generates the base VPC/DNS/Bastion/VPC Peering and so on, but other than that it's a typical setup.

With this configuration, running kube-aws up --s3-uri s3://${S3_BUCKET}/ has CloudFormation build the following environment.

cluster.yaml

clusterName: cluster-1
externalDNSName: cluster-1.dev.example.internal
amiId: ami-69cbda7e  # copy of coreos stable for encrypted root volume
createRecordSet: true
hostedZoneId: ZWB2UKRTYXHY0  # private hosted zone dev.example.internal
region: us-east-1
keyName: ...
kmsKeyArn: ...
vpcId: vpc-76c11e10
internetGatewayId: igw-adce55ca
vpcCIDR: 10.0.0.0/16
tlsCADurationDays: 3650
tlsCertDurationDays: 1825  # modified (5 years)
useCalico: true 

# limited public subnet cidr needed since the bastion will grant
# access into this space via vpc peering
subnets:
  - availabilityZone: us-east-1a
    instanceCIDR: 10.0.0.0/26
  - availabilityZone: us-east-1b
    instanceCIDR: 10.0.0.64/26
  - availabilityZone: us-east-1d
    instanceCIDR: 10.0.0.128/26

controller:
  autoScalingGroup:
    minSize: 3
    maxSize: 4
    rollingUpdateMinInstancesInService: 1
  privateSubnets:
    - availabilityZone: us-east-1a
      instanceCIDR: 10.0.1.0/26
    - availabilityZone: us-east-1b
      instanceCIDR: 10.0.1.64/26
    - availabilityZone: us-east-1d
      instanceCIDR: 10.0.1.128/26

etcdCount: 3
etcd:
  privateSubnets:
    - availabilityZone: us-east-1a
      instanceCIDR: 10.0.2.0/26
    - availabilityZone: us-east-1b
      instanceCIDR: 10.0.2.64/26
    - availabilityZone: us-east-1d
      instanceCIDR: 10.0.2.128/26

worker:
  autoScalingGroup:
    minSize: 3
    maxSize: 4
    rollingUpdateMinInstancesInService: 1
  privateSubnets:
    - availabilityZone: us-east-1a
      instanceCIDR: 10.0.3.0/24
    - availabilityZone: us-east-1b
      instanceCIDR: 10.0.4.0/24
    - availabilityZone: us-east-1d
      instanceCIDR: 10.0.5.0/24

Subnets

[screenshot: subnets]

Route Tables

  • Private route tables for each AZ -> NAT Gateway

[screenshot: route tables]

NAT Gateways

[screenshot: NAT gateways]

EC2 Instances

[screenshot: EC2 instances]
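To illustrate the per-AZ routing shown above, here is a hypothetical CloudFormation sketch, written in YAML for brevity (the real stack-template is JSON and its logical IDs differ). It only shows the shape of the resources: one NAT gateway in a public subnet plus one private route table per AZ.

NatEIP0:
  Type: AWS::EC2::EIP
  Properties:
    Domain: vpc
NatGateway0:
  Type: AWS::EC2::NatGateway
  Properties:
    AllocationId: !GetAtt NatEIP0.AllocationId
    SubnetId: !Ref PublicSubnet0            # the NAT gateway lives in a public subnet
PrivateRouteTable0:
  Type: AWS::EC2::RouteTable
  Properties:
    VpcId: !Ref VPC
PrivateRouteToInternet0:
  Type: AWS::EC2::Route
  Properties:
    RouteTableId: !Ref PrivateRouteTable0
    DestinationCidrBlock: 0.0.0.0/0
    NatGatewayId: !Ref NatGateway0          # default route for the private subnets goes via the NAT gateway

Each private subnet in that AZ is then associated with its private route table via an AWS::EC2::SubnetRouteTableAssociation.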

}

{{if (or .WorkerTopologyPrivate .Etcd.TopologyPrivate .Controller.TopologyPrivate)}}
Contributor

@icereval icereval Dec 30, 2016


Should also add .Worker.TopologyPrivate for the case where workers are private yet somehow etcd & controllers are public and not using node pools.

Contributor Author


Done. Nice catch! 👍

@andrejvanderzee
Contributor

andrejvanderzee commented Dec 30, 2016

Hmmm, not sure what is going on. I have more or less copied your cluster.yaml now (except that I don't create the VPC before running kube-aws):

clusterName: PRD
externalDNSName: xxx
createRecordSet: true
recordSetTTL: 320
hostedZoneId: xxx
keyName: XXX
region: eu-west-1
kmsKeyArn: xxx
tlsCADurationDays: 3650
tlsCertDurationDays: 1825  # modified (5 years)
vpcCIDR: 10.0.0.0/16

# limited public subnet cidr needed since the bastion will grant
# access into this space via vpc peering
subnets:
  - availabilityZone: eu-west-1a
    instanceCIDR: 10.0.0.0/26
  - availabilityZone: eu-west-1b
    instanceCIDR: 10.0.0.64/26
  - availabilityZone: eu-west-1c
    instanceCIDR: 10.0.0.128/26

controller:
  autoScalingGroup:
    minSize: 3
    maxSize: 4
    rollingUpdateMinInstancesInService: 1
  privateSubnets:
    - availabilityZone: eu-west-1a
      instanceCIDR: 10.0.1.0/26
    - availabilityZone: eu-west-1b
      instanceCIDR: 10.0.1.64/26
    - availabilityZone: eu-west-1c
      instanceCIDR: 10.0.1.128/26

etcdCount: 3
etcd:
  privateSubnets:
    - availabilityZone: eu-west-1a
      instanceCIDR: 10.0.2.0/26
    - availabilityZone: eu-west-1b
      instanceCIDR: 10.0.2.64/26
    - availabilityZone: eu-west-1c
      instanceCIDR: 10.0.2.128/26

worker:
  autoScalingGroup:
    minSize: 3
    maxSize: 4
    rollingUpdateMinInstancesInService: 1
  privateSubnets:
    - availabilityZone: eu-west-1a
      instanceCIDR: 10.0.3.0/24
    - availabilityZone: eu-west-1b
      instanceCIDR: 10.0.4.0/24
    - availabilityZone: eu-west-1c
      instanceCIDR: 10.0.5.0/24

No NATs are created, no private subnets.

$ git status
On branch masters-private-subnet
Your branch is up-to-date with 'origin/masters-private-subnet'.
$ git remote -v
origin	https://github.com/neoandroid/kube-aws.git (fetch)
origin	https://github.com/neoandroid/kube-aws.git (push)
./bin/kube-aws render stack
./bin/kube-aws  up --s3-uri XXX

@cknowles
Contributor

@mumoshu Trying to think of a use case: perhaps using one of the controllers as the bastion host to cut down on instance cost? Not that I especially want to support that case! We could document the minimum cluster instance count we agreed on above. Do you want me to write up some of that?

I'm good with whatever you decide :) I'd prefer the work to be merged than not. How about we stick with what is here but agree we may later decide to simplify (and we may have to drop some compatibility to keep it simple)?

@mumoshu
Contributor

mumoshu commented Jan 16, 2017

@c-knowles I've decided to go with the simpler one: subnets: without private: true|false, which might be less confusing/conflicting then!

Even with that:

could both be achieved. Am I correct?

@cknowles
Contributor

@mumoshu I think the second point is less achievable that way. It's not a huge issue, more a logical one of how we structure things in general. Sorry I hijacked this PR a little bit! Happy to chat outside of here too. For example, if both private and public subnet sets are specified, which one "wins"? Currently I believe it's the private subnets. We can document it, but ultimately we will still have multiple places where one piece of infrastructure can be specified (in this case the worker subnets).

@mumoshu
Contributor

mumoshu commented Jan 16, 2017

if sets of both private and public subnets are specified which "wins"? Currently I believe it's the private subnets.

My assumption is that:

  • If you provide both top-level subnets: and worker/controller/etcd specific subnets:, the latter wins.
  • If you provide both private and public subnets under a single subnets:, everyone wins, i.e. all the nodes are distributed over these subnets without useful labels/taints (for now), hence I'm trying to avoid mixing public and private subnets (for now).
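A small sketch of the first rule, assuming the subnets: key layout discussed in this thread (values are illustrative):

subnets:                      # top-level subnets
  - availabilityZone: eu-west-1a
    instanceCIDR: "10.0.0.0/24"
worker:
  subnets:                    # worker-specific subnets: these win for the workers
    - availabilityZone: eu-west-1a
      instanceCIDR: "10.0.9.0/24"
# etcd and controller define no subnets of their own here,
# so they would use the top-level 10.0.0.0/24 subnet.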

Also note that:

  • I agree that we should allow defining subnets somewhere other than where they live today and then referencing them from the worker|controller|etcd subnets, but that's another issue IMHO.

Did I cover all the general concerns you had in mind?

@c-knowles Are you in the kubernetes slack? 😃

@mumoshu
Contributor

mumoshu commented Jan 16, 2017

Hi @neoandroid, sorry for the long silence, but I still want this to be merged!

In addition to #169 (comment),

  • would you mind renaming privateSubnets to just subnets in cluster.yaml after the long discussion above?
  • Edit: found it; I was looking at the stack-template.json for node pools. Would you mind adding support for NAT gateways? (It seems to be missing in the stack-template.json.)
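For clarity, the requested rename is only a key change in cluster.yaml; a sketch of the before/after (illustrative values):

# before
worker:
  privateSubnets:
    - availabilityZone: eu-west-1a
      instanceCIDR: "10.0.9.0/24"

# after
worker:
  subnets:
    - availabilityZone: eu-west-1a
      instanceCIDR: "10.0.9.0/24"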

@cknowles
Contributor

@mumoshu Great, yes. I agree we can simplify slightly later, especially if we can leave behind some legacy prior to v1. I am indeed on the k8s slack, username without the hyphen.

ControllerCount int `yaml:"controllerCount,omitempty"`
ControllerCreateTimeout string `yaml:"controllerCreateTimeout,omitempty"`
ControllerInstanceType string `yaml:"controllerInstanceType,omitempty"`
ControllerLoadBalancerPrivate string `yaml:"controllerLoadBalancerPrivate,omitempty"`

Should this be a bool?

Contributor Author


You're right. Good catch 👍

@mumoshu
Contributor

mumoshu commented Jan 17, 2017

Hi @neoandroid, I'm going to merge several PRs from now on.
If one of those PRs conflicts badly with your PR, would you mind if I resolve the conflict and merge this?
After that, we can tackle #169 (comment) if necessary.

@neoandroid
Contributor Author

I'm working on the changes commented here and here. I expect to push them tomorrow; if you can wait until then to merge, that's fine with me. 😃
Of course, if any merge conflict with the master branch appears when you merge other PRs, I will try to solve it asap.

rename privateSubnets property to subnets in cluster.yml file
merge public and private subnet variables into one generic variable
@neoandroid
Contributor Author

neoandroid commented Jan 20, 2017

I've finally applied all the changes discussed in the comments; I hope you like them. I've done some e2e tests in my environment and haven't found bugs in any of the supported scenarios, but I encourage you to do your own verification.

Since the parameter in cluster.yml is now called subnets instead of privateSubnets, it didn't make sense to keep two variables in the models handling public and private subnets respectively. That was needlessly complicating things, so there is now only one Subnets variable in each model, and it is populated with the top-level subnets if no subnets are defined under etcd/controller/worker.
I've also removed the remaining if/else blocks in the cfn template related to the private network topology.

@mumoshu
Contributor

mumoshu commented Jan 24, 2017

@neoandroid Thanks for all the hard work!
I've merged your work locally and started testing.
FYI, I've only seen two issues so far.

(1) Node pools fail to create when the topology isn't private, because of a missing import:

CREATE_FAILED AWS::CloudFormation::Stack kubeawstest6-nodepool1 No export named PublicRouteTable found

(2) and the missing subnets for the worker ASG:

CREATE_FAILED AWS::CloudFormation::Stack kubeawstest6-nodepool1 The following resource(s) failed to create: [Workers].
CREATE_FAILED AWS::AutoScaling::AutoScalingGroup Workers Property AvailabilityZones cannot be empty.

They were not that hard to fix. Please give me a day or two until it gets merged 👍
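For reference, the first failure is the kind of thing fixed by exporting the public route table from the main cluster stack and importing it from the node pool stack. A hypothetical CloudFormation sketch, in YAML for brevity (the real templates are JSON, the logical IDs here are made up, and this is not necessarily how it was actually fixed):

# main cluster stack: export the public route table
Outputs:
  PublicRouteTable:
    Value: !Ref PublicRouteTable
    Export:
      Name: PublicRouteTable

# node pool stack: import it wherever a public subnet needs the route table
Resources:
  Subnet0RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref Subnet0
      RouteTableId: !ImportValue PublicRouteTable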

@mumoshu mumoshu merged commit b379650 into kubernetes-retired:master Jan 24, 2017
@mumoshu
Contributor

mumoshu commented Jan 24, 2017

Merged 😄
Thanks for your contribution @neoandroid!
Please feel free to submit further pull requests adding fixes/improvements to this feature.

@whereisaaron
Contributor

whereisaaron commented Jan 29, 2017

This looks fab @neoandroid! But does it remove the ability to deploy to private subnets without creating a NAT? We already provide non-AWS NAT for our VPCs, and I was using the add-security-group and add-route-table cluster.yaml option to add a Security Group that automatically provided access to NAT for instances created by kube-aws installs. Does this PR take away that use case? Or is there still a way to specify a private subnet without creating an AWS NAT? I just couldn't spot how to express that in the syntax above?

@neoandroid
Contributor Author

neoandroid commented Jan 29, 2017

No, that scenario is not possible. I didn't expect that some users were not using AWS NAT gateways. If you need that, you'll probably have to create your own subnets and NATs and use the ability to reuse pre-existing infrastructure (#227).

@whereisaaron
Contributor

Well it was possible before, so this is a substantial breaking change for me if I can't work around it 😞. I would be moving from the fully automated cluster deployment kube-aws provides now, back to semi-manual deployment.

I guess the workaround is to patch the AWS NATs out of the stack.json? As long as I can still nominate the route table and additional Security Group to be applied to the private subnets, it should be workable. Or maybe, if I script/CloudFormation the creation of the subnets for each cluster just before kube-aws up, will that avoid creating the unnecessary AWS NATs?

The other possible problem with mandating that people create AWS NATs is that AWS imposes a limit of only 5 NAT gateways per AZ. If every cluster deployed with kube-aws creates its own NAT gateways, you'll quickly hit that limit.

Anyway, I don't want to rain on your parade 🎈 🎉 I'll do my best to make this new approach work for me too. There's always a way!

@cknowles
Contributor

I guess it's still possible for kube-aws to create the subnets but link to existing route tables? That's how I'm using kube-aws, at least on an older version. This does remind me a little of @redbaron's comments, perhaps we could discuss as a group how we can allow extensions so not every customisation is embedded deeply in kube-aws.

@mumoshu
Contributor

mumoshu commented Jan 30, 2017

@c-knowles I believe I've already implemented support for an existing route table per subnet in #284 (comment).
Would you mind taking a look at my comment there and leaving your feedback?

@mumoshu
Contributor

mumoshu commented Jan 30, 2017

Let me also add that, since kube-aws isn't a general-purpose AWS provisioning tool, it won't automate every step required to deploy your cluster into an existing VPC (existing VPCs aren't uniformly configured at all, because everyone's requirements vary). It will, however, definitely try to support at least the general parts of those steps via explicit configuration options (like vpcId and routeTableId) and possibly hooks and/or a framework to support your customizations.

Also, I'll try to support your existing use-case(s) as much as possible so please don't hesitate to leave your comments in #284!

@neoandroid
Contributor Author

@whereisaaron For your information, you can request a limit increase for NAT gateways; 5 is the default limit, but it's not a hard limit.
