Allow defining multiple k8s API endpoints with/without managed ELB #373

Closed · mumoshu opened this issue Mar 1, 2017 · 25 comments

mumoshu (Contributor) commented Mar 1, 2017

Depending on the use-case, an endpoint may be:

  • internal or external
  • stable/unversioned for CI/CD, or dynamic/versioned as we have today

This is intended to eventually resolve all the related issues: #281, #285, #343.


Configuration examples

After this feature is implemented, cluster.yaml would look like:

# before
externalDNSName: k8s.external.example.com
createRecordSet: true
hostedZoneId: myhostedzone
controller:
  loadBalancer:
    private: false

# after
apiEndpoints:
- dnsName: k8s.external.example.com 
  loadBalancer:
    private: false
    hostedZone:
      id: myhostedzone

and

# before
externalDNSName: k8s.external.example.com
createRecordSet: false
controller:
  loadBalancer:
    private: true

# after
apiEndpoints:
- dnsName: k8s.external.example.com 
  loadBalancer:
    private: true

and

# before
externalDNSName: k8s.external.example.com
createRecordSet: false
controller:
  loadBalancer:
    private: false

# after
apiEndpoints:
- dnsName: k8s.external.example.com 
  loadBalancer:
    private: false

and

# before
externalDNSName: k8s.external.example.com
createRecordSet: false
controller:
  loadBalancer:
    subnets:
    - name: managedPublicSubnet1a

# after
apiEndpoints:
- dnsName: k8s.external.example.com 
  loadBalancer:
    subnets:
    - name: managedPublicSubnet1a

If you'd like to just add alternate DNS names to a k8s api endpoint:
cc @c-knowles #343

apiEndpoints:
- dnsName: v1.k8s.external.example.com 
  loadBalancer:
    public: true
- dnsName: v1-alt.k8s.external.example.com # an alternate name, also included in CNs for TLS cert generation, without creating an ELB

If you'd like to choose Route53 round-robin rather than ELB for load-balancing:
cc @spacepluk #281

apiEndpoints:
- dnsName: v2.k8s.external.example.com
  dnsRoundRobin: # will enable a bash script in cloud-config-controller to update Route 53 record sets; the sync step is sketched after this example
    hostedZone:
      id: <id for the hosted zone external.example.com>
    securityGroups: # associated to controller nodes
    - id: sg-toallowexternalaccess
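
For illustration, here is a minimal sketch of the record-sync step such a script would perform, written in Go against the aws-sdk-go Route 53 client rather than bash; the function name, TTL, and all arguments are illustrative assumptions, not part of this proposal:

package roundrobin

import (
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/route53"
)

// upsertRoundRobinRecord UPSERTs a single A record whose values are the IPs of
// the current controller nodes, which is what the proposed script in
// cloud-config-controller would need to do periodically.
func upsertRoundRobinRecord(zoneID, dnsName string, controllerIPs []string) error {
    svc := route53.New(session.Must(session.NewSession()))

    records := make([]*route53.ResourceRecord, 0, len(controllerIPs))
    for _, ip := range controllerIPs {
        records = append(records, &route53.ResourceRecord{Value: aws.String(ip)})
    }

    _, err := svc.ChangeResourceRecordSets(&route53.ChangeResourceRecordSetsInput{
        HostedZoneId: aws.String(zoneID),
        ChangeBatch: &route53.ChangeBatch{
            Changes: []*route53.Change{{
                Action: aws.String("UPSERT"),
                ResourceRecordSet: &route53.ResourceRecordSet{
                    Name:            aws.String(dnsName),
                    Type:            aws.String("A"),
                    TTL:             aws.Int64(60), // a short TTL so clients notice membership changes quickly
                    ResourceRecords: records,
                },
            }},
        },
    })
    return err
}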

If you'd like to have both public and private ELBs for a k8s endpoint:
cc @tarvip #281 (comment)

apiEndpoints:
- dnsName: v2.k8s.external.example.com
  loadBalancer:
    hostedZone:
      id: <id for the hosted zone external.example.com>
    securityGroups:
    - id: sg-toallowexternalaccess
- dnsName: v2.k8s.internal.example.com
  loadBalancer:
    hostedZone:
      id: <id for the hosted zone internal.example.com>
    securityGroups:
    - id: sg-toallowinternalaccess

In an extreme case it would even look like:
cc @c-knowles #343 & @spacepluk @tarvip #281

controller:
  # Instead of this
  # loadBalancer:
  #   private: true

# Introduce this
apiEndpoints:
# DON'T FORGET replacing `credentials/admin.pem` and `credentials/admin-key.pem` with the ones from a previous cluster so that you don't need to replace `admin-key.pem` in e.g. an external CI service or your laptop
- name: stableExternalEndpoint
  # `dnsName` will be added to CNs in the apiserver cert (cc @c-knowles)
  dnsName: k8s.external.example.com 
  # You can omit this. If omitted, it is your responsibility to add controller nodes to an ELB serving `k8s.external.example.com`
  loadBalancer:
    id: id-of-existing-internet-facing-elb
    # Notice the missing `hostedZone` key here!
    # kube-aws doesn't create ALIAS record for the ELB
    # This way, we don't need to worry about the possibly existing ALIAS record for the stable external endpoint
# DON'T FORGET replacing `credentials/admin.pem` and `credentials/admin-key.pem` with the ones from a previous cluster so that you don't need to replace `admin-key.pem` in e.g. an external CI service or your laptop
- name: stableInternalEndpoint
  dnsName: k8s.internal.example.com
  loadBalancer:
    id: id-of-existing-internal-elb
# Former `externalDNSName` + `hostedZoneId` + newly added SG definitions for controller ELB
- name: versionedExternalEndpoint
  dnsName: v2.k8s.external.example.com
  loadBalancer:
    hostedZone:
      id: <id for the hosted zone external.example.com>
    securityGroups:
    - id: sg-toallowexternalaccess
# Former `externalDNSName` + `hostedZoneId` without an ELB + newly added SG definitions for controller nodes
- name: versionedInternalEndpoint
  dnsName: v2.k8s.internal.example.com
  dnsRoundRobin: # will enable a bash script in cloud-config-controller to update Route 53 record sets
    hostedZone:
      id: <id for the hosted zone internal.example.com>
    securityGroups:
    - id: sg-toallowinternalaccess

Types

type Endpoint struct {
    // Name is used for identifying an endpoint and specifying which one is used for communication with worker nodes
    Name string
    DNSName string
    LoadBalancer LoadBalancer
    DNSRoundRobin DNSRoundRobin
}

type LoadBalancer struct {
    EndpointProvider `yaml:",inline"`
    Identifier `yaml:",inline"`
    // Validation: If `private: true` is specified, all the `Subnets` must be private ones
    // Validation: If `Identifier` is specified, i.e. when reusing an existing ELB, `Subnets` must be empty
    // (both rules are sketched in Go after these type definitions)
    Subnets []Subnet
}

type DNSRoundRobin struct {
    EndpointProvider `yaml:",inline"`
}

type EndpointProvider struct {
    // Private determines whether the resulting load balancer or DNS round-robin uses private IPs of the LB or nodes for an endpoint
    Private bool
    // HostedZone is where the resulting ALIAS or A records are created for an endpoint
    HostedZone HostedZone
    // SecurityGroups contains the security groups that must be associated with the LB or the nodes serving API requests from clients
    SecurityGroups SecurityGroups
}

type HostedZone struct {
    Identifier `yaml:",inline"`
}
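
As a rough illustration of the two validation rules noted in the LoadBalancer comments above, a minimal sketch follows; it assumes Identifier exposes an ID field and Subnet exposes Name and Private fields, and the method name and error wording are illustrative:

import (
    "errors"
    "fmt"
)

// Validate is a hypothetical sketch enforcing the rules noted above.
func (lb LoadBalancer) Validate() error {
    // Reusing an existing ELB via `id` means kube-aws doesn't manage its subnets
    if lb.ID != "" && len(lb.Subnets) > 0 {
        return errors.New("subnets must be empty when reusing an existing ELB via `id`")
    }
    // `private: true` requires every associated subnet to be a private one
    if lb.Private {
        for _, s := range lb.Subnets {
            if !s.Private {
                return fmt.Errorf("subnet %s must be private when `private: true` is specified", s.Name)
            }
        }
    }
    return nil
}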

Non-goals

  • Detailed configuration of security groups, like customization of security group inbound/outbound rules
    • Although it is already close 😄, we won't support this, so as not to turn cluster.yaml into a cloudformation.yaml.
    • Instead, bring your own VPC and security groups and specify them via vpcId and securityGroups[].id
mumoshu (Contributor, Author) commented Mar 2, 2017

Hi @c-knowles @tarvip @spacepluk, could you confirm whether this feature would resolve all of your use-cases, including #281, #281 (comment), and #343?

tarvip (Contributor) commented Mar 2, 2017

apiEndpoints:
- dnsName: v2.k8s.external.example.com
  loadBalancer:
    hostedZone:
      id: <id for the hosted zone external.example.com>
    securityGroups:
    - id: sg-toallowexternalaccess
- dnsName: v2.k8s.internal.example.com
  loadBalancer:
    hostedZone:
      id: <id for the hosted zone internal.example.com>
    securityGroups:
    - id: sg-toallowinternalaccess

Something is missing here: how can I define that the latter one is an internal ELB?

I guess private: true is missing.

On the other hand, I'm trying to understand the difference between
private: true and public: false, or private: false and public: true.

tarvip (Contributor) commented Mar 3, 2017

dnsRoundRobin support made me think: why do we need ELBs at all? What is the advantage of using an ELB over round-robin DNS? I tested round-robin DNS for kubelet and kube-proxy, and it worked very well: when one master went down, kubelet and kube-proxy recreated their connections to another node.

I understand one problem is keeping the DNS entry up to date.

redbaron (Contributor) commented Mar 3, 2017

@tarvip An ELB can be made accessible externally while the API servers remain in a private subnet

tarvip (Contributor) commented Mar 3, 2017

@tarvip An ELB can be made accessible externally while the API servers remain in a private subnet

Ok. True, but for kubelet and kube-proxy DNS should be enough.

mumoshu (Contributor, Author) commented Mar 5, 2017

dnsRoundRobin support made me think: why do we need ELBs at all? What is the advantage of using an ELB over round-robin DNS?

Using an ELB allows us to automatically remove non-functioning controller nodes from the API endpoint backed by a controller ELB.
How would you achieve the same functionality with only dnsRoundRobin?

mumoshu (Contributor, Author) commented Mar 5, 2017

@tarvip Thanks for the feedback!

Something is missing here: how can I define that the latter one is an internal ELB?

I guess private: true is missing.

Yes and no. I did miss the private: true to make it private, but I'm still scratching my head over how to express whether it is internal or not.

"internal" here means that it is to where kubelets in worker nodes connect. On the other hand, "private: true" vs "private: false" means whether it should be within private subnets or public subnets.
How would you like to express which k8s api endpoint worker nodes connect to, when there are 2 or more endpoints?

mumoshu (Contributor, Author) commented Mar 5, 2017

On the other hand, I'm trying to understand the difference between
private: true and public: false, or private: false and public: true.

Ah, sorry. You can just transform public: true to private: false for now. They're remnants of my brainstorming about making the configuration more explicit than it is now 😃

tarvip (Contributor) commented Mar 6, 2017

How would you achieve the same functionality with only dnsRoundRobin?

I understood you had a plan for that:
will enable a bash script in cloud-config-controller to update Route 53 record sets

Actually, there are ways to update Route 53 records automatically in response to ASG changes, but I think it is easier to stick with an ELB.

tarvip (Contributor) commented Mar 6, 2017

How would you like to express which k8s API endpoint worker nodes connect to, when there are 2 or more endpoints?

I guess you have to use a separate option for that: internal: true or internal: false. I understand there is no such option at the moment; the internal keyword appears only in dnsName.

mumoshu (Contributor, Author) commented Mar 6, 2017

I understood you had a plan for that:

No (for now)!
What I meant was just updating record sets according to the list of nodes, not implementing our own health checks.
Implementing our own health checks (with or without Route 53 health checks?) for kubelets on worker nodes would probably allow us to dynamically update record sets to include only healthy nodes, but I'd rather go with an ELB for simplicity if you need such functionality, hence my previous question.

Actually, there are ways to update Route 53 records automatically in response to ASG changes, but I think it is easier to stick with an ELB.

Could you share the options you're aware of?

tarvip (Contributor) commented Mar 6, 2017

These options are based on CloudWatch Events and Lambda, e.g.:
https://aws.amazon.com/blogs/compute/building-a-dynamic-dns-for-route-53-using-cloudwatch-events-and-lambda/
or
https://objectpartners.com/2015/07/07/aws-tricks-updating-route53-dns-for-autoscalinggroup-using-lambda/
I haven't tried this myself, although I plan to (just not for Kubernetes).
Anyway, I guess it is overkill at the moment and it is much easier to use an ELB.

cknowles (Contributor) commented Mar 7, 2017

@mumoshu I can confirm that this seems to resolve #343. I don't need the extreme examples with multiple ELBs etc but can see how people may wish for those. I wonder if it's possible to provide some separation so we don't need to directly support every case explicitly, a bit like we do with security groups and managed role ARNs (note, I had a look but haven't spotted a way to do it here yet).

spacepluk (Contributor) commented:

@mumoshu thanks! I think this adds a lot of flexibility. I'm planning on testing it next week.

mumoshu (Contributor, Author) commented Mar 10, 2017

Considering adding worker.apiEndpointName so that we can configure our worker kubelets to contact a specific k8s API endpoint when there are multiple candidates:

worker:
  apiEndpointName: v2.k8s.internal.example.com

apiEndpoints:
- dnsName: v2.k8s.external.example.com
  loadBalancer:
    hostedZone:
      id: <id for the hosted zone external.example.com>
    securityGroups:
    - id: sg-toallowexternalaccess
- dnsName: v2.k8s.internal.example.com
  loadBalancer:
    hostedZone:
      id: <id for the hosted zone internal.example.com>
    securityGroups:
    - id: sg-toallowinternalaccess

cknowles (Contributor) commented:

Could we designate one of them as the primary or internal one instead of needing the reference, or do we need that flexibility?

apiEndpoints:
- dnsName: v2.k8s.external.example.com
  loadBalancer:
    hostedZone:
      id: <id for the hosted zone external.example.com>
    securityGroups:
    - id: sg-toallowexternalaccess
  kubelet: true

mumoshu (Contributor, Author) commented Mar 10, 2017

@c-knowles Thanks as always for your feedback!
apiEndpoints[].kubelet is OK, but I just prefer worker.apiEndpointName because then:

  1. we won't trap users into creating multiple API endpoints with kubelet: true and then emitting validation errors; worker.apiEndpointName naturally prevents users from doing so
  2. we can extend it to also support worker.nodePools[].apiEndpointName in the future if necessary, probably to audit/control/log access from a node pool to an API endpoint at the infrastructure level rather than via e.g. RBAC?

cknowles (Contributor) commented:

On the flip side, we'd need to add validation errors for when the user specifies an API endpoint that doesn't exist in their list of endpoints under apiEndpoints, unless you are proposing that the worker API endpoint would always be added to the set of endpoints.

e.g. in your example above, if we changed worker to:

worker:
  apiEndpointName: extra-endpoint.k8s.internal.example.com

Is there a validation error or do we end up with 3 possible endpoints?

mumoshu (Contributor, Author) commented Mar 10, 2017

Ah, sorry! I made a mistake in my last example.
It should be:

worker:
  apiEndpointName: versionedInternalEndpoint

apiEndpoints:
- name: versionedExternalEndpoint
  dnsName: v2.k8s.external.example.com
  loadBalancer:
    hostedZone:
      id: <id for the hosted zone external.example.com>
    securityGroups:
    - id: sg-toallowexternalaccess
- name: versionedInternalEndpoint
  dnsName: v2.k8s.internal.example.com
  loadBalancer:
    hostedZone:
      id: <id for the hosted zone internal.example.com>
    securityGroups:
    - id: sg-toallowinternalaccess

And yes, an incorrect name in worker.apiEndpointName should just emit an error like "No api endpoint named ***** defined under the key apiEndpoints in cluster.yaml"
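
A minimal Go sketch of that lookup, assuming the Endpoint type from the Types section above; the helper name and error wording are illustrative:

import "fmt"

// findEndpoint resolves worker.apiEndpointName against the endpoints
// defined under the `apiEndpoints` key in cluster.yaml.
func findEndpoint(endpoints []Endpoint, name string) (Endpoint, error) {
    for _, e := range endpoints {
        if e.Name == name {
            return e, nil
        }
    }
    return Endpoint{}, fmt.Errorf("no api endpoint named %q defined under the key `apiEndpoints` in cluster.yaml", name)
}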

cknowles (Contributor) commented:

OK, sure. My opinion is that either way is fine, because either way must provide validation rules and errors. So it depends on how verbose you wish the config to be, I suppose.

mumoshu added this to the v0.9.6 milestone Mar 20, 2017
mumoshu (Contributor, Author) commented Mar 23, 2017

Got it. Thanks for your confirmation and suggestion 👍

mumoshu (Contributor, Author) commented Apr 30, 2017

Multiple API endpoints and API endpoints without ELBs have been introduced via #468.
DNS round-robin is still a TODO, though.
My biggest concern for now about adding DNS round-robin support is finding a robust and not-too-complex way to continuously sync A records with the IPs of controller nodes.

mumoshu modified the milestones: v0.9.7, v0.9.6 Apr 30, 2017
mumoshu modified the milestones: v0.9.7-rc.4, v0.9.7-rc.3 Jun 9, 2017
mumoshu modified the milestones: v0.9.7-rc.4, backlog Jul 12, 2017
spacepluk (Contributor) commented:

Any progress with the DNS round-robin thing?

mumoshu (Contributor, Author) commented Feb 22, 2018

@spacepluk No, sorry. Honestly, I have never considered using DNS round-robin for this myself, due to its lack of benefits (for me). I'll probably be open to accepting PRs, though.

mumoshu (Contributor, Author) commented Feb 22, 2018

Opened #1151 for the round-robin thing. Closing this, as the original issue was already resolved.

mumoshu closed this as completed Feb 22, 2018