
Using bridge network for nodes #484

Open · wants to merge 15 commits into base: master

Conversation

@tao12345666333 (Member) commented May 5, 2019

ref: #408 (comment) and #408 (comment)

We can use a bridge network for the nodes. After that we can use hostnames instead of IPs. It will push #408 forward.

Things to do:

  • create bridge network
  • create nodes using bridge network
  • use hostnames instead of IPs (config files)
  • load-balancer config
  • delete network while deleting cluster

@k8s-ci-robot (Contributor) commented May 5, 2019

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tao12345666333
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: bentheelder

If they are not already assigned, you can assign the PR to them by writing /assign @bentheelder in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from BenTheElder and munnerz May 5, 2019

@k8s-ci-robot k8s-ci-robot added size/M size/L and removed size/M labels May 5, 2019


@BenTheElder (Member) commented May 5, 2019

The custom bridge network's magical hostname resolution is causing CoreDNS to detect a loop and fail.

@tao12345666333 (Member, Author) commented May 5, 2019

It seems that I missed some useful information. 😯

@neolit123 (Contributor) commented May 5, 2019

The custom bridge network's magical hostname resolution is causing CoreDNS to detect a loop and fail.

Hm, the following is not an option:

  • editing the CoreDNS Corefile to modify the forward plugin or disable CoreDNS loop detection.

We need to think of a way to avoid the loop.
Docker DNS configuration flags:
https://docs.docker.com/v17.09/engine/userguide/networking/default_network/configure-dns/

/cc
/cc @aojea

@k8s-ci-robot k8s-ci-robot requested review from aojea and neolit123 May 5, 2019

@aojea (Contributor) commented May 5, 2019

The DNS loop seems to be caused by the Docker DNS always using the address 127.0.0.11, as quoted from the Docker docs:

Note: The DNS server is always at 127.0.0.11

and that's the address configured in the node's resolv.conf:


nameserver 127.0.0.11
options ndots:0

The CoreDNS documentation explains why such addresses cause a loop: https://github.com/coredns/coredns/blob/master/plugin/loop/README.md

A common cause of forwarding loops in Kubernetes clusters is an interaction with a local DNS cache on the host node (e.g. systemd-resolved). For example, in certain configurations systemd-resolved will put the loopback address 127.0.0.53 as a nameserver into /etc/resolv.conf. Kubernetes (via kubelet) by default will pass this /etc/resolv.conf file to all Pods using the default dnsPolicy rendering them unable to make DNS lookups (this includes CoreDNS Pods). CoreDNS uses this /etc/resolv.conf as a list of upstreams to forward requests to. Since it contains a loopback address, CoreDNS ends up forwarding requests to itself.

It offers three solutions, but personally I only see the first one, modifying the kubelet config, as feasible. Let me try to see what's needed to implement it and come back to you.

On the other hand, I don't know if creating a different bridge per cluster will break some functionality; I'm especially concerned about the multicluster/federation scenarios. I think it would be safer to first create a single network for kind and then see if we need to move to a network per cluster.

@aojea (Contributor) commented May 5, 2019

@tao12345666333 in order to keep backward compatibility and avoid the DNS crash we should:

  1. Copy the resolv.conf from the host to /kind/resolv.conf, for example filtering out all the loopback addresses (this is what Docker does when using the default bridge: https://docs.docker.com/v17.09/engine/userguide/networking/default_network/configure-dns/).

When creating the container’s /etc/resolv.conf, the daemon filters out all localhost IP address nameserver entries from the host’s original file.

  2. Tell kubelet to use /kind/resolv.conf for Pods. kubeadm seems to have a flag to configure this.
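Step 1 can be sketched in Go. This is a minimal illustration, not the PR's actual code; the filterLoopback helper name and the sample input are invented, though the PR ended up using a similar regular expression:

```go
package main

import (
	"fmt"
	"regexp"
)

// filterLoopback removes resolv.conf lines whose nameserver is a loopback
// address (127.x.x.x or ::1), similar to what dockerd does when building a
// container's /etc/resolv.conf from the host's file. Note the leading
// [\r\n]+ means a loopback entry on the very first line would survive, so
// the input here starts with a search line.
func filterLoopback(content string) string {
	re := regexp.MustCompile(`(?m)[\r\n]+^.*((127\.([0-9]{1,3}\.){2}[0-9]{1,3})|(::1)).*$`)
	return re.ReplaceAllString(content, "")
}

func main() {
	host := "search example.com\nnameserver 127.0.0.11\nnameserver 192.168.1.1\noptions ndots:0"
	// prints the file with the loopback nameserver line removed
	fmt.Println(filterLoopback(host))
}
```

The filtered file can then be written to /kind/resolv.conf on each node and handed to kubelet via its resolvConf setting.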
@tao12345666333 (Member, Author) commented May 5, 2019

@aojea Thank you for the information, I will try it later.

@BenTheElder (Member) commented May 5, 2019

Does Docker rewrite /etc/resolv.conf only on container creation, or continually? If the former, then we can just overwrite it.

Otherwise we will have inconsistent DNS for system components.

@aojea (Contributor) commented May 6, 2019

Does Docker rewrite /etc/resolv.conf only on container creation, or continually? If the former, then we can just overwrite it.
Otherwise we will have inconsistent DNS for system components.

If you overwrite the resolv.conf of the nodes, you lose the Docker DNS and the host discovery; I mean, there would then be no reason to move to user-defined bridges.


@tao12345666333 (Member, Author) commented May 6, 2019

This fix works on my side.

[Resolved review thread: pkg/cluster/nodes/create.go (outdated)]
@aojea (Contributor) commented May 6, 2019

It seems I misunderstood the use of the resolv-conf flag in kubelet:

May 06 11:05:34 kind-worker kubelet[200]: E0506 11:05:34.447977 200 kuberuntime_manager.go:693] createPodSandbox for pod "weave-net-khpnb_kube-system(0ebd9f5c-6fec-11e9-8010-0242ac120002)" failed: open /kind/resolv.conf: no such file or directory

The pods are not able to boot because they can't find the new resolv.conf file. I thought the file would be copied inside each pod.
The resolv.conf file has to be copied to all nodes; currently it is not available on the worker nodes.

[Resolved review thread: pkg/cluster/nodes/util.go (outdated)]
@tao12345666333 (Member, Author) commented May 6, 2019

/test pull-kind-conformance-parallel

@BenTheElder (Member) commented May 6, 2019

On the other hand, I don't know if creating a different bridge per cluster will break some functionality; I'm especially concerned about the multicluster/federation scenarios. I think it would be safer to first create a network for kind and then see if we need to move to a network per cluster.

If you look at my strawman proposal, the idea is to make it configurable so that federation can configure the same bridge for all.

[Resolved review thread: pkg/cluster/nodes/create.go (outdated)]
@@ -237,6 +239,7 @@ evictionHard:
nodefs.available: "0%"
nodefs.inodesFree: "0%"
imagefs.available: "0%"
resolvConf: /kind/resolv.conf

@BenTheElder (Member) May 6, 2019

I really don't like the idea that containerd and debugging tools will have a different resolv.conf than kubelet.

@tao12345666333 (Member, Author) May 6, 2019

As you said:

we could selectively merge the resolv.conf

@tao12345666333 (Member, Author) May 6, 2019

After considering it, I think the current approach is relatively simple. If we do a resolv.conf merge, we need a lot of testing. Why don't we defer that? 🤔

Keep the current approach for a while, then change it once verification has passed.

I hope we can move forward. ❤️

@BenTheElder (Member) May 6, 2019

This means other components have an inconsistent resolv.conf and we're not actually solving the issue. If we're going to use hostnames, we need kubelet and everything else to have the same lookup.

@aojea (Contributor) May 9, 2019

Technically, having a different resolv.conf doesn't mean having a different lookup, just that the pods will not be able to resolve the node names; all other requests will use the same DNS server. However, I agree that reality is always different and we could be broken by Docker changes in DNS behaviour; that would not be surprising 😅

Docker daemon runs an embedded DNS server which provides DNS resolution among containers connected to the same user-defined network, so that these containers can resolve container names to IP addresses. If the embedded DNS server is unable to resolve the request, it will be forwarded to any external DNS servers configured for the container. To facilitate this when the container is created, only the embedded DNS server reachable at 127.0.0.11 will be listed in the container’s resolv.conf file.

I agree with tao here; I think we should leave the DNS exercise until last.

@neolit123 (Contributor) May 9, 2019

+1 on merging the resolv.confs if feasible.

@tao12345666333 (Member, Author) commented May 8, 2019

The most basic features expected of this PR have been achieved, including:

  • create bridge network
  • create nodes using bridge network
  • load-balancer config
  • delete network while deleting cluster

By using the bridge network, we can use Docker's embedded DNS to help push #408 forward.

After this PR, we only need to modify /etc/kubernetes/pki/admin.conf in order to deal with #408. We no longer need to regenerate the certificate, because the current certificate is still valid (with this PR).

i.e.

root@moe-control-plane3:/etc/kubernetes# openssl x509 -in pki/apiserver.crt -noout -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 4829775455184542867 (0x4306cf2fef2cdc93)
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN = kubernetes
        Validity
            Not Before: May  8 13:55:37 2019 GMT
            Not After : May  7 13:56:57 2020 GMT
        Subject: CN = kube-apiserver
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (2048 bit)
                Modulus:
                    00:c3:cb:9e:9f:e5:d4:bf:c7:b1:a8:77:98:95:e5:
                    (ignore)
                    fb:66:2a:d0:ab:b1:23:30:25:39:e5:91:91:6a:f6:
                    a9:15
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication
            X509v3 Subject Alternative Name:
                DNS:moe-control-plane3, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:localhost, IP Address:10.96.0.1, IP Address:172.28.0.6, IP Address:172.28.0.2
    Signature Algorithm: sha256WithRSAEncryption
         70:65:07:9c:54:50:24:9f:59:59:8b:de:4f:60:c9:4e:47:5f:
         (ignore)

I want to know your thoughts. @BenTheElder @aojea @neolit123 Thanks.

@aojea (Contributor) commented May 8, 2019

Ben created a proposal; I think only the network-labeling part is still missing:

#273 (comment)

Strawman:

  • if the specified network exists, we use it without creating it
  • if it doesn't exist, we create it, and label it as belonging to the cluster
  • on cluster delete, we list networks labeled with the cluster, and delete only those (so not just whatever the containers use, only if we labeled it)
  • we name this field / functionality somehow such that it is clear that this feature is docker specific, leaving room for podman etc. in the immediate future; xref #154
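Assuming the strawman above, the label-based network lifecycle could be sketched in Go as follows. The helper names and the label key io.k8s.sigs.kind.cluster are illustrative, not kind's actual implementation; the real code passes constants.ClusterLabelKey to the docker CLI:

```go
package main

import "fmt"

// clusterLabelKey marks networks as created by kind (name is illustrative).
const clusterLabelKey = "io.k8s.sigs.kind.cluster"

// createNetworkArgs builds the docker CLI arguments for creating a bridge
// network labeled as belonging to a cluster; the label is what later lets
// delete remove only networks kind itself created.
func createNetworkArgs(network, cluster string) []string {
	return []string{
		"network", "create",
		"--driver=bridge",
		fmt.Sprintf("--label=%s=%s", clusterLabelKey, cluster),
		network,
	}
}

// listOwnedNetworksArgs lists only networks carrying the cluster's label,
// so deletion never touches a pre-existing user-supplied network.
func listOwnedNetworksArgs(cluster string) []string {
	return []string{
		"network", "ls",
		"--filter", fmt.Sprintf("label=%s=%s", clusterLabelKey, cluster),
		"--format", "{{.Name}}",
	}
}

func main() {
	// the args would be passed to exec.Command("docker", args...)
	fmt.Println(createNetworkArgs("kind", "kind"))
	fmt.Println(listOwnedNetworksArgs("kind"))
}
```

On cluster delete, the idea is to run the ls command first and remove only the networks it returns, leaving user-created networks untouched.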

There was some controversy regarding the different resolv.conf for nodes and clusters.
If we want to use the same resolv.conf for everything, I guess the only option is rewriting it on the nodes and adding the hostnames to /etc/hosts manually ... but I don't know if Docker does some magic behind the scenes on those files.

@tao12345666333 (Member, Author) commented May 16, 2019

For the processing of resolv.conf, I skipped Windows entirely, because I don't have a Windows machine that can be used to test this feature 😟 and I can't accurately judge Windows' behavior.

@aojea (Contributor) commented May 16, 2019

For the processing of resolv.conf, I skipped Windows entirely, because I don't have a Windows machine that can be used to test this feature 😟 and I can't accurately judge Windows' behavior.

I have one, but I'm using WSL (Windows Subsystem for Linux) to run kind on it, and it does have a resolv.conf.

@neolit123 (Contributor) commented May 16, 2019

Info on the Windows hosts file:
https://gist.github.com/zenorocha/18b10a14b2deb214dc4ce43a2d2e2992

Some meta questions:

  • why did we end up modifying the DNS resolver on the host? Can we avoid that?
  • what about macOS?
@aojea (Contributor) commented May 16, 2019

Meta question:
why did we end up modifying the DNS resolver on the host? Can we avoid that?

It's about copying the resolv.conf from the host to the nodes, filtering out the loopback addresses to avoid the CoreDNS crash loop; I think this was the solution to unblock the work.

Using user-defined bridges, Docker creates a resolv.conf that points to the Docker internal DNS at address 127.0.0.11.

I guess it's now time to decide how we want to move forward :)

[Resolved review thread: pkg/cluster/nodes/util.go (outdated)]
@k8s-ci-robot (Contributor) commented May 24, 2019

@yeya24: changing LGTM is restricted to assignees, and only kubernetes-sigs/kind repo collaborators may be assigned issues.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

scoped err.
Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>

@tao12345666333 tao12345666333 force-pushed the tao12345666333:using-network branch 4 times, most recently from 95fa838 to eda987d Jun 5, 2019

only delete network which created by kind.
Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>

@tao12345666333 tao12345666333 force-pushed the tao12345666333:using-network branch from eda987d to 48977ec Jun 5, 2019

@tao12345666333 (Member, Author) commented Jun 5, 2019

/test pull-kind-conformance-parallel

@tao12345666333 (Member, Author) commented Jun 5, 2019

It seems that Docker-related commands cannot be tested in the CI environment, so I deleted the test case for the docker network.

@tao12345666333 (Member, Author) commented Jun 5, 2019

I think we should do something to push this PR forward.

/assign @BenTheElder

@k8s-ci-robot (Contributor) commented Jun 5, 2019

@tao12345666333: The following test failed, say /retest to rerun them all:

Test name: pull-kind-conformance-parallel
Commit: 48977ec
Rerun command: /test pull-kind-conformance-parallel

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

}

// filter the loopback addresses
re := regexp.MustCompile("(?m)[\r\n]+^.*((127.([0-9]{1,3}.){2}[0-9]{1,3})|(::1)).*$")

@BenTheElder (Member) Jun 5, 2019

this shouldn't even be necessary in the host's resolv.conf?

// DeleteNetwork delete the special network
// only when the network was created by kind
func DeleteNetwork(networkName string) error {
if isNetworkCreatedByKind(networkName) {

@BenTheElder (Member) Jun 5, 2019

this is orthogonal to the functionality of delete network. this package should be kind agnostic. we don't do this for delete containers

"docker", "network",
"create",
"--driver=bridge",
"--label="+fmt.Sprintf("%s=%s", constants.ClusterLabelKey, networkName),

@BenTheElder (Member) Jun 5, 2019

this package should not be aware of the cluster label key. this package just implements docker functionality

}

// isNetworkCreatedByKind checks if it was created by kind
func isNetworkCreatedByKind(networkName string) bool {

@BenTheElder (Member) Jun 5, 2019

this should live somewhere in cluster/

@@ -181,6 +182,7 @@ evictionHard:
nodefs.available: "0%"
nodefs.inodesFree: "0%"
imagefs.available: "0%"
resolvConf: /kind/resolv.conf

@BenTheElder (Member) Jun 5, 2019

this is not being done for all api versions

@@ -193,16 +214,16 @@ func nodesToCreate(cfg *config.Cluster, clusterName string) []nodeSpec {
}

// TODO(bentheelder): remove network in favor of []cri.PortMapping when that is in
-func (d *nodeSpec) Create(clusterLabel string) (node *nodes.Node, err error) {
+func (d *nodeSpec) Create(networkName, clusterLabel string) (node *nodes.Node, err error) {

@BenTheElder (Member) Jun 5, 2019

why isn't this in the nodeSpec?

@@ -102,4 +102,7 @@ type Networking struct {
// If DisableDefaultCNI is true, kind will not install the default CNI setup.
// Instead the user should install their own CNI after creating the cluster.
DisableDefaultCNI bool
// BridgeName is the bridge network name.
// Defaults to same as cluster name.
BridgeName string

@BenTheElder (Member) Jun 5, 2019

(this config field name is not suitable, we need to separate docker specific functionality and make that clear)

@BenTheElder (Member) commented Jun 5, 2019

It seems that the docker related commands cannot be tested in the CI environment, so I deleted the test case for the docker network.

they absolutely can, but the unit tests should be unit tests, not invoking docker commands (!)

this is why we need #528

I think we should do something to push this PR forward.

@tao12345666333 unfortunately I need to prioritize some other fixes related to testing Kubernetes (e.g. the HA fixes), and we have many open PRs. This is pretty far into P3: https://kind.sigs.k8s.io/docs/contributing/project-scope/

I'm still not convinced that the DNS approach is a good way to solve this problem; it may introduce more problems. The field name also needs thought and is not in line with the proposal. I left some more specific review feedback.
