IPv6 e2e Tests #47666

Closed
danehans opened this issue Jun 16, 2017 · 30 comments
Labels
area/ipv6 lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/testing Categorizes an issue or PR as relevant to SIG Testing.

Comments

@danehans

An effort is underway to add IPv6 functionality across Kubernetes. Currently, no tests exist to verify e2e functionality for Kubernetes when using IPv6.

@danehans
Author

@dcbw any info you can share regarding the OpenShift DIND e2e test setup would be appreciated. I would like to explore using OpenShift DIND as a reference model for the v6 e2e test.

@danehans
Author

danehans commented Jun 16, 2017

/area ipv6

@cmluciano

/sig network

@k8s-ci-robot k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label Jun 16, 2017
@cmluciano

/sig testing

@k8s-ci-robot k8s-ci-robot added the sig/testing Categorizes an issue or PR as relevant to SIG Testing. label Jun 16, 2017
@dcbw
Member

dcbw commented Jun 20, 2017

@danehans the core script is https://github.com/openshift/origin/blob/master/hack/dind-cluster.sh . The workflow is basically:

  1. hack/dind-cluster.sh start
  2. wait a while
  3. . dind-openshift.rc
  4. run any 'oc' or "kubectl" commands you want; create pods/services/resources/whatever
  5. docker exec -it openshift-node-2 bash (or openshift-node-1, or openshift-master, etc)
  6. poke around
  7. hack/dind-cluster.sh stop

If you want to test it out, just git clone the openshift/origin repo, cd into it, type 'make', and then once that's done proceed with hack/dind-cluster.sh start. Trivial to test.
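Condensed into commands, that test drive looks roughly like this (a sketch of the steps above; the repo URL is assumed):

```bash
# Sketch of the quick test drive described above.
git clone https://github.com/openshift/origin.git   # repo URL assumed
cd origin
make                                    # build the combined binaries
hack/dind-cluster.sh start              # bring up the DIND cluster
# ...wait a while, then source the env file it drops:
. dind-openshift.rc
kubectl get nodes                       # or any 'oc' command
docker exec -it openshift-node-2 bash   # poke around inside a "node"
hack/dind-cluster.sh stop
```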

Each "machine" or "node" is just a docker container in which the given openshift process runs. Inside that container, another 'docker' also runs; it is the docker that actually creates the containers in which the pods run. Hence "docker in docker". Note that openshift builds combined binaries, much like hyperkube. So openshift-node is a combined kubelet+kube-proxy, while openshift-master is a combined apiserver+controller-manager+etcd.

The "machines" are just normal docker containers attached to the docker0 bridge; nothing special. Your e2e script could set up a custom docker network that is IPv6-enabled, and docker should then assign the "machine" an IPv6 address on eth0 inside the machine container.

Then you're golden; inside the machine container you can do whatever you want.
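For the IPv6-enabled network idea, a minimal sketch (the network name, subnet, and image name are illustrative; docker's --ipv6 flag is standard):

```bash
# Sketch: create an IPv6-enabled docker network and attach a "machine"
# container to it instead of the default docker0 bridge.
docker network create --ipv6 --subnet="fd00:1234::/64" dind-v6

# "some-dind-image" is a placeholder for the DIND "machine" image;
# docker should then assign an IPv6 address to eth0 inside the container.
docker run -d --net=dind-v6 --name openshift-node-1 some-dind-image
docker exec openshift-node-1 ip -6 addr show dev eth0
```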

The mechanics of the script itself are obviously a bit more complicated, and it builds some docker images on first run that are based on Fedora 25. See https://github.com/openshift/origin/tree/master/images/dind for those. These images are then the basis of the "machine" containers for docker.

The script then proceeds to configure a bunch of stuff, roughly analogous to what Kubernetes' local-up-cluster.sh does. So some combination of dind-cluster.sh and local-up-cluster.sh would be the path forward here. The integration is going to be a bit messy, but a good start might be stripping dind-cluster.sh down to a bare minimum that just starts the "machine" docker containers, then calling out to bits (and maybe some refactoring) of local-up-cluster.sh after that.

@dcbw
Member

dcbw commented Jun 20, 2017

@marun points out that https://github.com/Mirantis/kubeadm-dind-cluster might be a better place to start

@marun
Contributor

marun commented Jun 20, 2017

cc: @ivan4th

@ivan4th
Contributor

ivan4th commented Jun 20, 2017

I'll be glad to help with kubeadm-dind-cluster if it really can be used for this task. It used to have a bare subcommand that starts just the 'machine' container. I disabled that some time ago but can bring it back quickly if it would indeed be useful.

@danehans
Author

@ivan4th thanks for the offer. The sig-network IPv6 Working Group needs to create an e2e test that runs on GCE. Since GCE does not support IPv6, kubeadm-dind-cluster may be a helpful solution to the problem. I've done a quick read of kubeadm-dind-cluster and have a few questions:

kubeadm-dind-cluster supports k8s versions 1.4.x (tested with 1.4.9), 1.5.x (tested with 1.5.4) and 1.6.x (tested with 1.6.1). 1.6 branch currently has some stability issues because of pod termination taking too long so your mileage may vary.

I see kubeadm-dind-cluster also supports a src option. This is the option we are most interested in because the working group has several outstanding PRs that need e2e testing. Have you successfully tested the src option recently?

As of now, running kubeadm-dind-cluster on Docker with btrfs storage driver is not supported.

Is ^ still an issue even though moby/moby#9939 and #38337 have been fixed?

@jellonek
Contributor

This looks like a good case for using kubeadm-dind-cluster with Virtlet to spawn Kubernetes on Kubernetes. Since it can run on clouds without nested virtualization enabled (falling back to plain emulation instead of virtualization), we could spawn the environment under test in a virtualized network, separated from the real one, which could be useful for e2e tests.
This should even work on GCE.

@ivan4th
Contributor

ivan4th commented Jun 21, 2017

@jellonek Well, on GCE Virtlet-based VMs will be slow (non-KVM). The k8s-on-k8s example may indeed be useful in the non-GCE case, but here we're talking about a GCE environment, where DIND is much faster.

@danehans
Concerning the src option: yes, it works, and it's also run on CI; e.g., here's a test run against a relatively fresh k8s master (bash -x is used on CI to run the scripts, hence the overly verbose output)

Concerning btrfs, as far as I understand, moby/moby#9939 is not completely fixed, i.e. orphan subvolumes will still cause docker rm -fv ... to fail on DIND containers. We'll do some checks. If this is critical, I can try to add a subvolume cleanup hack to (hopefully) make k-d-c work on btrfs.

@ivan4th
Contributor

ivan4th commented Jun 21, 2017

(typo in mention, wrote Jell instead of jellonek initially -- sorry for spam)

@danehans
Author

@ivan4th I successfully deployed a 3-node v1.6 k8s cluster using https://github.com/Mirantis/kubeadm-dind-cluster in my Docker for Mac (10.12.5) dev environment. I had to brew install md5sha1sum and update my kubectl client (v1.6.3).
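For reference, the setup boiled down to roughly this (a sketch; the fixed-version script path in the repo is assumed):

```bash
# Sketch of the Docker for Mac setup described above.
brew install md5sha1sum   # macOS lacks md5sum/sha1sum; installing it was required
git clone https://github.com/Mirantis/kubeadm-dind-cluster.git
cd kubeadm-dind-cluster
./fixed/dind-cluster-v1.6.sh up   # script path assumed
kubectl get nodes                 # expect kube-master plus two kube-node-N
```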

$ docker info
Containers: 3
 Running: 2
 Paused: 0
 Stopped: 1
Images: 120
Server Version: 17.03.1-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 158
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.27-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 1.952 GiB
Name: moby
ID: FAOD:PDCR:VQJG:GQF2:RRI6:HSG7:E3YW:BF4N:UY5R:2HNO:7DV3:YAPX
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 25
 Goroutines: 36
 System Time: 2017-06-21T22:46:22.140322309Z
 EventsListeners: 1
No Proxy: *.local, 169.254/16
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

I'm super impressed by how easy and fast this multi-node setup is. Nice work!

I am starting to test changes that are needed for IPv6. I first tried updating DIND_SUBNET to IPv6 and that failed:

-DIND_SUBNET=10.192.0.0
+DIND_SUBNET=2001::

$ ./dind-cluster-v1.6.sh up
* Making sure DIND image is up to date 
v1.6: Pulling from mirantis/kubeadm-dind-cluster
Digest: sha256:b81a47264b1992bfeb76f0407e886feded413edd7f5fcbab02ea296831b43db2
Status: Image is up to date for mirantis/kubeadm-dind-cluster:v1.6
* Saving a copy of docker host's /lib/modules 
* Starting DIND container: kube-master
docker: Error response from daemon: invalid IPv4 address: 2001::.2.
See 'docker run --help'.
* Running kubeadm: init --pod-network-cidr=10.244.0.0/16 --skip-preflight-checks
*** kubeadm failed
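The error message suggests the script builds node addresses by appending an IPv4-style dotted suffix to DIND_SUBNET, which can't work for an IPv6 prefix. A hypothetical illustration (not the actual script code):

```bash
# Hypothetical illustration of the failing pattern.
DIND_SUBNET="2001::"
node_ip="${DIND_SUBNET}.2"   # -> "2001::.2" -- invalid, hence the docker error
# An IPv6-aware script would join the suffix without the dot:
node_ip="${DIND_SUBNET}2"    # -> "2001::2"
```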

I opened kubernetes-retired/kubeadm-dind-cluster#17, which is related to the above.

Do you have bandwidth to help me fix issues I find in the kubeadm-dind-cluster project?

/cc: @leblancd @pmichali @rpothier

@danehans
Author

danehans commented Jun 22, 2017

I moved kubeadm-dind-cluster testing from my Mac to a CentOS 7 box because Docker for Mac apparently does not support IPv6. I am now unable to create a cluster using kubeadm-dind-cluster due to the following docker daemon error:

Jun 22 19:23:34 kube-master systemd[1]: Starting Docker Application Container Engine...
Jun 22 19:23:34 kube-master rundocker[180]: Trying to load overlay module (this may fail)
Jun 22 19:23:34 kube-master rundocker[180]: time="2017-06-22T19:23:34.197080260Z" level=info msg="libcontainerd: new containerd process, pid: 189"
Jun 22 19:23:35 kube-master rundocker[180]: time="2017-06-22T19:23:35.208001616Z" level=fatal msg="Error starting daemon: error initializing graphdriver: driver not supported"
Jun 22 19:23:35 kube-master systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Jun 22 19:23:35 kube-master systemd[1]: Failed to start Docker Application Container Engine.
Jun 22 19:23:35 kube-master systemd[1]: docker.service: Unit entered failed state.
Jun 22 19:23:35 kube-master systemd[1]: docker.service: Failed with result 'exit-code'.

Here are my docker and OS details:

$ docker version
Client:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   78d1802
 Built:        Tue Jan 10 20:20:01 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   78d1802
 Built:        Tue Jan 10 20:20:01 2017
 OS/Arch:      linux/amd64

$ rpm --query centos-release
centos-release-7-3.1611.el7.centos.x86_64

I'm going to upgrade Docker and try again.

@danehans
Author

@ivan4th I'm still having the same problem after the Docker upgrade. Can you share the details of your setup so I can replicate? I would like to know your OS, kernel and docker versions. Thanks!

@ivan4th
Contributor

ivan4th commented Jun 22, 2017

@danehans
Would you please paste the output from docker info|grep Storage? Perhaps your kernel is missing the overlay (overlayfs) driver, which is used by default by the inner docker?
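A quick way to check (a sketch using standard tooling):

```bash
# Sketch: verify the host kernel can provide overlayfs.
grep overlay /proc/filesystems   # listed once the module is available
lsmod | grep overlay             # is the module currently loaded?
sudo modprobe overlay            # try loading it
```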
I usually run DIND on stock Ubuntu 16.04.2 without any problems. There's a gce-setup.sh script bundled that may help with running on GCE (although I haven't tried it lately and need to recheck):

. ./gce-setup.sh

(it needs to be sourced; also see the contents of the script for settings, etc.)
The script will use docker-machine to launch a GCE instance and then use it to run the cluster. If k8s is built from source, the instance will be used to run the build container, too.

As for IPv6, I haven't used it with DIND yet, but I'll see what I can do to make it possible to specify an IPv6 subnet.
Sorry for the slow response; we have a time zone difference...

@danehans
Author

@ivan4th I configured docker to use the overlay driver, but I am hitting the same error. I can successfully deploy k8s using kubeadm directly. Here are the details:

# docker info|grep Storage
Storage Driver: overlay

# docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-28.git1398f24.el7.centos.x86_64
 Go version:      go1.7.4
 Git commit:      1398f24/1.12.6
 Built:           Fri May 26 17:28:18 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-28.git1398f24.el7.centos.x86_64
 Go version:      go1.7.4
 Git commit:      1398f24/1.12.6
 Built:           Fri May 26 17:28:18 2017
 OS/Arch:         linux/amd64

# cat /etc/os-release 
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

I'll try on Ubuntu 16.04.2 since that works for you.
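For reference, switching the host docker's storage driver was roughly this (a sketch; assumes the daemon reads /etc/docker/daemon.json and runs under systemd):

```bash
# Sketch: select the overlay storage driver, then restart the daemon.
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "storage-driver": "overlay"
}
EOF
sudo systemctl restart docker
docker info | grep 'Storage Driver'   # should now report: overlay
```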

@ivan4th
Contributor

ivan4th commented Jun 23, 2017

I'll try DIND on CentOS 7 too & see what's going on there.

k8s-github-robot pushed a commit that referenced this issue Jun 24, 2017
Automatic merge from submit-queue (batch tested with PRs 47869, 48013, 48016, 48005)

Adds IPv6 unit test cases to kubeadm

**What this PR does / why we need it**:
Adds IPv6 test cases to kubeadm. It's needed to ensure test cases cover IPv6 related networking scenarios for kubeadm-based k8s deployments.

**Which issue this PR fixes**
This PR is in support of Issue #1443.

**Special notes for your reviewer**:
Additional PR's may follow as e2e testing is being developed by Issue #47666.

**Release note**:
```NONE
```
@danehans
Author

kubernetes/community#629

@danehans
Author

xref #48227

@pmichali
Contributor

@ivan4th I'm seeing the same issue. I get the same docker graphdriver error when kubeadm init starts.

OS: cat /etc/centos-release
CentOS Linux release 7.3.1611 (Core)

kernel: 3.10.0-514.26.1.el7.x86_64

Docker version:
Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:20:36 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:21:56 2017
 OS/Arch:      linux/amd64
 Experimental: false

go version go1.8.1 linux/amd64

Docker Storage Driver: devicemapper

@pmichali
Contributor

@ivan4th Any thoughts as to what I can try to get this working?

@pmichali
Contributor

@ivan4th I saw another issue as well. When I use the BUILD_KUBEADM and BUILD_HYPERKUBE flags, it creates the image and then tries to get rid of the docker container (kill and wait), but there is no container running, so it hangs forever.
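For context, the build-from-source invocation in question looks roughly like this (a sketch based on the flag names above; the source-tree path is assumed):

```bash
# Sketch: run the DIND script against a local Kubernetes source tree,
# building kubeadm and hyperkube from source.
cd "$GOPATH/src/k8s.io/kubernetes"
export BUILD_KUBEADM=y
export BUILD_HYPERKUBE=y
./dind-cluster-v1.6.sh up   # hangs after the image build in the reported case
```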

@luxas
Member

luxas commented Jun 30, 2017

@pmichali Not worth discussing here -- open issues in the kubeadm-dind-cluster repo instead

k8s-github-robot pushed a commit that referenced this issue Aug 15, 2017
Automatic merge from submit-queue

Adds IPv6 test case to kubeadm bootstrap

**What this PR does / why we need it**:
Adds IPv6 test cases in support of kubeadm bootstrap functionality. It's needed to ensure test cases cover IPv6 related networking scenarios.

**Which issue this PR fixes**
This PR is in support of Issue #1443 and Issue #47666

**Special notes for your reviewer**:
Additional PR's will follow to ensure kubeadm fully supports IPv6.

**Release note**:
```NONE
```

/area ipv6
k8s-github-robot pushed a commit that referenced this issue Aug 17, 2017
Automatic merge from submit-queue

Updates Kubeadm Master Endpoint for IPv6

**What this PR does / why we need it**:
Previously, kubeadm would use ip:port to construct a master
endpoint. This works fine for IPv4 addresses, but not for IPv6.
Per [RFC 3986](https://www.ietf.org/rfc/rfc3986.txt), IPv6 requires the ip to be encased in brackets
when being joined to a port with a colon.

This patch updates kubeadm to support wrapping a v6 address with
[] to form the master endpoint url. Since this functionality is
needed in multiple areas, a dedicated util function was created
for this purpose.

**Which issue this PR fixes**
Fixes Issue kubernetes/kubeadm#334

**Special notes for your reviewer**:
As part of a bigger effort to add IPv6 support to Kubernetes:
Issue #1443
Issue #47666

**Release note**:
```NONE
```
/area kubeadm
/area ipv6
/sig network
/sig cluster-ops
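For illustration, the RFC 3986 join rule this PR describes can be sketched in shell (the PR's actual util function is Go code and isn't reproduced here):

```bash
# Sketch of the join rule: IPv6 literals get brackets before ":port";
# IPv4 addresses and hostnames do not.
join_host_port() {
  local host="$1" port="$2"
  case "$host" in
    *:*) echo "[${host}]:${port}" ;;   # contains a colon -> IPv6 literal
    *)   echo "${host}:${port}" ;;
  esac
}
join_host_port 10.0.0.1 6443       # -> 10.0.0.1:6443
join_host_port 2001:db8::1 6443    # -> [2001:db8::1]:6443
```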
@leblancd

leblancd commented Oct 6, 2017

The following PRs are needed for IPv6 e2e testing:
PR #52748
PR #53384
PR #53389
PR #53531
PR #53555 (not in e2e tests, but needed for UDP connection testing)

@leblancd

Another PR needed for IPv6 e2e testing (in addition to those listed above):
PR #53569

@leblancd

The Kubernetes sig/network IPv6 working group has put together an IPv6 Test Suite that's the start of a test plan for what we intend to (eventually) run as an upstream gating test. At some point, we're hoping to run this set of test cases on a virtualized, multi-node IPv6 cluster instantiated using kubeadm-dind-cluster, with a bunch of IPv6 support added by @danehans and @pmichali, with help from @ivan4th. We'd appreciate any feedback/suggestions.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 11, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 11, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
