Embedded dns server breaks DIND #20037

Open
clinta opened this Issue Feb 5, 2016 · 33 comments

@clinta

clinta commented Feb 5, 2016

$ docker --version
Docker version 1.10.0, build 590d5108

Steps to reproduce:

$ docker network create test
1d8159ad6dd00935a91f4cb1d3c61d9f55c0e6c08292780fc43936fcf171cb6e

$ docker run -d --name=dind --net=test --privileged docker:dind
51681e1f9e0a69216ff206011f420a490f8a518a1c68776bf9cc1b71e4783974

$ docker exec dind cat /etc/resolv.conf
search domain.local
nameserver 127.0.0.11
options ndots:0

$ # prove that network and dns connectivity work for other commands

$ docker exec dind ping -c1 google.com 
PING google.com (216.58.216.78): 56 data bytes
64 bytes from 216.58.216.78: seq=0 ttl=52 time=7.998 ms

--- google.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 7.998/7.998/7.998 ms

$ docker exec dind docker run busybox                    
Unable to find image 'busybox:latest' locally
docker: Error response from daemon: Get https://registry-1.docker.io/v2/library/busybox/manifests/latest: Get https://auth.docker.io/token?scope=repository%3Alibrary%2Fbusybox%3Apull&service=registry.docker.io: dial tcp: lookup auth.docker.io on 127.0.0.11:53: no such host.
See 'docker run --help'

@thaJeztah thaJeztah added this to the 1.10.1 milestone Feb 5, 2016

@cpuguy83

Contributor

cpuguy83 commented Feb 5, 2016

What distro? Is firewalld enabled?

@dnephin

Member

dnephin commented Feb 5, 2016

I've hit this problem with a dind image as well.

After talking with the libnetwork team I opened docker/libnetwork#924 to track the problem.

A workaround is to set the --dns param: docker run --dns 8.8.8.8 docker:dind
Another is to use the bridge network instead of a user defined network.
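
A minimal sketch of the --dns workaround, assuming the host's /etc/resolv.conf lists resolvers that the dind container can reach. The helper name dns_flags_from_resolv is hypothetical; it only builds the flags, and it skips loopback entries such as 127.0.0.11 because those would not be usable from the inner daemon:

```shell
# Print one "--dns <addr>" pair per non-loopback nameserver found in a
# resolv.conf-style file (defaults to /etc/resolv.conf).
dns_flags_from_resolv() {
    awk '
        $1 == "nameserver" && $2 !~ /^127\./ {
            printf "%s--dns %s", sep, $2
            sep = " "
        }
        END { print "" }
    ' "${1:-/etc/resolv.conf}"
}
```

It could then be used as: docker run -d --name=dind --net=test --privileged $(dns_flags_from_resolv) docker:dind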

@clinta


clinta commented Feb 5, 2016

Ubuntu 14.04, no firewalld.

@mavenugo

Contributor

mavenugo commented Feb 5, 2016

@clinta that could also be because docker:dind is 1.9.1 based. can you try both the outer and inner docker daemon running 1.10.0 ?

@clinta


clinta commented Feb 5, 2016

@mavenugo, this is the latest dind image, pulled just a few minutes before submitting the issue which runs 1.10.0.

Here's the dind image I tested with:

       "Id": "sha256:6a4561653ad5445dcd95fcd377507a9a5f1b231da8383c294ea62f921b211f26",
        "RepoTags": [
            "docker:dind"
        ],
        "RepoDigests": [],
        "Parent": "",
        "Comment": "",
        "Created": "2016-02-05T00:17:31.287853182Z",
@cpuguy83

Contributor

cpuguy83 commented Feb 5, 2016

@clinta Can you paste output of docker info and docker version

@clinta


clinta commented Feb 5, 2016

Sure, here's a full test with docker info for both inside and outside dind:

$ docker info
Containers: 2
 Running: 0
 Paused: 0
 Stopped: 2
Images: 13
Server Version: 1.10.0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 17
 Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: host bridge null
Kernel Version: 3.13.0-77-generic
Operating System: Ubuntu 14.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 980 MiB
Name: dc0-cadev1
ID: 55YA:XPQK:DRLZ:LULX:MIAJ:2J7W:TG7X:AZ3Y:Q7AU:MBEX:5BTX:UJZR
WARNING: No swap limit support


$ docker version
Client:
 Version:      1.10.0
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   590d5108
 Built:        Thu Feb  4 18:36:33 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.0
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   590d5108
 Built:        Thu Feb  4 18:36:33 2016
 OS/Arch:      linux/amd64


$ docker network create test
29b61f02909bea6c266007ae0334c1935fd63d09ee7a31860f5f9f3317f78874


$ docker run -d --name=dind --net=test --privileged docker:dind
Unable to find image 'docker:dind' locally
dind: Pulling from library/docker
c52e3ed763ff: Pull complete 
66d741d85d02: Pull complete 
a3ed95caeb02: Pull complete 
252570792da3: Pull complete 
94830311dd27: Pull complete 
c09eab511044: Pull complete 
6b276d34e45e: Pull complete 
0894bb56ca80: Pull complete 
Digest: sha256:6a097397cd50b9613bd7e8cd884d29bcc15450d8526145f4485bedc60cd54530
Status: Downloaded newer image for docker:dind
eaec7189f87cbb3220b5f188ff15975f9cd49c8de124eb4a87cb6dfb09aee936


$ docker exec dind docker info
WARNING: No swap limit support
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 1.10.0
Storage Driver: vfs
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: null host bridge
Kernel Version: 3.13.0-77-generic
Operating System: Alpine Linux v3.3 (containerized)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 980 MiB
Name: eaec7189f87c
ID: TMFM:BNA6:G7GZ:FQRX:ZYRO:F3LG:TCKC:WNLT:2GOO:GS3R:6VQB:MI32


$ docker exec dind docker version
Client:
 Version:      1.10.0
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   590d5108
 Built:        Thu Feb  4 19:55:25 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.0
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   590d5108
 Built:        Thu Feb  4 19:55:25 2016
 OS/Arch:      linux/amd64


$ docker exec dind docker run busybox
Unable to find image 'busybox:latest' locally
docker: Error response from daemon: Get https://registry-1.docker.io/v2/library/busybox/manifests/latest: Get https://auth.docker.io/token?scope=repository%3Alibrary%2Fbusybox%3Apull&service=registry.docker.io: dial tcp: lookup auth.docker.io on 127.0.0.11:53: no such host.
See 'docker run --help'.
@clinta


clinta commented Feb 5, 2016

I can confirm the --dns workaround does work. Unfortunately this has broken our gitlab ci which runs a dind container and does not expose the --dns option.

@mavenugo

Contributor

mavenugo commented Feb 5, 2016

@clinta thanks. FYI, running the outer dind container in the default bridge network should also work (without the --net). Can you please confirm?
I will try to reproduce it as well.

@clinta


clinta commented Feb 5, 2016

Yes, it does work with the default bridge network, in which case /etc/resolv.conf is not touched.
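
A quick way to check which case a container is in, sketched under the assumption that the embedded server always shows up as 127.0.0.11, as it does in the output earlier in this thread. The function name uses_embedded_dns is made up for illustration:

```shell
# Succeed (exit 0) if the given resolv.conf-style file points at the
# embedded DNS address 127.0.0.11, fail otherwise.
uses_embedded_dns() {
    grep -Eq '^nameserver[[:space:]]+127\.0\.0\.11([[:space:]]|$)' "${1:-/etc/resolv.conf}"
}
```

For example, save the output of docker exec dind cat /etc/resolv.conf to a file and pass that file to the function.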

@mavenugo

Contributor

mavenugo commented Feb 6, 2016

@clinta I am unable to reproduce the issue.
It could be environment related.

Can you share:

  1. the contents of /etc/resolv.conf on your host machine (including the comment lines)
  2. the docker daemon configuration
@clinta


clinta commented Feb 8, 2016

resolv.conf simply contains our internal DNS servers and search domain:

$ cat /etc/resolv.conf 
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.32.0.1
nameserver 10.32.1.1
search ad.trilliumstaffing.com

The docker daemon config on this machine is all default:

$ sudo ps -aux | grep docker | grep -v grep
root       1067  0.2  3.0 424572 31072 ?        Ssl  07:44   0:01 /usr/bin/docker daemon


$ cat /etc/default/docker
# Docker Upstart and SysVinit configuration file

#
# THIS FILE DOES NOT APPLY TO SYSTEMD
#
#   Please see the documentation for "systemd drop-ins":
#   https://docs.docker.com/engine/articles/systemd/
#

# Customize location of Docker binary (especially for development testing).
#DOCKER="/usr/local/bin/docker"

# Use DOCKER_OPTS to modify the daemon startup options.
#DOCKER_OPTS="--dns 8.8.8.8 --dns 8.8.4.4"

# If you need Docker to use an HTTP proxy, it can also be specified here.
#export http_proxy="http://127.0.0.1:3128/"

# This is also a handy place to tweak where Docker's temporary files go.
#export TMPDIR="/mnt/bigdrive/docker-tmp"

/etc/resolv.conf inside the dind container appears to inherit the search domain and gets options ndots:0.

$ docker exec dind cat /etc/resolv.conf
search ad.trilliumstaffing.com
nameserver 127.0.0.11
options ndots:0

If I remove the search domain from my resolv.conf, then restart the container, I get a different error when trying to pull:

$ docker exec dind docker run busybox
Unable to find image 'busybox:latest' locally
Pulling repository docker.io/library/busybox
Error while pulling image: Get https://index.docker.io/v1/repositories/library/busybox/images: dial tcp: lookup index.docker.io on 127.0.0.11:53: cannot unmarshal DNS message

I decided to get some packet captures of port 53 while doing these tests. It looks like, with the search domain enabled, docker is trying to query index.docker.io.<searchdomain>

$ sudo tcpdump -n -i any port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes

08:05:23.344658 IP 10.0.2.138.36827 > 10.32.0.1.53: 57766+ A? registry-1.docker.io. (38)
08:05:23.344990 IP 10.0.2.138.42786 > 10.32.0.1.53: 21421+ AAAA? registry-1.docker.io. (38)
08:05:23.368382 IP 10.32.0.1.53 > 10.0.2.138.42786: 21421 1/1/0 CNAME registry-origin.docker.io. (152)
08:05:23.369151 IP 10.0.2.138.47538 > 10.32.0.1.53: 65494+ AAAA? registry-1.docker.io.ad.trilliumstaffing.com. (62)
08:05:23.369661 IP 10.32.0.1.53 > 10.0.2.138.47538: 65494 NXDomain 0/1/0 (117)
08:05:23.389725 IP 10.32.0.1.53 > 10.0.2.138.36827: 57766 4/4/0 CNAME registry-origin.docker.io., A 52.3.65.18, A 52.72.218.20, A 52.5.201.27 (256)
08:05:23.512199 IP 10.0.2.138.41716 > 10.32.0.1.53: 28999+ A? auth.docker.io. (32)
08:05:23.512584 IP 10.0.2.138.60294 > 10.32.0.1.53: 13918+ AAAA? auth.docker.io. (32)
08:05:23.535731 IP 10.32.0.1.53 > 10.0.2.138.41716: 28999 5/4/0 CNAME elb-registry.us-east-1.aws.dckr.io., CNAME us-east-1-elbregis-10fucsvj1tcgy-133821800.us-east-1.elb.amazonaws.com., A 52.3.65.18, A 52.5.201.27, A 52.72.218.20 (347)
08:05:23.535985 IP 10.32.0.1.53 > 10.0.2.138.60294: 13918 2/1/0 CNAME elb-registry.us-east-1.aws.dckr.io., CNAME us-east-1-elbregis-10fucsvj1tcgy-133821800.us-east-1.elb.amazonaws.com. (244)
08:05:23.536730 IP 10.0.2.138.44603 > 10.32.0.1.53: 9185+ AAAA? auth.docker.io.ad.trilliumstaffing.com. (56)
08:05:23.537099 IP 10.0.2.138.45149 > 10.32.0.1.53: 44061+ A? auth.docker.io. (32)
08:05:23.537227 IP 10.32.0.1.53 > 10.0.2.138.44603: 9185 NXDomain 0/1/0 (111)
08:05:23.537725 IP 10.32.0.1.53 > 10.0.2.138.45149: 44061 5/4/0 CNAME elb-registry.us-east-1.aws.dckr.io., CNAME us-east-1-elbregis-10fucsvj1tcgy-133821800.us-east-1.elb.amazonaws.com., A 52.3.65.18, A 52.5.201.27, A 52.72.218.20 (347)
08:05:23.538231 IP 10.0.2.138.40126 > 10.32.0.1.53: 49618+ A? auth.docker.io.ad.trilliumstaffing.com. (56)
08:05:23.538629 IP 10.32.0.1.53 > 10.0.2.138.40126: 49618 NXDomain 0/1/0 (111)
08:05:23.540341 IP 10.0.2.138.46956 > 10.32.0.1.53: 57472+ A? index.docker.io. (33)
08:05:23.540683 IP 10.0.2.138.50078 > 10.32.0.1.53: 16943+ AAAA? index.docker.io. (33)
08:05:23.562622 IP 10.32.0.1.53 > 10.0.2.138.46956: 57472 5/4/0 CNAME elb-io.us-east-1.aws.dckr.io., CNAME us-east-1-elbio-rm5bon1qaeo4-623296237.us-east-1.elb.amazonaws.com., A 54.164.250.255, A 54.152.78.181, A 52.0.10.162 (338)
08:05:23.562697 IP 10.32.0.1.53 > 10.0.2.138.50078: 16943 2/1/0 CNAME elb-io.us-east-1.aws.dckr.io., CNAME us-east-1-elbio-rm5bon1qaeo4-623296237.us-east-1.elb.amazonaws.com. (235)
08:05:23.563663 IP 10.0.2.138.42041 > 10.32.0.1.53: 20862+ A? index.docker.io. (33)
08:05:23.564004 IP 10.0.2.138.33971 > 10.32.0.1.53: 32505+ AAAA? index.docker.io.ad.trilliumstaffing.com. (57)
08:05:23.564497 IP 10.32.0.1.53 > 10.0.2.138.42041: 20862 5/4/0 CNAME elb-io.us-east-1.aws.dckr.io., CNAME us-east-1-elbio-rm5bon1qaeo4-623296237.us-east-1.elb.amazonaws.com., A 54.152.78.181, A 54.164.250.255, A 52.0.10.162 (338)
08:05:23.564978 IP 10.0.2.138.45788 > 10.32.0.1.53: 9310+ A? index.docker.io.ad.trilliumstaffing.com. (57)
08:05:23.565020 IP 10.32.0.1.53 > 10.0.2.138.33971: 32505 NXDomain 0/1/0 (112)
08:05:23.565793 IP 10.32.0.1.53 > 10.0.2.138.45788: 9310 NXDomain 0/1/0 (112)
^C
26 packets captured
26 packets received by filter
0 packets dropped by kernel

And here is what the dns traffic looks like without a search domain:

$ sudo tcpdump -n -i any port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes




08:17:44.926989 IP 10.0.2.138.45624 > 10.32.0.1.53: 26645+ A? registry-1.docker.io. (38)
08:17:44.927316 IP 10.0.2.138.36018 > 10.32.0.1.53: 24194+ AAAA? registry-1.docker.io. (38)
08:17:44.972485 IP 10.32.0.1.53 > 10.0.2.138.36018: 24194 1/1/0 CNAME registry-origin.docker.io. (152)
08:17:44.974771 IP 10.32.0.1.53 > 10.0.2.138.45624: 26645 4/4/0 CNAME registry-origin.docker.io., A 52.5.201.27, A 52.72.218.20, A 52.3.65.18 (256)
08:17:45.108428 IP 10.0.2.138.49591 > 10.32.0.1.53: 2674+ A? auth.docker.io. (32)
08:17:45.108742 IP 10.0.2.138.57268 > 10.32.0.1.53: 20656+ AAAA? auth.docker.io. (32)
08:17:45.192245 IP 10.32.0.1.53 > 10.0.2.138.49591: 2674 5/4/0 CNAME elb-registry.us-east-1.aws.dckr.io., CNAME us-east-1-elbregis-10fucsvj1tcgy-133821800.us-east-1.elb.amazonaws.com., A 52.72.218.20, A 52.3.65.18, A 52.5.201.27 (347)
08:17:45.192300 IP 10.32.0.1.53 > 10.0.2.138.57268: 20656 2/1/0 CNAME elb-registry.us-east-1.aws.dckr.io., CNAME us-east-1-elbregis-10fucsvj1tcgy-133821800.us-east-1.elb.amazonaws.com. (244)
08:17:45.193331 IP 10.0.2.138.35990 > 10.32.0.1.53: 17915+ A? auth.docker.io. (32)
08:17:45.193919 IP 10.32.0.1.53 > 10.0.2.138.35990: 17915 5/4/0 CNAME elb-registry.us-east-1.aws.dckr.io., CNAME us-east-1-elbregis-10fucsvj1tcgy-133821800.us-east-1.elb.amazonaws.com., A 52.72.218.20, A 52.5.201.27, A 52.3.65.18 (347)
08:17:45.195625 IP 10.0.2.138.42937 > 10.32.0.1.53: 35881+ A? index.docker.io. (33)
08:17:45.196159 IP 10.0.2.138.55903 > 10.32.0.1.53: 2005+ AAAA? index.docker.io. (33)
08:17:45.267934 IP 10.32.0.1.53 > 10.0.2.138.42937: 35881 5/4/0 CNAME elb-io.us-east-1.aws.dckr.io., CNAME us-east-1-elbio-rm5bon1qaeo4-623296237.us-east-1.elb.amazonaws.com., A 54.164.250.255, A 52.0.10.162, A 54.152.78.181 (338)
08:17:45.268188 IP 10.32.0.1.53 > 10.0.2.138.55903: 2005 2/1/0 CNAME elb-io.us-east-1.aws.dckr.io., CNAME us-east-1-elbio-rm5bon1qaeo4-623296237.us-east-1.elb.amazonaws.com. (235)
08:17:45.269153 IP 10.0.2.138.43772 > 10.32.0.1.53: 10501+ A? index.docker.io. (33)
08:17:45.269653 IP 10.32.0.1.53 > 10.0.2.138.43772: 10501 5/4/0 CNAME elb-io.us-east-1.aws.dckr.io., CNAME us-east-1-elbio-rm5bon1qaeo4-623296237.us-east-1.elb.amazonaws.com., A 52.0.10.162, A 54.152.78.181, A 54.164.250.255 (338)
^C
16 packets captured
16 packets received by filter
0 packets dropped by kernel
@clinta


clinta commented Feb 8, 2016

I'm speculating that, with the search domain set, the same error that produces the "cannot unmarshal DNS message" message occurs, which causes the query to be retried with the search domain appended.
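
For illustration, the order in which a stub resolver would try names can be sketched as below. This assumes the usual resolv.conf(5) semantics (a name with at least ndots dots is tried as-is first, then with each search domain appended) and is not taken from Docker's or Go's source; candidate_names is a hypothetical helper:

```shell
# candidate_names NAME NDOTS [SEARCH_DOMAIN...]
# Print, one per line, the fully qualified names that would be tried in order.
# The body runs in a subshell so its variables do not leak into the caller.
candidate_names() (
    name=$1
    ndots=$2
    shift 2
    # A trailing dot marks the name as fully qualified: no search list applies.
    case $name in
        *.) printf '%s\n' "$name"; exit 0 ;;
    esac
    # Count the dots in the name.
    dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
    # Enough dots: try the name as given before the search list.
    [ "$dots" -ge "$ndots" ] && printf '%s.\n' "$name"
    for domain in "$@"; do
        printf '%s.%s.\n' "$name" "$domain"
    done
    # Too few dots: the bare name is only tried last.
    [ "$dots" -lt "$ndots" ] && printf '%s.\n' "$name"
    exit 0
)
```

With ndots:0 and the search domain from the capture above, candidate_names auth.docker.io 0 ad.trilliumstaffing.com lists auth.docker.io. first and auth.docker.io.ad.trilliumstaffing.com. second, which is the order the queries appear in the tcpdump output.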

@clinta


clinta commented Feb 8, 2016

Based on other searches it looks like this message may be thrown by Go when a DNS response is larger than 512 bytes. As you can see in the captures above, the responses from our internal recursive resolver are smaller than 512 bytes, but I can't figure out how to capture the traffic between Docker's dns server and the container to see if somehow those responses might be larger.
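
One possible way to capture that traffic is to run tcpdump inside the container's network namespace. The sketch below assumes nsenter and tcpdump are installed on the host and that it is run as root; capture_embedded_dns is a hypothetical helper name:

```shell
# capture_embedded_dns CONTAINER
# Run tcpdump on the loopback interface of the container's network namespace,
# which is where the embedded DNS server (127.0.0.11) is reachable.
capture_embedded_dns() {
    # Look up the PID of the container's main process as seen from the host.
    pid=$(docker inspect --format '{{.State.Pid}}' "$1") || return 1
    # Enter only the network namespace (-n) of that process and capture on lo.
    nsenter -t "$pid" -n tcpdump -n -i lo
}
```

For example, capture_embedded_dns dind while running docker exec dind docker run busybox in another terminal.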

@clinta


clinta commented Feb 8, 2016

I believe I've confirmed this. Using nsenter I was able to get a capture of the docker internal dns traffic.

# nsenter -t 5500 -n tcpdump -i lo
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
09:29:40.631847 IP localhost.46367 > 127.0.0.11.35971: UDP, length 38
09:29:40.632350 IP localhost.45839 > 127.0.0.11.35971: UDP, length 38
09:29:40.656328 IP 127.0.0.11.domain > localhost.45839: 8000 1/1/0 CNAME registry-origin.docker.io. (190)
09:29:40.678447 IP 127.0.0.11.domain > localhost.46367: 36410 4/4/0 CNAME registry-origin.docker.io., A 52.72.218.20, A 52.5.201.27, A 52.3.65.18 (396)
09:29:40.779104 IP localhost.57611 > 127.0.0.11.35971: UDP, length 32
09:29:40.779609 IP localhost.48132 > 127.0.0.11.35971: UDP, length 32
09:29:40.803164 IP 127.0.0.11.domain > localhost.48132: 35669 2/1/0 CNAME elb-registry.us-east-1.aws.dckr.io., CNAME us-east-1-elbregis-10fucsvj1tcgy-133821800.us-east-1.elb.amazonaws.com. (324)
09:29:40.803322 IP 127.0.0.11.domain > localhost.57611: 39801 5/4/1 CNAME elb-registry.us-east-1.aws.dckr.io., CNAME us-east-1-elbregis-10fucsvj1tcgy-133821800.us-east-1.elb.amazonaws.com., A 52.5.201.27, A 52.3.65.18, A 52.72.218.20 (754)
09:29:40.803599 IP localhost.39203 > 127.0.0.11.35971: UDP, length 32
09:29:40.804521 IP 127.0.0.11.domain > localhost.39203: 55390 5/4/1 CNAME elb-registry.us-east-1.aws.dckr.io., CNAME us-east-1-elbregis-10fucsvj1tcgy-133821800.us-east-1.elb.amazonaws.com., A 52.3.65.18, A 52.5.201.27, A 52.72.218.20 (754)
09:29:40.805031 IP localhost.35407 > 127.0.0.11.35971: UDP, length 33
09:29:40.806009 IP localhost.38148 > 127.0.0.11.35971: UDP, length 33
09:29:40.827526 IP 127.0.0.11.domain > localhost.35407: 3834 5/4/1 CNAME elb-io.us-east-1.aws.dckr.io., CNAME us-east-1-elbio-rm5bon1qaeo4-623296237.us-east-1.elb.amazonaws.com., A 54.152.78.181, A 52.0.10.162, A 54.164.250.255 (728)
09:29:40.827804 IP localhost.36823 > 127.0.0.11.35971: UDP, length 33
09:29:40.828253 IP 127.0.0.11.domain > localhost.38148: 8437 2/1/0 CNAME elb-io.us-east-1.aws.dckr.io., CNAME us-east-1-elbio-rm5bon1qaeo4-623296237.us-east-1.elb.amazonaws.com. (310)
09:29:40.828756 IP 127.0.0.11.domain > localhost.36823: 47955 5/4/1 CNAME elb-io.us-east-1.aws.dckr.io., CNAME us-east-1-elbio-rm5bon1qaeo4-623296237.us-east-1.elb.amazonaws.com., A 54.152.78.181, A 52.0.10.162, A 54.164.250.255 (728)
^C
16 packets captured
32 packets received by filter
0 packets dropped by kernel

You can see the packet returned by the embedded dns server is 728 bytes. This is despite the fact that our internal dns server never returns a packet that big. Here is a capture of the external dns traffic taken at the same time.

$ sudo tcpdump -n -i any port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
09:29:40.632163 IP 10.0.2.138.56515 > 10.32.0.1.53: 36410+ A? registry-1.docker.io. (38)
09:29:40.632489 IP 10.0.2.138.38633 > 10.32.0.1.53: 8000+ AAAA? registry-1.docker.io. (38)
09:29:40.655972 IP 10.32.0.1.53 > 10.0.2.138.38633: 8000 1/1/0 CNAME registry-origin.docker.io. (152)
09:29:40.678030 IP 10.32.0.1.53 > 10.0.2.138.56515: 36410 4/4/0 CNAME registry-origin.docker.io., A 52.72.218.20, A 52.5.201.27, A 52.3.65.18 (256)
09:29:40.679093 IP 172.18.0.2.48362 > 10.32.0.1.53: 44267+ PTR? 11.0.0.127.in-addr.arpa. (41)
09:29:40.679093 IP 172.18.0.2.48362 > 10.32.0.1.53: 44267+ PTR? 11.0.0.127.in-addr.arpa. (41)
09:29:40.679130 IP 10.0.2.138.48362 > 10.32.0.1.53: 44267+ PTR? 11.0.0.127.in-addr.arpa. (41)
09:29:40.679871 IP 10.32.0.1.53 > 10.0.2.138.48362: 44267 NXDomain* 0/1/0 (91)
09:29:40.679890 IP 10.32.0.1.53 > 172.18.0.2.48362: 44267 NXDomain* 0/1/0 (91)
09:29:40.679895 IP 10.32.0.1.53 > 172.18.0.2.48362: 44267 NXDomain* 0/1/0 (91)
09:29:40.779394 IP 10.0.2.138.60393 > 10.32.0.1.53: 39801+ A? auth.docker.io. (32)
09:29:40.779728 IP 10.0.2.138.49289 > 10.32.0.1.53: 35669+ AAAA? auth.docker.io. (32)
09:29:40.802814 IP 10.32.0.1.53 > 10.0.2.138.49289: 35669 2/1/0 CNAME elb-registry.us-east-1.aws.dckr.io., CNAME us-east-1-elbregis-10fucsvj1tcgy-133821800.us-east-1.elb.amazonaws.com. (244)
09:29:40.802918 IP 10.32.0.1.53 > 10.0.2.138.60393: 39801 5/4/1 CNAME elb-registry.us-east-1.aws.dckr.io., CNAME us-east-1-elbregis-10fucsvj1tcgy-133821800.us-east-1.elb.amazonaws.com., A 52.5.201.27, A 52.3.65.18, A 52.72.218.20 (363)
09:29:40.803855 IP 10.0.2.138.50368 > 10.32.0.1.53: 55390+ A? auth.docker.io. (32)
09:29:40.804364 IP 10.32.0.1.53 > 10.0.2.138.50368: 55390 5/4/1 CNAME elb-registry.us-east-1.aws.dckr.io., CNAME us-east-1-elbregis-10fucsvj1tcgy-133821800.us-east-1.elb.amazonaws.com., A 52.3.65.18, A 52.5.201.27, A 52.72.218.20 (363)
09:29:40.805189 IP 10.0.2.138.53391 > 10.32.0.1.53: 3834+ A? index.docker.io. (33)
09:29:40.806158 IP 10.0.2.138.54387 > 10.32.0.1.53: 8437+ AAAA? index.docker.io. (33)
09:29:40.827096 IP 10.32.0.1.53 > 10.0.2.138.53391: 3834 5/4/1 CNAME elb-io.us-east-1.aws.dckr.io., CNAME us-east-1-elbio-rm5bon1qaeo4-623296237.us-east-1.elb.amazonaws.com., A 54.152.78.181, A 52.0.10.162, A 54.164.250.255 (354)
09:29:40.827882 IP 10.32.0.1.53 > 10.0.2.138.54387: 8437 2/1/0 CNAME elb-io.us-east-1.aws.dckr.io., CNAME us-east-1-elbio-rm5bon1qaeo4-623296237.us-east-1.elb.amazonaws.com. (235)
09:29:40.828114 IP 10.0.2.138.47230 > 10.32.0.1.53: 47955+ A? index.docker.io. (33)
09:29:40.828606 IP 10.32.0.1.53 > 10.0.2.138.47230: 47955 5/4/1 CNAME elb-io.us-east-1.aws.dckr.io., CNAME us-east-1-elbio-rm5bon1qaeo4-623296237.us-east-1.elb.amazonaws.com., A 54.152.78.181, A 52.0.10.162, A 54.164.250.255 (354)
^C
22 packets captured
23 packets received by filter
0 packets dropped by kernel
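
The size comparison can also be done mechanically. The following sketch assumes tcpdump's default text output, where the DNS message length is printed in parentheses at the end of each line; oversized_dns_replies is a hypothetical helper, and 512 is the limit discussed earlier in this thread:

```shell
# oversized_dns_replies [LIMIT]
# Read tcpdump text output on stdin and print "<size> <timestamp>" for every
# line whose trailing "(N)" message length exceeds LIMIT (default 512).
oversized_dns_replies() {
    awk -v limit="${1:-512}" '
        # Only lines that end in a parenthesised number carry a DNS length.
        match($0, /\([0-9]+\)[ \t]*$/) {
            # Skip the opening parenthesis; adding 0 keeps the leading digits.
            size = substr($0, RSTART + 1) + 0
            if (size > limit) print size, $1
        }
    '
}
```

Saving the capture to a text file and piping it through oversized_dns_replies should list only the replies that exceed the limit.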
@clinta


clinta commented Feb 8, 2016

After more testing, it appears that the embedded DNS server does not use message compression in its responses, which causes the packets it returns to the container to be significantly larger than the packets it receives from its resolver.

@mavenugo

Contributor

mavenugo commented Feb 8, 2016

@clinta thanks a lot for the info. We will look into the message compression issue that you brought up.
Given that this is a specific case of dind on a custom network without --dns, I think the priority of this can be reduced.

Since we have the workaround of either running dind in the default bridge network or passing the outer DNS configuration via --dns, I guess we can reduce the priority and concentrate on the message compression issue.

@tiborvass tiborvass added priority/P2 and removed priority/P1 labels Feb 8, 2016

@clinta


clinta commented Feb 8, 2016

I happen to be seeing this in dind, but I don't believe this issue is specific to dind. It will probably affect anyone running a container with a Go app that uses Go's net DNS library, which is limited to 512 bytes, though the workaround of using --dns should work for those apps as well.

I think this is related to miekg/dns#216

@mavenugo

Contributor

mavenugo commented Feb 8, 2016

@clinta this is specific to dind because all those apps that you mentioned automatically pick up the dns entries from the host's /etc/resolv.conf. The difference in dind is that the dns servers in the host /etc/resolv.conf are not recursively passed to the containers launched by the inner docker daemon. Hope that makes sense.
So, the workaround of passing --dns is only for the dind case.

@mavenugo mavenugo assigned sanimej and unassigned mavenugo Feb 8, 2016

@clinta


clinta commented Feb 8, 2016

Here is a quick proof of concept to show how this can affect applications other than dind: anything that uses the dns functions in the Go net package.

$ docker run clinta/go-dns-test
[Tested at 2016-02-08 21:24:13 UTC 192.96.176.129 sent EDNS buffer size 4096 192.96.176.129 DNS reply size limit is at least 4090]%

$ docker run --net=test clinta/go-dns-test
lookup rs.dns-oarc.net on 127.0.0.11:53: no such host
[]%      
@clinta


clinta commented Feb 8, 2016

Sorry, I suppose two issues are being mixed up here.

docker/libnetwork#924 addresses an issue where the 127.0.0.11 resolver is not used by containers spun up by dind. But that is not the cause of the error that I originally posted.

The issue I originally posted was the docker daemon in dind not being able to download images which is caused by the docker daemon in dind getting a dns response that is larger than 512 bytes from the 127.0.0.11 resolver. And the reason those packets are too large is because they are not compressed. This issue of dns responses not being compressed can cause issues for other apps than dind.
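
To see why missing compression matters, here is a rough back-of-the-envelope sketch based on the name encoding in RFC 1035: a name written in full costs one length octet per label plus a terminating zero octet, while a repeated name can be replaced by a 2-octet pointer. The helpers wire_len and name_cost are hypothetical and ignore partial (suffix) compression, so the numbers are only indicative:

```shell
# wire_len NAME
# Octets needed to encode NAME in full: every dot becomes a length octet,
# plus one leading length octet and one terminating zero octet.
wire_len() {
    name=${1%.}   # drop an optional trailing dot
    echo $(( ${#name} + 2 ))
}

# name_cost NAME COUNT
# Octets spent on NAME when it appears COUNT times in one message,
# without compression and with every repeat replaced by a 2-octet pointer.
name_cost() {
    full=$(wire_len "$1")
    echo "uncompressed=$(( full * $2 )) compressed=$(( full + 2 * ($2 - 1) ))"
}
```

Long names such as the ELB CNAME targets in the captures above are repeated across several records, so the uncompressed total grows quickly.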

@mavenugo

Contributor

mavenugo commented Feb 9, 2016

@clinta Thanks for the pointers again.

I agree that there are 2 issues here.

  1. The one that is specific to dind (for which this issue was originally opened), which has a workaround of passing the preferred DNS server via --dns, and that works fine.
  2. The compression issue, which will happen regardless of dind. It seems like an issue in the miekg/dns package.

Thanks for your analysis. Could you please open a new issue for #2 to track the specific compression issue? Let's work with miekg/dns to try to get a fix and update our vendored package.

Regarding #1, let's use the --dns workaround for dind in the 1.10 release and try to handle it correctly in a subsequent release.

@mavenugo

Contributor

mavenugo commented Feb 9, 2016

BTW, I tried your application in my local environment (not in AWS), and it seems to work just fine:

$ docker run clinta/go-dns-test
[69.252.96.24 sent EDNS buffer size 4096 Tested at 2016-02-08 23:37:52 UTC 69.252.96.24 DNS reply size limit is at least 4064]

$ docker run --net=test clinta/go-dns-test
[Tested at 2016-02-08 23:39:20 UTC 69.252.96.22 DNS reply size limit is at least 4064 69.252.96.22 sent EDNS buffer size 4096]

It seems to be specific to the AWS environment...

@mavenugo

Contributor

mavenugo commented Feb 9, 2016

With #20132 opened, this issue addresses only the particular case where a dind container launched in a user-defined network with a custom dns server doesn't automatically inherit the custom DNS. The workaround is to supply the custom DNS servers via the --dns option when launching containers inside the dind container.

Removed the priority and 1.10.1 milestone from this issue, as this is not a critical use case for the 1.10.1 release. We should address this issue properly by finding a way to recursively pass on the custom DNS server configuration inside the dind container running in user-defined networks.

@kopax


kopax commented Mar 31, 2016

I have the same issue.
I am using a consul dns server with the startup options -recursor=8.8.8.8 -recursor=8.8.4.4

My /etc/resolv.conf looks like this:

domain myapp.com
search myapp.com consul
nameserver 192.168.1.4

If I change my /etc/resolv.conf nameserver to 8.8.8.8 I can pull the image; otherwise, I keep getting this error:

Unable to find image 'ubuntu:latest' locally
docker: Error response from daemon: Get https://registry-1.docker.io/v2/library/ubuntu/manifests/latest: Get https://auth.docker.io/token?account=kopax&scope=repository%3Alibrary%2Fubuntu%3Apull&service=registry.docker.io: dial tcp: lookup auth.docker.io on 192.168.1.4:53: no such host.

However, if I am pulling from a private docker registry, it does work without having to do any change.
I really don't understand how I should debug this issue.

$ docker version
Client:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:38:58 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:38:58 2016
 OS/Arch:      linux/amd64

@mavenugo

Contributor

mavenugo commented Apr 1, 2016

@kopax are you using dind? Are you using user-defined networks?
Also, "lookup auth.docker.io on 192.168.1.4:53: no such host" seems to indicate that the expected DNS server is being queried. I don't think yours is related to this issue.

@sanimej


sanimej commented Apr 1, 2016

@kopax I don't think this is related to the original problem reported in this issue. It looks like the consul DNS server is not able to resolve registry-1.docker.io correctly. You can try capturing the dns packets sent and received by the consul server when it tries to resolve registry-1.docker.io. It might give some clues.

@kopax


kopax commented Apr 2, 2016

@mavenugo I am using consul for local service discovery. consul is started with -recursor 8.8.8.8 -recursor 8.8.4.4, while DOCKER_OPTS="--dns 8.8.8.8 --dns 8.8.4.4"
@sanimej do you know how to do that? I know a few netfilter commands, but not this one.

@jhgorse


jhgorse commented Sep 20, 2016

Confirmed @kopax's /etc/resolv.conf edit to 8.8.8.8 fixes this.

It is not entirely clear to me why this was/is still occurring.

@electrofelix


electrofelix commented Oct 2, 2017

Ran into this problem recently. I'm using dind for a jenkins slave in a compose testing/development environment, so that the dockerized jenkins slave can launch containers as part of jobs, using volumes to mount in the workspace with a path context that is understood by the docker daemon running in the same container. Basically, arguments such as `-v /home/jenkins-slave/workspace/:/srv` to docker work from jenkins jobs in this env.

Any containers launched by the slave/dind need to use corporate DNS resolvers, but also need to be able to resolve the other services in the compose environment.

It's not possible to tell dind to use 127.0.0.11 in this case, as that gets masked, resulting in 8.8.4.4 and 8.8.8.8 appearing instead in /etc/resolv.conf for any subsequently launched containers. That won't work when the jobs being developed/tested need to resolve corporate names, such as the GitHub Enterprise instance or the proxy server if pulling from something external, from within launched containers.

My current solution is to add a dns forwarder into the compose environment (bind9) that will perform lookups on 127.0.0.11. Then in the slave container with dind, during start up resolve the slave-dns service and configure dind to use it for any lookups through /etc/docker/daemon.json.

Seems a bit overkill given there is already an embedded dns server in docker.

Perhaps it could perform forwarding as a special case when the configured address is 127.0.0.11, instead of dropping it and assuming that it should use Google's dns servers, seeing as that is intended to be the docker DNS server.

@Morriz


Morriz commented Oct 5, 2017

I have also spent quite some days on this problem. It is essential to a lot of apps like Jenkins/Drone etc to be able to talk to host/cluster services.
It’s been a year since the last active conversation. Any idea where this talk went? It seems like a major problem to me, and it’s too quiet here for my taste ;)

@electrofelix

electrofelix commented Oct 5, 2017

My workaround is as follows:

Add a DNS forwarder instance to my compose env and have my Jenkins slave with dind depend on it:

+  # docker-in-docker doesn't do DNS lookups in a way that allows it to find
+  # both other containers brought up by compose and corporate services at
+  # the same time. Therefore use a simple forwarder setup for bind.
+  slave-dns:
+    image: ventz/bind:latest
+    volumes:
+      - ./slaves/dns/:/etc/bind/
+
   slave-xenial:
     image: jenkins-swarm-slave-xenial:latest
+    depends_on:
+      - slave-dns

Contents of slaves/dns/named.conf:

options {
       directory "/var/cache/bind";

       // Assume the ip address of the embedded DNS server in docker
       // will remain the same. Just forward all requests to it.
       forwarders {
               127.0.0.11;
       };

       auth-nxdomain no;    # conform to RFC1035
       listen-on-v6 { any; };
       listen-on { any; };
};
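
If hard-coding 127.0.0.11 feels too fragile, the forwarders block could be generated from whatever resolver the slave-dns container was actually handed. This is just a sketch: `render_forwarders` is a hypothetical helper (not part of bind or docker), and it assumes the container's /etc/resolv.conf lists the embedded DNS server.

```shell
# Sketch: build the forwarders block from the resolver the container was
# actually given, instead of hard-coding 127.0.0.11.
# render_forwarders is a hypothetical helper, not part of bind or docker.
render_forwarders() {
    # $1: path to a resolv.conf (defaults to /etc/resolv.conf)
    awk '
        $1 == "nameserver" { addrs[++n] = $2 }
        END {
            # fail instead of emitting an empty forwarders block
            if (n == 0) exit 1
            print "forwarders {"
            for (i = 1; i <= n; i++) printf "        %s;\n", addrs[i]
            print "};"
        }
    ' "${1:-/etc/resolv.conf}"
}
```

The output would still have to be templated into the options block by an entrypoint script before named starts, so for a setup like mine the static file above is the simpler choice.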

Add a script to the slave, slaves/xenial/docker-pre.sh, that looks up the slave-dns instance:

#!/bin/bash
#
# (c) Copyright 2017 Hewlett Packard Enterprise Development LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#

DNS_NAMESERVER=$(getent hosts slave-dns | cut -d' ' -f1)
echo "Found local nameserver: ${DNS_NAMESERVER}"
if [[ -n "${DNS_NAMESERVER}" ]]
then
    # -p: don't complain if the directory already exists (e.g. on restart)
    mkdir -p /etc/docker/
    cat <<EOF >/etc/docker/daemon.json
{
    "dns": ["${DNS_NAMESERVER}"]
}
EOF
fi

echo "Executing process"
# quote "$@" so arguments containing spaces are passed through intact
exec "$@"
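
The script only writes a single resolver. If more than one is wanted (say the forwarder first and a corporate resolver as a fallback), the JSON could be built from a list. A small sketch, where `render_daemon_json` is a made-up helper name and the second address in the usage note is a placeholder:

```shell
# Sketch: print a daemon.json body listing every address passed in.
# render_daemon_json is a hypothetical helper name.
render_daemon_json() (
    printf '{\n    "dns": ['
    sep=''
    for addr in "$@"; do
        # separator is empty before the first entry, ", " afterwards
        printf '%s"%s"' "${sep}" "${addr}"
        sep=', '
    done
    printf ']\n}\n'
)
```

In docker-pre.sh this would replace the heredoc, e.g. `render_daemon_json "${DNS_NAMESERVER}" 10.0.0.53 >/etc/docker/daemon.json`, with 10.0.0.53 standing in for whatever the corporate resolver is.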

Dockerfile for the slave:

FROM ubuntu:16.04

# quieten apt during build
ARG DEBIAN_FRONTEND=noninteractive

# much taken from https://github.com/carlossg/jenkins-swarm-slave-docker
#   and https://github.com/docker-library/docker

USER root

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        aufs-tools \
        bash \
        curl \
        cgroup-lite \
        default-jdk \
        gosu \
        git \
        make \
        openssh-client \
        supervisor \
        tar \
        ;

ENV DOCKER_VERSION 17.06.0~ce-0~ubuntu

# version 17.06 hack/dind has not been released
ENV DOCKER_VERSION_TAG v17.05.0-ce

RUN set -e \
    && curl -fSsL "https://raw.githubusercontent.com/moby/moby/${DOCKER_VERSION_TAG}/hack/dind" -o /usr/local/bin/dind \
    && chmod a+x /usr/local/bin/dind \
    ;

# for caching containers between runs for docker-in-docker
VOLUME ["/var/lib/docker"]
# required to see other containers
EXPOSE 2375

# set up jenkins slave
ENV HOME /home/jenkins-slave
ENV JENKINS_SWARM_VERSION 2.0
ENV JENKINS_SWARM_JAR /usr/share/jenkins/swarm-client-jar-with-dependencies.jar


RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add - \
    && apt-get install -y --no-install-recommends \
        apt-transport-https \
        ca-certificates \
        gnupg2 \
        software-properties-common \
    && add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" \
    && apt-get update \
    && apt-get -y install docker-ce=${DOCKER_VERSION} \
    ;

RUN set -x \
    && useradd -c "Jenkins Slave user" -d $HOME -s /bin/bash -m -G docker jenkins-slave \
    && curl -fLSs --create-dirs \
        https://repo.jenkins-ci.org/releases/org/jenkins-ci/plugins/swarm-client/${JENKINS_SWARM_VERSION}/swarm-client-${JENKINS_SWARM_VERSION}-jar-with-dependencies.jar \
        -o ${JENKINS_SWARM_JAR} \
    && chmod 755 /usr/share/jenkins/ \
    ;

# install drone executables
RUN set -x \
    && curl -sSL -f https://github.com/drone/drone-cli/releases/download/v0.7.0/drone_linux_amd64.tar.gz | tar zx \
    && mv drone /usr/local/bin/drone-0.7 \
    ;

COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
COPY docker-pre.sh jenkins-slave.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-pre.sh /usr/local/bin/jenkins-slave.sh

# to support configuring of slave need to use custom entrypoint to setup service
ENTRYPOINT ["/usr/local/bin/jenkins-slave.sh"]
CMD ["supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]

Supervisor config calling the docker-pre.sh script before the dind script:

#
# (c) Copyright 2017 Hewlett Packard Enterprise Development LP
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#

[supervisord]
user=root
nodaemon=true

[program:docker]
user=root
command=/usr/local/bin/docker-pre.sh /usr/local/bin/dind /usr/bin/dockerd -H unix:///var/run/docker.sock
autorestart=true
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
redirect_stderr=true
stopasgroup=true
killasgroup=true

[program:jenkins-slave]
user=jenkins-slave
command=/usr/bin/java %(ENV_JAVA_OPTS)s -jar %(ENV_JENKINS_SWARM_JAR)s -fsroot %(ENV_HOME)s %(ENV_PARAMS)s %(ENV_JENKINS_SLAVE_OPTIONS)s
directory=%(ENV_HOME)s
autorestart=true
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
redirect_stderr=true
stopasgroup=true
killasgroup=true
environment=JENKINS_SLAVE_OPTIONS="",

[supervisorctl]

[inet_http_server]
port = 127.0.0.1:9001

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

I'm using supervisor as a simple way to start both the slave (via Jenkins swarm) and the dind instance. This doesn't include everything, but hopefully anyone else running into this has a template for a workaround. Though I'd much prefer it if docker could nest its embedded DNS and perform lookup forwarding as needed.

@Morriz

Morriz commented Oct 5, 2017

Yeah and wow, what a workaround. If it works for now, then it's the only way I guess. Tnx a lot for sharing.

Let's hope we get more ppl in here knowing more about it :)
