Docker build fails with 'could not find bridge docker0' #33745

Closed
JorritSalverda opened this issue Sep 29, 2016 · 10 comments
Labels: area/kubelet, priority/backlog, sig/node

Comments

@JorritSalverda

BUG REPORT

When running docker build inside a container that has the host's Docker socket and binary mounted into it, the build starts off fine but fails to create an endpoint on the network bridge docker0 as soon as it reaches the first RUN command.

Executing "docker build --force-rm=true --no-cache=true --file=target/docker/Dockerfile --tag=****:1.0.258 ."

Sending build context to Docker daemon 557.1 kB
...
Sending build context to Docker daemon 78.04 MB

Step 1 : FROM travix/base-debian-jre8
 ---> a130b5e1b4d4
Step 2 : ADD ***-1.0.258.jar ***.jar
 ---> 8d53e68e93a0
Removing intermediate container d1a758c9baeb
Step 3 : ADD target/newrelic newrelic
 ---> 9dbbb1c1db58
Removing intermediate container 461e66978c53
Step 4 : RUN bash -c "touch /***.jar"
 ---> Running in 6a28f48c9fd1
Removing intermediate container 6a28f48c9fd1
failed to create endpoint stupefied_shockley on network bridge: adding interface veth095b905 to bridge docker0 failed: could not find bridge docker0: route ip+net: no such network interface

The agents are deployed with the following manifest (a couple of envvars that don't seem relevant are left out for the sake of brevity):

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: gocd-agent
spec:
  replicas: 2
  strategy:
    type: Recreate
  revisionHistoryLimit: 1
  selector:
    matchLabels:
      app: gocd-agent
  template:
    metadata:
      labels:
        app: gocd-agent
    spec:
      containers:
      - name: gocd-agent
        image: travix/gocd-agent:16.10.0
        imagePullPolicy: Always
        securityContext:
          privileged: true
        volumeMounts:
        - name: docker-sock
          mountPath: /var/run/docker.sock
        - name: docker-bin
          mountPath: /usr/bin/docker
        env:
        - name: "DOCKER_GID_ON_HOST"
          value: "107"
      volumes:
      - name: docker-sock
        hostPath:
          path: /var/run/docker.sock
      - name: docker-bin
        hostPath:
          path: /usr/bin/docker

Inside the container I create a docker group with the host's GID and add the go user, which runs the builds, to that group so that docker works without sudo.

groupadd -g $DOCKER_GID_ON_HOST docker && gpasswd -a go docker
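
Since the docker group's GID differs between host images, the value passed to the groupadd command above could be read from the mounted socket instead of being hard-coded. This is only a minimal sketch: it assumes a stat that supports the GNU-style -c option inside the agent image, and socket_gid is a hypothetical helper name that is not part of the original setup.

```shell
#!/bin/sh
# Print the numeric group ID that owns the given path.
# Assumes a stat implementation that accepts the GNU-style -c format option.
socket_gid() {
    stat -c '%g' "$1"
}

# Hypothetical usage in the agent's entrypoint (not from the original report):
#   gid="$(socket_gid /var/run/docker.sock)"
#   groupadd -g "$gid" docker && gpasswd -a go docker
```

With this in place the DOCKER_GID_ON_HOST variable would only be needed as an override, although that trade-off has not been tested against the setup described here.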

The interesting thing is that it used to work fine when running a vm based on the following image:

With this container manifest:

apiVersion: v1
kind: Pod
metadata:
  name: gocd-agent
spec:
  containers:
  - name: gocd-agent
    image: travix/gocd-agent:16.10.0
    imagePullPolicy: Always
    volumeMounts:
    - name: docker-sock
      mountPath: /var/run/docker.sock
    - name: docker-bin
      mountPath: /usr/bin/docker
    env:
    - name: "DOCKER_GID_ON_HOST"
      value: "107"
  restartPolicy: Always
  dnsPolicy: Default
  volumes:
  - name: docker-sock
    hostPath:
      path: /var/run/docker.sock
  - name: docker-bin
    hostPath:
      path: /usr/bin/docker

The big difference seems to lie in the networking setup for the container vm and the container engine cluster. We also tested it with host networking, but that didn't make a difference.

A couple of stats about the Kubernetes / Container Engine cluster that might help.

Kubernetes version

1.3.5 on Google Container Engine using the non-GCI host image. GCI fails in a different way and has a different gid for the docker group.

Docker info

$ sudo docker info
Containers: 15
 Running: 14
 Paused: 0
 Stopped: 1
Images: 67
Server Version: 1.11.2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 148
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge null host
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 7 (wheezy)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 25.57 GiB
Name: gke-tooling-default-pool-1fa283a6-8ufa
ID: JBQ2:Q3AR:TFJG:ILTX:KMHV:M67A:NYEM:NK4G:R43J:K5PS:26HY:Q57S
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support

Network bridge

$ sudo brctl show

bridge name     bridge id               STP enabled     interfaces
cbr0            8000.063c847a631e       no              veth0a58740b
                                                        veth1f558898
                                                        veth8797ea93
                                                        vethb11a7490
                                                        vethc576cc01
docker0         8000.02428db6a46e       no     

OS info

$ uname -a
Linux gke-tooling-default-pool-1fa283a6-8ufa 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux
@JorritSalverda
Author

I know there's issue #1806 but that seems to deal more with the security implications of having to run the container in privileged mode.

@JorritSalverda changed the title from "Docker build fails with could not find bridge docker0" to "Docker build fails with 'could not find bridge docker0'" Sep 29, 2016
@k8s-github-robot added the area/kubelet and sig/node labels Sep 29, 2016
@vishh
Contributor

vishh commented Sep 29, 2016

We do not recommend consuming the host docker daemon within pods, because k8s assumes complete control over the host docker daemon. It can and might delete images while they are being built. Networking configurations are not guaranteed to be compatible with the upstream docker distribution.
Can you try your builds with a pod-scoped docker daemon running as a side-car container within your pod? On GCI we test that docker-in-docker works with the overlayfs storage driver. @Amey-D

@vishh self-assigned this Sep 29, 2016
@vishh added the priority/backlog label Sep 29, 2016
@JorritSalverda
Author

@vishh that sounds pretty neat. Do you happen to have an example yaml of this?

@JorritSalverda
Author

I managed to get this to work now with the following deployment manifest and our own docker-in-docker container:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: gocd-agent
spec:
  replicas: 2
  strategy:
    type: Recreate
  revisionHistoryLimit: 1
  selector:
    matchLabels:
      app: gocd-agent
  template:
    metadata:
      labels:
        app: gocd-agent
    spec:
      containers:
      - name: gocd-agent
        image: travix/gocd-agent:16.10.0
        imagePullPolicy: Always
        volumeMounts:
        # A mount at /var/go/.gcloud was listed here without its volume name (apparently trimmed for brevity).
        - name: docker-sock
          mountPath: /var/run
        - name: docker-bin
          mountPath: /usr/local/bin
        env:
        - name: "DOCKER_GID_ON_HOST"
          value: "107"
      - name: docker-in-docker
        image: travix/dind:1.12
        imagePullPolicy: Always
        securityContext:
          privileged: true
        volumeMounts:
        - name: docker-sock
          mountPath: /var/run
        - name: docker-bin
          mountPath: /volume/usr/local/bin
        env:
        - name: "STORAGE_DRIVER"
          value: "vfs"
      volumes:
      - name: docker-sock
        emptyDir:
          medium: "Memory"
      - name: docker-bin
        emptyDir:
          medium: "Memory"

The way docker.sock and the docker binary are shared from the side-car container with the build agent container doesn't deserve any prizes, especially since the docker-in-docker container copies files from /usr/local/bin into a mounted directory. Is there a better way to do this?

I couldn't get overlay nor overlay2 to work with GCI though. Do you need to install the storage drivers in your container? I've based mine on the official Docker one, which can be found at https://github.com/docker-library/docker/blob/746d9052066ccfbcb98df7d9ae71cf05d8877419/1.12/dind/Dockerfile
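
One detail the manifest above leaves open is startup ordering: the agent and the docker-in-docker side-car start concurrently, so a build could reach the shared socket before the daemon is listening. A minimal sketch of a wait loop follows. It assumes the docker client is already on the agent's PATH, and wait_for is a hypothetical helper name rather than anything from the images used here.

```shell
#!/bin/sh
# wait_for MAX_TRIES CMD...: run CMD up to MAX_TRIES times, sleeping one
# second after each failed attempt. Returns 0 as soon as CMD succeeds,
# or 1 if every attempt failed.
wait_for() {
    tries="$1"
    shift
    while [ "$tries" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Hypothetical usage in the agent's entrypoint, before accepting builds:
#   wait_for 30 docker info || { echo "docker daemon not reachable" >&2; exit 1; }
```

The retry count of 30 is an arbitrary illustration, not a measured value for this cluster.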

@JorritSalverda
Author

JorritSalverda commented Sep 30, 2016

Docker info inside the docker-in-docker container looks like

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 1.12.1
Storage Driver: vfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host bridge overlay null
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.14+
Operating System: Alpine Linux v3.4 (containerized)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 25.47 GiB
Name: agent-golang-189385254-4jr1m
ID: KN6M:Z2YI:VMVP:FY7C:G67J:QZ2O:CHJO:2574:2LW7:T4QX:KKLJ:ZCKL
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
 127.0.0.0/8

Whereas on the GCI host vm it's:

Containers: 41
 Running: 41
 Paused: 0
 Stopped: 0
Images: 22
Server Version: 1.11.2
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge null host
Kernel Version: 4.4.14+
Operating System: Google Container-VM Image
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 25.47 GiB
Name: gke-tooling-gci-pool-ff8b0f32-6bmr
ID: 22X3:OLYJ:U3QQ:W5GV:HNWY:7TSS:HNIX:KMZU:FAXF:RT6N:OZMJ:XNT5
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support

@vishh
Contributor

vishh commented Sep 30, 2016

cc @Amey-D on dind with overlay storage driver not working.

@adityakali
Contributor

@vishh why do you think this is related to storage driver? I was able to follow almost all the instructions from https://hub.docker.com/_/docker/ successfully, including the one with overlay storage driver:

$ docker run --privileged --name some-overlay-docker -d docker:dind --storage-driver=overlay
...
$ docker run -it --rm --link some-overlay-docker:docker docker:1.12-rc version
Client:
 Version:      1.12.0-rc5
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   a3f2063
 Built:        Tue Jul 26 13:12:18 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 17:52:38 2016
 OS/Arch:      linux/amd64
$  docker run -it --rm --link some-overlay-docker:docker docker:1.12-rc info   
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 1.12.1
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host bridge overlay null
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.21+
Operating System: Alpine Linux v3.4 (containerized)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.298 GiB
Name: ea2245b019f9
ID: 7UF7:JSEP:Z2OA:AC2Z:2Y7J:AUZ6:52SM:SAL6:B2U6:CBSD:POLG:HTD6
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
 127.0.0.0/8

I say almost because only the following command failed:

$ docker run --rm --link some-overlay-docker:docker docker:git build https://github.com/docker-library/hello-world.git
unable to prepare context: unable to evaluate symlinks in Dockerfile path: lstat /tmp/docker-build-git588814834/Dockerfile: no such file or directory

which seems to fail on the older container-vm as well (it looks like a broken example).

Looking at the original error, this may have something to do with the network configuration on the k8s node.

@vishh
Contributor

vishh commented Sep 30, 2016

@adityakali

My comment was based on @JorritSalverda's previous comment #33745 (comment)

I couldn't get overlay nor overlay2 to work with GCI though. Do you need to install the storage drivers in your container

@Amey-D
Contributor

Amey-D commented Oct 4, 2016

@JorritSalverda Could you please elaborate on "I couldn't get overlay nor overlay2 to work with GCI"? As adityakali pointed out, Docker-in-Docker works on GCI with the overlayfs storage driver. Also note that GCI ships with Docker 1.11.2, whereas your manifest appears to be based on 1.12. I'm not sure if that's the problem though.

@JorritSalverda
Author

I managed to get overlay fs to work in the Docker in Docker image. I use the following in the manifest to get this to work:

      - name: docker-in-docker
        image: travix/dind:1.12
        command: ["/usr/local/bin/dockerd-entrypoint.sh"]
        args: ["--storage-driver=overlay","--group=dockremap"]

I've tested it with a number of builds and everything seems to be running fine, so I'll close this issue.
