Suddenly getting very slow builds on inner container with Sysbox v0.4.1 #435

Closed · TomKeyte opened this issue Nov 17, 2021 · 29 comments
Labels: bug (Something isn't working)

@TomKeyte

Our company runs our GitHub Actions CI pipelines on a Docker-in-Docker setup using Sysbox.

The setup is:

  • Ubuntu 20.04.3 LTS
  • Docker 20.10.10
  • sysbox 0.4.1 (latest)

The GitHub Actions runners run inside a container on the Sysbox runtime.

All of a sudden, on Monday, our Docker builds, which usually take ~5 minutes, stopped completing even after ~2 hours.

If I exec into the action-runner container and run the builds manually, I can see that the "transferring context" step in particular takes an eternity. Here's a portion of the output:

#14 transferring context: 40.21MB 186.6s
#14 transferring context: 41.55MB 191.7s
#14 transferring context: 43.38MB 196.8s
#14 transferring context: 45.16MB 201.8s
#14 transferring context: 46.61MB 206.9s
#14 transferring context: 48.03MB 212.0s
#14 transferring context: 49.46MB 217.1s
#14 transferring context: 51.19MB 222.2s
#14 transferring context: 52.90MB 227.3s
#14 transferring context: 54.62MB 232.4s
#14 transferring context: 56.08MB 237.4s
#14 transferring context: 57.44MB 242.4s
#14 transferring context: 58.66MB 247.5s
#14 transferring context: 59.10MB 252.6s
#14 transferring context: 59.58MB 257.7s
#14 transferring context: 60.12MB 262.7s
#14 transferring context: 60.68MB 267.8s
#14 transferring context: 61.26MB 272.9s

The total build context is ~350MB.

So far I have:

  • Updated all system packages
  • Purged (via apt purge and removing /var/lib/docker) and reinstalled both docker and sysbox
  • Checked that docker builds on the host are unaffected (even tried mapping the runner directory to the host and running the app build there; it takes ~5 mins as expected)
  • Downgraded to the previous kernel
  • Completely destroyed the original box and reinstalled everything on a new one

However, the problem persists.

The server does have unattended upgrades enabled, and between last week (when things were fine) and this week (when things went wrong), I can see that the following packages were upgraded on the host:

xxd (2:8.1.2269-1ubuntu5.4) over (2:8.1.2269-1ubuntu5.3)
openssl 1.1.1f-1ubuntu2.9
libssl1.1:amd64 (1.1.1f-1ubuntu2.9) over (1.1.1f-1ubuntu2.8)
libssl1.1:amd64 (1.1.1f-1ubuntu2.9) over (1.1.1f-1ubuntu2.8)
libtdb1:amd64 (1.4.3-0ubuntu0.20.04.1) over (1.4.2-3build1)
ufw (0.36-6ubuntu1) over (0.36-6)
python3-software-properties (0.99.9.8) over (0.98.9.5)

The docker-compose file used to start the runner is:

version: '3.9'

services:
  worker:
    image: our-org/gh-action-runner:latest
    env_file: .env
    runtime: sysbox-runc
    security_opt:
      - no-new-privileges:true
    restart: always
    tmpfs:
      - /tmp:size=512M

Any help debugging/fixing this greatly appreciated :)

@TomKeyte TomKeyte changed the title from "[bug] [0.4.1] Suddenly getting very slow builds on inner container" to "Suddenly getting very slow builds on inner container [bug] [0.4.1]" Nov 17, 2021
@rodnymolina rodnymolina added the bug Something isn't working label Nov 17, 2021
@rodnymolina rodnymolina added this to To do in Sysbox Dev via automation Nov 17, 2021
@rodnymolina rodnymolina self-assigned this Nov 17, 2021
@rodnymolina
Member

Hi @TomKeyte, thanks for your detailed description of the issue.

Question: is the problem reproduced when a particular workflow (e.g. an image build) is executed within the gha-runner? If so, can you provide an example of this CI-pipeline job so that I can reproduce it?

@rodnymolina
Member

Also, assuming that the instruction causing the delay is part of your CI pipeline (i.e. it's not a GHA-internal instruction, which is unlikely), could you please try to run it directly within the Sysbox container itself? For example, if you suspect an image-build step, you can run an equivalent action using the docker build command. The purpose here is to decouple GHA from Sysbox.

@rodnymolina
Member

Aside from my questions above, please provide this info too:

  • Obtain the sysbox-fs logs during problem reproduction: sudo journalctl -u sysbox-fs > sysbox-fs.logs
  • Indicate which GHA runner you are using within the sysbox container.

I think I should be able to reproduce the problem once I have all this info.

Thanks!

@TomKeyte
Author

Hi @rodnymolina

Appreciate you getting back so quickly :)

I have (temporarily) resolved our issue by downgrading from sysbox 0.4.1 to 0.4.0.

I'm afraid I can't really share the particular action being run, as it could expose company internals, but in summary it's just:

name: Run Tests
  
jobs:
  test:
    runs-on: [self-hosted, test-runner]
    steps:
    # SETUP
      - name: Checkout
        uses: actions/checkout@v2

      # ...Steps to configure the env

      - name: Build images
        run: >
          docker build ...

I can also confirm that if I exec into the inner container and manually run docker build ..., the problem is there on its own (i.e. without running in the context of GHA).

Tailing the sysbox-fs journal, I can see a lot of messages like

Nov 19 15:11:55 runner sysbox-fs[9055]: time="2021-11-19 15:11:55" level=warning msg="Unexpected error during NotifReceive() execution (no such file or directory) on fd 10 pid 3533071"

@rodnymolina
Member

rodnymolina commented Nov 21, 2021

@TomKeyte, thanks, it's good to know that the issue is not GHA-specific.

Unfortunately, the log message above is too generic and doesn't really help us root-cause the problem. That's why I was wondering if you could help us by trying to reproduce with a stripped-down Dockerfile (without company-specific instructions/tools), so that we can expedite root-cause analysis.

Also, please be aware that we are currently making important optimizations in the code to address issues like this one, but we won't know for sure whether they fix your setup until we are able to reproduce it locally.

@ctalledo
Member

ctalledo commented Mar 6, 2022

I am pretty sure the underlying cause of this is Sysbox's interception of the *xattr syscalls inside the container, which was added in v0.4.1. Intercepting those syscalls was required to fix another problem, but it unfortunately ended up affecting performance significantly in some cases, such as in this issue or as reported in issue 484.

In the upcoming v0.5.0 release, it will be possible to disable Sysbox's interception of the *xattr syscalls by passing the --allow-trusted-xattr=false flag on the Sysbox command line, or by passing the SYSBOX_ALLOW_TRUSTED_XATTR=FALSE env var to the container (e.g., docker run --runtime=sysbox-runc -e SYSBOX_ALLOW_TRUSTED_XATTR=FALSE <container-image>). Doing this, however, can break some docker pulls inside the Sysbox container, unless the host's kernel supports the user.overlay.opaque xattr on overlayfs (i.e., kernel >= 5.11) and the version of Docker inside the Sysbox container is >= 20.10.9.
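
For reference, a minimal sketch of how one might verify those prerequisites before disabling the interception (the runner image is the one from the compose file earlier in this thread; the exact commands are illustrative only):

uname -r                                  # host kernel; needs to be >= 5.11 for user.overlay.opaque
docker run --runtime=sysbox-runc \
  -e SYSBOX_ALLOW_TRUSTED_XATTR=FALSE \
  our-org/gh-action-runner:latest
# then, inside the container, check that the inner Docker is >= 20.10.9:
docker version --format '{{.Server.Version}}'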

@ctalledo ctalledo moved this from To do to Done in Sysbox Dev Mar 23, 2022
@ctalledo ctalledo changed the title from "Suddenly getting very slow builds on inner container [bug] [0.4.1]" to "Suddenly getting very slow builds on inner container with Sysbox v0.4.1" Mar 23, 2022
@rodnymolina
Member

@TomKeyte, could you please check the v0.5.0 release when you have a chance? The optimizations made in this release, along with the knobs mentioned by @ctalledo in the previous comment, should considerably speed up your build process.

@nudgegoonies

nudgegoonies commented Apr 4, 2022

As you know, we had problems with 0.4.1, so we updated directly from 0.4.0 to 0.5.0.

Since the update we have seen an enormous increase in build times in our GitLab infrastructure.

I tested with a gitlab-ci job that builds all of our team's Docker images on the same server (bare metal):

  • docker 20.10.14 - sysbox 0.4.0 - docker-dind 20.10.7 = 55 minutes
  • docker 20.10.14 - sysbox 0.5.0 - docker-dind 20.10.7 = 85 minutes
  • docker 20.10.14 - sysbox 0.5.0 - docker-dind 20.10.14 = reached the 120-minute timeout; the build had been hanging for 30 minutes during a docker build at a labelling step

We run on Debian 11 with the backports kernel 5.16.12-1 and userns-remap set to sysbox.

The sysbox-fs log size increased heavily, with ca. 40,000 lines of this kind today:

Unexpected error during NotifReceive() execution (no such file or directory) on fd 86, pid 2958884, cntr 19193dfb3dd2
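
(For anyone wanting to count these on their own box, something along these lines works; the filter string is simply the message above:)

journalctl -u sysbox-fs --since today | grep -c 'Unexpected error during NotifReceive()'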

With sysbox 0.4.0 and docker-dind 20.10.7 we saw these lines only ca. 5 times per day.

It seems that building several Docker images sequentially in the same (temporary) private dind can trigger this behavior: every new docker build is slower than the one before. We don't see that heavy increase in typical software builds that use a (temporary) private dind only once, to build exactly one image. We are staying on sysbox 0.5.0 but have reverted the dind back to 20.10.7.

@ctalledo
Member

ctalledo commented Apr 4, 2022

Hi @nudgegoonies, thanks for the update.

For sysbox v0.5.0, did you try passing the -e SYSBOX_ALLOW_TRUSTED_XATTR=FALSE env var to the container?

See this sysbox doc for more details.

@ctalledo
Member

ctalledo commented Apr 5, 2022

Apart from Sysbox's interception of the *xattr syscalls (described in this prior comment), we've also noticed another factor that contributes to inner Docker builds running slower inside a Sysbox container: the inner Docker engine does not use "native overlay diffs" when creating the build.

This can be seen in the following docker engine log inside the Sysbox container:

level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: running in a user namespace" storage-driver=overlay2 

The relevant source code is in the Moby repo, here.

Basically, dockerd assumes that when it's running inside a user namespace (as in Sysbox containers), it can't use the native diff for overlay2, because it assumes it can't set the trusted.overlay.opaque xattr. This is generally true inside a vanilla user namespace, but it's not true inside a Sysbox container, because Sysbox intercepts the xattr syscalls and allows that xattr to be set from inside the container. IMHO, dockerd should first check this inside the user-ns rather than just assume it won't work. This requires a change in dockerd itself (which I plan to work on when I get some cycles).
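
As a quick illustration, here's the kind of probe dockerd could perform instead of assuming the xattr is unavailable (a sketch to be run inside the Sysbox container; setfattr/getfattr come from the attr package, and the path is just an example):

mkdir -p /tmp/xattr-test/d
# overlayfs marks opaque dirs with this xattr; under Sysbox the call
# succeeds because the *xattr syscalls are intercepted and emulated
setfattr -n trusted.overlay.opaque -v y /tmp/xattr-test/d
getfattr -n trusted.overlay.opaque /tmp/xattr-test/d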

The slowdown is significant: I noticed up to 6x in a quick experiment comparing an inner dockerd modified to use native overlay2 diffs even when running inside a user namespace against an unmodified inner dockerd.

@nudgegoonies

@ctalledo Thank you very much! With that option it works! Now sysbox, docker, and docker-dind are all up to date, and it is even faster than with sysbox 0.4.0:
docker 20.10.14 - sysbox 0.5.0 - docker-dind 20.10.14 = 35 minutes
We cannot easily pass the variable in gitlab-ci, so I used a systemd override for sysbox-mgr instead.
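
For reference, a rough sketch of that kind of systemd drop-in (the Sysbox docs describe --allow-trusted-xattr as a sysbox-fs flag, so the unit name and ExecStart path below are illustrative and may differ per install):

sudo systemctl edit sysbox-fs.service
# contents of the drop-in file:
#   [Service]
#   ExecStart=
#   ExecStart=/usr/bin/sysbox-fs --allow-trusted-xattr=false
sudo systemctl daemon-reload
sudo systemctl restart sysbox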

@ctalledo
Member

ctalledo commented Apr 5, 2022

@ctalledo Thank you very much! With that option it works! Now sysbox, docker, and docker-dind are all up to date, and it is even faster than with sysbox 0.4.0: docker 20.10.14 - sysbox 0.5.0 - docker-dind 20.10.14 = 35 minutes.

Hi @nudgegoonies, that's great to hear! Yes, Sysbox v0.5.0 has some performance optimizations that make it faster than v0.4.0; glad you are seeing those.

We cannot easily pass the variable in gitlab-ci, so I used a systemd override for sysbox-mgr instead.

That's interesting; I would have assumed that passing env-vars to containers would be easy in all CI systems, given that it's a fairly common operation.

@ctalledo
Member

ctalledo commented Apr 5, 2022

Closing this issue since we now have a solution for the slowdown in Sysbox v0.5.0. Please re-open if you see any such issues with v0.5.0.

@ctalledo ctalledo closed this as completed Apr 5, 2022
@nudgegoonies

That's interesting; I would have assumed that passing env-vars to containers would be easy in all CI systems, given that it's a fairly common operation.

Yes, in theory. But there is a private dind configured via the GitLab runner that runs with every build, dinds configured as services in pipelines, shared dinds started by systemd, etc. Using the one place in sysbox-mgr is easier.

@mike-chen-samsung

mike-chen-samsung commented Jun 1, 2023

What's the preferred way to configure --allow-trusted-xattr=false for the k8s daemonset? The directions in the readme, when applied to the files in the daemonset pod, say to modify /opt/sysbox/systemd/sysbox-fs.service, but ideally there would be a way to add these flags from the k8s daemonset rather than mounting a custom config file into the pod as a volume to replace /opt/sysbox/systemd/sysbox-fs.service.

@mike-chen-samsung

I'm finding that docker builds are still about 2x slower with -e SYSBOX_ALLOW_TRUSTED_XATTR=FALSE (though faster than the ~6x slowdown without that setting). Is something like this expected or not?

@ctalledo
Member

ctalledo commented Jun 2, 2023

What's the preferred way to configure --allow-trusted-xattr=false for the k8s daemonset? The directions in the readme, when applied to the files in the daemonset pod, say to modify /opt/sysbox/systemd/sysbox-fs.service, but ideally there would be a way to add these flags from the k8s daemonset rather than mounting a custom config file into the pod as a volume to replace /opt/sysbox/systemd/sysbox-fs.service.

Hi @mike-chen-samsung, yes we need to add a ConfigMap or similar to allow users to easily configure Sysbox flags via the sysbox-deploy-k8s daemonset. We've not yet had cycles to do this unfortunately.

Having said that, I think Sysbox should switch the default of --allow-trusted-xattr from true to false, given that the base requirements (kernel >= 5.11 and Docker >= 20.10.9) are fairly common now. I'll discuss it and possibly add it in the upcoming v0.6.2 release.

@ctalledo
Member

ctalledo commented Jun 2, 2023

I'm finding that docker builds are still about 2x slower with -e SYSBOX_ALLOW_TRUSTED_XATTR=FALSE (though faster than the ~6x slowdown without that setting). Is something like this expected or not?

It's been a while since I measured, but I don't recall seeing 2x slower.

A couple of factors could be contributing: Docker not using native overlay diffs, and the fact that the Docker Engine inside a Sysbox container goes through an extra network bridge. I suspect the former is the main issue though.
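
A quick way to check the first factor is to look for the same warning quoted earlier in this thread (adjust the command if the inner dockerd doesn't run under systemd):

# inside the Sysbox container:
journalctl -u docker | grep 'Not using native diff for overlay2'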

@mike-chen-samsung

A couple of factors that could be contributing could be Docker not using native overlay

I checked docker info and you are right, it's using vfs instead of overlay2. But based on your comment it sounds like there isn't a "friendly" solution for this? I'd rather not deploy a custom version of dockerd.

@ctalledo
Member

ctalledo commented Jun 2, 2023

I checked docker info and you are right, it's using vfs instead of overlay2.

Can you show me the log from the docker daemon inside the Sysbox container, so I can double-check? It should be using its overlay2 driver in general, except for native overlay diffs.

@mike-chen-samsung

mike-chen-samsung commented Jun 2, 2023

Sorry, I should have mentioned that I am using GitHub ARC with Sysbox, so it's running a dind setup. I've pasted a cleaned-up Pod yaml at the bottom.

$ kubectl logs <podname> -c docker
Certificate request self-signature ok
subject=CN = docker:dind server
/certs/server/cert.pem: OK
Certificate request self-signature ok
subject=CN = docker:dind client
/certs/client/cert.pem: OK
mount: permission denied (are you root?)
Could not mount /sys/kernel/security.
AppArmor detection and --privileged mode might break.
time="2023-06-02T18:17:30.657613550Z" level=info msg="Starting up"
time="2023-06-02T18:17:30.719159637Z" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
time="2023-06-02T18:17:30.719349646Z" level=info msg="containerd not running, starting managed containerd"
time="2023-06-02T18:17:30.720213807Z" level=info msg="started new containerd process" address=/var/run/docker/containerd/containerd.sock module=libcontainerd pid=110
time="2023-06-02T18:17:30.744050970Z" level=info msg="starting containerd" revision=1677a17964311325ed1c31e2c0a3589ce6d5c30d version=v1.7.1
time="2023-06-02T18:17:30.794274645Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.aufs\"..." type=io.containerd.snapshotter.v1
time="2023-06-02T18:17:30.855435085Z" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." type=io.containerd.content.v1
time="2023-06-02T18:17:30.855576638Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.native\"..." type=io.containerd.snapshotter.v1
time="2023-06-02T18:17:30.855677649Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." type=io.containerd.snapshotter.v1
time="2023-06-02T18:17:30.855935210Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.devmapper\"..." type=io.containerd.snapshotter.v1
time="2023-06-02T18:17:30.855986906Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.devmapper" error="devmapper not configured"
time="2023-06-02T18:17:30.856014239Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.zfs\"..." type=io.containerd.snapshotter.v1
time="2023-06-02T18:17:30.856341543Z" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.zfs\"..." error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
time="2023-06-02T18:17:30.856390914Z" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
time="2023-06-02T18:17:30.856467141Z" level=warning msg="could not use snapshotter devmapper in metadata plugin" error="devmapper not configured"
time="2023-06-02T18:17:30.856616741Z" level=info msg="metadata content store policy set" policy=shared
time="2023-06-02T18:17:30.863518392Z" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." type=io.containerd.differ.v1
time="2023-06-02T18:17:30.863615463Z" level=info msg="loading plugin \"io.containerd.event.v1.exchange\"..." type=io.containerd.event.v1
time="2023-06-02T18:17:30.863658515Z" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." type=io.containerd.gc.v1
time="2023-06-02T18:17:30.863707704Z" level=info msg="loading plugin \"io.containerd.lease.v1.manager\"..." type=io.containerd.lease.v1
time="2023-06-02T18:17:30.863768368Z" level=info msg="loading plugin \"io.containerd.nri.v1.nri\"..." type=io.containerd.nri.v1
time="2023-06-02T18:17:30.863797030Z" level=info msg="NRI interface is disabled by configuration."
time="2023-06-02T18:17:30.863821579Z" level=info msg="loading plugin \"io.containerd.runtime.v2.task\"..." type=io.containerd.runtime.v2
time="2023-06-02T18:17:30.863999942Z" level=info msg="loading plugin \"io.containerd.runtime.v2.shim\"..." type=io.containerd.runtime.v2
time="2023-06-02T18:17:30.864035397Z" level=info msg="loading plugin \"io.containerd.sandbox.store.v1.local\"..." type=io.containerd.sandbox.store.v1
time="2023-06-02T18:17:30.864062410Z" level=info msg="loading plugin \"io.containerd.sandbox.controller.v1.local\"..." type=io.containerd.sandbox.controller.v1
time="2023-06-02T18:17:30.864099045Z" level=info msg="loading plugin \"io.containerd.streaming.v1.manager\"..." type=io.containerd.streaming.v1
time="2023-06-02T18:17:30.864127674Z" level=info msg="loading plugin \"io.containerd.service.v1.introspection-service\"..." type=io.containerd.service.v1
time="2023-06-02T18:17:30.864154902Z" level=info msg="loading plugin \"io.containerd.service.v1.containers-service\"..." type=io.containerd.service.v1
time="2023-06-02T18:17:30.864181284Z" level=info msg="loading plugin \"io.containerd.service.v1.content-service\"..." type=io.containerd.service.v1
time="2023-06-02T18:17:30.864206131Z" level=info msg="loading plugin \"io.containerd.service.v1.diff-service\"..." type=io.containerd.service.v1
time="2023-06-02T18:17:30.864232443Z" level=info msg="loading plugin \"io.containerd.service.v1.images-service\"..." type=io.containerd.service.v1
time="2023-06-02T18:17:30.864259134Z" level=info msg="loading plugin \"io.containerd.service.v1.namespaces-service\"..." type=io.containerd.service.v1
time="2023-06-02T18:17:30.864284818Z" level=info msg="loading plugin \"io.containerd.service.v1.snapshots-service\"..." type=io.containerd.service.v1
time="2023-06-02T18:17:30.864341894Z" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." type=io.containerd.runtime.v1
time="2023-06-02T18:17:30.864456174Z" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." type=io.containerd.monitor.v1
time="2023-06-02T18:17:30.864859222Z" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1
time="2023-06-02T18:17:30.864913865Z" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
time="2023-06-02T18:17:30.864947521Z" level=info msg="loading plugin \"io.containerd.transfer.v1.local\"..." type=io.containerd.transfer.v1
time="2023-06-02T18:17:30.864986804Z" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
time="2023-06-02T18:17:30.865060006Z" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." type=io.containerd.grpc.v1
time="2023-06-02T18:17:30.865088646Z" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1
time="2023-06-02T18:17:30.865114812Z" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1
time="2023-06-02T18:17:30.865141846Z" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1
time="2023-06-02T18:17:30.865169084Z" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
time="2023-06-02T18:17:30.865259648Z" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1
time="2023-06-02T18:17:30.865285966Z" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
time="2023-06-02T18:17:30.865312338Z" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
time="2023-06-02T18:17:30.865374079Z" level=info msg="loading plugin \"io.containerd.internal.v1.opt\"..." type=io.containerd.internal.v1
time="2023-06-02T18:17:30.865568987Z" level=info msg="loading plugin \"io.containerd.grpc.v1.sandbox-controllers\"..." type=io.containerd.grpc.v1
time="2023-06-02T18:17:30.865602457Z" level=info msg="loading plugin \"io.containerd.grpc.v1.sandboxes\"..." type=io.containerd.grpc.v1
time="2023-06-02T18:17:30.865630296Z" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
time="2023-06-02T18:17:30.865655107Z" level=info msg="loading plugin \"io.containerd.grpc.v1.streaming\"..." type=io.containerd.grpc.v1
time="2023-06-02T18:17:30.865682063Z" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
time="2023-06-02T18:17:30.865710912Z" level=info msg="loading plugin \"io.containerd.grpc.v1.transfer\"..." type=io.containerd.grpc.v1
time="2023-06-02T18:17:30.865736657Z" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
time="2023-06-02T18:17:30.865768691Z" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1
time="2023-06-02T18:17:30.865800564Z" level=info msg="skip loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." error="no OpenTelemetry endpoint: skip plugin" type=io.containerd.tracing.processor.v1
time="2023-06-02T18:17:30.865825299Z" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1
time="2023-06-02T18:17:30.865849598Z" level=info msg="skipping tracing processor initialization (no tracing plugin)" error="no OpenTelemetry endpoint: skip plugin"
time="2023-06-02T18:17:30.866853891Z" level=info msg=serving... address=/var/run/docker/containerd/containerd-debug.sock
time="2023-06-02T18:17:30.867026415Z" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock.ttrpc
time="2023-06-02T18:17:30.867111033Z" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock
time="2023-06-02T18:17:30.867138860Z" level=info msg="containerd successfully booted in 0.124787s"
time="2023-06-02T18:17:31.065177288Z" level=error msg="failed to mount overlay: invalid argument" storage-driver=overlay2
time="2023-06-02T18:17:31.065422316Z" level=error msg="exec: \"fuse-overlayfs\": executable file not found in $PATH" storage-driver=fuse-overlayfs
time="2023-06-02T18:17:31.069423139Z" level=info msg="Loading containers: start."
time="2023-06-02T18:17:31.221226791Z" level=warning msg="Running modprobe bridge br_netfilter failed with message: ip: can't find device 'bridge'\nbridge                307200  0 \nstp                    16384  1 bridge\nllc                    16384  2 bridge,stp\nip: can't find device 'br_netfilter'\nmodprobe: can't load module br_netfilter (kernel/net/bridge/br_netfilter.ko): Operation not permitted\n, error: exit status 1"
time="2023-06-02T18:17:32.022341965Z" level=info msg="Loading containers: done."
time="2023-06-02T18:17:32.088800459Z" level=warning msg="WARNING: bridge-nf-call-iptables is disabled"
time="2023-06-02T18:17:32.088824303Z" level=warning msg="WARNING: bridge-nf-call-ip6tables is disabled"
time="2023-06-02T18:17:32.088859083Z" level=info msg="Docker daemon" commit=659604f graphdriver=vfs version=24.0.2
time="2023-06-02T18:17:32.089004866Z" level=info msg="Daemon has completed initialization"
time="2023-06-02T18:17:32.464562515Z" level=info msg="API listen on [::]:2376"
time="2023-06-02T18:17:32.464646930Z" level=info msg="API listen on /var/run/docker.sock"
$ kubectl exec -it <podname> -c docker docker info
Client:
 Version:    24.0.2
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.5
    Path:     /usr/local/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.18.1
    Path:     /usr/local/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 24.0.2
 Storage Driver: vfs
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 1677a17964311325ed1c31e2c0a3589ce6d5c30d
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 5.15.0-1037-aws
 Operating System: Alpine Linux v3.18
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 7.664GiB
 Name: dev-usw2-large-bcjfr-bzvxx
 ID: 7d0dceb6-93a5-4de3-98e0-43deec1e09ce
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

Pod yaml

apiVersion: v1
kind: Pod
metadata:
  annotations:
    io.kubernetes.cri-o.userns-mode: auto:size=65536
spec:
  containers:
  - env:
    - name: DOCKER_HOST
      value: tcp://localhost:2376
    name: runner
  - env:
    - name: SYSBOX_ALLOW_TRUSTED_XATTR
      value: "FALSE"
    - name: DOCKER_TLS_CERTDIR
      value: /certs
    image: docker:dind
    name: docker
    securityContext:
      privileged: false
    volumeMounts:
    - mountPath: /runner
      name: runner
    - mountPath: /certs/client
      name: certs-client
    - mountPath: /runner/_work
      name: work
  runtimeClassName: sysbox-runc
  volumes:
  - emptyDir: {}
    name: runner
  - emptyDir: {}
    name: work
  - emptyDir: {}
    name: certs-client

@ctalledo
Member

ctalledo commented Jun 2, 2023

Thanks; the dockerd logs inside the Sysbox container don't look good: it should have used its overlayfs driver (as it would on a regular host or VM).

Can you post the output of findmnt or mount inside the pod, please?

@mike-chen-samsung

# echo "$(findmnt)"
TARGET                                                       SOURCE                                                                                                                                FSTYPE   OPTIONS
/                                                            overlay                                                                                                                               overlay  rw,relatime,lowerdir=/var/lib/containers/storage/overlay/l/SOUG6R3XFCIZEGF3C5ZAYLVV7Y:/var/lib/containers/storage/overlay/l/SOUG6R3XFCIZEGF3C5ZAYLVV7Y/../diff1:/var/lib/containers/storage/overlay/l/XS23ZRGJSLEOOCFBWFTSXFTKMT:/var/lib/containers/storage/overlay/l/5DMM6S5AMMNPMB6CEAWC5LSFZN:/var/lib/containers/storage/overlay/l/P2W6UPRTE3M567U4KTK5F5SYDX:/var/lib/containers/storage/overlay/l/SBUW7VEWTPHRNAPNNYENWBKEFB:/var/lib/containers/storage/overlay/l/ALJ4L2U4PLETXFV3EUU5O2GR7Y:/var/lib/containers/storage/overlay/l/NCM6L2GOKIZWO4PUVZNURTY3EV:/var/lib/containers/storage/overlay/l/PECYZTKMXZSTIL5L25WNGS5VPQ:/var/lib/containers/storage/overlay/l/L6S57MIY66XU3LRJYZSZRTVZW2:/var/lib/containers/storage/overlay/l/672J2TBM3NQ7YNKDEO4WFFFD3T:/var/lib/containers/storage/overlay/l/6K5MLPDMMWKHXZM7GUHVRXATKK:/var/lib/containers/storage/overlay/l/QRC3OTQQBBS66ITDS6LQ4NOG6E:/var/lib/containers/storage/overlay/l/EAUONRZKLVQDYUXN7WCFVXQSPF:/var/lib/containers/storage/overlay/l/SB3GAXHC24GHSGZOQDPTLUREDO:/var/lib/containers/storage/overlay/l/AHHBJYTTVKO5IMF34AI2SGWS2W,upperdir=/var/lib/containers/storage/overlay/606a9ceb32fafce9c2ea945221823a66b9adb3101e00802b16a3eef4fb45647a/diff,workdir=/var/lib/containers/storage/overlay/606a9ceb32fafce9c2ea945221823a66b9adb3101e00802b16a3eef4fb45647a/work,metacopy=on,volatile
├─/tmp                                                       none                                                                                                                                  tmpfs    rw,relatime,uid=165536,gid=165536,inode64
├─/sys                                                       sysfs                                                                                                                                 sysfs    rw,nosuid,nodev,noexec,relatime
│ ├─/sys/firmware                                            tmpfs                                                                                                                                 tmpfs    ro,relatime,uid=165536,gid=165536,inode64
│ ├─/sys/fs/cgroup                                           tmpfs                                                                                                                                 tmpfs    rw,nosuid,nodev,noexec,relatime,mode=755,uid=165536,gid=165536,inode64
│ │ ├─/sys/fs/cgroup/systemd                                 systemd                                                                                                                               cgroup   rw,nosuid,nodev,noexec,relatime,xattr,name=systemd
│ │ ├─/sys/fs/cgroup/cpu,cpuacct                             cgroup                                                                                                                                cgroup   rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
│ │ ├─/sys/fs/cgroup/hugetlb                                 cgroup                                                                                                                                cgroup   rw,nosuid,nodev,noexec,relatime,hugetlb
│ │ ├─/sys/fs/cgroup/freezer                                 cgroup                                                                                                                                cgroup   rw,nosuid,nodev,noexec,relatime,freezer
│ │ ├─/sys/fs/cgroup/net_cls,net_prio                        cgroup                                                                                                                                cgroup   rw,nosuid,nodev,noexec,relatime,net_cls,net_prio
│ │ ├─/sys/fs/cgroup/rdma                                    cgroup                                                                                                                                cgroup   rw,nosuid,nodev,noexec,relatime,rdma
│ │ ├─/sys/fs/cgroup/devices                                 cgroup                                                                                                                                cgroup   rw,nosuid,nodev,noexec,relatime,devices
│ │ ├─/sys/fs/cgroup/memory                                  cgroup                                                                                                                                cgroup   rw,nosuid,nodev,noexec,relatime,memory
│ │ ├─/sys/fs/cgroup/perf_event                              cgroup                                                                                                                                cgroup   rw,nosuid,nodev,noexec,relatime,perf_event
│ │ ├─/sys/fs/cgroup/pids                                    cgroup                                                                                                                                cgroup   rw,nosuid,nodev,noexec,relatime,pids
│ │ ├─/sys/fs/cgroup/cpuset                                  cgroup                                                                                                                                cgroup   rw,nosuid,nodev,noexec,relatime,cpuset
│ │ ├─/sys/fs/cgroup/misc                                    cgroup                                                                                                                                cgroup   rw,nosuid,nodev,noexec,relatime,misc
│ │ └─/sys/fs/cgroup/blkio                                   cgroup                                                                                                                                cgroup   rw,nosuid,nodev,noexec,relatime,blkio
│ ├─/sys/devices/virtual                                     sysboxfs[/sys/devices/virtual]                                                                                                        fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
│ ├─/sys/kernel                                              sysboxfs[/sys/kernel]                                                                                                                 fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
│ └─/sys/module/nf_conntrack/parameters                      sysboxfs[/sys/module/nf_conntrack/parameters]                                                                                         fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
├─/proc                                                      proc                                                                                                                                  proc     rw,nosuid,nodev,noexec,relatime
│ ├─/proc/bus                                                proc[/bus]                                                                                                                            proc     ro,nosuid,nodev,noexec,relatime
│ ├─/proc/fs                                                 proc[/fs]                                                                                                                             proc     ro,nosuid,nodev,noexec,relatime
│ ├─/proc/irq                                                proc[/irq]                                                                                                                            proc     ro,nosuid,nodev,noexec,relatime
│ ├─/proc/sysrq-trigger                                      proc[/sysrq-trigger]                                                                                                                  proc     ro,nosuid,nodev,noexec,relatime
│ ├─/proc/acpi                                               tmpfs                                                                                                                                 tmpfs    ro,relatime,uid=165536,gid=165536,inode64
│ ├─/proc/keys                                               devtmpfs[/null]                                                                                                                       devtmpfs rw,relatime,size=4010404k,nr_inodes=1002601,mode=755,inode64
│ ├─/proc/timer_list                                         devtmpfs[/null]                                                                                                                       devtmpfs rw,relatime,size=4010404k,nr_inodes=1002601,mode=755,inode64
│ ├─/proc/scsi                                               tmpfs                                                                                                                                 tmpfs    ro,relatime,uid=165536,gid=165536,inode64
│ ├─/proc/swaps                                              sysboxfs[/proc/swaps]                                                                                                                 fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
│ ├─/proc/sys                                                sysboxfs[/proc/sys]                                                                                                                   fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
│ └─/proc/uptime                                             sysboxfs[/proc/uptime]                                                                                                                fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
├─/dev                                                       tmpfs                                                                                                                                 tmpfs    rw,nosuid,size=65536k,mode=755,uid=165536,gid=165536,inode64
│ ├─/dev/mqueue                                              mqueue                                                                                                                                mqueue   rw,nosuid,nodev,noexec,relatime
│ ├─/dev/pts                                                 devpts                                                                                                                                devpts   rw,nosuid,noexec,relatime,gid=165541,mode=620,ptmxmode=666
│ ├─/dev/null                                                devtmpfs[/null]                                                                                                                       devtmpfs rw,relatime,size=4010404k,nr_inodes=1002601,mode=755,inode64
│ ├─/dev/random                                              devtmpfs[/random]                                                                                                                     devtmpfs rw,relatime,size=4010404k,nr_inodes=1002601,mode=755,inode64
│ ├─/dev/kmsg                                                devtmpfs[/null]                                                                                                                       devtmpfs rw,relatime,size=4010404k,nr_inodes=1002601,mode=755,inode64
│ ├─/dev/shm                                                 shm                                                                                                                                   tmpfs    rw,nosuid,nodev,noexec,relatime,size=65536k,inode64
│ ├─/dev/termination-log                                     /dev/root[/var/lib/kubelet/pods/50d53a32-30ff-4435-b50c-4913b6b32e16/containers/docker/b4aed478]                                      ext4     rw,relatime,idmapped,discard
│ ├─/dev/full                                                devtmpfs[/full]                                                                                                                       devtmpfs rw,relatime,size=4010404k,nr_inodes=1002601,mode=755,inode64
│ ├─/dev/tty                                                 devtmpfs[/tty]                                                                                                                        devtmpfs rw,relatime,size=4010404k,nr_inodes=1002601,mode=755,inode64
│ ├─/dev/zero                                                devtmpfs[/zero]                                                                                                                       devtmpfs rw,relatime,size=4010404k,nr_inodes=1002601,mode=755,inode64
│ └─/dev/urandom                                             devtmpfs[/urandom]                                                                                                                    devtmpfs rw,relatime,size=4010404k,nr_inodes=1002601,mode=755,inode64
├─/etc/resolv.conf                                           tmpfs[/containers/storage/overlay-containers/e462bf878937cea9be16bb50037c199df2f966119c69195d0974b058f65128ea/userdata/resolv.conf]   tmpfs    rw,nosuid,nodev,noexec,size=803632k,mode=755,inode64
├─/etc/hostname                                              tmpfs[/containers/storage/overlay-containers/e462bf878937cea9be16bb50037c199df2f966119c69195d0974b058f65128ea/userdata/hostname]      tmpfs    rw,nosuid,nodev,size=803632k,mode=755,inode64
├─/run/.containerenv                                         tmpfs[/containers/storage/overlay-containers/e462bf878937cea9be16bb50037c199df2f966119c69195d0974b058f65128ea/userdata/.containerenv] tmpfs    rw,nosuid,nodev,size=803632k,mode=755,inode64
├─/run/secrets/kubernetes.io/serviceaccount                  tmpfs                                                                                                                                 tmpfs    ro,relatime,size=7278556k,inode64
├─/runner                                                    /dev/root[/var/lib/kubelet/pods/50d53a32-30ff-4435-b50c-4913b6b32e16/volumes/kubernetes.io~empty-dir/runner]                          ext4     rw,relatime,idmapped,discard
│ └─/runner/_work                                            /dev/root[/var/lib/kubelet/pods/50d53a32-30ff-4435-b50c-4913b6b32e16/volumes/kubernetes.io~empty-dir/work]                            ext4     rw,relatime,idmapped,discard
├─/certs/client                                              /dev/root[/var/lib/kubelet/pods/50d53a32-30ff-4435-b50c-4913b6b32e16/volumes/kubernetes.io~empty-dir/certs-client]                    ext4     rw,relatime,idmapped,discard
├─/etc/hosts                                                 /dev/root[/var/lib/kubelet/pods/50d53a32-30ff-4435-b50c-4913b6b32e16/etc-hosts]                                                       ext4     rw,relatime,idmapped,discard
├─/var/lib/rancher/k3s                                       /dev/root[/var/lib/sysbox/rancher-k3s/ad932cd4e20668da260a0c8a393e493811db4ca241de5517a39d2687618b562c]                               ext4     rw,relatime,discard
├─/var/lib/rancher/rke2                                      /dev/root[/var/lib/sysbox/rancher-rke2/ad932cd4e20668da260a0c8a393e493811db4ca241de5517a39d2687618b562c]                              ext4     rw,relatime,discard
├─/var/lib/kubelet                                           /dev/root[/var/lib/sysbox/kubelet/ad932cd4e20668da260a0c8a393e493811db4ca241de5517a39d2687618b562c]                                   ext4     rw,relatime,discard
├─/var/lib/k0s                                               /dev/root[/var/lib/sysbox/k0s/ad932cd4e20668da260a0c8a393e493811db4ca241de5517a39d2687618b562c]                                       ext4     rw,relatime,discard
├─/var/lib/buildkit                                          /dev/root[/var/lib/sysbox/buildkit/ad932cd4e20668da260a0c8a393e493811db4ca241de5517a39d2687618b562c]                                  ext4     rw,relatime,discard
├─/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs /dev/root[/var/lib/sysbox/containerd/ad932cd4e20668da260a0c8a393e493811db4ca241de5517a39d2687618b562c]                                ext4     rw,relatime,discard
├─/var/lib/docker                                            /dev/root[/var/lib/sysbox/docker/ad932cd4e20668da260a0c8a393e493811db4ca241de5517a39d2687618b562c]                                    ext4     rw,relatime,idmapped,discard
├─/usr/src/linux-headers-5.15.0-1037-aws                     /dev/root[/usr/src/linux-headers-5.15.0-1037-aws]                                                                                     ext4     ro,relatime,idmapped,discard
├─/usr/src/linux-aws-5.15-headers-5.15.0-1037                /dev/root[/usr/src/linux-aws-5.15-headers-5.15.0-1037]                                                                                ext4     ro,relatime,idmapped,discard
└─/lib/modules/5.15.0-1037-aws                               /dev/root[/usr/lib/modules/5.15.0-1037-aws]                                                                                           ext4     ro,relatime,idmapped,discard

@ctalledo
Member

ctalledo commented Jun 3, 2023

Thanks @mike-chen-samsung; the output of findmnt looks good, and in particular this is the line I was interested in:

├─/var/lib/docker                                            /dev/root[/var/lib/sysbox/docker/ad932cd4e20668da260a0c8a393e493811db4ca241de5517a39d2687618b562c]                                    ext4     rw,relatime,idmapped,discard

This means that inside the container, /var/lib/docker is an ext4 mount set up by Sysbox, so that's good.
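
(For anyone checking their own setup, the equivalent one-liner inside the container should print ext4:)

findmnt -n -o FSTYPE /var/lib/docker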

I need to understand why the Docker engine inside the container is failing with:

time="2023-06-02T18:17:31.065177288Z" level=error msg="failed to mount overlay: invalid argument" storage-driver=overlay2
time="2023-06-02T18:17:31.065422316Z" level=error msg="exec: \"fuse-overlayfs\": executable file not found in $PATH" storage-driver=fuse-overlayfs

I noticed you are using docker engine v24.0.2; let me check if something changed in that version that is causing a problem when running inside a Sysbox container.

@ctalledo
Member

ctalledo commented Jun 3, 2023

I noticed you are using docker engine v24.0.2; let me check if something changed in that version that is causing a problem when running inside a Sysbox container.

I tried docker v24.0.2 inside a Sysbox container (created with Docker) and it works fine:

$ docker run --runtime=sysbox-runc -it --rm nestybox/ubuntu-jammy-systemd

<install Docker as described here: https://docs.docker.com/engine/install/ubuntu/>

/ # sudo systemctl start docker

/ # sudo docker version
Client: Docker Engine - Community
 Version:           24.0.2
 API version:       1.43
...

Server: Docker Engine - Community
 Engine:
  Version:          24.0.2
...

/ # sudo journalctl -u docker
Jun 03 13:44:25 a7f0930aa572 dockerd[9474]: time="2023-06-03T13:44:25.338687098Z" level=info msg="Starting up"
Jun 03 13:44:25 a7f0930aa572 dockerd[9474]: time="2023-06-03T13:44:25.432180730Z" level=info msg="Loading containers: start."
Jun 03 13:44:25 a7f0930aa572 dockerd[9474]: time="2023-06-03T13:44:25.626424482Z" level=info msg="Loading containers: done."
Jun 03 13:44:25 a7f0930aa572 dockerd[9474]: time="2023-06-03T13:44:25.651765460Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: running in a user namespace" storage-driver=overlay2
Jun 03 13:44:25 a7f0930aa572 dockerd[9474]: time="2023-06-03T13:44:25.651908293Z" level=info msg="Docker daemon" commit=659604f graphdriver=overlay2 version=24.0.2
Jun 03 13:44:25 a7f0930aa572 dockerd[9474]: time="2023-06-03T13:44:25.651996508Z" level=info msg="Daemon has completed initialization"
Jun 03 13:44:25 a7f0930aa572 dockerd[9474]: time="2023-06-03T13:44:25.763894246Z" level=info msg="API listen on /run/docker.sock"

/ # sudo docker run -it --rm alpine
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
8a49fdb3b6a5: Pull complete 
Digest: sha256:02bb6f428431fbc2809c5d1b41eab5a68350194fb508869a33cb1af4444c9b11
Status: Downloaded newer image for alpine:latest
/ #

Notice how docker engine reports it's using overlay2 as expected:

graphdriver=overlay2 version=24.0.2

@ctalledo
Member

ctalledo commented Jun 3, 2023

I also tried something much closer to what you have in your pod yaml, and it also works fine:

$ docker run --runtime=sysbox-runc -d --rm -e SYSBOX_ALLOW_TRUSTED_XATTR=FALSE docker:dind
$ docker logs <container-name>

...
time="2023-06-03T13:54:16.585314832Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: running in a user namespace" storage-driver=overlay2
time="2023-06-03T13:54:16.585416671Z" level=info msg="Docker daemon" commit=659604f graphdriver=overlay2 version=24.0.2
time="2023-06-03T13:54:16.585501361Z" level=info msg="Daemon has completed initialization"
...

@ctalledo
Member

ctalledo commented Jun 3, 2023

Hi @mike-chen-samsung, I was able to repro this in a GKE pod, using a pod spec similar to the one you posted above. I suspect it's something in the docker:dind image entrypoint. I'll debug it and get back to you.

Just as a sanity check, I also ran a pod with this spec, and it worked fine (i.e., docker used the overlay2 driver inside the pod).

apiVersion: v1
kind: Pod
metadata:
  name: ubu-bio-systemd-docker
  annotations:
    io.kubernetes.cri-o.userns-mode: "auto:size=65536"
spec:
  runtimeClassName: sysbox-runc
  containers:
  - name: ubu-bio-systemd-docker
    image: registry.nestybox.com/nestybox/ubuntu-bionic-systemd-docker
    command: ["/sbin/init"]
  restartPolicy: Never

@mike-chen-samsung

Hello @ctalledo, wondering if you have an update to share.

@ctalledo
Member

ctalledo commented Jul 2, 2023

Hi @mike-chen-samsung, apologies for the belated response. I've not yet had a chance to look into this, will allocate some cycles this coming week to get to the bottom of it. Thanks for your patience.
