
When using containerd snapshotter docker daemon doesn't check for "Already Exists" #46990

Open
nealef opened this issue Dec 27, 2023 · 14 comments
Labels
area/images containerd-integration Issues and PRs related to containerd integration kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. status/0-triage

Comments

@nealef commented Dec 27, 2023

Description

When using the containerd snapshotter interface, the daemon/create.go code calls the PrepareSnapshot() function, which in turn invokes the containerd Prepare() API. It is possible for a non-nil err value to be returned that is not a true error: the snapshotter is allowed to return ErrAlreadyExists to tell containerd that the snapshot has already been prepared. In our case we have a caching snapshotter that prepares snapshots in advance.
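
To illustrate, here is a minimal sketch of what our snapshotter does; cachingSnapshotter, cacheHasSnapshot, and prepareFromScratch are hypothetical stand-ins for our (not yet public) plugin:

import (
        "context"
        "fmt"

        "github.com/containerd/containerd/errdefs"
        "github.com/containerd/containerd/mount"
        "github.com/containerd/containerd/snapshots"
)

// cachingSnapshotter stands in for our caching snapshotter plugin.
type cachingSnapshotter struct{}

func (s *cachingSnapshotter) Prepare(ctx context.Context, key, parent string, opts ...snapshots.Opt) ([]mount.Mount, error) {
        if s.cacheHasSnapshot(key) {
                // Not a failure: the snapshot was prepared ahead of time, so we
                // return ErrAlreadyExists and expect the caller to treat it as success.
                return nil, fmt.Errorf("snapshot %s: %w", key, errdefs.ErrAlreadyExists)
        }
        return s.prepareFromScratch(ctx, key, parent, opts...)
}

// Placeholders for the real cache lookup and the normal prepare path.
func (s *cachingSnapshotter) cacheHasSnapshot(key string) bool { return false }

func (s *cachingSnapshotter) prepareFromScratch(ctx context.Context, key, parent string, opts ...snapshots.Opt) ([]mount.Mount, error) {
        return nil, nil
}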

I note the code in daemon/create.go doesn't differentiate a "bad" err value from a "good" one:

        if daemon.UsesSnapshotter() {
                if err := daemon.imageService.PrepareSnapshot(ctx, ctr.ID, opts.params.Config.Image, opts.params.Platform, setupInitLayer(daemon.idMapping)); err != nil {
                        return nil, err
                }

I am also seeing that docker images does not report the newly pulled image for images in our content store. The way our content store works is that when containerd calls Info() we check our cache; if the layer exists there, we symlink the cache entry into the content store and tell the caller the layer exists, otherwise it goes through normal pull processing. So I am assuming that the docker pull sees that a layer exists and skips the rest of the processing, such that its own internal db is not fully updated. I say "fully" because, while docker images doesn't show the image, a docker run works fine.
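
Roughly, the Info() path in our content store looks like this; cachingStore, cachePath, storePath, and statBlob are hypothetical stand-ins for our plugin:

import (
        "context"
        "fmt"
        "os"

        "github.com/containerd/containerd/content"
        "github.com/containerd/containerd/errdefs"
        "github.com/opencontainers/go-digest"
)

// cachingStore stands in for our caching content store plugin.
type cachingStore struct{}

func (s *cachingStore) Info(ctx context.Context, dgst digest.Digest) (content.Info, error) {
        if src := s.cachePath(dgst); src != "" {
                // Cache hit: link the blob into the content store and report it present.
                if err := os.Symlink(src, s.storePath(dgst)); err != nil && !os.IsExist(err) {
                        return content.Info{}, err
                }
                return s.statBlob(dgst)
        }
        // Cache miss: report not-found so the layer goes through normal pull processing.
        return content.Info{}, fmt.Errorf("content %v: %w", dgst, errdefs.ErrNotFound)
}

// Placeholders for the real cache/store path lookups and blob stat.
func (s *cachingStore) cachePath(dgst digest.Digest) string  { return "" }
func (s *cachingStore) storePath(dgst digest.Digest) string  { return "" }
func (s *cachingStore) statBlob(dgst digest.Digest) (content.Info, error) {
        return content.Info{}, nil
}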

In the following console log I pull an image that's in our cache (we have both a content store plugin and a snapshotter plugin) and one that's not. I am able to run the cached image, but docker images doesn't report it:

> docker pull fedora:latest
06df381d697d: Download complete
dfb5e6183f51: Download complete
8404925a71fd: Download complete
docker.io/library/fedora:latest
> docker images
REPOSITORY   TAG       IMAGE ID   CREATED   SIZE
> docker pull python:latest
3733015cdd1b: Download complete
553d2d59b9c4: Download complete
fc7a60e86bae: Download complete
107007f161d0: Download complete
d685eb68699f: Download complete
bc0734b949dc: Download complete
b5de22c0f5cd: Download complete
02b85463d724: Download complete
b43bd898d5fb: Download complete
7fad4bffde24: Download complete
917ee5330e73: Download complete
docker.io/library/python:latest
> docker images
REPOSITORY   TAG       IMAGE ID       CREATED         SIZE
python       latest    3733015cdd1b   4 seconds ago   1.46GB
> docker run --rm -it fedora:latest pwd
/
>

Reproduce

I am able to reproduce it on my system, where our content store and snapshotter plugins reside, but these are not public code (yet).

Expected behavior

docker images should display images that are pulled when the snapshotter reports "Already Exists".

docker version

Client: Docker Engine - Community
 Version:           24.0.2
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        cb74dfc
 Built:             Thu May 25 21:53:10 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.2
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       659604f
  Built:            Thu May 25 21:52:10 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client: Docker Engine - Community
 Version:    24.0.2
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.5
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.18.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 3
 Server Version: 24.0.2
 Storage Driver: mssnap
  driver-type: io.containerd.snapshotter.v1
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 4.18.0-477.10.1.el8_8.x86_64
 Operating System: AlmaLinux 8.8 (Sapphire Caracal)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 1.894GiB
 Name: mssnap.devlab.sinenomine.net
 ID: 5bcd4245-c020-40e6-acb1-b8d26235ddcb
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional Info

No response

@nealef nealef added kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. status/0-triage labels Dec 27, 2023
@rumpl rumpl added the containerd-integration Issues and PRs related to containerd integration label Dec 27, 2023
@thaJeztah (Member)

Quick "off-the-bat" reply to the second part of your description;

I am also seeing docker images does not report the newly pulled image for images in our content store. The way our content store works is that when containerd calls the Info() we check our cache and if it exists we symlink the cache entry into the content store and tell the caller the layer exists otherwise it goes through normal pull processing.

For transparency; the current implementation of the containerd image-store integration is largely written with the assumption that the "moby" namespace(s) are "owned" by the docker engine, and not manipulated separately.

We're aware that this is not ideal, and have discussed making this more permissive in future (to allow for scenarios where multiple tools act on the same namespace), but it's not been our focus of attention for the initial implementation. As you mentioned, currently some state is kept in the Docker Engine itself; there is some synchronisation (e.g. container state), but there's likely many scenarios where this currently won't work well with other tools managing the data.

Perhaps it would be good to create a separate ticket for this with further details so that it can be tracked and discussed separately.

(if "making this work" would require limited changes, then contributions on improving would be welcome though!)

@thaJeztah (Member)

> I note the code in daemon/create.go doesn't differentiate a "bad" err value from a "good" one:

Thanks for reporting; at a quick glance, that seems like an OK change to make.

For handling these error types, we should look at either handling the error immediately, matching containerd's error type (errdefs.ErrAlreadyExists), or, if we need to handle it further "up the stack", seeing whether there's a suitable parallel in Moby's errdefs package (although I don't think there's a direct 1:1 equivalent).
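
If it does have to be handled up the stack, the closest Moby-side category is perhaps a "conflict"; a rough sketch of what that mapping could look like (mapAlreadyExists is hypothetical, not existing code):

import (
        cerrdefs "github.com/containerd/containerd/errdefs"
        "github.com/docker/docker/errdefs"
)

// mapAlreadyExists translates containerd's "already exists" error into
// Moby's nearest errdefs category; other errors pass through unchanged.
func mapAlreadyExists(err error) error {
        if cerrdefs.IsAlreadyExists(err) {
                return errdefs.Conflict(err)
        }
        return err
}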

@nealef As you already dug into the code, are you interested in contributing?

@nealef (Author) commented Dec 27, 2023

> > I note the code in daemon/create.go doesn't differentiate a "bad" err value from a "good" one:
>
> Thanks for reporting; at a quick glance, that seems like an OK change to make.
>
> For handling these error types, we should look at either handling the error immediately, matching containerd's error type (errdefs.ErrAlreadyExists), or, if we need to handle it further "up the stack", seeing whether there's a suitable parallel in Moby's errdefs package (although I don't think there's a direct 1:1 equivalent).
>
> @nealef As you already dug into the code, are you interested in contributing?

I was looking at something like:

--- a/daemon/create.go
+++ b/daemon/create.go
@@ -8,6 +8,7 @@ import (
        "time"

        "github.com/containerd/containerd/platforms"
+       ctrerr "github.com/containerd/containerd/errdefs"
        "github.com/containerd/log"
        "github.com/docker/docker/api/types/backend"
        containertypes "github.com/docker/docker/api/types/container"
@@ -198,7 +199,7 @@ func (daemon *Daemon) create(ctx context.Context, daemonCfg *config.Config, opts
        ctr.ImageManifest = imgManifest

        if daemon.UsesSnapshotter() {
-               if err := daemon.imageService.PrepareSnapshot(ctx, ctr.ID, opts.params.Config.Image, opts.params.Platform, setupInitLayer(daemon.idMapping)); err != nil {
+               if err := daemon.imageService.PrepareSnapshot(ctx, ctr.ID, opts.params.Config.Image, opts.params.Platform, setupInitLayer(daemon.idMapping)); err != nil && !ctrerr.IsAlreadyExists(err) {
                        return nil, err
                }
        } else {
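
(I used ctrerr.IsAlreadyExists(err) rather than comparing err directly against ctrerr.ErrAlreadyExists, since the error usually arrives wrapped and a plain equality check would miss it.)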

@nealef (Author) commented Dec 27, 2023

Quick "off-the-bat" reply to the second part of your description;

I am also seeing docker images does not report the newly pulled image for images in our content store. The way our content store works is that when containerd calls the Info() we check our cache and if it exists we symlink the cache entry into the content store and tell the caller the layer exists otherwise it goes through normal pull processing.

For transparency; the current implementation of the containerd image-store integration is largely written with the assumption that the "moby" namespace(s) are "owned" by the docker engine, and not manipulated separately.

We're aware that this is not ideal, and have discussed making this more permissive in future (to allow for scenarios where multiple tools act on the same namespace), but it's not been our focus of attention for the initial implementation. As you mentioned, currently some state is kept in the Docker Engine itself; there is some synchronisation (e.g. container state), but there's likely many scenarios where this currently won't work well with other tools managing the data.

Perhaps it would be good to create a separate ticket for this with further details so that it can be tracked and discussed separately.

(if "making this work" would require limited changes, then contributions on improving would be welcome though!)

I'll create a separate ticket, and continue to look at the docker images processing to see whether the entries are there and just getting filtered out for some reason.

I also wonder whether moby is over-complicating the containerd integration. k3s and k8s can also use containerd APIs to pull images, but neither appears to need to go down to the Prepare() or Stat() level. k3s uses the imageserver Pull interface, which takes care of unpacking and preparing and simply returns the Image details to the caller. However, I am new to the moby internals, so there are evidently good reasons for doing things this way.

@thaJeztah (Member)

> I was looking at something like:

Hmm, wondering if perhaps that check should live somewhere in imageService.PrepareSnapshot(). It feels overly broad to ignore any ctrerr.ErrAlreadyExists there (there could be other, unexpected paths that return that error); it feels like it should be handled more specifically inside that function.
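
Something along these lines, applied only to the Prepare call inside PrepareSnapshot, is what I have in mind (a rough sketch; ignoreAlreadyPrepared is a hypothetical helper, not existing code):

import (
        cerrdefs "github.com/containerd/containerd/errdefs"
)

// ignoreAlreadyPrepared filters only the error coming back from the
// snapshotter's Prepare call, leaving every other code path untouched.
func ignoreAlreadyPrepared(err error) error {
        if cerrdefs.IsAlreadyExists(err) {
                // The snapshot was prepared ahead of time (e.g. by a
                // caching snapshotter); treat it as success for this call.
                return nil
        }
        return err
}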

> I also wonder whether moby is over-complicating the containerd integration. k3s and k8s can also use containerd APIs to pull images, but neither appears to need to go down to the Prepare() or Stat() level. k3s uses the imageserver Pull interface, which takes care of unpacking and preparing and simply returns the Image details to the caller. However, I am new to the moby internals, so there are evidently good reasons for doing things this way.

I haven't looked into those details yet, but I know the plan is to move more functionality to containerd code where possible. That said, there's been quite some handling specific to moby that had to be accounted for, and not all the features required to handle that were available (yet) in containerd's code.

/cc @dmcgowan @vvoland who may have more ideas here.

@rumpl (Member) commented Dec 27, 2023

> I also wonder whether moby is over-complicating the containerd integration. k3s and k8s can also use containerd APIs to pull images, but neither appears to need to go down to the Prepare() or Stat() level. k3s uses the imageserver Pull interface, which takes care of unpacking and preparing and simply returns the Image details to the caller. However, I am new to the moby internals, so there are evidently good reasons for doing things this way.

It looks like this mostly because we want the switch to the containerd image store to be as transparent as possible and not break anyone, which means we have to reimplement everything the way it already works. And sometimes the containerd API just doesn't have what we need (reference counting the mounts, for example).

@rumpl (Member) commented Dec 27, 2023

As for the pulled image not being listed: could you restart dockerd with debug logging enabled and give us the logs after you run docker images? There are cases in image listing where we silently ignore things we can't list; logs could help.
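
On a systemd-based install, that's typically:

# /etc/docker/daemon.json
{
  "debug": true
}

# then restart the daemon and follow its logs:
sudo systemctl restart docker
sudo journalctl -u docker.service -f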

@nealef (Author) commented Dec 27, 2023

> As for the pulled image not being listed: could you restart dockerd with debug logging enabled and give us the logs after you run docker images? There are cases in image listing where we silently ignore things we can't list; logs could help.

Dec 27 18:06:17 snap.lab.sinenomine.net dockerd[120493]: time="2023-12-27T18:06:17.159563586-05:00" level=debug msg="Calling HEAD /_ping"
Dec 27 18:06:17 snap.lab.sinenomine.net dockerd[120493]: time="2023-12-27T18:06:17.160959857-05:00" level=debug msg="Calling GET /v1.43/images/json"

That's all there is in the dockerd log.

Running strace on the docker images command, I see the GET request and the empty response:

124247 write(3, "GET /v1.43/images/json HTTP/1.1\r\nHost: docker\r\nUser-Agent: Docker-Client/24.0.2 (linux)\r\n\r\n", 91) = 91
: 
124246 <... read resumed>"HTTP/1.1 200 OK\r\nApi-Version: 1.44\r\nContent-Type: application/json\r\nDocker-Experimental: false\r\nOstype: linux\r\nServer: Docker/dev (linux)\r\nDate: Wed, 27 Dec 2023 23:09:00 GMT\r\nContent-Length: 3\r\n\r\n[]\n", 4096) = 200

@rumpl (Member) commented Dec 27, 2023

Strace on the CLI won't get you any useful information, no; the daemon is the one doing all the work.

Not having any logs when image list is called is helpful though; we'll take a closer look.

If we don’t find anything, would you mind helping us debug this issue? I understand your code is proprietary for now but if we gave you a patch with some extra logs in the daemon would that work for you?

@nealef (Author) commented Dec 27, 2023

I am attempting to instrument getImagesJSON to see what it's seeing. It calls ir.backend.Images, and I am trying to figure out what that resolves to.

@nealef (Author) commented Dec 28, 2023

The cached image fails the following test:

func (im *ImageManifest) CheckContentAvailable(ctx context.Context) (bool, error) {
        // The target is already a platform-specific manifest, so no need to match platform.
        pm := cplatforms.All

        available, _, _, missing, err := containerdimages.Check(ctx, im.ContentStore(), im.Target(), pm)
        if err != nil {
                return false, err
        }

        if !available || len(missing) > 0 {
                for _, miss := range missing {
                        log.G(ctx).Infof("missing: %v", miss)
                }
                return false, nil
        }

        return true, nil
}

missing is non-empty and contains:

application/vnd.docker.image.rootfs.diff.tar.gzip sha256:8237fce9fd6b5acc12e7010d84e6245c6b00937de832c1987c4769866af1573a 64872433 [] map[] [] <nil>

This looks like my content store problem: this layer should exist in the content store. It exists in my cache, so when the content store server calls my content store's Info() function I should locate it in the cache and symlink it into the content store's area.

@nealef (Author) commented Dec 28, 2023

It appears that when an image doesn't exist we do the pull and then a commit. It is during the commit that the content store's Info() API is invoked, and it is at this point that my caching content store sets up the symlink to the cache (if the layer is present in the cache). However, if the image does exist, pull just reports the image as pulled and we don't commit. I wonder if I could add code to invoke Info() for its layers; this should have no effect if the image exists and is not cached (in fact, if the symlink is already present, calling Info() on a cached image will have no effect either).

I am seeing Info() being called for the manifest list, the configuration, and the amd64 manifest, but not for the diff layer.
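
Something like this rough sketch is what I have in mind (touchLayers is hypothetical, not existing moby code):

import (
        "context"

        "github.com/containerd/containerd/content"
        cerrdefs "github.com/containerd/containerd/errdefs"
        "github.com/containerd/containerd/images"
        "github.com/containerd/containerd/platforms"
        ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

// touchLayers calls Info() on every layer of the resolved manifest so a
// caching content store gets the chance to link cached blobs in, even
// when the pull short-circuits.
func touchLayers(ctx context.Context, store content.Store, target ocispec.Descriptor, pm platforms.MatchComparer) error {
        manifest, err := images.Manifest(ctx, store, target, pm)
        if err != nil {
                return err
        }
        for _, layer := range manifest.Layers {
                // A NotFound here simply means the layer really is absent.
                if _, err := store.Info(ctx, layer.Digest); err != nil && !cerrdefs.IsNotFound(err) {
                        return err
                }
        }
        return nil
}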

@nealef (Author) commented Dec 28, 2023

Also, you are using the client.Pull() API. When unpacking is specified, the snapshotter is invoked and will prepare a snapshot (see the Unpack() function in containerd/image.go), so a subsequent explicit invocation of the snapshotter seems redundant. Are you gathering information as well as creating a snapshot? Would a Stat() suffice?
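
For reference, this is the pull path I mean, using a standalone containerd client for illustration (dockerd's own namespace and snapshotter settings will differ):

import (
        "context"
        "log"

        "github.com/containerd/containerd"
        "github.com/containerd/containerd/namespaces"
)

func pullExample() {
        client, err := containerd.New("/run/containerd/containerd.sock")
        if err != nil {
                log.Fatal(err)
        }
        defer client.Close()

        ctx := namespaces.WithNamespace(context.Background(), "default")

        // WithPullUnpack makes Pull() unpack the layers through the
        // snapshotter (Prepare is invoked from Unpack), so the image is
        // ready to run without a separate Prepare call afterwards.
        img, err := client.Pull(ctx, "docker.io/library/fedora:latest",
                containerd.WithPullUnpack,
        )
        if err != nil {
                log.Fatal(err)
        }
        log.Printf("pulled %s (%s)", img.Name(), img.Target().Digest)
}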

@nealef (Author) commented Dec 28, 2023

What I believe is happening is this: during the pull, when containerd decides whether to pull in a diff, it checks the snapshot area for a corresponding diff (for the first layer) or chain ID (for subsequent layers). If it finds one (via the Stat() call), it doesn't bother to download the gzipped layer (or, in the case of our content store, create a symlink to the cache), because it assumes the compressed layer must already be in the content store. The content store therefore doesn't have that layer (or layers), so when we try a docker images it does a Check() operation, doesn't find the layers, and does not return information for that image.

So this is not a problem with docker but with the way containerd works. However, does docker need to do the manifest check before it will return details of the image? Could it do the same check(s) it does for docker inspect or docker run, both of which work without the layer existing in the content store?

As an experiment I commented out the CheckContentAvailable()-related code in daemon/containerd/image_list.go and docker images worked fine. However, I am not familiar enough with the rest of the code to know what the side effects of this may be, though I note image_list.go is the only code that makes that call.
