Permanent "Removal In Progress" for old containers, using zfs storage driver #40132
When I drill down into one of these containers, I see a particular dataset mentioned. And yet when I check which datasets actually exist with similar names, that one is missing. So clearly some datasets are being left behind, just not the one which docker is trying to delete. |
Maybe there is a connection to docker/for-linux#124 (comment) ? |
I get into this state using zfs and swarm too, but only when rebooting. It doesn't matter whether the containers are running or stopped (i.e. the node is drained); they always appear "dead" after a reboot and can't be deleted through the CLI. Restarting the docker systemd service does not trigger the condition for me. All I can think of is that there may be a race between systemd bringing docker down and zfs committing the filesystem destroys before returning; I haven't had time to test this theory yet. At the moment, I'm draining the node, waiting for things to finish, then doing a system prune before rebooting, roughly as sketched below.
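Something like this (the node name and the fixed wait are placeholders for my actual setup):

```bash
# Drain the swarm node, give tasks time to settle, prune, then reboot.
docker node update --availability drain node1
sleep 60                 # crude wait for tasks to finish draining
docker system prune -f
systemctl reboot
```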
|
No, if they're stuck you have to stop Docker, go into /var/lib/docker and
delete things in, for example, "containers". You may wish to inspect the
containers first to check zfs really did delete the filesystem too.
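Something along these lines, assuming a systemd host (`<container_id>` is a placeholder):

```bash
# While the daemon is still up, see which dataset the dead container references
docker inspect --format '{{.GraphDriver.Data.Dataset}}' <container_id>
# Confirm zfs really did destroy it (this should print nothing)
zfs list -t filesystem | grep <container_id>
# Then stop Docker and delete the leftover container metadata by hand
systemctl stop docker
rm -rf /var/lib/docker/containers/<container_id>
systemctl start docker
```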
…On Thu, 31 Oct 2019, 03:40, satmandu wrote:
docker system prune -a doesn't appear to remove the dangling containers
for me.
|
I have the same issue. I've found a workaround, but I'm looking forward to a real solution. |
I got tired of the issue, so I chose to create a zvol, format it with ext4, and mount it for docker to use instead; I don't think the zfs storage driver is worth the trouble right now. |
I am running into this problem as well. If you need more info or someone to test, I am available for either. This is the error I got: the usual `driver "zfs" failed to remove root filesystem: exit status 1`. To be able to remove the dead containers I had to create two zfs filesystems:

```
zfs create zfspoola/docker/122185d523e2662465ade50473fb9a8b523d5667e5d7d4eb3f4a6645be2d0f65
zfs create zfspoola/docker/122185d523e2662465ade50473fb9a8b523d5667e5d7d4eb3f4a6645be2d0f65-init
```
|
This is a partial, ugly, unsafe solution which appears to be helping me.
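Roughly this (a reconstruction of the idea rather than the exact script; `zfs destroy -R` also removes dependent clones, so read before running):

```bash
#!/bin/bash
# For every container stuck in "Removal In Progress", destroy and recreate
# the dataset pair docker expects, then let docker rm finish the job.
for c in $(docker ps -a | grep Removal | cut -f1 -d' '); do
    ds=$(docker inspect --format '{{.GraphDriver.Data.Dataset}}' "$c")
    sudo zfs destroy -R "$ds"
    sudo zfs destroy -R "$ds-init"
    sudo zfs create "$ds"
    sudo zfs create "$ds-init"
    docker rm "$c"
done
```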
|
@satmandu thank you for this! works great. |
@satmandu Also thank you so much this is working for me also! Gotta admit though that's a scary command to just run. |
Ran into the same issue. I created a script similar to the one @satmandu created, but it somehow works better for me. You need to have jq installed.

```bash
#!/bin/bash
# Find containers stuck in "Removal In Progress"
stuck=$(docker ps -a | grep Removal | cut -f1 -d' ')
echo "$stuck"
for container in $stuck; do
    # Pull the ZFS dataset out of the dead container's metadata
    zfs_path=$(docker inspect "$container" | jq -r '.[] | select(.State | .Status == "dead") | .GraphDriver.Data.Dataset')
    # Destroy and recreate the dataset (and its -init twin) so docker rm can succeed
    sudo zfs destroy -R "$zfs_path"
    sudo zfs destroy -R "$zfs_path-init"
    sudo zfs create "$zfs_path"
    sudo zfs create "$zfs_path-init"
    docker rm "$container"
done
``` |
shellcheck suggests a few changes; I imagine you want this as clean as possible, since you could really destroy data. Anyway, I've set up a cron job to run this once a day now.
|
Oh my gosh - I find this workaround a little hilarious, but also VERY welcome. Thanks y'all. I'm experimenting with a new infra setup centered around ZFS+Docker, and yeah, I'm having this issue as well. It seems to occur for me on reboots pretty much exclusively - on a fresh system install, I can spin up a bunch of stacks and then remove them, and all containers will get destroyed properly. But if I reboot with all of those stacks running, I'll get a ton of zombie containers left over when it boots back up. This seems to produce additional issues with new containers coming up sometimes, I assume because Swarm gets hung up trying to remove them. |
Can add another data point: this happens on every reboot. It makes running Ceph with cephadm a huge pain, because I have to run the script before restarting Ceph. Ubuntu 20.04. |
The root cause is being discussed here: #41055 |
That issue seems unrelated to this one. It appears to be about Docker creating many ZFS volumes, which causes problems for external tooling. I cannot find anything there about Docker failing to start or about dangling containers stuck in removal. |
That issue is literally about docker creating zfs datasets and not deleting them, which are literally the dangling containers I was talking about when I created this bug report. |
I'm having the same issue. I recently installed Ubuntu on ZFS, and it seems docker picked zfs as its storage driver by default. @dotwaffle, could you explain how you did it (mount an ext4 volume and move docker to overlayfs)? If you have a link explaining the steps, that would be great. Thanks. |
@xavier83ar Once openzfs/zfs#9414 is merged into zfs, the problem should hopefully go away. It won't make it into openzfs before 2.1 though, probably sometime in 2021. I worked around the problem for now by just adding an xfs volume in fstab.
(Of course, you need to format a partition with xfs using mkfs.xfs for use there.) Then make sure that /etc/docker/daemon.json has a section like the one below.
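For example (the device path is a placeholder; use your actual xfs partition):

```
# /etc/fstab
/dev/sdb1  /var/lib/docker  xfs  defaults  0  0
```

And in /etc/docker/daemon.json:

```json
{
  "storage-driver": "overlay2"
}
```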
|
I have been trying to deal with this myself, and my workaround was much the same.
No need for jq, grep or cut; just use the right filter and format flags on `docker container ls` and `docker container inspect`, as in the merged script below. |
Merging the approaches from @satmandu and @sgiacomel, we get:

```bash
#!/usr/bin/env bash
# Clean up containers stuck in "Removal In Progress" on the zfs storage driver.
for container_id in $(docker container ls -qa --filter status=removing); do
    # Dataset backing the stuck container
    zpool_object=$(docker container inspect --format='{{.GraphDriver.Data.Dataset}}' "${container_id}")
    # Destroy and recreate the dataset pair so the removal can complete
    zfs destroy -R "${zpool_object}"
    zfs destroy -R "${zpool_object}-init"
    zfs create "${zpool_object}"
    zfs create "${zpool_object}-init"
    docker container rm "${container_id}"
done
``` |
I have the same problem. Creating/destroying datasets manually does not really help in my case, because the bug already hits when an intermediate container of a Dockerfile build is deleted, and that is something I use quite a bit. So for now I think it would be better to warn people against running Docker on ZFS (which, by the way, is now offered by the newest Ubuntu installer, which is also the reason I have it). |
I can confirm that this works nicely. |
We recently merged (or are in the process of merging) a change that will use overlay2 as the default driver over zfs or btrfs for new setups. |
I am sadly plagued by the same errors; the only remedy I've found nukes all containers :/ |
Maybe best avoid docker on ZFS for now. Best workaround in my opinion if you need to use zfs is to create a separate ext4 volume for docker as described above. |
The simplest workaround for the "dataset does not exist" error is to just `zfs create` the dataset it wishes existed, then re-run the `docker rm`; the dataset will be destroyed and all will be happy.
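Something like this, taking the dataset name straight from Docker's error message (names here are placeholders):

```bash
# docker rm failed with e.g.: dataset does not exist: rpool/docker/<id>
zfs create rpool/docker/<id>   # create the dataset Docker expects
docker rm <container_id>       # removal now completes and destroys it
```
|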
Thank you, that's what I've decided to do :) |
Looks like we're getting closer! openzfs/zfs#12209 just got completed! |
This is still an issue for me. I notice that referenced openzfs PRs don't seem to have been merged. |
As per openzfs/zfs#9414 (comment) overlayfs support has been merged! I'm not sure if that will require a post 2.1.x zfs release though... |
Looks like openzfs/zfs@dbf6108 isn't in the 2.1.x tree. We can ask to have it backported so overlayfs support lands in 2.1.8. |
openzfs/zfs#14070 (comment) states that overlayfs support will not be backported to 2.1.x, but will be added to a subsequent release. |
Sad faces all round. Thanks @satmandu |
ZFS 2.2 is not out yet. Is this near or far future? Does anyone know? |
Maybe bring up that question on the zfs mailing lists? |
Still seeing this problem on Ubuntu 22.04 with Docker 20.10.22 |
This problem (most likely) won't go away until ZFS 2.2 is available, and that's not the case yet. |
This issue has been open for 3+ years. This is just ridiculous... |
Almost as ridiculous as someone who comes into an issue and decides to make a useless angry post about it just because they still see a green Open symbol. I'll just offer an alternative perspective for right now: Docker on ZFS works great, aside from this annoying but cosmetic issue. It doesn't affect performance or stability for me, so I just put up with these dead containers and run the workaround bash script to clean them up whenever it gets cluttered. |
@oramirite I did move away from "Docker on ZFS" somewhere between today and three years ago. The issue did affect performance and eventually rendered my deployment server unresponsive, sometimes only solvable via a hard reset of the machine. So yes, sorry to say, this is not a minor inconvenience bug and 3+ years to resolve it justifies some hard feelings. |
I use a cron job to deal with this.
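For example (the script path is wherever you saved one of the cleanup scripts above):

```
# /etc/cron.d/docker-zfs-cleanup -- run the workaround nightly at 03:00
0 3 * * * root /usr/local/sbin/clean-dead-containers.sh
```
|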
@ThomDietrich That's good to know, thanks. What sort of performance issues were you experiencing? You're sure they were tied to this issue? I'll have to keep an eye out for that (Watch, now I'll get called out for not reading something earlier in the thread too, lol) I didn't mean to sound like the guy that handwaves away legitimate issues, my bad there. We of course all want to improve the software. Y'all are preaching to the choir, is the only issue. |
Happy to contribute where I can. Thanks! |
Note that this issue should be resolved once zfs 2.2 comes out as then you should be able to just use the overlay driver for docker with zfs... |
For anybody interested in another type of workaround: you can create an ext4 filesystem within your ZFS pool. I was inspired by this article. I created a zvol with an ext4 filesystem in it, stopped the docker daemon, cleaned out its data directory, and mounted the new filesystem at /var/lib/docker (with a matching entry in /etc/fstab). Also, in my docker config, I changed the storage driver to "storage-driver": "overlay2". Finally, I restarted docker.
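The steps, roughly (pool name and zvol size are placeholders):

```bash
# Create a zvol and put ext4 on it
zfs create -V 100G rpool/docker-ext4
mkfs.ext4 /dev/zvol/rpool/docker-ext4

# Stop docker and clear out the old zfs-driver state
systemctl stop docker
rm -rf /var/lib/docker/*

# Mount the new filesystem over /var/lib/docker (persist via /etc/fstab)
mount /dev/zvol/rpool/docker-ext4 /var/lib/docker

# Set "storage-driver": "overlay2" in /etc/docker/daemon.json, then:
systemctl start docker
```
|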
Feel free to try a build of zfs/master with the overlay2 storage driver on Ubuntu 22.10, since zfs master now supports overlayfs. I'm doing so right now using my own compiled build of zfs-master (which will eventually become 2.2), here: https://launchpad.net/~satadru-umich/+archive/ubuntu/zfs-experimental
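Enabling the PPA is the usual dance (a sketch; the upgrade pulls in whatever zfs packages the PPA publishes):

```bash
sudo add-apt-repository ppa:satadru-umich/zfs-experimental
sudo apt update
sudo apt full-upgrade
```
|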
Sorry, missed this till now. Thanks for the tip. Unfortunately, when I `zfs create` the dataset and run `docker rm` again, it says "filesystem has dependent clones" and lists ~100 datasets. Then it goes back to the same state, with the "dataset does not exist" error. I guess it's able to remove the dataset but not its clones? Is there any way to have it drop the reference to the old dataset name when re-creating the container? |
@satmandu thanks for sharing your solution! I am currently in the process of setting up a new system. Just to be clear: I intend to install the zfs package from your ppa on Ubuntu 22.04 LTS, then configure docker with the overlay2 storage driver. Cheers! |
@ThomDietrich I would hold off for a couple of days or weeks until a patch for the issue at openzfs/zfs#13608 is tested, as it might affect docker volumes. Having said that, I've only tested my ppa with 22.10 and 23.04. The ppa is not built for 22.04. I use the overlay storage driver on my own system, with kernel 6.2.0, and it has worked stably for me with multiple volumes, primarily mirrored. I'm not sure when OpenZFS 2.2 will get officially released though. It might get released in time for inclusion in Ubuntu 23.10, but as I'm not a maintainer of that software, I have no way of knowing. |
Hey guys, I am currently on zfs-2.1.14 and I can't update due to OS limitations. I have multiple containers stuck in the "Removal In Progress" status, and the various scripts posted here do not work. For example, here is the output of one of the scripts I tried:
|
Hey, I know it's been years since you posted this, but I just wanted to let you know this is a pretty good "hack" workaround for manually recreating the datasets. In my case, the zfs datasets had been created through docker, but another pruning program removed them out from under it. Thanks for the advice (5 years later). |
Starting to feel like https://m.xkcd.com/2881/ here |
Hi, bug mates!
Description

Docker containers slated for removal are piling up, and when I attempt manual removal I get a message of:

driver "zfs" failed to remove root filesystem: exit status 1:

`docker ps -a` shows many containers waiting for removal.

`docker ps -a | grep Removal | cut -f1 -d' ' | xargs -rt docker rm` fails.

Steps to reproduce the issue:

This is happening for many containers created, and it persists across reboots. Running on a current Ubuntu 19.10 system with a zfs rpool (though I've seen this recently also on a non-zfs root system with a zfs volume for /var/lib/docker).

Output of `docker version`:

Output of `docker info`:

Additional environment details (AWS, VirtualBox, physical, etc.):