lxd forkstart ... failed to get real path for ... #2814

Closed
cpaelzer opened this Issue Jan 24, 2017 · 31 comments


Required information

  • Distribution: Ubuntu
  • Distribution version: Xenial
  • The output of "lxc info":
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
api_status: stable
api_version: "1.0"
auth: trusted
environment:
  addresses: []
  architectures:
  - ppc64le
  certificate: |
  [...]
  certificate_fingerprint: 7a49e772f33c732a951b3df6f42921183006aca680df02fb995dd8f68390c4a0
  driver: lxc
  driver_version: 2.0.6
  kernel: Linux
  kernel_architecture: ppc64le
  kernel_version: 4.4.0-57-generic
  server: lxd
  server_pid: 95372
  server_version: "2.7"
  storage: zfs
  storage_version: "5"
config:
  storage.zfs_pool_name: lxd
public: false

Issue description

Hi,
I ran into a situation where I was unable to start any more LXD containers. It only seems to happen on the systems that use zfs as backing for the containers. I don't think it is related, but so far it only occurs on my ppc64el boxes (they just happen to be the zfs ones atm).
It seems to work once (starting a bunch of containers and working with them), but after I removed them and re-created them from scratch I ran into the following.

Here is the lxc info output after the failed launch:

$ lxc info --show-log local:testkvm-xenial-from
Name: testkvm-xenial-from
Remote: unix:/var/lib/lxd/unix.socket
Architecture: ppc64le
Created: 2017/01/24 07:43 UTC
Status: Stopped
Type: persistent
Profiles: kvm

Log:

            lxc 20170124074307.470 ERROR    lxc_conf - conf.c:mount_rootfs:807 - No such file or directory - failed to get real path for '/var/lib/lxd/containers/testkvm-xenial-from/rootfs'
            lxc 20170124074307.471 ERROR    lxc_conf - conf.c:setup_rootfs:1221 - failed to mount rootfs
            lxc 20170124074307.471 ERROR    lxc_conf - conf.c:do_rootfs_setup:3671 - failed to setup rootfs for 'testkvm-xenial-from'
            lxc 20170124074307.471 ERROR    lxc_conf - conf.c:lxc_setup:3753 - Error setting up rootfs mount after spawn
            lxc 20170124074307.471 ERROR    lxc_start - start.c:do_start:826 - Failed to setup container "testkvm-xenial-from".
            lxc 20170124074307.471 ERROR    lxc_sync - sync.c:__sync_wait:57 - An error occurred in another process (expected sequence number 3)
            lxc 20170124074307.526 ERROR    lxc_start - start.c:__lxc_start:1338 - Failed to spawn container "testkvm-xenial-from".
            lxc 20170124074308.080 ERROR    lxc_conf - conf.c:run_buffer:347 - Script exited with status 1
            lxc 20170124074308.080 ERROR    lxc_start - start.c:lxc_fini:546 - Failed to run lxc.hook.post-stop for container "testkvm-xenial-from".
            lxc 20170124074308.080 WARN     lxc_commands - commands.c:lxc_cmd_rsp_recv:172 - command get_cgroup failed to receive response
            lxc 20170124074308.080 WARN     lxc_commands - commands.c:lxc_cmd_rsp_recv:172 - command get_cgroup failed to receive response

Ok, I thought it could be my KVM profile, so I tried without it:

$ lxc launch ubuntu-daily:xenial/ppc64el testkvm-xenial-test2
Creating testkvm-xenial-test2
error: No such file or directory: "/var/lib/lxd/containers/testkvm-xenial-test2.zfs/rootfs"

Well, that is a different error, but it is still not working.

FYI, here is the current KVM profile:

$ lxc profile show kvm
name: kvm
config:
  boot.autostart: "true"
  linux.kernel_modules: openvswitch,nbd,ip_tables,ip6_tables,kvm
  security.nesting: "true"
  security.privileged: "true"
description: ""
devices:
  eth0:
    mtu: "9000"
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  kvm:
    path: /dev/kvm
    type: unix-char
  mem:
    path: /dev/mem
    type: unix-char
  root:
    path: /
    type: disk
  sharedimg:
    path: /var/lib/uvtool/libvirt
    source: /tmp/testimages
    type: disk
  tun:
    path: /dev/net/tun
    type: unix-char
usedby:
- /1.0/containers/testkvm-xenial-from
- /1.0/containers/testkvm-xenial-test1

Steps to reproduce

I'm guessing here :-/

  1. set up lxd with zfs
  2. launch and delete containers in a loop until a launch fails (maybe on ppc64el, maybe with the kvm profile I have)

Information to attach

Related bugs / discussions I've found:

FYI, I hit this with both:

  • 2.0.8-0ubuntu1~ubuntu16.04.2
  • 2.7-0ubuntu2~ubuntu16.04.1

I deleted all containers, then all local images, and then ran "lxc launch" from scratch, leading to:

lvl=dbug msg=1.0/operations/a26d0665-82fd-4de1-a3eb-25a70bde23c8/wait t=2017-01-24T08:24:50+0000
Retrieving image: 100% (16.50MB/s)lvl=dbug msg="Raw response: {\"type\":\"sync\",\"status\":\"Success\",\"status_code\":200,\"metadata\":{\"id\":\"a26d0665-82fd-4de1-a3eb-25a70bde23c8\",\"class\":\"task\",\"created_at\":\"2017-01-24T08:24:50.052722Z\",\"updated_at\":\"2017-01-24T08:24:59.778552Z\",\"status\":\"Failure\",\"status_code\":400,\"resources\":{\"containers\":[\"/1.0/containers/testkvm-xenial-test7\"]},\"metadata\":{\"download_progress\":\"100% (16.50MB/s)\"},\"may_cancel\":false,\"err\":\"Failed to create ZFS filesystem: cannot mount '/var/lib/lxd/images/2bbf30d457ff27675614ba63f16b594769e51c307c66f377f6a63d52bcdfc751.zfs': directory is not empty\\nfilesystem successfully created, but not mounted\\n\"}}\n" t=2017-01-24T08:24:59+0000
error: Failed to create ZFS filesystem: cannot mount '/var/lib/lxd/images/2bbf30d457ff27675614ba63f16b594769e51c307c66f377f6a63d52bcdfc751.zfs': directory is not empty
filesystem successfully created, but not mounted

I recognized 2bbf30d457ff as one of the images I had deleted, so there should be nothing left.
I compared what lxd thinks should be there with what really is there:

$ sudo ls /var/lib/lxd/images/
161f64ec0a3ec4701aba377bf64a23aef268c69eb889e23a027af956bda10105.zfs 9bfee64af40b2d2166f8a3f62c4f5beb68027eb7ccfa3d16f0a1811df2ccfced.zfs
28734777bc65a64422ddee88bb3869618b92e2d743ddc9d25a2b52f55fa3c65f.zfs b9ab6b380a5fdb08921aece6b77e9e0999165ad2f55c2f490cdfc679d55c621e.zfs
2bbf30d457ff27675614ba63f16b594769e51c307c66f377f6a63d52bcdfc751.zfs e82872822030b6e78a70b47c04adfcb77ebfc4cea3e2fe9f47c2609e1197afa2.zfs
5e3046de39a8ea72ff804e0d79e351f3b81b5838c716aec4d6df421d04a5b63d.zfs fef7a29a8e3d898e60ccb0e395f7b75261e935b5945bbbab677ac50ba0b17000.zfs
9997093396383fae35f088056a89e02b5a8eeb96212017a95c2b90c202f74c18.zfs

$ lxc image list
+-------+-------------+--------+-------------+------+------+-------------+
| ALIAS | FINGERPRINT | PUBLIC | DESCRIPTION | ARCH | SIZE | UPLOAD DATE |
+-------+-------------+--------+-------------+------+------+-------------+
$ sudo ls /var/lib/lxd/images/
161f64ec0a3ec4701aba377bf64a23aef268c69eb889e23a027af956bda10105.zfs  9bfee64af40b2d2166f8a3f62c4f5beb68027eb7ccfa3d16f0a1811df2ccfced.zfs
28734777bc65a64422ddee88bb3869618b92e2d743ddc9d25a2b52f55fa3c65f.zfs  b9ab6b380a5fdb08921aece6b77e9e0999165ad2f55c2f490cdfc679d55c621e.zfs
2bbf30d457ff27675614ba63f16b594769e51c307c66f377f6a63d52bcdfc751.zfs  e82872822030b6e78a70b47c04adfcb77ebfc4cea3e2fe9f47c2609e1197afa2.zfs
5e3046de39a8ea72ff804e0d79e351f3b81b5838c716aec4d6df421d04a5b63d.zfs  fef7a29a8e3d898e60ccb0e395f7b75261e935b5945bbbab677ac50ba0b17000.zfs
9997093396383fae35f088056a89e02b5a8eeb96212017a95c2b90c202f74c18.zfs
$ zfs list
NAME                                                                          USED  AVAIL  REFER  MOUNTPOINT
lxd                                                                          2.55G  47.3G    19K  none
lxd/containers                                                                 19K  47.3G    19K  none
lxd/deleted                                                                    57K  47.3G    19K  none
lxd/deleted/containers                                                         19K  47.3G    19K  none
lxd/deleted/images                                                             19K  47.3G    19K  none
lxd/images                                                                   2.54G  47.3G    19K  none
lxd/images/161f64ec0a3ec4701aba377bf64a23aef268c69eb889e23a027af956bda10105   358M  47.3G   358M  none
lxd/images/28734777bc65a64422ddee88bb3869618b92e2d743ddc9d25a2b52f55fa3c65f   358M  47.3G   358M  none
lxd/images/2bbf30d457ff27675614ba63f16b594769e51c307c66f377f6a63d52bcdfc751    19K  47.3G    19K  /var/lib/lxd/images/2bbf30d457ff27675614ba63f16b594769e51c307c66f377f6a63d52bcdfc751.zfs
lxd/images/5e3046de39a8ea72ff804e0d79e351f3b81b5838c716aec4d6df421d04a5b63d   358M  47.3G   358M  none
lxd/images/9997093396383fae35f088056a89e02b5a8eeb96212017a95c2b90c202f74c18   314M  47.3G   314M  none
lxd/images/9bfee64af40b2d2166f8a3f62c4f5beb68027eb7ccfa3d16f0a1811df2ccfced   314M  47.3G   314M  none
lxd/images/b9ab6b380a5fdb08921aece6b77e9e0999165ad2f55c2f490cdfc679d55c621e   291M  47.3G   291M  none
lxd/images/e82872822030b6e78a70b47c04adfcb77ebfc4cea3e2fe9f47c2609e1197afa2   314M  47.3G   314M  none
lxd/images/fef7a29a8e3d898e60ccb0e395f7b75261e935b5945bbbab677ac50ba0b17000   291M  47.3G   291M  none
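A mismatch like the one above (an empty "lxc image list", but image datasets still on disk) can be checked mechanically with a set difference. A hedged sketch in bash: the short fingerprints below are placeholder sample data; on a real system the two lists would come from "ls /var/lib/lxd/images" (with the .zfs suffix stripped) and the FINGERPRINT column of "lxc image list".

```shell
# Set difference: fingerprints present on disk but unknown to LXD.
# The sample data stands in for real command output.
on_disk=$(printf '161f64ec\n2bbf30d4\n9bfee64a\n' | sort)
in_lxd=$(printf '161f64ec\n9bfee64a\n' | sort)
# comm -23 keeps lines unique to the first (on-disk) list.
comm -23 <(echo "$on_disk") <(echo "$in_lxd")   # prints the orphan: 2bbf30d4
```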

Ok, that is definitely more than what should still be there, dropping all of it:
$ zfs destroy -r lxd

So everything is empty (zfs list, lxc list, and lxc image list), yet something old still seems to affect it:
http://paste.ubuntu.com/23856648/

Afterwards, lxc still doesn't know about any images or containers, but zfs has an image and a mountpoint:

$ sudo zfs list
NAME                                                                          USED  AVAIL  REFER  MOUNTPOINT
lxd                                                                          9.57M  49.9G    19K  none
lxd/images                                                                     38K  49.9G    19K  none
lxd/images/2bbf30d457ff27675614ba63f16b594769e51c307c66f377f6a63d52bcdfc751    19K  49.9G    19K  /var/lib/lxd/images/2bbf30d457ff27675614ba63f16b594769e51c307c66f377f6a63d52bcdfc751.zfs

Inside it is the base of an image:

$ sudo ls /var/lib/lxd/images/2bbf30d457ff27675614ba63f16b594769e51c307c66f377f6a63d52bcdfc751.zfs/
metadata.yaml  rootfs  templates

I switched to dir-based backing and things work again as they should.
Let me know if I can help debug on my system.

Member

brauner commented Jan 24, 2017

This sounds like the symlinks in /var/lib/lxd/containers are invalid or not created after the relaunch. Can you reproduce and do:

ls -al /var/lib/lxd/containers

?
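Broken symlinks can also be listed mechanically: with find -L, a link whose target is gone still reports as type 'l', because it cannot be followed. A sketch on a throwaway directory (the real check would point at /var/lib/lxd/containers):

```shell
# Build a throwaway dir with one valid and one dangling symlink,
# then list only the dangling ones.
dir=$(mktemp -d)
mkdir "$dir/real.zfs"
ln -s "$dir/real.zfs"    "$dir/ok"       # valid: resolves to a directory
ln -s "$dir/missing.zfs" "$dir/broken"   # dangling: target does not exist
find -L "$dir" -maxdepth 1 -type l       # prints only "$dir/broken"
```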

Member

brauner commented Jan 24, 2017

Well, the first part of the issue at least.

Hi Christian.

BTW - for whatever it is worth, I realized it happened on file-backed zfs but not on disk-backed.

While reproducing the issue as requested, I found that it is now working.
I'll let the automation that eventually triggered it last time run again and come back here.

Actually I'll set it up as zfs on disk instead of on file before doing so.

Ha,
while it was running fine against dir-backed containers, running my tests against zfs (disk-backed this time, instead of file-backed before) triggered the issue again.

Once more, for reference, here is the current error.

At the same time as you asked:

sudo ls -al /var/lib/lxd/containers
total 53
drwx--x--x  3 root root     4096 Jan 24 14:02 .
drwxr-xr-x 10 lxd  nogroup  4096 Jan 24 14:02 ..
-rw-r--r--  1 root root    39328 Jan 24 14:02 lxc-monitord.log
lrwxrwxrwx  1 root root       47 Jan 24 14:02 testkvm-xenial-from -> /var/lib/lxd/containers/testkvm-xenial-from.zfs
drwx------  2 root root        2 Jan 24 14:01 testkvm-xenial-from.zfs

testkvm-xenial-from is the container that failed to launch.

FYI, here is the series of lxc commands, grepped from my log, leading up to the issue.

Member

tych0 commented Jan 24, 2017

From your log:
err="remove /var/lib/lxd/images/2bbf30d457ff27675614ba63f16b594769e51c307c66f377f6a63d52bcdfc751.zfs: directory not empty" fingerprint=2bbf30d457ff27675614ba63f16b594769e51c307c66f377f6a63d52bcdfc751 lvl=eror msg="error deleting the image from storage backend" t=2017-01-24T07:31:31+0000

I suspect this is really the root cause of why the images didn't disappear.

Also, although you removed the images from zfs, you also have to remove them from LXD's database in order to really kill them off. I suspect you didn't do that, which is why you're in the confused state. However, the above log entry seems like a legit bug.

Hi tycho,
I deleted everything that "lxc image list" showed me until it was empty.
Is there a better way to "remove them from LXD's database"?

Member

tych0 commented Jan 24, 2017

When you say deleted, do you mean you ran "DELETE FROM images WHERE ..." in lxd's database, or just a zfs delete?

Member

tych0 commented Jan 24, 2017

Oh, I see. That should be okay, then. I was referring to,

Ok, that is definitely more than what should still be there, dropping all of it:
$ zfs destroy -r lxd

which isn't enough to actually delete the images.

Owner

stgraber commented Jan 24, 2017

The original error and some of the following problems could be explained by a "zfs umount -a" having been run on the machine. This would have unmounted all the ZFS mountpoints that LXD relies on and removed the mountpoints, making all our symlinks invalid.

Then deleting containers and images would have made things worse, as LXD wouldn't have been able to detect that they were on ZFS, causing partial image and container removal, leaving the zfs entries behind and making a big mess on the system.
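Whether an empty *.zfs directory is an unmounted mountpoint (so you are looking at the bare host directory) or a genuinely empty mounted dataset can be told apart by checking /proc/mounts. A sketch, using the container path from the report above:

```shell
# A path is an active mountpoint iff it appears as field 2 in /proc/mounts.
p=/var/lib/lxd/containers/testkvm-xenial-from.zfs   # path from the report
if awk -v p="$p" '$2 == p { found = 1 } END { exit !found }' /proc/mounts; then
    echo "mounted: an empty dir really is an empty dataset"
else
    echo "not mounted: the symlink points at a bare host directory"
fi
```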

Owner

stgraber commented Jan 24, 2017

In your last example, this would mean that "testkvm-xenial-from.zfs" is empty as it's not mounted.

Owner

stgraber commented Jan 24, 2017

Yeah, the output of both "zfs list -t all" and "cat /proc/mounts" would be useful to see what's going on. That and confirming that the directory isn't empty.

@stgraber stgraber added the Incomplete label Jan 24, 2017

FYI - since I run the automation that seems to work as a reproducer on multiple systems, I've found that my s390x box also hits the same case (my x86 system has no zfs backing atm; I'm pretty sure it would hit it as well). That at least confirms it is not just "one" system; it seems reproducible on more than that.

After hitting the error on this system I again see this issue and can't launch new containers.

And yeah, Stephane - it seems empty right when the issue occurs.

$ sudo ls -al /var/lib/lxd/containers
total 49
drwx--x--x 4 root root     4096 Jan 25 02:30 .
drwxr-xr-x 9 lxd  nogroup  4096 Jan 25 02:30 ..
-rw-r--r-- 1 root root    34911 Jan 25 02:31 lxc-monitord.log
lrwxrwxrwx 1 root root       47 Jan 25 02:27 testkvm-xenial-from -> /var/lib/lxd/containers/testkvm-xenial-from.zfs
drwx------ 2 root root        2 Jan 25 02:27 testkvm-xenial-from.zfs
lrwxrwxrwx 1 root root       48 Jan 25 02:30 testkvm-xenial-test2 -> /var/lib/lxd/containers/testkvm-xenial-test2.zfs
drwx------ 2 root root        2 Jan 25 02:27 testkvm-xenial-test2.zfs

$ sudo ls -al /var/lib/lxd/containers/testkvm-xenial-test2.zfs
total 5
drwx------ 2 root root    2 Jan 25 02:27 .
drwx--x--x 4 root root 4096 Jan 25 02:30 ..

zfs reports that as a known mountpoint, but while empty, it seems mounted to me.
Full list of zfs and /proc/mounts
Here is the excerpt that should be the most interesting part:

$ sudo zfs list -t all | grep test2
lxd/containers/testkvm-xenial-test2                                                     64K  32,3G    96K  /var/lib/lxd/containers/testkvm-xenial-test2.zfs

$ cat /proc/mounts | grep test2
lxd/containers/testkvm-xenial-test2 /var/lib/lxd/containers/testkvm-xenial-test2.zfs zfs rw,relatime,xattr,noacl 0 0
/dev/dasda1 /var/lib/lxd/devices/testkvm-xenial-test2/disk.var-lib-uvtool-libvirt ext4 rw,relatime,errors=remount-ro,data=ordered 0 0

Your comments got me thinking about the mount state, and I wanted to see if anything changes if I do sudo zfs mount -a. Doing so I got:

$ sudo zfs mount -a
cannot mount '/var/lib/lxd/images/7a53ade547cfc7279b36d35dc5463f90a38fa50893ba90529179a87c452213b9.zfs': directory is not empty

Recognizing that hash from the error, I found that this is just the image I wanted to spawn (Xenial):

$ lxc image list 7a53ade547cf
+-------+--------------+--------+---------------------------------------------+-------+----------+------------------------------+
| ALIAS | FINGERPRINT  | PUBLIC |                 DESCRIPTION                 | ARCH  |   SIZE   |         UPLOAD DATE          |
+-------+--------------+--------+---------------------------------------------+-------+----------+------------------------------+
|       | 7a53ade547cf | no     | ubuntu 16.04 LTS s390x (daily) (20170119.1) | s390x | 137.13MB | Jan 25, 2017 at 7:27am (UTC) |
+-------+--------------+--------+---------------------------------------------+-------+----------+------------------------------+

And in there is:

$ sudo ls -laF /var/lib/lxd/images/7a53ade547cfc7279b36d35dc5463f90a38fa50893ba90529179a87c452213b9.zfs
total 20
drwxr-xr-x  4 root root 4096 Jan 25 02:27 ./
drwx------ 16 root root 4096 Jan 25 02:27 ../
-rw-r--r--  1 root root 1566 Jan 19 12:01 metadata.yaml
drwxr-xr-x 21 root root 4096 Jan 24 21:43 rootfs/
drwxr-xr-x  2 root root 4096 Jan 19 12:01 templates/

So I'd assume what happened is that at some point the Xenial base image (not the individual guest) got unmounted. This was not caught by any tooling, and because of that, file content got placed in the path.
Later on, other parts stumble over being unable to mount the path, and that is the broken state I end up in.

I moved the content off that path, as I wanted to compare it with what would be there when zfs is mounted, but the zfs mount was empty. After mounting the zfs path onto the dir, I retried starting an instance, but since the zfs mount for the base image is empty, it still runs into: No such file or directory - failed to get real path for '/var/lib/lxd/containers/testkvm-xenial-test3/rootfs'.

Deleting the image via lxc image delete 7a53ade547cf unmounts the path again, as expected; e.g. cat /proc/mounts | grep 7a53ad is now empty. Launching an instance now, I'd have expected all blockers to be out of the way, but it runs into the same issue. The same image hash got re-synced (I saw the download progress, and it now shows up in lxc image list), the path got mounted (it is in /proc/mounts), but it is still empty, and the effective error for the user stays the same as before.

Cleaning up and switching to dir-backed for now - please advise how to debug further.

Owner

stgraber commented Jan 25, 2017

Ok, so that seems to be pretty reliably reproducible for you.

Can you give us step by step instructions (commands please) to get into that state from a cleanly installed LXD host with ZFS?

Since you're the only one who's ever reported anything like this, there must be something odd somewhere in there causing a zfs mount failure at some point, but without being able to reproduce this ourselves, the odds of figuring it out are pretty slim.

Sure,
one could surely drop a lot from it, but I don't know what, so the current best way to likely trigger it is to clone the migration tests.

Run stages 1 & 2 from test-dev-ppa.sh and just comment out the rest - so far, with zfs enabled, it ends up in that state in about 50% of the cases. It seems to always work on the first try, so just run it in a loop until you are in the bad state.

If you want to start simple, these are all the lxc commands from my log leading to the issue. You'd just have to pick up a few minor things (like my kvm profile) from the git above. Much simpler for sure, but since I don't know whether what I do in the containers is important, I wanted to pass on the full set as well.

Let me know if this works to reproduce - otherwise we'll have to consider me passing you a login to a broken system.

powersj commented Feb 28, 2017

Ran into this same issue today on our amd64 test system:
http://paste.ubuntu.com/24084788/

If there is additional data you wish me to collect, let me know. I will try to get steps to reproduce, but my observation has been that it starts happening after we run cpaelzer's tests, as he describes above.

powersj commented Mar 11, 2017

Another time:
https://paste.ubuntu.com/24158214/

I have noticed a correlation when this failure occurs: I have existing containers running on the daily image, and when a container attempts to launch, it also attempts to download the new daily image.

I have also noticed that if I go into /var/lib/lxd/images and blow away the offending image, everything works again. However, in this case I have tests running on other containers, so I do not want to do that.

Member

brauner commented Mar 11, 2017

@powersj, that sounds like a race between image {deletion,creation} and container creation. Storage-api capable LXD instances are a little better in this regard, since I gave them a slightly better locking mechanism. If you have machines using LXD 2.11, can you try to reproduce there?

powersj commented Mar 13, 2017

I have updated our amd64 and ppc64el systems to LXD 2.11 via the ppa. Will report back later this week.

powersj commented Mar 15, 2017

We had LXD failures occur last night while running on LXD 2.11. However, when I got on the system and tried to launch images, I was able to. In the past, to eventually recover from this situation, I had to go in and delete the offending image in /var/lib/lxd/images. The error message has also changed, so I am hoping this is due to your locking mechanism:

https://paste.ubuntu.com/24182975/

Member

brauner commented Mar 15, 2017

@powersj, thanks! Can you please also provide the output of `zfs list -t all`? :)

powersj commented Mar 15, 2017

Here you are: https://paste.ubuntu.com/24183037/

@brauner I can also get you on the system if you want.

Member

brauner commented Mar 15, 2017

Hm, I wonder whether this is the same error. If it is, then we should be able to fix it, because the fact that the umount failed might be caused by not using MNT_DETACH somewhere in our code.
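The eventual fix referenced below ("zfs: try lazy umount if zfs umount fails") boils down to this control flow: try a normal unmount and, if it fails (e.g. the mount is busy), detach it lazily, the equivalent of umount -l / umount(2) with MNT_DETACH. A stubbed sketch, since the real calls need root and a live mount:

```shell
# Stubs standing in for the real unmount calls.
normal_umount() { return 32; }                # pretend EBUSY: the mount is busy
lazy_umount()   { echo "lazy-detached $1"; }  # stands in for 'umount -l' (MNT_DETACH)

mp=/var/lib/lxd/images/example.zfs            # hypothetical mountpoint
normal_umount "$mp" || lazy_umount "$mp"
```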

jfkw commented Mar 15, 2017

I'm seeing the same error on Gentoo Linux with lxd-2.11 and ZFS. I previously ran lxd-2.8 without hitting this error. I upgraded through all versions between 2.9 and 2.11, but first used lxd again at version 2.11, at which point the error was present.

Config info on this system:

$ lxc info --show-log xenial01
Name: xenial01
Remote: unix:/var/lib/lxd/unix.socket
Architecture: x86_64
Created: 2017/03/15 18:58 UTC
Status: Stopped
Type: persistent
Profiles: default

Log:

lxc 20170315185837.652 WARN  lxc_cgmanager - cgroups/cgmanager.c:cgm_get:992 - do_cgm_get exited with error

lxc 20170315185837.724 ERROR lxc_conf - conf.c:mount_rootfs:865 - Permission denied - failed to get real path for '/var/lib/lxd/containers/xenial01/rootfs'

lxc 20170315185837.724 ERROR lxc_conf - conf.c:setup_rootfs:1279 - failed to mount rootfs

lxc 20170315185837.724 ERROR lxc_conf - conf.c:do_rootfs_setup:3750 - failed to setup rootfs for 'xenial01'

lxc 20170315185837.724 ERROR lxc_conf - conf.c:lxc_setup:3832 - Error setting up rootfs mount after spawn

lxc 20170315185837.724 ERROR lxc_start - start.c:do_start:811 - Failed to setup container "xenial01".

lxc 20170315185837.724 ERROR lxc_sync - sync.c:__sync_wait:57 - An error occurred in another process (expected sequence number 3)

lxc 20170315185837.734 ERROR lxc_start - start.c:__lxc_start:1346 - Failed to spawn container "xenial01".

lxc 20170315185837.772 ERROR lxc_conf - conf.c:run_buffer:405 - Script exited with status 1.

lxc 20170315185837.772 ERROR lxc_start - start.c:lxc_fini:546 - Failed to run lxc.hook.post-stop for container "xenial01".

lxc 20170315185837.772 WARN  lxc_commands - commands.c:lxc_cmd_rsp_recv:172 - Command get_init_pid failed to receive response: Connection reset by peer.

lxc 20170315185837.772 WARN  lxc_commands - commands.c:lxc_cmd_rsp_recv:172 - Command get_init_pid failed to receive response: Connection reset by peer.

lxc 20170315185837.773 WARN  lxc_cgmanager - cgroups/cgmanager.c:cgm_get:992 - do_cgm_get exited with error

lxc 20170315185837.773 WARN  lxc_cgmanager - cgroups/cgmanager.c:cgm_get:992 - do_cgm_get exited with error

lxc 20170315185853.583 WARN  lxc_cgmanager - cgroups/cgmanager.c:cgm_get:992 - do_cgm_get exited with error

lxc 20170315185853.639 WARN  lxc_cgmanager - cgroups/cgmanager.c:cgm_get:992 - do_cgm_get exited with error

lxc 20170315185853.655 WARN  lxc_cgmanager - cgroups/cgmanager.c:cgm_get:992 - do_cgm_get exited with error
% lxc info
config:
  storage.zfs_pool_name: lxd
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
api_status: stable
api_version: "1.0"
auth: trusted
public: false
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    (REDACTED)
    -----END CERTIFICATE-----
  certificate_fingerprint: (REDACTED)
  driver: lxc
  driver_version: 2.0.7
  kernel: Linux
  kernel_architecture: x86_64
  kernel_version: 4.10.1-gentoo
  server: lxd
  server_pid: 29349
  server_version: "2.11"
  storage: zfs
  storage_version: 0.7.0-rc3_123_gebd9aa8c
% sudo zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd    218G  1.70G   216G         -     0%     0%  1.00x  ONLINE  -
% sudo zfs list -t all
NAME                                                                                   USED  AVAIL  REFER  MOUNTPOINT
lxd                                                                                   1.70G   209G    19K  none
lxd/containers                                                                        6.42M   209G    19K  none
lxd/containers/wheezy01                                                               1.42M   209G   149M  /var/lib/lxd/storage-pools/lxd/containers/wheezy01
lxd/containers/xenial01                                                               4.98M   209G   353M  /var/lib/lxd/storage-pools/lxd/containers/xenial01
lxd/deleted                                                                             38K   209G    19K  none
lxd/deleted/images                                                                      19K   209G    19K  none
lxd/images                                                                            1.69G   209G    19K  none
lxd/images/2b6af665ca618c1e9d0f18a444a13d4da8c7daca74e636a5a30fa70e82960401            149M   209G   149M  none
lxd/images/2b6af665ca618c1e9d0f18a444a13d4da8c7daca74e636a5a30fa70e82960401@readonly      0      -   149M  -
lxd/images/315bedd32580c3fb79fd2003746245b9fe6a8863fc9dd990c3a2dc90f4930039            345M   209G   345M  /var/lib/lxd/images/315bedd32580c3fb79fd2003746245b9fe6a8863fc9dd990c3a2dc90f4930039.zfs
lxd/images/315bedd32580c3fb79fd2003746245b9fe6a8863fc9dd990c3a2dc90f4930039@readonly      0      -   345M  -
lxd/images/60554d9728a67e0105729e78dccba15dd47af7bb37ab94f85d050bc9c493a1a9            375M   209G   375M  none
lxd/images/60554d9728a67e0105729e78dccba15dd47af7bb37ab94f85d050bc9c493a1a9@readonly      0      -   375M  -
lxd/images/64646a69e8f77a5134d4078e7e3a5c5fa573c15de5303dbfab534eaf687c96aa            353M   209G   353M  none
lxd/images/64646a69e8f77a5134d4078e7e3a5c5fa573c15de5303dbfab534eaf687c96aa@readonly      0      -   353M  -
lxd/images/9276757deba95f3a0c7f0081428c9e369e406763d17784fc58860aed17050fca            149M   209G   149M  /var/lib/lxd/images/9276757deba95f3a0c7f0081428c9e369e406763d17784fc58860aed17050fca.zfs
lxd/images/9276757deba95f3a0c7f0081428c9e369e406763d17784fc58860aed17050fca@readonly      0      -   149M  -
lxd/images/cf76d41f6b6698861b1bb5e05d111b05172967a3abbb64f9286f9cfb05cd90f0            352M   209G   352M  /var/lib/lxd/images/cf76d41f6b6698861b1bb5e05d111b05172967a3abbb64f9286f9cfb05cd90f0.zfs
lxd/images/cf76d41f6b6698861b1bb5e05d111b05172967a3abbb64f9286f9cfb05cd90f0@readonly      0      -   352M  -
lxd/images/ec522e80a83348afdd8c42b8dd322c98f589d1db6dfe73f8c664fa4826929205           4.74M   209G  4.74M  none
lxd/images/ec522e80a83348afdd8c42b8dd322c98f589d1db6dfe73f8c664fa4826929205@readonly      0      -  4.74M  -
lxd/images/f323ce7a65bb1541f9c9281fadc2955f50afb4513814389911487627eec14b5b           4.65M   209G  4.65M  /var/lib/lxd/images/f323ce7a65bb1541f9c9281fadc2955f50afb4513814389911487627eec14b5b.zfs
lxd/images/f323ce7a65bb1541f9c9281fadc2955f50afb4513814389911487627eec14b5b@readonly      0      -  4.65M  -
% cat /proc/mounts
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,nosuid,relatime,size=10240k,nr_inodes=2036437,mode=755 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
/dev/sda1 / ext4 rw,noatime,nouser_xattr,data=ordered 0 0
tmpfs /run tmpfs rw,nodev,relatime,size=1629656k,mode=755 0 0
mqueue /dev/mqueue mqueue rw,nosuid,nodev,noexec,relatime 0 0
shm /dev/shm tmpfs rw,nosuid,nodev,noexec,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,nosuid,nodev,noexec,relatime 0 0
configfs /sys/kernel/config configfs rw,nosuid,nodev,noexec,relatime 0 0
fusectl /sys/fs/fuse/connections fusectl rw,nosuid,nodev,noexec,relatime 0 0
cgroup_root /sys/fs/cgroup tmpfs rw,nosuid,nodev,noexec,relatime,size=10240k,mode=755 0 0
openrc /sys/fs/cgroup/openrc cgroup rw,nosuid,nodev,noexec,relatime,release_agent=/lib64/rc/sh/cgroup-release-agent.sh,name=openrc 0 0
cpuset /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset,clone_children 0 0
cpu /sys/fs/cgroup/cpu cgroup rw,nosuid,nodev,noexec,relatime,cpu 0 0
cpuacct /sys/fs/cgroup/cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpuacct 0 0
blkio /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
memory /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
devices /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
freezer /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
net_cls /sys/fs/cgroup/net_cls cgroup rw,nosuid,nodev,noexec,relatime,net_cls 0 0
perf_event /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event,release_agent=/run/cgmanager/agents/cgm-release-agent.perf_event 0 0
net_prio /sys/fs/cgroup/net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_prio 0 0
hugetlb /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb,release_agent=/run/cgmanager/agents/cgm-release-agent.hugetlb 0 0
pids /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids,release_agent=/run/cgmanager/agents/cgm-release-agent.pids 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,nosuid,nodev,noexec,relatime 0 0
lxd/images/315bedd32580c3fb79fd2003746245b9fe6a8863fc9dd990c3a2dc90f4930039 /var/lib/lxd/images/315bedd32580c3fb79fd2003746245b9fe6a8863fc9dd990c3a2dc90f4930039.zfs zfs ro,xattr,noacl 0 0
lxd/images/9276757deba95f3a0c7f0081428c9e369e406763d17784fc58860aed17050fca /var/lib/lxd/images/9276757deba95f3a0c7f0081428c9e369e406763d17784fc58860aed17050fca.zfs zfs ro,xattr,noacl 0 0
lxd/images/cf76d41f6b6698861b1bb5e05d111b05172967a3abbb64f9286f9cfb05cd90f0 /var/lib/lxd/images/cf76d41f6b6698861b1bb5e05d111b05172967a3abbb64f9286f9cfb05cd90f0.zfs zfs ro,xattr,noacl 0 0
lxd/images/f323ce7a65bb1541f9c9281fadc2955f50afb4513814389911487627eec14b5b /var/lib/lxd/images/f323ce7a65bb1541f9c9281fadc2955f50afb4513814389911487627eec14b5b.zfs zfs ro,xattr,noacl 0 0
tmpfs /var/lib/lxd/shmounts tmpfs rw,relatime,size=100k,mode=711 0 0
tmpfs /var/lib/lxd/devlxd tmpfs rw,relatime,size=100k,mode=755 0 0
lxd/containers/wheezy01 /var/lib/lxd/storage-pools/lxd/containers/wheezy01 zfs rw,xattr,noacl 0 0
lxd/containers/xenial01 /var/lib/lxd/storage-pools/lxd/containers/xenial01 zfs rw,xattr,noacl 0 0

brauner added a commit to brauner/lxd that referenced this issue Mar 16, 2017

zfs: try lazy umount if zfs umount fails
Supposedly fixes #2814.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

@stgraber stgraber closed this in #3081 Mar 16, 2017
