LXD with ZFS - Containers keep running out of disk space #1412

Closed
egee-irl opened this Issue Feb 3, 2017 · 9 comments

Comments

4 participants

egee-irl commented Feb 3, 2017

The problem I am having is either a bug in LXD or ZFS, or poor documentation of how disk space works inside containers backed by ZFS volumes.

Use Case:
The host machine has 500GB available and is running 5 containers, each consuming between 1GB and 8GB.
A container attempts to run a process that consumes disk space. Once 7.9GB is used, the process stops and the container crashes. Attempting to restart the container fails due to lack of space.

error: chmod /var/lib/lxd/containers/my-container/backup.yaml: no space left on device

Running sudo zfs list shows each container dataset with USED and AVAIL columns; AVAIL is 0 for every container.

None of the containers has any quota set, and this artificial limit of roughly 8GB appears to be imposed by LXD. The documentation for Ubuntu's ZFS is rather sparse, and the Oracle documentation appears to describe a different implementation, since not all of its commands work with Ubuntu's ZFS.

If this is not a bug with LXD, there needs to be information somewhere on how to change this.
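
For reference, a minimal sketch for inspecting per-container usage and applying a root-disk limit, assuming the ZFS backend and a root disk inherited from the default profile; the pool name lxd, the 20GB value, and my-container are placeholders:

# Show usage and any quota for each container dataset (pool name "lxd" is an assumption)
sudo zfs list -o name,used,avail,refer,quota -r lxd/containers

# Limit the root disk of every container using the default profile;
# with the ZFS backend, LXD applies this as a ZFS quota
lxc profile device set default root size 20GB

# Or override the root disk for a single container ("my-container" is a placeholder)
lxc config device add my-container root disk path=/ size=20GB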

stgraber commented Feb 3, 2017

Please post:

sudo zfs get all NAME-OF-POOL/containers/my-container
lxc config show --expanded my-container

stgraber added the Incomplete label Feb 3, 2017

egee-irl commented Feb 3, 2017

The issue seems to be that the default zpool size is only 20GB.

sudo zfs get all NAME-OF-POOL/containers/rust

NAME PROPERTY VALUE SOURCE
lxd/containers/rust type filesystem -
lxd/containers/rust creation Sun Jan 29 18:54 2017 -
lxd/containers/rust used 6.99G -
lxd/containers/rust available 0 -
lxd/containers/rust referenced 7.28G -
lxd/containers/rust compressratio 1.41x -
lxd/containers/rust mounted yes -
lxd/containers/rust origin lxd/images/683cdd3938706deeb66f19854ee27ef0c75e2594ce60ad4525357c3ce46f9772@readonly -
lxd/containers/rust quota none default
lxd/containers/rust reservation none default
lxd/containers/rust recordsize 128K default
lxd/containers/rust mountpoint /var/lib/lxd/containers/rust.zfs local
lxd/containers/rust sharenfs off default
lxd/containers/rust checksum on default
lxd/containers/rust compression on inherited from lxd
lxd/containers/rust atime on default
lxd/containers/rust devices on default
lxd/containers/rust exec on default
lxd/containers/rust setuid on default
lxd/containers/rust readonly off default
lxd/containers/rust zoned off default
lxd/containers/rust snapdir hidden default
lxd/containers/rust aclinherit restricted default
lxd/containers/rust canmount on default
lxd/containers/rust xattr on default
lxd/containers/rust copies 1 default
lxd/containers/rust version 5 -
lxd/containers/rust utf8only off -
lxd/containers/rust normalization none -
lxd/containers/rust casesensitivity sensitive -
lxd/containers/rust vscan off default
lxd/containers/rust nbmand off default
lxd/containers/rust sharesmb off default
lxd/containers/rust refquota none default
lxd/containers/rust refreservation none default
lxd/containers/rust primarycache all default
lxd/containers/rust secondarycache all default
lxd/containers/rust usedbysnapshots 0 -
lxd/containers/rust usedbydataset 6.99G -
lxd/containers/rust usedbychildren 0 -
lxd/containers/rust usedbyrefreservation 0 -
lxd/containers/rust logbias latency default
lxd/containers/rust dedup off default
lxd/containers/rust mlslabel none default
lxd/containers/rust sync standard default
lxd/containers/rust refcompressratio 1.43x -
lxd/containers/rust written 6.99G -
lxd/containers/rust logicalused 9.89G -
lxd/containers/rust logicalreferenced 10.5G -
lxd/containers/rust filesystem_limit none default
lxd/containers/rust snapshot_limit none default
lxd/containers/rust filesystem_count none default
lxd/containers/rust snapshot_count none default
lxd/containers/rust snapdev hidden default
lxd/containers/rust acltype off default
lxd/containers/rust context none default
lxd/containers/rust fscontext none default
lxd/containers/rust defcontext none default
lxd/containers/rust rootcontext none default
lxd/containers/rust relatime on temporary
lxd/containers/rust redundant_metadata all default
lxd/containers/rust overlay off default

lxc config show --expanded rust

name: rust
profiles:
- default
config:
  image.architecture: amd64
  image.description: ubuntu 16.04 LTS amd64 (release) (20170113)
  image.label: release
  image.os: ubuntu
  image.release: xenial
  image.serial: "20170113"
  image.version: "16.04"
  volatile.base_image: 683cdd3938706deeb66f19854ee27ef0c75e2594ce60ad4525357c3ce46f9772
  volatile.eth0.hwaddr: 00:16:3e:d8:db:a8
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: STOPPED
  volatile.root.hwaddr: 00:16:3e:3a:3e:f0
  volatile.root.name: eth1
devices:
  eth0:
    name: eth0
    nictype: macvlan
    parent: enp1s0
    type: nic
  root:
    path: /
    type: disk
ephemeral: false
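
In this output, available is 0 while quota is none, so the dataset is not hitting a quota of its own; the pool itself is out of space. A quick way to confirm this at the pool level (assuming the pool is named lxd, as above):

# Pool-level capacity: SIZE, ALLOC and FREE for the whole zpool
sudo zpool list lxd

# Space accounting for every dataset in the pool
sudo zfs list -o space -r lxd
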
stgraber commented Feb 3, 2017

By default, the zpool is sized at 20% of the partition behind /var/lib/lxd, with a minimum of 15GB and a maximum of 100GB.

The value selected for your system is clearly shown during "lxd init" and is only used if you just hit enter rather than specifying a larger one.

Note that for production environments you should really be using a dedicated partition or disk for ZFS: performance on file-backed storage is rather poor, and should you somehow fill the underlying partition, ZFS may crash and cause data loss.

ZFS doesn't make it particularly easy to grow a file-backed zpool. The easiest approach is to temporarily attach a second backing disk, detach and destroy the original, recreate it at the new size, attach it again, and finally detach and remove the temporary one. Something along these lines (for 50GB):

truncate -s 50G /var/lib/lxd/zfs.tmp
zpool attach POOL-NAME /var/lib/lxd/zfs.img /var/lib/lxd/zfs.tmp
zpool detach POOL-NAME /var/lib/lxd/zfs.img
rm /var/lib/lxd/zfs.img
truncate -s 50G /var/lib/lxd/zfs.img
zpool attach POOL-NAME /var/lib/lxd/zfs.tmp /var/lib/lxd/zfs.img
zpool detach POOL-NAME /var/lib/lxd/zfs.tmp
rm /var/lib/lxd/zfs.tmp

Note that the above hasn't been tested, so I may have forgotten something.

egee-irl commented Feb 3, 2017

I thought it might have been an initial configuration setting. I've only used ext4 previously, so I wasn't expecting to run into limits like this.

The solution you posted worked, though I ran into a couple of issues while trying to detach. It eventually worked, and I now have a 50GB disk.

egee-irl closed this Feb 3, 2017

ossie-git commented Feb 28, 2017

When detaching, I get the error: "cannot detach /var/lib/lxd/zfs.img: no valid replicas"

Does this mean it is still replicating? Also, is there some way to know how far along it is in the mirroring process?

stgraber commented Feb 28, 2017

zpool status
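
To expand on that: zpool attach turns the backing file into a two-way mirror, which is why the detach is refused until the new file has finished resilvering. Progress shows up in the pool status; POOL-NAME below is a placeholder.

# Show pool health and resilver progress
sudo zpool status POOL-NAME
# While the copy is running, the "scan:" line reports "resilver in progress" with a
# percentage; once it reads "resilvered ... with 0 errors", the old file can be detached.
watch -n 5 sudo zpool status POOL-NAME   # optional: refresh every 5 seconds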

ossie-git commented Feb 28, 2017

Thanks. I just got it working. By the way, I think you need to add:

zpool set autoexpand=on <name_of_zpool>

as the first command.

Also, as a final check, you can run:

zpool list

or

zpool get health,free,allocated <name_of_zpool>

to make sure everything worked out properly.
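
Putting the steps from this thread together, an end-to-end version might look like the sketch below. It is untested; POOL-NAME, the 50G size, and the /var/lib/lxd/zfs.img path come from the earlier comments and may differ on other setups (newer LXD installs keep the image under /var/lib/lxd/disks/).

sudo zpool set autoexpand=on POOL-NAME        # let the pool grow once the larger file takes over

sudo truncate -s 50G /var/lib/lxd/zfs.tmp
sudo zpool attach POOL-NAME /var/lib/lxd/zfs.img /var/lib/lxd/zfs.tmp
sudo zpool status POOL-NAME                   # wait for the resilver to finish
sudo zpool detach POOL-NAME /var/lib/lxd/zfs.img
sudo rm /var/lib/lxd/zfs.img

sudo truncate -s 50G /var/lib/lxd/zfs.img
sudo zpool attach POOL-NAME /var/lib/lxd/zfs.tmp /var/lib/lxd/zfs.img
sudo zpool status POOL-NAME                   # again, wait for the resilver
sudo zpool detach POOL-NAME /var/lib/lxd/zfs.tmp
sudo rm /var/lib/lxd/zfs.tmp

sudo zpool list POOL-NAME                     # SIZE should now reflect the 50G backing file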

matlink commented Jun 18, 2017

I didn't run:

zpool set autoexpand=on <name_of_zpool>

before running @stgraber's instructions, and now I have a ZFS pool that hasn't expanded:

$ sudo zpool list
NAME      SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
default  14,9G  14,4G   476M       35G    69%    96%  1.00x  ONLINE  -

Do you know how I can expand it retrospectively?

matlink commented Jun 18, 2017

Well, sorry, I found:

sudo zpool online -e default /var/lib/lxd/disks/default.img

That did the trick.
