New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement XFS quota for overlay2 #24771

Merged
merged 2 commits into from Oct 17, 2016

Conversation

@amir73il
Contributor

amir73il commented Jul 18, 2016

- What I did

Everybody wants to set disk quota limits on their containers.
Devicemapper got support for --storage-opt size, then zfs
and btrfs.

This pull request adds support for --storage-opt size to overlay2.

The semantics of the size parameter are slightly different then those of devmapper/zfs/btrfs.
For the latter, it means total size limit of composite base image + diff.
For overlay, it means size limit for the diff directory.
I chose to use the same parameter to simplify things a bit.

- How I did it

The overlay2 driver (just as well as overlay and aufs) uses directories
to store the file differences from base image.
XFS supports the project quotas feature, which allows setting a disk quota
for a sub directory and its descendants when the directory is created.
The --storage-opt size option to overlay2 utilizes XFS project quotas to
set a hard limit on the disk usage of a container directory.
The option is therefore applicable only when backing fs is xfs and
only when it was mounted with the 'pquota' mount option to enable
project quota support.

Why only overlay2?
Same support could be added to aufs, but it seems to belong to the past,
so I left it there.
overlay (1) cannot use project quotas, because hardlinks are not allowed across
directories that are assigned to different project quotas.

Why only xfs?
XFS has had project quotas for a long time. Ext4 is gained project quota support
in kernel v4.5 and the quota control API has been generalized to fit
both XFS and Ext4.
When these kernels get to distributions, the specific XFS project quota code can
be amended to serve overlay2 over both xfs and ext4 backing fs.

Known issues

  • Driver ignores the /etc/projid configuration file, so it does not coordinate container project id's with the xfs_quota tool.
  • Project quota entries are not cleared when container is removed.

- How to verify it

# grep xfs /etc/fstab
/dev/mapper/test-xfs /var/lib/docker xfs pquota 0 0

# mount | grep xfs
/dev/mapper/test-xfs on /var/lib/docker type xfs (rw,relatime,attr2,inode64,prjquota)

# dockerd --storage-driver=overlay2 -D&
# docker run --storage-opt size=5g -t -i ubuntu /bin/bash
...
root@3812172791eb:/# df -h .
Filesystem      Size  Used Avail Use% Mounted on
overlay         5.0G   12K  5.0G   1% /

- Description for the changelog

Adds support for --storage-opt size to overlay2 over xfs.

- A picture of a cute animal (not mandatory but encouraged)
cutebunny

@cpuguy83

This comment has been minimized.

Contributor

cpuguy83 commented Jul 19, 2016

Wow pretty cool, thanks for working on this!

@amir73il amir73il force-pushed the aquasecurity:xfs_quota branch from d2e1095 to c121274 Jul 19, 2016

@albamc albamc referenced this pull request Jul 19, 2016

Closed

Xfs quota for overlay #24807

@crosbymichael

This comment has been minimized.

Member

crosbymichael commented Aug 12, 2016

Moving to code review, sorry for the delay.

@amir73il can you rebase this on master?

@crosbymichael crosbymichael self-assigned this Aug 12, 2016

@dmcgowan

This comment has been minimized.

Member

dmcgowan commented Aug 12, 2016

Excellent work, this is really cool to see.

From a code layout perspective, I would like to see the xfs specific logic go into a separate file. Ideally we could define an interface for quota management which could apply the quotas on create and be initialized by the storage options. The storage option parsing should be moved to one location, the xfs. prefixed options could be collected together and used to initialize the xfs quota manager.

With the possibility of supported ext4 in the near future, how similar to do you think the interface to apply the quotas will be? Having less xfs-specific logic in the main overlay file should help make adding support for other backing fs in the future cleaner if the usage is going to differ significantly.

@crosbymichael crosbymichael changed the title from Xfs quota to Implement XFS quota for overlay2 Aug 12, 2016

@amir73il

This comment has been minimized.

Contributor

amir73il commented Aug 13, 2016

@dmcgowan I fully agree with the code layout comments. Just wanted to put
the code out there before sorting everything out.
Ext4 project quota feature is going to be the same as xfs project quota.
The kernel ABI will be new and common to both ext4 and xfs, but xfs
specific ioctls will still be needed to support older kernels.
@albamc, can you take the task of moving xfs code to common files and
integrate it with your overlay patch?
I do suggest that you use your inode number approach and drop my projid
assignment code for start.

@amir73il amir73il force-pushed the aquasecurity:xfs_quota branch from c121274 to 83217b1 Sep 18, 2016

@amir73il

This comment has been minimized.

Contributor

amir73il commented Sep 18, 2016

@crosbymichael rebased to master. Sorry for the delay.
@dmcgowan I addressed your code review comments about code layout.
Left a TODO comment for adding ext4/xfs project quota for kernels >= v4.5.
Many thanks to @albamc for factoring out the project quota bits into a separate class
and for making the project id allocator persistent.

@amir73il amir73il force-pushed the aquasecurity:xfs_quota branch 4 times, most recently from db9bc2c to b18464e Sep 18, 2016

return "", err
}
backingFsDev := path.Join(home, "backingFsDev")

This comment has been minimized.

@dmcgowan

dmcgowan Sep 19, 2016

Member

can we make this name something more meaningful

This comment has been minimized.

@amir73il

amir73il Sep 20, 2016

Contributor

How about backingFsBlockDev? that's all it is, an easy handle to perform the set/get quota limit ioctls, which are performed on global fs level, not on directory level.

}
backingFsDev := path.Join(home, "backingFsDev")
syscall.Unlink(backingFsDev)

This comment has been minimized.

@dmcgowan

dmcgowan Sep 19, 2016

Member

Does this need to be removed every time or is this just to guarantee the device node is correct in case the home directory device changes?

This comment has been minimized.

@amir73il

amir73il Sep 20, 2016

Contributor

It is just a guarantee. The way xfs_quota tool does it to get the xfs mount point as argument and deduce the the block device from /proc/mounts. So having the backingFsDev actual node (as a duplicate of the original node (e.g. /dev/dm-1) is just a matter of convenience and less complex code.

return err
}
// enforce container disk project quota
if err := d.quotaCtl.SetQuota(dir, driver.options.quota); err != nil {

This comment has been minimized.

@dmcgowan

dmcgowan Sep 19, 2016

Member

Could protect this block in if driver.options.quota.Size > 0 and enter the outer block if len(storageOpt) > 0

This comment has been minimized.

@amir73il

amir73il Sep 20, 2016

Contributor

I guess you are right. I copied the parseStorageOpt() chunk from btrfs.go which does the same, but I can change it if you think it is important.

This comment has been minimized.

@dmcgowan

dmcgowan Sep 20, 2016

Member

I don't think it is important, but if you do push changes again I wouldn't mind seeing it changed :)

// NewQuotaCtl - initialize project quota support by finding the first project id
// to be used for the next container create
func NewQuotaCtl(basePath string) (*QuotaCtl, error) {

This comment has been minimized.

@dmcgowan

dmcgowan Sep 19, 2016

Member

At what point does this function determine that project quota is not supported. I was able to get it work with pquota option set, but then when I remounted without it there was no error on startup, only an error on container creation.

This comment has been minimized.

@amir73il

amir73il Sep 20, 2016

Contributor

The first thing that NewQuotaCtl() does is to call getMaxProjectID() and the first thing it does is check the project id on the home directory. This test will silently fail and cause projectQuotaSupported = false if XFS does not support project quotas.
This is not an error, because running overlay over xfs without project quota support is perfectly fine. Using the --size option on such a backfs is the error.
I guess I should add a comment about it.

//
// get max project id for next use
//
maxProjectID, err := getMaxProjectID(basePath)

This comment has been minimized.

@dmcgowan

dmcgowan Sep 19, 2016

Member

Ideally we could avoid the directory scanning used by this function when pquota is not supported.

This comment has been minimized.

@amir73il

amir73il Sep 20, 2016

Contributor

We do not get to directory scanning, because the first thing that getMaxProjectID() does is get the project id of the home directory. This test will fail if no pquota support.
If pquota is supported and home directory does not have a project id the check will succeed and return value of 0.
There is also a 'hidden' feature here - project quota can be used to set a quota limit (using xfs_quota tool) on the entire storage driver directory (/var/lib/docker/overlay2), to set it apart from the rest of the system. In that case, the root project id will be used as a "start offset" (e.g. echo docker:999 > /etc/projid) and all containers will be assigned project ids > 1000.
This is a way to prevent xfs_quota management from conflicting with docker.
I did not document this feature because I did not want to complicate. It can be documented later on in advanced documentation.

}
fsx.fsx_projid = C.__u32(projectID)
fsx.fsx_xflags |= C.FS_XFLAG_PROJINHERIT
_, _, errno = syscall.Syscall(syscall.SYS_IOCTL, getDirFd(dir), C.FS_IOC_FSSETXATTR,

This comment has been minimized.

@dmcgowan

dmcgowan Sep 19, 2016

Member

Will this show up as an xattr on the directory?

This comment has been minimized.

@amir73il

amir73il Sep 20, 2016

Contributor

No. This is not an extended attribute. It is an xfs internal inode flag (xflags). You can see them as well as the project id with the command:
xfs_io -c stat

@dmcgowan

This comment has been minimized.

Member

dmcgowan commented Sep 19, 2016

I was able to test this and verify the quotas work as expected. The detection of the feature does not seem to be working. Might be helpful to print the error on quota ctl creation at debug level as well.

@amir73il amir73il force-pushed the aquasecurity:xfs_quota branch from b18464e to 816528d Sep 20, 2016

@amir73il

This comment has been minimized.

Contributor

amir73il commented Sep 20, 2016

@dmcgowan thanks for the review. I addressed most of the comments by adding more documentation in the code. Regrading your statement that "The detection of the feature does not seem to be working", I am not sure what you mean. overlay2 driver automatically detects if pquota is enabled at Init() and prints a debug message with the boolean value of 'projectQuotaSupported'. If projectQuotaSupported is true, then --storage-opt can be used when creating containers. Otherwise (either because not xfs or not pquota), an error is generated on container create:
--storage-opt is supported only for overlay over xfs with 'pquota' mount option

What you described is how it was intended to work. Please let me know if I am missing something.

@dmcgowan

This comment has been minimized.

Member

dmcgowan commented Sep 20, 2016

I did a few more rounds of testing and created a script to help capture the results. Note I am using a loopback device to do the testing and fresh xfs created with default options.

My test script test_24771.sh can be found here https://gist.github.com/dmcgowan/a64d67a6ac067f40598a3ce9473c0823

Without pquota mount option (expected projectQuotaSupported=false)

# ./test_24771.sh t1
meta-data=/dev/loop0             isize=512    agcount=4, agsize=32768 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=4096   blocks=131072, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=855, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
/dev/loop0 on /tmp/tmp.8VQNWZEUOc type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
... removed irrelevant output ...
DEBU[0001] [graphdriver] trying provided driver: overlay2 
DEBU[0001] getMaxProjectID(/tmp/tmp.8VQNWZEUOc/overlay2) = 1 
DEBU[0001] backingFs=xfs,  projectQuotaSupported=true   

$ docker run -it --storage-opt "size=30MB" alpine /bin/sh
... removed irrelevant output ...
docker: Error response from daemon: Failed to set quota limit for projid 1 on /tmp/tmp.idgFjJZAS7/overlay2/backingFsBlockDev: function not implemented.

With pquota mount option (works as expected)

# ./test_24771.sh t2 -o pquota
meta-data=/dev/loop0             isize=512    agcount=4, agsize=32768 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=4096   blocks=131072, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=855, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
/dev/loop0 on /tmp/tmp.Ovy1vkiAi6 type xfs (rw,relatime,seclabel,attr2,inode64,prjquota)
... removed irrelevant output ...   
DEBU[0001] [graphdriver] trying provided driver: overlay2 
DEBU[0001] getMaxProjectID(/tmp/tmp.Ovy1vkiAi6/overlay2) = 1 
DEBU[0001] backingFs=xfs,  projectQuotaSupported=true

$ docker run -it --storage-opt "size=30MB" alpine /bin/sh
... removed irrelevant output ...
/ # dd if=/dev/zero of=testfile bs=1M count=64
dd: writing 'testfile': No space left on device
31+0 records in
29+1 records out

Kernel 4.7.2-101.fc23.x86_64

@amir73il

This comment has been minimized.

Contributor

amir73il commented Sep 20, 2016

@dmcgowan I see, so I will need to try to set a quota on a temp dir for a correct test of pquota support.
I think I also ran my test after remount and not after a fresh mkfs it did passed, strange...
Thanks for the test script. I will look into it

@dmcgowan

This comment has been minimized.

Member

dmcgowan commented Sep 20, 2016

@amir73il it was happening on non-temp directories prior to making the script, also wasn't sure if using a loopback had an effect. I would also be ok with an overlay2.allowquota options or something like that to explicitly turn it on through the daemon storage options if there is no fast and simple way to detect the flag on Init.

//
var d C.fs_disk_quota_t
d.d_version = C.FS_DQUOT_VERSION
d.d_id = fsx.fsx_projid

This comment has been minimized.

@albamc

albamc Sep 21, 2016

I think fsx.fsx_projid can be 0 because it wasn't initialized.
Am I wrong ?

@amir73il

This comment has been minimized.

Contributor

amir73il commented Sep 21, 2016

@amir73il amir73il force-pushed the aquasecurity:xfs_quota branch from 816528d to 20d1dbf Sep 21, 2016

@dmcgowan

This comment has been minimized.

Member

dmcgowan commented Oct 17, 2016

Fixes build for me (Fedora 23 4.7.6)

@amir73il

This comment has been minimized.

Contributor

amir73il commented Oct 17, 2016

Sent PR #27459

@dnephin

This comment has been minimized.

Member

dnephin commented Oct 18, 2016

This broke manpages generation. The daemon/graphdriver package is (indirectly) imported from cli/command. Previously the client package did not require linux/fs.h to compile.

I think we should add the man page generation as a build to CI to catch these problems. That will be much easier with something like #27359.

For now, can we move daemon/graphdriver/projectquota.go to either daemon/graphdriver/overlay2/ or daemon/graphdriver/quota ?

@dnephin dnephin referenced this pull request Oct 18, 2016

Merged

Move graphdriver quota #27472

@@ -198,8 +198,13 @@ The `-w` lets the command being executed inside directory given, here
$ docker run -it --storage-opt size=120G fedora /bin/bash
This (size) will allow to set the container rootfs size to 120G at creation time.
User cannot pass a size less than the Default BaseFS Size. This option is only

This comment has been minimized.

@ghostplant

ghostplant Nov 18, 2016

Contributor

According to the Doc, I am a little confused.
For xfs + overlay2 quota control, whether container rootfs size is limited to 120G, or the top layer is limited to 120G?

This comment has been minimized.

@amir73il

amir73il Nov 18, 2016

Contributor

Only the top layer.
So only the size of changes from root is limited, including the entire size of all copies up files.

This comment has been minimized.

@ghostplant

ghostplant Nov 19, 2016

Contributor

How about zfs + overlay2?

This comment has been minimized.

@amir73il

amir73il Nov 19, 2016

Contributor

See below: For the overlay2 storage driver, the size option is only available if the backing fs is xfs. You cannot use the size option for xfs + overlay2

This comment has been minimized.

@amir73il

amir73il Nov 19, 2016

Contributor

I meant you cannot use size option for xfs + overlay2 of course.

This comment has been minimized.

@amir73il

amir73il Nov 19, 2016

Contributor

Oef. Autocorrect. zfs + overlay2 size option is not available

This comment has been minimized.

@ghostplant

ghostplant Feb 20, 2017

Contributor

@amir73il Is it needed to consider removing quota settings when the layer is removed?

@ghostplant

This comment has been minimized.

Contributor

ghostplant commented Nov 19, 2016

Hi, have the known issues solved?

@amir73il

This comment has been minimized.

Contributor

amir73il commented Nov 19, 2016

@ghostplant

This comment has been minimized.

Contributor

ghostplant commented Nov 19, 2016

Got it. Thanks!

@rhvgoyal

This comment has been minimized.

Contributor

rhvgoyal commented Jan 9, 2017

This sounds very useful.

I guess something like this might be useful for limiting volume size also.

@cmurf

This comment has been minimized.

cmurf commented Jan 9, 2017

Would Docker be able to use storaged for configuring quotas, if it were available? All the file systems have different quota features and interfaces, it'd be nice to see their overlapping quota support abstracted so Docker can use (project/directory/subvolume) quotas regardless of filesystem.

@amir73il

This comment has been minimized.

Contributor

amir73il commented Feb 20, 2017

@Levovar

This comment has been minimized.

Levovar commented Apr 3, 2017

I would like to have a question regarding the content of this request: am I right that the implemented change only adds overlay2fs size quota capability to the docker create and docker run interfaces?
Or the same option is available even as a dockerd storage-driver option?

I guess not, because the dockerd documentation page still only mentions "override_kernel_check" as the only overlayfs2 specific parameter.
Is it considered to make this option generic, so cluster administrators do not need to separately specify the quota parameter for each and every container creation command? If we consider that the direct docker create/run interfaces are sometimes "hidden" in an orchestrated cluster -e.g. in a K8 cluster-, the only way to impose generic, cluster-wide quotas would be through the configuration of dockerd itself (like with devicemapper).

@amir73il

This comment has been minimized.

Contributor

amir73il commented Apr 3, 2017

@Levovar you are right that this PR only adds storage-opt size to create/run commands.

IMO, it should not be a problem to add a dockerd generic storage-opt default_size so any create/run command that don't specify storage-opt size should use this default size (but I am not volunteering to implement it).

However, this requirement is not specific to overlay2 storage driver.
There is no such feature for btrfs (btrfs.min_space is quite different).
You may think that dm.basesize option is the same as default_size, but it is not.
dm.basesize is more like max_size, because you cannot create any container with
storage size larger then dm.basesize.
So default_size option may be useful for devicemapper as well, for example,
when you want to set the maximum container size to 50G, but set a default quota
for containers that are created without specifying storage size.

@imkin

This comment has been minimized.

Contributor

imkin commented Apr 11, 2017

@amir73il I had asked for something similar here #30491

@imkin

This comment has been minimized.

Contributor

imkin commented Apr 13, 2017

@amir73il I was planning to have a PR and would love to have your eyes on the changes. PR would be to fix #30491 and which fits in line with your comment above.

mbarnes added a commit to mbarnes/container-storage-setup that referenced this pull request Aug 30, 2017

Mount xfs filesystem with pquota option
Necessary to support per-container disk quotas for overlay2 on XFS.

e.g. docker run --storage-opt size=3g -it fedora /bin/bash

See: moby/moby#24771

mbarnes added a commit to mbarnes/container-storage-setup that referenced this pull request Sep 7, 2017

Mount xfs filesystem with pquota option
Necessary to support disk quotas for overlay2 on XFS.

e.g. docker run --storage-opt size=3g -it fedora /bin/bash

See: moby/moby#24771

rh-atomic-bot added a commit to projectatomic/container-storage-setup that referenced this pull request Sep 7, 2017

Mount xfs filesystem with pquota option
Necessary to support disk quotas for overlay2 on XFS.

e.g. docker run --storage-opt size=3g -it fedora /bin/bash

See: moby/moby#24771

Closes: #255
Approved by: rhvgoyal

liusdu pushed a commit to liusdu/moby that referenced this pull request Oct 30, 2017

projectquota: utility class for project quota controls
This class implements XFS project quota controls
for setting quota limits on a newly created directory.
It currently supports the legacy XFS specific ioctls.

Using this class, quota limits per container can be set
by directory based storage drivers (e.g. overlay), when
backing storage is XFS mounted with 'pquota' mount option.

TODO: use generic quota control ioctl FS_IOC_FS{GET,SET}XATTR
      for both xfs/ext4 for kernel version >= v4.5

cherry-pick from: moby#24771
This is to support quota on xfs

Signed-off-by: Amir Goldstein <amir73il@aquasec.com>
Signed-off-by: albam.c <albam.c@navercorp.com>
Signed-off-by: Lei Jitang <leijitang@huawei.com>
(cherry picked from commit 52897d1)

liusdu pushed a commit to liusdu/moby that referenced this pull request Oct 30, 2017

overlay2: add support for --storage-opt size
Allow passing --storage-opt size=X to docker create/run commands
for the `overlay2` graphriver.

The size option is only available if the backing fs is xfs that is
mounted with the `pquota` mount option.
The user can pass any size less then the backing fs size.

cherry-pick from: moby#24771

Signed-off-by: Amir Goldstein <amir73il@aquasec.com>
Signed-off-by: Lei Jitang <leijitang@huawei.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment