New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add experimental btrfs driver #3718

Merged
merged 2 commits into from Jan 30, 2014

Conversation

Projects
None yet
8 participants
@alexlarsson
Contributor

alexlarsson commented Jan 22, 2014

This is an experimental btrfs driver. To use it you must have
/var/lib/docker mounted on a btrfs filesystem and explicitly
specify DOCKER_DRIVER=btrfs in the docker daemon environment.

It works by using subvolumes for the docker image/container layers.

Docker-DCO-1.1-Signed-off-by: Alexander Larsson alexl@redhat.com (github: alexlarsson)

@alexlarsson alexlarsson referenced this pull request Jan 22, 2014

Closed

Add btrfs driver #65

@creack

This comment has been minimized.

Show comment
Hide comment
@creack

creack Jan 22, 2014

Contributor

Can you add a dummy _unsupported.go file with // +build !linux to allow cross compilation?

Contributor

creack commented Jan 22, 2014

Can you add a dummy _unsupported.go file with // +build !linux to allow cross compilation?

Add experimenta btrfs driver
This is an experimental btrfs driver. To use it you must have
/var/lib/docker mounted on a btrfs filesystem and explicitly
specify DOCKER_DRIVER=btrfs in the docker daemon environment.

It works by using subvolumes for the docker image/container layers.

Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Jan 22, 2014

Contributor

New version fixes the above, plus rebases it on master (adding a Put method).

Contributor

alexlarsson commented Jan 22, 2014

New version fixes the above, plus rebases it on master (adding a Put method).

return nil, err
}
if buf.Type != 0x9123683E {

This comment has been minimized.

@djmaze

djmaze Jan 26, 2014

Contributor

Had to change this to uint32(buf.Type) in order to get it compiling and running on a 32bit (armv7l) machine, as the value is returned as a int32 there.

I do not know enough Go to be sure if this is the correct way to solve this. But it yields the correct result. (Tested on 64bit, too.)

@djmaze

djmaze Jan 26, 2014

Contributor

Had to change this to uint32(buf.Type) in order to get it compiling and running on a 32bit (armv7l) machine, as the value is returned as a int32 there.

I do not know enough Go to be sure if this is the correct way to solve this. But it yields the correct result. (Tested on 64bit, too.)

return dir, nil
}
func (d *Driver) Put(id string) {

This comment has been minimized.

@crosbymichael

crosbymichael Jan 27, 2014

Contributor

Can we maybe have a comment here why this is not implemented in btrfs?

@crosbymichael

crosbymichael Jan 27, 2014

Contributor

Can we maybe have a comment here why this is not implemented in btrfs?

@crosbymichael

This comment has been minimized.

Show comment
Hide comment
@crosbymichael

crosbymichael Jan 27, 2014

Contributor

ping @tianon we need to update the dockerfile in order to build this with our standard toolchain

Contributor

crosbymichael commented Jan 27, 2014

ping @tianon we need to update the dockerfile in order to build this with our standard toolchain

@tianon

This comment has been minimized.

Show comment
Hide comment
@tianon

tianon Jan 27, 2014

Member

I'm testing some changes for making this work now, and will open a separate PR with them as soon as I've got them nailed down and properly tested. This is gonna be a big cache bust, but from the sounds of how well this driver works, it should be well worth it. :)

Member

tianon commented Jan 27, 2014

I'm testing some changes for making this work now, and will open a separate PR with them as soon as I've got them nailed down and properly tested. This is gonna be a big cache bust, but from the sounds of how well this driver works, it should be well worth it. :)

@crosbymichael

This comment has been minimized.

Show comment
Hide comment
@crosbymichael

crosbymichael Jan 28, 2014

Contributor

LGTM

Contributor

crosbymichael commented Jan 28, 2014

LGTM

btrfs: Add comment to Put()
Document why we don't need to do anything in Put().

Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
@unclejack

This comment has been minimized.

Show comment
Hide comment
@unclejack

unclejack Jan 28, 2014

Contributor

LGTM

Contributor

unclejack commented Jan 28, 2014

LGTM

@Tranquility

This comment has been minimized.

Show comment
Hide comment
@Tranquility

Tranquility Jan 28, 2014

Contributor

I tried this PR but have some performance issues. docker run ubuntu echo test took between 1.5 and 5 seconds for me.

Contributor

Tranquility commented Jan 28, 2014

I tried this PR but have some performance issues. docker run ubuntu echo test took between 1.5 and 5 seconds for me.

@creack

This comment has been minimized.

Show comment
Hide comment
@creack

creack Jan 28, 2014

Contributor

@Tranquility Could you provide more info on your setup? OS, native file system, kernel version, docker file system, etc.
Did you try btrfs native/btrfs docker? ext4 native/btrfs docker? Or do you see the performances issues with the AUFS driver? or devicemapper?

With btrfs native fs + btrfs docker, I actually see a huge performance improvement.

Contributor

creack commented Jan 28, 2014

@Tranquility Could you provide more info on your setup? OS, native file system, kernel version, docker file system, etc.
Did you try btrfs native/btrfs docker? ext4 native/btrfs docker? Or do you see the performances issues with the AUFS driver? or devicemapper?

With btrfs native fs + btrfs docker, I actually see a huge performance improvement.

@Tranquility

This comment has been minimized.

Show comment
Hide comment
@Tranquility

Tranquility Jan 28, 2014

Contributor

I am using gentoo with kernel 3.12.9 and btrfs on the host.

I now tried all three drivers dm needs about 0.3 sec, aufs between 1 and 2 and btrfs between 2 and 5.

Contributor

Tranquility commented Jan 28, 2014

I am using gentoo with kernel 3.12.9 and btrfs on the host.

I now tried all three drivers dm needs about 0.3 sec, aufs between 1 and 2 and btrfs between 2 and 5.

@creack

This comment has been minimized.

Show comment
Hide comment
@creack
Contributor

creack commented Jan 28, 2014

@unclejack

This comment has been minimized.

Show comment
Hide comment
@unclejack

unclejack Jan 29, 2014

Contributor

@alexlarsson It looks like some files from the init layer are being reported as having changed when using the btrfs driver.

$  for i in {1..15}; do cid=`docker run -d busybox sh -c "echo foo > /root/bar"`; docker diff $cid; done
C /dev
A /dev/kmsg
C /.dockerenv
C /etc/hostname
C /etc/hosts
C /.dockerinit
C /root
A /root/bar
C /dev
A /dev/kmsg
C /root
A /root/bar
C /etc/hostname
C /etc/resolv.conf
C /etc/hosts
C /dev
A /dev/kmsg
C /.dockerenv
C /root
A /root/bar
C /.dockerinit
C /dev
A /dev/kmsg
C /.dockerenv
C /etc/hosts
C /etc/hostname
C /etc/resolv.conf
C /root
A /root/bar
C /.dockerinit
C /dev
A /dev/kmsg
C /.dockerenv
C /.dockerinit
C /root
A /root/bar
C /.dockerinit
C /root
A /root/bar
C /.dockerenv
C /dev
A /dev/kmsg
C /dev
A /dev/kmsg
C /root
A /root/bar
C /root
A /root/bar
C /dev
A /dev/kmsg
C /dev
A /dev/kmsg
C /root
A /root/bar
C /dev
A /dev/kmsg
C /.dockerenv
C /.dockerinit
C /root
A /root/bar
C /.dockerenv
C /root
A /root/bar
C /.dockerinit
C /dev
A /dev/kmsg
C /dev
A /dev/kmsg
C /root
A /root/bar
C /dev
A /dev/kmsg
C /.dockerenv
C /root
A /root/bar
C /.dockerinit
C /dev
A /dev/kmsg
C /.dockerenv
C /.dockerinit
C /etc/hostname
C /etc/resolv.conf
C /etc/hosts
C /root
A /root/bar
C /dev
A /dev/kmsg
C /root
A /root/bar
Contributor

unclejack commented Jan 29, 2014

@alexlarsson It looks like some files from the init layer are being reported as having changed when using the btrfs driver.

$  for i in {1..15}; do cid=`docker run -d busybox sh -c "echo foo > /root/bar"`; docker diff $cid; done
C /dev
A /dev/kmsg
C /.dockerenv
C /etc/hostname
C /etc/hosts
C /.dockerinit
C /root
A /root/bar
C /dev
A /dev/kmsg
C /root
A /root/bar
C /etc/hostname
C /etc/resolv.conf
C /etc/hosts
C /dev
A /dev/kmsg
C /.dockerenv
C /root
A /root/bar
C /.dockerinit
C /dev
A /dev/kmsg
C /.dockerenv
C /etc/hosts
C /etc/hostname
C /etc/resolv.conf
C /root
A /root/bar
C /.dockerinit
C /dev
A /dev/kmsg
C /.dockerenv
C /.dockerinit
C /root
A /root/bar
C /.dockerinit
C /root
A /root/bar
C /.dockerenv
C /dev
A /dev/kmsg
C /dev
A /dev/kmsg
C /root
A /root/bar
C /root
A /root/bar
C /dev
A /dev/kmsg
C /dev
A /dev/kmsg
C /root
A /root/bar
C /dev
A /dev/kmsg
C /.dockerenv
C /.dockerinit
C /root
A /root/bar
C /.dockerenv
C /root
A /root/bar
C /.dockerinit
C /dev
A /dev/kmsg
C /dev
A /dev/kmsg
C /root
A /root/bar
C /dev
A /dev/kmsg
C /.dockerenv
C /root
A /root/bar
C /.dockerinit
C /dev
A /dev/kmsg
C /.dockerenv
C /.dockerinit
C /etc/hostname
C /etc/resolv.conf
C /etc/hosts
C /root
A /root/bar
C /dev
A /dev/kmsg
C /root
A /root/bar
@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Jan 29, 2014

Contributor

@unclejack Hmm, those are all mountpoints which have something else mounted on top of them inside the container. That mount should not be visible outside of the container namespace though, so i wonder why the change detection is picking it up.

I think we need some debug spew in archive/changes.go to see exactly how the files differ.

Contributor

alexlarsson commented Jan 29, 2014

@unclejack Hmm, those are all mountpoints which have something else mounted on top of them inside the container. That mount should not be visible outside of the container namespace though, so i wonder why the change detection is picking it up.

I think we need some debug spew in archive/changes.go to see exactly how the files differ.

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Jan 29, 2014

Contributor

@Tranquility its weird that aufs is taking > 1 second for you too. That seems to imply something other than the btrfs driver being problematic.

Starting a btrfs container is basically equivalent to doing:

[root@localhost test]# btrfs subvolume create subv
Create subvolume './subv'
[root@localhost test]# time btrfs subvolume snapshot subv snaps
Create a snapshot of 'subv' in './snaps'

real    0m0.011s
user    0m0.000s
sys 0m0.004s

Can you try this?

Contributor

alexlarsson commented Jan 29, 2014

@Tranquility its weird that aufs is taking > 1 second for you too. That seems to imply something other than the btrfs driver being problematic.

Starting a btrfs container is basically equivalent to doing:

[root@localhost test]# btrfs subvolume create subv
Create subvolume './subv'
[root@localhost test]# time btrfs subvolume snapshot subv snaps
Create a snapshot of 'subv' in './snaps'

real    0m0.011s
user    0m0.000s
sys 0m0.004s

Can you try this?

@Tranquility

This comment has been minimized.

Show comment
Hide comment
@Tranquility

Tranquility Jan 29, 2014

Contributor

@alexlarsson that's weired

# time btrfs subvolume snapshot subv snaps
Create a snapshot of 'subv' in './snaps'

real    0m0.575s
user    0m0.002s
sys 0m0.007s
# time docker run ubuntu echo test
test

real    0m5.500s
user    0m0.007s
sys 0m0.004s
Contributor

Tranquility commented Jan 29, 2014

@alexlarsson that's weired

# time btrfs subvolume snapshot subv snaps
Create a snapshot of 'subv' in './snaps'

real    0m0.575s
user    0m0.002s
sys 0m0.007s
# time docker run ubuntu echo test
test

real    0m5.500s
user    0m0.007s
sys 0m0.004s
@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Jan 29, 2014

Contributor

@Tranquility I wonder what it is doing during that time, can you see some process using cpu?

Contributor

alexlarsson commented Jan 29, 2014

@Tranquility I wonder what it is doing during that time, can you see some process using cpu?

@unclejack

This comment has been minimized.

Show comment
Hide comment
@unclejack

unclejack Jan 29, 2014

Contributor

@alexlarsson It looks like stat changes quite a bit:

#### name: /.dockerinit
oldstat &syscall.Stat_t{Dev:0xa000c9, Ino:0x348, Nlink:0x1, Mode:0x81ed, Uid:0x0, Gid:0x0, X__pad0:0,
Rdev:0x0, Size:0, Blksize:4096, Blocks:0, Atim:syscall.Timespec{Sec:1390995560, Nsec:41077304},
Mtim:syscall.Timespec{Sec:1390995560, Nsec:41077304}, Ctim:syscall.Timespec{Sec:1390995560,
Nsec:41077304}, X__unused:[3]int64{0, 0, 0}} 

newstat &syscall.Stat_t{Dev:0x23, Ino:0x7e027, Nlink:0x1, Mode:0x81c0, Uid:0x0, Gid:0x0, X__pad0:0,
Rdev:0x0, Size:15050408, Blksize:4096, Blocks:29400, Atim:syscall.Timespec{Sec:1390995550,
Nsec:303077491}, Mtim:syscall.Timespec{Sec:1390995550, Nsec:312077491}, 
Ctim:syscall.Timespec{Sec:1390995550, Nsec:312077491}, X__unused:[3]int64{0, 0, 0}}
Contributor

unclejack commented Jan 29, 2014

@alexlarsson It looks like stat changes quite a bit:

#### name: /.dockerinit
oldstat &syscall.Stat_t{Dev:0xa000c9, Ino:0x348, Nlink:0x1, Mode:0x81ed, Uid:0x0, Gid:0x0, X__pad0:0,
Rdev:0x0, Size:0, Blksize:4096, Blocks:0, Atim:syscall.Timespec{Sec:1390995560, Nsec:41077304},
Mtim:syscall.Timespec{Sec:1390995560, Nsec:41077304}, Ctim:syscall.Timespec{Sec:1390995560,
Nsec:41077304}, X__unused:[3]int64{0, 0, 0}} 

newstat &syscall.Stat_t{Dev:0x23, Ino:0x7e027, Nlink:0x1, Mode:0x81c0, Uid:0x0, Gid:0x0, X__pad0:0,
Rdev:0x0, Size:15050408, Blksize:4096, Blocks:29400, Atim:syscall.Timespec{Sec:1390995550,
Nsec:303077491}, Mtim:syscall.Timespec{Sec:1390995550, Nsec:312077491}, 
Ctim:syscall.Timespec{Sec:1390995550, Nsec:312077491}, X__unused:[3]int64{0, 0, 0}}
@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Jan 29, 2014

Contributor

@unclejack The first is the stat of the the empty .dockerinit in the init layer, the second one is a stat of the bind-mounted actual .dockerinit file.

I thought that the .dockerinit file, etc were just mounted inside the container, but they actually are mounted from Container.Start() and unmounted when the container stops (in monitor()->cleanup()->Unmount()).

It seems to me like this should be a generic problem though? Why is devmapper not hitting this, or is it? The aufs backend is safe because it uses its own implementation of diff.

I think we should unmount all these extra mounts from the host side once the container is up and running, that should fix this issue.

Contributor

alexlarsson commented Jan 29, 2014

@unclejack The first is the stat of the the empty .dockerinit in the init layer, the second one is a stat of the bind-mounted actual .dockerinit file.

I thought that the .dockerinit file, etc were just mounted inside the container, but they actually are mounted from Container.Start() and unmounted when the container stops (in monitor()->cleanup()->Unmount()).

It seems to me like this should be a generic problem though? Why is devmapper not hitting this, or is it? The aufs backend is safe because it uses its own implementation of diff.

I think we should unmount all these extra mounts from the host side once the container is up and running, that should fix this issue.

@Tranquility

This comment has been minimized.

Show comment
Hide comment
@Tranquility

Tranquility Jan 29, 2014

Contributor

@alexlarsson aufs takes < 0.3 sec with 0.7.6 but > 1 sec with this pr merged with master. So maybe I did a mistake building the binary? Is there anything I could have done wrong?

Contributor

Tranquility commented Jan 29, 2014

@alexlarsson aufs takes < 0.3 sec with 0.7.6 but > 1 sec with this pr merged with master. So maybe I did a mistake building the binary? Is there anything I could have done wrong?

@unclejack

This comment has been minimized.

Show comment
Hide comment
@unclejack

unclejack Jan 29, 2014

Contributor

@Tranquility Your system might be having some kind of problem. Nobody could reproduce the problems you're reporting so far. Since AUFS is slower than it is for others as well, I really think your operating system might be running into some kind of problem. That problem isn't related to this PR.

Contributor

unclejack commented Jan 29, 2014

@Tranquility Your system might be having some kind of problem. Nobody could reproduce the problems you're reporting so far. Since AUFS is slower than it is for others as well, I really think your operating system might be running into some kind of problem. That problem isn't related to this PR.

@unclejack

This comment has been minimized.

Show comment
Hide comment
@unclejack

unclejack Jan 29, 2014

Contributor

@alexlarsson Devicemapper doesn't seem to be picking up any changes in the init.

Contributor

unclejack commented Jan 29, 2014

@alexlarsson Devicemapper doesn't seem to be picking up any changes in the init.

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Jan 29, 2014

Contributor

@unclejack Weird, it does for me.

Also, i just tried:

docker run -t -i -v /home/default busybox sh
/ # cd /home/default/
/home/default # touch foo

And then docker diff shows me the added "/home/default/foo" file, both on btrfs and devmapper. I don't think this is right, the volume is not part of the container (shouldn't e.g. be comitted), so we should not show it here.

Basically, we need to fix the diff code to avoid traversing into mountpoints, which will also fix this problem. I'll have a look at this, but it is independent of the btrfs backend really.

Contributor

alexlarsson commented Jan 29, 2014

@unclejack Weird, it does for me.

Also, i just tried:

docker run -t -i -v /home/default busybox sh
/ # cd /home/default/
/home/default # touch foo

And then docker diff shows me the added "/home/default/foo" file, both on btrfs and devmapper. I don't think this is right, the volume is not part of the container (shouldn't e.g. be comitted), so we should not show it here.

Basically, we need to fix the diff code to avoid traversing into mountpoints, which will also fix this problem. I'll have a look at this, but it is independent of the btrfs backend really.

@unclejack

This comment has been minimized.

Show comment
Hide comment
@unclejack

unclejack Jan 29, 2014

Contributor

I can confirm that the diff problem is in 0.7.6 as well and the bug is triggered with devicemapper, too.

Contributor

unclejack commented Jan 29, 2014

I can confirm that the diff problem is in 0.7.6 as well and the bug is triggered with devicemapper, too.

@wking

This comment has been minimized.

Show comment
Hide comment
@wking

wking Jan 29, 2014

On Wed, Jan 29, 2014 at 03:02:09PM -0800, unclejack wrote:

I can confirm that the diff problem is in 0.7.6 as well and the bug
is triggered with devicemapper, too.

Is that this diff problem: #3809 (on devicemapper with v0.7.6)?

wking commented Jan 29, 2014

On Wed, Jan 29, 2014 at 03:02:09PM -0800, unclejack wrote:

I can confirm that the diff problem is in 0.7.6 as well and the bug
is triggered with devicemapper, too.

Is that this diff problem: #3809 (on devicemapper with v0.7.6)?

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Jan 30, 2014

Contributor

@wking: It doesn't seem like the same thing, no mounts are involved in that bug.

Contributor

alexlarsson commented Jan 30, 2014

@wking: It doesn't seem like the same thing, no mounts are involved in that bug.

@unclejack

This comment has been minimized.

Show comment
Hide comment
@unclejack

unclejack Jan 30, 2014

Contributor

@alexlarsson The problem with diff is a race between diff and container.Unmount.
for i in {1..15}; do cid=docker run -d busybox sh -c "echo foo > /root/bar"; sleep 1; docker diff $cid; done proves that.

This PR should get merged today because it doesn't cause any troubles.

Contributor

unclejack commented Jan 30, 2014

@alexlarsson The problem with diff is a race between diff and container.Unmount.
for i in {1..15}; do cid=docker run -d busybox sh -c "echo foo > /root/bar"; sleep 1; docker diff $cid; done proves that.

This PR should get merged today because it doesn't cause any troubles.

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Jan 30, 2014

Contributor

@unclejack I'm not sure exactly what that proves. In that case what happens is that the container exits and container.Unmount() gets rid of the extra mounts and the diff reports the right thing.

However, as long as the container runs it will get the extra diff files, as can be seen by:

for i in {1..15}; do cid=`docker run -d busybox sh -c "echo foo > /root/bar; sleep 1"`; docker diff $cid; done 

However, all of this is unrelated to the btrfs backend. I'll open a separate issue for it.

Contributor

alexlarsson commented Jan 30, 2014

@unclejack I'm not sure exactly what that proves. In that case what happens is that the container exits and container.Unmount() gets rid of the extra mounts and the diff reports the right thing.

However, as long as the container runs it will get the extra diff files, as can be seen by:

for i in {1..15}; do cid=`docker run -d busybox sh -c "echo foo > /root/bar; sleep 1"`; docker diff $cid; done 

However, all of this is unrelated to the btrfs backend. I'll open a separate issue for it.

@crosbymichael

This comment has been minimized.

Show comment
Hide comment
@crosbymichael

crosbymichael Jan 30, 2014

Contributor

Merging as this is unrelated to btrfs and the issue has been logged.

Contributor

crosbymichael commented Jan 30, 2014

Merging as this is unrelated to btrfs and the issue has been logged.

crosbymichael added a commit that referenced this pull request Jan 30, 2014

@crosbymichael crosbymichael merged commit 637a1dc into moby:master Jan 30, 2014

1 check passed

default The Travis CI build passed
Details

@alexlarsson alexlarsson deleted the alexlarsson:btrfs branch Mar 28, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment