Slow IO performance inside container compared with the host. #21485

Closed
alkmim opened this Issue Mar 24, 2016 · 62 comments


alkmim commented Mar 24, 2016

Output of docker version:

docker version
Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:
 OS/Arch:      linux/amd64

Output of docker info:

docker info
Containers: 2
Images: 3
Server Version: 1.9.1
Storage Driver: devicemapper
 Pool Name: docker-254:1-4458480-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem:
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 2.52 GB
 Data Space Total: 107.4 GB
 Data Space Available: 96.38 GB
 Metadata Space Used: 2.081 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.145 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.03.01 (2011-10-15)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.16.6-2-desktop
Operating System: openSUSE 13.2 (Harlequin) (x86_64)
CPUs: 4
Total Memory: 7.722 GiB
Name: gustavo-host
ID: MRRI:5WIP:JOYH:4KZT:BVMU:HMMR:4BL6:6NKP:VM5H:36AN:6LFR:YHK7
WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):

  • Physical Environment (8GB RAM, Core i5-4590, ext4)

Steps to reproduce the issue:

  1. Install openSuSE 13.2
  2. Install docker
  3. run the Iometer benchmark on the host
  4. run the Iometer benchmark in a container based on SuSE. A data volume was used as the partition on which to execute the benchmark.
  5. Compare results

Describe the results you received:
Below is a table showing the results. For all tests, the performance inside the container was around 40% of the performance of the host.
(attached image: table of Iometer results, host vs. container)

Describe the results you expected:
I expected the performance of the container to be closer to that of the host.


MHBauer commented Mar 24, 2016

Is this above the performance degradation described in the docs?


HackToday commented Mar 25, 2016

Hi @alkmim, it seems you are using a loop device, not a real device, is that right? Loop devices are slow.
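
For anyone who wants to confirm this quickly: a rough check (based on the docker info output above, where the data and metadata files point at /dev/loop0 and /dev/loop1) is

# check whether the devicemapper pool is backed by loopback files
docker info | grep -E 'Data file|Metadata file'
losetup -a | grep /var/lib/docker

If losetup lists /var/lib/docker/devicemapper/devicemapper/data and .../metadata, the devicemapper pool is backed by loopback files rather than a dedicated block device.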


thaJeztah commented Mar 26, 2016

@alkmim with "A data volume was used as the partition were to execute the benchmark." do you mean, that a volume is used, e.g. -v /some/path ?


alkmim commented Mar 28, 2016

Hi.

@thaJeztah: Yes.
@MHBauer: The docs do not specify how large the performance degradation will be.
@HackToday: Although I'm not using a real device for the LVM, the benchmark was executed inside a data volume. According to the docs: "One final point, data volumes provide the best and most predictable performance. This is because they bypass the storage driver and do not incur any of the potential overheads introduced by thin provisioning and copy-on-write. For this reason, you should to place heavy write workloads on data volumes."

Considering I am using a data volume (-v /some/path) to execute the benchmark, a performance degradation of 60% is too much.
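
As a sanity check (not part of the original report, just a sketch assuming the opensuse image ships mount and df), one can confirm inside the container that the benchmark path really is the bind mount and not the container's copy-on-write filesystem:

# sanity check: is /datavolume inside the container really the host mount?
docker run --rm -v /datavolume:/datavolume opensuse sh -c "mount | grep datavolume; df /datavolume"

The mount line should show the host's ext4 device for /datavolume; if the path instead resolves to the devicemapper device backing /, the writes are going through the storage driver.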


thaJeztah commented Mar 28, 2016

@alkmim what is the backing filesystem that /var/lib/docker is on? I see docker is unable to detect it (Backing Filesystem:)


alkmim commented Mar 29, 2016

@thaJeztah: It is an ext4 on an lvm. I'm not sure why docker did not detect it.
Output of mount:

sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=4039308k,nr_inodes=1009827,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
/dev/mapper/system-root on / type ext4 (rw,relatime,data=ordered)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=27,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
gvfsd-fuse on /run/user/1000/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=100)

unclejack commented Apr 11, 2016

@alkmim Please provide the exact commands you've used to run the container and the benchmarks.


alkmim commented Apr 26, 2016

Hello @unclejack

The container was started running: docker run -v /datavolume:/datavolume opensuse /bin/bash

The benchmark used was Iometer.
Command to start the server (Windows): double-click the Iometer application.
Command to start the target (the Linux machine being tested): ./dynamo -i server_ip -m target_ip
A good tutorial about Iometer can be found here: http://greg.porter.name/wiki/HowTo:iometer
The Iometer configuration file is attached.

Iometer.zip
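
For reference, the in-container run would look roughly like this (a reconstruction from the commands above, not a verbatim transcript; it assumes the dynamo binary is available under the bind-mounted path, and --net=host may be needed so the Iometer controller on the Windows machine can connect back to dynamo):

# sketch of the in-container run; dynamo location and --net=host are assumptions
docker run -it --net=host -v /datavolume:/datavolume opensuse /bin/bash
cd /datavolume
./dynamo -i server_ip -m target_ip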


unclejack commented Apr 30, 2016

@alkmim You're using devicemapper with loopback mounted block devices. This is known to have poor performance and it's also potentially unsafe.

You've mentioned that bind mounts were being used with -v, but there's no path argument in any of the commands you've mentioned. The config you've provided also seems to specify Target / [ext4] as the path it's going to use for testing. That root directory lives on the devicemapper block device created for the container, which itself sits on the loopback-mounted devicemapper pool. Based on the iometer configuration you've provided, your test was actually measuring the performance of this storage, not the bind-mounted directory from the host.

Loopback mounted block devices have poor performance. This is something to be expected. Docker makes use of the exact same loopback mounted block devices by default for devicemapper. There's nothing Docker itself can do to gain back the loss in performance when using the loopback mounted storage over using the host's disk directly.

You might want to take a look at the official docs on storage drivers to figure out how to get a setup which uses devicemapper on real block devices. You might also want to try to make use of the bind mounted storage for benchmarks.

There's nothing more to investigate for this issue. I'll close it now. Please feel free to comment.
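
For reference, a direct-lvm setup along the lines described in the storage driver docs looks roughly like this (a sketch only; /dev/sdc and the volume/pool names are placeholders, adjust to your system):

# sketch of direct-lvm; /dev/sdc and the volume/pool names are placeholders
pvcreate /dev/sdc
vgcreate docker /dev/sdc
lvcreate --wipesignatures y -n thinpool docker -l 95%VG
lvcreate --wipesignatures y -n thinpoolmeta docker -l 1%VG
lvconvert -y --zero n -c 512K --thinpool docker/thinpool --poolmetadata docker/thinpoolmeta
docker daemon --storage-driver=devicemapper --storage-opt dm.thinpooldev=/dev/mapper/docker-thinpool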

@unclejack unclejack closed this Apr 30, 2016


alkmim commented May 2, 2016

@unclejack: I posted the wrong command and forgot to add the configuration file for the tests on the container. The attachment had just the configuration file for the execution on the host. This is the reason why you saw the tests running on the "/". I'm sorry for this mistake.

I did use the "-v" option followed by the path. The container was started running:
docker run -v /datavolume:/build opensuse /bin/bash

The configuration file I sent was for the test on the host. I forgot to send you the configuration file for the test on the container. I'm sorry for this. Attached are both configuration files. I just changed the "Test Description" field to protect confidential information.

Sorry for the confusion. Could you please re-open the issue?

IometerConfFiles.zip

@unclejack unclejack reopened this May 2, 2016


ipeoshir May 11, 2016

I am also noticing that when my containers write data on my host with "docker run -v /workspace:/home/workspace ...", etc., it is much slower than when I run my services directly on the host.

Is this bug already being fixed in Docker?



ipeoshir commented May 17, 2016

This does not happen only on Docker 1.9.1, as the original author posted; it happens on 1.11.1 as well... It's still very easy to replicate the results in my environment and to verify the performance gap between Docker and the host. Here is the information from my environment for comparison:

  1. Docker's Info:
# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.11.1
Storage Driver: btrfs
 Build Version: Btrfs v3.16.2+20141003
 Library Version: 101
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 3.16.7-35-default
Operating System: openSUSE 13.2 (Harlequin) (x86_64)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 7.814 GiB
ID: PVBJ:ORUD:ERVF:CO3L:GBQU:LQQ3:J6YL:KWKF:E353:7E3N:RLQE:2NTV
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
WARNING: No kernel memory limit support
  2. Running the test 3 times on my host:
# time dd if=/dev/zero of=/workspace/test_host1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 4.02334 s, 127 kB/s

real    0m4.025s
user    0m0.000s
sys     0m0.588s
# time dd if=/dev/zero of=/workspace/test_host2.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 4.15106 s, 123 kB/s

real    0m4.153s
user    0m0.008s
sys     0m0.140s
# time dd if=/dev/zero of=/workspace/test_host3.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 3.67524 s, 139 kB/s

real    0m3.677s
user    0m0.008s
sys     0m0.140s
  3. Running the test 3 times on my container:
# docker run --rm --net=host -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 18.7555 s, 27.3 kB/s

real    0m18.761s
user    0m0.000s
sys     0m0.096s
# docker run --rm --net=host -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/test_container2.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 17.9839 s, 28.5 kB/s

real    0m17.985s
user    0m0.004s
sys     0m0.092s

# docker run --rm --net=host -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/test_container3.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 17.4889 s, 29.3 kB/s

real    0m17.490s
user    0m0.024s
sys     0m0.060s
  4. List of files created on my workspace:
# ls -l
total 2102232
-rw-r--r-- 1 root root     512000 May 17 09:24 test_container1.img
-rw-r--r-- 1 root root     512000 May 17 09:25 test_container2.img
-rw-r--r-- 1 root root     512000 May 17 09:24 test_container3.img
-rw-r--r-- 1 root root     512000 May 17 09:23 test_host1.img
-rw-r--r-- 1 root root     512000 May 17 09:23 test_host2.img
-rw-r--r-- 1 root root     512000 May 17 09:23 test_host3.img

ipeoshir May 17, 2016

My kernel is: 3.16.7-35
My docker is: 1.11.1-107.1

As shown by the tests above, the difference between ~100 kB/s and ~30 kB/s impacts my applications when I use Docker containers. I have installed all the latest components available to test this.



thaJeztah commented May 17, 2016

I don't see these differences (although timing between runs can differ quite a bit);

On the host;

[root@fedora-2gb-ams3-01 ~]# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 1.11208 s, 460 kB/s

real    0m1.114s
user    0m0.000s
sys 0m0.075s
[root@fedora-2gb-ams3-01 ~]# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 1.05731 s, 484 kB/s

real    0m1.059s
user    0m0.003s
sys 0m0.071s
[root@fedora-2gb-ams3-01 ~]# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.743486 s, 689 kB/s

real    0m0.745s
user    0m0.005s
sys 0m0.048s

In a container (I added "read-only", to verify that nothing is written to the container's filesystem);

[root@fedora-2gb-ams3-01 ~]# docker run --rm --read-only -it --net=host -v "/workspace:/workspace" opensuse bash


bash-4.2# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.789156 s, 649 kB/s

real    0m0.790s
user    0m0.000s
sys 0m0.048s
bash-4.2# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.901782 s, 568 kB/s

real    0m0.903s
user    0m0.006s
sys 0m0.048s

bash-4.2# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 1.04446 s, 490 kB/s

real    0m1.047s
user    0m0.002s
sys 0m0.071s

ipeoshir May 17, 2016

Running it exactly like you did, it is still slow:

rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync

1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 3.56118 s, 144 kB/s

real 0m3.562s
user 0m0.000s
sys 0m0.140s

rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync

1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 3.26732 s, 157 kB/s

real 0m3.268s
user 0m0.000s
sys 0m0.148s

rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync

1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 4.57667 s, 112 kB/s

real 0m4.578s
user 0m0.000s
sys 0m0.152s

docker run --rm --read-only -it --net=host -v "/workspace:/workspace" opensuse bash

bash-4.2# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 19.3217 s, 26.5 kB/s

real 0m19.324s
user 0m0.000s
sys 0m0.104s

bash-4.2# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 17.2857 s, 29.6 kB/s

real 0m17.287s
user 0m0.000s
sys 0m0.092s

bash-4.2# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 17.348 s, 29.5 kB/s

real 0m17.349s
user 0m0.000s
sys 0m0.092s



ipeoshir May 17, 2016

If I mount /workspace in RAM (tmpfs) or use SSD storage it gets better, but it's expensive to have this for every application. What hardware/infrastructure are you validating this on?



thaJeztah commented May 17, 2016

This is simply a fresh install of docker on a DigitalOcean droplet, nothing fancy; Fedora 23 (was preparing to reproduce another issue), 2 GB Memory / 40 GB Disk


ipeoshir May 17, 2016

That's why I mentioned that using RAM or SSD might give different results. Looking at the plans offered by DigitalOcean, they are selling SSD cloud server plans...

Currently I am running virtual machines with regular disks, and physical machines, all running on openSUSE.



thaJeztah commented May 17, 2016

But SSD or not, it's about the difference between inside a container and outside.

@cyphar can you reproduce this on OpenSUSE?


ipeoshir May 17, 2016

That's why I am worried... Even if I set it up fine in my environment, it's not like I could ask everybody to adopt an SSD-based solution just to match the hardware I recommend... I am investigating this on Ubuntu as well, just to make sure it's not openSUSE-specific... But it also happens on SLES 12.



ipeoshir commented May 17, 2016

So, on Ubuntu I got results closer to yours, @thaJeztah, thanks for looking... There was not much impact when running on a host vs a container.

Unfortunately there is no aufs driver for Docker on openSUSE/SLES... It looks like something related to the storage driver, either the btrfs one I am using or the devicemapper one @alkmim was using when this issue was reported.

Here is another VM (Ubuntu 14.04 LTS) I have set up on the same hardware as the openSUSE 13.2 one:

  1. Environment
# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.11.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 2
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null host bridge
Kernel Version: 3.19.0-25-generic
Operating System: Ubuntu 14.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.954 GiB
ID: KVRL:JOHJ:FJLT:EWZ5:5QSK:LO3E:C52E:ROIX:GGPN:OGO2:PXEB:JWC6
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
  2. Tests
mkdir -p /docker_workspace
rm -f /docker_workspace/*

time dd if=/dev/zero of=/docker_workspace/test_host1.img bs=512 count=1000 oflag=dsync
test1: 512000 bytes (512 kB) copied, 5.3491 s, 95.7 kB/s
test2: 512000 bytes (512 kB) copied, 8.01828 s, 63.9 kB/s
test3: 512000 bytes (512 kB) copied, 8.3278 s, 61.5 kB/s

docker run --rm --net=host --read-only -v "/docker_workspace:/docker_workspace" opensuse bash -c "time dd if=/dev/zero of=/docker_workspace/test_container1.img bs=512 count=1000 oflag=dsync"
test1: 512000 bytes (512 kB) copied, 7.24514 s, 70.7 kB/s
test2: 512000 bytes (512 kB) copied, 8.99948 s, 56.9 kB/s
test3: 512000 bytes (512 kB) copied, 6.52957 s, 78.4 kB/s


thaJeztah commented May 17, 2016

There was not much impact when running on a host vs a container.

Basically, there should be no impact; when using a bind-mounted directory or a volume, there's nothing between the process and the disk, it's just a mounted directory. The only thing Docker can do is set a constraint (but these are disabled by default), such as:

--device-read-bps=[]          Limit read rate (bytes per second) from a device (e.g., --device-read-bps=/dev/sda:1mb)
--device-read-iops=[]         Limit read rate (IO per second) from a device (e.g., --device-read-iops=/dev/sda:1000)
--device-write-bps=[]         Limit write rate (bytes per second) to a device (e.g., --device-write-bps=/dev/sda:1mb)
--device-write-iops=[]        Limit write rate (IO per second) to a device (e.g., --device-write-iops=/dev/sda:1000)
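
For example, an explicit write cap on the bind-mounted path would look something like this (a sketch; /dev/sda is assumed to be the device backing /workspace, and capped.img is just an illustrative filename):

# sketch: cap container writes to /dev/sda (assumed to back /workspace) at 1 MB/s
docker run --rm --device-write-bps=/dev/sda:1mb -v /workspace:/workspace opensuse dd if=/dev/zero of=/workspace/capped.img bs=1M count=50 oflag=direct

Without such a flag there is no Docker-imposed I/O limit on a bind mount.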

ipeoshir May 17, 2016

There is nothing special about the directory I am using as a volume to share; it's just a regular folder on the host. The only difference I could think of was the storage driver Docker was using, so I am clueless about how to get this fixed on SuSE. I tried using the constraints, but there was no difference in the transfer rates:

  1. Container running on the openSUSE 13.2 host:
# docker run --rm --net=host --device-read-bps=/dev/sda:1mb --device-write-bps=/dev/sda:1mb --device-read-iops=/dev/sda:1000 -v "/docker_workspace:/docker_workspace" opensuse bash -c "time dd if=/dev/zero of=/docker_workspace/test_container2.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 16.6024 s, 30.8 kB/s

real    0m16.604s
user    0m0.032s
sys     0m0.060s
  2. openSUSE 13.2 host:
# time dd if=/dev/zero of=/docker_workspace/test_container2.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 2.17687 s, 235 kB/s

real    0m2.179s
user    0m0.000s
sys     0m0.128s

Changing "--device-write-bps", etc. makes it hang, so I could not test this one.



ipeoshir May 17, 2016

Mounting the shared folder (/workspace) in RAM makes it run instantaneously, but then again, that's an expensive resource, like SSD. I'm still having performance issues when running applications on HDD storage... What puzzles me is how Docker, which is just a regular process writing to my folder, can be so much slower.

# mount -t tmpfs -o size=1G tmpfs /workspace
# time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.00166943 s, 307 MB/s

real    0m0.004s
user    0m0.000s
sys     0m0.000s

# docker run --rm --net=host -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.00108018 s, 474 MB/s

real    0m0.002s
user    0m0.000s
sys     0m0.000s

# umount /workspace

# docker run --rm --net=host -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 16.4395 s, 31.1 kB/s

real    0m16.445s
user    0m0.000s
sys     0m0.128s

# time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 2.10887 s, 243 kB/s

real    0m2.111s
user    0m0.000s
sys     0m0.128s


cyphar commented May 18, 2016

@ipeoshir I can't reproduce this on Tumbleweed. I believe it's probably a kernel issue, since Docker doesn't do anything special with bindmounts. Can you try to reproduce this on an openSUSE distribution with a newer kernel (for example openSUSE Leap or Tumbleweed)? openSUSE 13.2 has very old packages.

% uname -a
Linux gondor 4.5.3-1-default #1 SMP PREEMPT Thu May 5 05:03:39 UTC 2016 (d29747f) x86_64 x86_64 x86_64 GNU/Linux
% lsb_release -a
LSB Version:    n/a
Distributor ID: openSUSE project
Description:    openSUSE Tumbleweed (20160514) (x86_64)
Release:        20160514
Codename:       n/a
% docker run --rm --read-only -it --net=host -v /workspace:/workspace opensuse/amd64:tumbleweed sh -c "time dd if=/dev/zero of=/workspace/test bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 3.06674 s, 167 kB/s

real    0m3.068s
user    0m0.000s
sys     0m0.136s
% sh -c "time dd if=/dev/zero of=/workspace/test bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 3.07103 s, 167 kB/s

real    0m3.072s
user    0m0.000s
sys     0m0.164s

ipeoshir May 18, 2016

I found a way to replicate this consistently. It's related to how Docker interacts with journaling filesystems (ext3, ext4). I will post the steps here to make it easier to investigate, but basically you need to attach a disk to your host, format it ext3/ext4, mount it, and use it as the directory to write your data... I formatted this disk with multiple filesystems (btrfs, xfs, ext2, ext3, ext4, etc.) and the difference was huge.



ipeoshir May 18, 2016

Here are the tests. I attached an external 1 GB disk for this.

  1. ext3, data=ordered
echo y | mkfs.ext3 /dev/sdb
mount /dev/sdb /workspace
docker run --rm --net=host --read-only -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
umount /workspace

Results: 31.8 kB/s on docker and 293 kB/s on host

  2. ext3, data=journal
echo y | mkfs.ext3 /dev/sdb
mount /dev/sdb /workspace -o data=journal
docker run --rm --net=host --read-only -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
umount /workspace

Results: 406 kB/s on docker and 409 kB/s on host

  3. ext2
echo y | mkfs.ext2 /dev/sdb
mount /dev/sdb /workspace
docker run --rm --net=host --read-only -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
umount /workspace

Results: 757 kB/s on docker and 722 kB/s on host

  4. btrfs
mkfs.btrfs -f /dev/sdb
mount /dev/sdb /workspace
docker run --rm --net=host --read-only -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
umount /workspace

Results: 330 kB/s on docker and 317 kB/s on host
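
If data=journal turns out to be the practical workaround here, one way to make it stick for that disk (a sketch, reusing the /dev/sdb test device from above) is to set it as a default mount option instead of passing it on every mount:

# sketch: make data=journal the default mount option for the /dev/sdb test disk
tune2fs -o journal_data /dev/sdb
mount /dev/sdb /workspace

or to add data=journal to the corresponding /etc/fstab entry.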



thaJeztah commented May 18, 2016

@ipeoshir nice find! At least happy that part of the mystery is resolved


thaJeztah commented May 18, 2016

Some quick searching turned up: https://www.redhat.com/archives/ext3-users/2011-April/msg00001.html (http://www.ibm.com/developerworks/library/l-fs8/index.html)

Theoretically, data=journal mode is the slowest journaling mode of all, since data gets written to disk twice rather than once. However, it turns out that in certain situations, data=journal mode can be blazingly fast.
....
The results were astounding. data=journal mode allowed the 16-meg-file to be read from 9 to over 13 times faster than other ext3 modes, ReiserFS, and even ext2 (which has no journaling overhead):


ipeoshir May 18, 2016

That is a very interesting article, I was reading it yesterday.

I am using openSUSE in my tests as an example, but my target is SLES 12. I can replicate the scenario there as well. Unfortunately the kernel is not the latest. Overall, my application is slower at writing data when running on Docker. This "dd" command was the easiest example I could find to share, but there are many other things running, and posting them here won't help... My hope is that once we find the root cause of this difference, other performance issues will get resolved as well.

Another thing I tried was changing the filesystem format where "/var/lib/docker" is mounted, but that does not make a difference, so it's probably not related to the storage driver, but rather to I/O operations inside Docker when journaling is enabled. ext2, xfs, btrfs, etc. had no impact because of the mechanics explained in the article...

On Ubuntu 14.04 LTS there is not much impact; the kernel is a little newer, though: 3.19.0-25.

My SLES12 environment is:

uname -a
Linux 3.12.51-52.39-default #1 SMP Fri Jan 15 20:03:12 UTC 2016 (16f5bac) x86_64 x86_64 x86_64 GNU/Linux
# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.10.3
Storage Driver: devicemapper
 Pool Name: docker-254:0-768197-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 122 MB
 Data Space Total: 107.4 GB
 Data Space Available: 44.78 GB
 Metadata Space Used: 634.9 kB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.147 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.03.01 (2011-10-15)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 3.12.51-52.39-default
Operating System: SUSE Linux Enterprise Server 12
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 49.24 GiB
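
One possible next step to narrow this down (just a sketch; it assumes strace is installed in the image, e.g. via zypper in strace) is to compare syscall timings for the same dd run on the host and inside the container:

# sketch: compare syscall timings on host vs. container (strace assumed installed in the image)
strace -c -f dd if=/dev/zero of=/workspace/trace.img bs=512 count=1000 oflag=dsync
docker run --rm --cap-add SYS_PTRACE -v "/workspace:/workspace" opensuse bash -c "strace -c -f dd if=/dev/zero of=/workspace/trace.img bs=512 count=1000 oflag=dsync"

If the per-write() time differs sharply between the two runs, that points at the kernel/filesystem layer rather than anything Docker adds in userspace.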

@thaJeztah

Member

thaJeztah commented May 18, 2016

The article describes that without a journal, operations can be slow if simultaneous reads and writes take place, so I'm wondering if there's a "read" here. You could try using --log-driver=none, but I don't think that'll make a difference (the container isn't really logging anything).

@ipeoshir

ipeoshir May 18, 2016

Disabling net and the log driver, it's 31.1 kB/s against 273 kB/s. In seconds, it's 16.4585 s against 1.87827 s...

SLES12:

SLES12:~ # echo y | mkfs.ext3 /dev/sdb
docker run --rm --net=none --log-driver=none --read-only -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
umount /workspace
mke2fs 1.42.11 (09-Jul-2014)
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: 09c47793-c06d-435a-a50b-5c8147916943
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

SLES12:~ # mount /dev/sdb /workspace
SLES12:~ # docker run --rm --net=none --log-driver=none --read-only -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 16.4585 s, 31.1 kB/s

real    0m16.461s
user    0m0.000s
sys     0m0.080s
SLES12:~ # time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 1.87827 s, 273 kB/s

real    0m1.880s
user    0m0.008s
sys     0m0.040s
SLES12:~ # umount /workspace
SLES12:~ # echo y | mkfs.ext3 /dev/sdb
docker run --rm --net=none --log-driver=none --read-only -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
umount /workspace
mke2fs 1.42.11 (09-Jul-2014)
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: 65aa9c1f-453e-48cf-bfc2-b21cd2d5a7ad
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

SLES12:~ # mount /dev/sdb /workspace -o data=journal
SLES12:~ # docker run --rm --net=none --log-driver=none --read-only -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 1.24319 s, 412 kB/s

real    0m1.246s
user    0m0.008s
sys     0m0.016s
SLES12:~ # time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 1.40873 s, 363 kB/s

real    0m1.411s
user    0m0.000s
sys     0m0.024s
SLES12:~ # umount /workspace
SLES12:~ # echo y | mkfs.ext2 /dev/sdb
docker run --rm --net=none --log-driver=none --read-only -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
umount /workspace
mke2fs 1.42.11 (09-Jul-2014)
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: 25b1f7b3-3c1d-489d-84a6-df260d942842
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done

SLES12:~ # mount /dev/sdb /workspace
SLES12:~ # docker run --rm --net=none --log-driver=none --read-only -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.789994 s, 648 kB/s

real    0m0.793s
user    0m0.000s
sys     0m0.048s
SLES12:~ # time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.670622 s, 763 kB/s

real    0m0.673s
user    0m0.004s
sys     0m0.036s
SLES12:~ # umount /workspace
SLES12:~ # mkfs.btrfs -f /dev/sdb
docker run --rm --net=none --log-driver=none --read-only -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
umount /workspace
Btrfs v3.16+20140829
See http://btrfs.wiki.kernel.org for more information.

Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
Turning ON incompat feature 'skinny-metadata': reduced-size metadata extent refs
fs created label (null) on /dev/sdb
        nodesize 16384 leafsize 16384 sectorsize 4096 size 1.00GiB
SLES12:~ # mount /dev/sdb /workspace
SLES12:~ # docker run --rm --net=none --log-driver=none --read-only -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 1.97956 s, 259 kB/s

real    0m1.981s
user    0m0.000s
sys     0m0.196s
SLES12:~ # time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 1.93987 s, 264 kB/s

real    0m1.942s
user    0m0.000s
sys     0m0.192s
SLES12:~ # umount /workspace
SLES12:~ #

@ipeoshir

ipeoshir May 18, 2016

Btw, on some hosts we have some auditing, so I tried disabling that with --cap-drop=ALL. But on these hosts I don't have any auditing services running, so I believe it should not matter...


@flavio

Contributor

flavio commented May 18, 2016

I've been able to reproduce the issue on a SLE12 SP1 machine. I'll file an internal bug at SUSE and we will work on that. We will keep you posted.

Feel free to open a service request through the usual SUSE channels if you need a patch or want to follow the evolution of the bug closely.

@ipeoshir

ipeoshir commented May 18, 2016

Thanks, good to know!

@cyphar

Contributor

cyphar commented May 25, 2016

Just to keep everyone who can't access the SUSE bugzilla in the loop: We can reproduce this bug with runC. I believe it's a kernel problem, but I'm still trying to reproduce it with just unshare.

@cyphar

Contributor

cyphar commented May 31, 2016

Okay, so it turns out that it's a known performance problem within the blkio cgroup controller. Currently the only way of fixing the issue on our side is to not join all cgroups when we create a container. I've opened a PR against runC so we might be able to work on this: opencontainers/runc#861.
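A rough way to verify that without Docker at all (just a sketch; it assumes a cgroup-v1 host with the blkio controller mounted at /sys/fs/cgroup/blkio, the same /workspace mount used in the benchmarks above, and a made-up cgroup name "slowtest"):

# control run from the root blkio cgroup
echo $$ > /sys/fs/cgroup/blkio/cgroup.procs
time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync

# create and join a child blkio cgroup, then rerun the same command
mkdir -p /sys/fs/cgroup/blkio/slowtest
echo $$ > /sys/fs/cgroup/blkio/slowtest/cgroup.procs
time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync

# move the shell back to the root cgroup and clean up
echo $$ > /sys/fs/cgroup/blkio/cgroup.procs
rmdir /sys/fs/cgroup/blkio/slowtest

If the second dd is noticeably slower while the disk uses the CFQ scheduler, that points at the blkio cgroup membership rather than at Docker itself.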

@thaJeztah

Member

thaJeztah commented May 31, 2016

@cyphar is this specific to SUSE, or could this affect other distros as well?

@cyphar

Contributor

cyphar commented May 31, 2016

I don't have another distribution to test this with at the moment, but as far as I can tell it's a problem on all distros (it's an architectural problem with how blkio works). I'd be grateful if someone could confirm that this is the case. In an internal bug, it is mentioned that kernel 4.3 may have included a fix for this problem.

The important thing to make sure you do when testing this bug is to make sure that you're in the root blkio cgroup when you do the control test. Otherwise the performance loss will be the same inside and outside the container. Run the following before running the test on the host:

% echo $$ > /sys/fs/cgroup/blkio/cgroup.procs
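A quick way to confirm the shell really moved into the root blkio cgroup (just a sanity-check sketch; the expected output matches the cgroup listings later in this thread):

echo $$ > /sys/fs/cgroup/blkio/cgroup.procs
grep blkio /proc/self/cgroup
# expected: something like "8:blkio:/" (root), not a ".../user/..." or ".../docker/..." path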
@ipeoshir

ipeoshir May 31, 2016

On Ubuntu 14.04.3 LTS (3.19.0-25-generic) running as root:

root@ubuntu:~# echo $$ > /sys/fs/cgroup/blkio/cgroup.procs
root@ubuntu:~# echo y | mkfs.ext2 /dev/sdb
docker run --rm --net=host --read-only -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
umount /workspacemke2fs 1.42.9 (4-Feb-2014)
/dev/sdb is entire device, not just one partition!
Proceed anyway? (y,n) /dev/sdb is mounted; will not make a filesystem here!
root@ubuntu:~# mount /dev/sdb /workspace
mount: /dev/sdb already mounted or /workspace busy
mount: according to mtab, /dev/sdb is already mounted on /workspace
root@ubuntu:~# docker run --rm --net=host --read-only -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 1.09269 s, 469 kB/s

real    0m1.094s
user    0m0.000s
sys     0m0.184s
root@ubuntu:~# time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 1.16866 s, 438 kB/s

real    0m1.171s
user    0m0.000s
sys     0m0.180s
root@ubuntu:~# umount /workspace
root@ubuntu:~# echo y | mkfs.ext3 /dev/sdb
docker run --rm --net=host --read-only -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
umount /workspacemke2fs 1.42.9 (4-Feb-2014)
/dev/sdb is entire device, not just one partition!
Proceed anyway? (y,n) Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
65536 inodes, 262144 blocks
13107 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=268435456
8 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

root@ubuntu:~# mount /dev/sdb /workspace
root@ubuntu:~# docker run --rm --net=host --read-only -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 8.27865 s, 61.8 kB/s

real    0m8.281s
user    0m0.008s
sys     0m0.192s
root@ubuntu:~# time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 8.70208 s, 58.8 kB/s

real    0m8.704s
user    0m0.008s
sys     0m0.196s
root@ubuntu:~# umount /workspace

@ipeoshir

ipeoshir May 31, 2016

It doesn't let me change "/sys/fs/cgroup/blkio/cgroup.procs", though... There is always a list of IDs, but the value of "echo $$" is in there, if that is the intention...

# cat /sys/fs/cgroup/blkio/cgroup.procs | grep $$
22830

@cyphar

Contributor

cyphar commented May 31, 2016

Can you please run cat /proc/self/cgroup both on the host (after echo $$ > /sys/fs/cgroup/blkio/cgroup.procs) and in the container? You shouldn't open the file in a text editor; you need to run the exact command I specified.

@ipeoshir

ipeoshir May 31, 2016

root@ubuntu:~# echo $$ > /sys/fs/cgroup/blkio/cgroup.procs

root@ubuntu:~# cat /proc/self/cgroup
12:name=systemd:/user/0.user/5.session
11:hugetlb:/user/0.user/5.session
10:net_prio:/user/0.user/5.session
9:perf_event:/user/0.user/5.session
8:blkio:/
7:net_cls:/user/0.user/5.session
6:freezer:/user/0.user/5.session
5:devices:/user/0.user/5.session
4:memory:/user/0.user/5.session
3:cpuacct:/user/0.user/5.session
2:cpu:/user/0.user/5.session
1:cpuset:/

root@ubuntu:~# docker run --rm --net=host --read-only -v "/workspace:/workspace" opensuse bash -c "cat /proc/self/cgroup"
12:name=systemd:/docker/6e1d2e44f6ec81347c8e10e2df4e810248054b541055401a929f134a47d1592f
11:hugetlb:/docker/6e1d2e44f6ec81347c8e10e2df4e810248054b541055401a929f134a47d1592f
10:net_prio:/docker/6e1d2e44f6ec81347c8e10e2df4e810248054b541055401a929f134a47d1592f
9:perf_event:/docker/6e1d2e44f6ec81347c8e10e2df4e810248054b541055401a929f134a47d1592f
8:blkio:/docker/6e1d2e44f6ec81347c8e10e2df4e810248054b541055401a929f134a47d1592f
7:net_cls:/docker/6e1d2e44f6ec81347c8e10e2df4e810248054b541055401a929f134a47d1592f
6:freezer:/docker/6e1d2e44f6ec81347c8e10e2df4e810248054b541055401a929f134a47d1592f
5:devices:/docker/6e1d2e44f6ec81347c8e10e2df4e810248054b541055401a929f134a47d1592f
4:memory:/docker/6e1d2e44f6ec81347c8e10e2df4e810248054b541055401a929f134a47d1592f
3:cpuacct:/docker/6e1d2e44f6ec81347c8e10e2df4e810248054b541055401a929f134a47d1592f
2:cpu:/docker/6e1d2e44f6ec81347c8e10e2df4e810248054b541055401a929f134a47d1592f
1:cpuset:/docker/6e1d2e44f6ec81347c8e10e2df4e810248054b541055401a929f134a47d1592f
@cyphar

Contributor

cyphar commented May 31, 2016

That's very odd. I'll look into this more. What is the kernel version?

@ipeoshir

ipeoshir May 31, 2016

root@ubuntu:~# uname -r
3.19.0-25-generic

root@ubuntu:~# dpkg-query -l | grep linux-image
ii  linux-image-3.19.0-25-generic       3.19.0-25.26~14.04.1                    amd64        Linux kernel image for version 3.19.0 on 64 bit x86 SMP
ii  linux-image-extra-3.19.0-25-generic 3.19.0-25.26~14.04.1                    amd64        Linux kernel extra modules for version 3.19.0 on 64 bit x86 SMP
@ipeoshir

ipeoshir May 31, 2016

Some other benchmarks that might be useful:

opensuse:~ # echo y | mkfs.ext3 /dev/sdb

opensuse:~ # mount /dev/sdb /workspace
opensuse:~ # dbench -D /workspace -s -S -t 10 5
dbench version 3.04 - Copyright Andrew Tridgell 1999-2004

Running for 10 seconds with load '/usr/share/dbench/client.txt' and minimum warmup 2 secs
5 clients started
   5       190    37.92 MB/sec  warmup   1 sec
   5       721    29.85 MB/sec  execute   1 sec
   5      2419    31.94 MB/sec  execute   2 sec
   5      4019    41.23 MB/sec  execute   3 sec
   5      5804    38.22 MB/sec  execute   4 sec
   5      7241    42.54 MB/sec  execute   5 sec
   5      8406    38.85 MB/sec  execute   6 sec
   5      9995    39.76 MB/sec  execute   7 sec
   5     11554    39.83 MB/sec  execute   8 sec
   5     13049    38.90 MB/sec  execute   9 sec
   5     14408    40.88 MB/sec  cleanup  10 sec
   5     14408    40.72 MB/sec  cleanup  10 sec

Throughput 40.8924 MB/sec (sync open) (sync dirs) 5 procs

opensuse:~ # docker run --rm --net=none --log-driver=none --read-only -v "/workspace:/workspace" -v "/usr/share/dbench:/usr/share/dbench" -v "/usr/bin/dbench:/usr/bin/dbench" opensuse bash -c "dbench -D /workspace -s -S -t 10 5"
dbench version 3.04 - Copyright Andrew Tridgell 1999-2004

Running for 10 seconds with load '/usr/share/dbench/client.txt' and minimum warmup 2 secs
5 clients started
   5        59    12.19 MB/sec  warmup   1 sec
   5       184    11.52 MB/sec  execute   1 sec
   5       258    11.61 MB/sec  execute   2 sec
   5       334    11.64 MB/sec  execute   3 sec
   5       403    11.57 MB/sec  execute   4 sec
   5       474    11.53 MB/sec  execute   5 sec
   5       537    11.51 MB/sec  execute   6 sec
   5       593    11.08 MB/sec  execute   7 sec
   5       869    10.38 MB/sec  execute   8 sec
   5      1182     9.99 MB/sec  execute   9 sec
   5      1855    10.65 MB/sec  cleanup  10 sec
   5      1855    10.60 MB/sec  cleanup  10 sec

Throughput 10.6543 MB/sec (sync open) (sync dirs) 5 procs
opensuse:~ # umount /workspace
root@ubuntu:~# echo y | mkfs.ext3 /dev/sdb
root@ubuntu:~ # mount /dev/sdb /workspace

root@ubuntu:~# dbench -D /workspace -s -S -t 10 5
dbench version 4.00 - Copyright Andrew Tridgell 1999-2004

Running for 10 seconds with load '/usr/share/dbench/client.txt' and minimum warmup 2 secs
0 of 5 processes prepared for launch   0 sec
5 of 5 processes prepared for launch   0 sec
releasing clients
   5       412    72.83 MB/sec  warmup   1 sec  latency 15.306 ms
   5      4741    74.56 MB/sec  execute   1 sec  latency 30.511 ms
   5      7767    80.20 MB/sec  execute   2 sec  latency 22.940 ms
   5     10945    84.60 MB/sec  execute   3 sec  latency 24.768 ms
   5     13905    81.32 MB/sec  execute   4 sec  latency 23.255 ms
   5     16954    81.18 MB/sec  execute   5 sec  latency 11.711 ms
   5     20229    80.41 MB/sec  execute   6 sec  latency 7.347 ms
   5     23398    81.52 MB/sec  execute   7 sec  latency 39.331 ms
   5     26149    81.10 MB/sec  execute   8 sec  latency 17.883 ms
   5     29468    82.56 MB/sec  execute   9 sec  latency 9.466 ms
   5  cleanup  10 sec
   0  cleanup  10 sec

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX      26295     0.104     7.178
 Close          19157     0.004     2.390
 Rename          1125     2.569    23.826
 Unlink          5461     2.391    39.323
 Qpathinfo      23766     0.045    14.019
 Qfileinfo       4139     0.002     0.128
 Qfsinfo         4500     0.005     4.587
 Sfileinfo       2140     0.180    14.230
 Find            9300     0.042     7.124
 WriteX         13034     1.995    24.762
 ReadX          41895     0.035    20.792
 LockX             90     0.003     0.017
 UnlockX           90     0.001     0.003
 Flush           1881     0.764    13.121

Throughput 82.5643 MB/sec (sync open) (sync dirs)  5 clients  5 procs  max_latency=39.331 ms
root@ubuntu:~#
root@ubuntu:~#
root@ubuntu:~# docker run --rm --net=none --log-driver=none --read-only -v "/workspace:/workspace" -v "/usr/share/dbench:/usr/share/dbench" -v "/usr/bin/dbench:/usr/bin/dbench" opensuse bash -c "dbench -D /workspace -s -S -t 10 5"
dbench version 4.00 - Copyright Andrew Tridgell 1999-2004

Running for 10 seconds with load '/usr/share/dbench/client.txt' and minimum warmup 2 secs
failed to create barrier semaphore
0 of 5 processes prepared for launch   0 sec
5 of 5 processes prepared for launch   0 sec
releasing clients
   5       328    59.38 MB/sec  warmup   1 sec  latency 189.325 ms
   5      3905    92.81 MB/sec  execute   1 sec  latency 101.124 ms
   5      7192    90.76 MB/sec  execute   2 sec  latency 17.686 ms
   5     10028    83.74 MB/sec  execute   3 sec  latency 31.201 ms
   5     13200    80.93 MB/sec  execute   4 sec  latency 6.912 ms
   5     16257    81.61 MB/sec  execute   5 sec  latency 8.592 ms
   5     19245    83.45 MB/sec  execute   6 sec  latency 11.496 ms
   5     22215    82.08 MB/sec  execute   7 sec  latency 22.794 ms
   5     25425    83.34 MB/sec  execute   8 sec  latency 22.229 ms
   5     28150    81.99 MB/sec  execute   9 sec  latency 120.632 ms
   5  cleanup  10 sec
   0  cleanup  10 sec

 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX      26253     0.112    21.252
 Close          19481     0.003     1.171
 Rename          1117     2.567   120.617
 Unlink          5144     2.398   120.172
 Qpathinfo      24019     0.047     4.019
 Qfileinfo       4203     0.004     1.405
 Qfsinfo         4255     0.008     0.880
 Sfileinfo       2215     0.232    83.021
 Find            9163     0.054     9.587
 WriteX         13029     2.017   120.222
 ReadX          40866     0.034    14.647
 LockX             84     0.022     1.481
 UnlockX           84     0.002     0.003
 Flush           1832     0.776    22.075

Throughput 81.9868 MB/sec (sync open) (sync dirs)  5 clients  5 procs  max_latency=120.632 ms
root@ubuntu:~# umount /workspace

@cyphar

Contributor

cyphar commented May 31, 2016

Can you run blktrace -d <the device> while the dd is running?
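One possible way to capture such a trace (a sketch only; it assumes /dev/sdb is the benchmark disk from the earlier tests and that the blktrace/blkparse tools are installed):

# start tracing the block device in the background; blktrace writes trace.blktrace.* per CPU
blktrace -d /dev/sdb -o trace &

# rerun the slow workload while the trace is running
docker run --rm --net=none --log-driver=none --read-only -v "/workspace:/workspace" opensuse \
    bash -c "dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"

# stop blktrace and turn the raw trace into something readable
kill %1
blkparse -i trace > trace.txt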

@cyphar

Contributor

cyphar commented Jun 1, 2016

@ipeoshir I have some proposed fixes from our kernel team, which were mirrored in the internal ticket. Basically it boils down to three options that can help the performance:

  1. Switch the IO scheduler on the underlying disk to 'deadline' - by that
     you'll completely lose proportional IO weighting between blkio cgroups and
     also some other features of the CFQ IO scheduler, but it may work fine.
     You can do the switch by doing:

echo deadline >/sys/block/<device>/queue/scheduler

  2. A less drastic option - turn off CFQ scheduler idling by:

echo 0 >/sys/block/<device>/queue/iosched/slice_idle
echo 0 >/sys/block/<device>/queue/iosched/group_idle

After that the CFQ IO scheduler will not wait before switching to serving
another process / blkio cgroup. So performance will not suffer when using
blkio cgroups, but an "IO hungry" cgroup / process can get a disproportionate
amount of IO time compared to a cgroup that does not always have IO ready.

  3. Switch the underlying filesystem to btrfs or XFS.

Using data=journal mode of ext4 as mentioned in <previous comment> has other performance implications (in general the performance is going to be much worse because all the writes happen twice - once to the journal and once to the final location on disk), so I would not consider that an ideal solution.
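To illustrate option 1 concretely (a sketch; /dev/sdb stands in for the benchmark disk used earlier, and the change only lasts until reboot unless made persistent, e.g. via a kernel boot parameter or a udev rule):

# the active scheduler is shown in brackets
cat /sys/block/sdb/queue/scheduler

# switch this disk to the deadline scheduler for the current boot
echo deadline > /sys/block/sdb/queue/scheduler

# confirm the change
cat /sys/block/sdb/queue/scheduler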

@ipeoshir

ipeoshir commented Jun 1, 2016

@cyphar, thanks!
I will try options 1) and 2) and see if they fit as solutions for my use cases. Suggestion 3) will not work; we have already tried every filesystem out there to improve the performance, to no avail. I will see if I can have 980615 updated with that in mind; the issue is about performance in general. The "dd" example was one point we could find that required a fix.

@ipeoshir

ipeoshir commented Jun 1, 2016

Regarding the suggestions:

  1. Works for "dd"; I switched between noop, deadline, and cfq and could see that only cfq reduces the transfer rates.
  2. Only works when the scheduler is cfq (otherwise it outputs "Permission denied"), but it gives the same results.

However, the overall performance is still slower; I can notice that as described in #23137.

In bug id 980615 we expect to fix the performance in general; I believe the root cause is not addressed in this case. Perhaps we can update it to not mention "ext3 and ext4 journaled filesystems", but then we would have to open another bugzilla anyway... So let's try to fix the root cause and figure out why it's impacting Docker on SuSE.

@cyphar

Contributor

cyphar commented Jun 1, 2016

The problem @ipeoshir is that the only solution from the Docker (runC + libcontainer) side is non-trivial (I describe it in opencontainers/runc#861). And from the kernel side as far as I understand it's a fundamental design of how the CFQ IO scheduler works when weighting different cgroups (this is an upstream kernel thing). I'm not sure why it doesn't appear to happen on Ubuntu, but I'm going to look at it to see why that's the case.

To be clear, the filesystem isn't the cause. It's because of how IO scheduling works in the kernel.

@ipeoshir

ipeoshir commented Jun 1, 2016

Well, we had better wait for a good fix then. I have a more detailed spreadsheet with the same use case running on ext2 in containers, ext3 in containers, and ext3 on the host; during a certain step (package installation) the measured times were 6812, 7206, and 887 respectively, so it's really a bottleneck to proceeding with Docker.

@ipeoshir

ipeoshir commented Jun 3, 2016

@cyphar, @flavio, I added more detailed information in bugzilla 983015.

Tracing the Docker versions that can be installed on SLES12, this bug first appeared in "docker-1.5.0-23.1", while "docker-1.5.0-20.1" and older had no performance lag.

Installing just 34 packages in a container takes "1m12.571s" (1.5.0-23.1 and newer) against "0m3.614s" (1.5.0-20.1 and older) when this issue is not there. Version 1.5.0-20.1 has the same performance as running on the host, so there could be some way to fix this in Docker, as this was working fine at some point.

@cyphar

Contributor

cyphar commented Jun 3, 2016

This appears to be something we changed in the SUSE package (my guess is that we backported a patch that caused this). I'm trying to pin down what revision caused this.

@ipeoshir

ipeoshir commented Jun 3, 2016

Not sure if it was a patch... I downloaded the binaries directly from Docker, extracted them, and they also had the issue.

@cyphar

Contributor

cyphar commented Jun 3, 2016

@ipeoshir But the -23.1 and -20.1 are the SLE package versions, right? You didn't take them from somewhere else?

@ipeoshir

ipeoshir commented Jun 3, 2016

Yes, in the bugzilla I used solely packages from SLES12 to get the results for you.

But to test the binaries I downloaded from:

** And I replaced the RPM binary (in this case it was an openSUSE 13.2) with those non-patched binaries, just to make sure it was not something from the distribution... So it's probably in Docker's source code. There (openSUSE) we can detect the latency problems between 1.5.0-21.1 and 1.6.0-25.1, available on the update repo (http://download.opensuse.org/repositories/openSUSE:/13.2:/Update/standard).

On SLES12 the difference is between those two package versions I pointed out...

@cyphar

Contributor

cyphar commented Jul 8, 2016

This issue resulted in #24307 (which fixes the odd regression between the 1.5.0 packages). In addition, we discovered that the reason Ubuntu doesn't have this problem is that Ubuntu uses the deadline IO scheduler by default (SUSE distributions use CFQ), which doesn't suffer as badly from the blkio cgroup performance issue.

In any case, this issue can be closed (everything has been done on the Docker side that is possible).

/cc @thaJeztah

@LK4D4

Contributor

LK4D4 commented Jul 8, 2016

@cyphar yeah, and this issue is already too big. Thanks for the investigation and fix!
@alkmim @ipeoshir feel free to open another issue if the problem is still there with Docker master.

@LK4D4 LK4D4 closed this Jul 8, 2016

@thaJeztah

Member

thaJeztah commented Jul 8, 2016

thanks @cyphar, and thanks for doing the research

@mimizone

mimizone Sep 27, 2016

Why is the test outside the container/cgroup not impacted by the bug/slow performance? Isn't the I/O scheduler always in the data path, cgroup or not? As described in this great diagram: https://www.thomas-krenn.com/en/wiki/Linux_Storage_Stack_Diagram#Diagram_for_Linux_Kernel_4.0


@cyphar

Contributor

cyphar commented Sep 29, 2016

@mimizone According to the kernel guys at SUSE, the reason is that the CFQ IO scheduler will add latency to requests in order to make sure that two racing cgroups don't starve one another. I don't fully understand the code behind it, but that's what they told me and the experimental results back this up (deadline scheduler would never dream of adding latency).

@danpat danpat referenced this issue in Project-OSRM/osrm-backend-docker Oct 24, 2016

Open

Slow performance of osrm-extract inside Docker container #1

@atzoum atzoum referenced this issue in longsleep/linux-pine64 Jan 23, 2017

Merged

Kernel configuration for supporting docker in swarm mode #50

@wohali wohali referenced this issue in apache/couchdb Feb 1, 2018

Closed

Could not open shard #1119

@wollanup wollanup referenced this issue in docker/for-linux May 17, 2018

Open

MySQL extremely slow just with docker-ce installed #247


@vsoch vsoch referenced this issue in singularityhub/interface Jun 25, 2018

Open

The interface is super slow #35
