
Slow file copy for big files. #687

Closed · mikhmv opened this issue Apr 20, 2012 · 22 comments
Labels: Type: Documentation

Comments

mikhmv commented Apr 20, 2012

I have a raidz with 5 disks, and the system only manages about 53 MB/s when copying files on it.

$ ls -las 5173N_sorted_dedup_rg_dd2_kar.chr4.ra.dd.recal.bam
15220278 -rw-r--r-- 1 oneadmin cloud 15583853314 Mar 27 18:53 YYYY.bam

$ time cp YYYYY.bam test.tmp

real 4m38.525s
user 0m0.048s
sys 1m11.124s

~53MB/sec

System info: 
--------------------------------------------
uname -a
--------------------------------------------
Linux s0 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

 free -m
--------------------------------------------
             total       used       free     shared    buffers     cached
            Mem:        257897      77944     179953          0         59         82
            -/+ buffers/cache:      77801     180095
            Swap:       256988          0     256988

$ dpkg -s zfs-dkms
--------------------------------------------
Package: zfs-dkms
Status: install ok installed
Priority: extra
Section: kernel
Installed-Size: 9491
Maintainer: Darik Horn 
Architecture: amd64
Source: zfs-linux
Version: 0.6.0.59-0ubuntu1~precise1

~$ dpkg -s zfsutils
--------------------------------------------
Package: zfsutils
Status: install ok installed
Priority: extra
Section: admin
Installed-Size: 698
Maintainer: Darik Horn 
Architecture: amd64
Source: zfs-linux
Version: 0.6.0.59-0ubuntu1~precise1

$  dpkg -s libuutil1
--------------------------------------------
Package: libuutil1
Status: install ok installed
Priority: extra
Section: libs
Installed-Size: 148
Maintainer: Darik Horn 
Architecture: amd64
Source: zfs-linux
Version: 0.6.0.59-0ubuntu1~precise1

$ dpkg -s libzfs1
--------------------------------------------
Package: libzfs1
Status: install ok installed
Priority: extra
Section: libs
Installed-Size: 308
Maintainer: Darik Horn 
Architecture: amd64
Source: zfs-linux
Version: 0.6.0.59-0ubuntu1~precise1

$ dpkg -s libzpool1
--------------------------------------------
Package: libzpool1
Status: install ok installed
Priority: extra
Section: libs
Installed-Size: 1119
Maintainer: Darik Horn 
Architecture: amd64
Source: zfs-linux
Version: 0.6.0.59-0ubuntu1~precise1

$ sudo zpool list
--------------------------------------------
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
tank  9.06T  6.20T  2.86T    68%  1.04x  ONLINE  -

$ sudo zpool status
--------------------------------------------
  pool: tank
 state: ONLINE
 scan: scrub repaired 0 in 53h10m with 0 errors on Fri Apr  6 18:49:24 2012
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            d1      ONLINE       0     0     0
            d2      ONLINE       0     0     0
            d3      ONLINE       0     0     0
            d4      ONLINE       0     0     0
            d5      ONLINE       0     0     0

errors: No known data errors

$ sudo zfs get all tank/biouml-shared
--------------------------------------------
NAME                PROPERTY              VALUE                  SOURCE
tank/biouml-shared  type                  filesystem             -
tank/biouml-shared  creation              Sun Feb  5  8:29 2012  -
tank/biouml-shared  used                  3.80T                  -
tank/biouml-shared  available             2.17T                  -
tank/biouml-shared  referenced            3.80T                  -
tank/biouml-shared  compressratio         1.13x                  -
tank/biouml-shared  mounted               yes                    -
tank/biouml-shared  quota                 none                   default
tank/biouml-shared  reservation           none                   default
tank/biouml-shared  recordsize            128K                   default
tank/biouml-shared  mountpoint            /tank/biouml-shared    default
tank/biouml-shared  sharenfs              off                    local
tank/biouml-shared  checksum              on                     default
tank/biouml-shared  compression           off                    local
tank/biouml-shared  atime                 on                     default
tank/biouml-shared  devices               on                     default
tank/biouml-shared  exec                  on                     default
tank/biouml-shared  setuid                on                     default
tank/biouml-shared  readonly              off                    default
tank/biouml-shared  zoned                 off                    default
tank/biouml-shared  snapdir               hidden                 default
tank/biouml-shared  aclinherit            restricted             default
tank/biouml-shared  canmount              on                     default
tank/biouml-shared  xattr                 on                     default
tank/biouml-shared  copies                1                      default
tank/biouml-shared  version               5                      -
tank/biouml-shared  utf8only              off                    -
tank/biouml-shared  normalization         none                   -
tank/biouml-shared  casesensitivity       sensitive              -
tank/biouml-shared  vscan                 off                    default
tank/biouml-shared  nbmand                off                    default
tank/biouml-shared  sharesmb              off                    default
tank/biouml-shared  refquota              none                   default
tank/biouml-shared  refreservation        none                   default
tank/biouml-shared  primarycache          all                    default
tank/biouml-shared  secondarycache        all                    default
tank/biouml-shared  usedbysnapshots       0                      -
tank/biouml-shared  usedbydataset         3.80T                  -
tank/biouml-shared  usedbychildren        0                      -
tank/biouml-shared  usedbyrefreservation  0                      -
tank/biouml-shared  logbias               latency                default
tank/biouml-shared  dedup                 off                    local
tank/biouml-shared  mlslabel              none                   default
tank/biouml-shared  sync                  standard               default
tank/biouml-shared  refcompressratio      1.13x                  -

ryao (Contributor) commented Apr 21, 2012

Please post the output of zdb.

mikhmv (Author) commented Apr 21, 2012

$ sudo zdb
-------------------------------------------
tank:
    version: 28
    name: 'tank'
    state: 0
    txg: 4
    pool_guid: 14708313474365326385
    hostname: 's0'
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 14708313474365326385
        create_txg: 4
        children[0]:
            type: 'raidz'
            id: 0
            guid: 3208061938074886841
            nparity: 1
            metaslab_array: 31
            metaslab_shift: 36
            ashift: 12
            asize: 10001923440640
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 8180987024594158286
                path: '/dev/disk/zpool/d1-part1'
                whole_disk: 1
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 13711974958319910791
                path: '/dev/disk/zpool/d2-part1'
                whole_disk: 1
                create_txg: 4
            children[2]:
                type: 'disk'
                id: 2
                guid: 13904306676428230775
                path: '/dev/disk/zpool/d3-part1'
                whole_disk: 1
                create_txg: 4
            children[3]:
                type: 'disk'
                id: 3
                guid: 6731424362151096911
                path: '/dev/disk/zpool/d4-part1'
                whole_disk: 1
                create_txg: 4
            children[4]:
                type: 'disk'
                id: 4
                guid: 9282870102680286776
                path: '/dev/disk/zpool/d5-part1'
                whole_disk: 1
                create_txg: 4

mikhmv (Author) commented Apr 23, 2012

The new daily build of ZFS is even worse than the previous one (the previous build gave around 50-60 MB/s).
Writing to a regular drive (the same model, on the same host, as the drives used for ZFS):

~$ dd if=/dev/zero of=test count=100000
100000+0 records in
100000+0 records out
51200000 bytes (51 MB) copied, 0.41761 s, 123 MB/s

on ZFS raidz:

$ dd if=/dev/zero of=test count=100000
100000+0 records in
100000+0 records out
51200000 bytes (51 MB) copied, 4.58788 s, 11.2 MB/s

The result is about ten times slower...

ryao (Contributor) commented Apr 23, 2012

By any chance, are you using a Command Based Switching port multiplier?

mikhmv (Author) commented Apr 23, 2012

All disks in this ZFS installation are directly connected to this motherboard: http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8QGi-F.cfm
The test drive is connected the same way.

ryao (Contributor) commented Apr 23, 2012

Which Linux distribution are you using? What is your kernel version? Is your system BIOS up to date? Is your distribution up to date?

mikhmv (Author) commented Apr 23, 2012

Hi Ryao,
Thanks for the quick answers.
I am using Ubuntu 12.04.
~$ uname -a
Linux s0 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
All updates are installed.

I checked the BIOS; it is up to date too. By the way, to get ZFS to unmount during a reboot I have to wait 30 minutes and then press the reset button...

mikhmv (Author) commented Apr 23, 2012

Just an addition: when I first installed ZFS (in January), the speed was ~200 MB/s, with peaks around 400 MB/s. But it had issues with removing large files, and with stability.

ryao (Contributor) commented Apr 23, 2012

Try using dd with bs=16384 as a parameter. The throughput should improve. As for waiting 30 minutes, that is a separate issue. I imagine that you would want to discuss that with @dajhorn.
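
Something along these lines should do it (the output path under the pool is just a placeholder):

# Write ~1.6 GB using 16 KiB requests instead of dd's 512-byte default.
$ dd if=/dev/zero of=/tank/biouml-shared/ddtest bs=16384 count=100000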

mikhmv (Author) commented Apr 23, 2012

Hi Ryao,
with bs=16384 the performance is between 124 and 420 MB/s. That is really great.

The question is how to get the same performance from a standard Linux command like "cp", which currently runs at around 11 MB/s.
How can I tell the system to do all write operations with a 16384-byte block size?

ryao (Contributor) commented Apr 23, 2012

You are using ashift=12, so every device write is at least 4 KB. With 5 disks in raidz1 you have 4 data disks, so the minimum full stripe is 16 KB. If your writes are smaller than 16 KB (dd defaults to 512-byte blocks), each one still costs roughly a full 16 KB stripe, which is what is hurting performance.

You can improve performance by adding an SSD as a SLOG device to your pool. Write performance for both sequential and random operations should then be close to the sequential write performance of the SSD.
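
If you go the SLOG route, it is a single command; a minimal sketch, assuming the SSD appears as /dev/disk/by-id/ata-EXAMPLE-SSD (hypothetical name):

# Attach a dedicated log (SLOG) device to the existing pool.
$ sudo zpool add tank log /dev/disk/by-id/ata-EXAMPLE-SSD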

mikhmv (Author) commented Apr 23, 2012

I got 11-50 MB/s when copying a 16 GB file. Is it possible to improve this without an SSD drive?

ryao (Contributor) commented Apr 23, 2012

I tested copying a 1GB file from a tmpfs to a pool containing a single 6-disk raidz2 vdev composed of Samsung HD204UI disks and the transfer time was 1.512 seconds. The transfer rate was 677.2 MB/sec.

It might be that you are encountering seek overhead. Are the reads and the writes for that copy occurring inside a single pool?
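
If you want to reproduce the comparison, a rough sketch (mount points and file names are examples):

# Stage a 1 GB test file in RAM so the source side is effectively free.
$ sudo mkdir -p /mnt/ram && sudo mount -t tmpfs -o size=2g tmpfs /mnt/ram
$ dd if=/dev/urandom of=/mnt/ram/testfile bs=1M count=1024
# Copy from tmpfs into the pool (writes only hit the raidz)...
$ time cp /mnt/ram/testfile /tank/biouml-shared/testfile
# ...then copy within the pool (reads and writes hit the same disks).
$ time cp /tank/biouml-shared/testfile /tank/biouml-shared/testfile.copy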

mikhmv (Author) commented Apr 23, 2012

Reading from ZFS and writing to the regular drive: 102 MB/s.
From that drive back to ZFS: 175 MB/s.
You are right, there are no issues with speed here. Thanks.

mikhmv closed this as completed May 5, 2012

rbabchis commented Feb 8, 2015

I'm having this same problem. Copying large files within the same pool is very slow, but copying between the pool and an external source is as fast as expected. What is the solution? Would a LOG device actually make a difference (I don't want to waste my money)? Is there anything else that can be done?

behlendorf (Contributor) commented

@rbabchis How slow? When copying files within a pool the drives are contending for both reads and writes, which will impact performance to some degree.
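
One way to see that contention is to watch per-vdev I/O while the copy runs; a minimal sketch, with "tank" standing in for your pool name:

# Print per-vdev read/write bandwidth every second during the in-pool copy.
$ zpool iostat -v tank 1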

rbabchis commented Feb 9, 2015

It bounces around a lot, but averages to about 50MB/sec (using rsync locally). Sometimes it crawls as slow as 25MB/sec without any apparent reason. I've tried and failed to determine why. bonnie++ shows about 150MB/sec read, 100MB/sec write, 50MB/sec rewrite.

georgyo commented Oct 25, 2015

I have also noticed this problem.

I have 4 1TB SSDs, split into two mirrors. I can read at about 1GB/s and write around 600MB/s sustained.

However, when doing a cp within the pool, the speed is capped at 12 MB/s. Painfully slow.

Working with a 40 GB file, it is 20x faster to copy the file to another partition on the same drives and then copy it back. Doing it that way, the copy runs at around 400 MB/s in both directions.

This also rules out drive I/O as the bottleneck. It is 100% a problem with copying within the same zpool.
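
Roughly, the comparison was (dataset paths here are illustrative; ztemp is the scratch pool on partition 3 of the same drives):

# Direct copy inside the same pool: ~12 MB/s
$ time cp /rpool/data/bigfile /rpool/data/bigfile.copy
# Detour through the second pool on the same physical disks: ~400 MB/s each way
$ time cp /rpool/data/bigfile /ztemp/bigfile
$ time cp /ztemp/bigfile /rpool/data/bigfile.copy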

Each drive is partitioned exactly the same

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  2149MB  2147MB  fat32              boot, legacy_boot, esp
 2      2149MB  4296MB  2147MB  ext4               bios_grub
 3      4296MB  38.7GB  34.4GB
 4      38.7GB  1024GB  986GB

And here is a copy of my zdb output.

rpool:
    version: 5000
    name: 'rpool'
    state: 0
    txg: 7382685
    pool_guid: 9801629415702614327
    errata: 0
    hostid: 2831200052
    hostname: 'bigtower'
    vdev_children: 2
    vdev_tree:
        type: 'root'
        id: 0
        guid: 9801629415702614327
        children[0]:
            type: 'mirror'
            id: 0
            guid: 9234036754859107455
            whole_disk: 0
            metaslab_array: 37
            metaslab_shift: 28
            ashift: 9
            asize: 985548980224
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 107605033391006887
                path: '/dev/sdb4'
                whole_disk: 0
                DTL: 6135
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 7474249608093431881
                path: '/dev/sda4'
                whole_disk: 0
                DTL: 6133
                create_txg: 4
        children[1]:
            type: 'mirror'
            id: 1
            guid: 1585376546109541994
            whole_disk: 0
            metaslab_array: 34
            metaslab_shift: 28
            ashift: 9
            asize: 985548980224
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 4044387596452549632
                path: '/dev/sdd4'
                whole_disk: 0
                DTL: 1614
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 168694465845471017
                path: '/dev/sdc4'
                whole_disk: 0
                DTL: 1616
                create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
ztemp:
    version: 5000
    name: 'ztemp'
    state: 0
    txg: 4
    pool_guid: 14071170611957067770
    errata: 0
    hostid: 2831200052
    hostname: 'bigtower'
    vdev_children: 4
    vdev_tree:
        type: 'root'
        id: 0
        guid: 14071170611957067770
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 6092877110596562040
            path: '/dev/sda3'
            whole_disk: 0
            metaslab_array: 39
            metaslab_shift: 28
            ashift: 9
            asize: 34355019776
            is_log: 0
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 1539552750830704949
            path: '/dev/sdb3'
            whole_disk: 0
            metaslab_array: 37
            metaslab_shift: 28
            ashift: 9
            asize: 34355019776
            is_log: 0
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 8574376514556456649
            path: '/dev/sdc3'
            whole_disk: 0
            metaslab_array: 36
            metaslab_shift: 28
            ashift: 9
            asize: 34355019776
            is_log: 0
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 12062958083244608527
            path: '/dev/sdd3'
            whole_disk: 0
            metaslab_array: 34
            metaslab_shift: 28
            ashift: 9
            asize: 34355019776
            is_log: 0
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data

kernelOfTruth (Contributor) commented

@georgyo are you intentionally using ashift=9?

http://open-zfs.org/wiki/Performance_tuning#Alignment_Shift_.28ashift.29
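
A quick way to confirm what a pool was created with is to grep the cached config, e.g.:

# Show the ashift recorded for each top-level vdev of every imported pool.
$ zdb | grep ashift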

georgyo commented Oct 25, 2015

@kernelOfTruth I did not specify an ashift when I created the zpools.

The SSDs in question are Samsung 850 Pros, which as far as I can tell have a 512-byte sector size.
I could change the ashift, although I can't find anything saying that these drives have 4k or 8k sectors.

I will bump up the ashift to 13 and report the results, but it will take some time to migrate all the data around.

# lsblk -o NAME,PHY-SeC,LOG-SEC,DISC-ALN,SIZE,TYPE,MODEL -S
NAME PHY-SEC LOG-SEC DISC-ALN   SIZE TYPE MODEL
sda      512     512        0 953.9G disk Samsung SSD 850 
sdb      512     512        0 953.9G disk Samsung SSD 850 
sdc      512     512        0 953.9G disk Samsung SSD 850 
sdd      512     512        0 953.9G disk Samsung SSD 850 
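
Since ashift is fixed when a vdev is created, bumping it means recreating the pool; a sketch of what that might look like for my layout (after backing up the data elsewhere, device paths as above):

# Recreate the mirrored pool with 8 KiB alignment (ashift=13).
$ zpool create -o ashift=13 rpool \
    mirror /dev/sda4 /dev/sdb4 \
    mirror /dev/sdc4 /dev/sdd4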

Plexucra commented Dec 19, 2018

> The SSDs in question are Samsung 850 Pros, which as far as I can tell have a sector size of 512. I could change it, although I can't find anything that would say that these drives have 4k or 8k sectors. I will bump up the ashift to 13 and report the results.

Was this successful? (I have the same issue.)

sachk commented Mar 8, 2019

I'm having the exact same issue as @rbabchis: copying a file from a dataset to another folder in the same dataset, or to another dataset, is very slow, around 20-50 MB/s with rsync. Using fio with direct I/O gives me sequential read/write at around 80 MB/s, while pure sequential reads and pure sequential writes are both around 250 MB/s. I'm using ZFS 0.8.0-rc3.
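
For reference, the kind of fio run I mean is roughly this (target path and sizes are just examples):

# Sequential mixed read/write with O_DIRECT against a file on the dataset.
$ fio --name=seqrw --filename=/tank/dataset/fiotest --rw=rw \
      --bs=1M --size=4G --direct=1 --ioengine=psync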
