Very low Samba performance when primarycache=metadata #1773

Closed · kaazoo opened this issue Oct 8, 2013 · 16 comments
Labels: Type: Documentation, Type: Performance

@kaazoo commented Oct 8, 2013

I have the following system:

  • Dell PowerEdge R420
  • 2x Xeon E5-2430L CPU
  • 32 GB RAM (zfs_arc_max set to 28 GB)
  • Ubuntu 12.04 x64, kernel 3.2.0-54-generic
  • ZFS / SPL package version '0.6.2-1~precise'
  • 3 mirror-pools (24, 24, 36 TB) consisting of 6 easyRAID FibreChannel storage devices (internal hardware RAID-6)
  • around 52 TB allocated disk space
  • total of 1064 filesystems and 210779 snapshots
  • runs Samba version 3.6.3-2ubuntu2.8

I noticed that Samba client access is not very 'snappy' when browsing through the top-level directories of a Samba share. There are random delays, although throughput is OK.

Because ZFS will never be able to cache a big portion of the 52 TB of allocated disk space with only 28 GB of RAM for the ARC, I thought it might be faster to set primarycache=metadata for all filesystems.
But that resulted in a drop from ~42 MB/s to ~2.5 MB/s when copying a 230 MB file.

Switching back to primarycache=all directly brought the Samba performance back.
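
For reference, the change amounts to the following, applied at the pool root so that the child filesystems inherit it (the pool name 'tank' is a placeholder):

    # keep only metadata in the ARC for this pool and its descendants
    zfs set primarycache=metadata tank
    # revert to the default behaviour
    zfs set primarycache=all tank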

Are you aware of this problem?

@behlendorf (Contributor)

It sounds as if Samba is regularly accessing data which is no longer cached. This could happen for a variety of reasons, and it's hard to say exactly why without digging deeper into what Samba is doing. However, since ZFS already preferentially caches metadata over file data, I think you'll want to leave primarycache=all set regardless.
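
For what it's worth, on ZoL you can see how much of the ARC is already holding metadata from the arcstats kstat (field names as exposed by 0.6.x; a rough check, not a full diagnosis):

    # total ARC size vs. metadata currently cached and the metadata limit
    grep -E '^(size|arc_meta_used|arc_meta_limit) ' /proc/spl/kstat/zfs/arcstats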

@GregorKopka (Contributor)

@behlendorf with primarycache=metadata the ARC should ignore file data; could this effectively disable prefetch?

@behlendorf (Contributor)

Yes. I'd only expect the metadata prefetching to work in that case.

@GregorKopka (Contributor)

This would explain why Samba performance goes south: with no prefetch from ZFS, every single read takes a fresh round trip to the disk...
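
If that is the suspicion, the prefetcher's counters can be watched while reproducing the slowdown (assuming the zfetchstats kstat is exposed, as on 0.6.x):

    # prefetch hit/miss counters; with primarycache=metadata the hits should barely move
    grep -E '^(hits|misses) ' /proc/spl/kstat/zfs/zfetchstats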

@unicolet

I am experiencing the same issue with Samba on top of ZFS (CentOS 6 with ZoL 0.6.1). The fix for me was to add:

    socket options = IPTOS_LOWDELAY TCP_NODELAY
    max xmit = 65536

to smb.conf.
No other tuning was done on the ZFS side; directory browsing improved a lot.
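
For anyone trying the same thing: those settings go in the [global] section of smb.conf, roughly like this (a sketch of the placement, values exactly as above):

    [global]
        socket options = IPTOS_LOWDELAY TCP_NODELAY
        max xmit = 65536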

@behlendorf (Contributor)

@unicolet Thanks for the hint, although it's not at all clear to me why ZoL would need that tuning when other Linux filesystems do not. Any ideas?

@unicolet

@behlendorf no idea really, but for what it's worth I straced a Samba PID while opening a directory (one I had already browsed, so at least in theory it should have been in the cache). This is a list of syscalls, sorted by the ascending number of calls that occur during a directory change from Windows Explorer on an XP client:

  1 17655
  1 inotify_add_watch
  1 inotify_rm_watch
  1 ioctl
  3 setgroups
  4 getxattr
  4 setregid
  4 setreuid
  8 getegid
  8 geteuid
 10 close
 10 open
 12 getdents
187 writev
190 select
192 getcwd
216 fcntl
346 stat
377 read
611 lstat

Could it be that lstat is slow in ZoL and stalls the network connection, since Samba would in fact lstat a directory entry and then send a bit of information over the network? The options above reduce the symptoms, but do not cure the issue.
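
For anyone who wants to repeat this, a sketch of how such a per-syscall breakdown can be produced (the PID and output path are placeholders, not necessarily the exact commands I used):

    # attach to the smbd process serving the client, then change directory in Explorer
    strace -p 17655 -o /tmp/smbd.trace
    # stop strace with Ctrl-C, then count syscalls by name
    awk -F'(' '{print $1}' /tmp/smbd.trace | sort | uniq -c | sort -n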

@unicolet

This ZoL setting improved snappiness even more; it should be the default:

zfs set xattr=sa tank/fish
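
To check that the property took effect on the datasets you care about (dataset name as above; as far as I know only newly created xattrs use the SA layout, existing ones are not migrated):

    zfs get -r xattr tank/fish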

@behlendorf (Contributor)

@unicolet It probably will be once dataset-level feature flags are implemented.

@behlendorf (Contributor)

It's my understanding that the tuning suggested above has resulted in improved performance. Things should continue to improve gradually as we incrementally optimize. Therefore I'm closing this out.

@computator commented Oct 24, 2017

I am not sure that this is actually solved because I just ran into this myself. I have several TB of large files that are infrequently accessed. I figured I would set primarycache=metadata to skip caching the file data since the files are unlikely to be accessed repeatedly. I turned it on and left it, but just discovered that I was only getting 4MB/s read speed vs the normal 180+MB/s. After some debugging I found the culprit to be primarycache=metadata. As soon as I set it back to all I was getting full speed again. This is definitely unrelated to the network because I was testing directly from disk using dd (with bs=4k) to check speeds.

Edit: That also rules out metadata operations such as stat() being the issue.
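
For reference, the kind of direct-from-disk check I mean is roughly the following (file path is a placeholder; throughput figures are the ones reported above):

    zfs get primarycache tank/archive
    dd if=/tank/archive/somebigfile of=/dev/null bs=4k    # ~4 MB/s with primarycache=metadata
    sudo zfs set primarycache=all tank/archive
    dd if=/tank/archive/somebigfile of=/dev/null bs=4k    # back to 180+ MB/s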

@gmelikov (Member)

@rlifshay this topic should be discussed on the mailing lists, but we definitely need some more information (zfs get all poolname and zpool get all poolname for a start).

@computator

I am a developer myself, so I made a reproducible Vagrant environment to test this. I have attached my Vagrantfile so you can easily duplicate and inspect this yourself (GitHub wouldn't let me upload it as-is, so just remove the .txt extension to use it with Vagrant). I have also included the pool and filesystem properties from the test pool below. I can do the same for the pool I originally found the issue with if that is useful.

ubuntu@ubuntu-xenial:~$ sudo zfs get all testpool
NAME      PROPERTY              VALUE                  SOURCE
testpool  type                  filesystem             -
testpool  creation              Tue Oct 24 22:15 2017  -
testpool  used                  1.00G                  -
testpool  available             3.81G                  -
testpool  referenced            1.00G                  -
testpool  compressratio         1.00x                  -
testpool  mounted               yes                    -
testpool  quota                 none                   default
testpool  reservation           none                   default
testpool  recordsize            128K                   default
testpool  mountpoint            /testpool              default
testpool  sharenfs              off                    default
testpool  checksum              on                     default
testpool  compression           off                    default
testpool  atime                 on                     default
testpool  devices               on                     default
testpool  exec                  on                     default
testpool  setuid                on                     default
testpool  readonly              off                    default
testpool  zoned                 off                    default
testpool  snapdir               hidden                 default
testpool  aclinherit            restricted             default
testpool  canmount              on                     default
testpool  xattr                 on                     default
testpool  copies                1                      default
testpool  version               5                      -
testpool  utf8only              off                    -
testpool  normalization         none                   -
testpool  casesensitivity       sensitive              -
testpool  vscan                 off                    default
testpool  nbmand                off                    default
testpool  sharesmb              off                    default
testpool  refquota              none                   default
testpool  refreservation        none                   default
testpool  primarycache          metadata               local
testpool  secondarycache        all                    default
testpool  usedbysnapshots       0                      -
testpool  usedbydataset         1.00G                  -
testpool  usedbychildren        156K                   -
testpool  usedbyrefreservation  0                      -
testpool  logbias               latency                default
testpool  dedup                 off                    default
testpool  mlslabel              none                   default
testpool  sync                  standard               default
testpool  refcompressratio      1.00x                  -
testpool  written               1.00G                  -
testpool  logicalused           1.00G                  -
testpool  logicalreferenced     1.00G                  -
testpool  filesystem_limit      none                   default
testpool  snapshot_limit        none                   default
testpool  filesystem_count      none                   default
testpool  snapshot_count        none                   default
testpool  snapdev               hidden                 default
testpool  acltype               off                    default
testpool  context               none                   default
testpool  fscontext             none                   default
testpool  defcontext            none                   default
testpool  rootcontext           none                   default
testpool  relatime              on                     temporary
testpool  redundant_metadata    all                    default
testpool  overlay               off                    default
ubuntu@ubuntu-xenial:~$ sudo zpool get all testpool
NAME      PROPERTY                    VALUE                       SOURCE
testpool  size                        4.97G                       -
testpool  capacity                    20%                         -
testpool  altroot                     -                           default
testpool  health                      ONLINE                      -
testpool  guid                        6865565991240311730         default
testpool  version                     -                           default
testpool  bootfs                      -                           default
testpool  delegation                  on                          default
testpool  autoreplace                 off                         default
testpool  cachefile                   -                           default
testpool  failmode                    wait                        default
testpool  listsnapshots               off                         default
testpool  autoexpand                  off                         default
testpool  dedupditto                  0                           default
testpool  dedupratio                  1.00x                       -
testpool  free                        3.97G                       -
testpool  allocated                   1.00G                       -
testpool  readonly                    off                         -
testpool  ashift                      0                           default
testpool  comment                     -                           default
testpool  expandsize                  -                           -
testpool  freeing                     0                           default
testpool  fragmentation               12%                         -
testpool  leaked                      0                           default
testpool  feature@async_destroy       enabled                     local
testpool  feature@empty_bpobj         enabled                     local
testpool  feature@lz4_compress        active                      local
testpool  feature@spacemap_histogram  active                      local
testpool  feature@enabled_txg         active                      local
testpool  feature@hole_birth          active                      local
testpool  feature@extensible_dataset  enabled                     local
testpool  feature@embedded_data       active                      local
testpool  feature@bookmarks           enabled                     local
testpool  feature@filesystem_limits   enabled                     local
testpool  feature@large_blocks        enabled                     local
ubuntu@ubuntu-xenial:~$ 
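
For anyone without Vagrant handy, the environment boils down to roughly the following shell steps on a file-backed pool (paths and sizes are placeholders approximating what the Vagrantfile provisions):

    # create a small file-backed test pool and disable data caching
    truncate -s 5G /var/tmp/zfs-test.img
    sudo zpool create testpool /var/tmp/zfs-test.img
    sudo zfs set primarycache=metadata testpool
    # write a 1 GB test file, then read it back with 4k requests as in my report
    sudo dd if=/dev/urandom of=/testpool/testfile bs=1M count=1024
    sudo dd if=/testpool/testfile of=/dev/null bs=4k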

@gmelikov (Member)

@rlifshay please read this topic; the answer was already given here: zfs set xattr=sa testpool is a must-have on Linux, as long as you won't use the ZFS pool on non-Linux systems.

Additionally, I advise changing atime as well: zfs set atime=off testpool.

@computator

I did read the entire thread before I posted. I did not try adjusting either of those properties previously because they are both related to file attributes, not file data, and the issue is with read performance from a single file (as opposed to recursively listing files or something). I have modified my Vagrantfile to set both of those properties just to make sure, but they had no effect. Here is a link to the updated Vagrantfile.

@gmelikov (Member)

@rlifshay dd with bs=4k against a dataset with caching disabled is the worst case for a recordsize=128k ZFS pool; you have to increase the dd block size to get reasonable performance, or make the recordsize smaller.

IIRC dd with bs=4k fetches data in 4k requests without any buffering; the access is sequential, but it won't work well without a cache, because ZFS has to re-read the same 128k record for each 4k request.
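
For illustration, the two workarounds amount to something like this (file and dataset names are placeholders):

    # read with requests that match the 128k recordsize
    dd if=/testpool/testfile of=/dev/null bs=128k
    # or use a smaller recordsize (affects newly written files only)
    sudo zfs set recordsize=16k testpool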

Your Vagrantfile uses dd, not Samba, so it's not an identical workload.

Please use our mailing lists for support questions; this is not a bug.
