Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sysctl flag to force nfsd to make all writes async even if client requests sync #208

Closed
wants to merge 1 commit into from

Conversation

bjornbouetsmith
Copy link

Hi,

I have made a simple patch for the nfs server.

This is a more "sane" version of the patch that floats around on the internet which just disables sync writes entirely.

This is a sysctl setting which can be set to either 0 or 1.

If set to 1, then the ioflags will ignore the sync request from the client.

@ghost
Copy link

ghost commented Oct 9, 2019

Hi, thank you for your suggestion. Have you considered setting the ZFS property sync=disabled on the shared dataset instead?

@bjornbouetsmith
Copy link
Author

bjornbouetsmith commented Oct 9, 2019

Yes, but that would leave everything async, that is not the point of this changeset. I just want to make this an alternative patch to the "silly" patch that people are doing, where they remove the possibility to make sync writes via NFS at all.

Furthermore tests have shown that sync=disabled on the dataset does not yield same kind of performance as forcing NFS to do only async writes.

This will make it possible to do via a sysctl setting, and will only have effect on NFS, not the entire dataset that is exposed via NFS.

It would be nice to extend this, so it could be "paths" that could be set async only.

But I am not sure how that would work with sysctl.

I guess it could be done with a combination of sysctl and lines in the nfsd.conf where you specify which paths exported that should be treated as async only.

And since this patch is "opt-in" - It would help those that have a ups backed freenas and would like to run async all the way from the NFS server to the dataset, and would have no consequence for those that do not want to run async.

@ghost
Copy link

ghost commented Oct 9, 2019

Furthermore tests have shown that sync=disabled on the dataset does not yield same kind of performance as forcing NFS to do only async writes.

Can you elaborate on this? If you have measurements I'd be interested to see them.

@bjornbouetsmith
Copy link
Author

To be honest I haven't tested it myself. I have just always been unhappy with the NFS performance in FreeNAS.

And to be fully transparent, its mostly an issue when using NFS from esxi, since it forces all writes to be sync.

This guy has done testing of a similar patch:
https://www.ateamsystems.com/tech-blog/solved-performance-issues-with-freebsd-zfs-backed-esxi-storage-over-nfs/

I am guessing that it must be all the things that happens when you ask ZFS for a sync write, like cache flushing, etc.

I have experienced abmyssal performance on NFS on FreeNAS running a 4x striped mirror of Intel Datacenter SSD's and still not being able to do more than 350MB/s on a 40Gbit connection.

No matter what kind of tweaks/sysctl changes, nothing helped - not even setting sync=forced and adding a Intel Optane 900P as a ZIL. Sync on ZFS is just slow when being triggered via NFS.

@dlavigne dlavigne requested review from amotin and a user October 20, 2019 13:52
Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though it's an improvement from the original patch you referenced, it still disables sync for all NFS clients. Would it be possible to leverage the existing mechanism for disabling sync through *stable so that this can be configured on a per-share basis through the exports file?.

@bjornbouetsmith
Copy link
Author

bjornbouetsmith commented Oct 21, 2019

@freqlabs Yes it would be nice to have it per share basis for sure, but that would complicate the patch quite a bit, since the code I have changed do not have direct access to the configuration.

But if we can get the NFS Server to pass in whether or not the given share should respect sync writes or not via the stable parameter, then it would be nice.

Then this sysctl flag would not be needed, unless you wanted all your shares to be async forced.

But I do not see any integration in FreeNAS where you can set a NFS share as async only, so a change to this would require that that part be implemented first.

But when that has been implemented and FreeNAS passes in the correct flags when it creates the exports file, then it would work just out of the box I am guessing.

Unless the nfs server would prioritise what the client sends over how the share is configured.

i.e. client has defined its share with sync and server with async.

@bjornbouetsmith
Copy link
Author

bjornbouetsmith commented Oct 21, 2019

@freqlabs
An alternative to using the stable parameter if the nfs server overwrites the configuration on the share with what the client sends would to implement a sysctl flag for each share that would be created when the nfs server starts and exports the shares.

i.e. vfs.nfds..ignore_sync_writes

Then the "only" challenge would be to know what share the write request is part of, but that is propably relatively easy

so pseudo code:

//read share name from write request

if (vfs.nfds.[share].ignore_sync_writes == 1)
{
syncflags = IO_NODELOCKED;
}

But this kind of patch can be added as an extra patch easily, since it ties easily into the existing patch and would just be an alternative to doing it for all shares.

So I would really this my original patch merged and then I would be happy to work on a different patch that would enable a per share enabling/disabling of this

@bjornbouetsmith
Copy link
Author

bjornbouetsmith commented Oct 21, 2019

and to update on numbers:
I have just done some testing on my own FreeNAS box, where NFS from ESXi is compared to ISCSi.

I have a pool with 8 intel datacenter SSD's set up in a 4xmirror.-

When writing via NFS (ESXi) I can maximum write 350MB/s sequentially, no matter what tweaks I do on a 40Gb network.

When doing the same via ISCSi from a windows machine running as a VM on EXSi, I can write 3500MB/s sequentially.

Same settings on the dataset, compression enabled, sync=standard - and both writes 0's, so even if ESXi sends utter zeroes, NFS somehome trips ZFS.

And though I don't expect quite the same performance from a patched NFS I am sure it will be better than the existing and worth it.

Unfortunately I don't have a clue on how to set up a FreeNSD build environment where I can build this patch, otherwise I would be more than happy to test a patched NFS server on my production FreeNAS, just so we can get some numbers to compare with, i.e.

sync=disabled on dataset
sync=forced on dataset (with zil)
patch enabled

@ghost
Copy link

ghost commented Oct 21, 2019

Thank you for your contribution. Among the people I discussed this with internally, there are none in favor of carrying the extra diff against upstream FreeBSD. We do not recommend ignoring sync, it is demanded for a good reason. If there is a performance bug in the NFS server, this is not the correct way to fix it.

If you are interested in setting up a FreeNAS build system for your personal development and testing, please see https://github.com/freenas/build (I recommend at least doubling the minimum hardware requirements it describes if you have the option).

@ghost ghost closed this Oct 21, 2019
@bjornbouetsmith
Copy link
Author

bjornbouetsmith commented Oct 21, 2019

Ok,

That was extremely dissapointing - and from my point of view a bit close minded of you.

You should not be the judge of how people treat their NAS.

If you run with battery backed SSD's, UPS, redundant power supplies, the risk is close to zero that anything bad would happen.

And in no way would it risk the underlying pool, it would only risk the files that was written from the network at the time in case the NFS Server or FreeBSD crashed fatally.

its a tiny opt-in feature, meaning it would have no impact unless people actively added the sysctl flag.

Furthermore it would literally take less than a minute to merge in case any conflict occured with the upstream FreeBSD.

So that means if I want this into FreeNAS, I wold have to get the patch into the FreeBSD source code and then it would automatically be merged into FreeNAS at some point?

How often do you merge changes from the FreeBSD version you have branched from?

@ghost
Copy link

ghost commented Oct 21, 2019

That was extremely dissapointing

Sorry to disappoint you.

You should not be the judge of how people treat their NAS.
the risk is close to zero

I'm not vehemently opposed to the idea of there being some mechanism for ignoring sync if desired. However, the proposed solution is sloppy. You have improved upon the inflexibility of the original patch, but it still introduces a global setting that enables dangerous behavior. I would much prefer to see this implemented in a way that can be controlled from the export options so it is properly integrated with the system.

This feature is best discussed with the FreeBSD project, where hopefully you can get feedback from developers responsible for the NFS code in question.

it would literally take less than a minute to merge

With an entire operating system like FreeNAS, the little patches here and there all add up. Adding another one for something we don't use and don't recommend using is counter to our ongoing efforts to reduce our patch set. A drop in the bucket yes, but unjustified nonetheless.

How often do you merge changes from the FreeBSD version you have branched from?

@amotin can correct me if I'm wrong, but I don't believe we have a fixed merge schedule. Checking through the commit history I see sometimes merges happen a month apart, sometimes days apart.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant