New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sysctl flag to force nfsd to make all writes async even if client requests sync #208
Add sysctl flag to force nfsd to make all writes async even if client requests sync #208
Conversation
Hi, thank you for your suggestion. Have you considered setting the ZFS property sync=disabled on the shared dataset instead? |
Yes, but that would leave everything async, that is not the point of this changeset. I just want to make this an alternative patch to the "silly" patch that people are doing, where they remove the possibility to make sync writes via NFS at all. Furthermore tests have shown that sync=disabled on the dataset does not yield same kind of performance as forcing NFS to do only async writes. This will make it possible to do via a sysctl setting, and will only have effect on NFS, not the entire dataset that is exposed via NFS. It would be nice to extend this, so it could be "paths" that could be set async only. But I am not sure how that would work with sysctl. I guess it could be done with a combination of sysctl and lines in the nfsd.conf where you specify which paths exported that should be treated as async only. And since this patch is "opt-in" - It would help those that have a ups backed freenas and would like to run async all the way from the NFS server to the dataset, and would have no consequence for those that do not want to run async. |
Can you elaborate on this? If you have measurements I'd be interested to see them. |
To be honest I haven't tested it myself. I have just always been unhappy with the NFS performance in FreeNAS. And to be fully transparent, its mostly an issue when using NFS from esxi, since it forces all writes to be sync. This guy has done testing of a similar patch: I am guessing that it must be all the things that happens when you ask ZFS for a sync write, like cache flushing, etc. I have experienced abmyssal performance on NFS on FreeNAS running a 4x striped mirror of Intel Datacenter SSD's and still not being able to do more than 350MB/s on a 40Gbit connection. No matter what kind of tweaks/sysctl changes, nothing helped - not even setting sync=forced and adding a Intel Optane 900P as a ZIL. Sync on ZFS is just slow when being triggered via NFS. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though it's an improvement from the original patch you referenced, it still disables sync for all NFS clients. Would it be possible to leverage the existing mechanism for disabling sync through *stable
so that this can be configured on a per-share basis through the exports file?.
@freqlabs Yes it would be nice to have it per share basis for sure, but that would complicate the patch quite a bit, since the code I have changed do not have direct access to the configuration. But if we can get the NFS Server to pass in whether or not the given share should respect sync writes or not via the stable parameter, then it would be nice. Then this sysctl flag would not be needed, unless you wanted all your shares to be async forced. But I do not see any integration in FreeNAS where you can set a NFS share as async only, so a change to this would require that that part be implemented first. But when that has been implemented and FreeNAS passes in the correct flags when it creates the exports file, then it would work just out of the box I am guessing. Unless the nfs server would prioritise what the client sends over how the share is configured. i.e. client has defined its share with sync and server with async. |
@freqlabs i.e. vfs.nfds..ignore_sync_writes Then the "only" challenge would be to know what share the write request is part of, but that is propably relatively easy so pseudo code: //read share name from write request if (vfs.nfds.[share].ignore_sync_writes == 1) But this kind of patch can be added as an extra patch easily, since it ties easily into the existing patch and would just be an alternative to doing it for all shares. So I would really this my original patch merged and then I would be happy to work on a different patch that would enable a per share enabling/disabling of this |
and to update on numbers: I have a pool with 8 intel datacenter SSD's set up in a 4xmirror.- When writing via NFS (ESXi) I can maximum write 350MB/s sequentially, no matter what tweaks I do on a 40Gb network. When doing the same via ISCSi from a windows machine running as a VM on EXSi, I can write 3500MB/s sequentially. Same settings on the dataset, compression enabled, sync=standard - and both writes 0's, so even if ESXi sends utter zeroes, NFS somehome trips ZFS. And though I don't expect quite the same performance from a patched NFS I am sure it will be better than the existing and worth it. Unfortunately I don't have a clue on how to set up a FreeNSD build environment where I can build this patch, otherwise I would be more than happy to test a patched NFS server on my production FreeNAS, just so we can get some numbers to compare with, i.e. sync=disabled on dataset |
Thank you for your contribution. Among the people I discussed this with internally, there are none in favor of carrying the extra diff against upstream FreeBSD. We do not recommend ignoring sync, it is demanded for a good reason. If there is a performance bug in the NFS server, this is not the correct way to fix it. If you are interested in setting up a FreeNAS build system for your personal development and testing, please see https://github.com/freenas/build (I recommend at least doubling the minimum hardware requirements it describes if you have the option). |
Ok, That was extremely dissapointing - and from my point of view a bit close minded of you. You should not be the judge of how people treat their NAS. If you run with battery backed SSD's, UPS, redundant power supplies, the risk is close to zero that anything bad would happen. And in no way would it risk the underlying pool, it would only risk the files that was written from the network at the time in case the NFS Server or FreeBSD crashed fatally. its a tiny opt-in feature, meaning it would have no impact unless people actively added the sysctl flag. Furthermore it would literally take less than a minute to merge in case any conflict occured with the upstream FreeBSD. So that means if I want this into FreeNAS, I wold have to get the patch into the FreeBSD source code and then it would automatically be merged into FreeNAS at some point? How often do you merge changes from the FreeBSD version you have branched from? |
Sorry to disappoint you.
I'm not vehemently opposed to the idea of there being some mechanism for ignoring sync if desired. However, the proposed solution is sloppy. You have improved upon the inflexibility of the original patch, but it still introduces a global setting that enables dangerous behavior. I would much prefer to see this implemented in a way that can be controlled from the export options so it is properly integrated with the system. This feature is best discussed with the FreeBSD project, where hopefully you can get feedback from developers responsible for the NFS code in question.
With an entire operating system like FreeNAS, the little patches here and there all add up. Adding another one for something we don't use and don't recommend using is counter to our ongoing efforts to reduce our patch set. A drop in the bucket yes, but unjustified nonetheless.
@amotin can correct me if I'm wrong, but I don't believe we have a fixed merge schedule. Checking through the commit history I see sometimes merges happen a month apart, sometimes days apart. |
Hi,
I have made a simple patch for the nfs server.
This is a more "sane" version of the patch that floats around on the internet which just disables sync writes entirely.
This is a sysctl setting which can be set to either 0 or 1.
If set to 1, then the ioflags will ignore the sync request from the client.