XDR: fsync on close (also bucket dir), with option to disable. #2204
Conversation
I think that the file and dir flush code have to be moved into fs.cpp, as there are non-portable things in there (at a minimum, turn those into no-ops on platforms that don't support them); on Windows, you have to rename with the
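The portability concern above is the classic durable-write pattern: fsync the file itself before closing it, and (on POSIX) fsync the containing directory so a rename's new directory entry survives a crash. A minimal sketch of what such helpers in fs.cpp could look like follows; the names `flushFileToDisk` / `flushDirToDisk` are illustrative, not the PR's actual code, and the Windows branches show the no-op / `FlushFileBuffers` variants the review suggests.

```cpp
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <io.h>
#include <windows.h>
#else
#include <fcntl.h>
#include <unistd.h>
#endif

// Flush a FILE*'s buffered data all the way to stable storage.
// Caller is still responsible for fclose() afterwards.
bool
flushFileToDisk(std::FILE* f)
{
    if (std::fflush(f) != 0) // push stdio buffers to the kernel first
    {
        return false;
    }
#ifdef _WIN32
    // FlushFileBuffers plays the role of fsync on Windows.
    HANDLE h = (HANDLE)_get_osfhandle(_fileno(f));
    return FlushFileBuffers(h) != 0;
#else
    return fsync(fileno(f)) == 0;
#endif
}

// Flush a directory so a freshly created/renamed entry in it is durable.
// Windows has no equivalent of fsync-on-a-directory-fd, so it's a no-op there.
bool
flushDirToDisk(std::string const& dir)
{
#ifdef _WIN32
    return true; // no-op on platforms that don't support directory fsync
#else
    int fd = open(dir.c_str(), O_RDONLY);
    if (fd < 0)
    {
        return false;
    }
    bool ok = (fsync(fd) == 0);
    close(fd);
    return ok;
#endif
}
```

The key ordering point is that the directory flush must happen after the rename, otherwise the new name may not be on disk even though the file contents are.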
just need to adjust the PR to not break Windows (and possibly other OSes)
Force-pushed af9ef4d to 372c14f
Pushed an update fleshing out the Windows variants. Will do some testing on OSX and BSD tomorrow.
OSX uncovered a bug; pushed a fix. Moving on to BSD.
BSD works (well -- there's a failure, but it is also on master; no change with this PR). @MonsieurNicolas I think this is ready to land unless you have any other nits.
Looks good, only one minor miss from what I see (can't really use the new flag)
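The "option to disable" in the PR title, and the flag mentioned above, suggest a config gate roughly like the following. This is a hedged sketch: the flag name `DISABLE_XDR_FSYNC` and helper `maybeFsync` are assumptions for illustration, not necessarily the PR's exact names or wiring.

```cpp
#ifdef _WIN32
#include <io.h>
#else
#include <unistd.h>
#endif
#include <fcntl.h>

// Hypothetical config fragment: a flag letting tests or throwaway
// nodes trade durability for speed.
struct Config
{
    bool DISABLE_XDR_FSYNC{false};
};

// Sync fd to disk unless the config asks us not to.
bool
maybeFsync(Config const& cfg, int fd)
{
    if (cfg.DISABLE_XDR_FSYNC)
    {
        return true; // caller opted out of durability
    }
#ifdef _WIN32
    return _commit(fd) == 0; // Windows analogue of fsync
#else
    return fsync(fd) == 0;
#endif
}
```

The review's point is that a flag like this only helps if the write paths actually consult it -- every fsync call site has to go through the gate rather than calling fsync directly.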
r+ 7cafbba
XDR: fsync on close (also bucket dir), with option to disable. Reviewed-by: MonsieurNicolas
Description
Resolves #2202
Checklist
- clang-format v5.0.0 (via `make format` or the Visual Studio extension)

Performance is worse, but not hugely. Benchmarks that hammer on the bucket system non-stop run about 10x slower in wall time on local NVMe and 100x worse on a spindle. At the very smallest end of the merge times, the slowdown is indeed also 10x-100x: a 1ms merge becomes 10ms on NVMe or even 100ms on a spindle. At level 0, the 100x case still takes only 1-3% of the time allowed for the merge -- we had a ton of headroom -- but it's worse for sure.
However, it also amortizes away pretty smoothly as we get into bigger merges. By level 4 we're back down to 0.03% of available time, and at level 5 down to 0.009%. That is: the fsync adds a definitely noticeable 100ms to a merge running on the spindle, but it does not add proportionally much more as we scale to somewhat larger buckets.
To measure whether this vanishes into the noise completely when scaling up (or whether, say, full buffers / actually having some real data to write changes things again), I made a synthetic 1.4GB XDR test and ran it against the spindle in a loop. This gives not-super-wonderful but also not-super-deadly results. The large files definitely show effects that are not just "a constant 100ms vanishing into the noise":
I.e. it varies between about 1.5x and 3x.
Back in the world of NVMe (which we certainly recommend people use for their buckets) this sort of thing is much less severe, though still measurable: