[cumulative] buildbot crunching, proof of concept (*1), #2129, #3115 #3190
Conversation
additional changes (non-upstream) fbbf1f0 details - see: openzfs#3189
weird - it builds fine here:
edit: well, I never had it built with debug enabled before - hence the difference XD
```c
typedef struct arc_buf_data {
#ifdef DEBUG_ABD
	char pad[PAGE_SIZE];	/* debug, coredumps when accessed */
```
same DEBUG build error, originating from #2129:

error: variably modified 'pad' at file scope

not sure of the best way to fix it, so I'm leaving it to the author
./module/zfs/arc.c: 3470: line > 80 characters
make: *** [checkstyle] Error 1
program finished with exit code 2

./module/zfs/arc.c: 3470: space or tab at end of line
Makefile:1259: recipe for target 'checkstyle' failed
make: *** [checkstyle] Error 1
program finished with exit code 2

let's see if more comes up
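The two complaints above can be approximated with a small awk helper; this is only an illustrative stand-in (the `check_style` name is made up here), since the real check run by `make checkstyle` is much more thorough than these two rules.

```shell
# check_style FILE: flag the two rules that fired above -- lines longer
# than 80 characters and trailing whitespace. Illustrative stand-in only.
check_style() {
	awk -v f="$1" '
		length($0) > 80	{ printf("%s: %d: line > 80 characters\n", f, NR) }
		/[ \t]$/	{ printf("%s: %d: space or tab at end of line\n", f, NR) }
	' "$1"
}
```

Running it over a file before pushing catches exactly the class of whitespace nit that kept the buildbots busy here.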
@behlendorf since several of the buildbots give it a green light as of the latest commit (0afeca7), do you think it would be safe to use this?
Running on two test hosts for >24 hours so far, nothing in dmesg, and concurrent abuse from userspace and file operations with a scrub running seems not to kill it on the workstation. The server VM running copies of large directory structures over and over also hasn't died.
@sempervictus thanks for your trust and feedback 👍

Issue #3189 is the split-out version of this cumulative pull request. I've appended .diff to the address, like so: https://github.com/zfsonlinux/zfs/pull/3190.diff - and the resulting file is the same.

Thanks for pointing out the merge conflict; I'm sure there will be more in the near future if this doesn't get merged soon. https://github.com/kernelOfTruth/zfs/tree/zfs_master_12.03.2015_2129%2B3115 should be the branch you're looking for - I've just pushed the remaining changes to bring it to the state of this pull request. Keep in mind, though, that I haven't re-integrated those last fixes into the related upstream commits yet - I'll do that once I find some more time; currently I'm occupied with other projects.

Cheers!

edit: I've merged the last 2 commits into one - it's really not necessary to have the buildbots work on a diff of one added whitespace character on this branch (forced push). 3189.diff & 3190.diff are still the same after this.
This is looking quite promising, and for some reason appears to have improved write speeds considerably. Here's a quick and dirty tiotest output (without @behlendorf's fix for the benchmark itself).
Here's output from a prior build with ABD but sans the lock contention patch stack:
@sempervictus thanks for sharing those! - they sure look impressive 👍 Did I read that right, that cpu% and sys% load went down significantly? It appears Random Write is quite a beast on its own, and depending on the workload it's hard to tune towards (well, both random write & read) :(
Just for reference this is backed by a dm-crypt wrapped Samsung 840 1T, here's the zfs modprobe.d conf:
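The conf itself didn't survive the quoting above; for readers unfamiliar with the mechanism, a zfs modprobe.d file generally looks like the following. The values here are illustrative only, not the poster's actual settings (`zfs_arc_max` and `zfs_arc_min` are real module parameters; the 10 GiB cap is assumed from the "stable @ 10G" remark later in the thread).

```
# /etc/modprobe.d/zfs.conf -- illustrative values only, not the poster's file
# cap the ARC at 10 GiB and keep at least 1 GiB resident
options zfs zfs_arc_max=10737418240 zfs_arc_min=1073741824
```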
CPU seems way down with the multilist ARC, and overall system performance under load is significantly higher - building a kernel, doing a scrub, and running a bunch of RBX code concurrently doesn't create any visible UI lag under lightdm/Unity.
@kernelOfTruth @sempervictus thanks for the feedback and testing of these proposed changes. We definitely want to get this work merged as soon as we're comfortable with it. To do that we need to get run time on the code under a variety of workloads, so I'm really happy to see you testing this.

I'm always going to be reluctant to declare a patch this large completely safe. However, if it passes the buildbot for your platform then it's certainly reasonably safe to run. As for the performance results, that's very encouraging and pretty much in line with my expectations. What would be tremendously helpful would be to collect a variety of benchmark results from the current master sources and from this pull request.
@behlendorf: glad to be of help. By the way, this build also survives suspend/resume on a modern xeon. I would be interested to see how this stands up on a heavily loaded Lustre MDT, or something else which takes lots of seemingly random concurrent abuse. |
very nice! 😄 @behlendorf you're welcome - happy to be of help.

@sempervictus did you observe any severe overstepping of the pre-set ARC boundaries (e.g. zfs_arc_max)? I'm currently considering whether #3202 would need to be integrated to make this a well-rounded system - if this is to get a broader testing base, that might be the case for systems running older kernels.
ARC has been stable @ 10G for me, no signs of leakage. Take this with a pile of salt: I just had my first strange behavior while running this on the very loaded workstation. The system started to "jitter" a bit, with lag introduced in readline and UI operations, nothing in dmesg or kern.log... so no idea if it came from here or something else destabilized. I did offload ~2G of snapshots about an hour prior, might be something to look at. Had to reboot, and not seeing it anywhere else yet.
@sempervictus interesting, hm - let's see if it shows up again and/or is reproducible.

I've added most of the upstream patches to the pull request: the l2arc changes might not be needed, and the latest commit from #3202 might not be needed due to the multilist ARC and other changes. The commits "Fix arc_meta_max accounting" and "Restructure per-filesystem reclaim" should make things work more reliably and consistently on older kernels. I'm leaving "Fix arc_adjust_meta() behavior" out for now, since that clashes with the #2129 and #3115 changes, if I remember correctly from putting the pull request together.

Happy testing everyone 👍
Haven't been able to reproduce it yet...
edit: @sempervictus argh - sorry for the noise, I'm obviously having a hard time concentrating (lots of other things going on in life). Please disregard that comment about the re-introduced "MUTEX_FSTRANS" - as mentioned in the other issues/pull requests it might be a good idea to protect the mutexes, but in this context it was unrelated. Thanks
Force-pushed: 4fc3330 to 0afeca7 (compare)
pad[PAGE_SIZE] to pad[4096] (what about arches other than x86/x86_64?)
yeah, well - thank you for the delays and the DDoS 👎 poor GitHub
@odoucet thank you very much for your feedback and testing! Did you observe anything suspicious in /proc/spl/kstat/zfs/arcstats? What antivirus are you running, exactly? Do you have the following patches from #3202 in your repository or included? If they're already included - perhaps @tuxoko @dweeezil @behlendorf have an idea on how to tackle this, or it rings a bell for them.
Only memory usage was monitored at that time, and memory was 99% full, meaning there is overhead beyond the 160GB ARC size (other processes are not consuming much).
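For next time, the ARC's actual size versus its configured ceiling can be pulled straight out of the arcstats kstat file mentioned above. A small helper, assuming the usual three-column name/type/data layout of that file (the `arc_kstat` name is made up here; the optional second argument only exists so the function can be pointed at a saved copy):

```shell
# arc_kstat NAME [FILE]: print the data column for one arcstats counter.
arc_kstat() {
	awk -v k="$1" '$1 == k { print $3 }' "${2:-/proc/spl/kstat/zfs/arcstats}"
}

# On a live system (values in bytes):
#   arc_kstat size     current ARC size
#   arc_kstat c_max    ceiling (what zfs_arc_max sets)
```

Comparing `size` against `c_max` over time would show directly whether the ARC is overstepping its pre-set boundary.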
Sorry, did not record this one.
ClamAV 0.98.6 with the latest updates. The scan was performed with the daemon mode of this process and the multiscan client, meaning 16 files were open at the same time. That is a very small number of file descriptors, IMO, and I'm surprised to get a system hang with that.
No. No. I'm thinking I should probably test the latest trunk to make sure these errors come from this pull request...
OK, good to hear. So concerning this very pull request: all seems OK, performance seems great.
@sempervictus if the jitter re-occurred during your testing, you may want to take a look at #3216 - that's basically the broken-out patches from #3189 ported/rebased to a newer zfs master; yesterday I added the changes from #3225, which should minimize locks of any (?) kind
@kernelOfTruth with #3115 now merged and #2129 being refreshed, I don't think there's any point in keeping this open. If you agree, can you please close it out?
@behlendorf agreed, closing. Thanks
One huge cumulative pull request to speed up testing for the buildbots, as mentioned in #3189