tsd_hash_search() GPF #174
This is the first report I've seen of problems in the tsd code. Is there a particular workload you're using to trigger this issue? |
No, I'm afraid I've not been able to pin it down. It seems random, maybe once every three months per machine .. I mainly use AMD chips, but I've seen it on a Dell R710 (Intel/16 core) too ... [I have 1 machine thus far I've not seen it on, but that's only been up for 6 weeks ..] The only thing I can think of is that they're all running KVM instances to varying degrees .. this particular machine is serving data via two VPS's running gluster. Other machines are running KVM off QCOW2 images sat directly on ZFS partitions. All machines use SSD root disks .. machines are all between 1 & 3 years old, mostly single chip / 6-core, although the Dell was dual chip / 8-core. Nothing particularly exotic; all have multiple NICs of different makes, a mixture of different motherboards and disk controllers .. mixture of disks .. all are running Ubuntu 11.04 and 12.04, which means it spans different kernels .. Don't know if any of that helps? FYI, previously most of these machines were running software RAID10 for 1 year+ without incident ... |
Ok, another one .. different box with a similar specification ... the frequency seems to be after about 20 days of uptime ...
Oct 10 15:05:47 data2 kernel: [1823996.336021] BUG: soft lockup - CPU#3 stuck for 22s! [kvm:14086] |
Hey guys, as directed from the mailing list, here are the details of my case, which seems to be related to this same issue. We've recently been playing around with a new hardware configuration here, and as part of that I have an easily reproducible kernel panic/oops. Kernel oops:
Setup: SuperMicro X9DRE-TF+ motherboard, 2 x E5-2620, 24 x 8G Kingston DDR3. I have 2 raidz2 pools: ssd1, comprised of all 8 SSDs, and sata1, comprised of the SATA disks. Steps to reproduce: I run sysbench fileio random read/write tests for about 10-15 minutes and I get the above oops.
cd /ssd1
sysbench --num-threads=12 --max-requests=50000 --test=fileio --file-num=4096 --file-block-size=64K --file-test-mode=rndrw --file-fsync-freq=1 prepare
sysbench --num-threads=12 --max-requests=50000 --test=fileio --file-num=4096 --file-block-size=64K --file-test-mode=rndrw --file-fsync-freq=1 run
You might have to execute the second run line a few times in a row in order to get the oops. It doesn't always happen the first time. Between runs I do zpool export ssd1 && zpool import ssd1 to clear the cache. We tested the following kernels with ZFS incorporated into the kernel (not as a module). With the ZFS 0.6.0-rc12 sources we tried kernels: The crash seems to be the same regardless of the kernel. Another issue I noticed while running this comes from the test itself. It complains with this:
It could be the test itself but I decided to post it here since it might be related. Notice the garbage in the brackets! I would appreciate it if someone could take a look at this. I am ready to provide further details if needed. Regards, Rumen Telbizov |
@telbizov I've been unable to reproduce the issue locally, even with your test case. However, the above patch should resolve the issue since it effectively retires the offending code, which isn't needed. Can you please apply the patch and verify the GPF is resolved? |
Hi Brian, Thank you very much for providing this patch. Due to the semi-intermittent nature of the problem we decided to run it over the weekend; I'll report back here on Monday. Hopefully the news will be just as good. I wonder if other participants involved in this issue might want to give it a try as well. Once again, thank you very much for your quick response and help. Regards, Rumen Telbizov |
Brian, Here's the update that I promised. It would be nice to see other people's experience with this patch, but as far as I can tell the problem is resolved. Brian, do you intend to merge this change into the master branch? Thank you once again for your prompt response and the patch. Rumen Telbizov, |
@telbizov That's good news. A little more digging has turned up that this code was originally added to reduce lock contention on the zl_lock from zil_commit(). But the test workload at the time was running the filebench varmail tests and observing very slow fsync() times. Would it be possible for you to test this workload on your system with and without the patch so we know what the impact of this will be? Original Sun issue: 6535160 lock contention on zl_lock from zil_commit |
It's my understanding that the zfs_fsyncer_key TSD was added as a performance optimization to reduce contention on the zl_lock from zil_commit(). This issue manifested itself as very long (100+ms) fsync() system call times for fsync()-heavy workloads. However, under Linux I'm not seeing the same contention that was originally described. Therefore, I'm removing this code in order to wean ourselves off any dependence on TSD. If the original performance issue reappears on Linux we can revisit fixing it without resorting to TSD. This just leaves one small ZFS TSD consumer. If it can be cleanly removed from the code we'll be able to shed the SPL TSD implementation entirely. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs/spl#174
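(For illustration only: a minimal userspace sketch of the per-thread counter pattern this commit message describes, modeled with POSIX thread-specific data rather than the SPL's tsd_get()/tsd_set() API. The function and variable names are hypothetical; this is not the actual ZFS code.)

```c
/*
 * Sketch of the per-thread fsync counter pattern being removed above,
 * using POSIX TSD in place of the SPL tsd API.  All names are made up.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static pthread_key_t fsync_key;        /* stands in for zfs_fsyncer_key */

/* fsync(2) path: bump this thread's private counter. */
static void note_fsync(void)
{
    uintptr_t cnt = (uintptr_t)pthread_getspecific(fsync_key);
    pthread_setspecific(fsync_key, (void *)(cnt + 1));
}

/*
 * Write-logging path: a non-zero counter marks the thread as fsync-heavy,
 * so the log record is issued synchronously instead of piling work onto
 * the shared commit lock later.
 */
static int write_should_be_sync(void)
{
    uintptr_t cnt = (uintptr_t)pthread_getspecific(fsync_key);

    if (cnt != 0)
        pthread_setspecific(fsync_key, (void *)(cnt - 1));
    return cnt != 0;
}

int main(void)
{
    pthread_key_create(&fsync_key, NULL);
    note_fsync();
    printf("sync write? %d\n", write_should_be_sync());  /* 1: fsync seen */
    printf("sync write? %d\n", write_should_be_sync());  /* 0: counter drained */
    pthread_key_delete(&fsync_key);
    return 0;
}
```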
Merged into master as commit openzfs/zfs@31f2b5a. I wasn't able to observe a significant difference in fsync(2) performance with or without the patch. If others notice a significant performance regression we can revisit this issue then. |
Sorry for my late response. I see that you've closed the issue but I did run the suggested test against both my ssd and sata zpools using kernels with and without your patch. Here are the details:
Kernel 2.6.32-279.14.1.el6 [compiled with your patch, i.e. against issue-174]
sata1, raidz2 of 8 sata disks
ssd1, raidz2 of 4 ssd disks
Kernel 3.7.0 [compiled without your patch - straight against github master before you merged issue-174]
sata1, raidz2 of 8 sata disks
ssd1, raidz2 of 4 ssd disks
So, it does look like you are reverting some sort of a "performance fix" related to bugid 6535160 with your issue-174 branch. At least that's how it seems to me.
plain:
issue-174:
The SSD zpool performance doesn't really change much between tests, but on the other hand without your patch (issue-174) the SATA zpool performs almost as fast as the ssd pool! This is strange and seems unreal. It's as if there is some heavy caching effect taking place before your patch. Let me know what you think. It definitely looks like a regression, although I am not sure if it's for better or worse. I'll run additional tests if needed. Maybe we need to reopen this ticket? Thank you Brian. Rumen Telbizov |
@telbizov Thanks for running these benchmarks, you're absolutely right there's a regression here so I'm reopening the issue. I've posted a new patch which re-adds the optimization as per-file instead of per-thread. Can you please verify it fixes the performance regression? |
Thank you for reopening the issue and continuing to look into it. I will recompile a new kernel against the latest patch and run the tests again. Another thing I keep wondering is: can you explain this insane performance boost that I see when testing SATA disks? How could it possibly be anywhere close to SSD, unless there's heavy caching involved? Cheers, Rumen Telbizov |
@telbizov Due to the performance regression I'm going to revert the original fix for this issue from the master branch. Once we're happy that the revised fix doesn't suffer from a similar problem we'll merge it. Until then you'll want to run with a locally patched version to avoid the panic, which your workload seems particularly good at hitting.
Thank you.
It won't. The panic you were hitting was in the thread specific data implementation itself. With the new patch applied that code is still no longer used so we can't really hit that issue. Things may in fact perform slightly better because the new approach sheds some additional locking overhead.
The issue here is really lock contention in zil_commit() on the zl_lock and not disk speed. This optimization is designed to minimize that contention by falling back to synchronous write I/Os when it detects a thread which is calling fsync(2) constantly. This effectively reduces the contention and improves the throughput. The updated patch simply changes the existing logic to detect contention on a per-file instead of per-thread basis. |
The zfs_fsyncer_key TSD was added as a performance optimization to reduce contention on the zl_lock from zil_commit(). This issue manifested itself as very long (100+ms) system call times for fsync(2)-heavy workloads. However, we'd like to eliminate the usage of TSD in ZoL while keeping this optimization. There is no native Linux kernel equivalent of TSD and the SPL's implementation is primitive. It would simply be best to remove all the TSD consumers. To achieve this the zfs_fsyncer_key TSD counter has been reworked to be per-file instead of per-thread. This is not expected to behave significantly differently than the original implementation for real workloads. This just leaves one small ZFS TSD consumer. If it can be cleanly removed from the code we'll be able to shed the SPL TSD implementation entirely. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs/spl#174
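(Again purely illustrative: a small sketch of the per-file variant described above, with the counter kept in a hypothetical per-file structure instead of TSD. It is not the real znode/ZIL code; the types and field names are made up to show the idea.)

```c
/*
 * Sketch of the per-file rework: the fsync counter lives in the file's
 * in-memory node, so no thread-specific data is needed.  Hypothetical names.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct file_node {                     /* stands in for a znode-like structure */
    atomic_uint sync_cnt;              /* fsync calls currently in flight */
};

/* fsync(2) path: mark the file as having an fsync in flight. */
static void file_fsync_enter(struct file_node *fn)
{
    atomic_fetch_add(&fn->sync_cnt, 1);
}

static void file_fsync_exit(struct file_node *fn)
{
    atomic_fetch_sub(&fn->sync_cnt, 1);
}

/* Write-logging path: issue the record synchronously for fsync-heavy files. */
static bool log_record_sync(struct file_node *fn)
{
    return atomic_load(&fn->sync_cnt) != 0;
}

int main(void)
{
    struct file_node fn = { .sync_cnt = 0 };

    file_fsync_enter(&fn);
    printf("sync? %d\n", log_record_sync(&fn));   /* 1: fsync in flight */
    file_fsync_exit(&fn);
    printf("sync? %d\n", log_record_sync(&fn));   /* 0: back to async */
    return 0;
}
```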
Thanks for your continuing efforts, Brian. Here are the results of my tests.
Performance test: probing for regression in fsync(2) times compared to the plain test above
sata1 (raidz2, 8 sata 2.5" disks)
So it seems to me that yes, there is some performance improvement but it's not on par with the initial "plain" kernel without your patches. Here are the extracts:
plain
fsync(2) per file patch
your original patch (TSD code removed)
ssd1 (raidz2 4 ssd OCZ disks)
So SSD test - as good as always.
Stability testing
I'll run my original sysbench test, which was causing kernel panics in the first place, over the weekend and make sure that we haven't regressed in this aspect either. I'll report my results on Monday, 24th. Regards. Rumen Telbizov |
That's surprising, but that's why we run the tests. OK, so there's clearly more work to do here. Perhaps I will look into just fixing the TSD issue. |
Brian, Just wanted to report on the second part of the last test - stability. It seems like on this front we're still OK. I've been running my sysbench stability test over the weekend and the machine has been just fine - no kernel panics. Merry Christmas, Rumen Telbizov |
Hello, I can confirm that without that patch my 2 servers are crashing. Once, I managed to get a stack trace (included below). The problem is easy to reproduce: just start dbench with 300-400 processes and wait 15-20 minutes. Both servers are exactly the same:
System Hardware:
|
@telbizov @azaroth Thanks for the dbench reproducer. I was able to recreate the issue fairly easily with the right options.
1d6e830 Fix tsd_get/set() race with tsd_exit/destroy() |
This fix has been merged into master.
6ef94aa Fix tsd_get/set() race with tsd_exit/destroy() |
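(To make the class of bug named by this fix easier to picture, here is a generic, hypothetical C sketch of a lookup that drops its lock before the caller uses the entry, the kind of window a concurrent exit/destroy path can exploit. It is not the SPL implementation; every name here is invented.)

```c
/*
 * Generic illustration of a lookup/destroy race: the entry returned by
 * the lookup can be freed by another thread once the lock is dropped.
 */
#include <pthread.h>
#include <stdlib.h>

struct entry { int key; void *value; struct entry *next; };

static pthread_mutex_t hash_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry *head;

/* BROKEN pattern: the entry may be freed by destroy_all() after we unlock. */
static void *lookup_unsafe(int key)
{
    struct entry *e;

    pthread_mutex_lock(&hash_lock);
    for (e = head; e != NULL; e = e->next)
        if (e->key == key)
            break;
    pthread_mutex_unlock(&hash_lock);      /* race window opens here */
    return e ? e->value : NULL;            /* possible use-after-free */
}

/* Destroy path running on another thread. */
static void destroy_all(void)
{
    pthread_mutex_lock(&hash_lock);
    while (head != NULL) {
        struct entry *e = head;
        head = e->next;
        free(e);
    }
    pthread_mutex_unlock(&hash_lock);
}

int main(void)
{
    /* A fixed version would hold hash_lock (or take a reference on the
     * entry) across the final dereference, closing the window above. */
    (void)lookup_unsafe(42);
    destroy_all();
    return 0;
}
```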
@behlendorf Thanks for reproducing the issue yourself and, even better, fixing it. Unfortunately at this time I no longer have the test setup that I was playing with before and cannot reproduce it right now. Soon enough though I might get a similar machine and will be able to test the current master branch, which contains your fix. Once again, thank you for your efforts. Regards, Rumen Telbizov |
Hi, I've now seen this issue on 4 different servers. It seems to happen at random, with no pattern / frequency that I can spot - other than they all see the problem.
Using Ubuntu 11.04 and Ubuntu 12.04, PPA.
spl-dkms (2 0.6.0.71)
Tends to render the system "unhappy": it keeps running, although all ZFS volumes seem to die ...