task l2arc_feed:1815 blocked for more than 120 seconds. #13435
Comments
I'd suggest using 2.1.4, for one thing - there've been a lot of random fixes that might or might not make a difference. (Maybe #12365, for example, or another L2ARC removal race that I don't have the link for offhand.) |
@rincebrain Thanks a lot. I find this is a real issue. Sometimes it keeps printing this message with no end, and I have to hard-reset the system, repeating a couple of times before it gets past. I will upgrade to 2.1.4 to see if it helps. |
If I am lucky enough to get the system to boot, it is almost unusable. It keeps freezing now and then. A quick fio test shows a miserable result
|
If it's flooding your system with errors about the l2arc and 2.1.4 doesn't help, I'd try removing the l2arc from the pool. (If you can't get it to respond well enough to do that, you could try just offlining it and then removing it if that works, or you can physically detach the l2arc and then import the pool.) What's the pool layout? Was this working until recently or after some update, or is this a new setup? |
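The offline-then-remove sequence suggested above could be sketched as follows. This is only an illustration: the pool name `tank` and the cache device `sdb1` are hypothetical placeholders, and the helper only builds the `zpool offline`, `zpool remove`, and `zpool status` command lines without running them.

```python
from typing import List


def l2arc_removal_commands(
    pool: str, cache_dev: str, offline_first: bool = False
) -> List[List[str]]:
    """Build (but do not run) the zpool commands for dropping a cache device.

    `pool` and `cache_dev` are placeholders supplied by the caller.
    """
    cmds: List[List[str]] = []
    if offline_first:
        # Take the device offline first if the pool is too sluggish
        # to complete a plain remove.
        cmds.append(["zpool", "offline", pool, cache_dev])
    # Cache (L2ARC) devices can be removed from a live pool.
    cmds.append(["zpool", "remove", pool, cache_dev])
    # Check the resulting layout afterwards.
    cmds.append(["zpool", "status", pool])
    return cmds


if __name__ == "__main__":
    # Hypothetical names; substitute the real pool and device.
    for argv in l2arc_removal_commands("tank", "sdb1", offline_first=True):
        print(" ".join(argv))
```

Printing the commands instead of executing them leaves room to review each one before running it as root on the affected machine.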
I have removed both the log and l2arc devices from the pool; that made no difference at all. I am struggling to upgrade to 2.1.4 because I have trouble getting everything compiled and installed. The setup is pretty simple: a raid0 with SSD partitions as the cache and log devices
It was working great at the beginning, then I started to see the filesystem freeze now and then. By now it has become unusable. I thought it was a disk fault and ran a couple of scrubs, but found no errors. |
The thread is blocked, yes, but what are the other CPUs doing when that is printed? I'm guessing something else is borked. |
@liyimeng I suspect an issue with one of the drives in your raid0-type stripe, though I could be wrong. Did smartctl show anything interesting? |
My drives are behind a hardware RAID controller, which does not support passthrough. I therefore configured a raid0 for each disk, and the controller hides all smartctl metrics :(
However, while I was hacking around, I accidentally messed up the RAID configuration in the controller, which made one of the disks unavailable, so I had to destroy the pool and rebuild it with the same setup. Now it is back to normal again, e.g. I can get 170MB (vs 35KB) of 4k random write now. Let's see if the number drops after a while. |
@liyimeng Depending on your RAID controller card, you might be able to easily "reflash" the firmware on it to put it into "IT" mode instead of RAID mode if you're not using the RAID capabilities of the RAID controller itself and using RAID in software. |
@jittygitty Thanks for the hint! It is very tempting. I have this RAID card running on an old Intel server (https://ark.intel.com/content/www/us/en/ark/products/56253/intel-server-board-s2600gz.html); the model number is LSI SAS 2208.
How risky is re-flashing the firmware? I am afraid that if I brick it, it will be difficult to find a replacement since it is so old. |
@liyimeng Lol I was gonna say just buy a cheap one on Ebay first to experiment flashing with, but Ebay is completely down: I've done it before on some Dell version of an LSI card. It was long ago, but I recall it was pretty easy and I don't think it's very risky. But of course don't take my word for it or blame me after you brick it :D You can probably get help with that from the guys on the servethehome forum etc: |
@jittygitty Thanks, I need to think twice :) It is the only server that I can do any work on at the moment. I will wait a while before trying that out. |
@jittygitty According to https://forums.servethehome.com/index.php?threads/the-complete-lsi-avago-broadcom-sbr-download-megathread.33607/post-326204 Meanwhile, I find your speculation may be right: I am observing the pool slowing down after just two days of use. |
@liyimeng I think it's doable, since Intel ones are being sold on ebay already flashed to IT mode, specifically advertised for ZFS: If you do recreate your stripe yet again to try and pair with a different drive maybe, you can try connecting to the onboard motherboard SATA and try to get some smartctl readings that way for now. |
@jittygitty Thanks a lot. It is very nice of you. I tried this https://plone.lucidsolutions.co.nz/hardware/sas-controller/lsi-2208/configuring-jbod-with-a-lsi-2208-controller/view The author warns that it is not stable; I don't know exactly what he means. In my case, I cannot boot the OS from a disk in the RAID, so I worked around it by booting from a disk outside the RAID instead. Now I can run smartctl on the disks :D I don't know how to read the output. Meanwhile, I am still working on upgrading to 2.1.4.
|
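For reading the smartctl output, one rough approach is to look at the raw values of a few attributes that commonly point to a failing disk. The sketch below assumes the usual `smartctl -A /dev/sdX` attribute table layout (ID#, ATTRIBUTE_NAME, ..., RAW_VALUE). The sample text and the choice of watched attributes are illustrative assumptions, not output from this system.

```python
from typing import Dict

# Attributes whose non-zero raw value is often worth a closer look.
# This selection is an assumption for illustration, not an official list.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def suspicious_attributes(smartctl_output: str) -> Dict[str, int]:
    """Return watched attributes whose RAW_VALUE column is non-zero."""
    flagged: Dict[str, int] = {}
    for line in smartctl_output.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        name, raw = parts[1], parts[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged


# Hypothetical sample in the shape of `smartctl -A` output.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    print(suspicious_attributes(SAMPLE))
```

A non-empty result does not prove a disk is failing, but pending or reallocated sectors on one member of a stripe would be consistent with the slowdowns described in this thread.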
I have had the same problem for a long time: boot takes up to 20 minutes on a small server with an 8-disk raidz2 and 2x NVMe cache. I always update to the latest ZFS version but had not seen noticeable changes up to and including 2.1.5. [ 1088.601676] kernel: INFO: task l2arc_feed:1431 blocked for more than 966 seconds. And the "a start job is running for import zfs pools by cache file" message. Today I installed zfs 2.1.6 and the problem is completely gone; boot takes a few seconds! |
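To compare how long the l2arc_feed thread stays blocked before and after an upgrade, the hung-task messages can be pulled out of a saved kernel log. This is a minimal sketch: the regular expression is written against the message format quoted in this thread, and the function and type names are made up for illustration.

```python
import re
from typing import List, NamedTuple


class HungTask(NamedTuple):
    comm: str      # task name, e.g. "l2arc_feed"
    pid: int
    seconds: int   # "blocked for more than N seconds"


# Matches messages such as:
#   INFO: task l2arc_feed:1431 blocked for more than 966 seconds.
HUNG_RE = re.compile(
    r"task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text: str) -> List[HungTask]:
    """Extract every hung-task report found in the given log text."""
    return [
        HungTask(m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    line = (
        "[ 1088.601676] kernel: INFO: task l2arc_feed:1431 "
        "blocked for more than 966 seconds."
    )
    for task in find_hung_tasks(line):
        print(task.comm, task.pid, task.seconds)
```

The log text can come from any saved copy of the kernel log. The parser only relies on the message wording, not on the timestamp prefix.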
It is kind of random. I am still on 2.1.4, but it has disappeared for a while; not sure if it will come back. I seldom reboot my computer, maybe once every three months, so I am not sure if the problem is still there. |
I had this also on Ubuntu 22.04 with the default ZFS package (so ZFS 2.1.4), but after installing zfs 2.1.6 from JonothonP the stack traces have gone away. Importing a degraded RAID-Z3 pool (missing a single drive) still takes a few minutes though (even though I have ensured the cachefile is up to date). |
I had this issue due to an extreme number of snapshots created by the containerd zfs snapshotter plugin. Reducing the number of snapshots made it fast again. |
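To check whether snapshot count is a factor, the output of `zfs list -H -t snapshot -o name` can be tallied per dataset. The sketch below works on captured text. The dataset names in the sample are hypothetical, and what counts as "too many" snapshots is left to the reader.

```python
from collections import Counter


def snapshots_per_dataset(zfs_list_output: str) -> Counter:
    """Count snapshots per dataset from `zfs list -H -t snapshot -o name` output.

    Each input line is expected to look like `pool/dataset@snapname`.
    """
    counts: Counter = Counter()
    for line in zfs_list_output.splitlines():
        name = line.strip()
        if "@" not in name:
            continue  # skip blank or unexpected lines
        dataset, _, _snap = name.partition("@")
        counts[dataset] += 1
    return counts


# Hypothetical sample; real output would come from the zfs command above.
SNAPSHOT_SAMPLE = """\
tank/containerd/1@snapshot
tank/containerd/2@snapshot
tank/home@daily-1
tank/home@daily-2
tank/home@daily-3
"""

if __name__ == "__main__":
    for dataset, n in snapshots_per_dataset(SNAPSHOT_SAMPLE).most_common():
        print(f"{n:6d}  {dataset}")
```

Sorting by count makes it easy to see which datasets hold most of the snapshots before deciding what to prune.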
System information
Describe the problem you're observing
Every time I reboot my system, booting gets stuck for a long time while importing the zpool, and I observe a call trace like the one below. Is this something to worry about?