This repository has been archived by the owner on Feb 26, 2020. It is now read-only.

Set spl_taskq_thread_dynamic=0 by default #484

Closed
wants to merge 1 commit

Conversation

behlendorf
Contributor

Disable dynamic taskqs by default. They have been implicated in
several lockups and disabling them typically resolves the issue.
These lockups may only occur with certain kernel CONFIG_* options
and versions but until the root cause is identified it's safest
to disable this functionality by default. End users may opt to
re-enable them if they have not observed any problems in their
environment.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

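For anyone hitting one of the reported lockups before this lands in a release, the parameter can also be set by hand. A minimal sketch, assuming a standard SPL module install (the runtime change affects newly created taskqs only, and the conf file name is illustrative):

```shell
# Disable dynamic taskqs at runtime (affects newly created taskqs only)
echo 0 > /sys/module/spl/parameters/spl_taskq_thread_dynamic

# Make the setting persistent across module reloads and reboots
# (any /etc/modprobe.d/*.conf file read at module load time works)
echo "options spl spl_taskq_thread_dynamic=0" >> /etc/modprobe.d/spl.conf
```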
@behlendorf
Contributor Author

@dweeezil, @nedbass I think we need to disable dynamic taskqs in the release branch until we can determine the root cause of some of the reported deadlocks.

@behlendorf behlendorf added the Bug label Oct 13, 2015
@dweeezil
Contributor

@behlendorf I would tend to agree. Part of my recent testing regimen has been to try to stress dynamic thread creation, but so far I've not been able to reproduce any problems. I'm thinking of adding some more kstats to get a handle on the relationship between the number of sequentially launched tasks and those which actually get a new thread. Since these deadlocks don't all seem to be related to memory allocation, my current half-baked theory is that there may be some sort of dependency, maybe in the zio pipeline, which can cause deadlocks if certain tasks are run sequentially. I also wonder whether there may be some interaction with the taskq reconfiguration in spa.c and/or the task priorities.

@behlendorf
Contributor Author

That matches my experience; I haven't been able to trigger any issues with them either. Issue #483 contains a warning from lockdep, but it looks like a false positive. The code just needs a subclass added to make lockdep happy.

I like the idea of adding a kstat to provide visibility into the taskqs. Being able to easily spot check how they're behaving could be tremendously helpful when investigating performance issues. Since there are a relatively small number of them you could track some stats for each of them and output them in a kstat, one taskq per line.

As for this deadlock, I've been mulling it over and it sure seems like we're somehow taking tq->tq_lock twice. Although it's not clear to me how that's possible.

@behlendorf
Contributor Author

Merged to the release branch. This functionality has not been disabled in master; we'll fix it there.

4d3d716 Set spl_taskq_thread_dynamic=0 by default

@behlendorf behlendorf closed this Oct 15, 2015
@dweeezil
Contributor

@behlendorf FYI, I've got taskq kstats mostly working in https://github.com/dweeezil/spl/tree/taskq-kstat and am working on tuning the output. I'm also mulling over getting some per-tqent output as well. So far it looks kinda like this:

tim@zfsdev:~/src/spl$ cat /proc/spl/taskq 
taskq                 inst   act  nthr   seq  spwn  maxt   pri  mina  maxa  cura      flags
spl_kmem_cache           0     0     1     1     0     4   100    32 2147483647    32   80000005
spl_system_taskq         0     0     1     0     0    64   100     4 2147483647     4   80000005
spl_dynamic_taskq        0     0     1     1     0     1   100     4 2147483647     4   80000001
...
z_ioctl_iss              0     0     1     3     0     1   100    50 2147483647     0   80000004
z_ioctl_int              0     0     1     1     0     1   100    50 2147483647     0   80000004
metaslab_group_taskq     0     0     1     2     0     2   100    10 2147483647     3   8000000c
metaslab_group_taskq     1     0     1     3     0     2   100    10 2147483647     3   8000000c
metaslab_group_taskq     2     0     1     3     0     2   100    10 2147483647     3   8000000c
z_iput                   0     0     1     0     0     4   120    32 2147483647    32   80000005
zil_clean                0     0     1     0     0     1   120     2     2     2   80000001
zil_clean                1     0     1     1     0     1   120     2     2     2   80000001

I moved the sequential task counter into the taskq struct to give it visibility. I'll convert this to a pull request when I'm happy with it.

MorpheusTeam pushed a commit to Xyratex/lustre-stable that referenced this pull request Dec 3, 2015
Bug Fixes

* Fix CPU hotplug openzfs/spl#482
* Disable dynamic taskqs by default to avoid deadlock
  openzfs/spl#484
* Don't import all visible pools in zfs-import init script
  openzfs/zfs#3777
* Fix use-after-free in vdev_disk_physio_completion
  openzfs/zfs#3920
* Fix avl_is_empty(&dn->dn_dbufs) assertion openzfs/zfs#3865

Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Change-Id: I36347630be2506bee4ff0a05f1b236ba2ba7a0ae
Reviewed-on: http://review.whamcloud.com/16877
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
@behlendorf behlendorf added this to the 0.6.5 milestone Mar 23, 2016
@behlendorf behlendorf deleted the disable-dynamic-taskq branch July 28, 2017 22:13