Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zpool add intermittently fails #71

Closed
nedbass opened this issue Oct 20, 2010 · 2 comments
Closed

zpool add intermittently fails #71

nedbass opened this issue Oct 20, 2010 · 2 comments

Comments

@nedbass
Copy link
Contributor

nedbass commented Oct 20, 2010

I sometimes observe a failure to add a cache or log vdev to a zpool. If I issue the same command repeatedly, it eventually succeeds. I only see this for real disks, never for loopback or scsi_debug devices. Also if I run zpool under strace the problem never happens. So there seems to be a subtle timing issue.

$ zpool add ztest cache A4
cannot add to 'ztest': no such pool or dataset

This happens when the call zfs_ioctl(hdl, ZFS_IOC_VDEV_ADD, &zc) returns ENOENT. By instrumenting the kernel code with SPL debugging macros, I found where ENOENT is getting returned from and dumped a stack trace:

SPL: Showing stack for process 25977
Pid: 25977, comm: lt-zpool Tainted: P           ----------------  2.6.32-63.el6.x86_64 #1
Call Trace:
 [] spl_debug_dumpstack+0x27/0x40 [spl]
 [] spa_scan_get_stats+0x41/0x140 [zfs]
 [] ? nvlist_add_uint64_array+0x16/0x20 [znvpair]
 [] vdev_config_generate+0x590/0x10f0 [zfs]
 [] ? kmem_alloc_debug+0xeb/0x130 [spl]
 [] ? schedule+0x1e4/0x408
 [] ? _cond_resched+0x30/0x40
 [] vdev_config_generate+0x28f/0x10f0 [zfs]
 [] ? kmem_alloc_debug+0xeb/0x130 [spl]
 [] ? _cond_resched+0x30/0x40
 [] vdev_config_generate+0x28f/0x10f0 [zfs]
 [] ? kmem_free_debug+0x16/0x20 [spl]
 [] spa_config_generate+0x1f9/0x9c0 [zfs]
 [] spa_open_common+0xb4/0x490 [zfs]
 [] ? ftrace_call+0x5/0x2b
 [] spa_get_stats+0x49/0x2c0 [zfs]
 [] ? _cond_resched+0x9/0x40
 [] ? __kmalloc+0x20c/0x220
 [] zfs_ioc_pool_stats+0x31/0x70 [zfs]
 [] zfsdev_ioctl+0xef/0x1c0 [zfs]
 [] vfs_ioctl+0x22/0xa0
 [] do_vfs_ioctl+0x84/0x580
 [] ? ftrace_call+0x5/0x2b
 [] sys_ioctl+0x81/0xa0
 [] system_call_fastpath+0x16/0x1b
SPL: 25977:0:(spa_misc.c:1662:spa_scan_get_stats()) Process leaving (rc=2 : 2 : 2)

For reference, here is the implicated code. I have confirmed that scn != NULL. This is as far as I've dug into the problem so far.

1653 /*
1654  * Get scan stats for zpool status reports
1655  */
1656 int
1657 spa_scan_get_stats(spa_t *spa, pool_scan_stat_t *ps)
1658 {
1659         dsl_scan_t *scn = spa->spa_dsl_pool ? spa->spa_dsl_pool->dp_scan : NULL;
1660 
1661         if (scn == NULL || scn->scn_phys.scn_func == POOL_SCAN_NONE)
1662                 return (ENOENT);
...
@nedbass
Copy link
Contributor Author

nedbass commented Oct 20, 2010

Actually after looking more closely, I think spa_scan_get_stats() returning ENOENT is a red herring. But with better instrumentation I'm now focusing on spa_validate_aux_devs() and vdev_label_init().

@nedbass
Copy link
Contributor Author

nedbass commented Oct 22, 2010

This is fixed by d877ac6.

dajhorn referenced this issue in zfsonlinux/pkg-zfs Jan 6, 2012
The taskq_t's active thread list is sorted based on its
tqt_ent->tqent_id field. The list is kept sorted solely by inserting
new taskq_thread_t's in their correct sorted location; no other
means is used. This means that once inserted, if a taskq_thread_t's
tqt_ent->tqent_id field changes, the list runs the risk of no
longer being sorted.

Prior to the introduction of the taskq_dispatch_prealloc() interface,
this was not a problem as a taskq_ent_t actively being serviced under
the old interface should always have a static tqent_id field. Thus,
once the taskq_thread_t is added to the taskq_t's active thread list,
the taskq_thread_t's tqt_ent->tqent_id field would remain constant.

Now, this is no longer the case. Currently, if using the
taskq_dispatch_prealloc() interface, any given taskq_ent_t actively
being serviced _may_ have its tqent_id value incremented. This happens
when the preallocated taskq_ent_t structure is recursively dispatched.
Thus, a taskq_thread_t could potentially have its tqt_ent->tqent_id
field silently modified from under its feet. If this were to happen
to a taskq_thread_t on a taskq_t's active thread list, this would
compromise the integrity of the order of the list (as the list
_may_ no longer be sorted).

To get around this, the taskq_thread_t's taskq_ent_t pointer was
replaced with its own static copy of the tqent_id. So, as a taskq_ent_t
is pulled off of the taskq_t's pending list, a static copy of its
tqent_id is made and this copy is used to sort the active thread
list. Using a static copy is key in ensuring the integrity of the
order of the active thread list. Even if the underlying taskq_ent_t
is recursively dispatched (as has its tqent_id modified), this
static copy stored inside the taskq_thread_t will remain constant.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #71
dajhorn referenced this issue in zfsonlinux/pkg-zfs Jan 6, 2012
A preallocated taskq_ent_t's tqent_flags must be checked prior to
servicing the taskq_ent_t. Once a preallocated taskq entry is serviced,
the ownership of the entry is handed back to the caller of
taskq_dispatch, thus the entry's contents can potentially be mangled.

In particular, this is a problem in the case where a preallocated taskq
entry is serviced, and the caller clears it's tqent_flags field. Thus,
when the function returns and task_done is called, it looks as though
the entry is **not** a preallocated task (when in fact it **is** a
preallocated task).

In this situation, task_done will place the preallocated taskq_ent_t
structure onto the taskq_t's free list. This is a **huge** mistake. If
the taskq_ent_t is then freed by the caller of taskq_dispatch, the
taskq_t's free list will hold a pointer to garbage data. Even worse, if
nothing has over written the freed memory before the pointer is
dereferenced, it may still look as though it points to a valid list_head
belonging to a taskq_ent_t structure.

Thus, the task entry's flags are now copied prior to servicing the task.
This copy is then checked to see if it is a preallocated task, and
determine if the entry needs to be passed down to the task_done
function.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #71
richardelling pushed a commit to richardelling/zfs that referenced this issue Oct 15, 2018
* Fixed hdr.len check in io_receiver thread

Signed-off-by: satbir <satbir.chhikara@gmail.com>
sdimitro pushed a commit to sdimitro/zfs that referenced this issue Feb 14, 2022
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant