Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NULL pointer dereference when attempting to destroy snapshots en-mass to allow a pool to resilver #8237

Closed
wrouesnel opened this issue Jan 4, 2019 · 4 comments
Labels
Status: Stale No recent activity for issue Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@wrouesnel
Copy link
Contributor

wrouesnel commented Jan 4, 2019

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 18.04
Linux Kernel Linux xen 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:16:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Architecture x86_64
ZFS Version 0.7.5-1ubuntu13
SPL Version 0.7.5-1ubuntu1

Describe the problem you're observing

Snapshot deletion stalls and zpool status -v cannot list all errors when trying to delete snapshots on a pool with data errors.

Describe how to reproduce the problem

Not sure - this has happened on a pool which is displaying this structure (due to a multi-disk failure)

  pool: storage
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: resilvered 4.42T in 54h29m with 501417 errors on Fri Jan  4 00:03:12 2019
config:

        NAME                                              STATE     READ WRITE CKSUM
        storage                                           DEGRADED     0     0  491K
          raidz3-0                                        DEGRADED     0     0  994K
            ata-ST3000DM001-1CH166_Z1F5RRC6               DEGRADED     0     0     0  too many errors
            ata-ST2000DM001-1CH164_Z1E9NG93               DEGRADED     0     0     0  too many errors
            ata-WDC_WD20EZRX-00D8PB0_WD-WMC4M3194337      DEGRADED     0     0     0  too many errors
            sdv                                           DEGRADED     0     0     2  too many errors
            replacing-4                                   DEGRADED     0     0     1
              11536183557135630467                        UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST2000DM001-1CH164_Z1E9NG5T-part1
              ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M7VNSP5X    ONLINE       0     0     0
            replacing-5                                   DEGRADED     0     0     1
              8619258095693369758                         UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST3000DM001-1CH166_Z1F5MGFV-part1
              ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M0TVE345    ONLINE       0     0     0
            ata-ST2000DM001-1CH164_Z340PXSZ               DEGRADED     0     0     0  too many errors
            replacing-7                                   ONLINE       0     0   231
              sdk                                         ONLINE       0     0     0
              ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M2HL73Z4    ONLINE       0     0     0
            ata-WDC_WD20EARX-00PASB0_WD-WCAZAE358947      ONLINE       0     0     0
            ata-WDC_WD15EARS-00S8B1_WD-WCAVY4004247       DEGRADED     0     0     0  too many errors
          raidz3-1                                        DEGRADED     0     0     0
            ata-WDC_WD20EARX-00PASB0_WD-WCAZAE403743      ONLINE       0     0     0
            ata-WDC_WD20EARX-00PASB0_WD-WCAZAE401801      ONLINE       0     0     0
            ata-WDC_WD20EARX-00PASB0_WD-WCAZAE380254      ONLINE       0     0     0
            15477698036315389930                          UNAVAIL      0     0     0  was /dev/disk/by-id/ata-WDC_WD20EARX-00PASB0_WD-WCAZA6592392-part1
            ata-WDC_WD20EARX-00PASB0_WD-WCAZAE428280      ONLINE       0     0     0
            ata-WDC_WD20EARX-00PASB0_WD-WCAZAE411377      ONLINE       0     0     0
            ata-WDC_WD20EARX-00PASB0_WD-WCAZAE410176      ONLINE       0     0     0
            ata-ST2000DM001-1CH164_Z1E9NG14               ONLINE       0     0     0
            ata-WDC_WD20EFRX-68EUZN0_WD-WCC4M2HL7Y8C      ONLINE       0     0     0
            ata-WDC_WD20EZRX-00D8PB0_WD-WMC4M0DNECCF      ONLINE       0     0     0
        logs
          mirror-2                                        UNAVAIL      0     0     0  insufficient replicas
            ata-Corsair_Force_3_SSD_11436522000012071599  OFFLINE      0     0     0
            ata-Corsair_Force_3_SSD_11436522000012072283  OFFLINE      0     0     0

errors: 501433 data errors, use '-v' for a list

The disk replacements refuse to complete, and I was attempting to remove damaged snapshots to allow them to complete.

Include any warning/errors/backtraces from the system logs

[261418.948829] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[261418.963115] IP: dle_enqueue_subobj.isra.5+0x25/0x140 [zfs]
[261418.970208] PGD 0 P4D 0
[261418.977392] Oops: 0000 [#1] SMP PTI
[261418.984477] Modules linked in: nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo br_netfilter aufs gpio_ich intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_in
tel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf bridge stp llc bonding nf_log_ipv6 zfs(PO) zunicode(PO) zavl(PO) icp(PO) input_leds joydev mac_hid ip6table_filter intel_pch_
thermal ip6_tables shpchp lpc_ich ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_REDIRECT nf_nat_redirect xt_nat ipmi_si zcommon(PO) ipmi_devintf xt_addrtype znvpair(PO) ipmi_msghandler spl(O) iptable_nat nf_nat_ipv4 nf_nat
 nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack libcrc32c kvm_intel kvm iptable_filter irqbypass
[261419.064332]  binfmt_misc sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq hid_generic usbhid hid ttm ahci libahci drm_kms_helper igb syscopyarea dca sysfillrect 
sysimgblt i2c_algo_bit fb_sys_fops mpt3sas drm e1000e raid_class scsi_transport_sas ptp pps_core
[261419.092816] CPU: 1 PID: 12028 Comm: txg_sync Tainted: P           O     4.15.0-20-generic #21-Ubuntu
[261419.107063] Hardware name: Intel Corporation S1200RP/S1200RP, BIOS S1200RP.86B.01.04.0002.011020141517 01/10/2014
[261419.121337] RIP: 0010:dle_enqueue_subobj.isra.5+0x25/0x140 [zfs]
[261419.128433] RSP: 0018:ffffa6ef8b563848 EFLAGS: 00010286
[261419.135528] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000008e009
[261419.149805] RDX: 0000000000000000 RSI: ffff88390acc2970 RDI: ffff883aa5f3d800
[261419.164050] RBP: ffffa6ef8b563880 R08: ffff883999002300 R09: ffffffffc06e7f43
[261419.178279] R10: fffff23a46160e00 R11: ffff8838d3826180 R12: ffff88390acc2968
[261419.192506] R13: 000000000008e009 R14: ffff883999002300 R15: 0000000000086ebd
[261419.206747] FS:  0000000000000000(0000) GS:ffff883aaf240000(0000) knlGS:0000000000000000
[261419.220977] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[261419.228068] CR2: 0000000000000058 CR3: 00000002fba0a004 CR4: 00000000001606e0
[261419.242298] Call Trace:
[261419.249477]  ? avl_find+0x5f/0x90 [zavl]
[261419.256614]  dsl_deadlist_insert_bpobj+0x103/0x1e0 [zfs]
[261419.263743]  ? zap_cursor_retrieve+0x91/0x2e0 [zfs]
[261419.270969]  ? zap_remove+0x13/0x20 [zfs]
[261419.278084]  ? zap_remove_int+0x54/0x80 [zfs]
[261419.285301]  dsl_deadlist_merge+0xdf/0x340 [zfs]
[261419.292389]  ? _cond_resched+0x19/0x40
[261419.299573]  ? mutex_lock+0x12/0x40
[261419.306663]  ? _cond_resched+0x19/0x40
[261419.313855]  ? mutex_lock+0x12/0x40
[261419.320963]  ? dsl_dir_diduse_space+0xee/0x180 [zfs]
[261419.328188]  ? dsl_dir_diduse_space+0x14b/0x180 [zfs]
[261419.335309]  dsl_destroy_snapshot_sync_impl+0x2ec/0xba0 [zfs]
[261419.342437]  dsl_destroy_snapshot_sync+0x66/0x130 [zfs]
[261419.349651]  dsl_sync_task_sync+0xb2/0x120 [zfs]
[261419.356784]  dsl_pool_sync+0x302/0x430 [zfs]
[261419.364010]  spa_sync+0x43e/0xd80 [zfs]
[261419.371177]  txg_sync_thread+0x2cd/0x4a0 [zfs]
[261419.378432]  ? txg_quiesce_thread+0x3d0/0x3d0 [zfs]
[261419.385572]  thread_generic_wrapper+0x74/0x90 [spl]
[261419.392761]  kthread+0x121/0x140
[261419.399865]  ? __thread_exit+0x20/0x20 [spl]
[261419.406963]  ? kthread_create_worker_on_cpu+0x70/0x70
[261419.414152]  ? kthread_create_worker_on_cpu+0x70/0x70
[261419.421241]  ret_from_fork+0x35/0x40
[261419.428424] Code: 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 49 89 cd 48 89 d3 4d 89 c6 48 83 ec 10 48 8b 3f <4c> 8b 7a 58 48 89 75 d0 e8 7e 34 fe ff 4c 3b b8 38 01 00 00 48 
[261419.449773] RIP: dle_enqueue_subobj.isra.5+0x25/0x140 [zfs] RSP: ffffa6ef8b563848
[261419.464014] CR2: 0000000000000058
[261419.471099] ---[ end trace eb8dd3d04cfa7209 ]---

NAME     PROPERTY                       VALUE                          SOURCE
storage  size                           31.8T                          -
storage  capacity                       73%                            -
storage  altroot                        -                              default
storage  health                         DEGRADED                       -
storage  guid                           18404054517318695605           -
storage  version                        -                              default
storage  bootfs                         -                              default
storage  delegation                     on                             default
storage  autoreplace                    off                            default
storage  cachefile                      -                              default
storage  failmode                       wait                           default
storage  listsnapshots                  off                            default
storage  autoexpand                     off                            default
storage  dedupditto                     0                              default
storage  dedupratio                     1.00x                          -
storage  free                           8.32T                          -
storage  allocated                      23.4T                          -
storage  readonly                       off                            -
storage  ashift                         12                             local
storage  comment                        -                              default
storage  expandsize                     -                              -
storage  freeing                        0                              -
storage  fragmentation                  18%                            -
storage  leaked                         0                              -
storage  multihost                      off                            default
storage  feature@async_destroy          enabled                        local
storage  feature@empty_bpobj            active                         local
storage  feature@lz4_compress           active                         local
storage  feature@multi_vdev_crash_dump  disabled                       local
storage  feature@spacemap_histogram     active                         local
storage  feature@enabled_txg            active                         local
storage  feature@hole_birth             active                         local
storage  feature@extensible_dataset     enabled                        local
storage  feature@embedded_data          active                         local
storage  feature@bookmarks              enabled                        local
storage  feature@filesystem_limits      disabled                       local
storage  feature@large_blocks           disabled                       local
storage  feature@large_dnode            disabled                       local
storage  feature@sha512                 disabled                       local
storage  feature@skein                  disabled                       local
storage  feature@edonr                  disabled                       local
storage  feature@userobj_accounting     disabled                       local
@wrouesnel
Copy link
Contributor Author

Tried upgrading to zfs 0.7.12 (0.7.12-0york0~18.04 from this PPA) and got the same error when trying to continue deleting snapshots.

The effect seems to be any operations involving snapshots stop working, and the pool stops being able to accept writes though my system stays up till I reboot. A zpool resilver operation will run to completion, but fail to ever actually finish replacing resilvered disks (i.e. it completes but the old disk cannot be detached, offlined or removed).

[ 5248.198752] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[ 5248.208095] IP: dle_enqueue_subobj.isra.5+0x25/0x110 [zfs]
[ 5248.215100] PGD 0 P4D 0 
[ 5248.218805] Oops: 0000 [#1] SMP PTI
[ 5248.223616] Modules linked in: nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo br_netfilter aufs bridge stp llc bonding nf_log_ipv6 ip6table_filter ip6_tables ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_REDIRECT nf_nat_redirect xt_nat xt_addrtype iptable_nat nf_nat_ipv4 nf_nat nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack binfmt_misc nf_conntrack libcrc32c iptable_filter intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel gpio_ich kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf ipmi_si ipmi_devintf ipmi_msghandler intel_pch_thermal zfs(POE) zunicode(PO) zavl(PO) icp(POE) zcommon(POE) znvpair(PO) spl(OE) input_leds joydev mac_hid
[ 5248.309968]  shpchp lpc_ich sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq hid_generic usbhid hid mgag200 igb ttm e1000e mpt3sas dca i2c_algo_bit drm_kms_helper ptp ahci syscopyarea sysfillrect sysimgblt libahci raid_class fb_sys_fops drm scsi_transport_sas pps_core
[ 5248.345164] CPU: 3 PID: 23099 Comm: txg_sync Tainted: P           OE    4.15.0-43-generic #46-Ubuntu
[ 5248.356643] Hardware name: Intel Corporation S1200RP/S1200RP, BIOS S1200RP.86B.01.04.0002.011020141517 01/10/2014
[ 5248.369471] RIP: 0010:dle_enqueue_subobj.isra.5+0x25/0x110 [zfs]
[ 5248.377481] RSP: 0018:ffffb19788807848 EFLAGS: 00010286
[ 5248.384606] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000008e009
[ 5248.393882] RDX: 0000000000000000 RSI: ffff9ed2f4cf9170 RDI: ffff9ed368e95000
[ 5248.403157] RBP: ffffb19788807880 R08: ffff9ed35414bd80 R09: 0000000000000000
[ 5248.412430] R10: ffffea164eb3d400 R11: ffff9ed36a7b9a40 R12: ffff9ed2f4cf9168
[ 5248.421701] R13: 000000000008e009 R14: ffff9ed35414bd80 R15: 0000000000000000
[ 5248.431122] FS:  0000000000000000(0000) GS:ffff9ed36f2c0000(0000) knlGS:0000000000000000
[ 5248.441623] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5248.449503] CR2: 0000000000000058 CR3: 00000003d0e0a003 CR4: 00000000001606e0
[ 5248.458936] Call Trace:
[ 5248.463124]  ? avl_find+0x5f/0x90 [zavl]
[ 5248.468994]  dsl_deadlist_insert_bpobj+0x10b/0x1b0 [zfs]
[ 5248.476434]  ? zap_cursor_retrieve+0x91/0x2e0 [zfs]
[ 5248.483382]  ? zap_remove+0x13/0x20 [zfs]
[ 5248.489332]  ? zap_remove_int+0x54/0x80 [zfs]
[ 5248.495664]  dsl_deadlist_merge+0xe3/0x2e0 [zfs]
[ 5248.502224]  ? _cond_resched+0x19/0x40
[ 5248.507804]  ? mutex_lock+0x12/0x40
[ 5248.513087]  ? _cond_resched+0x19/0x40
[ 5248.518655]  ? mutex_lock+0x12/0x40
[ 5248.523959]  ? dsl_dir_diduse_space+0xee/0x180 [zfs]
[ 5248.530906]  ? dsl_dir_diduse_space+0x14b/0x180 [zfs]
[ 5248.537940]  dsl_destroy_snapshot_sync_impl+0x2f0/0xa30 [zfs]
[ 5248.545762]  dsl_destroy_snapshot_sync+0x68/0x110 [zfs]
[ 5248.552983]  dsl_sync_task_sync+0xb2/0x120 [zfs]
[ 5248.559511]  dsl_pool_sync+0x30a/0x420 [zfs]
[ 5248.565646]  spa_sync+0x43e/0xd30 [zfs]
[ 5248.571289]  txg_sync_thread+0x2cd/0x4a0 [zfs]
[ 5248.577568]  ? __switch_to_asm+0x40/0x70
[ 5248.583284]  ? txg_quiesce_thread+0x3d0/0x3d0 [zfs]
[ 5248.590029]  thread_generic_wrapper+0x74/0x90 [spl]
[ 5248.596763]  kthread+0x121/0x140
[ 5248.601632]  ? __thread_exit+0x20/0x20 [spl]
[ 5248.607652]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 5248.614534]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 5248.621419]  ret_from_fork+0x35/0x40
[ 5248.626609] Code: 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 49 89 cd 48 89 d3 4d 89 c6 48 83 ec 10 48 8b 3f <4c> 8b 7a 58 48 89 75 d0 e8 9e 3c fe ff 4c 3b b8 38 01 00 00 48 
[ 5248.650185] RIP: dle_enqueue_subobj.isra.5+0x25/0x110 [zfs] RSP: ffffb19788807848
[ 5248.659724] CR2: 0000000000000058
[ 5248.664559] ---[ end trace 5120590b58074559 ]---

@wrouesnel
Copy link
Contributor Author

wrouesnel commented Oct 29, 2019

So the pool in question which has this problem is still struggling along, which is to say, the affected datasets can be moved around and renamed, but not deleted (because the kernel thread crashes with the above).

I've been doing a little digging and to my (unexperienced) eyes, this looks like a logic bug in the relevant functions: (comments added inline to show my thinking)

From dsl_deadlist.c:

static void
dsl_deadlist_insert_bpobj(dsl_deadlist_t *dl, uint64_t obj, uint64_t birth,
    dmu_tx_t *tx)
{
	dsl_deadlist_entry_t dle_tofind;
	dsl_deadlist_entry_t *dle;
	avl_index_t where;
	uint64_t used, comp, uncomp;
	bpobj_t bpo;

	ASSERT(MUTEX_HELD(&dl->dl_lock));

	VERIFY0(bpobj_open(&bpo, dl->dl_os, obj));
	VERIFY0(bpobj_space(&bpo, &used, &comp, &uncomp));
	bpobj_close(&bpo);

	dsl_deadlist_load_tree(dl);

	dmu_buf_will_dirty(dl->dl_dbuf, tx);
	dl->dl_phys->dl_used += used;
	dl->dl_phys->dl_comp += comp;
	dl->dl_phys->dl_uncomp += uncomp;

	dle_tofind.dle_mintxg = birth;
        // this is the start of where it goes wrong
	dle = avl_find(&dl->dl_tree, &dle_tofind, &where); 
        // if this fails, we catch the NULL though and do avl_nearest...
	if (dle == NULL)
                // but this is never NULL checked, but can return NULL (and in my pool, I guess due to dataloss, does?
		dle = avl_nearest(&dl->dl_tree, where, AVL_BEFORE);
	dle_enqueue_subobj(dl, dle, obj, tx); // the crash happens inside here
}

Looking at avl.c:

void *
avl_nearest(avl_tree_t *tree, avl_index_t where, int direction)
{
	int child = AVL_INDEX2CHILD(where);
	avl_node_t *node = AVL_INDEX2NODE(where);
	void *data;
	size_t off = tree->avl_offset;

	if (node == NULL) { // nothing stopping us returning null
		ASSERT(tree->avl_root == NULL);
		return (NULL);
	}
	data = AVL_NODE2DATA(node, off);
	if (child != direction)
		return (data);

	return (avl_walk(tree, data, direction));
}

I'm not really clear enough on what the correct fix here would be, since the on-disk data is wrecked at this point. But it seems like, since the data is already recorded as a permanent error and non-existent, it should be possible to bail out of this in the snapshot delete scenario (or at least not crash a kernel thread).

@behlendorf
Copy link
Contributor

Thanks for digging in to this. It does seem that this function should include at least some recovery logic along the lines of what's done in dsl_deadlist_insert(). That would at least allow it to try and continue when zfs_recover is set. @shartse and @ahrens would have the best insight on how to most gracefully handle this kind of damage.

@behlendorf behlendorf added the Type: Defect Incorrect behavior (e.g. crash, hang) label Oct 29, 2019
@stale
Copy link

stale bot commented Oct 28, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Oct 28, 2020
@stale stale bot closed this as completed Jan 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Stale No recent activity for issue Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests

2 participants