BIO errors to local scsi targets via scst_local atop ZVOLs, related to #4042 #4097

Closed
sempervictus opened this issue Dec 13, 2015 · 3 comments
Labels: Component: ZVOL, Status: Inactive, Status: Stale, Type: Defect

Comments

@sempervictus
Contributor

While trying to address #4042 by being clever, I created scst_local handlers and targets for my ZVOLs. In local testing this worked like a charm: write performance almost doubled (it seems to eliminate jitter and delivers the numbers I'd expect from a 5-SSD RAIDZ, 800/200 linear/random write). However, as soon as I mapped these exports into libvirt as virtio-scsi LUNs (instead of disks), I got this wonderful stack trace from scst:

scst-svn-trunk/scst/src/dev_handlers/scst_vdisk.c:6744 blockio_exec_rw+0x6b3/0x6d0 [scst_vdisk]()
[35895.359993] Refused bio with invalid length 1152 and/or offset 2944.
[35895.359994] Modules linked in: scst_local(OE) scst_vdisk(OE) scst(OE) libcrc32c dlm configfs vhost_net vhost macvtap macvlan serpent_sse2_x86_64 serpent_generic lrw glue_helper ablk_helper cryptd ecryptfs ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables ipmi_devintf bridge stp llc kvm_intel kvm gpio_ich ppdev serio_raw input_leds joydev ioatdma microcode 8250_fintek ipmi_ssif shpchp i7core_edac dca edac_core lpc_ich i5500_temp parport_pc ipmi_si ipmi_msghandler jc42 coretemp lp parport nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache mlx5_core zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE) spl(OE) zavl(POE) drbg ansi_cprng xts gf128mul dm_crypt mlx4_en vxlan ip6_udp_tunnel udp_tunnel ses enclosure i2c_algo_bit ttm drm_kms_helper hid_generic syscopyarea sysfillrect sysimgblt uas fb_sys_fops usbhid ahci usb_storage psmouse drm hid mlx4_core libahci aacraid e1000e fjes [last unloaded: scst]
[35895.360067] CPU: 6 PID: 18888 Comm: cinder001_0 Tainted: P           OE   4.3.2-ssv-i7 #ssv
[35895.360069] Hardware name: Intel Thurley/Greencity, BIOS GEMAV200 01/28/2011
[35895.360071]  ffffffffc08e3508 ffff8806407efa68 ffffffff823963ad ffff8806407efab0
[35895.360074]  ffff8806407efaa0 ffffffff82071a06 0000000000000b80 ffffea001c1892c0
[35895.360076]  ffff88061718c000 ffffffffc08e7360 0000000000000001 ffff8806407efb00
[35895.360079] Call Trace:
[35895.360089]  [<ffffffff823963ad>] dump_stack+0x44/0x57
[35895.360094]  [<ffffffff82071a06>] warn_slowpath_common+0x86/0xc0
[35895.360097]  [<ffffffff82071a8c>] warn_slowpath_fmt+0x4c/0x50
[35895.360101]  [<ffffffffc08e05d4>] ? blockio_exec_rw+0xf4/0x6d0 [scst_vdisk]
[35895.360105]  [<ffffffffc08e0b93>] blockio_exec_rw+0x6b3/0x6d0 [scst_vdisk]
[35895.360110]  [<ffffffff8253c790>] ? scsi_kmap_atomic_sg+0x190/0x190
[35895.360121]  [<ffffffffc09fda70>] ? scst_process_redirect_cmd+0x190/0x220 [scst]
[35895.360126]  [<ffffffffc08e0c52>] blockio_exec_read+0x12/0x20 [scst_vdisk]
[35895.360129]  [<ffffffffc08d9f35>] vdev_do_job+0x35/0xe0 [scst_vdisk]
[35895.360132]  [<ffffffffc08db280>] blockio_exec+0x70/0x200 [scst_vdisk]
[35895.360140]  [<ffffffffc09f940a>] scst_do_real_exec+0x4a/0x190 [scst]
[35895.360143]  [<ffffffff827a4f2e>] ? _raw_spin_unlock_bh+0x1e/0x20
[35895.360151]  [<ffffffffc09fad81>] scst_exec_check_blocking+0xd1/0x190 [scst]
[35895.360159]  [<ffffffffc09fb7c5>] scst_process_active_cmd+0x85/0x1800 [scst]
[35895.360168]  [<ffffffffc09fdedf>] scst_do_job_active+0x5f/0x80 [scst]
[35895.360175]  [<ffffffffc09fe008>] scst_cmd_thread+0x108/0x310 [scst]
[35895.360179]  [<ffffffff820b2fc0>] ? prepare_to_wait_event+0xf0/0xf0
[35895.360186]  [<ffffffffc09fdf00>] ? scst_do_job_active+0x80/0x80 [scst]
[35895.360189]  [<ffffffff8208e6f9>] kthread+0xc9/0xe0
[35895.360192]  [<ffffffff8208e630>] ? kthread_park+0x60/0x60
[35895.360194]  [<ffffffff827a584f>] ret_from_fork+0x3f/0x70
[35895.360196]  [<ffffffff8208e630>] ? kthread_park+0x60/0x60
[35895.360198] ---[ end trace 40ca6b99d614699c ]---

To my untrained eye, this looks to be coming from scsi_kmap_atomic_sg, which makes me think we may have a problem on our end.
Mapping a physical disk this way works.

If I can get this solved, we won't need any of the top/bottom SCSI handler discussions anymore, as we'll just be able to use the scst_local_tgt and tcm_loop (LIO) interfaces for that work, leaving it up to the SCSI subsystem of choice.

The only thing I can think of is that the virtio-scsi driver doesn't like my 8K block size, so I'll try 4K, or try parameters that let me remap the block size (though I think that requires the disk abstraction and won't work in LUN mode).
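
For reference, a minimal sketch of what such a mapping might look like in scst.conf — the device name, target name, and ZVOL path are invented for illustration, and the exact attribute set may differ between SCST releases:

HANDLER vdisk_blockio {
    # blockio submits BIOs straight to the backing device, bypassing the page cache
    DEVICE zvol_test {
        filename /dev/zvol/tank/testvol
        blocksize 4096
    }
}

TARGET_DRIVER scst_local {
    TARGET local_tgt {
        LUN 0 zvol_test
    }
}

Because vdisk_blockio drives the ZVOL block device directly rather than through the page cache, the request lengths and offsets from the initiator land in blockio_exec_rw more or less untouched, which is where the trace above fires.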

@behlendorf
Contributor

@sempervictus the key to the error here is on the second line you've pasted: it's expecting 512-byte-aligned lengths and offsets, and neither 1152 nor 2944 is a multiple of 512. Someone will need to determine where it's getting these values from and why.

[35895.359993] Refused bio with invalid length 1152 and/or offset 2944.

@sempervictus
Contributor Author

The block size it expects is actually 4096; at least, that's what I set in scst.conf when creating the scst_local mappings via the vdisk_blockio handler.

I've implemented this with tcm_loop via LIO and it seems to do most of what I want. Using it as a LUN via virtio-scsi still causes problems though: with volblocksize=8k the data gets corrupted pretty badly. When using it as a virtio-scsi disk, with libvirt setting the blocksize to 4096/8192, I also get corruption (blocks are spread out by a factor of 8, which makes me think it's expecting 512). When I set the blocksize in libvirt to 512/4096, it plays nice again.
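
For context, the libvirt knob in question is the <blockio> element of the disk definition. A rough sketch of the 512/4096 combination that behaves — the source path and target device are placeholders, and a virtio-scsi controller is assumed in the guest:

<disk type='block' device='disk'>
    <driver name='qemu' type='raw' cache='none'/>
    <source dev='/dev/disk/by-id/scsi-EXAMPLE'/>
    <blockio logical_block_size='512' physical_block_size='4096'/>
    <target dev='sdb' bus='scsi'/>
</disk>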
Performance-wise, I'm seeing 600 MB/s to >1 GB/s sustained write throughput at the Cinder VM, and about 300-600 MB/s at the iSCSI targets (open-iscsi seems to have massive overhead). The iSCSI targets are exported via SCST in the Cinder VM as vdisk_blockio to reduce caching, and the Cinder VM itself is set to cache='none' on its disks and on the Nova compute nodes. It does seem to buffer some of the IO through tcm_loop, but that's definitely preferable to the VM accessing the ZVOL directly as a virtio-scsi disk, where sustained write throughput seems capped at ~215 MB/s (although random writes still hit well over 300 MB/s). Methinks our ZVOLs are not behaving as expected, given the strange artificial caps that can apparently be bypassed via the loopback SCSI abstraction.
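
Purely as an illustration — the IQN, device name, and backing path below are made up — the iSCSI export from the Cinder VM described above would look roughly like this in scst.conf:

HANDLER vdisk_blockio {
    DEVICE cinder_vol {
        filename /dev/sdX
        blocksize 4096
    }
}

TARGET_DRIVER iscsi {
    enabled 1
    TARGET iqn.2015-12.org.example:cinder-vol {
        enabled 1
        LUN 0 cinder_vol
    }
}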

EDIT: given that LIO (unfortunately) is now part of upstream, is it viable for us to hook tcm_loop from the ZVOL code to implement a volmode= property, similar to what BSD does for GEOM and normal dev nodes (to address #3438)? Or would that end up being some sort of heinous GPL violation? It would be a nice end-run around the problem if we could hook the existing implementation...

@stale

stale bot commented Aug 25, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
