Skip to content

ZSTs, DEBUG kernels, and copyin, oh my! #267

@rzezeski

Description

@rzezeski

Today, @Nieuwejaar came to me with a very perplexing kernel panic.

debugging crash dump vmcore.1 (64-bit) from rust
operating system: 5.11 xde-0-g7b420ea31b (i86pc)
build version: heads/xde-0-g7b420ea31b-dirty

image uuid: 7389e819-d240-cd85-b776-fc1253d5a82b
panic message: copyin: kaddr argument below kernelbase
dump content: kernel pages only

fffffe007eb237e8 vpanic()
fffffe007eb23810 copyin+0x30()
fffffe007eb23840 ddi_copyin+0x21(1, 1, 0, 202003)
fffffe007eb23920 list_ports_hdlr+0x61()
fffffe007eb23c00 xde_dld_ioc_opte_cmd+0x56()
fffffe007eb23cb0 drv_ioctl+0x172(3500000000, de001e61, fffffc7fffdfb8d0, 202003, fffffe59b9814688, fffffe007eb23e08)
fffffe007eb23cf0 cdev_ioctl+0x3f(3500000000, de001e61, fffffc7fffdfb8d0, 202003, fffffe59b9814688, fffffe007eb23e08)
fffffe007eb23d40 spec_ioctl+0x55(fffffe59d6191d00, de001e61, fffffc7fffdfb8d0, 202003, fffffe59b9814688, fffffe007eb23e08, 0)
fffffe007eb23dd0 fop_ioctl+0x40(fffffe59d6191d00, de001e61, fffffc7fffdfb8d0, 202003, fffffe59b9814688, fffffe007eb23e08, 0)
fffffe007eb23ef0 ioctl+0x144(a, de001e61, fffffc7fffdfb8d0)
fffffe007eb23f00 sys_syscall+0x283()

Notice the first two arguments to ddi_copyin() -- those are supposed to be a userspace pointer and kernel pointer. Last I checked 0x1 does not a pointer make. So what the hell is happening?

All OPTE commands have a request and a response. When an ioctl/cmd is performed, the xde code tries to copyin the request bytes from userspace and deserialize them to a request struct.

        let mut bytes = Vec::with_capacity(self.ioctl.req_len);
        let ret = unsafe {
            ddi::ddi_copyin(
                self.ioctl.req_bytes as *const c_void,
                bytes.as_mut_ptr() as *mut c_void,
                self.ioctl.req_len,
                self.mode,
            )
        };

However, some of these requests are ZSTs.

#[derive(Debug, Deserialize, Serialize)]
pub struct ListPortsReq {
    pub unused: (),
}

In those cases we never do anything with the copied request.

#[no_mangle]
fn list_ports_hdlr(
    env: &mut IoctlEnvelope,
) -> Result<ListPortsResp, OpteError> {
    let _req: ListPortsReq = env.copy_in_req()?;

But many people have been using OPTE without any problem. So what gives. Well, there's a difference in behavior between ddi_copyin() in non-DEBUG vs DEBUG kernels.

.copyin_panic_msg:
	.string "copyin: kaddr argument below kernelbase"

<... SNIP ...>
	ENTRY(copyin)
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$24, %rsp

	/*
	 * save args in case we trap and need to rerun as a copyop
	 */
	movq	%rdi, (%rsp)
	movq	%rsi, 0x8(%rsp)
	movq	%rdx, 0x10(%rsp)

	movq	kernelbase(%rip), %rax
#ifdef DEBUG
	cmpq	%rax, %rsi		/* %rsi = kaddr */
	jnb	1f
	leaq	.copyin_panic_msg(%rip), %rdi
	xorl	%eax, %eax
	call	panic
1:
#endif

If I look at an opteadm list-ports request on a non-DEBUG kernel I see the following:

rpz@kalm:~/opte-dtrace$ pfexec ./opte-trace opte-ioctl.d 
opte_cmd_ioctl_t {
    uint64_t api_version = 0xc
    int cmd = 0x1
    uint64_t flags = 0
    uint64_t reserved = 0
    char *req_bytes = 0x1
    size_t req_len = 0
    char *resp_bytes = 0x1378010
    size_t resp_len = 0x4000
    size_t resp_len_actual = 0
}
ddi_copyin(1, 1, 0, 0x202003) => 0

So it seems ddi_copyin() doesn't care about the bogus kernel address on non-DEBUG. I'm not sure if that should be the case. I wonder if it should panic in non-DEBUG or return a -1 (and bump a kstat)? Though, it is a zero-byte copy, so does it really matter what value was passed for the addresses?

So anyways, this doesn't answer the question of where the address value of 0x1 is coming from. Well that has to do with the fact that this request is a ZST. Which means we end up doing the following:

        let mut bytes = Vec::with_capacity(0);
        let ret = unsafe {
            ddi::ddi_copyin(
                self.ioctl.req_bytes as *const c_void,
                bytes.as_mut_ptr() as *mut c_void,
                self.ioctl.req_len,
                self.mode,
            )
        };

If you look in the RawVec code in the alloc lib you'll see this creates a new struct with a dangling pointer. And that dangling pointer ends up aligned to 1 for a ZST.

I believe the fix here is to not lean on ZSTs like this and instead be more explicit with commands that have no request body by not calling ddi_copyin at all in those cases.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions