Unbounded callback lifetimes cause rlua to be unsound #97

@kyren

I'm going to write a very long explanation of this bug, because the root issue here is something that has come up again and again, and I need to document exactly what's going on.

tl;dr I might have an easy fix, but the internal types for callbacks inside rlua are problematic and they probably should change before 1.0. (Edit: I don't have an easy fix D:)

So, the internal type of callbacks in rlua is (basically) this:

type Callback<'cb> = Box<Fn(&'cb Lua, MultiValue<'cb>) -> Result<MultiValue<'cb>> + Send>

There are a few different ways to create rust callbacks in rlua, but all of the APIs are basically variations on Lua::create_function:

pub fn create_function<'lua, 'cb, A, R, F>(&'lua self, f: F) -> Result<Function<'lua>>
where
    A: FromLuaMulti<'cb>,
    R: ToLuaMulti<'cb>,
    F: Fn(&'cb Lua, A) -> Result<R> + 'static + Send,

These functions take an F and produce the internal Callback type, which is then later used to create a Lua callback with lua_pushcclosure, but that part isn't terribly important. What is important here is the 'cb lifetime: all of the APIs that look like this have the 'cb lifetime as an unbounded, user-chosen lifetime, and this is deeply, deeply wrong. It has been wrong since the very first versions of this crate.

Here's what the type of the internal callback should be:

type Callback = Box<for<'cb> Fn(&'cb Lua, MultiValue<'cb>) -> Result<MultiValue<'cb>> + Send>

which is to say that the internal callback should be a function that, given any 'cb lifetime, can take a Lua ref and arguments with that 'cb lifetime and produce results with the same 'cb lifetime. This is the actual, logical requirement that we want to express on the callback, and though we can make this type just fine, problems start when we want to make our callback API. Let's see how we might change our create_function type to allow for the correct callback type:

pub fn create_function<'lua, A, R, F>(&'lua self, f: F) -> Result<Function<'lua>>
where
    A: for<'cb> FromLuaMulti<'cb>,
    R: for<'cb> ToLuaMulti<'cb>,
    F: for<'cb> Fn(&'cb Lua, A) -> Result<R> + 'static + Send,

Except, this will never work, because we have three separate for<'cb> HRTBs and there is no way to tell the rust compiler that we need to universally quantify all three trait bounds over a single lifetime.

What's especially frustrating is that it's actually totally possible to write the code that produces the correct callback type, but it's not currently possible to put that code into a function and name its type signature; observe.
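As a rough, rlua-free illustration of that asymmetry, here is a sketch with hypothetical stand-in types (`Lua`, `Value`, `FromLua`, `ToLua`, and `identity_callback` are invented for this example and are not rlua's real definitions). The generic `create_function` with three independent HRTBs does compile, but it only accepts argument types that do not borrow from `Lua`. The hand-written closure handles a borrowing type, because a single `'cb` is in scope for its whole body.

```rust
// Hypothetical stand-ins for illustration; these are NOT rlua's real types.
#[derive(Debug, PartialEq)]
pub struct Lua;

#[derive(Debug, PartialEq)]
pub enum Value<'lua> {
    Integer(i64),
    // Stands in for anything that borrows from the Lua state (tables, functions, ...).
    Handle(&'lua Lua),
}

pub trait FromLua<'lua>: Sized {
    fn from_lua(value: Value<'lua>, lua: &'lua Lua) -> Result<Self, String>;
}

pub trait ToLua<'lua> {
    fn to_lua(self, lua: &'lua Lua) -> Result<Value<'lua>, String>;
}

impl<'lua> FromLua<'lua> for i64 {
    fn from_lua(value: Value<'lua>, _lua: &'lua Lua) -> Result<Self, String> {
        match value {
            Value::Integer(n) => Ok(n),
            Value::Handle(_) => Err("expected an integer".to_string()),
        }
    }
}

impl<'lua> ToLua<'lua> for i64 {
    fn to_lua(self, _lua: &'lua Lua) -> Result<Value<'lua>, String> {
        Ok(Value::Integer(self))
    }
}

impl<'lua> FromLua<'lua> for Value<'lua> {
    fn from_lua(value: Value<'lua>, _lua: &'lua Lua) -> Result<Self, String> {
        Ok(value)
    }
}

impl<'lua> ToLua<'lua> for Value<'lua> {
    fn to_lua(self, _lua: &'lua Lua) -> Result<Value<'lua>, String> {
        Ok(self)
    }
}

// The "correct" callback shape: valid for every `'cb`.
pub type Callback =
    Box<dyn for<'cb> Fn(&'cb Lua, Value<'cb>) -> Result<Value<'cb>, String> + Send>;

// Three independent HRTBs: `A` and `R` are fixed outside the `for<'cb>` binders,
// so neither of them can mention `'cb`.
pub fn create_function<A, R, F>(f: F) -> Callback
where
    A: for<'cb> FromLua<'cb>,
    R: for<'cb> ToLua<'cb>,
    F: for<'cb> Fn(&'cb Lua, A) -> Result<R, String> + 'static + Send,
{
    Box::new(move |lua, value| {
        let arg = A::from_lua(value, lua)?;
        f(lua, arg)?.to_lua(lua)
    })
}

// Written by hand for one concrete borrowing type: the same conversions, but here
// the argument type is `Value<'cb>` for whichever `'cb` the callback is called with.
pub fn identity_callback() -> Callback {
    Box::new(|lua, value| {
        let arg = <Value<'_> as FromLua<'_>>::from_lua(value, lua)?;
        arg.to_lua(lua)
    })
}

fn main() {
    let lua = Lua;

    // Accepted: `i64` does not borrow from `Lua`, so one type serves every `'cb`.
    let double = create_function(|_lua, n: i64| Ok(n * 2));
    assert_eq!(double(&lua, Value::Integer(21)), Ok(Value::Integer(42)));

    // Rejected if uncommented: `A` would need to be `Value<'cb>` for every `'cb` at once.
    // let echo = create_function(|_lua, v: Value| Ok(v));

    let echo = identity_callback();
    assert_eq!(echo(&lua, Value::Handle(&lua)), Ok(Value::Handle(&lua)));
}
```

The closure body in `identity_callback` is the part that cannot be lifted into a generic function here: doing so would need a way to say "the argument type at lifetime `'cb`" in the signature.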

You can see me struggling with this problem in a very old reddit post here. To proceed with the API and get around this otherwise very tricky (or impossible) to solve limitation, I picked an API that conveniently "lied" about the callback lifetimes, and I believed this was safe because all callbacks at the time were 'static anyway. Well, honestly I had a pretty poor understanding of the borrow checker at the time and I was very unsure of everything, but later on I became slightly more confident that my lie-based interface was safe due to the 'static callback requirement. I now know this is wrong, and should have realized this sooner.

I lied a bit above: the real callback type inside rlua has a second lifetime parameter, which makes the callback itself non-'static for use in Lua::scope. So the real callback type is this:

type Callback<'cb, 'a> = Box<Fn(&'cb Lua, MultiValue<'cb>) -> Result<MultiValue<'cb>> + Send + 'a>

This is to facilitate the Scope callbacks having a non-'static lifetime, which if you understand how the argument lifetimes are lying there, you know this is obviously a problem. At the time I initially made the Scope interface I didn't really fully understand the implications of the misleading callback types that I was using. I later found out via #82 that the initial interface that I had written was of course unsound, and it was only after fixing that issue that I more fully understood the problem.

From a certain perspective, the misleading callback types are the root cause of #82, because the callback type requires this user chosen, unbounded 'cb lifetime and that makes writing a sound interface extremely difficult and easy to get wrong. I believe the solution for #82 is correct, but it is extremely delicate and intricate and confusing, and this is the LAST property that you want when writing rust code with unsafe.

So the reason that I'm writing about this problem now is not actually due to #82. Instead it is because I had a minor panic attack today over the following thought: "what if you convinced the borrow checker to pick 'static as the 'cb lifetime in rlua callbacks?".

Well, it turns out you can totally do this, and it's obvious that you can do this and I should have realized this sooner. Observe this monstrosity:

extern crate rlua;

use std::cell::RefCell;

use rlua::{Lua, Table};

fn main() {
    thread_local! {
        static BAD_TIME: RefCell<Option<Table<'static>>> = RefCell::new(None);
    }

    let lua = Lua::new();

    lua.create_function(|_, table: Table| {
        BAD_TIME.with(|bt| {
            *bt.borrow_mut() = Some(table);
        });
        Ok(())
    }).unwrap()
    .call::<_, ()>(lua.create_table().unwrap())
    .unwrap();

    // In debug, this will panic with a reference leak before getting to the next part but
    // it segfaults anyway.
    drop(lua);

    BAD_TIME.with(|bt| {
        println!(
            "you're gonna have a bad time: {}",
            bt.borrow().as_ref().unwrap().len().unwrap()
        );
    });
}
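The lifetime choice at the heart of this can be reproduced without rlua. The following is a minimal sketch with hypothetical stand-ins (`Lua`, `Table`, `register_unbounded`, `register_higher_ranked`, `STASH`, and `GLOBAL_LUA` are invented here and are not rlua's API). It stays entirely in safe code, so nothing dangles. It only shows that an unbounded `'cb` lets the closure body force `'cb = 'static`, whereas a `for<'cb>` bound does not. The assumption is that rlua's internals then invoke the stored callback with whatever `Lua` is actually alive, which is where the inferred `'static` stops being true.

```rust
use std::cell::RefCell;

// Hypothetical stand-ins for illustration; these are NOT rlua's real types.
pub struct Lua;

pub struct Table<'lua> {
    pub lua: &'lua Lua,
}

thread_local! {
    // A slot that can only hold a table claiming to live forever.
    static STASH: RefCell<Option<Table<'static>>> = RefCell::new(None);
}

// A Lua that really is 'static, so the safe call below is harmless.
static GLOBAL_LUA: Lua = Lua;

// Same shape as the problematic API: `'cb` is a free parameter picked at the call site.
pub fn register_unbounded<'cb, F>(f: F) -> Box<dyn Fn(&'cb Lua, Table<'cb>) + 'static>
where
    F: Fn(&'cb Lua, Table<'cb>) + 'static,
{
    Box::new(f)
}

// Higher-ranked variant: the callback must work for every `'cb`.
pub fn register_higher_ranked<F>(f: F) -> Box<dyn for<'cb> Fn(&'cb Lua, Table<'cb>)>
where
    F: for<'cb> Fn(&'cb Lua, Table<'cb>) + 'static,
{
    Box::new(f)
}

pub fn stash_is_full() -> bool {
    STASH.with(|slot| slot.borrow().is_some())
}

fn main() {
    // Inference settles on `'cb = 'static`, because the body stores `table` in STASH.
    let stash = register_unbounded(|_lua, table| {
        STASH.with(|slot| *slot.borrow_mut() = Some(table));
    });

    // Safe code can only call `stash` with a genuinely 'static Lua.
    stash(&GLOBAL_LUA, Table { lua: &GLOBAL_LUA });
    assert!(stash_is_full());

    // Passing the same stashing closure to `register_higher_ranked` is rejected by the
    // borrow checker: `table` would have to outlive 'static for every possible `'cb`.
    let fine = register_higher_ranked(|_lua, _table| {});
    let local = Lua;
    fine(&local, Table { lua: &local });
}
```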

So the proper fix is honestly the same as the proper fix for #82, which is to fix the callback types, and thus also fix the callback API. However, without ATCs, it appears that the only ways I can do so are macros (to write the voldemort function whose type I can't name), some other trait trick I'm not aware of, or a redesign of ToLua / FromLua so that they don't require ATCs. I'm not sure.

In the meantime, there's a hopefully simple fix for this issue, but I need to test it out first, and I'll be back in a moment.
