Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: expose per-instance data #728

Closed
jrose-signal opened this issue May 7, 2021 · 19 comments · Fixed by #902
Closed

Feature request: expose per-instance data #728

jrose-signal opened this issue May 7, 2021 · 19 comments · Fixed by #902

Comments

@jrose-signal
Copy link
Contributor

N-API provides the rather limited napi_set_instance_data to allow associating a single word of data with an instance. Neon already uses this API to store its own per-instance data, but it would be great if that were exposed to users in a convenient way. A possible shape for the API:

pub struct PerInstanceKey<T: 'static>(PhantomData<&'static mut T>);
impl<T: 'static> PerInstanceKey<T> {
  pub const fn new() {
    Self(PhantomData)
  }
}

impl Context {
  pub fn set_instance_data<T: 'static>(&mut self, key: &'static PerInstanceKey<T>, value: T) -> Option<T> {
    let mut user_data: HashMap = &mut InstanceData::get(self).user_data;
    let previous_value = user_data.insert(key as *const (), Box::new(value as dyn Any));
    previous_value.map(|old_value| *old_value.downcast().expect("type-safe API"))
  }

  pub fn get_instance_data_mut<T: 'static>(&mut self, key: &'static PerInstanceKey<T>) -> Option<&mut T> {
    unimplemented!() // same as set_* simplified
  }
}
static MY_DATA: PerInstanceKey<String>;

cx.set_instance_data(&MY_DATA, "hello".to_string());

An alternate version of the API would include a function to construct the initial value if not present, more like thread_local!, but this version is the simplest I could think of.

@kjvalencik
Copy link
Member

@jrose-signal This looks like a good minimal API. I think it might need to return some type of borrow guard to ensure borrowed instance data lives long enough, unless set_instance_data is made to error if already set.

Also, I could see benefits of only providing an immutable reference, otherwise, we would need to bind to the lifetime of &mut Context to prevent aliasing.

It would be really neat to make this typesafe without Any by making Contex generic over the type. Unfortunately, I think that would be incredibly disruptive to the Neon API.

@jrose-signal
Copy link
Contributor Author

I think it might need to return some type of borrow guard to ensure borrowed instance data lives long enough, unless set_instance_data is made to error if already set.

Also, I could see benefits of only providing an immutable reference, otherwise, we would need to bind to the lifetime of &mut Context to prevent aliasing.

In this most limited form it's safe because it's using the lifetime of the &mut Context, but that does rule out a lot of patterns (such as accessing two instance data values without copying). Something more flexible would definitely require more design.

It would be really neat to make this typesafe without Any by making Contex generic over the type. Unfortunately, I think that would be incredibly disruptive to the Neon API.

I'm not sure how it's related to Context, but yeah, doing this without boxing would mean a custom collection type like hlist. I decided it ultimately wasn't so bad since Box<dyn Any> is on par with a single JavaScript object.

@kjvalencik
Copy link
Member

kjvalencik commented May 10, 2021

I think borrowing against the lifetime 'a in &mut 'a Context is going to be bad for most use cases because nearly all Neon APIs require a mutable reference to Context. It's used to enforce compile time guarantees about state of the VM.

In order to get around that, the type would need to be clone-able, but then, maybe it should be consistent with JsBox and require the user to use interior mutability.

It might be possible for Neon to have get_instance_data and get_instance_data_mut APIs that borrow against 'a in &mut Context<'a> and &'a mut Context respectively. It would internally need to use an Rc and the immutable version would need a borrow guard, but both enforce single thread-ness and ownership.

@kjvalencik
Copy link
Member

I really like this idea, but I think we could make it a single function with an API similar to Entry.

Also, I don't think InstanceKey is necessary; users could manage that themselves.

@jrose-signal
Copy link
Contributor Author

Yeah, InstanceKey is mostly just to avoid exposing the dyn Any and to provide a source of unique IDs, but a simpler string-keyed thing that explicitly used dyn Any would work too.

@dherman
Copy link
Collaborator

dherman commented Jun 9, 2021

I love this idea too!

I like the Entry pattern, I bet we can make use of that.

I’m a little concerned if we totally eliminate PerInstanceKey it could diffuse responsibility for a nonobvious coding pattern (esp using phantom types) to users and lead to boilerplate.

One possible idea: what if instead of a PerInstanceKey nonce type we instead published a trait that let users tag a type as instance data? We could use TypeId to generate the nonce for the internal table. If users wanted more than one key for a type they could use a newtype pattern, which is effectively the same thing as PerInstanceKey but maybe a little more intuitive. Something like:

impl InstanceData for Foo {
    fn init<a, C: Context<a>>(cx: &mut C) -> Self {
        unimplemented!()
    }
}

@dherman
Copy link
Collaborator

dherman commented Jun 10, 2021

@jrose-signal BTW can you say a little bit about your specific use cases for instance data? Concrete examples can he helpful for thinking about the design.

@jrose-signal
Copy link
Contributor Author

Heh. My specific use case is probably not the best example: I wanted to store a unique value per-instance so that I could safely store a "current context" using something like scoped-tls and do synchronous calls when a future is resumed. Without a disambiguator between instances, there'd be no way to tell whether the Future can be resumed on the current context or if it's actually a worker context.

All of this is related to @indutny-signal's work making EventQueue/Channel have less overhead in Electron; the fastest way to execute something on a context is to already be on that context. There's more thinking for me to do here: it's not generally recommended to synchronously resume futures in Rust because you could deadlock yourself. But if we know the future that was fulfilled came from a JavaScript future, we're already on the microtask queue and don't have any other pending work.

But in general anything that can be computed once and cached benefits from being in PerInstanceData, such as recording a set of JavaScript-defined Error types to use for Rust errors. Though I suppose you could also do it generically by adding JsBox properties to the global object.

@dherman
Copy link
Collaborator

dherman commented Jun 10, 2021

I made this diagram to help me understand the entities and instances at runtime:
instances
IIUC, N-API gives you one slot for each .node instance at runtime. So Neon and all user and library crates that talk to Neon would have to share that one slot.

@kjvalencik This makes me feel like some sort of extensibility could be important, especially since Neon-related library crates might want to be able to leverage that global storage without stepping on each other's toes. If we just expose it as a mutable slot and allow anyone to overwrite it, it's hard for non-coordinating entities like libraries to use it cooperatively.

@kjvalencik
Copy link
Member

kjvalencik commented Jun 11, 2021

This makes me feel like some sort of extensibility could be important, especially since Neon-related library crates might want to be able to leverage that global storage without stepping on each other's toes

I think you found the crucial use case here. Without this, crates in the Neon ecosystem that aren't part of Neon proper can't use instance data without coordination with the addon.

I'm convinced. This also makes it clear to me that some type of marker (either suggestion works fine) instead of being stringly typed is important. It's better to statically prove they won't conflict.

@dherman
Copy link
Collaborator

dherman commented Jun 12, 2021

I’m still thinking about this but a few early thoughts based on the idea of separate modules being able to publish their own singleton data with this API:

  • I like the trait approach because I think traits are the idiomatic Rust mechanism for extensible/modular protocols.
  • A module might sometimes want to publish an immutable singleton. Since interior mutability is always an option as @kjvalencik mentioned, I think the more useful approach is for us to only provide immutable access, and modules can decide for themselves if they want to offer mutation methods on their singleton types they publish, which would use RefCell internally.
  • I can’t think of any way for us to guarantee that a singleton is initialized before use, since they are defined and used modularly. So I think we will have to offer some sort of general purpose initialization protocol. But I think we should be able to define it so a module can abstract an API over a singleton to ensure consumers never have to do the initialization themselves.
  • I think the initialization protocol in general might sometimes want the place in the code that is asking for the singleton to be able to use local variables to initialize the singleton, so the initialization protocol might need an associated type for the initialization protocol to be able to accept an input passed in from the call site. This is a little clunky but I’m not sure if there’s a more ergonomic alternative.

This is just where my thinking is so far. I’ll sketch up a mock API soon we can look at and discuss.

@kjvalencik
Copy link
Member

The mutability thing is tricky. Neon can safely give mutable access to the instance data only if the lifetime is associated with a borrow of Context. This is safe because it's impossible to alias the pointer.

However, this isn't super useful because you wouldn't be able to use Context while holding a reference to the instance data and that prevents a lot of useful patterns. Instead, reference counting would need to be introduced (Rc) and you're back to an interior mutability pattern if you need.

Additionally, even with only immutable references, it's important to be careful not to simply let the value be replaced because this could result in either dropping the previous value while something still references it or leaking the previous value.

I'm thinking that we might be able to borrow ideas of the once_cell crate. Conceptually, it's very similar--storing a const initialized on first access. A similar API might make it familiar to users.

@kjvalencik
Copy link
Member

kjvalencik commented Jun 25, 2021

@dherman and I had a call today and @dherman said something really insightful. Instead of trying to make a surrogate key, we can make the key the actual type used for accessing the data. I really like this design because it feels familiar and matches how users typically use statics.

static MY_DATA: neon::Cell<MyData> = neon::Cell::new();

/* ... */

let data = MY_DATA.get_or_init(&mut cx, MyData::new);

This could be implemented by internally holding a lazily initialized id that counts the position in a Vec<Box<dyn Any>> stored in the actual instance data.

use std::marker::PhantomData;

use once_cell::sync::OnceCell;

struct Cell<T> {
    id: OnceCell<usize>,
    _data: PhantomData<T>,
}

impl<T> Cell<T> {
    pub const fn new() -> Self {
        Self {
            id: OnceCell::new(),
            _data: PhantomData,
        }
    }
}

impl <T: Send + 'static> Cell<T> {
    fn get_or_init<'a, C, F>(&'static self, cx: &mut C, f: F) -> &T
    where
        C: Context<'a>,
        F: FnOnce() -> T,
    {
        todo!()
    }
}

Once catch that makes this different from normal OnceCell is that we only provide methods on &'static self and not &self. This will cover most use cases while ensuring users know they are leaking static data that lives the life of the module.

Also, I realize this is essentially @jrose-signal's originally design, just slightly tweaked / inverted, so maybe I'm just slow to catch up. 😆

@jrose-signal
Copy link
Contributor Author

Unfortunately this isn't quite correct because you should be able to access the same data in multiple instances in the same program, which means you can't store an offset in the global data (because it might be different for each instance). That's why I went with using the address of the global itself as the key in my original plan, though I forgot that as a zero-sized type (in my version) it might not have a unique address! So there'd have to be a placeholder byte in there too.

@kjvalencik
Copy link
Member

@jrose-signal Yes, there would need to be a placeholder, but this is effectively the same design.

The index isn't stored in global data, it's stored in the key. The get would simply need to initialize the Vec with a bunch of None up until the proper length.

But, these are all implementation details. It could easily be a hashmap or a vec with ids or address or whatever. The piece that clarifies this more for me is treating the key like it's actually holding the data.

@jrose-signal
Copy link
Contributor Author

On the one hand I feel like that's weird since you could have code like this:

let x = MY_DATA.get_or_init(&mut main_cx, || …).clone();
worker_queue.send(|mut worker_cx| {
  let y = MY_DATA.get_or_init(&mut worker_cx, || x.clone());
  eprintln!("{} {}", x, y); // may be different
}

On the other hand, that's basically how thread-local storage works, and people have been treating that like a global in Rust (and not in Rust) forever.

@dherman
Copy link
Collaborator

dherman commented Jun 27, 2021

Oh right, TLS is an interesting precedent, great point. Our situation is very analogous, it’s basically “worker-local storage.”

Should we consider modeling the API after LocalKey and thread_local!? I like following precedent, but that macro feels a little old school Rust and a little weird. Maybe a more modern variation would be a proc macro attribute on static declarations?

#[neon::local]
static MY_DATA: Foo = /* ... */;

It does have that weirdness that you mention, @jrose-signal, that MY_DATA is not a Foo but a LocalKey<Foo> and can be in different states depending on the cx, but I might be comfortable with that by documenting the precedent and similarity with TLS.

@kjvalencik
Copy link
Member

I agree that it matches up nicely with TLS and naming the type neon::LocalKey (or @jrose-signal's original recommendation of InstanceKey) is a good.

With that said, I think the thread_local and LocalKey API is a rough spot in Rust and I don't think we should copy it. I still think mirroring OnceCell for an API would be better.

Maybe we can split the difference and borrow a little from both?

@kjvalencik
Copy link
Member

kjvalencik commented Jul 7, 2021

Updated summary of discussion:

pub struct InstanceKey<T: 'static> {
    _data: PhantomData<&'static mut T>,
    _empty: u8,
}

impl<T: 'static> InstanceKey<T> {
    pub const fn new() {
        Self {
            _data: PhantomData,
            _empty: 0,
        }
    }

    pub fn get_or_init<'a, C>(&'static self, cx: &C, f: F) -> &'a T
    where
        C: Context<'a>,
        F: FnOnce() -> T,
    {
        todo!()
    }
}

Usage:

use neon::InstanceKey;

static RUNTIME: InstanceKey<Runtime> = InstanceKey::new();

fn run_async(mut cx: FunctionContext) -> JsResult<JsUndefined> {
    let runtime = RUNTIME.get_or_init(|| Runtime::new());
    
    runtime.spawn(async {
        println!("Hello, World!");
    });

    Ok(cx.undefined())
}

Other potentially useful additions:

  • get_or_try_init for fallible initialization
  • neon::LazyInstanceKey for tying an initialization function directly to the key
    const RUNTIME: LazyInstanceKey<Runtime> = LazyInstanceKey::new(|| Runtime::new());

Mutability is omitted because there isn't an ergonomic way to provide it with static checks. Static checks would require blocking any calls that require &mut cx while data is held. Instead, interior mutability is pushed to the user similar to JsBox.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants