Feature request: expose per-instance data #728

jrose-signal · 2021-05-07T20:58:36Z

N-API provides the rather limited napi_set_instance_data to allow associating a single word of data with an instance. Neon already uses this API to store its own per-instance data, but it would be great if that were exposed to users in a convenient way. A possible shape for the API:

pub struct PerInstanceKey<T: 'static>(PhantomData<&'static mut T>);
impl<T: 'static> PerInstanceKey<T> {
  pub const fn new() {
    Self(PhantomData)
  }
}

impl Context {
  pub fn set_instance_data<T: 'static>(&mut self, key: &'static PerInstanceKey<T>, value: T) -> Option<T> {
    let mut user_data: HashMap = &mut InstanceData::get(self).user_data;
    let previous_value = user_data.insert(key as *const (), Box::new(value as dyn Any));
    previous_value.map(|old_value| *old_value.downcast().expect("type-safe API"))
  }

  pub fn get_instance_data_mut<T: 'static>(&mut self, key: &'static PerInstanceKey<T>) -> Option<&mut T> {
    unimplemented!() // same as set_* simplified
  }
}

static MY_DATA: PerInstanceKey<String>;

cx.set_instance_data(&MY_DATA, "hello".to_string());

An alternate version of the API would include a function to construct the initial value if not present, more like thread_local!, but this version is the simplest I could think of.

The text was updated successfully, but these errors were encountered:

kjvalencik · 2021-05-10T13:12:21Z

@jrose-signal This looks like a good minimal API. I think it might need to return some type of borrow guard to ensure borrowed instance data lives long enough, unless set_instance_data is made to error if already set.

Also, I could see benefits of only providing an immutable reference, otherwise, we would need to bind to the lifetime of &mut Context to prevent aliasing.

It would be really neat to make this typesafe without Any by making Contex generic over the type. Unfortunately, I think that would be incredibly disruptive to the Neon API.

jrose-signal · 2021-05-10T17:05:19Z

I think it might need to return some type of borrow guard to ensure borrowed instance data lives long enough, unless set_instance_data is made to error if already set.

Also, I could see benefits of only providing an immutable reference, otherwise, we would need to bind to the lifetime of &mut Context to prevent aliasing.

In this most limited form it's safe because it's using the lifetime of the &mut Context, but that does rule out a lot of patterns (such as accessing two instance data values without copying). Something more flexible would definitely require more design.

It would be really neat to make this typesafe without Any by making Contex generic over the type. Unfortunately, I think that would be incredibly disruptive to the Neon API.

I'm not sure how it's related to Context, but yeah, doing this without boxing would mean a custom collection type like hlist. I decided it ultimately wasn't so bad since Box<dyn Any> is on par with a single JavaScript object.

kjvalencik · 2021-05-10T17:51:41Z

I think borrowing against the lifetime 'a in &mut 'a Context is going to be bad for most use cases because nearly all Neon APIs require a mutable reference to Context. It's used to enforce compile time guarantees about state of the VM.

In order to get around that, the type would need to be clone-able, but then, maybe it should be consistent with JsBox and require the user to use interior mutability.

It might be possible for Neon to have get_instance_data and get_instance_data_mut APIs that borrow against 'a in &mut Context<'a> and &'a mut Context respectively. It would internally need to use an Rc and the immutable version would need a borrow guard, but both enforce single thread-ness and ownership.

kjvalencik · 2021-06-08T18:58:07Z

I really like this idea, but I think we could make it a single function with an API similar to Entry.

Also, I don't think InstanceKey is necessary; users could manage that themselves.

jrose-signal · 2021-06-08T19:12:05Z

Yeah, InstanceKey is mostly just to avoid exposing the dyn Any and to provide a source of unique IDs, but a simpler string-keyed thing that explicitly used dyn Any would work too.

dherman · 2021-06-09T17:33:04Z

I love this idea too!

I like the Entry pattern, I bet we can make use of that.

I’m a little concerned if we totally eliminate PerInstanceKey it could diffuse responsibility for a nonobvious coding pattern (esp using phantom types) to users and lead to boilerplate.

One possible idea: what if instead of a PerInstanceKey nonce type we instead published a trait that let users tag a type as instance data? We could use TypeId to generate the nonce for the internal table. If users wanted more than one key for a type they could use a newtype pattern, which is effectively the same thing as PerInstanceKey but maybe a little more intuitive. Something like:

impl InstanceData for Foo {
    fn init<‘a, C: Context<‘a>>(cx: &mut C) -> Self {
        unimplemented!()
    }
}

dherman · 2021-06-10T06:25:05Z

@jrose-signal BTW can you say a little bit about your specific use cases for instance data? Concrete examples can he helpful for thinking about the design.

jrose-signal · 2021-06-10T17:30:55Z

Heh. My specific use case is probably not the best example: I wanted to store a unique value per-instance so that I could safely store a "current context" using something like scoped-tls and do synchronous calls when a future is resumed. Without a disambiguator between instances, there'd be no way to tell whether the Future can be resumed on the current context or if it's actually a worker context.

All of this is related to @indutny-signal's work making EventQueue/Channel have less overhead in Electron; the fastest way to execute something on a context is to already be on that context. There's more thinking for me to do here: it's not generally recommended to synchronously resume futures in Rust because you could deadlock yourself. But if we know the future that was fulfilled came from a JavaScript future, we're already on the microtask queue and don't have any other pending work.

But in general anything that can be computed once and cached benefits from being in PerInstanceData, such as recording a set of JavaScript-defined Error types to use for Rust errors. Though I suppose you could also do it generically by adding JsBox properties to the global object.

dherman · 2021-06-10T19:39:51Z

I made this diagram to help me understand the entities and instances at runtime:

IIUC, N-API gives you one slot for each .node instance at runtime. So Neon and all user and library crates that talk to Neon would have to share that one slot.

@kjvalencik This makes me feel like some sort of extensibility could be important, especially since Neon-related library crates might want to be able to leverage that global storage without stepping on each other's toes. If we just expose it as a mutable slot and allow anyone to overwrite it, it's hard for non-coordinating entities like libraries to use it cooperatively.

kjvalencik · 2021-06-11T12:58:50Z

This makes me feel like some sort of extensibility could be important, especially since Neon-related library crates might want to be able to leverage that global storage without stepping on each other's toes

I think you found the crucial use case here. Without this, crates in the Neon ecosystem that aren't part of Neon proper can't use instance data without coordination with the addon.

I'm convinced. This also makes it clear to me that some type of marker (either suggestion works fine) instead of being stringly typed is important. It's better to statically prove they won't conflict.

dherman · 2021-06-12T19:55:56Z

I’m still thinking about this but a few early thoughts based on the idea of separate modules being able to publish their own singleton data with this API:

I like the trait approach because I think traits are the idiomatic Rust mechanism for extensible/modular protocols.
A module might sometimes want to publish an immutable singleton. Since interior mutability is always an option as @kjvalencik mentioned, I think the more useful approach is for us to only provide immutable access, and modules can decide for themselves if they want to offer mutation methods on their singleton types they publish, which would use RefCell internally.
I can’t think of any way for us to guarantee that a singleton is initialized before use, since they are defined and used modularly. So I think we will have to offer some sort of general purpose initialization protocol. But I think we should be able to define it so a module can abstract an API over a singleton to ensure consumers never have to do the initialization themselves.
I think the initialization protocol in general might sometimes want the place in the code that is asking for the singleton to be able to use local variables to initialize the singleton, so the initialization protocol might need an associated type for the initialization protocol to be able to accept an input passed in from the call site. This is a little clunky but I’m not sure if there’s a more ergonomic alternative.

This is just where my thinking is so far. I’ll sketch up a mock API soon we can look at and discuss.

kjvalencik · 2021-06-14T00:46:19Z

The mutability thing is tricky. Neon can safely give mutable access to the instance data only if the lifetime is associated with a borrow of Context. This is safe because it's impossible to alias the pointer.

However, this isn't super useful because you wouldn't be able to use Context while holding a reference to the instance data and that prevents a lot of useful patterns. Instead, reference counting would need to be introduced (Rc) and you're back to an interior mutability pattern if you need.

Additionally, even with only immutable references, it's important to be careful not to simply let the value be replaced because this could result in either dropping the previous value while something still references it or leaking the previous value.

I'm thinking that we might be able to borrow ideas of the once_cell crate. Conceptually, it's very similar--storing a const initialized on first access. A similar API might make it familiar to users.

kjvalencik · 2021-06-25T18:07:41Z

@dherman and I had a call today and @dherman said something really insightful. Instead of trying to make a surrogate key, we can make the key the actual type used for accessing the data. I really like this design because it feels familiar and matches how users typically use statics.

static MY_DATA: neon::Cell<MyData> = neon::Cell::new();

/* ... */

let data = MY_DATA.get_or_init(&mut cx, MyData::new);

This could be implemented by internally holding a lazily initialized id that counts the position in a Vec<Box<dyn Any>> stored in the actual instance data.

use std::marker::PhantomData;

use once_cell::sync::OnceCell;

struct Cell<T> {
    id: OnceCell<usize>,
    _data: PhantomData<T>,
}

impl<T> Cell<T> {
    pub const fn new() -> Self {
        Self {
            id: OnceCell::new(),
            _data: PhantomData,
        }
    }
}

impl <T: Send + 'static> Cell<T> {
    fn get_or_init<'a, C, F>(&'static self, cx: &mut C, f: F) -> &T
    where
        C: Context<'a>,
        F: FnOnce() -> T,
    {
        todo!()
    }
}

Once catch that makes this different from normal OnceCell is that we only provide methods on &'static self and not &self. This will cover most use cases while ensuring users know they are leaking static data that lives the life of the module.

Also, I realize this is essentially @jrose-signal's originally design, just slightly tweaked / inverted, so maybe I'm just slow to catch up. 😆

jrose-signal · 2021-06-25T18:22:43Z

Unfortunately this isn't quite correct because you should be able to access the same data in multiple instances in the same program, which means you can't store an offset in the global data (because it might be different for each instance). That's why I went with using the address of the global itself as the key in my original plan, though I forgot that as a zero-sized type (in my version) it might not have a unique address! So there'd have to be a placeholder byte in there too.

kjvalencik · 2021-06-25T20:47:05Z

@jrose-signal Yes, there would need to be a placeholder, but this is effectively the same design.

The index isn't stored in global data, it's stored in the key. The get would simply need to initialize the Vec with a bunch of None up until the proper length.

But, these are all implementation details. It could easily be a hashmap or a vec with ids or address or whatever. The piece that clarifies this more for me is treating the key like it's actually holding the data.

jrose-signal · 2021-06-25T22:26:19Z

On the one hand I feel like that's weird since you could have code like this:

let x = MY_DATA.get_or_init(&mut main_cx, || …).clone();
worker_queue.send(|mut worker_cx| {
  let y = MY_DATA.get_or_init(&mut worker_cx, || x.clone());
  eprintln!("{} {}", x, y); // may be different
}

On the other hand, that's basically how thread-local storage works, and people have been treating that like a global in Rust (and not in Rust) forever.

dherman · 2021-06-27T08:28:09Z

Oh right, TLS is an interesting precedent, great point. Our situation is very analogous, it’s basically “worker-local storage.”

Should we consider modeling the API after LocalKey and thread_local!? I like following precedent, but that macro feels a little old school Rust and a little weird. Maybe a more modern variation would be a proc macro attribute on static declarations?

#[neon::local]
static MY_DATA: Foo = /* ... */;

It does have that weirdness that you mention, @jrose-signal, that MY_DATA is not a Foo but a LocalKey<Foo> and can be in different states depending on the cx, but I might be comfortable with that by documenting the precedent and similarity with TLS.

kjvalencik · 2021-06-28T14:15:49Z

I agree that it matches up nicely with TLS and naming the type neon::LocalKey (or @jrose-signal's original recommendation of InstanceKey) is a good.

With that said, I think the thread_local and LocalKey API is a rough spot in Rust and I don't think we should copy it. I still think mirroring OnceCell for an API would be better.

Maybe we can split the difference and borrow a little from both?

kjvalencik · 2021-07-07T16:31:51Z

Updated summary of discussion:

pub struct InstanceKey<T: 'static> {
    _data: PhantomData<&'static mut T>,
    _empty: u8,
}

impl<T: 'static> InstanceKey<T> {
    pub const fn new() {
        Self {
            _data: PhantomData,
            _empty: 0,
        }
    }

    pub fn get_or_init<'a, C>(&'static self, cx: &C, f: F) -> &'a T
    where
        C: Context<'a>,
        F: FnOnce() -> T,
    {
        todo!()
    }
}

Usage:

use neon::InstanceKey;

static RUNTIME: InstanceKey<Runtime> = InstanceKey::new();

fn run_async(mut cx: FunctionContext) -> JsResult<JsUndefined> {
    let runtime = RUNTIME.get_or_init(|| Runtime::new());
    
    runtime.spawn(async {
        println!("Hello, World!");
    });

    Ok(cx.undefined())
}

Other potentially useful additions:

get_or_try_init for fallible initialization

neon::LazyInstanceKey for tying an initialization function directly to the key

const RUNTIME: LazyInstanceKey<Runtime> = LazyInstanceKey::new(|| Runtime::new());

Mutability is omitted because there isn't an ergonomic way to provide it with static checks. Static checks would require blocking any calls that require &mut cx while data is held. Instead, interior mutability is pushed to the user similar to JsBox.

dherman mentioned this issue May 24, 2022

feature(neon): API for thread-local data #902

Merged

dherman closed this as completed in #902 Jun 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature request: expose per-instance data #728

Feature request: expose per-instance data #728

jrose-signal commented May 7, 2021

kjvalencik commented May 10, 2021

jrose-signal commented May 10, 2021

kjvalencik commented May 10, 2021 •

edited

Loading

kjvalencik commented Jun 8, 2021

jrose-signal commented Jun 8, 2021

dherman commented Jun 9, 2021

dherman commented Jun 10, 2021 •

edited

Loading

jrose-signal commented Jun 10, 2021

dherman commented Jun 10, 2021 •

edited

Loading

kjvalencik commented Jun 11, 2021 •

edited

Loading

dherman commented Jun 12, 2021

kjvalencik commented Jun 14, 2021

kjvalencik commented Jun 25, 2021 •

edited

Loading

jrose-signal commented Jun 25, 2021

kjvalencik commented Jun 25, 2021

jrose-signal commented Jun 25, 2021

dherman commented Jun 27, 2021 •

edited

Loading

kjvalencik commented Jun 28, 2021

kjvalencik commented Jul 7, 2021 •

edited

Loading

Feature request: expose per-instance data #728

Feature request: expose per-instance data #728

Comments

jrose-signal commented May 7, 2021

kjvalencik commented May 10, 2021

jrose-signal commented May 10, 2021

kjvalencik commented May 10, 2021 • edited Loading

kjvalencik commented Jun 8, 2021

jrose-signal commented Jun 8, 2021

dherman commented Jun 9, 2021

dherman commented Jun 10, 2021 • edited Loading

jrose-signal commented Jun 10, 2021

dherman commented Jun 10, 2021 • edited Loading

kjvalencik commented Jun 11, 2021 • edited Loading

dherman commented Jun 12, 2021

kjvalencik commented Jun 14, 2021

kjvalencik commented Jun 25, 2021 • edited Loading

jrose-signal commented Jun 25, 2021

kjvalencik commented Jun 25, 2021

jrose-signal commented Jun 25, 2021

dherman commented Jun 27, 2021 • edited Loading

kjvalencik commented Jun 28, 2021

kjvalencik commented Jul 7, 2021 • edited Loading

kjvalencik commented May 10, 2021 •

edited

Loading

dherman commented Jun 10, 2021 •

edited

Loading

dherman commented Jun 10, 2021 •

edited

Loading

kjvalencik commented Jun 11, 2021 •

edited

Loading

kjvalencik commented Jun 25, 2021 •

edited

Loading

dherman commented Jun 27, 2021 •

edited

Loading

kjvalencik commented Jul 7, 2021 •

edited

Loading