feature(neon): API for thread-local data #902

dherman · 2022-05-24T04:58:40Z

This PR adds a neon::thread::LocalKey API for storing thread-local data:

use neon::thread::LocalKey;

static THREAD_ID: LocalKey<u32> = LocalKey::new();

THREAD_ID.get_or_init(&mut cx, || x);

let thread_id: u32 = THREAD_ID.get(&mut cx).unwrap();

Closes #728.

kjvalencik · 2022-05-25T13:13:07Z

@dherman Something we need to consider for this feature is if we need internal reference counting to ensure the value isn't dropped early.

Specifically, is it possible for the VM to stop and call the InstanceData destructor while we are still in a Neon context? Hopefully that's not possible and would be a Node-API bug, but I'm not sure.

dherman · 2022-05-25T20:22:17Z

@jrose-signal We're playing around with two different versions of this API: an immutable one and a mutable one. (Either variation of the API should be more or less equally expressive, but they each make certain use cases more or less ergonomic.) I'll describe the pros and cons below but since you filed #728 I'm curious if you have a sense of whether your use cases would lean more on the mutable or immutable side?

Immutable

Pros

- We can give out an &'cx T to the contents of the cell, which is live for the duration of the context (usually the duration of a Neon function)
- Encourages the use of stable types such as constants or extend-only data structures, which is generally safer for global data

Cons

- Any mutability needs to be wrapped in interior mutability, e.g. RefCell

Mutable

Pros

- More convenient for arbitrary mutable datatypes

Cons

- We can't safely hand out a long-lived reference, i.e. Global::get(&'a mut cx) can only produce an Option<&'a T>
- Most methods end up locking the &mut cx context, requiring users to work around borrowck more (e.g. by cloning the contents)
- Requires either extra helper methods (e.g. Global::take()) for common cases, or an extra layer of internal boxing in order to vend pointers to the cell's Option<T> so users can take advantage of the full std::option::Option APIs
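To make the immutable side of the trade-off concrete, here is a minimal sketch using only std types. `Store` and `record_error_name` are hypothetical stand-ins, not Neon's actual API, and the context argument is omitted entirely. The point is only that an immutable cell hands out a shared reference and pushes any mutation into `RefCell`.

```rust
use std::cell::{OnceCell, RefCell};

/// Hypothetical stand-in for a per-thread slot; not Neon's real type.
struct Store<T> {
    cell: OnceCell<T>,
}

impl<T> Store<T> {
    fn new() -> Self {
        Store { cell: OnceCell::new() }
    }

    /// Immutable flavor: only a shared reference is ever handed out.
    fn get(&self) -> Option<&T> {
        self.cell.get()
    }

    /// Initializes the slot on first use and returns a shared reference.
    fn get_or_init(&self, init: impl FnOnce() -> T) -> &T {
        self.cell.get_or_init(init)
    }
}

/// Mutation has to go through interior mutability (here, `RefCell`).
/// Returns the number of names recorded so far.
fn record_error_name(store: &Store<RefCell<Vec<String>>>, name: &str) -> usize {
    let names = store.get_or_init(|| RefCell::new(Vec::new()));
    names.borrow_mut().push(name.to_string());
    names.borrow().len()
}
```

The cost shows up at every call site that mutates: the caller must write `borrow_mut()` and accept a runtime borrow check, which is the "Cons" entry for the immutable design.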

jrose-signal · 2022-05-25T22:38:06Z

I'm afraid I've paged out the scheme I was going to use with this feature, especially since @indutny-signal successfully got Node to speed up "running this microtask has put another task on the microtask queue". I think I was just going to stick a unique ID in each instance and then have an API that did something like the following:

fn run_or_send(self, cx: &mut impl Context<'_>) {
  if UNIQUE_ID.get(cx) == self.target_context_unique_id {
    self.run(cx);
  } else {
    let channel = self.channel.clone();
    channel.send(move |cx| self.run(cx));
  }
}

This is not the most interesting use of instance-global data, but it is perfectly satisfied by the immutable API.

I haven't looked at your implementation, but it seems possible for you to provide both APIs, fn get<'cx>(&self, cx: &Context<'cx>) -> &'cx Self::Target and fn get_mut<'a>(&self, &'a mut Context<'_>) -> &'a mut Self::Target. But there is value in opinionated APIs as well, and I did completely gloss over the initialization part. So I guess it comes down to what someone would actually do with the mutable API where RefCell would be painful.

jrose-signal · 2022-05-25T22:56:24Z

Oh yeah, I also had this use case:

> But in general anything that can be computed once and cached benefits from being in PerInstanceData, such as recording a set of JavaScript-defined Error types to use for Rust errors.

This is a bit trickier, because the initialization logic wants to use the Context, and that could end up calling some other Neon function. I think that means I wouldn't be able to use get_or_init; I'd have to get, check for None, do some other stuff, and then get_or_init.

kjvalencik · 2022-05-25T23:18:27Z

Good catch @jrose-signal on get_or_init.

@dherman This is the use case we had discussed for the closure getting a reference to the context that was passed in. There's also benefit to a get_or_try_init in case you are using fallible Neon APIs.

Unfortunately, we can't have both versions of the API because it would allow having a mutable and immutable reference at the same time since they have different lifetimes.

jrose-signal · 2022-05-25T23:25:19Z

To be clear, even if get_or_init took the Context, you could still end up in this scenario:

outer_function
 \ get_or_init
    \ some_js_function
       \ inner_function
          \ get_or_init

and you should not be able to init twice. So I don't think get_or_init should take the Context, but at the same time you can't just rely on get_or_init if you need the Context to do the init.
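The call tree above can be reproduced with a small std-only sketch. `NaiveCell` is a hypothetical type, and passing the cell itself to the initializer stands in for passing the Context. It shows what goes wrong when nothing guards against re-entrancy: the initializer runs twice and the two callers observe different values.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical cell with no re-entrancy protection.
struct NaiveCell {
    value: RefCell<Option<u32>>,
    init_calls: Cell<u32>,
}

impl NaiveCell {
    fn new() -> Self {
        NaiveCell { value: RefCell::new(None), init_calls: Cell::new(0) }
    }

    /// The initializer receives the cell (standing in for the context),
    /// so it is free to call `get_or_init` again.
    fn get_or_init(&self, init: impl FnOnce(&NaiveCell) -> u32) -> u32 {
        if let Some(v) = *self.value.borrow() {
            return v;
        }
        self.init_calls.set(self.init_calls.get() + 1);
        let v = init(self);
        // Overwrites whatever a nested call may have stored.
        *self.value.borrow_mut() = Some(v);
        v
    }
}

/// Mirrors the call tree: the outer initializer triggers an inner one.
/// Returns (value seen by the outer caller, value seen by the inner caller).
fn outer_function(cell: &NaiveCell) -> (u32, u32) {
    let mut inner_seen = 0;
    let outer_seen = cell.get_or_init(|c| {
        inner_seen = c.get_or_init(|_| 2);
        1
    });
    (outer_seen, inner_seen)
}
```

Here the inner caller sees `2`, the outer caller sees `1`, and the initializer count is two, which is exactly the double initialization the API should rule out.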

dherman · 2022-05-26T17:50:43Z

I have seen this pattern a million times (an API using a closure to initialize a data structure, which makes the API re-entrant) and I still missed it! Thanks for the eagle eye, @jrose-signal.

I'm wondering if we can handle the double-initialization scenario automatically and panic. I think we can plausibly make the case that re-entrant re-initialization is a rare corner case and a bug, so it wouldn't need to pollute the signature.

There's definitely precedent for having multiple variants of get_or_init, such as Option::get_or_insert, Option::get_or_insert_default, and Option::get_or_insert_with. The context version would pollute the API with NeonResults, but having variants with simpler signatures maybe makes it feel more ergonomic.

Some possible sketches:

impl<T> Global<T> {
    fn get_or_init<'cx, 'a, C>(&self, cx: &'a mut C, value: T) -> &'cx T
    where
        C: Context<'cx>,
    { ... }

    fn get_or_init_with<'cx, 'a, C, F>(&self, cx: &'a mut C, f: F) -> NeonResult<&'cx T>
    where
        C: Context<'cx>,
        F: FnOnce(&mut C) -> NeonResult<T>,
    { ... }
}

impl<T: Default> Global<T> {
    fn get_or_init_default<'cx, 'a, C>(&self, cx: &'a mut C) -> &'cx T
    where
        C: Context<'cx>,
    {
        self.get_or_init_with(cx, Default::default)
    }
}

Thoughts?

dherman · 2022-05-26T17:53:04Z

Oops, the Default version was assuming a non-context-taking variant. Maybe we'd have two versions of the callback API, one that takes a context and one that doesn't? There's certainly a danger of API bloat, but I think there's some budget here, given that many of the variants are just leaning on common Rust idioms like Default.
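One way to keep the variant count cheap is to implement a single fallible core and layer the simpler signatures on top of it. The following is a rough std-only sketch of that layering, without any context argument. `Key` and its methods are hypothetical and are not the signatures that were merged.

```rust
use std::cell::OnceCell;
use std::convert::Infallible;

/// Hypothetical key type; a stand-in for illustration only.
struct Key<T> {
    cell: OnceCell<T>,
}

impl<T> Key<T> {
    const fn new() -> Self {
        Key { cell: OnceCell::new() }
    }

    /// Fallible core: an `Err` from `init` leaves the key uninitialized.
    fn get_or_try_init<E>(&self, init: impl FnOnce() -> Result<T, E>) -> Result<&T, E> {
        if let Some(v) = self.cell.get() {
            return Ok(v);
        }
        let value = init()?;
        // If `init` filled the cell re-entrantly, the first value wins
        // and `value` is simply dropped.
        Ok(self.cell.get_or_init(|| value))
    }

    /// Infallible variant, expressed through the fallible core.
    fn get_or_init(&self, init: impl FnOnce() -> T) -> &T {
        match self.get_or_try_init(|| Ok::<T, Infallible>(init())) {
            Ok(v) => v,
            Err(never) => match never {},
        }
    }
}

impl<T: Default> Key<T> {
    /// `Default` variant, expressed through `get_or_init`.
    fn get_or_init_default(&self) -> &T {
        self.get_or_init(T::default)
    }
}
```

Only `get_or_try_init` carries real logic here; the other two are one-line adapters, which is why the extra surface area is relatively inexpensive to maintain.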

dherman · 2022-05-27T03:23:55Z

I pushed an implementation of my ideas above. See what you think? @jrose-signal I think this might work for your use case without you having to do a complicated get + get_or_init dance, because it simply does the double-initialization checking for you.

kjvalencik · 2022-05-27T16:33:19Z

It could be re-entrant, but I think that's okay as long as we define the semantics. It doesn't need to panic; it could overwrite (since we know all other references are dropped). Overwrite semantics would be identical to hand-writing something like:

let data = if let Some(data) = DATA.get(&mut cx) {
    data
} else {
    DATA.insert(&mut cx, Data::new(cx.channel()))
}

dherman · 2022-06-01T02:15:38Z

I think re-entrant initialization is going to be so rare that we should treat it as an error, but instead of waiting for the outer initialization to fail, we should fail on the inner initialization. I've pushed an implementation that does this checking by setting the state to a third "dirty" state during get_or_try_init() and then panics if any initialization occurs during that state.
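The three-state idea can be sketched with std types alone. `GuardedCell`, `State`, and `TryInit` are hypothetical names and this is not the code in the PR. To stay free of unsafe code it returns clones rather than references. It is meant only to show how an RAII guard leaves the cell consistent on success, on `Err`, and on panic.

```rust
use std::cell::RefCell;

/// Uninitialized, mid-initialization ("dirty"), or initialized.
enum State<T> {
    Uninit,
    Trying,
    Init(T),
}

/// Hypothetical cell; returns clones to keep the sketch free of `unsafe`.
struct GuardedCell<T> {
    state: RefCell<State<T>>,
}

/// RAII guard: on drop, commits the value if one was produced,
/// otherwise rolls the cell back to `Uninit`.
struct TryInit<'a, T> {
    cell: &'a GuardedCell<T>,
    value: Option<T>,
}

impl<'a, T> Drop for TryInit<'a, T> {
    fn drop(&mut self) {
        let next = match self.value.take() {
            Some(v) => State::Init(v),
            None => State::Uninit,
        };
        *self.cell.state.borrow_mut() = next;
    }
}

impl<T: Clone> GuardedCell<T> {
    fn new() -> Self {
        GuardedCell { state: RefCell::new(State::Uninit) }
    }

    fn get(&self) -> Option<T> {
        match &*self.state.borrow() {
            State::Init(v) => Some(v.clone()),
            _ => None,
        }
    }

    fn get_or_try_init<E>(&self, init: impl FnOnce() -> Result<T, E>) -> Result<T, E> {
        match &*self.state.borrow() {
            State::Init(v) => return Ok(v.clone()),
            // The inner (re-entrant) call is the one that fails.
            State::Trying => panic!("re-entrant initialization"),
            State::Uninit => {}
        }
        *self.state.borrow_mut() = State::Trying;
        let mut guard = TryInit { cell: self, value: None };
        let value = init()?; // on `Err` or panic, the guard resets to `Uninit`
        guard.value = Some(value.clone());
        Ok(value) // the guard drops here and commits `Init(value)`
    }
}
```

Because the rollback lives in `Drop`, a failed or panicking initializer cannot leave the cell stuck in the `Trying` state, so a later call can retry.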

kjvalencik

I'm really excited about this! The API looks great and the transaction code is robust. ❤️

dherman · 2022-06-08T17:23:41Z

Yeah I'm on board, and in fact, this made me realize that even instance is not a particularly helpful term, and we can fully align with std by calling this neon::thread::LocalKey and explaining this as "thread-local storage for JavaScript threads." Since JavaScript now has threads, and Node refers to them as such (e.g. require('node:worker_threads')), this fully aligns this API with the intuitions of TLS.

jrose-signal · 2022-06-08T18:17:33Z

I was worried that someone would expect data set in one addon to be available in another one (per your diagram in #728 (comment)), but since LocalKeys aren't created with any sort of user-visible representation, they'd never be the "same" key across addons anyway, any more than two copies of the same global would be the same variable at runtime in two dynamic libraries. (C header semantics notwithstanding…) So that lets Neon ignore the distinction between "worker/thread-local" and "addon/instance-local".

dherman · 2022-06-08T18:19:53Z

@jrose-signal Just to make sure I follow, does that mean you do like this idea of calling it thread-local? (I pushed the change quickly just to see what it looks like, but I'm totally open to feedback.)

jrose-signal · 2022-06-08T18:41:00Z

I think it's not bad. I don't love it because JS might some day use "thread" to mean something else; calling it "worker-local" would probably be the equivalent using today's terminology. "instance-local" isn't wrong but pushes people to learn about instances when they may not need to, so I understand why you're trying to move away from it.

dherman · 2022-06-08T18:41:50Z

That makes sense. I think the reason I favor thread over worker is that (a) the main thread isn't a worker, and (b) Node refers to workers as threads already anyway.

jrose-signal · 2022-06-08T18:46:33Z

MDN refers to "threads" too so yeah, should be clear enough. (And of course there can be a "technically it's per-instance" section in the docs.)

dherman · 2022-06-08T18:47:14Z

> MDN refers to "threads" too so yeah, should be clear enough. (And of course there can be a "technically it's per-instance" section in the docs.)

I'll add something to the docs…

jrose-signal · 2022-06-08T18:48:37Z

The other thing I'd definitely want to see in the docs is "a JS thread is not necessarily bound to one OS thread, so you should be using this and not std::thread APIs".
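For contrast, this is how the std mechanism behaves: a `thread_local!` value is keyed strictly by OS thread. The sketch below uses only std and hypothetical names (`OS_THREAD_ID`, `set_here_and_read_elsewhere`). It says nothing about how Neon stores its data; it only shows why std thread-locals are the wrong tool if a JS thread is not pinned to a single OS thread.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // One independent copy per OS thread, each starting at 0.
    static OS_THREAD_ID: Cell<u32> = Cell::new(0);
}

/// Reads the thread-local from a freshly spawned OS thread.
fn read_on_new_os_thread() -> u32 {
    thread::spawn(|| OS_THREAD_ID.with(|id| id.get()))
        .join()
        .unwrap()
}

/// Sets the value on the calling OS thread, then reads it both here
/// and on another OS thread. Returns (value here, value elsewhere).
fn set_here_and_read_elsewhere(value: u32) -> (u32, u32) {
    OS_THREAD_ID.with(|id| id.set(value));
    let here = OS_THREAD_ID.with(|id| id.get());
    (here, read_on_new_os_thread())
}
```

The second OS thread still sees the initial `0`, so any state kept this way would silently disappear if the same JavaScript thread ever ran on a different OS thread.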

dherman · 2022-06-08T21:13:40Z

I added some text to the docs based on these suggestions. Reviews/feedback welcome!

kjvalencik · 2022-06-09T12:53:09Z

Looks great! Love the docs.

dherman · 2022-06-09T14:33:29Z

My sincere thanks to both @jrose-signal and @kjvalencik for all the excellent feedback, ideas, and reviews. I'm happy with how this API came together.

Add neon::instance::Global API for storing instance-global data.

94111d4

dherman force-pushed the instance-data branch from 7672b09 to 94111d4 Compare May 24, 2022 05:00

kjvalencik reviewed May 24, 2022

dherman added 4 commits May 25, 2022 10:51

Address review comments:

f872b76

- `GlobalTable::default()` to avoid boilerplate `new()` method
- Rename `borrow()` and `borrow_mut()` to `get()` and `get_mut()`
- Add `'static` bound to global contents
- Use `cloned()` in test code

Use 'cx as the inner lifetime of contexts.

6f12663

Get rid of [] overloading for GlobalTable in favor of an inherent…

18e60f1

… `get()` method.

Immutable version of API:

b8b089b

- `Global<T>::get()` returns an `Option<&T>`
- `Global<T>::get_or_init()` returns an `&T`
- Lifetime of returned reference is the inner `'cx` since the boxed reference is immutable

kjvalencik reviewed May 25, 2022

dherman and others added 2 commits May 25, 2022 12:22

Explicitly name the types in the transmute

cf0d32a

Co-authored-by: K.J. Valencik <kjvalencik@gmail.com>

Explicitly name the types in the transmute.

f5cd0ba

kjvalencik reviewed May 25, 2022

Add get_or_try_init and get_or_init_default

aee77f4

- Also rename `get_or_init` to `get_or_init_with`
- Also add `get_or_init` that takes an owned init value

Protect re-entrant cases with "dirty" state checking

9075b76

- Uses an RAII pattern to ensure `get_or_try_init` always terminates cleanly
- All initialization paths are checked for the dirty state to avoid re-entrancy
- Also adds API docs and safety comments

dherman added 4 commits May 31, 2022 20:12

Use GlobalCellValue shorthand in type definition of GlobalCell.

b28f439

Prettier fixups

bbc4f22

Add a test for storing rooted objects in instance globals

ff75d2d

Minor style cleanup for TryInitTransaction::is_trying()

d9b8251

kjvalencik requested changes Jun 2, 2022

Global::new() can use the derived Default::default()

a66e511

Co-authored-by: K.J. Valencik <kjvalencik@gmail.com>

Improvements to neon::instance top-level API docs

3338877

Rename neon::instance to neon::thread and Local to LocalKey

42eec50

Some more documentation text about the relationships between instance…

85d99f5

…s and threads.

kjvalencik reviewed Jun 8, 2022

dherman and others added 7 commits June 8, 2022 18:23

Addresses some of @kjvalencik's review suggestions:

c906fbc

- Eliminate `get_or_init` and rename `get_or_init_with` to `get_or_init`
- Add `# Panic` section to doc comment

Clarify doc text

0f57620

Co-authored-by: K.J. Valencik <kjvalencik@gmail.com>

Idiomatic Rust variable name in doc example

14a2fe4

Co-authored-by: K.J. Valencik <kjvalencik@gmail.com>

Link to neon::main docs in doc comment

019c2d3

Co-authored-by: K.J. Valencik <kjvalencik@gmail.com>

Clarifying doc text about cross-thread sharing

58f80b3

Co-authored-by: K.J. Valencik <kjvalencik@gmail.com>

s/fail/panic/ in doc text

f3cc0ab

Co-authored-by: K.J. Valencik <kjvalencik@gmail.com>

More docs improvements:

2005e61

- Add addon lifecycle diagram
- Add panics notes to `Root::{into_inner, to_inner}`
- Replace "immutable" with "thread-safe" in list of safe cases

dherman requested a review from kjvalencik June 9, 2022 02:19

dherman added 2 commits June 8, 2022 19:20

Link to crate::main in the docs instead of neon::main

d50ca63

More copy editing in the docs

ece0c02

kjvalencik approved these changes Jun 9, 2022

A few last copy-editing nits

310cb5b

dherman changed the title ~~feature(neon): API for instance-global data~~ feature(neon): API for thread-local data Jun 9, 2022

dherman merged commit f747e67 into main Jun 9, 2022

dherman deleted the instance-data branch June 9, 2022 14:32
