Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature(neon): API for thread-local data #902

Merged
merged 33 commits into from
Jun 9, 2022
Merged

feature(neon): API for thread-local data #902

merged 33 commits into from
Jun 9, 2022

Conversation

dherman
Copy link
Collaborator

@dherman dherman commented May 24, 2022

This PR adds a neon::thread::LocalKey API for storing thread-local data:

use neon::thread::LocalKey;

static THREAD_ID: LocalKey<u32> = LocalKey::new();

THREAD_ID.get_or_init(&mut cx, || x);

let thread_id: u32 = THREAD_ID.get(&mut cx).unwrap();

Closes #728.

crates/neon/src/instance/mod.rs Outdated Show resolved Hide resolved
crates/neon/src/instance/mod.rs Outdated Show resolved Hide resolved
crates/neon/src/instance/mod.rs Outdated Show resolved Hide resolved
crates/neon/src/instance/mod.rs Outdated Show resolved Hide resolved
crates/neon/src/instance/mod.rs Outdated Show resolved Hide resolved
crates/neon/src/lifecycle.rs Outdated Show resolved Hide resolved
crates/neon/src/lifecycle.rs Outdated Show resolved Hide resolved
crates/neon/src/lifecycle.rs Outdated Show resolved Hide resolved
@kjvalencik
Copy link
Member

@dherman Something we need to consider for this feature is if we need internal reference counting to ensure the value isn't dropped early.

Specifically, is it possible for the VM to stop and call the InstanceData destructor while we are still in a Neon context? Hopefully that's not possible and would be a Node-API bug, but I'm not sure.

- `GlobalTable::default()` to avoid boilerplate `new()` method
- Rename `borrow()` and `borrow_mut()` to `get()` and `get_mut()`
- Add `'static` bound to global contents
- Use `cloned()` in test code
- `Global<T>::get()` returns an `Option<&T>`
- `Global<T>::get_or_init()` returns an `&T`
- Lifetime of returned reference is the inner `'cx` since the boxed reference is immutable
dherman and others added 2 commits May 25, 2022 12:22
@dherman
Copy link
Collaborator Author

dherman commented May 25, 2022

@jrose-signal We're playing around with two different versions of this API: an immutable one and a mutable one. (Either variation of the API should be more or less equally expressive, but they each make certain use cases more or less ergonomic.) I'll describe the pros and cons below but since you filed #728 I'm curious if you have a sense of whether your use cases would lean more on the mutable or immutable side?


Immutable

Pros

  • We can give out an &'cx T to the contents of the cell, which is live for the duration of the context (usually the duration of a Neon function)
  • Encourages the use of stable types such as constants or extend-only data structures, which is generally safer for global data

Cons

  • Any mutability needs to be wrapped in interior mutability e.g. RefCell

Mutable

Pros

  • More convenient for arbitrary mutable datatypes

Cons

  • We can't safely hand out a long-lived reference, i.e. Global::get(&'a mut cx) can only produce an Option<&'a T>
  • Most methods end up locking the &mut cx context, requiring users to work around borrowck more (e.g. by cloning the contents)
  • Requires either extra helper methods (e.g. Global::take()) for common cases, or an extra layer of internal boxing in order to vend pointers to the cell's Option<T> so users can take advantage of the full std::option::Option APIs

@jrose-signal
Copy link
Contributor

I'm afraid I've paged out the scheme I was going to use with this feature, especially since @indutny-signal successfully got Node to speed up "running this microtask has put another task on the microtask queue". I think I was just going to stick a unique ID in each instance and then have an API that did something like the following:

fn run_or_send(self, cx: &mut impl Context<'_>) {
  if UNIQUE_ID.get(cx) == self.target_context_unique_id {
    self.run(cx);
  } else {
    let channel = self.channel.clone();
    channel.send(move |cx| self.run(cx));
  }
}

This is not the most interesting use of instance-global data, but it is perfectly satisfied by the immutable API.


I haven't looked at your implementation, but it seems possible for you to provide both APIs, fn get<'cx>(&self, cx: &Context<'cx>) -> &'cx Self::Target and fn get_mut<'a>(&self, &'a mut Context<'_>) -> &'a mut Self::Target. But there is value in opinionated APIs as well, and I did completely gloss over the initialization part. So I guess it comes down to what someone would actually do with the mutable API where RefCell would be painful.

@jrose-signal
Copy link
Contributor

Oh yeah, I also had this use case:

But in general anything that can be computed once and cached benefits from being in PerInstanceData, such as recording a set of JavaScript-defined Error types to use for Rust errors.

This is a bit trickier, because the initialization logic wants to use the Context, and that could end up calling some other Neon function. I think that means I wouldn't be able to use get_or_init; I'd have to get, check for None, do some other stuff, and then get_or_init.

@kjvalencik
Copy link
Member

kjvalencik commented May 25, 2022

Good catch @jrose-signal on get_or_init.

@dherman This is use case we had discussed for the closure getting a reference to the context that was passed in. There's also benefit to a get_or_try_init in case you are using fallible Neon APIs.

Unfortunately, we can't have both versions of the API because it would allow having a mutable and immutable reference at the same time since they have different lifetimes.

@jrose-signal
Copy link
Contributor

jrose-signal commented May 25, 2022

To be clear, even if get_or_init took the Context, you could still end up in this scenario:

outer_function
 \ get_or_init
    \ some_js_function
       \ inner_function
          \ get_or_init

and you should not be able to init twice. So I don't think get_or_init should take the Context, but at the same time you can't just rely on get_or_init if you need the Context to do the init.

@dherman
Copy link
Collaborator Author

dherman commented May 26, 2022

I have seen this pattern a million times (an API using a closure to initialize a data structure, which makes the API re-entrant) and I still missed it! Thanks for the eagle eye, @jrose-signal.

I'm wondering if we can handle the double-initialization scenario automatically and panic. I think we can plausibly make the case that re-entrant re-initialization is a rare corner case and a bug, so it wouldn't need pollute the signature.

There's definitely precedent for having multiple variants to get_or_init such as Option::get_or_insert, Option::get_or_insert_default, Options::get_or_insert_with. The context version would pollute the API with NeonResults but having variants with simpler signatures maybe makes it feel more ergonomic.

Some possible sketches:

impl<T> Global<T> {
    fn get_or_init<'cx, 'a, C>(&self, cx: &'a mut C, value: T) -> &'cx T
    where
        C: Context<'cx>,
    { ... }

    fn get_or_init_with<'cx, 'a, C, F>(&self, cx: &'a mut C) -> NeonResult<&'cx T>
    where
        C: Context<'cx>,
        F: FnOnce(&mut C) -> NeonResult<T>,
    { ... }
}

impl<T: Default> Global<T> {
    fn get_or_init_default<'cx, 'a, C>(&self, cx: &'a mut C) -> &'cx T
    where
        C: Context<'cx>,
    {
        self.get_or_init_with(cx, Default::default)
    }
}

Thoughts?

@dherman
Copy link
Collaborator Author

dherman commented May 26, 2022

Oops, the Default version was assuming a non-context-taking variant. Maybe we'd have two versions of the callback API, one that takes a context and one that doesn't? There's certainly a danger of API bloat, but I think there's some budget here, given that many of the variants are just leaning on common Rust idioms like Default.

- Also rename `get_or_init` to `get_or_init_with`
- Also add `get_or_init` that takes an owned init value
@dherman
Copy link
Collaborator Author

dherman commented May 27, 2022

I pushed an implementation of my ideas above. See what you think? @jrose-signal I think this might work for your use case without you having to do a complicated get + get_or_init dance, because it simply does the double-initialization checking for you.

@kjvalencik
Copy link
Member

kjvalencik commented May 27, 2022

I could be re-entrant, but I think that's okay as long as we define the semantics. It doesn't need to panic, it could overwrite (since we know all other references of dropped). Overwrite semantics would be identical to if you hand wrote something like:

let data = if let Some(data) = DATA.get(&mut cx) {
    data
} else {
    DATA.insert(&mut cx, Data::new(cx.channel()));
}

- Uses an RAII pattern to ensure `get_or_try_init` always terminates cleanly
- All initialization paths are checked for the dirty state to avoid re-entrancy
- Also adds API docs and safety comments
@dherman
Copy link
Collaborator Author

dherman commented Jun 1, 2022

I think re-entrant initialization is going to be so rare that we should treat it as an error, but instead of waiting for the outer initialization to fail, we should fail on the inner initialization. I've pushed an implementation that does this checking by setting the state to a third "dirty" state during get_or_try_init() and then panics if any initialization occurs during that state.

Copy link
Member

@kjvalencik kjvalencik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really excited about this! The API looks great and the transaction code is robust. ❤️

crates/neon/src/instance/mod.rs Outdated Show resolved Hide resolved
crates/neon/src/instance/mod.rs Outdated Show resolved Hide resolved
crates/neon/src/instance/mod.rs Outdated Show resolved Hide resolved
crates/neon/src/lifecycle.rs Outdated Show resolved Hide resolved
crates/neon/src/lifecycle.rs Show resolved Hide resolved
crates/neon/src/lifecycle.rs Show resolved Hide resolved
crates/neon/src/lifecycle.rs Show resolved Hide resolved
crates/neon/src/lifecycle.rs Show resolved Hide resolved
test/napi/lib/workers.js Outdated Show resolved Hide resolved
crates/neon/src/instance/mod.rs Outdated Show resolved Hide resolved
Co-authored-by: K.J. Valencik <kjvalencik@gmail.com>
@dherman
Copy link
Collaborator Author

dherman commented Jun 8, 2022

Yeah I'm on board, and in fact, this made me realize that even instance is not a particularly helpful term, and we can fully align with std by calling this neon::thread::LocalKey and explaining this as "thread-local storage for JavaScript threads." Since JavaScript now has threads, and Node refers to them as such (e.g. require('node:worker_threads')), this fully aligns this API with the intuitions of TLS.

@jrose-signal
Copy link
Contributor

I was worried that someone would expect data set in one addon to be available in another one (per your diagram in #728 (comment)), but since LocalKeys aren't created with any sort of user-visible representation, they'd never be the "same" key across addons anyway, any more than two copies of the same global would be the same variable at runtime in two dynamic libraries. (C header semantics notwithstanding…) So that lets Neon ignore the distinction between "worker/thread-local" and "addon/instance-local".

@dherman
Copy link
Collaborator Author

dherman commented Jun 8, 2022

@jrose-signal Just to make sure I follow, does that mean you do like this idea of calling it thread-local? (I pushed the change quickly just to see what it looks like, but I'm totally open to feedback.)

@jrose-signal
Copy link
Contributor

I think it's not bad. I don't love it because JS might some day use "thread" to mean something else; calling it "worker-local" would probably be the equivalent using today's terminology. "instance-local" isn't wrong but pushes people to learn about instances when they may not need to, so I understand why you're trying to move away from it.

@dherman
Copy link
Collaborator Author

dherman commented Jun 8, 2022

That makes sense. I think the reason I favor thread over worker is that (a) the main thread isn't a worker, and (b) Node refers to workers as threads already anyway.

@jrose-signal
Copy link
Contributor

MDN refers to "threads" too so yeah, should be clear enough. (And of course there can be a "technically it's per-instance" section in the docs.)

@dherman
Copy link
Collaborator Author

dherman commented Jun 8, 2022

MDN refers to "threads" too so yeah, should be clear enough. (And of course there can be a "technically it's per-instance" section in the docs.)

I'll add something to the docs…

@jrose-signal
Copy link
Contributor

The other thing I'd definitely want to see in the docs is "a JS thread is not necessarily bound to one OS thread, so you should be using this and not std::thread APIs".

@dherman
Copy link
Collaborator Author

dherman commented Jun 8, 2022

I added some text to the docs based on these suggestions. Reviews/feedback welcome!

crates/neon/src/thread/mod.rs Outdated Show resolved Hide resolved
crates/neon/src/thread/mod.rs Outdated Show resolved Hide resolved
crates/neon/src/thread/mod.rs Outdated Show resolved Hide resolved
crates/neon/src/thread/mod.rs Outdated Show resolved Hide resolved
crates/neon/src/thread/mod.rs Outdated Show resolved Hide resolved
crates/neon/src/thread/mod.rs Outdated Show resolved Hide resolved
crates/neon/src/thread/mod.rs Outdated Show resolved Hide resolved
crates/neon/src/thread/mod.rs Show resolved Hide resolved
dherman and others added 7 commits June 8, 2022 18:23
- Eliminate `get_or_init` and rename `get_or_init_with` to `get_or_init`
- Add `# Panic` section to doc comment
Co-authored-by: K.J. Valencik <kjvalencik@gmail.com>
Co-authored-by: K.J. Valencik <kjvalencik@gmail.com>
Co-authored-by: K.J. Valencik <kjvalencik@gmail.com>
Co-authored-by: K.J. Valencik <kjvalencik@gmail.com>
Co-authored-by: K.J. Valencik <kjvalencik@gmail.com>
- Add addon lifecycle diagram
- Add panics notes to `Root::{into_inner, to_inner}`
- Replace "immutable" with "thread-safe" in list of safe cases
@dherman dherman requested a review from kjvalencik June 9, 2022 02:19
@kjvalencik
Copy link
Member

Looks great! Love the docs. :shipit:

@dherman dherman changed the title feature(neon): API for instance-global data feature(neon): API for thread-local data Jun 9, 2022
@dherman dherman merged commit f747e67 into main Jun 9, 2022
@dherman dherman deleted the instance-data branch June 9, 2022 14:32
@dherman
Copy link
Collaborator Author

dherman commented Jun 9, 2022

My sincere thanks to both @jrose-signal and @kjvalencik for all the excellent feedback, ideas, and reviews. I'm happy with how this API came together.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: expose per-instance data
3 participants