Tracking Issue for once_cell #74465

Open
3 of 9 tasks
KodrAus opened this issue Jul 18, 2020 · 130 comments
Assignees
Labels
A-concurrency Area: Concurrency related issues. B-unstable Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@KodrAus
Contributor

KodrAus commented Jul 18, 2020

This is a tracking issue for the RFC "standard lazy types" (rust-lang/rfcs#2788).
The feature gate for the issue is #![feature(once_cell)].

Unstable API

// core::lazy

pub struct OnceCell<T> { .. }

impl<T> OnceCell<T> {
    pub const fn new() -> OnceCell<T>;
    pub fn get(&self) -> Option<&T>;
    pub fn get_mut(&mut self) -> Option<&mut T>;
    pub fn set(&self, value: T) -> Result<(), T>;
    pub fn get_or_init<F>(&self, f: F) -> &T where F: FnOnce() -> T;
    pub fn get_or_try_init<F, E>(&self, f: F) -> Result<&T, E> where F: FnOnce() -> Result<T, E>;
    pub fn into_inner(self) -> Option<T>;
    pub fn take(&mut self) -> Option<T>;
}
impl<T> From<T> for OnceCell<T>;
impl<T> Default for OnceCell<T>;
impl<T: Clone> Clone for OnceCell<T>;
impl<T: PartialEq> PartialEq for OnceCell<T>;
impl<T: Eq> Eq for OnceCell<T>;
impl<T: fmt::Debug> fmt::Debug for OnceCell<T>;

pub struct Lazy<T, F = fn() -> T> { .. }

impl<T, F> Lazy<T, F> {
    pub const fn new(init: F) -> Lazy<T, F>;
}
impl<T, F: FnOnce() -> T> Lazy<T, F> {
    pub fn force(this: &Lazy<T, F>) -> &T;
}
impl<T: Default> Default for Lazy<T>;
impl<T, F: FnOnce() -> T> Deref for Lazy<T, F>;
impl<T: fmt::Debug, F> fmt::Debug for Lazy<T, F>;

// std::lazy

pub struct SyncOnceCell<T> { .. }

impl<T> SyncOnceCell<T> {
    pub const fn new() -> SyncOnceCell<T>;
    pub fn get(&self) -> Option<&T>;
    pub fn get_mut(&mut self) -> Option<&mut T>;
    pub fn set(&self, value: T) -> Result<(), T>;
    pub fn get_or_init<F>(&self, f: F) -> &T where F: FnOnce() -> T;
    pub fn get_or_try_init<F, E>(&self, f: F) -> Result<&T, E> where F: FnOnce() -> Result<T, E>;
    pub fn into_inner(mut self) -> Option<T>;
    pub fn take(&mut self) -> Option<T>;
    fn is_initialized(&self) -> bool;
    fn initialize<F, E>(&self, f: F) -> Result<(), E> where F: FnOnce() -> Result<T, E>;
    unsafe fn get_unchecked(&self) -> &T;
    unsafe fn get_unchecked_mut(&mut self) -> &mut T;
}
impl<T> From<T> for SyncOnceCell<T>;
impl<T> Default for SyncOnceCell<T>;
impl<T: RefUnwindSafe + UnwindSafe> RefUnwindSafe for SyncOnceCell<T>;
impl<T: UnwindSafe> UnwindSafe for SyncOnceCell<T>;
impl<T: Clone> Clone for SyncOnceCell<T>;
impl<T: PartialEq> PartialEq for SyncOnceCell<T>;
impl<T: Eq> Eq for SyncOnceCell<T>;
unsafe impl<T: Sync + Send> Sync for SyncOnceCell<T>;
unsafe impl<T: Send> Send for SyncOnceCell<T>;
impl<T: fmt::Debug> fmt::Debug for SyncOnceCell<T>;

pub struct SyncLazy<T, F = fn() -> T>;

impl<T, F> SyncLazy<T, F> {
    pub const fn new(f: F) -> SyncLazy<T, F>;
}
impl<T, F: FnOnce() -> T> SyncLazy<T, F> {
    pub fn force(this: &SyncLazy<T, F>) -> &T;
}
impl<T, F: FnOnce() -> T> Deref for SyncLazy<T, F>;
impl<T: Default> Default for SyncLazy<T>;
impl<T, F: UnwindSafe> RefUnwindSafe for SyncLazy<T, F> where SyncOnceCell<T>: RefUnwindSafe;
impl<T, F: UnwindSafe> UnwindSafe for SyncLazy<T, F> where SyncOnceCell<T>: UnwindSafe;
unsafe impl<T, F: Send> Sync for SyncLazy<T, F> where SyncOnceCell<T>: Sync;
impl<T: fmt::Debug, F> fmt::Debug for SyncLazy<T, F>;
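As a rough illustration of how the single-threaded cell listed above behaves, here is a minimal sketch. It assumes a toolchain where the type is available as `std::cell::OnceCell` (the name used later in this thread); `parse_port` is a hypothetical helper, not part of the proposal. Because `get_or_try_init` may still be feature-gated, the fallible path is written with `get` and `get_or_init` instead.

```rust
use std::cell::OnceCell;
use std::num::ParseIntError;

/// Hypothetical helper: parses a port number once and caches it in `cell`.
/// A failed parse leaves the cell empty, so a later call can try again.
pub fn parse_port<'a>(cell: &'a OnceCell<u16>, text: &str) -> Result<&'a u16, ParseIntError> {
    // `get` never runs an initializer; it only reports what is already stored.
    if let Some(port) = cell.get() {
        return Ok(port);
    }
    let port = text.parse::<u16>()?;
    // The cell is known to be empty here, so this closure runs exactly once.
    Ok(cell.get_or_init(|| port))
}
```

Once a value is stored, later calls return it regardless of their input, and `set` hands the rejected value back as `Err`.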

Steps

Unresolved Questions

Inlined from #72414:

Implementation history

@KodrAus KodrAus added C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. needs-rfc This change is large or controversial enough that it should have an (e-)RFC accepted before doing it labels Jul 18, 2020
KodrAus added a commit to KodrAus/rust that referenced this issue Jul 18, 2020
@JohnTitor JohnTitor added the B-unstable Implemented in the nightly compiler and unstable. label Jul 18, 2020
chansuke pushed a commit to chansuke/rust that referenced this issue Jul 20, 2020
@matklad
Member

matklad commented Jul 24, 2020

Let's cross-out the "should get be blocking?" concern. I decided against this for once_cell, for the following reasons:

  • it makes Clone, Eq, Debug blocking, which is surprising
  • the original issue that prompted this question used Lazy, and Lazy is immune from this issue, as it always uses blocking get_or_init.

@matklad
Member

matklad commented Jul 24, 2020

Added two more open questions from the RFC.

Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this issue Jul 24, 2020
@matklad
Member

matklad commented Jul 27, 2020

I've added a summary of proposed API to the issue description.

I wonder if it makes sense for @rust-lang/libs to do a sort of "API review" here: this is a pretty big chunk of API, and we tried to avoid bikeshedding on the RFC.

@matklad
Member

matklad commented Oct 2, 2020

Here's an interesting use-case for non-blocking subset of OnceCell -- building cyclic data structures: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=4eceeefc224cdcc719962a9a0e1f72fc
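The linked playground is not reproduced here. As a guess at the general shape of such a use-case, the following sketch builds a two-node cycle using only `set` and `get`. It assumes `std::cell::OnceCell` is available; `Node` and `walk_cycle` are hypothetical names chosen for illustration.

```rust
use std::cell::OnceCell;

/// A node whose outgoing link is filled in after construction.
pub struct Node<'a> {
    pub name: &'static str,
    pub next: OnceCell<&'a Node<'a>>,
}

/// Builds a two-node cycle on the stack and follows `steps` links from the first node.
pub fn walk_cycle(steps: usize) -> Vec<&'static str> {
    let a = Node { name: "a", next: OnceCell::new() };
    let b = Node { name: "b", next: OnceCell::new() };
    // Only the non-blocking subset (`set` and `get`) is needed to tie the knot.
    assert!(a.next.set(&b).is_ok());
    assert!(b.next.set(&a).is_ok());

    let mut visited = Vec::new();
    let mut current = &a;
    for _ in 0..steps {
        visited.push(current.name);
        current = current.next.get().copied().expect("every node is linked");
    }
    visited
}
```

Both nodes are created with empty links, and the cycle is closed afterwards through shared references, which an `Option` field would not allow without interior mutability.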

@withoutboats
Contributor

I strongly expect a method called get to be nonblocking. I am softly in favor of adding a wait API that blocks, but would prefer that it be added in a separate feature later based on demand.

@matklad
Member

matklad commented Oct 2, 2020

Yeah, to be clear, there's a consensus that get should be non-blocking; the question is resolved. What is not completely settled in my mind is whether we should have core::lazy::SyncOnceCell. That's possible in theory (by only providing get and set methods), but would be hacky to implement, and of questionable usefulness. The above example is a new use-case for that thing.

@m-ou-se
Member

m-ou-se commented Oct 4, 2020

Naming. I'm ok to just roll with the Sync prefix like SyncLazy for now, but have a personal preference for Atomic like AtomicLazy.

I don't think Atomic would be the right word for these. Rust's Atomic types and operations (including Arc) never block and never involve the operating system's scheduler (they're all defined in core or alloc, not std). They're all directly based on the basic atomic operations supported by the processor architecture itself.

I'd expect something that's named AtomicLazy/AtomicOnceCell to do the same. And that's something that already exists as another valid strategy for certain Lazy/OnceCell-like types: Instead of blocking all but one thread when multiple threads encounter an 'empty' cell, it wouldn't block but run the initialization function on each of these threads. The first thread to finish atomically stores its initialized value in the cell, and the others simply drop() the value they created.

The std does something similar in a few places (although not wrapped in a type or publicly exposed). For example, here:

// There is no locking here. It's okay if this is executed by multiple threads in
// parallel. `lookup` will result in the same value, and it's okay if they overwrite
// eachothers result as long as they do so atomically. We don't need any guarantees

And here:

match self.lock.load(Ordering::SeqCst) {
0 => {}
n => return n as *const _,
}
let inner = box Inner { remutex: ReentrantMutex::uninitialized(), held: Cell::new(false) };
inner.remutex.init();
let inner = Box::into_raw(inner);
match self.lock.compare_and_swap(0, inner as usize, Ordering::SeqCst) {
0 => inner,
n => {
Box::from_raw(inner).remutex.destroy();
n as *const _
}
}
}

And another example in parking_lot.

So, since this Lazy/OnceCell implementation does block (such that the initialization function can be FnOnce and the type doesn't have to fit in an atomic), and an alternative purely atomic strategy does exist, I'd really avoid using the word 'atomic' in the name here.
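The purely atomic strategy described above can be sketched roughly as follows. `RaceBox` is a hypothetical type written for this illustration (it is not the std implementation and not an existing API): every thread that finds the cell empty runs the initializer, the first compare-and-exchange wins, and the losers drop their own value.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Hypothetical "first one wins" cell: never blocks, may run `init` more than once.
pub struct RaceBox<T> {
    ptr: AtomicPtr<T>,
}

// SAFETY: the cell owns a `T` (needs `Send`) and hands out `&T` across threads (needs `Sync`).
unsafe impl<T: Send> Send for RaceBox<T> {}
unsafe impl<T: Send + Sync> Sync for RaceBox<T> {}

impl<T> RaceBox<T> {
    pub const fn new() -> Self {
        RaceBox { ptr: AtomicPtr::new(ptr::null_mut()) }
    }

    pub fn get_or_init(&self, init: impl FnOnce() -> T) -> &T {
        let mut current = self.ptr.load(Ordering::Acquire);
        if current.is_null() {
            let candidate = Box::into_raw(Box::new(init()));
            match self.ptr.compare_exchange(
                ptr::null_mut(),
                candidate,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => current = candidate,
                Err(winner) => {
                    // SAFETY: `candidate` was never published, so we still own it.
                    drop(unsafe { Box::from_raw(candidate) });
                    current = winner;
                }
            }
        }
        // SAFETY: `current` is non-null and stays valid until the cell is dropped.
        unsafe { &*current }
    }
}

impl<T> Drop for RaceBox<T> {
    fn drop(&mut self) {
        let raw = *self.ptr.get_mut();
        if !raw.is_null() {
            // SAFETY: a non-null pointer here was created by `Box::into_raw` above.
            drop(unsafe { Box::from_raw(raw) });
        }
    }
}
```

The trade-off matches the comment: no thread ever waits, but the initializer may run on several threads and must tolerate its result being thrown away.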

@matklad
Member

matklad commented Nov 11, 2020

I've added non-blocking flavors of the primitives to the once_cell crate: https://docs.rs/once_cell/1.5.1/once_cell/race/index.html. They are restricted (can be provided only for atomic types), but are compatible with no_std.

It seems to me that "first one wins" is a better semantics if you can't block, so I am going to resolve Sync no_std subset like this:

  • only std supports sync module, as it requires synchronization (std::thread)
  • if you need something like OnceCell in no-std, your choices are
    • use race module from once_cell with different API, which might or might not be uplifted to std some day
    • use some version based on spin locks (this risks @matklad crashing into issue tracker of your project with explanation of how pure spin locks are almost always wrong).

I've ticked this question's box.
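To make the "restricted to atomic types" point concrete, here is a small sketch of a first-one-wins cell for non-zero integers that needs neither allocation nor `unsafe`. `RaceNonZeroUsize` is a hypothetical name for this illustration; it only mirrors the idea and is not claimed to match the `once_cell::race` API.

```rust
use std::num::NonZeroUsize;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical non-blocking cell; zero is reserved to mean "empty".
pub struct RaceNonZeroUsize {
    inner: AtomicUsize,
}

impl RaceNonZeroUsize {
    pub const fn new() -> Self {
        RaceNonZeroUsize { inner: AtomicUsize::new(0) }
    }

    /// Returns the stored value, or `None` if nothing has been stored yet.
    pub fn get(&self) -> Option<NonZeroUsize> {
        NonZeroUsize::new(self.inner.load(Ordering::Acquire))
    }

    /// Racing callers may each run `init`; the first successful store wins.
    pub fn get_or_init(&self, init: impl FnOnce() -> NonZeroUsize) -> NonZeroUsize {
        if let Some(value) = self.get() {
            return value;
        }
        let candidate = init();
        match self
            .inner
            .compare_exchange(0, candidate.get(), Ordering::AcqRel, Ordering::Acquire)
        {
            Ok(_) => candidate,
            Err(winner) => NonZeroUsize::new(winner).expect("only non-zero values are stored"),
        }
    }
}
```

Everything here uses only atomics from `core`, which is why this flavor works without `std`, at the cost of supporting only values that fit in an atomic.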

@m-ou-se
Member

m-ou-se commented Nov 11, 2020

It's a bit of a shame that Lazy uses a fn() -> T by default. With that type, it needlessly stores a function pointer even if it is constant. Would it require big language changes to make it work without storing a function pointer (so, a closure as ZST), while still being as easy to use? Maybe if captureless closures would implement some kind of const Default? And some way to not have to name the full type in statics. That's probably not going to happen very soon, but it'd be a shame if this becomes possible and we can't improve Lazy because the fn() -> T version was already stabilized. Is there another way to do this?
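The size difference behind this concern can be checked directly. The sketch below compares a captureless closure with the same closure coerced to a function pointer; `initializer_sizes` is a hypothetical helper, and the result says nothing about how Lazy itself is laid out.

```rust
use std::mem::size_of_val;

/// Returns (size of a captureless closure, size of the equivalent fn pointer).
pub fn initializer_sizes() -> (usize, usize) {
    // A closure that captures nothing is a zero-sized type.
    let as_closure = || 92u32;
    // Coercing it to `fn() -> u32` yields a pointer that has to be stored.
    let as_fn_pointer: fn() -> u32 = || 92u32;
    (size_of_val(&as_closure), size_of_val(&as_fn_pointer))
}
```

On common targets the first component is 0 and the second is the size of a pointer, which is the overhead the default `F = fn() -> T` carries.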

@phil-opp
Contributor

@matklad

They are restricted (can be provided only for atomic types), but are compatible with no_std.

This seems like a very major restriction, which rules out most use cases of SyncLazy/SyncOnceCell. So I don't think that this really resolves the sync no_std use case.

I agree that spinlocks have their problems, but they're still better than using static mut instead. I understand that we don't want to hardcode SyncLazy/SyncOnceCell to use a deadlock-prone spinlock on no_std, but maybe it's possible to let the user supply their own implementation of a Mutex/Once primitive?

This could be implemented using a second generic argument on the Sync* types (or maybe even on the Mutex/Once types). This way, users could specify how the synchronization should happen based on their application. A single-threaded embedded application could just disable interrupts for the critical section, a toy OS kernel could use a spinlock, and projects with their own threading system could supply a "proper" synchronization primitive. Maybe I'm missing something, but this seems like a good solution to me.

@m-ou-se
Member

m-ou-se commented Nov 11, 2020

Some thoughts about &mut self functions on (Sync)OnceCell:

These types have both &mut self and &self functions, but the &mut interface seems somewhat incomplete, and it's a bit tricky to pick names for overlapping functionality. For example, take can only be done with unique access, so fn take(&mut self) -> Option<T> makes sense. But set can be done on an empty cell through a shared reference, or on a cell in any state through a unique reference. So both fn set(&self, value: T) -> Result<&T, T>; (like Cell::set) and fn set(&mut self, value: T) -> &mut T; (like Option::insert) would make sense.

Maybe if the get_or_insert/get_or_insert_with pair already provides a 'one time set' functionality, set (or insert?) should be the &mut self version instead?
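With the API as listed in the issue description, the "set through a unique reference" behavior can be approximated by combining `take` and `set`. The sketch below assumes `std::cell::OnceCell`; `replace_value` is a hypothetical helper, not a proposed method.

```rust
use std::cell::OnceCell;

/// Hypothetical helper: overwrites the cell through `&mut`, returning the old value.
pub fn replace_value(cell: &mut OnceCell<String>, value: String) -> Option<String> {
    // `take` needs unique access and leaves the cell empty.
    let old = cell.take();
    // The cell was just emptied, so this `set` cannot be rejected.
    cell.set(value).expect("cell was just emptied");
    old
}
```

Through a shared reference, by contrast, `set` succeeds only on an empty cell and returns the rejected value otherwise.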

@matklad
Member

matklad commented Nov 12, 2020

Unresolved question: method naming

Currently, we have get_or_init and get_or_try_init. Are those good names? Here are some alternatives (see also #78943)

  1. get_or_init, get_or_try_init
  2. get_or_insert_with, try_get_or_insert_with
  3. get_with, try_get_with

1. Pro: Status Quo, name specific to OnceCell (you see x.get_or_init, you know x is a once cell). Con: doesn't feel like it perfectly fits with other std names.
2. Pro: matches Option::get_or_insert_with exactly. Con: for OnceCell, unlike Option, this is the core API. It's a shame that it's a mouthful.
3. Pro: short, matches std conventions. Con: _with without or suggests that the closure will always be called, but that's not really the case.

I've thought more about this, and I think I actually like 3 most. Its Con seems like a Pro to me. In the typical use-case, you only use _with methods:

impl Spam {
  fn get_eggs(&self) -> &Eggs {
    self.eggs.get_with(|| Eggs::cook())
  }
}

So, the closure is sort-of always called, it's just cached. Not sure if my explanation makes sense, but I do feel that this is different from, e.g., Entry::or_insert_with.
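The "always called, just cached" reading can be made observable with a call counter. This sketch uses the status quo name `get_or_init` (a `get_with` method is only a naming proposal here) and assumes `std::cell::OnceCell`; the `cook_calls` counter is a hypothetical addition for illustration.

```rust
use std::cell::{Cell, OnceCell};

#[derive(Debug, PartialEq)]
pub struct Eggs {
    pub count: u32,
}

pub struct Spam {
    eggs: OnceCell<Eggs>,
    // Hypothetical counter, present only to show how often the closure runs.
    cook_calls: Cell<u32>,
}

impl Spam {
    pub fn new() -> Self {
        Spam { eggs: OnceCell::new(), cook_calls: Cell::new(0) }
    }

    /// Every caller goes through the same closure; only the first call executes it.
    pub fn get_eggs(&self) -> &Eggs {
        self.eggs.get_or_init(|| {
            self.cook_calls.set(self.cook_calls.get() + 1);
            Eggs { count: 2 }
        })
    }

    pub fn cook_calls(&self) -> u32 {
        self.cook_calls.get()
    }
}
```

When this accessor is the only place the cell is initialized, callers cannot tell whether the closure ran for them or for an earlier call.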

@matklad
Member

matklad commented Nov 12, 2020

@phil-opp: I think it is rather certain that, even if std provides a subset of OnceCell for no_std, it will be non-blocking subset (set and get).

It certainly is possible to use spinlocks, or make sync::OnceCell parametric (compile-time or run-time) over blocking primitives. I am pretty sure that should be left for crates.io crate though.

I feel one important criterion for inclusion in std is "design space has a solution with a single canonical API". The OnceCell API seems canonical. If we add parameters, the design space inflates. Even if some solution would be better, it won't be obviously canonical, and would be better left to crates.io.

@matklad
Member

matklad commented Nov 12, 2020

It's a bit of a shame that Lazy uses a fn() -> T by default.

@m-ou-se yeah, totally agree that this is a hack and feels like a hack. It works well enough in practice, but there's one gotcha: specifying type for a local lazy does not work:

let x = 92;
let works1 = Lazy::new(|| x.to_string());
let broken: Lazy<String> = Lazy::new(|| x.to_string());
let works2: Lazy<String, _> = Lazy::new(|| x.to_string());

The broken variant is something that people occasionally write, and it fails with a somewhat confusing error. If we remove the default type, it will still be broken, but folks won't have intuition that "one parameter should be enough".
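A compilable version of the gotcha, as a sketch: it assumes a toolchain where the single-threaded lazy type is available as `std::cell::LazyCell` (the name used later in this thread), and `describe` is a hypothetical function. The rejected form is kept as a comment so the block still builds.

```rust
use std::cell::LazyCell;

/// Shows which annotations work for a local lazy value that captures `x`.
pub fn describe(x: i32) -> String {
    // Leaving the whole type to inference works.
    let inferred = LazyCell::new(|| x.to_string());
    // Spelling out `T` also works, as long as `F` is left to inference.
    let annotated: LazyCell<String, _> = LazyCell::new(|| format!("x = {x}"));
    // `let broken: LazyCell<String> = LazyCell::new(|| x.to_string());` is rejected:
    // the default `F = fn() -> String` cannot hold a closure that captures `x`.
    format!("{} / {}", *inferred, *annotated)
}
```

The one-parameter form only fits initializers that coerce to a plain function pointer, which is why it works for statics but not for capturing closures.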

One easy way out here is to stabilize only OnceCell, and punt on Lazy for the time being. OnceCell contains all the tricky bit, and Lazy is just some syntactic sugar. For me (and probably for some, but not all, other folks) writing

fn global_state() -> &'static GlobalState {
  static INSTANCE: SyncOnceCell<GlobalState> = SyncOnceCell::new();
  INSTANCE.get_or_init(GlobalState::default)
}

doesn't feel like a deal breaker. I'd prefer that to pulling a 3rd party dep (lazy_static or once_cell).

That said, I think Lazy's hack is worth stabilizing. Even if in the future we'll be able to write:

static GLOBAL_STATE: Lazy<GlobalState, _> = Lazy::new(GlobalState::default);

I don't see a lot of practical problems with

static GLOBAL_STATE: Lazy<GlobalState> = Lazy::new(GlobalState::default);

working as well.
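For comparison, both spellings can be written out in full. This is a sketch under the assumption that the thread-safe types are available as `std::sync::OnceLock` and `std::sync::LazyLock` (the thread itself still uses the older `SyncOnceCell` and `SyncLazy` names); `GlobalState` and its `counter` field are placeholders.

```rust
use std::sync::{LazyLock, OnceLock};

#[derive(Debug, Default, PartialEq)]
pub struct GlobalState {
    pub counter: u32,
}

/// Accessor-function style: the cell is private to the function.
pub fn global_state() -> &'static GlobalState {
    static INSTANCE: OnceLock<GlobalState> = OnceLock::new();
    INSTANCE.get_or_init(GlobalState::default)
}

/// Lazy-static style: relies on the default `F = fn() -> T` parameter.
pub static GLOBAL_STATE: LazyLock<GlobalState> = LazyLock::new(GlobalState::default);
```

The first form needs only the cell type, while the second depends on the function-pointer default discussed above to keep the static's type short.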

@nwn

nwn commented Nov 12, 2020

Unresolved question: method naming

1. `get_or_init`, `get_or_try_init`

2. `get_or_insert_with`, `try_get_or_insert_with`

3. `get_with`, `try_get_with`

I think 1 is the most appropriate. The init terminology makes more sense than insert in the context of a once cell. Depending on whether we expose a direct value initializer, it may be more consistent to add _with to these methods, though.

I've thought more about this, and I think I actually like 3 most. Its Con seems like a Pro to me. In the typical use-case, you only use _with methods:

[...]

So, the closure is sort-of always called, it's just cached. Not sure if my explanation makes sense, but I do feel that this is different from, e.g., Entry::or_insert_with.

This doesn't seem very intuitive to me and isn't always true when there are multiple points of initialization. For example, consider:

impl Spam {
    fn get_eggs(&self, cooked: bool) -> &Eggs {
        if cooked {
            self.eggs.set(Eggs::cook());
        }
        self.eggs.get_with(|| Eggs::raw())
    }
}

In this case, the closure may not run and in fact a different value has been cached. I think get_or_init_with would make this case more clear.
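The same scenario as a self-contained sketch, using the current `get_or_init` name and assuming `std::cell::OnceCell`. The `Eggs` enum is a hypothetical stand-in for the `Eggs::cook()` and `Eggs::raw()` constructors in the snippet above.

```rust
use std::cell::OnceCell;

#[derive(Debug, PartialEq)]
pub enum Eggs {
    Cooked,
    Raw,
}

#[derive(Default)]
pub struct Spam {
    eggs: OnceCell<Eggs>,
}

impl Spam {
    pub fn get_eggs(&self, cooked: bool) -> &Eggs {
        if cooked {
            // Result ignored on purpose: if the cell is already full, the earlier value stays.
            let _ = self.eggs.set(Eggs::Cooked);
        }
        // This closure runs only if nothing has been stored yet.
        self.eggs.get_or_init(|| Eggs::Raw)
    }
}
```

Whichever initialization point runs first decides the cached value, so the closure passed at the call site may never execute.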

@raphaelcohn

Something I've recently got bitten by is the need to manage which memory allocator a given piece of memory uses. I've been working with a design that has a different global memory allocator when running threads or coroutines (so restricting a coroutine to a maximum amount of memory). This could be thought of as a bit of a hack; one of the long-term design decisions of early Rust that still bites is not making the memory allocator type explicit in the standard collections.

With a lazy, the challenge becomes ensuring that they're all allocated using the same memory allocator.

@briansmith

No, repr(C) guarantee order, but compatibility only with most used layout of C compilers of the target.

The C standards require the address of the first field to be the address of the structure, which is why I suggested #[repr(C)], putting the value at the start of the struct, and avoiding using any non-transparent wrappers like Option around the field.

Anyway, I don't have a strong opinion about whether to do extra work to support the ability of non-Rust code to be able to access the value.

@tgross35
Contributor

tgross35 commented Dec 2, 2022

I don't think there's much benefit to providing any sort of guarantee on internal layout - any alternative to Option means mimicking its behavior in a separate place, and losing optimizations geared at Option (e.g. niches).

For any Rust + C project that already has a good reason to use a Rust OnceCell, I really think the correct solution is to use get(), get_mut(), get_or_init(), etc. and wrap them in something extern "C", or pass their result to C as applicable. Otherwise, you're just rewriting those exact functions in C.
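A minimal sketch of that wrapping approach, assuming the thread-safe cell is available as `std::sync::OnceLock`. `Settings`, `settings_get_or_init`, and `settings_try_get` are hypothetical names, and the attribute that exports unmangled symbols is left out because its spelling depends on the edition.

```rust
use std::ptr;
use std::sync::OnceLock;

/// Hypothetical payload; `repr(C)` makes its own layout usable from C.
#[repr(C)]
pub struct Settings {
    pub verbosity: u32,
}

static SETTINGS: OnceLock<Settings> = OnceLock::new();

/// Initializes on first use and returns a pointer that is never null.
/// (A real export would also need a no-mangle attribute.)
pub extern "C" fn settings_get_or_init() -> *const Settings {
    SETTINGS.get_or_init(|| Settings { verbosity: 1 }) as *const Settings
}

/// Returns null if nothing has been initialized yet; never runs an initializer.
pub extern "C" fn settings_try_get() -> *const Settings {
    SETTINGS.get().map_or(ptr::null(), |s| s as *const Settings)
}
```

C code only ever sees pointers to the payload, so the internal layout of the cell never becomes part of the interface.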

rib pushed a commit to rust-mobile/ndk-glue that referenced this issue Dec 6, 2022
Piggybacking on the [motivation in winit]: `lazy_static!` is a macro
whereas `once_cell` achieves the same using generics.  Its
implementation has also been [proposed for inclusion in `std`], making
it easier for us to switch to a standardized version if/when that
happens.  The author of that winit PR is making this change to many more
crates, slowly turning the scales in favour of `once_cell` in our
dependency tree too.

Furthermore `lazy_static` hasn't published any updates for 3 years, and
the new syntax is closer for dropping this wrapping completely when the
necessary constructors become `const` (i.e. switching to `parking_lot`
will give us a [`const fn new()` on `RwLock`]) or this feature lands in stable
`std`.

[motivation in winit]: rust-windowing/winit#2313
[proposed for inclusion in `std`]: rust-lang/rust#74465
[`const fn new()` on `RwLock`]: https://docs.rs/lock_api/latest/lock_api/struct.RwLock.html#method.new
@tgross35
Contributor

I've opened partial stabilization PR #105587 for OnceCell and OnceLock. I believe an FCP for those would be next.

@inquisitivecrystal inquisitivecrystal removed the needs-rfc This change is large or controversial enough that it should have an (e-)RFC accepted before doing it label Dec 12, 2022
@SUPERCILEX
Contributor

Can we add fn into_inner(self) -> Option<T> to LazyCell? That'd be helpful when doing things only if the LazyCell fired.

@eggyal
Contributor

eggyal commented Dec 26, 2022

Can we add fn into_inner(self) -> Option<T> to LazyCell? That'd be helpful when doing things only if the LazyCell fired.

Wouldn't fn is_initialized(this: &Self) -> bool or perhaps fn get(this: &Self) -> Option<&T> be more useful, as they don't require taking ownership? (As with other smart pointers, these are associated functions to avoid conflicts with methods of the inner type T).

@SUPERCILEX
Contributor

Maybe for others, but for my use case I specifically need to take ownership.

@SUPERCILEX
Contributor

Went ahead and opened a PR: #106152

@SimonSapin
Contributor

SimonSapin commented Dec 26, 2022

If you have ownership is it useful to have a LazyCell at all instead of Option? Or is there some scenario where you first need to initialize through a shared reference, and later recover full ownership?
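One scenario of that kind, sketched with `OnceCell` rather than `LazyCell` because `into_inner` appears in the API listing for the cell: the value is initialized through `&self` and recovered later by value. It assumes `std::cell::OnceCell`; `Report` and its methods are hypothetical.

```rust
use std::cell::OnceCell;

pub struct Report {
    summary: OnceCell<String>,
}

impl Report {
    pub fn new() -> Self {
        Report { summary: OnceCell::new() }
    }

    /// Shared access: computes the summary on first use and caches it.
    pub fn summary(&self) -> &str {
        self.summary.get_or_init(|| String::from("expensive summary"))
    }

    /// Owned access: yields the summary only if it was ever computed.
    pub fn into_summary(self) -> Option<String> {
        self.summary.into_inner()
    }
}
```

The `Option` returned by `into_summary` doubles as the "did it fire" signal asked about earlier in the thread.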

@SUPERCILEX
Contributor

Well yeah but then I have to manage lazy initialization myself.

@SimonSapin
Contributor

@SUPERCILEX
Contributor

TIL! That's ever so slightly more annoying b/c you have to carry around the option and closure separately (or wrap them in your own type), but I'd be ok with having my PR closed if we think you should use this instead.

bors added a commit to rust-lang-ci/rust that referenced this issue Dec 30, 2022
Add #[inline] markers to once_cell methods

Added inline markers to all simple methods under the `once_cell` feature. Relates to rust-lang#74465 and  rust-lang#105587

This should not block rust-lang#105587
Aaron1011 pushed a commit to Aaron1011/rust that referenced this issue Jan 6, 2023
Add #[inline] markers to once_cell methods

Added inline markers to all simple methods under the `once_cell` feature. Relates to rust-lang#74465 and  rust-lang#105587

This should not block rust-lang#105587
@elichai
Contributor

elichai commented Jan 19, 2023

I've added non-blocking flavors of the primitives to the once_cell crate: https://docs.rs/once_cell/1.5.1/once_cell/race/index.html. They are restricted (can be provided only for atomic types), but are compatible with no_std.

FWIW, I use once_cell a lot for initializing a cryptographic context, and usually, the racy option is the one that I actually want (parking a thread can be more expensive than just initializing the context)

This kind of use case might disappear once const fn becomes powerful enough that I'll be able to initialize these at compile time

@ydewit

ydewit commented Feb 3, 2023

I have a question about using get_or_init in the OnceLock struct.

My use case involves two Tasks producing values (lhs and rhs), and I need to reduce the redex once both values are available. The order of these values becoming available is unknown.

Could OnceLock be used here? I was thinking of using one OnceLock shared between two Tasks, and once a Task produces its value, it calls get_or_init. If the OnceLock is empty, it will be set, otherwise, I would get the existing value. However, I am not sure how to determine which value was returned in order to process the redex.

As I understand it, after the get_or_init call, the (Boxed) value will be moved, and I can't compare pointers.

My question is: could get_or_init take an Option with the current value as a parameter, or is there a way to map over OnceLock to either use or set its value?

@ydewit

ydewit commented Feb 3, 2023

I just realized that in my specific case, one of the values has a positive polarity and the other one negative. So I can use OnceLock as is by checking the polarity of the cell in the OnceLock. In any case, I think the question above still holds.
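One possible way to tell the two outcomes apart with the existing API is to call `set` instead of `get_or_init`, since `set` hands the value back when the cell is already full. This sketch assumes `std::sync::OnceLock`; `offer` is a hypothetical helper name.

```rust
use std::sync::OnceLock;

/// Hypothetical helper: `None` means our value was stored (we arrived first);
/// `Some((theirs, ours))` means the slot was already filled by the other side.
pub fn offer<T>(slot: &OnceLock<T>, value: T) -> Option<(&T, T)> {
    match slot.set(value) {
        Ok(()) => None,
        // `set` returned our value, so the slot holds the other side's value.
        Err(ours) => Some((slot.get().expect("slot is initialized after a failed set"), ours)),
    }
}
```

The side that gets `Some` holds both values and can perform the reduction, without any pointer comparison.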

mqudsi added a commit to fish-shell/fish-shell that referenced this issue Feb 5, 2023
lazy_static has better ergonomics at the call/access sites (it returns a
reference to the type directly, whereas with once_cell we get a static Lazy<T>
that we must dereference instead) but the once_cell api is slated for
integration into the standard library [0] and has been the "preferred" way to
declare static global variables w/ deferred initialization. It's also less
opaque and easier to comprehend how it works, I guess?

(Both `once_cell` and `lazy_static` are already in our dependency tree, so this
should have no detrimental effect on build times. It actually negligibly
*improves* build times by not using macros, reducing the amount of expansion the
compiler has to do by a minuscule amount.)

[0]: rust-lang/rust#74465
brandonweeks added a commit to google/native-pkcs11 that referenced this issue Feb 6, 2023
More similar to the upcoming std implementation:

rust-lang/rust#74465