Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Box::new flavors keep multiplying #90

Open
the8472 opened this issue Aug 5, 2021 · 21 comments
Open

Box::new flavors keep multiplying #90

the8472 opened this issue Aug 5, 2021 · 21 comments

Comments

@the8472
Copy link
Member

the8472 commented Aug 5, 2021

While working on rust-lang/rust#87777 I noticed that variants of Box::new keep growing. Currently they follow a pattern of {try?}_new_{uninit, zeroed, _}_{slice?}_{in?}.

I think moving this to a builder would make sense. I think most dimensions could be modeled as an enum, generic or associated type, the rest would still need separate build methods.

I imagine it ideally working something like Box::<[u8]>::new().zeroed().in(Alloc).slice(100).assume_init(). Of course most of the time one wouldn't want all of that noise. To keep the fallible stuff separate there would be a separate entry point into the builder via Box::try_new()....

I recall that there was discussion about in-place initialization with initializer functions (guaranteed RVO?), so at least the zeroed/uninit/value dimension might grow another member.

@thomcc
Copy link
Member

thomcc commented Aug 5, 2021

I'd want some guarantee that in practice this ends up with good codegen (at least on all of -O[3sz]).

Allocation can be very hot, and my experience with using builders has been that they're best for non-performance intensive cases, and can generate bad code due to deficiencies in LLVM.

@the8472
Copy link
Member Author

the8472 commented Aug 5, 2021

I think it could, for example RawVec already determines some of its behavior via an enum and it still optimizes fine on the higher opt levels. Also "guarantee" and "in practice" is a bit contradictory ;). But we could add codegen tests.

https://github.com/rust-lang/rust/blob/61a941b8badbce727085c505068d72fa3e737f5b/library/alloc/src/raw_vec.rs#L22-L27

@thomcc
Copy link
Member

thomcc commented Aug 7, 2021

for example RawVec already determines some of its behavior via an enum and it still optimizes fine on the higher opt levels

My experience is that RawVec is the source of some surprising behavior around optimizations (in particular it's often not optimized well IME), but I don't the problems I've seen are related to that enum at all.

Also "guarantee" and "in practice" is a bit contradictory

Ah sure, I meant assurance rather than guarantee, maybe. Codegen tests would certainly alleviate my worries, especially if the builder will likely grow more and more options over time.

That said, if it's not something easy to set up for that part of the stdlib, it might be overkill.

@Kixunil
Copy link

Kixunil commented Feb 3, 2022

I don't see how builder would help here, can you give an example?

@the8472
Copy link
Member Author

the8472 commented Feb 3, 2022

Take

new(value)
new_in(value, allocator)
new_uninit()
new_uninit_in(allocator)
new_zeroed()
new_zeroed_in(allocator)

That's 6 methods for the 2 dimensions

  • initialization stragegy: {zeroed, uninit, by-value}
  • allocator choice: {global, custom}

With a builder it's 1 method to set an allocator (global being the default) and 2-3 methods to choose the initialization strategy.

And then there are additional dimensions such as

  • fallibility
  • size (for [T])

With a builder the dimensions add, without they multiply. E.g. today we don't have try_new_uninit_slice_in. Instead of adding yet another method to Box it would automatically fall out of the builder because it makes those aspects orthogonal.

@the8472
Copy link
Member Author

the8472 commented Feb 3, 2022

Also, the solution doesn't have to be a builder. Parameterized functions could work too. My concern is the proliferation of methods, I'm not married to a particular solution.

@Kixunil
Copy link

Kixunil commented Feb 4, 2022

The problem I see is they all affect the resulting type.

  • zeroed and uninit produce MaybeUninit<T>
  • by value produces T
  • allocator is in the type

This would have to be represented in type state which will create a bunch of new types. I'm not convinced it's better. That's also why I asked for specific example because maybe I'm missing some solution and seeing it in an example would be the best way to show me and even prove it's possible.

@the8472
Copy link
Member Author

the8472 commented Feb 4, 2022

Oh so you were asking for an example implementation? I thought you were asking for an example of method properties that could be orthogonalized.

This would have to be represented in type state

Box already is <T, A>, I think a builder wouldn't be much more complicated than that, except maybe needing associated types instead of generics. But I haven't worked out the details because, as I said before, that I don't advocate for a specific solution, I mainly wanted to highlight the issue.

@sollyucko
Copy link

What about something like this? The keys to reducing the number of types are requiring a specific ordering and taking advantage of generics. Also, it shouldn't be too difficult to add new options, especially at the end.

Pseudo-Rust:

fn Box<T: ? Sized>::builder() -> BoxBuilder1<T>;
fn Box<T: ? Sized, A: AllocRef>::builder_in(allocator: A) -> BoxBuilder1<T, A>;

fn BoxBuilder1<T, A: AllocRef>::init(value: T) -> BoxBuilder2<T, A>;
fn BoxBuilder1<T, A: AllocRef>::uninit() -> BoxBuilder2<MaybeUninit<T>, A>;
fn BoxBuilder1<T, A: AllocRef>::zeroed() -> BoxBuilder2<MaybeUninit<T>, A>;
fn BoxBuilder1<[T], A: AllocRef>::init(value: impl Unsize<[T]>) -> BoxBuilder2<[T], A>;
fn BoxBuilder1<[T], A: AllocRef>::init_copied(value: &[T]) -> BoxBuilder2<[T], A>;
fn BoxBuilder1<[T], A: AllocRef>::uninit(len: usize) -> BoxBuilder2<[MaybeUninit<T>], A>;
fn BoxBuilder1<[T], A: AllocRef>::zeroed(len: usize) -> BoxBuilder2<[MaybeUninit<T>], A>;

fn BoxBuilder2<T: ?Sized, A: AllocRef>::try_build() -> Result<Box<T, A>, A::Err>;
fn BoxBuilder2<T: ?Sized, A: AllocRef>::build() -> Box<T, A>;

@Kixunil
Copy link

Kixunil commented Feb 4, 2022

@sollyucko I think builder 1 and 2 should be reversed to enable zero-copy initialization. However it could cause problems around inference or so and slices would be problematic.

@Wodann
Copy link

Wodann commented Feb 9, 2022

Of course most of the time one wouldn't want all of that noise.

Is your goal to reduce noise for the user of the API? I feel like there are generally two target audiences for APIs:

  1. power users - that are trying to get maximum performance.
  2. casual users - that are looking for the most convenient way to get the job done.

To serve the needs of 1. you need to design low-level APIs that directly tie into the underlaying system. For 2. you can create wrapper/helper APIs that add simplifying logic.

Our low-level API should already be the Allocator API, which is reused in Box, RawVec, etc. It seems like we're currently bleeding that API to other classes built on top of it, making the lives of **2. ** harder. It seems like it'd suffice to just use the Allocator API when someone wants to get raw performance and directly use the result to create a box.

Pseudocode:

trait Allocator {
  fn try_allocate_*<T>(&self) -> Result<NonNull<T>, AllocError>;
}

impl<T, A: Allocator> Box<T, A> {
  fn from_raw(ptr: NonNull<T>, allocator: A) -> Box<T, A> {
    Box(ptr, allocator)
  } 
}

fn usage(allocator: &Allocator) {
  let foo = allocator.try_allocate_*::<T>().map(|a| Box::from_raw(a, allocator).expect("Failed to allocate.");
}

If there are concerns about safety (mismatched allocation & allocator), we could (re-)consider using a "result type":

struct Allocation<T, A: Allocator> {
  ptr: NonNull<T>,
  allocator: A,
}

allowing:

impl<T, A: Allocator> From<Allocation<T, A> for Box<T, A> {
  fn from(allocation: Allocation<T, A>) -> Box<T, A> {
    Box(allocation.ptr, allocation.allocator)
  }
}

@berkus
Copy link

berkus commented Feb 17, 2022

Note to self: probably where context and capabilities could be super helpful.

@the8472
Copy link
Member Author

the8472 commented Apr 29, 2022

An additional advantage of a builder is that it could build Box, Rc, Arc and possibly other types without multiplying all of the methods. Essentially the outpout container is yet another dimension.

@tgross35
Copy link

Do you have an idea what the builder would look like here? I've played with it a bit and can't figure out a good way to handle all the conversions among types

If this is looking like a way forward, it would be nice to have something in nightly for a while

@Jules-Bertholet
Copy link

I recall that there was discussion about in-place initialization with initializer functions (guaranteed RVO?)

rust-lang/rfcs#2884

@tgross35
Copy link

tgross35 commented Mar 8, 2024

I might have something workable-ish here

#[derive(Debug, Clone)]
pub struct BoxBuilder<T: ?Sized, D: ?Sized = Init<T>, A = Global, F = Infallible> {
    zeroed: bool,
    alloc: A,
    slice_len: usize,
    falliblity: F,
    ty: PhantomData<T>,
    data: D,
}

fn main() {
    let builder = BoxBuilder::new(1234u32);
    dbg!(&builder);

    let v = builder.clone().build_box();
    assert_eq!(v, Box::new(1234u32));

    let v = builder.clone().build_rc();
    assert_eq!(v, Rc::new(1234u32));

    let v = builder.clone().build_arc();
    assert_eq!(v, Arc::new(1234u32));

    let v = builder.clone().fallible().build_box();
    assert_eq!(v, Ok(Box::new(1234u32)));

    let v = BoxBuilder::new_uninit().zeroed().build_box();
    unsafe { assert_eq!(v.assume_init(), Box::new(0)) };
}

Full long code https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=1e4ee1bb361af4f6b0807ccc4313508b

It currently can't handle unsized types so slices do not work, but that is probably fixable. Generics look pretty ugly, maybe some could be combined.

@the8472
Copy link
Member Author

the8472 commented Jul 6, 2024

rust-lang/rust#127415 will increase the counter by 2.

@AljoschaMeyer
Copy link

You are welcome 🙃

In more seriousness, a principled approach like the one suggested in this issue would have ensured that the oversight of those particular combinations had never occurred in the first place. Which is definitely a plus for what is being proposed here.

@8573
Copy link

8573 commented Jul 8, 2024

I'm surprised by the builder's requiring that a payload be provided first and being configured only after that, which is the opposite of what I would expect from the builder pattern.

That up-front payload also seems confusingly pointless in the second-to-last example, where it's replaced with zero.

@Skgland
Copy link

Skgland commented Jul 8, 2024

I'm surprised by the builder's requiring that a payload be provided first and being configured only after that, which is the opposite of what I would expect from the builder pattern.

That up-front payload also seems confusingly pointless in the second-to-last example, where it's replaced with zero.

So it should be more like this which also supports unsized payload.

Edit: v2 now init/uninit/zeroed , sized/unsized is back in the builder

v3 now Box/Rc/Arc is also selected via the builder, only a single build function rather than build_box, build_rc, build_arc

v4 like v3 but with a custom ThinBox MyThinBox SmartPointer type that supports ?Sized

@tgross35
Copy link

tgross35 commented Jul 8, 2024

I'm surprised by the builder's requiring that a payload be provided first and being configured only after that, which is the opposite of what I would expect from the builder pattern.

That up-front payload also seems confusingly pointless in the second-to-last example, where it's replaced with zero.

Referring to my example? There is a new_uninit() that doesn't take a value, I don't know why I didn't just put that in the example in the comments (updated it).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests