
RFC: Placement by return #2884

Open

Conversation

@PoignardAzur

PoignardAzur commented Mar 17, 2020

Rendered.


Glossary:

@burdges

burdges commented Mar 17, 2020

I've questions about https://github.com/rust-lang/rfcs/blob/c34989cc81c0dbd96ca65a50fe987ad09f5a6251/text/0000-placement-by-return.md#guide-level-explanation

Is there any good reason why good() works but bad() does not work? Isn't this really just a complex bug in rustc? If the return is Sized then could we guarantee no copies so long as the return only gets bound once and gets used directly by the caller?

We cannot improve the situation for bad_dst() though, because only bad_dst() can allocate the space for passing between returns_dst() and takes_dst(). It appears this problem exists for Sized types too, so they too require the new_with, etc. proposed by this RFC.
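(For readers who don't have the RFC open: reconstructing from the replies below, the good()/bad() distinction being discussed is roughly "value built directly in return position" versus "value bound to a local and modified before being returned". The exact examples live in the linked guide-level explanation; this sketch is my own approximation.)

fn good() -> [u8; 1_000_000] {
    [0; 1_000_000] // literal in tail position: covered by guaranteed copy elision
}

fn bad() -> [u8; 1_000_000] {
    let mut arr = [0; 1_000_000];
    arr[0] = 42;
    arr // named local returned later: this would need NRVO, which the RFC does not guarantee
}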

@PoignardAzur


PoignardAzur commented Mar 17, 2020

Is there any good reason why good() works but bad() does not work? Isn't this really just a complex bug in rustc? If the return is Sized then could we guarantee no copies so long as the return only gets bound once and gets used directly by the caller?

There's no fundamental reason, no.

It just requires more complex analysis (GCE versus GCE + NRVO), and rules that have to be nailed down as to when NRVO does or doesn't apply. This RFC tries to be minimalistic.

@CAD97

CAD97 commented Mar 19, 2020

After reading the RFC, I get the understanding that guaranteed copy elision doesn't actually guarantee copy elision for small (roughly register-sized) types? This seems problematic: the guarantee is no longer a guarantee if it doesn't apply in some cases (even though, in those cases, the difference is basically not measurable).

@rpjohnst

rpjohnst commented Mar 19, 2020

The downsides of the future possibility "Split the discriminant from the payload" can be mitigated by applying that split only at function boundaries.

Rust already has two separate ideas of how values are represented- one for passing/returning them, and another for storing in memory when they have their address taken- that's the whole idea behind #[repr(transparent)]. This works because you can't take the address of a value "in motion" like that to begin with.

So if you want to return DST enum variants, you can return the discriminant in a register, and the actual unwrapped variant values the usual way (i.e. via a return pointer for large ones, or yet another register for small ones). Now you can "emplace" any or all of those variants using the same techniques as structs and arrays.

@KrishnaSannasi

KrishnaSannasi commented Mar 19, 2020

@CAD97 also because placement new works by basically passing a hidden out pointer, I don't think that RVO is actually an optimization for types smaller than a word.

On the main proposal, it would be best if we get the infallible case for RVO from a function, because we can build some useful APIs around that, like Box::new_with, and maybe even Vec::push_with, etc. But I don't think we need to support DSTs in the initial proposal. It looks complex, and that could hinder the simpler core proposal from going anywhere.

That said, we could also make some more explicit APIs for Box and Vec to allow you to do the allocation before creating the value, using typechecked out pointers. This also allows you to handle !Sized types, but with an unsafe API:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=602847046fde9cc25ec28d7b33b5fd5a

These explicit apis are harder to use, but they also completely side-step the issue of how to do fallible operations, because the "RVO"ed value isn't returned in the Result. It also makes it super easy to see what's going on, whereas implicit RVO can silently break if you aren't careful (especially because we won't have NRVO even with this proposal).

With the code in the playground, you could write something like this:

let big_value = Box::try_write_new(|uninit| {
    let value = some_fallible_op()?;
    let mut init = uninit.write(BigValue::new(value));
    init.modify(); // we can now safely use `BigValue` through `init`
    Ok(init)
})?;

Which is pretty nice. This will work if we have 3 guarantees,

  • RVO is guaranteed if the value is created on the return from a function/closure
  • Uninit::write is guaranteed not to copy an RVO value, but to write it directly into the backing pointer
  • Uninit::write is guaranteed not to copy a value that is constructed at the call to write, but to write it directly into the backing pointer (i.e. uninit.write([0_u8; 1024]) won't copy a kilobyte of data)

I think that these guarantees are simple enough to make (especially since they seem to already work for all the simple cases I tried), so we could easily move forward with a proposal. With this, we can support fallible construction, and the simple case of Box::new_with(BigValue::new).

We can support more elaborate schemes in the future, after we have some sort of basic RVO guarantees.
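(For readers who skipped the playground link: a minimal sketch of the kind of typechecked out-pointer API being described. The names Uninit, write, modify and try_write_new come from the snippet above; everything else is my own reconstruction and may well differ from the playground, and try_write_new is written here as a free function rather than a Box associated function.)

use std::mem::MaybeUninit;

// Write-once handle to the allocation, plus a proof-of-initialization token:
// the fallible closure can only return success if it actually wrote the value.
pub struct Uninit<T>(Box<MaybeUninit<T>>);
pub struct Init<T>(Box<MaybeUninit<T>>); // invariant: the value has been written

impl<T> Uninit<T> {
    pub fn write(mut self, value: T) -> Init<T> {
        self.0.write(value);
        Init(self.0)
    }
}

impl<T> Init<T> {
    pub fn modify(&mut self) {
        // the value is initialized here, so handing out `&mut T` is sound
        let _value: &mut T = unsafe { self.0.assume_init_mut() };
    }

    fn into_box(self) -> Box<T> {
        unsafe { self.0.assume_init() }
    }
}

pub fn try_write_new<T, E>(
    f: impl FnOnce(Uninit<T>) -> Result<Init<T>, E>,
) -> Result<Box<T>, E> {
    // allocate first, then let the closure initialize the allocation in place
    Ok(f(Uninit(Box::new_uninit()))?.into_box())
}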

@scottmcm


scottmcm commented Mar 20, 2020

Awesome to see another stab at this!

Passing a closure is essentially the same thing as passing an initializer list, so it should have the same performance as C++ emplacement.

This is a really clever point 👍

The _with suffix is based on functions like get_or_insert_with. They're basically filling the same role as C++ emplace_ methods.

One thing I've seen from that in C++ is that the usual advice became to essentially just always use emplace. Do we expect the same would happen here? If so, the extra length of _with || would seem unfortunate. (Not that I have a good idea to avoid churn here.)

Hmm, is there some way we could "overload" Box::new and such to take either T or impl FnOnce()->T? I'm not sure if we could get that through coherence (since the output of the FnOnce is an output type), but I also think it's not something that can be violated in stable today. (Roughly I'm thinking that people writing constructors could take impl BikeshedMe<T> to opt-in to supporting this, instead of needing new methods everywhere.)
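(A sketch of the shape being described, keeping the placeholder name from the comment; none of this is proposed by the RFC. The open coherence question is whether a second blanket impl<T> BikeshedMe<T> for T could be added next to the closure impl below without the two being considered overlapping.)

trait BikeshedMe<T> {
    fn construct(self) -> T;
}

// Closure impl: this is what would give constructors their laziness.
impl<T, F: FnOnce() -> T> BikeshedMe<T> for F {
    fn construct(self) -> T {
        self()
    }
}

fn new_boxed<T>(maker: impl BikeshedMe<T>) -> Box<T> {
    // A real emplacing version would allocate first; `Box::new` just keeps the sketch compilable.
    Box::new(maker.construct())
}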

The design for this proposal is meant to allow already-idiomatic "constructors" [..] to be allocated directly into a container.

When I first read this design choice it felt clearly-correct, but I ended up wondering more about it over the course of reading the RFC. I wonder if a different kind of constructor could be less overall churn (somehow) than new versions of anything that puts things in a Box or vector or whatever. Maybe there'd be a way that only the types large enough to really care about this would have to make the new kind of constructor, but ordinary things wouldn't.

Although as I type that, I guess any generic wrapper-constructor (like Some(x)) would also need to support this new kind of constructor too, so maybe that wouldn't be better anyway...

@Ixrec


Ixrec commented Mar 20, 2020

One thing I've seen from that in C++ is that the usual advice became to essentially just always use emplace. Do we expect the same would happen here? If so, the extra length of _with || would seem unfortunate. (Not that I have a good idea to avoid churn here.)

Interesting. What I heard was that emplacement should theoretically always be at least as fast as insertion, and it is often faster, but there are also a bunch of cases where emplacement is actually slower, won't even compile, introduces an exception safety issue or even introduces subtle UB (Effective Modern C++ covers all this in Item 42). So we just write whichever makes the code more readable unless we care enough to benchmark it.

However, most of the reasons for those annoying cases (implicit type conversions, implicit expensive copies, new making raw pointers, imperfect perfect forwarding) simply don't exist in Rust, so it's plausible that "always use emplacement" would end up being true or closer to true for Rust regardless.

@burdges

burdges commented Mar 20, 2020

If you're worried about length then Box::with works fine. I think Box::new should always box the closure, but you could investigate reviving the box keyword for closures.

@kennytm


kennytm commented Mar 20, 2020

Since you mentioned "generator" here, the RFC could be simplified a lot (hiding the state machine) if it can be explained in terms of #2033 generator.

From what I understand, every return dst_object; gets desugared into something like

let layout = Layout::for_value(&dst_object);
let slot = yield layout;
slot.copy_from_nonoverlapping(&raw dst_object as _, layout.size());
return <*mut Dst>::from_raw_parts(slot, ptr::metadata(&raw dst_object));

so e.g.

fn with_multiple_exit_points() -> [i32] {
    while keep_going() {
        if got_cancellation() {
            return [];
        }
    }
    let n = 100;
    [1; n]
}

gets desugared as #2033 generators like

fn with_multiple_exit_points___gen() 
    -> impl Generator<*mut u8, Yield=Layout, Return=*mut [i32]> 
{
    move |_: *mut u8| {
        while keep_going() {
            if got_cancellation() {
                // BEGIN REWRITE -----------------------------------------------
                let slot = yield Layout::array::<i32>(0).unwrap();
                let p = slot as *mut i32;
                return ptr::slice_from_raw_parts_mut(p, 0);
                // END REWRITE -------------------------------------------------
            }
        }
        let n = 100;
        // BEGIN REWRITE -----------------------------------------------
        let slot = yield Layout::array::<i32>(n).unwrap();
        let p = slot as *mut i32;
        for i in 0..n {
            unsafe {
                p.add(i).write(1);
            }
        }
        return ptr::slice_from_raw_parts_mut(p, n);
        // END REWRITE -------------------------------------------------
    }
}

and then read_unsized_return_with could be implemented by wrapping that generator:

pub struct ReadUnsizedReturnWithFinish<G> {
    layout: Layout,
    generator: G,
}

impl<T, G> ReadUnsizedReturnWithFinish<G>
where 
    T: ?Sized, 
    G: Generator<*mut u8, Yield = Layout, Return = *mut T> + Unpin,
{
    fn new(mut generator: G) -> Self {
        match Pin::new(&mut generator).resume(ptr::null_mut()) {
            GeneratorState::Yielded(layout) => Self { layout, generator },
            _ => panic!("generator completed without returning a layout"),
        }
    }
    pub fn finish(mut self, slot: *mut u8) -> *mut T {
        match Pin::new(&mut self.generator).resume(slot) {
            GeneratorState::Complete(ptr) => ptr,
            _ => panic!("generator returned too many layouts"),
        }
    }
    pub fn layout(&self) -> Layout {
        self.layout
    }
}

which can be used directly in the desugaring...

fn function_that_calls_my_function() -> str {
    println!("Hi there!");
    my_function()
}

fn function_that_calls_my_function___gen() 
    -> impl Generator<*mut u8, Yield = Layout, Return = *mut str> 
{
    move |_: *mut u8| {
        println!("Hi there!");
        // BEGIN REWRITE -----------------------------------------------
        let state = ReadUnsizedReturnWithFinish::new(my_function___gen());
        let slot = yield state.layout();
        return state.finish(slot);
        // END REWRITE -------------------------------------------------
    }
}

and write_unsized_return_with can similarly be implemented by being such a generator

fn write_unsized_return_with___gen<T: ?Sized>(layout: Layout, f: impl FnOnce(*mut u8) -> *mut T) 
    -> impl Generator<*mut u8, Yield = Layout, Return = *mut T> 
{
    move |_: *mut u8| f(yield layout)
}
@PoignardAzur


PoignardAzur commented Mar 21, 2020

@CAD97

This seems problematic, as the guarantee is no longer a guarantee, as it doesn't apply in some cases

That's a good question.

On a specification level, GCE would take the form of guarantees that any code observing the address of the variable being returned (eg trait methods) would "see" the address stay the same. This might have implications for eg Pin types.

However, upholding these guarantees for register-sized types would probably be terrible for performance (compared to just passing the return value in the rax register).


@rpjohnst

Rust already has two separate ideas of how values are represented- one for passing/returning them, and another for storing in memory when they have their address taken- that's the whole idea behind #[repr(transparent)]. This works because you can't take the address of a value "in motion" like that to begin with.

Can you elaborate?

In particular, how do you think the following code should be handled, layout-wise?

fn bar() -> Result<SomeType, Error> {
    let my_data : Result<SomeType, Error> = get_data();
    
    foo(&my_data);
    my_data
}

@KrishnaSannasi

With the code in the playground, you could write something like this:

You can already do something similar (and safer) using the current proposal:

let big_value = (|| {
    let value = some_fallible_op()?;
    Ok(Box::new_with(|| BigValue::new(value)))
})();

The hard case is when some_fallible_op and BigValue::new are the same function.


@scottmcm

One thing I've seen from that in C++ is that the usual advice became to essentially just always use emplace. Do we expect the same would happen here? If so, the extra length of _with || would seem unfortunate. (Not that I have a good idea to avoid churn here.)

Yeah, I had the same reaction.

It gets even worse if we nail down semantics for placement return of Result; then you get Box::new_with_result(|| actual_data)? instead of Box::new(actual_data?).
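(Written out, the signature implied by that spelling would be something like the following; the name comes from the sentence above, and the body is a naive stand-in, since the whole point of the RFC is that the real version would construct the Ok payload directly in the allocation.)

fn new_with_result<T, E>(f: impl FnOnce() -> Result<T, E>) -> Result<Box<T>, E> {
    // naive stand-in: the emplacing version would not copy the Ok value
    f().map(Box::new)
}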

One workaround would be to use the macros I proposed (eg box!(actual_data)).

But the fundamental problem is that a permanent solution would require non-transparent semantics; that is, a way to write a function that says "I'm pretending to take an X, but I actually take a closure returning X" (in such a way that they're indistinguishable in calling code); something like lazy arguments in D.

I'm not sure the Rust community is willing to accept that kind of feature.

@rpjohnst

rpjohnst commented Mar 21, 2020

Can you elaborate?

In particular, how do you think the following code should be handled, layout-wise?

fn bar() -> Result<SomeType, Error> {
    let my_data : Result<SomeType, Error> = get_data();
    
    foo(&my_data);
    my_data
}

Result<SomeType, Error> has a fixed memory layout that overlaps SomeType and Error, and encodes which one is live as a tag or an otherwise-invalid bit pattern. This fixed memory layout enables any code with a pointer to Result<SomeType, Error> to use it reliably.

But get_data and bar return a Result<SomeType, Error> by move, so there can be no source-level pointers around to rely on that layout. As long as the value is back in that layout for foo(&my_data), we can use a second fixed representation to communicate between callees and callers.

Assume, for example, that SomeType and Error are large enough that the usual ABI for returning them by value is for the caller to pass in a pointer for the callee to write to. Then, bar works like this:

  • As a hidden parameter, receive two pointers from its caller- one *mut SomeType and one *mut Error.
  • Allocate stack space for Result<SomeType, Error>'s memory layout.
  • Compute two pointers into that stack space, corresponding to the locations of SomeType and Error in a Result.
  • Call get_data with those two pointers. It will fill in exactly one of them, and place a discriminant for which one in a fixed register.
  • Encode that discriminant in the allocated stack space.
  • Call foo with a pointer to that stack slot.
  • Match on my_data and copy either a SomeType or Error through one of the pointers received from the caller. Mark which one with a value in a fixed register.

We can specialize this for other representations of SomeType and/or Error. If it's small enough, we skip its hidden pointer argument, the callee places it in a second fixed register, and the caller writes that register into memory if necessary.

Now if someone writes Box::try_with(get_data), try_with can work like this:

  • As a hidden parameter, receive a *mut Error pointer.
  • Allocate heap space for SomeType.
  • Call get_data with a *mut SomeType to the new allocation and the *mut Error parameter. It will either write a SomeType directly to the heap, or an Error directly through the pointer, and place a discriminant in a register.
  • Match on the discriminant:
    • If Ok, place an Ok discriminant in a register, and the newly-constructed Box<SomeType> in a second register.
    • If Err, free the heap allocation, and place an Err discriminant in a register.
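(A hand-written approximation of the two walkthroughs above, with hypothetical names and the discriminant modeled as an ordinary bool return value; purely illustrative, not part of the RFC.)

use std::mem::MaybeUninit;

struct SomeType { payload: [u8; 4096] }
struct Error { code: u32 }

// `get_data` under the split convention: one out-slot per large variant,
// discriminant reported through the normal return value ("a fixed register").
fn get_data_split(ok: &mut MaybeUninit<SomeType>, err: &mut MaybeUninit<Error>, fail: bool) -> bool {
    if fail {
        err.write(Error { code: 1 });
        false
    } else {
        ok.write(SomeType { payload: [0; 4096] });
        true
    }
}

// `Box::try_with(get_data)` in the same style: the Ok slot *is* the heap allocation.
fn try_with_sketch(fail: bool) -> Result<Box<SomeType>, Error> {
    let mut ok_slot: Box<MaybeUninit<SomeType>> = Box::new_uninit();
    let mut err_slot = MaybeUninit::<Error>::uninit();
    if get_data_split(&mut *ok_slot, &mut err_slot, fail) {
        Ok(unsafe { ok_slot.assume_init() }) // the payload was written straight to the heap
    } else {
        // nothing was written to the heap; the allocation is freed when `ok_slot` drops
        Err(unsafe { err_slot.assume_init() })
    }
}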

This also suggests a more precise framing for @CAD97's question about guarantees. We can restrict the guarantee of "has the same address before and after the return move" to types whose function-return ABI goes through memory. With a specified ABI, this would become a real answer; today it is a less-vague version of "roughly register sized."

@PoignardAzur


PoignardAzur commented Mar 21, 2020

@rpjohnst So if I'm understanding your proposed semantics, you're saying that SomeType would be treated as a contiguous block of memory when being passed to a function, and as a pair of placement pointers when being returned?

That would be one possible design, yeah. The downside is that it clashes with NRVO in non-obvious ways. In any case, it's beyond the scope of this RFC.

@comex

comex commented Mar 22, 2020

I don't agree that it makes sense to punt fallible allocator support.

After all, we surely want to support them eventually. Any design that supports fallible allocators can almost certainly be easily modified to support infallible ones, but the reverse isn't true. If this RFC is accepted, but we later discover that supporting fallible allocators requires a completely different design, we'll end up having to maintain two new sets of APIs in all the collections, on top of the existing non-placement-aware APIs. One set of duplicate APIs will already be a (well-justified) burden for language learners; there's no need to add another!

Therefore, we shouldn't accept the RFC without at least collectively agreeing that fallibility can be done with the same base design. But if we've managed to agree on that, it's not much further of a step to just put it in the RFC.

For what it's worth, I think fallibility can be done with the same base design, in the way @rpjohnst and others have been discussing.

@PoignardAzur


PoignardAzur commented Mar 23, 2020

@kennytm There are pros and cons to using the generator syntax like you mention. On one hand, it's a little more concise; on the other hand, it requires keeping another hypothetical feature in mind; I'm not sure this helps readability.

On the other other hand, I'm thinking if I wrote this RFC from the ground up now, I would use the yield syntax, so maybe I should do that anyway; I'm just not sure the readability improvements (if any) are worth rewriting half the RFC. It's understandable as it is.

I'll fix the typos you mentioned.

@comex To be fair, I think this RFC already does an adequate job showing that fallible emplacement isn't a dead end. It already includes two possible strategies for fallible emplacement, and mentions some pros and cons for each strategy.

I'm going to explore the solution space a little deeper, but I don't think this RFC should commit to a future strategy or spend too much of its complexity budget on detailing a future extension.

@clarfon


clarfon commented Mar 26, 2020

So, I'm not 100% able to follow this discussion, but from a user perspective, it's not super great to have there be a huge performance gap between Box::with(|| [0; 10000]) and Box::new([0; 10000]). From an ergonomics perspective, being able to do something like Box::with(SomeType::new) is cool, but it seems a bit too magical that return values are special and parameters are not.

IMHO, I'd rather be explicit if the magic isn't applied uniformly. In other words, the closure you pass in should take an argument of &mut MaybeUninit<Self>.
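(What that would look like concretely — a hypothetical name and signature, not something in the RFC: the closure receives the allocation as &mut MaybeUninit<T> and is trusted to fill it in.)

use std::mem::MaybeUninit;

fn box_with_out<T>(init: impl FnOnce(&mut MaybeUninit<T>)) -> Box<T> {
    let mut slot = Box::new_uninit(); // allocate first
    init(&mut *slot);                 // the caller writes into the allocation
    // Nothing in the signature proves `init` actually wrote a value, which is why
    // this style ends up unsafe somewhere -- see the reply below.
    unsafe { slot.assume_init() }
}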

@KrishnaSannasi

KrishnaSannasi commented Mar 26, 2020

So, I'm not 100% able to follow this discussion, but from a user perspective, it's not super great to have there be a huge performance gap between Box::with(|| [0; 10000]) and Box::new([0; 10000])

I don't think that we can fix this without changing the semantics of Box::new([0; 10000]). This is because [0; 10000] is evaluated before Box::new is called, i.e. before the allocation is even available. This means that you must copy the array from the caller into the new allocation. The only way around this is to delay evaluation of [0; 10000] until we have an allocation we can directly write into. This is precisely the abstraction that closures provide.

IMHO, I'd rather be explicit if the magic isn't applied uniformly. In other words, the closure you pass in should take an argument of &mut MaybeUninit<Self>.

This is kind of what I proposed earlier, but in a more type-safe way. If we just passed a &mut MaybeUninit<Self>, you would have no way of enforcing the write. But with the way I described in the linked comment, you do.

being able to do something like Box::with(SomeType::new) is cool, but it seems a bit too magical that return values are special and parameters are not.

Note: LLVM already performs this optimization; all we are doing here is making the guarantee that this optimization will always (reliably) be performed.

@clarfon


clarfon commented Mar 27, 2020

You are right about the ordering-- I didn't even consider that part.

I guess that my main concern is the semantics of movement here. Assuming we omit the "bad" case and boil stuff down to:

Box::with(|| {
    let mut x = initial_state;
    f(&mut x);
    x
})

Then conceptually, I see no difference between this and:

let mut x = initial_state;
f(&mut x);
Box::new(x)

Basically, you're arguing that there's a semantic ordering between function calls, and that Rust should automagically coerce function types into this special generator type. Whereas I feel like it is more about semantically changing the meaning of bindings: the compiler could specially interpret let bindings, function parameters, and return values as referring to the same memory addresses for non-Copy objects.

Because really, what you're proposing is treating it like this for a small subset of use cases. But the main issue is that this adds new APIs whose existence would argue against applying the same treatment to other use cases in the future.

Obviously, it makes sense to not strictly define that Rust never moves data around ever, but given the way Rust's ownership system works, I can't see a clear way to explain to a user why return values are special and parameters are not, especially when we have to specifically access the code inside the function in order to make the special changes that permit this behaviour.

Not 100% sure if what I'm saying makes sense, but it's mostly where I'm at right now as far as thinking about this.

EDIT: Essentially, you're saying that the provided closure is the one that must be modified to make the no-copy elision work, and never Box::new itself.

@CAD97

CAD97 commented Mar 27, 2020

Whereas, I feel like it is more like semantically changing the meaning of bindings, and how the compiler could specially interpret let bindings, function parameters, and return values as referring to the same memory addresses for non-Copy objects.

Note that this proposal explicitly does not guarantee NRVO, so your example would not do anything emplaced, and x would still semantically be a stack location. (Guaranteed) RVO only kicks in when you return an "emplacable expression," which is a struct/array/tuple literal or another emplacable function call. An intervening binding will inhibit this guarantee.

Yes, this does create a theoretical performance penalty for extracting a temporary value. But in practice, this shouldn't bite many people; if Box::with(|| { [large; array] }) is guaranteed to RVO, then Box::with(|| { let x = [large; array]; x }) should rather easily optimize to the same code; it's just the guarantee that is lost, not the actual optimization.

I don't think we should try to guarantee NRVO for your example. Why? Because NRVO is complicated to specify, and it's easily rewritable to not need it:

let mut boxed = Box::with(|| initial_state);
f(&mut *boxed);
boxed

In fact, we could probably get away without actually doing anything special for Box::with to guarantee the copy elision on our side, and just ensure LLVM does it, because it's already pretty good at eliminating the memcpy with that API. Guaranteeing it on our side is probably better, as it works with less inlining information and without optimizations, but this is just formalizing the pattern that already works.

(Edit: to be clear, I'm not against having nrvo and friends as an optimization; I just think guaranteeing it is not necessary.)

@burdges

burdges commented Mar 27, 2020

Aside from guarantees, there are more complex cases for NRVO too, so an #[nrvo] attribute could enforce both guarantees and express the desired NRVO structure.

fn foo(..) -> ( #[nrvo] Big1 , Result<#[nrvo] Big2,io::Error> ) { }

fn bar(..) ->  Result<#[nrvo] Big2,io::Error> {
    let mut ret = Err(..);
    let b = Box::with(|| { 
        let mut a; 
        (a,ret) = foo(..);
        a
    });
    if let Ok(a) = &mut ret { a.attach_insert("foo", b); }
    ret
}

In foo, we do placement return for a Big1 into one caller-provided buffer and for a Big2 into a separate caller-provided buffer. We treat the Big2 buffer as the "dissociated" internals of a Result<Big2,io::Error>; our caller should not move anything if they unpack the Result themselves with inlinable methods, but might require moves if they pass the Result<Big2,io::Error> to non-inlined code.

In bar, we allocate for the Big1 but pass through the placed Big2 and "dissociated" result, and then call some Big2 methods that attach the Big1.

I suppose an #[nrvo] attribute might mostly encourage hard to optimize code, but worth mentioning.

As an aside, we'd exploit better NRVO optimizations, and "dissociated" NRVO enums, lots in cryptography where folks often return types like Option<[u64; 5]>. We'll keep this [u64; 5] on the stack, but we currently eat too much stack space on smart cards, etc. if we return it through multiple layers. We should zero any stack afterward, which improves performance even on heavier machines.

@PoignardAzur


PoignardAzur commented Mar 27, 2020

You are right about the ordering-- I didn't even consider that part.

I guess that my main concern is the semantics of movement here. Assuming we omit the "bad" case and boil stuff down to:

Box::with(|| {
    let x = initial_state;
    f(&mut x);
    x
})

Then conceptually, I see no difference between this and:

let x = initial_state;
f(&mut x);
Box::new(x)

There is a difference, because at least some part of the code in Box::new must execute before initial_state can be computed. Before Box::new is called, the space in memory where initial_state will be stored doesn't even exist.

Of course, one could argue that in your second snippet, the compiler could reorder the calls so that Box::new is called at the beginning of the function, and initial_state is emplaced into the result; that would essentially be NRVO on steroids. And, personally speaking, I do want that feature to be added to Rust eventually.

But the semantics aren't trivial. Box::new could theoretically have any number of side-effects; even use interior mutability to change the value of initial_state under the developer's feet. Which means reordering Box::new and initial_state could silently change the observable behavior of the program, which is a huge deal-breaker.

tl;dr The fundamental reason the RFC is written that way is that space must be allocated before data can be emplaced into it; and returning data from closures that are called inside of the functions allocating the data is the simplest way Rust can express that (yet).
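(The ordering in that tl;dr, spelled out with today's APIs. This is not the RFC's implementation of Box::new_with; without the RFC's guarantee, the write in step 2 still goes through a temporary, and the guarantee is exactly what would remove it.)

fn new_with_sketch<T>(f: impl FnOnce() -> T) -> Box<T> {
    let mut slot = Box::new_uninit(); // 1. the heap space exists before the value does
    slot.write(f());                  // 2. with guaranteed copy elision, `f()`'s result
                                      //    would be constructed directly into that space
    unsafe { slot.assume_init() }     // 3. hand back the same allocation, no copy
}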

@PoignardAzur


PoignardAzur commented Mar 28, 2020

@clarfon

be changed to:

1. Caller calls first half of `Box::new`

2. `Box::new` allocates memory, then yields to caller

3. Caller tells `f` to write to that memory

4. Caller calls second half of `Box::new`

5. (there is no second half of `Box::new`, but technically you could do something here)

6. `Box::new` returns address

Sure, it's more complicated, but the upside is people aren't suggested to use an entirely different method, Box::with, just because we don't have the ability to change Box::new.

To be clear, I 100% agree with you; I think the substitution you propose should become the idiomatic syntax eventually.

But before we get to that point, we need a standard way to emplace (potentially unsized) data into arbitrary data structures. There are two possibilities here: uninitialized arguments, or GCE; I think GCE is the best one; for instance, in your example, step 3 would probably involve GCE.

Once we get there, we can consider adding lazy arguments and/or implicit reordering like in your example; but these are heavy changes, with a lot of hidden implications, that need an RFC of their own.

(I should probably add a section about passing uninitialized memory in Rationale and Alternatives)


@ssokolow

However, I could get behind a macro with a name like box!() (to complement vec![]), which expands to a use of Box::with. It's not a surprise that macros can expand to arbitrary chunks of code and do magic-seeming things.

You'll notice this is also in the RFC =P
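(For reference, the macro can be tiny; spelled boxed! below because box is a reserved keyword, and it assumes a Box::with-style constructor from this RFC exists, so it is illustrative only.)

macro_rules! boxed {
    ($e:expr) => {
        Box::with(|| $e)
    };
}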

@Ixrec


Ixrec commented Mar 28, 2020

For anyone else who forgot and had trouble finding it again: "GCE" = "guaranteed copy elision"

FWIW, I'm pretty convinced at this point that some form of GCE/NRVO is the safest and least magical way to enable "placement" performance guarantees. It certainly seems like a clear net win over anything involving pointers to uninitialized memory or &uninit references or whatever.

What baffles me is why there's any opposition to adding a new Box::with function for this. All the Box::new modifications suggested so far are either clearly a non-starter due to breaking changes, or so overwhelmingly confusing there's no way they're a net win, or something where I just have no idea what anyone's talking about. For example, is this talk of "calling half the function, then its argument, then the rest of this function" just a really weird way of saying "generators"? (in which case, "that's a later RFC" seems clearly correct, but why aren't we just saying "make Box::new a generator"?) Or are we talking about some other language feature that I've never heard of before but the participants in this RFC thread are familiar with from somewhere else? (the last post mentions "lazy arguments"; would that be like Haskell thunks? or LazyCell<T>?)

@PoignardAzur


PoignardAzur commented Mar 28, 2020

@CAD97

In fact, we could probably get away without actually doing anything special for Box::with to guarantee the copy elision on our side, and just ensure LLVM does it, because it's already pretty good at eliminating the memcpy with that API. Guaranteeing it on our side is probably better, as it works with less inlining information and without optimizations, but this is just formalizing the pattern that already works.

It's also necessary on a language level if you want to return unsized types.


@Ixrec

For anyone else who forgot and had trouble finding it again: "GCE" = "guaranteed copy elision"

Oh wow, I've been using that abbreviation without ever taking the time to write its definition, haven't I?

I have become everything I swore to destroy.

(the last post mentions "lazy arguments"; would that be like Haskell thunks? or LazyCell<T>?)

I was thinking of D's lazy arguments, which I think are like Haskell thunks?

Or are we talking about some other language feature that I've never heard of before but the participants in this RFC thread are familiar with from somewhere else?

Nope. Any talk of splitting a function in half exists exclusively in the head of the people proposing it, me included. I don't think there's any proposal for this, or any prior discussion of the concept.

It could be done with thunks, or, as you point out, with generators, or some other way.

Either way, I think this is an interesting solution space to explore, but it's beyond the scope of this RFC; and I don't think this RFC needs these features to be useful.

PoignardAzur force-pushed the PoignardAzur:placement-by-return branch from 3fd340e to 7b30ddc on Mar 28, 2020
@PoignardAzur


PoignardAzur commented Mar 28, 2020

@kennytm Thanks for the proof-reading. I think I've fixed all typos. I'm going to avoid using the yield syntax for now.

@rpjohnst @comex I added details to the rationale to address some of your questions. Obviously I get that you're still invested in the "results as fat pointers" feature, but I hope I did a good job detailing why I want to delay that decision.

@clarfon I'll add your feedback soon, probably alongside a section about NRVO.

@CAD97

CAD97 commented Mar 28, 2020

I still don't think the RFC addresses the conflict between "guaranteed copy elision" and "small types are still copied."

To be clear, I get the reason for not copy eliding small, can-be-passed-in-registers types. But I think the RFC should spell out what it means to have guaranteed copy elision (except it's not always guaranteed).


rpjohnst left a comment

I added details to the rationale to address some of your questions. Obviously I get that you're still invested in the "results as fat pointers" feature, but I hope I did a good job detailing why I want to delay that decision.

That approach is totally fine with me, Box::with et al is easily forward compatible with fixing GCE for enums. Mostly what I was responding to was some overly-strong wording about how far-reaching such a change would be, which is still there:

@clarfon


clarfon commented Mar 29, 2020

I still don't think the RFC addresses the conflict between "guaranteed copy elision" and "small types are still copied."

Honestly, I would call this guaranteed move elision because I assume that the main deciding factor is whether the type implements Copy.

@PoignardAzur


PoignardAzur commented Mar 29, 2020

@rpjohnst Right.

Re-reading our conversation, I think the main reason we've been talking past each other is that I haven't been clear enough about NRVO.

The way I see it, NRVO is the obvious next step if/when this RFC is implemented. This means being able to pass the address of an object even as you're emplacing it. The example I gave earlier with foo(&my_data); was written with this constraint in mind.

This complicates Result emplacement, because if we do decide to go with NRVO, then we need to be able to pass references to our Result; which means Result's layout is no longer "whatever the compiler wants", but needs to be defined in contiguous-chunks-of-bytes space. We take a look at Schrödinger's cat, so to speak.

(of course, one possible strategy is to simply forbid NRVO when emplacing Result; that would be a non-breaking change, but it's not really convenient for users)

But you're right that the RFC text still doesn't adequately explain this.

@CAD97 I'll try to specify the behavior more precisely.

I don't think it's too important, though; this strikes me as the kind of detail that gets implemented first, specified second.

@rpjohnst

rpjohnst commented Mar 29, 2020

(of course, one possible strategy is to simply forbid NRVO when emplacing Result; that would be a non-breaking change, but it's not really convenient for users)

I think this deserves more thought. You're right that if you take a reference to the Result before returning it, copies will have to take place to rearrange it into its return ABI. And any mitigations here will make NRVO even more complex to specify.

But would that actually be a problem in practice? Do people pass around references to Results and then return them? Or do people pass around references to some other value, and then wrap it up in Ok right at the last minute? Or maybe they pass that value through a chain of methods, some of which return Result?

It would be good to get more data about what happens in practice here- I suspect there are several patterns that are amenable to NRVO, just applied to the (large) Ok variant instead of the Result itself. (This in itself is perhaps reason enough to hold off on such an ABI in this RFC, but a survey of patterns would also help figure out how best to specify NRVO in general.)

@PoignardAzur


PoignardAzur commented Apr 1, 2020

@CAD97 @rpjohnst Re: GCE for small types, I'm thinking that GCE would be defined as guaranteeing that return values keep their observable return addresses if any of the following is true:

  • They are larger than an implementation-defined size (probably two registers).
  • Their type is unsized.
  • Their type is !Unpin.

Does that make sense?

@kennytm


kennytm commented Apr 1, 2020

I don't think Unpin matters here. If you've got an owned T: !Unpin outside a Pin<P<T>> you're free to move it around.

@PoignardAzur


PoignardAzur commented Apr 1, 2020

I added a part about register-sized types. It's a little vague, but I don't think it matters too much (it's not like Rust has a formal specification yet).

If anyone has a better idea for that section, I'm open to actionable feedback.


@rpjohnst I updated the Result section.

I get what you're saying about me overestimating the difficulties (and you're right that passing a reference to your Result isn't the common case), but I still want that part to be handled in a future RFC, for reasons explained in the text.

I'd appreciate it if we could move on from this part of the RFC; unless someone believes that implementing the RFC as-is would cut Rust off from future improvements.


@clarfon I added a section on NRVO + lazy arguments + reordering.

I honestly kind of regret adding it, because the "future possibilities" section is already pretty big, and I'm worried people are going to bikeshed it instead of discussing the main proposal; but we've already started discussing the possibility anyway, and writing it down (instead of debating the fuzzy version in our head) might give people new ideas.

@Ixrec


Ixrec commented Apr 1, 2020

I honestly kind of regret adding it, because the "future possibilities" section is already pretty big, and I'm worried people are going to bikeshed it instead of discussing the main proposal ...

For what it's worth, I believe you can use <details><summary>Named Return Value Optimization</summary> ... </details> to include all of this in the text while leaving it collapsed and hidden by default, so it doesn't get undue visual emphasis and it's more obvious what parts of the RFC you really think are the important parts. The RFC's quite long at this point (certainly longer than my personal attention span), so there may be multiple places where this is worthwhile.

@PoignardAzur


PoignardAzur commented Apr 2, 2020

Done.

- **GCE:** [Guaranteed Copy Elision](https://stackoverflow.com/questions/38043319/how-does-guaranteed-copy-elision-work).
- **NRVO:** [Named Return Value Optimization](https://shaharmike.com/cpp/rvo/).
- **DST:** [Dynamically-Sized Type](https://doc.rust-lang.org/reference/dynamically-sized-types.html).
- **HKT:** Higher-Kinded Type.

clarfon commented Apr 5, 2020

This should definitely be linked to something, even if that's just the Wikipedia page.

PoignardAzur commented Apr 7, 2020

Well that's nitpicky. Nobody would have said anything if I hadn't put links on the three other items =P

clarfon commented Apr 7, 2020

The curse of being helpful :p

Rust has a dysfunctional relationship with objects that are large or variable in size. It can accept them as parameters pretty well using references, but creating them is unwieldy and inefficient:

* A function pretty much has to use `Vec` to create huge arrays, even if the array is fixed size. The way you'd want to do it, `Box::new([0; 1_000_000])`, will allocate the array on the stack and then copy it into the Box. This same form of copying shows up in tons of APIs, like serde's Serialize trait.
* There's no safe way to create gigantic, singular structs without overhead. If your 1M array is wrapped somehow, you pretty much have to allocate the memory by hand and transmute.
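(An illustration of what the second bullet means by allocating by hand — my own sketch, not text from the RFC; it is sound only because an all-zero bit pattern happens to be a valid Wrapper here.)

use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

struct Wrapper {
    huge: [u8; 1_000_000],
}

fn boxed_wrapper() -> Box<Wrapper> {
    let layout = Layout::new::<Wrapper>();
    unsafe {
        let raw = alloc_zeroed(layout) as *mut Wrapper;
        if raw.is_null() {
            handle_alloc_error(layout);
        }
        // Building `Wrapper { huge: [0; 1_000_000] }` on the stack and calling
        // `Box::new` on it is exactly the copy the bullet above complains about.
        Box::from_raw(raw)
    }
}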

clarfon commented Apr 5, 2020

This is worded weirdly; I'd just compare it to Box::new: you still have to copy the value in order to put it into the struct. The transmutation is just an extra thing you can do because the struct is on the stack, whereas the Box has to be moved to the heap.


* **It needs to be possible to wrap it in a safe API.** Safe API examples are given for built-in data structures, including a full sketch of the implementation for Box, including exception safety.
* **It needs to support already-idiomatic constructors like `fn new() -> GiantStruct { GiantStruct { ... } }`** Since this proposal is defined in terms of Guaranteed Copy Elision, this is a gimme.
* **It needs to be possible to in-place populate data structures that cannot be written using a single literal expression.** The `write_return_with` intrinsic allows this to be done in an unsafe way. Sketches for APIs built on top of them are also given in the [future-possibilities] section.

clarfon commented Apr 5, 2020

Minor nit, should clarify that write_return_with is what's being proposed here, not something that already exists.

PoignardAzur commented Apr 7, 2020

Oh yeah, good point.

let n = 1_000_000;
[1; n]
}
// This function will copy the array when it returns.

clarfon commented Apr 5, 2020

I would clarify this comment to say that while this could elide the copy, this is not guaranteed by the RFC for ease of implementation. Basically what you said in comments-- we don't support anything that isn't directly in the return slot for the initial pass.

Similar thing-- would let arr = [0; 1_000_000]; arr be copied as well, or are lets without mutation in the middle okay?

clarfon commented Apr 5, 2020

Never mind, this is covered by a later example. Maybe also put that example up here?

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

When a function returns a value while respecting the constraints described above, the value's observable address will not change, if the value's type is:

clarfon commented Apr 5, 2020

I feel like a quick bulleted list here of those constraints without examples would be nice as an additional summary.


## Did you say I can return unsized types?

A function that directly returns an unsized type should be compiled into two functions, essentially as a special kind of generator:

clarfon commented Apr 5, 2020

I feel like this could use a bit of clarification on top of the examples. Basically clarify that you're splitting the function into the half that decides what layout the data will take in memory, and the half that will actually write to the allocated memory (that being on either the stack or the heap depending on the caller).

To make sure this functionality can be used with no overhead, the language should guarantee some amount of copy elision. The following operations should be guaranteed zero-copy:

* Directly returning the result of another function that also returns the same type.
* Blocks, unsafe blocks, and branches that have acceptable expressions in their tail position.
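(My own illustrations of the two bullets above, not part of the RFC text:)

fn huge() -> [u8; 1_000_000] {
    [0; 1_000_000]            // an "acceptable expression" built directly in the return slot
}

fn forwards() -> [u8; 1_000_000] {
    huge()                    // directly returning the result of another such function
}

fn branches(flag: bool) -> [u8; 1_000_000] {
    if flag {                 // a branch with acceptable expressions in tail position
        [1; 1_000_000]
    } else {
        huge()
    }
}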

clarfon commented Apr 5, 2020

Nit, this should be the last item in the list because it refers to the whole list.

@clarfon


clarfon commented Apr 5, 2020

Potential enum solution: I believe that it would be fine to allow an MVP of unsized enums where only one variant is unsized, to allow simple cases like Option<Unsized> or Result<UnsizedOk, SizedErr>.

If we had something like this, we could also potentially specify generators that yield xor return unsized values, even though it may be nice to be able to do both.
