Unsized Rvalues #1909

Merged
merged 4 commits into from Feb 8, 2018

Conversation

@arielb1
Contributor

arielb1 commented Feb 19, 2017

This is the replacement for RFC #1808. I will write the "guaranteed optimizations" section tomorrow.

Rendered

Summary comment from Sept 6, 2017

cc @nikomatsakis @eddyb @llogiq @whitequark @briansmith

@jonas-schievink

Overall, seems like a step in the right direction. The details still need some fleshing out, of course.


@Ericson2314 Ericson2314 referenced this pull request Feb 20, 2017

Closed

Immovable types #1858

@Ericson2314

Contributor

Ericson2314 commented Feb 20, 2017

I like the idea of Clone: ?Sized, but I think clone_from can only properly be written with out pointers?

edit Oops, I thought clone_from was a new method. I was thinking we could do something like

// implementation can assume `*self` and `*source` have the same size. 
unsafe fn move_from_unsized(&out self, source: &Self);

This is similar to things that would be useful for emplacement.

@withoutboats withoutboats added the T-lang label Feb 20, 2017

@whitequark

whitequark commented Feb 20, 2017

@arielb1 This doesn't reach being a replacement for #1808, primarily because, as written, the lifetime of the value is only until the end of the block. What are your plans for extending the lifetime of the value past that?

@arielb1

Contributor

arielb1 commented Feb 20, 2017

@whitequark

@arielb1 This doesn't reach being a replacement for #1808, primarily because, as written, the lifetime of the value is only until the end of the block. What are your plans for extending the lifetime of the value past that?

I don't see any clear design for that, even in #1808.

@eddyb

Member

eddyb commented Feb 20, 2017

@whitequark Something like 'fn would be orthogonal to this (and has to work for Sized values too).

@whitequark

whitequark commented Feb 20, 2017

@eddyb Fair.

@burdges

burdges commented Feb 20, 2017

I'm worried about ambiguity from overloading the [T; n] syntax too. Afaik, we do not know exactly how const fn will shake out yet, so [T; foo()] could change types as APIs change elsewhere. And [T; foo!()] has that problem immediately. These errors might be harder to locate than you'd expect. At minimum, there are enough hiccups going back and forth between [T] and [T; n] that one cannot argue that overloading [x; n] makes teaching the language easier.

There are plenty of new syntaxes that avoid any ambiguity in the types: Anything like [x| n], [x: n], [x](n), [x][n], [x;; n], [x, n], [x; * n], [x; alloca n], etc. Anything involving a word like alloca, whether a keyword like box, a function like uninitialized, or a macro. Any sort of "slice comprehension notation" that consumes an iterator to make a VLA, like [x : 0..n], [expr(i) for i in 0..n], etc.

In fact, you could easily build a &mut [u8] VLA from a &str with a slice comprehension notation like

let hw = &mut [ x for x in "hello world".as_bytes().iter() ];

Edit You could even make a &mut str this way if you first define unsafe fn from_utf8_unchecked_mut(v: &mut [u8]) -> &mut str { ::core::mem::transmute(v) } but you'd probably wrap all that in a macro called str_dup or something.

@eternaleye

eternaleye commented Feb 20, 2017

@burdges: One option I think would be viable is mut, phrased like your alloca example:

const n = ...;
let m = ...;
let w = [0u8; 3] // [u8; 3] as today
let x = [0u8; n] // [u8; n] as today
let y = [0u8; m] // Error, as today
let z = [0u8; mut m] // [u8], as a VLA
let q = [0u8; mut n] // [u8], as a VLA

This has several advantages:

  1. Const-dependent types seem to have converged on using the const keyword for marking where values are permitted as generic parameters, but only when constant
  2. Arrays really are the one pre-existing const-dependent type
  3. We can't require const for the existing array length without breaking compat, and it makes the common case verbose
  4. mut is the natural inverse of const, is short, is already reserved, and is meaningless in that context today
  5. It's easily extended (at a later date) to allowing (restricted) runtime-dependent types
@camlorn

camlorn commented Feb 20, 2017

@kennytm
I don't think this is an advanced feature that no one should use, I think it is a feature that everyone should use all the time. It's basically C's alloca, which can be and is used often enough.

If the concern is that people will accidentally overflow the stack, yes, they will. But I can already do that in so, so many ways. I don't know what the size of the struct from the library over there is; I don't know what the size of the stack-allocated array is either. How are these cases significantly different?

What I see here is "Here is a thing that can avoid heap allocation for small arrays", and that's brilliant, and I want it.

@petrochenkov

Contributor

petrochenkov commented Feb 20, 2017

@camlorn
I never expected to see "C's alloca" and "a feature that everyone should use" in one sentence.
What code bases use alloca often enough?

@burdges

burdges commented Feb 20, 2017

If one wants the function or macro route, then collect_slice makes a nice name. If intrinsics could return an unsized value, then the signature might look like

fn collect_slice<T, I>(itr: I) -> [T] where T: Sized, I: Iterator<Item = T> + ExactSizeIterator

I think that's equivalent to any sort of comprehension notation without the new syntax being quite so new.

@arielb1

Contributor

arielb1 commented Feb 20, 2017

@burdges

Ordinary functions can't really return DSTs - you can't alloca into your caller's stackframe, and there's a chicken-and-egg problem where the caller can't figure out how much space to allocate before it calls the function.

@Ericson2314

Contributor

Ericson2314 commented Feb 20, 2017

you can't alloca into your caller's stackframe

I'd think tail calls with dynamically sized args and this are somewhat comparable. Both should be possible in at least some cases. Certainly lots of work and out of scope for the initial version, however.

@briansmith

briansmith commented Feb 20, 2017

Ordinary functions can't really return DSTs - you can't alloca into your caller's stackframe, and there's a chicken-and-egg problem where the callee can't figure out how much space to allocate before it calls the function.

This just means that functions that return DSTs need a different calling convention than other functions, right? Callee pops stack frame except for its result, which is left on the stack. The caller can calculate the size of the result by remembering the old value of SP and subtracting the new value of SP.

@arielb1

Contributor

arielb1 commented Feb 20, 2017

@briansmith

That would require implementing that calling convention through LLVM. Plus locals in the callee function would get buried beneath the stack frame.

Also, this would mean that the return value is not by-reference, so you would not get the benefits of RVO.

@briansmith

briansmith commented Feb 20, 2017

That would require implementing that calling convention through LLVM.

Of course.

To me it seems very natural to expect that Rust would drive many changes to LLVM, including major ones. I think it would be helpful for the Rust language team to explain their aversion to changing LLVM somewhere (not here) so people can understand the limits of what is reasonable to ask for. My own interactions with LLVM people gave me the impression that LLVM is very open to accepting changes, especially for non-Clang languages.

@llogiq

Contributor

llogiq commented Feb 20, 2017

For the record, #1808 mandated type ascription for unsized types to steer clear of ambiguity, e.g. let x : [usize] = [0; n];.

I also think requiring the size expression to somehow be special is a bad idea, both design- and implementation-wise.

@glaebhoerl

Contributor

glaebhoerl commented Feb 20, 2017

mut is the natural inverse of const

I... I don't think this is true at all in Rust (and honestly I'm kind of surprised so many people apparently think it is).

The issue is that const ("constant"; "a value that doesn't change") has two potential meanings in a programming language:

  • "Not mutable", or (if applied to a value) "runtime constant". This is what const means in C and C++.

  • "Compile-time constant", that is, a value already known at compile-time. This is what const means in Rust. (And, if my memory's not playing tricks, Pascal?)

I've always disfavored our use of the const keyword for this purpose, for this reason - it means something different than it does in C, which is where most people are familiar with it from.

(...someone at this point is itching to bring up *const. Yes, our usage of const is also inconsistent. Given that *const is most often used for C FFI, I think this syntax is mostly justifiable by thinking of that const there as being "in the C sense", so that our *const means the same thing as their const*.)

Anyway: even if const were to mean the other thing, mut still means "mutable", which is the opposite of "immutable", and not of "compile-time". It very much does not mean "runtime value". This is kind of like suggesting we use "dark weight" to describe something that is difficult to lift because "dark" is the opposite of "light".

@eternaleye

eternaleye commented Feb 20, 2017

@glaebhoerl: Mm, that's fair. However, on the one hand I do think there's a meaningful relationship there (in a sense, that value is mutable across invocations of the function), and on the other hand, there really isn't a better keyword for it. The closest is probably do, and using that for this feature would likely be a bit... bikeshed-inducing.

@glaebhoerl

Contributor

glaebhoerl commented Feb 20, 2017

that value is mutable across invocations of the function

So are function arguments and lets :)

For that matter, if we have an "opposite of const" it's let.

(I'm not convinced there's a need for any kind of special syntax, though. The meaning of [0; random()] is obvious enough and natural. Whether the runtime characteristics justify putting up extra guardrails remains to be shown - ideally we'd find out from practical experience.)

@burdges

burdges commented Feb 20, 2017

I suppose you can always define a macro to populate your VLA from an iterator or closure or whatever, like

macro_rules! collect_slice { ($itr:expr) => {
    {
        let itr = $itr;
        let len = ExactSizeIterator::len(&itr);
        // hypothetical `[expr: len]` VLA syntax
        let mut vla = unsafe { [uninitialized(): len] };
        for (slot, x) in vla.iter_mut().zip(itr) { *slot = x; }
        vla
    }
} }

What about simply [x: n] for a VLA? It parallels [x; n] nicely without the ambiguity. It's even forward compatible with comprehension notations if anybody ever wants those.

@Ericson2314

Contributor

Ericson2314 commented Feb 20, 2017

@glaebhoerl random sure, but isn't the risk some other function that down the road becomes possible to evaluate at compile-time?

@whitequark

whitequark commented Dec 1, 2017

If the size is truly dynamic, do we always perform a branch on creation of the value or something?

Yes. And a single access (I forget if it's read or write) for every PAGE_SIZE bytes.

@DemiMarie

DemiMarie commented Dec 2, 2017

@eternaleye

eternaleye commented Dec 2, 2017

@whitequark

whitequark commented Dec 2, 2017

To add to this, some architectures do not even have pages at all (and yet still work fine with stack probing), e.g. Cortex-M3 if you put the stack at the bottom of the RAM.

@bill-myers

bill-myers commented Jan 3, 2018

I might have missed a mention of it, but it's important to note that plain "alloca" is not enough to implement this feature.

In particular, plain allocas never get freed until the function returns, which means that if you have an unsized variable declared in a loop, the stack usage will now be proportional to the number of loop iterations, which is catastrophic.

Instead, the stack pointer needs to be rewound on every loop iteration (and in general whenever an alloca goes out of scope), which probably requires LLVM changes, although it might be possible to get away with just altering the stack pointer via inline assembly.

Also, this is fundamentally incompatible with 'fn-scoped allocas, so if they are added the language needs to forbid 'fn-scoped allocas when unsized rvalues are in scope.

@crlf0710

crlf0710 commented Jan 3, 2018

@nikomatsakis @eddyb it's been a while since the FCP completed in September; any chance to get this merged? Thanks a lot.

@whitequark

whitequark commented Jan 3, 2018

In particular, plain allocas never get freed until the function returns, which means that if you have an unsized variable declared in a loop, the stack usage will now be proportional to the number of loop iterations, which is catastrophic.

This is not catastrophic. In fact, for certain use cases (using the stack as a local bump pointer allocator) it is necessary and desirable. Unsized rvalues must be used together with lifetime ascription to let the compiler free them.

Instead, the stack pointer needs to be rewinded every loop iteration (and in general whenever an alloca goes out of scope), which probably requires LLVM changes, although it might possible to get away with just altering the stack pointer via inline assembly.

LLVM has @llvm.stacksave and @llvm.stackrestore intrinsics for this.

@aturon aturon referenced this pull request Feb 7, 2018

Open

Tracking issue for RFC #1909: Unsized Rvalues #48055

0 of 6 tasks complete
@aturon

Member

aturon commented Feb 7, 2018

This RFC has been (very belatedly!) merged!

Tracking issue

@aturon aturon merged commit 6c3c48d into rust-lang:master Feb 8, 2018

scottlamb added a commit to scottlamb/moonfire-nvr that referenced this pull request Feb 23, 2018

take FnMut closures by reference
I mistakenly thought these had to be monomorphized. (The FnOnce still
does, until rust-lang/rfcs#1909 is implemented.) Turns out this way works
fine. It should result in less compile time / code size, though I didn't check
this.

@whitequark whitequark referenced this pull request May 13, 2018

Closed

support alloca #618

@TheDan64

TheDan64 commented Jun 5, 2018

As I understand it, a goal of this RFC is to make unsized trait objects possible. So, would this make Vec<Trait> and friends (i.e. HashMap<X, Trait>) possible? Vec<Box<Trait>> and Vec<&Trait> are both frustrating types to work with today when needing heterogeneous collections.

@Diggsey

Contributor

Diggsey commented Jun 5, 2018

@TheDan64 no: trait objects are always unsized, and types like Vec cannot work with unsized types because they store their elements in a contiguous block of memory - that's not going to change.

This RFC allows you to pass unsized values (including trait objects) to functions by value (ie. move them into the function) and to store unsized values directly on the stack.

@kennytm

Member

kennytm commented Jun 5, 2018

@TheDan64 Note that even if we allowed Vec<dyn Trait>, it still cannot store heterogeneous items (all elements must have the same type, determined by a single vtable stored elsewhere).

@TheDan64

TheDan64 commented Jun 5, 2018

It'd be interesting if somehow Vec<Trait> / Vec<dyn Trait> could use an anonymous enum under the hood, so that all heterogeneous elements of the Vec are the same width, without all of the boilerplate of actually creating an enum for every type implementing Trait.


kennytm Jun 5, 2018

Member

That's not possible because you don't know how big the type will be. Consider this, in crate A we have:

pub trait Trait {
    fn do_something(&self);
}
impl Trait for u8 { ... }
impl Trait for u64 { ... }

let mut TRAITS: Vec<dyn Trait> = vec![1u8, 2u64, ...];

And then in crate B, we write:

extern crate crate_a;

struct MyHugeStruct([u64; 8192]);
impl crate_a::Trait for MyHugeStruct {
    fn do_something(&self) {}
}

...

crate_a::TRAITS.push(MyHugeStruct([0u64; 8192]));

so how does crate_a know there is going to be a 64 KiB item while allocating the Vec and generating the anonymous enums?
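The size mismatch described here can be observed directly with `std::mem::size_of_val`, which for a trait object reads the concrete type's size out of the vtable at runtime. The trait and impls below are illustrative stand-ins for the ones in the example:

```rust
use std::mem::size_of_val;

trait Trait {
    fn do_something(&self);
}
impl Trait for u8 { fn do_something(&self) {} }
impl Trait for u64 { fn do_something(&self) {} }
impl Trait for [u64; 8192] { fn do_something(&self) {} }

// For a &dyn Trait, size_of_val consults the vtable at runtime, so the
// answer differs per concrete type behind the same trait object type.
fn dyn_size(x: &dyn Trait) -> usize {
    size_of_val(x)
}

fn main() {
    assert_eq!(dyn_size(&1u8), 1);
    assert_eq!(dyn_size(&2u64), 8);
    // The "huge" downstream impl: 64 KiB, unknowable when crate_a was compiled.
    assert_eq!(dyn_size(&[0u64; 8192]), 65536);
}
```

This is exactly why a contiguous `Vec<dyn Trait>` cannot pick one element stride up front.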



TheDan64 Jun 5, 2018

Good point!



burdges Jun 5, 2018

Is there any facility for Vec<dyn Trait> to enforce that all elements have the same type? I think no because lifetimes are erased too early in the compilation process.

I'd think if Vec<dyn Trait> made sense then it must accept heterogeneous types, so it likely makes no sense. In principle, one could make ArenaVec<dyn Trait> like types that operate as a vector of owning references though.



eddyb Jun 5, 2018

Member

Is there any facility for Vec<dyn Trait> to enforce that all elements have the same type? I think no because lifetimes are erased too early in the compilation process.

I'd think if Vec<dyn Trait> made sense then it must accept heterogeneous types, so it likely makes no sense. In principle, one could make ArenaVec<dyn Trait> like types that operate as a vector of owning references though.

Where do lifetimes come into it?

My view on it is that Vec<T> is a resizeable Box<[T]>, so the next logical question is: what would [dyn Trait] mean? Well, at any given time, it must be one type, i.e. [T; n] where T: Trait.

Working our way from a more explicit dynamic existential notation, Box<[dyn Trait]> would be roughly sugar for Box<dyn<n: usize> [dyn<T: Trait> T; n]>.
Something very important here is that without indirection (or with this RFC, stack ownership with usage patterns that can be implemented through indirection), dyn can't function, so dyns necessarily "bubble up" to just around the pointer, so our earlier type must be equivalent to dyn<T: Trait, n: usize> Box<[T; n]>, with homogeneous element types, as all static arrays are.

This makes Vec<dyn Trait> be dyn<T: Trait> Vec<T>, which may be useful if you have many values of the same type, but you want to forget the exact type. One real-world use of this that has come up in the past is game engines, where you might want Vec<Vec<dyn Entity>> (or similar), to keep all the values of the same type together, for processing speed and storage benefits.
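The "homogeneous groups, type erased at the boundary" pattern from the game-engine example can be approximated on stable Rust today: keep one concrete Vec per entity type and erase to `&dyn` only when borrowing. A sketch with hypothetical `Entity`/`Player`/`Rock` names:

```rust
trait Entity {
    fn health(&self) -> u32;
}

struct Player { hp: u32 }
struct Rock;

impl Entity for Player { fn health(&self) -> u32 { self.hp } }
impl Entity for Rock { fn health(&self) -> u32 { 0 } }

// Each inner group is homogeneous (same concrete type, same size);
// erasure happens per borrow, which is roughly what the hypothetical
// dyn<T: Entity> Vec<T> would let a Vec express directly.
fn total_health(groups: &[&[&dyn Entity]]) -> u32 {
    groups.iter().flat_map(|g| g.iter()).map(|e| e.health()).sum()
}

fn main() {
    let players = vec![Player { hp: 10 }, Player { hp: 20 }];
    let rocks = vec![Rock, Rock];
    let pg: Vec<&dyn Entity> = players.iter().map(|p| p as &dyn Entity).collect();
    let rg: Vec<&dyn Entity> = rocks.iter().map(|r| r as &dyn Entity).collect();
    let groups: [&[&dyn Entity]; 2] = [&pg, &rg];
    assert_eq!(total_health(&groups), 30);
}
```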



burdges Jun 5, 2018

I see, so fn foo() -> Vec<dyn Trait> as -> dyn<T: Trait> Vec<T> acts much like -> Vec<impl Trait> in that you get an anonymized type T which you cannot insert, pop, etc., but differs by avoiding monomorphisation on this anonymized type, by instead using a bigger vtable that points to code that knows the size, or maybe looks up the size in the vtable.

As for lifetimes, I'd expect a Vec<Vec<dyn Entity>> requires some reflection to get back the inner Vec<T> for modifications, which breaks if BorrowedCat<'a> : Entity, so you actually want Entity : Any, right?

Now why would Vec<Vec<dyn Entity>> be Vec<dyn<T: Entity> Vec<T>> not dyn<T: Entity> Vec<Vec<T>>? I suppose there must be a rule for this dyn<T: Entity> .. construct so that &'a Trait is dyn<T: Trait> &'a T, MutexGuard<'a,dyn Trait> is dyn<T: Trait> MutexGuard<'a,T>, etc.? It's just building the vtable for the innermost type containing all that particular dyn Trait perhaps? if type Foo<T> = HashMap<T,Vec<T>> then Foo<dyn Trait> is dyn<T: Trait> HashMap<T,Vec<T>>, yes?
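The `-> Vec<impl Trait>` comparison can be tried on stable Rust today: the caller sees a single anonymized element type and can only use it through the trait, so it cannot insert elements of its own choosing. A small sketch:

```rust
use std::fmt::Display;

// The caller learns only "Vec of some single T: Display", not which T,
// so v.push(4u64) at the call site would not type-check.
fn make() -> Vec<impl Display> {
    vec![1u32, 2, 3]
}

fn main() {
    let v = make();
    let rendered: Vec<String> = v.iter().map(|x| x.to_string()).collect();
    assert_eq!(rendered, ["1", "2", "3"]);
}
```

The difference burdges notes is that `impl Trait` is still monomorphized statically, whereas the hypothetical `dyn<T: Trait> Vec<T>` would defer the choice of T to runtime via a vtable.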



eddyb Jun 5, 2018

Member

I suppose there must be a rule

Vec<T> contains a pointer to T (or to [T], rather - hence the comparison to Box<[T]>).
The indirection is key, and where the dyn "bubbles up to".
It has nothing specific to do with Vec<T> being generic over T.

You don't need reflection for modifications, at least not all of them (specifically, adding new elements is a problem, but looking at/removing existing ones is fine), and there's nothing special about Vec - if you need reflection, you end up using Any - there's no other mechanism to do so.

Note that Vec<Vec<dyn Entity + 'arena>> is perfectly fine, but then Entity can't inherit Any.
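The Any-based reflection mentioned above is the escape hatch available today when you do need to recover concrete types from erased storage; a minimal sketch:

```rust
use std::any::Any;

fn main() {
    // Heterogeneous storage via indirection; Any lets us recover the
    // concrete types afterwards.
    let items: Vec<Box<dyn Any>> = vec![Box::new(5u32), Box::new("hi")];

    let n = items[0].downcast_ref::<u32>().copied();
    let s = items[1].downcast_ref::<&str>().copied();

    assert_eq!(n, Some(5));
    assert_eq!(s, Some("hi"));
    // A wrong guess just yields None rather than undefined behavior.
    assert!(items[0].downcast_ref::<String>().is_none());
}
```

Note the `'static` bound implied by `Any` is what rules out borrowed impls like `BorrowedCat<'a>`, matching the point about `dyn Entity + 'arena` above.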



burdges Jun 6, 2018

Yes, I suppose the vtable "handles the reflection" for removing elements. I suppose dyn<T: Trait> HashMap<T,..> has this indirection through Trait : Borrow<K> probably, which sounds quite restrictive.

Anyways dyn bubbling up represents vtable expansion, so one might even want say dyn<T: AddAssign<&T>> (Vec<T>, HashMap<K,T>) to express that AddAssign is object safe here, although not normally, and you can use it between elements in these two data structures.


@cramertj referenced this pull request Jul 2, 2018: Implement Unsized Rvalues #51131 (merged, 7 of 11 tasks complete)