Alloca for Rust #1808

Closed · wants to merge 8 commits

Conversation

@llogiq
Contributor

llogiq commented Dec 6, 2016

(rendered)

text/0000-alloca.md
[drawbacks]: #drawbacks
- Even more stack usage means the dreaded stack limit will probably be reached even sooner. Overflowing the stack space
leads to segfaults at best and undefined behavior at worst. On unices, the stack can usually be extended at runtime,

@sfackler

sfackler Dec 6, 2016

Member

Will we not be able to correctly probe the stack for space when alloca-ing?

@comex

comex Dec 7, 2016

It should be possible to do stack probes. If they aren't used, though, this must be marked unsafe, as it's trivial to corrupt memory without them.

Speaking of which, are stack probes still not actually in place for regular stack allocations? Just tried compiling a test program (that allocates a large array on the stack) with rustc nightly on my system and didn't see any probes in the asm output.
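
For concreteness, a sketch of the kind of test program described above (function name made up): without probes, the prologue typically reserves the whole frame with a single stack-pointer adjustment, which is exactly how a large frame can step over a 4 KiB guard page in one jump.

```rust
// Hypothetical example: one large stack-allocated array. Absent stack
// probes, the prologue reserves the whole frame at once (e.g. a single
// `sub rsp`), so a frame larger than the guard page can skip past it.
pub fn big_frame() -> u8 {
    let a = [1u8; 64 * 1024]; // 64 KiB, far larger than a 4 KiB guard page
    a[4096] // touch memory more than one page into the frame
}
```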

@jmesmon

jmesmon Dec 7, 2016

It should be possible to do stack probes. If they aren't used, though, this must be marked unsafe, as it's trivial to corrupt memory without them.

Can't one already overflow the stack with recursion?

@sfackler

sfackler Dec 7, 2016

Member

Right, which is why stack probes are inserted: so the runtime can detect that and abort with a `fatal runtime error: stack overflow` message rather than just running off into whatever memory is below.

@comex

comex Dec 7, 2016

Except they actually aren't, since it hasn't been implemented yet (or maybe it only works on Windows?). OSes normally provide a 4kb guard page below the stack, so most stack overflows will crash anyway (and trigger that error, which comes from a signal handler), but a function with a big enough stack frame can skip past that page, and I think it may actually be possible in practice to exploit some Rust programs that way... I should try.

@whitequark

whitequark Dec 7, 2016

I am working on integrating stack probes, which are required for robustness on our embedded system (ARTIQ). I expect to implement them in a way generic enough to be used on all bare-metal platforms, including no-MMU, as well as allow for easy shimming on currently unsupported OSes.

text/0000-alloca.md
# Unresolved questions
[unresolved]: #unresolved-questions
- Could we return the slice directly (reducing visible complexity)?

@sfackler

sfackler Dec 6, 2016

Member

I may have missed it, but I'm not sure I understand why we wouldn't be able to just return the slice?

@jmesmon

jmesmon Dec 7, 2016

My guess:

Because the size of the returned value could be variable, it would need to be copied (or preserved on the stack); I'm not sure Rust has anything that does that right now. Supposing it could be done, it could just become a detail of the calling convention. The caller can't allocate unless it knows how much memory to allocate, and making it aware of the memory needed by the callee could complicate things (and would likely be impossible in some cases without running the callee twice).

If these stack allocations were parameterized with type-level numbers, it would be fairly straightforward (ignoring all the general complexity of type-level numbers), but this RFC doesn't propose that.
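
The type-level-number variant can be sketched with today's const generics (which postdate this thread): once the length is in the type, the caller knows the size statically and returning by value is ordinary — no alloca involved.

```rust
// Sketch only: with the length N known at the type level, the callee's
// "allocation" is just a fixed-size array the caller can receive by value.
fn zeroed<const N: usize>() -> [u32; N] {
    [0u32; N]
}
```

The runtime-sized case is exactly what this trick cannot express, which is the gap the RFC targets.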

compiler would become somewhat more complex (though a simple incomplete escape analysis implementation is already in
[clippy](https://github.com/Manishearth/rust-clippy)).
# Unresolved questions

@sfackler

sfackler Dec 6, 2016

Member

C's alloca can't be used inside of a function argument list - would we need the same restriction or would we handle that properly?

@jmesmon

jmesmon Dec 7, 2016

My understanding is that that limitation exists primarily to allow naive single-pass compilers to exist (along with some "interesting" ways of implementing alloca()). I don't think that concern would apply to Rust.

@eddyb

eddyb Dec 7, 2016

Member

Using a function for this would be a mistake IMO. It's not a true function in C either, for various reasons.

&[x; n] producing &[T] for runtime values of n is more appealing from an implementation perspective.
That is, LLVM's stack variables (its alloca instruction, which is poorly named but whatever) have an optional "dynamic arity", which can be considered to be 1 for regular variables.

This model is closer to VLAs than alloca, and it could work well with box [x; n] too (which would evaluate n first, allocate, then initialize the slice with copies of x, and produce Box<[T]>, Rc<[T]> etc).
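
The `Box<[T]>` half of this already has a stable approximation today (heap-allocated, unlike the proposed stack form); a sketch of the evaluation order described above, with a hypothetical helper name:

```rust
// Evaluate n, allocate, fill with copies of x, produce Box<[T]> --
// a heap-based stand-in for the proposed `box [x; n]` with runtime n.
fn boxed_repeat(x: u8, n: usize) -> Box<[u8]> {
    vec![x; n].into_boxed_slice()
}
```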

@steveklabnik

steveklabnik Dec 7, 2016

Member

I do know some people are very wary of alloca/vlas in rust, but I can't remember who....

@burdges

burdges Dec 7, 2016

It's worth being careful to avoid accidentally considering any constant n as runtime at the same time though, so fixing issues like rust-lang/rust#34344 and worrying about more complex constant expressions. As otherwise Foo<&[x; n]> becomes messy and fixing it risks breakage.

@eddyb

eddyb Dec 7, 2016

Member

@burdges We already have an analysis which treats rust-lang/rust#34344 correctly - the problem there is the evaluator (which is AST-based and pre-typeck, this will change next year hopefully), not the qualifier.

A simpler and far more problematic example would be something like if false { 0 } else { 1 } - and I have to agree here, as much as I'd like box [x; n] for runtime n (even w/o VLAs).

@eddyb

eddyb Dec 7, 2016

Member

@steveklabnik That might've been @cmr.

# Drawbacks
[drawbacks]: #drawbacks
- Even more stack usage means the dreaded stack limit will probably be reached even sooner. Overflowing the stack space

@Stebalien

Stebalien Dec 7, 2016

Contributor

On the flip side, this might reduce stack usage for users of ArrayVec and those manually allocating overly-large arrays on the stack (I occasionally do this when reading small files).

@japaric

japaric Dec 7, 2016

Member

Could we get an example of how one might use this reserve function in the RFC? I can't tell just from its signature. It basically returns a &[T], so you can't mutate it? (Unless it's supposed to be used only with Cell types?) And it seems to me that the function lets you read uninitialized memory (mem::reserve::<Droppable>(2)[0]), yet it's not marked as unsafe?

text/0000-alloca.md
`StackSlice`'s embedded lifetime ensures that the stack allocation may never leave its scope. Thus the borrow checker
can uphold the contract that LLVM's `alloca` requires.

@jhjourdan

jhjourdan Dec 7, 2016

I don't quite see why the type of reserve would prevent StackSlice from escaping the scope where reserve is called. More precisely, it seems like the user could define:

fn reserve_and_zero<'a>(elements: usize) -> &'a [u32] {
    let s = &mut *reserve::<u32>(elements);
    for x in s.iter_mut() { *x = 0 }
    s
}

Which would be invalid if it is not inlined.

@rkruppe

rkruppe Dec 7, 2016

Contributor

Yes, the signature proposed here is completely unsound. Since 'a is a lifetime parameter not constrained by anything, any call to reserve can simply pick 'a = 'static.
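
In miniature (with the proposed reserve stubbed out as a hypothetical), the problem is that nothing ties 'a to the callee's frame, so the caller can pick 'a = 'static:

```rust
// Stub of the proposed signature; the body is irrelevant here, only
// the unconstrained lifetime parameter matters.
fn reserve<'a, T>(_elements: usize) -> &'a mut [T] {
    unimplemented!()
}

// This type-checks: 'a is freely chosen to be 'static, so a "stack"
// slice escapes with a static lifetime -- the unsoundness in question.
fn escape() -> &'static mut [u32] {
    reserve(4)
}
```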

text/0000-alloca.md
- `mem::with_alloc<T, F: Fn([T]) -> U>(elems: usize, code: F) -> U` This has the benefit of reducing API surface, and
introducing rightwards drift, which makes it more unlikely to be used too much. However, it also needs to be
monomorphized for each function (instead of only for each target type), which will increase compile times.
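
A hedged sketch of how the closure-based variant might look in use. The excerpt writes `F: Fn([T]) -> U`; a workable shape takes `&mut [T]`. Since `with_alloc` does not exist, it is modeled here with a heap-backed stand-in purely to exercise the API shape — the point being that the buffer cannot outlive the call:

```rust
// Stand-in for the proposed mem::with_alloc (not a real API): the
// scratch buffer lives only for the duration of the closure call.
fn with_alloc<T: Default + Clone, U, F: FnOnce(&mut [T]) -> U>(elems: usize, code: F) -> U {
    let mut buf = vec![T::default(); elems]; // real proposal: stack memory
    code(&mut buf)
}

fn sum_of_scratch() -> u32 {
    with_alloc(4, |scratch: &mut [u32]| {
        for (i, x) in scratch.iter_mut().enumerate() {
            *x = i as u32;
        }
        scratch.iter().sum() // 0 + 1 + 2 + 3
    })
}
```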

@oli-obk

oli-obk Dec 7, 2016

Contributor

I think this is the only solution that will really work, since otherwise you can always use the reserve intrinsic to create a slice with a 'static lifetime

text/0000-alloca.md
monomorphized for each function (instead of only for each target type), which will increase compile times.
- dynamically sized arrays are a potential solution, however, those would need to have a numerical type that is only
fully known at runtime, requiring complex type system gymnastics.

@oli-obk

oli-obk Dec 7, 2016

Contributor

The generalization of this feature is to allow unsized locals (which simply allocate stack space for the size given by the memory they are moved out of).

To prevent accidental usage of this, a language item StackBox<T: ?Sized> could be created, which alloca's all its memory depending on size_of_val. The problem is that this would need to run code when moving, which is unprecedented in Rust.

@eddyb

eddyb Dec 7, 2016

Member

@burdges Oh I remember now, it's just that it's been a while since I thought about this.
The solution is to only do this if the expected type is &(mut) [T] to begin with and still require constants otherwise. This is one of the rarer cases where type ascription makes a noticeable difference, I suppose.
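
In today's terms the rule might look like this (the runtime-length form itself is hypothetical future syntax, so the sketch only shows the two expected-type situations with a constant length):

```rust
// With a constant length, [x; N] keeps its fixed-size array type; the
// proposal would accept a runtime length only where the expected type
// is already a slice reference, as with `s` below.
fn expected_type_demo() -> usize {
    const N: usize = 4;
    let a = [0u8; N]; // type: [u8; 4] -- stays an array
    let s: &[u8] = &[0u8; N]; // expected type &[u8]: where runtime n would go
    a.len() + s.len()
}
```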

@tomaka

tomaka Dec 7, 2016

SmallVecs work well enough and have the added benefit of limiting stack usage.

As I have already said in an issue somewhere, the assembly generated by SmallVec is very disappointing and saying that it "works well enough" is not really true.

For example this Rust code:

extern crate smallvec;
use smallvec::SmallVec;

extern {
    fn foo(data: *const u32);
}

pub fn test() {
    let mut a: SmallVec<[u32; 4]> = SmallVec::new();
    a.push(5);
    a.push(12);
    a.push(94);
    a.push(1);

    unsafe { foo(a.as_ptr()) }
}

...generates the assembly below:
(compiled with cargo rustc --release -- --emit=asm with the 2016-11-05 nightly and smallvec v0.3.1)

	pushq	%rbp
	subq	$80, %rsp
	leaq	80(%rsp), %rbp
	movq	$-2, -8(%rbp)
	xorps	%xmm0, %xmm0
	movups	%xmm0, -36(%rbp)
	movaps	%xmm0, -48(%rbp)
	leaq	-48(%rbp), %rcx
	movl	$5, %edx
	callq	_ZN36_$LT$smallvec..SmallVec$LT$A$GT$$GT$4push17hcb9707aa0afc3e57E
	leaq	-48(%rbp), %rcx
	movl	$12, %edx
	callq	_ZN36_$LT$smallvec..SmallVec$LT$A$GT$$GT$4push17hcb9707aa0afc3e57E
	leaq	-48(%rbp), %rcx
	movl	$94, %edx
	callq	_ZN36_$LT$smallvec..SmallVec$LT$A$GT$$GT$4push17hcb9707aa0afc3e57E
	leaq	-48(%rbp), %rcx
	movl	$1, %edx
	callq	_ZN36_$LT$smallvec..SmallVec$LT$A$GT$$GT$4push17hcb9707aa0afc3e57E
	cmpl	$1, -40(%rbp)
	leaq	-36(%rbp), %rcx
	cmoveq	-32(%rbp), %rcx
	callq	foo
	cmpl	$1, -40(%rbp)
	jne	.LBB4_5
	movq	-24(%rbp), %rdx
	testq	%rdx, %rdx
	je	.LBB4_8
	movq	-32(%rbp), %rcx
	shlq	$2, %rdx
	movl	$4, %r8d
	callq	__rust_deallocate
	jmp	.LBB4_8
	leaq	-32(%rbp), %rax
	movl	$1, -40(%rbp)
	xorps	%xmm0, %xmm0
	movups	%xmm0, (%rax)
	addq	$80, %rsp
	popq	%rbp
	retq

And this is the _ZN36_$LT$smallvec..SmallVec$LT$A$GT$$GT$4push17hcb9707aa0afc3e57E function that is called four times:

	pushq	%r15
	pushq	%r14
	pushq	%rsi
	pushq	%rdi
	pushq	%rbp
	pushq	%rbx
	subq	$40, %rsp
	movl	%edx, %r14d
	movq	%rcx, %rsi
	movl	8(%rsi), %ebx
	cmpl	$1, %ebx
	movl	$4, %ebp
	cmoveq	24(%rsi), %rbp
	movq	(%rsi), %rax
	cmpq	%rbp, %rax
	jne	.LBB1_1
	leaq	(%rbp,%rbp), %rax
	cmpq	$1, %rax
	movl	$1, %r15d
	cmovaq	%rax, %r15
	cmpq	%r15, %rbp
	ja	.LBB1_11
	movl	$4, %ecx
	movq	%r15, %rax
	mulq	%rcx
	jo	.LBB1_12
	movl	$1, %edi
	testq	%rax, %rax
	je	.LBB1_6
	movl	$4, %edx
	movq	%rax, %rcx
	callq	__rust_allocate
	movq	%rax, %rdi
	testq	%rdi, %rdi
	je	.LBB1_13
	leaq	12(%rsi), %rdx
	cmpl	$1, %ebx
	cmoveq	16(%rsi), %rdx
	shlq	$2, %rbp
	movq	%rdi, %rcx
	movq	%rbp, %r8
	callq	memcpy
	cmpl	$1, 8(%rsi)
	jne	.LBB1_9
	movq	24(%rsi), %rdx
	testq	%rdx, %rdx
	je	.LBB1_9
	movq	16(%rsi), %rcx
	shlq	$2, %rdx
	movl	$4, %r8d
	callq	__rust_deallocate
	movl	$1, 8(%rsi)
	movq	%rdi, 16(%rsi)
	movq	%r15, 24(%rsi)
	movq	(%rsi), %rax
	jmp	.LBB1_10
	leaq	12(%rsi), %rdi
	cmpl	$1, %ebx
	cmoveq	16(%rsi), %rdi
	movl	%r14d, (%rdi,%rax,4)
	incq	(%rsi)
	addq	$40, %rsp
	popq	%rbx
	popq	%rbp
	popq	%rdi
	popq	%rsi
	popq	%r14
	popq	%r15
	retq

(I stripped out the labels in order to make it more readable).

As you can see, this is really suboptimal, both in terms of performance (number of instructions executed) and binary size. At this point I'm even wondering whether a plain Vec wouldn't be faster.
A solution using alloca would instead just do a sub esp followed by four movs and you're done: five instructions.

In case it still needed proving, this is for me the main motivation for having alloca in Rust.

@whitequark

whitequark Dec 7, 2016

SmallVecs work well enough and have the added benefit of limiting stack usage.

I am unconvinced this is a benefit. When properly implemented, alloca is safe, extremely cheap, has nice locality and aliasing guarantees and even avoids an extern crate alloc;. In functional-ish languages you usually have a generational GC where the young heap has most of these nice properties but I do not really expect to see one in Rust any time soon, much less an efficient implementation.

@burdges

burdges Dec 7, 2016

Interesting @eddyb so fn g() { let mut x : &[T] = &[T::new(); f()]; ... } should not compile without the : &[T] if the ... only uses x for scratch space and does not return it or anything?

Actually I suppose it'd eventually compile if f() is constant somehow.

@llogiq

llogiq Dec 7, 2016

Contributor

Thanks everyone for the great comments! I'm going to follow @oli-obk 's advice and change to the closure based version. Need some time to do it, though... 😄

@eddyb

eddyb Dec 7, 2016

Member

@llogiq I'm trying to tell you that anything using functions is unnecessarily suboptimal from an implementation perspective. The feature LLVM has is effectively VLAs. clang implements alloca(n) as something like &((char[n]) {})[0] (not sure if that literally works in C. char a[n]; in any case).

if the ... only uses x for scratch space and does not return it or anything

@burdges I'm not sure what you mean here, you can never return a stack local and rvalue promotion is backwards compatible in that expanding the set of constants only allows more code to compile.

@whitequark

whitequark Dec 8, 2016

@llogiq Apart from @eddyb's concerns, which I also support, using a closure will make this feature significantly less useful. For example then you could not call alloca in a loop while accumulating the results, which I would like to rely on for writing zero-heap-allocation parsers.

Also, in general I am opposed to deliberately decreasing usability in arbitrary ways because of your (or anyone else's) subjective opinion of whether it is "considered bad". You are not making a tradeoff with some other feature, as it usually goes in language design, you are simply making my life harder for no reason at all.

@whitequark

whitequark Dec 8, 2016

Now, a constructive suggestion: it looks like there is no existing way to reflect the lifetime of alloca exactly, and I agree with other commenters that introducing VLAs seems like an excessively invasive change for a fairly niche use case. How about making it a built-in macro?

let n = 10;
for i in 1..n {
    let ary: &[u8] = alloca![0; n];
}

This is nicely similar to the existing case of vec![] and I believe it could return a value with any lifetime it wants, especially given that in this case the lifetime is simply fully lexical (the entire containing function).

@eddyb

eddyb Dec 8, 2016

Member

@whitequark While that could work, doing it with an intrinsic is more problematic than having it first-class.

Hmm, we can make the macro encapsulate the unsafety by using a helper function that takes a reference to some other stack local and equates the lifetimes, only problem being that I'm not sure how to create something like that which lives long enough to be useful. May need to abuse some of the rvalue lifetime rules to get it.

EDIT: I suppose there's always the possibility of making the intrinsic generic over a lifetime that the compiler restricts artificially.

@whitequark

whitequark Dec 8, 2016

I suppose there's always the possibility of making the intrinsic generic over a lifetime that the compiler restricts artificially.

Yes, that's what I was thinking about. The intrinsic would merely translate to alloca of the right type directly, and the lifetime will be restricted early (you could do it right after you introduce lifetimes into the AST, really). I think this is a very nonintrusive solution.


llogiq commented Dec 8, 2016

@eddyb Ok, so it would need to be a compiler-internal macro. That's a neat solution to both the lifetime problem and the problem of extending the compiler with as little fuss as possible.

One thing I should probably add – because this is low-level, performance-über-alles, we'd likely not want to initialize the memory, so we could expand to mem::uninitialized(), rigging the type as necessary, but the whole thing would be unsafe and should be marked as such. I'm not too deep into macros; do we have unsafe macros?


briansmith commented Dec 8, 2016

> One thing I should probably add – because this is low-level, performance-über-alles, we'd likely not want to initialize the memory, so we could expand to mem::uninitialized(), rigging the type as necessary, but the whole thing would be unsafe and should be marked as such. I'm not too deep into macros; do we have unsafe macros?

No, this isn't necessarily the case. There are different motivations for using alloca: performance, avoiding the heap (because there might not be one), maybe others. I want to use this alloca facility, but I still zero the memory first, and in particular I don't want to have to use unsafe for it, as avoiding unsafe is a key selling point for what I'm building.

I'd suggest instead that the feature should allow the user to somehow specify the value to fill the memory with. Then the user could pass unsafe { mem::uninitialized() } if it doesn't want the memory to be zeroed and if it is OK with using unsafe, or it could use 0u32 or whatever the element type is, if it wants it to be zeroed.
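One possible shape of what @briansmith describes can be sketched on stable Rust: the caller supplies the fill value, so the interface itself stays safe. Here a heap-backed Vec stands in for the stack allocation (alloca does not exist in Rust today), and with_stack_slice is a hypothetical name, not part of any proposal.

```rust
// Hypothetical API sketch: the caller chooses the fill value, so the
// interface stays safe. A Vec stands in for the real stack allocation.
fn with_stack_slice<T: Clone, R>(n: usize, fill: T, f: impl FnOnce(&mut [T]) -> R) -> R {
    let mut backing = vec![fill; n]; // a real implementation would reserve stack space here
    f(&mut backing)
}

fn main() {
    // 4 elements, all initialized to 1: their sum is 4.
    let sum: u32 = with_stack_slice(4, 1u32, |s| s.iter().sum());
    assert_eq!(sum, 4);
}
```

A caller who wants uninitialized memory could still pass unsafe { mem::uninitialized() } as the fill, which is the division of responsibility suggested above.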


rustonaut commented Dec 8, 2016

I agree with @briansmith on the safety aspect. I think a safe interface should be provided as the main interface, with an additional way to have it uninitialized.

Though passing in mem::uninitialized() might not work, depending on how it is normally initialized (initializing through a closure would call mem::uninitialized() n times and copy the result to the place on the stack, which should get optimized away, I guess?; initializing with Clone would try to clone uninitialized memory; and initializing with Copy might be too restrictive...).

So maybe alloc![...] and unsafe alloc_uninitialized![...].


comex commented Dec 9, 2016

Ideally the initializing version would take an iterator so it could be initialized with something other than copies of the same value.


briansmith Dec 9, 2016

> Ideally the initializing version would take an iterator so it could be initialized with something other than copies of the same value.

Ideally that would be true for let x: [T; N] = <iterator>; too, but it isn't, and we get by. If you specify it as an iterator, then you have to deal with the case where the iterator yields a different number of elements than what was requested, which seems like too much complexity for this.



eddyb commented Dec 9, 2016

@comex That would be collecting into a "VLArrayVec" (which needs to handle partial drop in case of panic, on top of what @briansmith already mentioned).


rustonaut commented Dec 9, 2016

I think the main stopper for initializing [T; N] with something other than Copy is not having numbers as generic arguments; otherwise you could enable <[Foo; 10]>::from(|| Foo::new()) and <[Foo; 10]>::try_from(&mut iter) for iterators (where the error type represents a partially initialized slice, containing information about how many elements were initialized, an unsafe method to retrieve the partially initialized array, a way to continue initialization, etc.).

But implementing this for alloca/reserve would not work, as it is not really a method, so you would have to implement it with multiple macros (or a "multi branch" macro). E.g.: alloca![0u32; 10], alloca_from_fn![|| Foo::new(); 10], alloca_from_iter![iter; 10].

I'm not sure, but I think there was some work to properly namespace macros, so it would be possible to have something like stack::reserve::by_value![v; n], stack::reserve::from_fn![...], etc.

Note that the above-mentioned try_from/from implementations would have to use catch_unwind to prevent a partially initialized slice from being dropped fully; I'm not sure what the overhead of this would be after optimizations.

Edit: there is already an additional way to initialize arrays: <[Foo; 10]>::default() if Foo implements Default, hope it handles panics in default correctly ;=) <-- it does, tried it out

Edit2: is there a reason why Default is implemented via macros for arrays up to length 32, but e.g. From/TryFrom isn't?
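The Edit above can be checked directly: arrays of a Default element type implement Default elementwise, but (at the time of this thread) only up to length 32, via per-length impls generated by a macro, which is what Edit2 asks about.

```rust
// Arrays pick up Default elementwise; the impls are generated per length
// (up to 32 at the time of this discussion).
fn zeroed32() -> [u64; 32] {
    <[u64; 32]>::default()
}

fn main() {
    let a = zeroed32();
    assert_eq!(a.len(), 32);
    assert!(a.iter().all(|&x| x == 0));
    // <[u64; 33]>::default() would not compile: no impl beyond length 32.
}
```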


burdges commented Dec 9, 2016

Is there any need to initialize an alloca this way? About the only reason I can imagine is that you want an immutable array of either variable or fixed length. In that case, you can always initialize it mutably and then borrow it immutably under the desired name. I suppose the syntax would be roughly

let mut arr_mut = [Default::default(); n];
for i in iter { arr_mut[i] = ...; }
let arr : &[u8] = &arr_mut;

or

let arr_mut : &mut [T] = alloca![Default::default(); n];
for i in iter { arr_mut[i] = ...; }
let arr : &[u8] = arr_mut;
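The "initialize mutably, then borrow immutably under the desired name" pattern sketched above can be run on stable Rust with a Vec standing in for the variable-length stack allocation; init_then_freeze is an illustrative name.

```rust
// Initialize through a mutable binding, then reborrow immutably; a Vec
// stands in for the proposed alloca![Default::default(); n].
fn init_then_freeze(n: usize) -> usize {
    let mut arr_mut = vec![u8::default(); n];
    for (i, e) in arr_mut.iter_mut().enumerate() {
        *e = i as u8;
    }
    let arr: &[u8] = &arr_mut; // from here on, only the immutable view is used
    arr.iter().map(|&b| b as usize).sum()
}

fn main() {
    assert_eq!(init_then_freeze(8), 28); // 0 + 1 + ... + 7
}
```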

llogiq commented Dec 9, 2016

@briansmith @comex @dathinab I agree that we may have multiple motivations to use this prospective feature, each with its own cost-benefit tradeoffs.

I think we are in a different situation from initializing a Vec, because external code has no way of accessing the slice before we're done with it, so we actually have three options for dealing with a panic during initialization:

  1. We can simply mem::forget() what we have so far and propagate the panic to the caller. This should be a no-op; the program is broken at that point anyway, so there's no need for cleanup.
    1.1 We can also drop() what we have so far and then propagate the panic. However, this would swallow the original panic in case the drop() implementation panics, or we'd have to swallow panics from the drop. Both options are suboptimal.
  2. Return a Result<&mut [T], Box<Any + Send + 'static>> (which is pretty much #1 + a plain catch_unwind + try!/?).
  3. Return a (&mut [T], Result<(), Box<Any + Send + 'static>>) containing the slice filled so far and the optional error. This would have the benefit of allowing the caller to use what has been initialized so far while still providing error information. This could be implemented with the uninitialized_stack_slice!(ty, num) macro, catch_unwind(..) and a bit of machinery. Note that this version is strictly more useful than #2.

I'm personally in favor of option 1, mostly because I don't want to complicate the implementation (and the return type); also, catching panics is an antipattern anyway. With an iterator we can still restrict the returned slice to the actually initialized memory.

Regarding initialization, the following benchmark:

#![feature(test)]
extern crate test;

use test::Bencher;
use std::mem;

#[bench]
fn once(b: &mut Bencher) {
    b.iter(|| {
        let x : &[usize; 4096] = unsafe { mem::uninitialized() };
        x
    });
}

#[bench]
fn each(b: &mut Bencher) {
    b.iter(|| {
        let mut x : [usize; 4096] = unsafe { mem::uninitialized() };
        for mut e in (&mut x[..]).iter_mut() {
            *e = unsafe { mem::uninitialized() };
        }
        x
    });
}

shows 0ns for both once and each when compiled with optimizations (though there may still be a flaw in the benchmark). So writing an iterator that simply returns mem::uninitialized() is probably just as fast and would allow us to do the right thing with only one macro. Even so, I'd personally be happier with one unsafe-fast and one safe-still-fast version of the macro.

@burdges reading uninitialized memory provokes undefined behavior.
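Option 3 from the list above can be prototyped today on a heap-backed buffer; fill_catching is an illustrative name, and a Vec replaces the proposed uninitialized stack slice. Initialization runs under catch_unwind, and the caller gets whatever was filled before a panic, plus a completion flag.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

// Sketch of option 3: return the elements initialized so far together
// with an indication of whether initialization completed.
fn fill_catching<I: Iterator<Item = u32>>(n: usize, mut it: I) -> (Vec<u32>, bool) {
    let mut buf = Vec::with_capacity(n);
    let completed = catch_unwind(AssertUnwindSafe(|| {
        for _ in 0..n {
            // A panicking iterator (or one that runs dry) aborts the fill,
            // but everything pushed so far stays in `buf`.
            buf.push(it.next().expect("iterator ran dry"));
        }
    }))
    .is_ok();
    (buf, completed)
}

fn main() {
    let (full, ok) = fill_catching(3, 0u32..);
    assert_eq!((full, ok), (vec![0, 1, 2], true));

    // The iterator runs out after 3 elements; the caller still gets them.
    let (partial, ok) = fill_catching(5, 0u32..3);
    assert_eq!((partial, ok), (vec![0, 1, 2], false));
}
```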


djzin commented Dec 10, 2016

Threw this function together.... There is a good question as to whether rustc will compile it to anything as efficient as simply subtracting from the stack pointer (I suspect no) but it seems to solve some of the complexity issues.

macro_rules! match_arms {
    ($n: ident, $f: ident, $t: ty; $($ns : expr),*) => {
        match $n {
            $($ns => (|| $f(unsafe {&mut ::std::mem::uninitialized::<[$t; $ns]>()}))(),)*
            _ => panic!("n was too large in alloca"),
        }
    }
}

pub fn alloca<T, F, R>(n: usize, f: F) -> R
where F: FnOnce(&mut [T]) -> R
{
    match_arms!(n, f, T; 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
                17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
                33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
                49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64)
}

You can obviously go up to a way higher number; I have it working up to 1024. There is also a possibility of using it with a T: Default bound (and avoiding the unsafe), but unfortunately you can only go up to 32 in that case because of Rust limitations.

You would use it like so:

#[test]
fn allocate_and_sum() {
    for i in 1..64 {
        let x: usize = alloca(i, |arr| {
            for (j, elem) in arr.iter_mut().enumerate() {
                *elem = j;
            }
            arr.iter().cloned().sum()
        });
        assert_eq!(x, i * (i - 1) / 2);
    }
}

eddyb commented Dec 10, 2016

@djzin Sadly, it doesn't optimize (select Release and then either LLVM IR or ASM).
In fact, it uses the largest array size as the static stack size, so there's no gain over ArrayVec.


djzin commented Dec 10, 2016

@eddyb

Shame that it doesn't optimize - at least this pushes the problem to one of optimization rather than adding a new language construct.

> In fact, it uses the largest array size as the static stack size, so there's no gain over ArrayVec.

Is this correct? In my testing it was not doing this; that's why I added the closure invocation (|| f(...))() - I do get stack overflow sometimes, but only when actually passing in a large number into alloca - for example:

Works:

#[test]
fn allocate_and_sum_large() {
    for i in 1..8 {
        let x = alloca(i, |arr: &mut [[usize; 65536]]| {});
    }
}

SIGSEGV:

#[test]
fn allocate_and_sum_huge() {
    for i in 1..64 {
        let x = alloca(i, |arr: &mut [[usize; 65536]]| {});
    }
}

Edit: you can see this clearly in this link; it segfaults after getting to 16 (the playground does not report the error, but it does stop printing)

https://play.rust-lang.org/?gist=6a3dc39438c9d42524a85724dca63450&version=stable&backtrace=1


whitequark Dec 10, 2016

> Shame that it doesn't optimize - at least this pushes the problem to one of optimization rather than adding a new language construct.

It doesn't, as this is less powerful than a proper alloca; the lifetimes are restricted to just the closure body, whereas in reality they are the same as the entire calling function's frame.



djzin commented Dec 10, 2016

@whitequark surely any function defined as such:

fn use_alloca(args: Args) -> Output {
    let x = alloca(n);
    function_body
}

can be rewritten like so

fn use_alloca_closure(args: Args) -> Output {
   alloca(n, |x| function_body)
}

or am I missing something?
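The rewrite above can be exercised concretely; alloca_cps and use_alloca_closure are illustrative names, with a Vec standing in for the stack allocation. The direct-style function body moves into the closure unchanged, which is the equivalence being claimed.

```rust
// Closure-passing style: the function body becomes the closure, and the
// buffer lives in the callee's frame for the closure's duration.
fn alloca_cps<R>(n: usize, f: impl FnOnce(&mut [u64]) -> R) -> R {
    let mut storage = vec![0u64; n]; // stand-in for a real alloca
    f(&mut storage)
}

fn use_alloca_closure(n: usize) -> u64 {
    alloca_cps(n, |x| {
        for (i, e) in x.iter_mut().enumerate() {
            *e = i as u64;
        }
        x.iter().sum()
    })
}

fn main() {
    assert_eq!(use_alloca_closure(5), 10); // 0 + 1 + 2 + 3 + 4
}
```

What this cannot express, per whitequark's objection, is an allocation whose lifetime outlives the closure, e.g. one made inside a loop body but kept alive across iterations.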


briansmith Feb 16, 2017

It seems bad to close this without an RFC to replace it and without any clear description of what's wrong with moving ahead with it as an incremental step towards more general support for dynamically-sized types.

Also, is the RFC actually proposing [T] as unsized types? I think they're just dynamically sized. Maybe now in Rust there's no difference between unsized and dynamically-sized types, but I think maybe there should be: dynamically-sized types are types with values whose sizes aren't known statically at compile-time and which aren't equal (different values of a given dynamically-sized type may have different sizes), where unsized types would be types where there is no way—statically or at runtime—to learn the size, e.g. c_void and any other extern types.

In particular, I think that dynamically-sized types could and should be Clone and even Copy, but unsized types never could be.



eddyb commented Feb 16, 2017

@briansmith A long time ago Rust started using "unsized type" to mean "a type whose values are obtained from values of sized types, after applying unsizing", where "unsizing" is "making the information required to compute alignment and size dynamic".

I agree it's not the best naming scheme, but it's the one that all uses of "unsized" (AFAIK) make sense with (and it happens to also make "truly unsized" an oxymoron).
We could change to "dynamically sized", but we don't have a good verb to go with it. "Reify"?


eternaleye commented Feb 16, 2017

@briansmith: To expand on @eddyb's answer, the term unsized mainly arose because the people naming the feature were, well, operating from the POV of the compiler. And for the compiler, any information that's only available at runtime might as well not exist at all. IIRC, @nikomatsakis remarked on this in one of his posts.

Admittedly, this is suboptimal in some respects, much like how the GNU cross-compilation terminology is misleading when applied to anything but compilers.

It may, however, be early enough yet to address this. In particular, RFC 982 - which would enshrine these terms in the standard library - is not yet stable. It specifies the traits Unsize<T: ?Sized>, whose implementors can unsize to a T (but only the compiler can implement it), and CoerceUnsized<Target>, for smart pointers which coerce "around" a DST.

DST-ifying a type honestly has a very close relationship to type erasure, (erasing the Self parameter of traits, and the length parameter of arrays) and there have been concerns that Unsize<T> "reads the wrong way around" in the past anyway. Thus, let me suggest:

  • EraseTo<T: ?Sized>
  • CoerceErased<Target>

?Sized is already out of the bag, but it may be possible to do better by the word "Unsized".

The thing with "erased types" is that, well, Rust is what erased them - it also puts in enough info to reify them at runtime, specifically in the extra part of the fat pointer. The opaque/extern types of RFC 1861, meanwhile, came to Rust "pre-erased" - thus, it doesn't know how to reify them so as to operate on them, and doesn't need to (more accurately, can't) fatten the pointer with that information.

One neat consequence of this framing is that it does give a rather natural way of going about a TryReifyInto<T> trait, which slices and trait objects could implement - especially once DSTs in argument position are permitted.

It also leaves open for the future the possibility of a trait that given some T: Sized, exposes a *mut T -> (*mut U, usize) function where U is an extern type, and the usize is its reification information. (That or just permit implementing EraseTo on user-defined Sized types.)
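The "extra part of the fat pointer" mentioned above is directly observable: references to erased (unsized) types carry one extra word of reification information, a length for slices and a vtable pointer for trait objects, while references to sized types are a single word.

```rust
use std::mem::size_of;

// Thin vs. fat pointers: erased types need an extra word to be reified.
fn main() {
    let thin = size_of::<&u8>();                    // one word
    let slice = size_of::<&[u8]>();                 // pointer + length
    let object = size_of::<&dyn std::fmt::Debug>(); // pointer + vtable
    assert_eq!(thin, size_of::<usize>());
    assert_eq!(slice, 2 * thin);
    assert_eq!(object, 2 * thin);
}
```

An extern type from RFC 1861 would be the odd one out here: a reference to it stays thin, because Rust has no reification information to store.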


eddyb commented Feb 16, 2017

FWIW I prefer to generalize CoerceUnsized and thus have a single Coerce trait with a bunch (some builtin?) of impls. As for Unsize, we could invert the relationship and have U: FromSized<T>, perhaps.


joshtriplett commented Feb 23, 2017

Nominating for @rust-lang/lang discussion.


aturon commented Feb 23, 2017

Note: there's now an RFC to subsume this one.


whitequark commented Feb 23, 2017

@aturon Worth mentioning that these RFCs are not equivalent. Unsized rvalues aren't equivalent to alloca and aren't useful for the embedded use cases. That said I imagine #1909, in some form, will be a prerequisite for an eventual implementation of alloca.

@joshtriplett (Member) commented Feb 23, 2017

@whitequark Can you elaborate on cases that this would solve but VLAs wouldn't?

@whitequark commented Feb 23, 2017

@joshtriplett Allocating in a loop.
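To illustrate the distinction (a sketch of mine, not part of the comment): a VLA local dies at the end of its scope, so per-iteration runtime-sized buffers that must outlive the loop body cannot use one; an accumulating alloca ('fn locals, #1847) would keep them on the frame, whereas stable Rust today forces a heap allocation per iteration:

```rust
fn main() {
    // Each iteration allocates a runtime-sized buffer that must outlive
    // the iteration itself; a scoped VLA cannot express this.
    let mut rows: Vec<Box<[u32]>> = Vec::new();
    for n in 1usize..=3 {
        // Heap workaround: box a runtime-length slice.
        rows.push(vec![n as u32; n].into_boxed_slice());
    }
    let total: usize = rows.iter().map(|r| r.len()).sum();
    assert_eq!(total, 6);
    println!("{total}"); // prints 6
}
```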

@briansmith commented Feb 23, 2017

unsized rvalues aren't equivalent to alloca and aren't useful for the embedded use cases

This is overstating things a bit. There's one particular thing that you want to do which neither that proposal (nor the current form of this one, IIUC) supports. It's not clear that that particular thing really needs to be supported, and it doesn't comprise all or even most "embedded" use cases.

However, I think there is a more serious problem with #1909, and maybe also this RFC: the proposed syntax for VLAs hints that they can be used intuitively like arrays [T; n] and vectors Vec<T>, but the restrictions on their usage are quite severe and counterintuitive. This RFC (#1808), reduced to supporting only VLAs as local variables (i.e. not embedded in structs like struct { x: [T] }) and giving the variables slice types, e.g. x: &[i32] = &[0i32; dynamic n];, makes more sense to me than #1909 does.
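For comparison (a sketch of mine, not part of the comment): the closest stable approximation of x: &[i32] = &[0i32; dynamic n] today is to over-allocate a fixed-capacity array and take a runtime-length slice of it, which wastes stack but yields the same slice-typed view:

```rust
fn main() {
    let n = 5; // runtime length (must not exceed the fixed capacity)
    let storage = [0i32; 16]; // fixed upper bound, chosen for illustration
    let x: &[i32] = &storage[..n];
    assert_eq!(x.len(), 5);
    println!("{}", x.len()); // prints 5
}
```

A true VLA would allocate exactly n elements and panic-free reject nothing at runtime; the fixed-capacity version panics if n exceeds the bound.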

@plietar commented Feb 23, 2017

reduced to supporting only VLAs as local variables (i.e. not embedded in structs like struct { x: [T] })

This is already supported AFAIK. The last field of a struct can be a DST, and is itself thus a DST.
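A concrete (hypothetical) illustration of such a tail-DST struct, compiling on stable Rust, where a sized instance unsizes behind a reference:

```rust
// A struct whose last field is unsized is itself a DST; a sized instance
// coerces to it behind a pointer.
struct Header<T: ?Sized> {
    id: u32,
    data: T,
}

fn main() {
    let sized: Header<[u8; 3]> = Header { id: 7, data: [1, 2, 3] };
    let dst: &Header<[u8]> = &sized; // unsizing coercion on the last field
    assert_eq!(dst.data.len(), 3);
    println!("{} {}", dst.id, dst.data.len()); // prints "7 3"
}
```

What is not supported today, and what the RFCs debate, is placing such a value directly in a local without the sized detour.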

@briansmith commented Feb 23, 2017

reduced to supporting only VLAs as local variables (i.e. not embedded in structs like struct Foo { x: [T] })

This is already supported AFAIK. The last field of a struct can be a DST, and is itself thus a DST.

I understand that. But, adding support for stack-allocating values of such structs is, IMO, counterproductive, because of the severe limitations mentioned in #1909: they can't be passed by value (copied/moved) nor returned from functions. Given those limitations, is stack allocation of these types widely useful? Those limitations are dealbreakers for my code. Maybe other people have other use cases for which they'd work, in which case it would be useful to see them.

@joshtriplett (Member) commented Feb 23, 2017

@whitequark Hmmm. You could certainly allocate a VLA in a loop. But as @briansmith notes, you can't copy/move them, which means you can't accumulate them in a loop.

Fair point. That seems worth considering in the discussions between this issue and #1909.

@arielb1 (Contributor) commented Feb 24, 2017

@whitequark

Accumulating alloca (aka 'fn locals) is quite separate in its problems, uses and implications from VLAs (aka this RFC and #1909). Please move that discussion to #1847. If you come up with a sufficiently organized proposal, we might adopt it (or I might write up such a proposal myself when I have the time).

@arielb1 (Contributor) commented Feb 24, 2017

@briansmith

With #1909, you can definitely move rvalues (e.g., you should be able to call Box::new on a local). If we adopt the "unsized ADT expressions" option, you could also put unsized values in structs. If we adopt the Copy: ?Sized option, you could also copy rvalues.

@briansmith commented Feb 24, 2017

With #1909, you can definitely move rvalues (e.g., you should be able to call Box::new on a local). If we adopt the "unsized ADT expressions" option, you could also put unsized values in structs. If we adopt the Copy: ?Sized option, you could also copy rvalues.

That sounds better than I understood from the discussion. Could we return them from functions? In particular, given #[derive(Clone, Copy)] struct Elem { value: [usize] }, can we follow idiomatic patterns like this?:

#[derive(Clone, Copy)]
struct Elem {
    value: [usize]
}

impl Elem {
    pub fn new(...) -> Self { ... }
}

pub fn elem_product(x: &Elem, y: &Elem, m: &Modulus) -> Elem { ... }

pub fn elem_pow(base: Elem, e: i32, m: &Modulus) -> Elem {
    let mut table = [Elem { value: [0; dynamic base.value.len()] }; 5];
    ...
}

If VLAs look like arrays then people will expect all of that kind of stuff to work, especially since it works fine for arrays. If that stuff doesn't work, then a syntax and semantics that looks and feels more like "alloca(), but safe and Rustic" makes more sense, IMO—especially if the reasons for deferring returning VLAs from functions or having arrays of VLA-containing structures are implementation concerns (difficulty modifying LLVM to support them).

@arielb1 (Contributor) commented Feb 24, 2017

There's also a possible way of returning them from functions, but the ABI implications are... quite interesting in a possibly non-portable way (either the called function must allocate stack on the caller's stack, or the caller needs to access memory after the called function returned). #[derive] is probably not going to work because of that, but if your types are Copy you can implement them yourself.

@rfcbot commented Feb 28, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

@leonardo-m commented Mar 8, 2017

It's nice to see Rust take more care of stack allocation. A VLA is usable as a brick to build more complex stack-allocated data structures.

@rfcbot commented Mar 10, 2017

The final comment period is now complete.

@llogiq (Contributor) commented Mar 11, 2017

It is not yet clear to me whether #1909 or this RFC is the way forward, as both seem incomplete to me.

@burdges commented Mar 11, 2017

Ain't clear if you can avoid some incompleteness here, since it'd be nice to future-proof against some rather far out stuff.

As an extreme example, [x; n] could make sense with n being defined at runtime by using dependent types ala #1930 (comment). In essence, our runtime value n could itself be viewed as type level information, but n viewed as an immutable value must outlive n viewed as a type, and n viewed as a type must outlive any type involving it, like [T; n]. If you have fn foo<n: usize>() -> [u8; n] then n gets promoted to a type in the caller, which creates the space with alloca and passes the buffer to foo along with n as an implicit argument.

We clearly want stuff like raw-ish alloca well before anything like this. And I think the syntax let x : &[T] = &[T::default(); n]; is actually forwards compatible with "promote n to a type, make a [T; n], and cast its reference to a slice". I thought this made a nice example though.

@Ericson2314 (Contributor) commented Mar 12, 2017

@burdges yeah that's roughly what we eventually want in principle, but there are so many subtleties in the design that I'd be extremely wary about declaring something forward compatible now.

@nikomatsakis (Contributor) commented Mar 15, 2017

Per the FCP, I am going to close this RFC, in favor of unsized rvalues. Thanks @llogiq for the work you put into it. =)

@whitequark referenced this pull request May 3, 2017: support alloca #618 (open)

@bestouff referenced this pull request Aug 30, 2017: alloca() #2 (open)

@burdges commented May 13, 2018

Is it worth experimenting with making dyn Trait : Sized when the compiler can statically determine all T : Trait? It could provide auto-generated sum types #2414 that address many iterator/future adapter situations, and even allow returning functions when only Sized types arise in their bodies:

fn foo(a: bool) -> impl FnOnce() {
    if a {
        || { ... } as dyn FnOnce()
    } else {
        || { ... } as dyn FnOnce()
    }
}

@eternaleye commented May 13, 2018

@burdges: If a dependency then impls Trait on an unsized type, then that's a semver break. This would basically make unsized types implicitly #[fundamental], as adding an impl may now force semver breaks on dependents.

"If none do X, then Y" - negative reasoning - is a thorny thicket to dive into.

@burdges commented May 13, 2018

Oops, I misspoke, but you read into my mistake more sense than existed. ;) There is no way the compiler could statically determine all types T satisfying FnOnce()!

It requires way more hackery to make this make sense:

fn foo(a: bool) -> impl FnOnce() {
    let f = || { ... };
    let g = || { ... };
    trait MyFn : FnOnce() {}
    impl MyFn for type_of<f> {}
    impl MyFn for type_of<g> {}
    if a {
        f as dyn MyFn
    } else {
        g as dyn MyFn
    }
}

In this code, we know all T satisfying MyFn because no other crate can impl MyFn, so the compiler really could declare dyn MyFn : Sized, but..

I can now answer my own question: we lack a convenient enough way to impl traits for anonymous types to make this an ergonomic solution for any problems under consideration. If the calling convention ever allows returning VLAs then a much better feature falls out naturally without the hackery. If that calling convention incurs any runtime costs, then the compiler could compile appropriate cases to sized returns under the hood.

@eternaleye commented May 13, 2018

@burdges: It's more that I then applied the same reasoning to dyn Trait in argument position, which I failed to adequately state. That issue applies even to your expansion. Sizedness is a matter of API, and implementing a Foo trait on struct Bar { a: [u8] } would thus be a "take-back" of all impls of Foo being sized. If Foo and Bar are in different crates, this can get very messy.

@burdges commented May 14, 2018

A trait exported from any crate could never benefit. I read your point two ways: first, adding impls can now break things within the same crate. Second, even if Trait is not exported, we might still impl Trait for Foreign where Foreign comes from another crate, and hence Foreign might stop being sized in future. Yes, you'd need an explicit bound, ala

trait MyFn : FnOnce()+Sized {}
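Such a Sized supertrait bound compiles today, and turns "every implementor is sized" into an API guarantee rather than negative reasoning over the current set of impls. A small sketch (mine, not from the thread; the blanket impl is illustrative):

```rust
// With `Sized` as a supertrait, no unsized type can ever implement MyFn,
// so a later impl cannot silently take that guarantee back.
trait MyFn: FnOnce() + Sized {}
impl<T: FnOnce() + Sized> MyFn for T {}

fn call(f: impl MyFn) {
    f(); // consuming the closure by value is fine: it is Sized by bound
}

fn main() {
    call(|| println!("ok")); // prints "ok"
}
```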