New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Elide array size #2545

Open
wants to merge 5 commits into
base: master
from

Conversation

Projects
None yet
@llogiq
Copy link
Contributor

llogiq commented Sep 17, 2018

This is a small-ish RFC for array size elision wherever it can be taken directly from the initializer. It is deliberately restricted like this to uphold locality and avoid errors at a distance.

Rendered

@Centril Centril added the T-lang label Sep 17, 2018

Show resolved Hide resolved text/0000-elide-array-size.md Outdated
Show resolved Hide resolved text/0000-elide-array-size.md Outdated
@Centril

This comment has been minimized.

Copy link
Contributor

Centril commented Sep 18, 2018

I'm not opposed to value inference of this kind - in fact it excites me a lot.
However, I am concerned about the ad-hoc / piecemeal nature of doing it in one place and not generally for all type-constructors with const parameters (e.g. struct Foo<const Param: Type> { .. }).
In other words, I think this RFC may not go far enough :)

@llogiq

This comment has been minimized.

Copy link
Contributor

llogiq commented Sep 18, 2018

I can see that there is interest in having more daring inference here. However, as I've pointed out, this can lead to errors from a distance, if one static contains a const down the line whose type changes. Those errors could be hard to track down. At the very least, they'd be surprising. However, we may be able to tweak the error reporting in such a way that such errors are quickly resolved, and that'd be awesome.

For now, similar to the const lifetimes RFC I wrote, let's take a smaller step and elide the length only where it is obvious from the same line of code. This will already give us very real ergonomic wins with very little cost both in readability and teachability (It just occurred to me that I should add a paragraph about error messages), and it doesn't preclude us from aiming for stronger inference later.

For example, you might write:

```rust
const CONST_CHARS: [u8; _] = b"This is really a byte array";

This comment has been minimized.

@SimonSapin

SimonSapin Sep 18, 2018

Contributor

Nit: the expression here needs to be *b"…" (with a star added) because the type of byte string literals is &'static [u8; N]. (Maybe it should have been plain [u8; N] but oh well.)

@Centril

This comment has been minimized.

Copy link
Contributor

Centril commented Sep 18, 2018

@llogiq

I can see that there is interest in having more daring inference here. However, as I've pointed out, this can lead to errors from a distance, if one static contains a const down the line whose type changes. [...]
[...]
elide the length only where it is obvious from the same line of code.

So with this RFC as-is, the following would be an error, right?

fn foo() -> [u8; 9] { /* details elided */ }

fn main() {
    let bar: [u8; _] = foo();
}

If so, this feels even more ad-hoc, non-uniform, and makes me wary of the teachability aspect.

For now, similar to the const lifetimes RFC I wrote, let's take a smaller step [...]. This will already give us very real ergonomic wins [...]

I am specifically concerned with taking small steps; I wouldn't want to stabilize this, leave the language in an inconsistent state in the interim and do the difficult extensions possibly later. I want to be sure that the extensions both can and will happen.

In particular, I expect that the following should work:

struct Matrix<T, const N: usize, const M: usize> {
    data: [[T; N]; M]
}

fn main() {
    let foo: Matrix<u8, _, _> = Matrix { data: [[1, 2, 3], [4, 5, 6]] };
}
@llogiq

This comment has been minimized.

Copy link
Contributor

llogiq commented Sep 18, 2018

That's an interesting case. As for your first example, foo could be in another module or even another crate. If the function changes, you may get an error, if any, at the use site of bar, which may be at yet another different part of the code.

As such I worry, that your proposal could lead to errors that are extremely hard to track down. So unless you can show me an algorithm to produce error messages for that case that makes those errors easy to fix, I remain unconvinced.

I don't agree with your teachability argument either. If the size is trivially inferrable from the initializer, wildcards are allowed. Otherwise insert the size.

@Centril

This comment has been minimized.

Copy link
Contributor

Centril commented Sep 18, 2018

@llogiq

As for your first example, foo could be in another module or even another crate. If the function changes, you may get an error, if any, at the use site of bar, which may be at yet another different part of the code.

This is no different from using _ for types.
Do note that it is currently valid to write:

// crate A:
pub fn producer() -> [u8; 9] { /* details elided */ }

// crate B:
pub fn consumer(arr: [u8; 9]) { /* details elided */ }

// crate C:
fn main() {
    let bar = producer(); // No type annotations.
    consumer(bar);
}

Thus, by writing let bar: [u8; _] we have actually provided more information than what the compiler already requires of us.

As such I worry, that your proposal could lead to errors that are extremely hard to track down. So unless you can show me an algorithm to produce error messages for that case that makes those errors easy to fix, I remain unconvinced.

cc @varkor

As far as I know, most dependently typed languages can do this sort of in-function inference (or the languages would be intolerable to write in) and so it should not be novel in any way.

I don't agree with your teachability argument either. If the size is trivially inferrable from the initializer, wildcards are allowed. Otherwise insert the size.

It is the "otherwise insert the size" that is surprising, compared to how inference currently works for types coupled with the fact that you are allowed to write let bar = producer();, and leads me to believe than users will inevitably hit such errors. This forces users to learn new rules, whereas if we had a more general value inference mechanism, there would be fewer corner cases and caveats in the language. I've lain out a more general philosophy about the value of uniformity in language design here.

@durka

This comment has been minimized.

Copy link
Contributor

durka commented Sep 19, 2018

To continue with my tradition of macro implementations of RFCs (except in this case I wrote the macro two years ago)... https://crates.io/crates/counted-array

Clearly, I think this is a good idea. :)

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Sep 19, 2018

👍 I like the idea of handling the simple case people constantly run into first. I've wanted this many times.

glandium added a commit to glandium/sccache that referenced this pull request Sep 19, 2018

Use the counted_array macro for the ARGS arrays
One of the most annoying thing from using static arrays in rust is that
there is no size inference, and you end up having to give the proper
size. Which makes updates to the arrays cumbersome.

I was reading "This Week in Rust 252", which linked to (RFC: Elide array
size)[rust-lang/rfcs#2545], set to address this
very problem, and @durka linked to his counted-arrays crate, which
already existed and essentially implements the idea behind the RFC.
@djc

This comment has been minimized.

Copy link
Contributor

djc commented Sep 19, 2018

I like this RFC a lot. Having to maintain the array length manually is boilerplate causing unnecessary work -- counting things is something computers are good at -- I know this would have samed me some edit-compile cycles.

@llogiq

This comment has been minimized.

Copy link
Contributor

llogiq commented Sep 19, 2018

@Centril I am sympathetic to your interest in uniformity and wouldn't be against inference for partial type ascription in let bindings (I note that your examples are all such bindings), but I don't know how complex the implementation would get.

I'm still defending the principle of locality for consts & statics. We want to make the code easier to write, but not harder to read and especially not harder to change later!

@earthengine

This comment has been minimized.

Copy link

earthengine commented Sep 20, 2018

I would like to have this. For the concerns @Centril have, I propose the following rule:

If a type annotation is required (for example top level const or function) and the item is not public, and if the type can be inferenced, _ can be used in the array size position

so

fn foo() -> [u8; 9] { /* details elided */ }

fn main() {
    let bar: [u8; _] = foo();
}

is not legal as bar do not require a type annotation. Instead, foo can return [u8; _] if it can be derived from the return statement. However, this is not allowed if foo is pub.

We need the rule for pub because, we want the code reader know exactly what the type is if it can be accessed from other places. If it can only be accessed locally, the reader should be able to pick up quickly.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Sep 26, 2018

I don't believe this RFC is necessary - it is subsumed by const generics, perhaps other than the surface syntax. The ability to infer constants in types is essential to const generics, particularly when used in impls (but also to keep the design and implementation uniform across liftimes, types and consts).

rust-lang/rust#53645 has already been open for just over a month now, and it already contains the needed implementation work (minus the surface syntax of _).

Nominating for @rust-lang/lang meeting, with a preference to close.

@eddyb eddyb added the I-nominated label Sep 26, 2018

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Sep 26, 2018

To be clear, my comment above is about inference wherever we accept inference today.

For the types of const / static to be determined from the initializer, this decision would be affected: #2010 (comment) - I see no mention of that in this RFC.

Note that said decision interacts with this RFC two-fold:

Array literals where you have to say the length const Foo: [u8; 3] = [...]; often you can do const Foo: &[u8] = &[...] instead here though.

Perhaps something syntactic like @petrochenkov suggested would be better, but that feels kind of scary to me. It's not something we have precedent for. I don't think we have a good idea what it's corner cases are, or how it would feel to have two distinct kinds of inference co-existing in the compiler.

That former shows why this is not as pressing of a problem, while the latter talks about pretty much the same strategy taken by this RFC.

OTOH, the original RFC stands some chance of being revived, with perhaps some restrictions around integer fallback, and ignoring "the closure problem" (#2010 (comment)).

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Sep 27, 2018

Based on discussion in @rust-lang/lang, we agreed that (based on @eddyb's comments above) const generics addresses the implementation of this, so the RFC as written is not quite what we're looking for.

However, we do think that there's an RFC needed to explicitly allow inferring the const values in the types of const values, in specified circumstances, including array sizes.

@llogiq, would you be open to restructuring this RFC into that, and referencing the const generics work as how this will be implemented?

@Centril

This comment has been minimized.

Copy link
Contributor

Centril commented Sep 27, 2018

I'm still defending the principle of locality for consts & statics. We want to make the code easier to write, but not harder to read and especially not harder to change later!

allow inferring the const values in the types of const values, in specified circumstances, including array sizes.

I think perhaps the problem you are getting at @llogiq with local reasoning is in particular with inference for items defined elsewhere. A more precise mechanism to get at that problem could be to require explicit type ascription for const values (and perhaps types) that arise from const fn function calls referring to functions defined in X and other const values defined in X. Here X could perhaps be "everywhere" or perhaps "other modules". This would also allow you to write things such as: const FOO: Bar<_> = Bar(make_foo() : Foo<3>);.

@llogiq

This comment has been minimized.

Copy link
Contributor

llogiq commented Sep 28, 2018

@joshtriplett will do. I'll just have to understand #2000 and #2010 first 😄

[drawbacks]: #drawbacks

There is a modicum of complexity to extend the parser and AST which needs to be
done for every Rust parser in existenct. However, the feature is minor, so the

This comment has been minimized.

@oli-obk

oli-obk Oct 2, 2018

Contributor

existenct -> existence

[summary]: #summary

In arrays, allow to elide the size in the type and put an underscore there
instead if it can be deduced from the initializer. For example: `static

This comment has been minimized.

@oli-obk

oli-obk Oct 2, 2018

Contributor

I think the summary should already mention that we only want this inference if the length can be trivially obtained by either counting the elements of the array initializer or taking it verbatim from a repeat expression.

needs to count the elements of an array manually to determine its size. Letting
the compiler find the size reduces the potential for error. It also allows us
to ascribe the array component type without requiring the size (which is –
perhaps surprisingly – not currently allowed).

This comment has been minimized.

@oli-obk

oli-obk Oct 2, 2018

Contributor

I have seen many situations where users just gave up and used const X: &[u8] = &[4, 5, 6]; in order to not need to specify the length.

luser added a commit to glandium/sccache that referenced this pull request Oct 9, 2018

Use the counted_array macro for the ARGS arrays
One of the most annoying thing from using static arrays in rust is that
there is no size inference, and you end up having to give the proper
size. Which makes updates to the arrays cumbersome.

I was reading "This Week in Rust 252", which linked to (RFC: Elide array
size)[rust-lang/rfcs#2545], set to address this
very problem, and @durka linked to his counted-arrays crate, which
already existed and essentially implements the idea behind the RFC.

luser added a commit to mozilla/sccache that referenced this pull request Oct 9, 2018

Use the counted_array macro for the ARGS arrays
One of the most annoying thing from using static arrays in rust is that
there is no size inference, and you end up having to give the proper
size. Which makes updates to the arrays cumbersome.

I was reading "This Week in Rust 252", which linked to (RFC: Elide array
size)[rust-lang/rfcs#2545], set to address this
very problem, and @durka linked to his counted-arrays crate, which
already existed and essentially implements the idea behind the RFC.
@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Oct 25, 2018

I personally think we should permit _ in const/static types in the same way we do for let. This has complexity for the impl but seems indisputably useful to me (e.g., you could do const EMPTY = Foo { ... }). I would have this generalize to const position when/if that gets implemented in the natural way, but I'm not sure I'm inclined to make a special case here.

(Previous efforts have run aground on the shoals of integral fallback. I personally would prefer to just permit fallback as normal but create a lint if a pub const winds up being affected by it. I think we could (with relative ease) figure this out, though it would take some refactoring of our type inferencer. Much easier though to figure out after the fact what happened than to prevent it from happening in the first place, I think.)

While we're nothing extensions, I'd also like to see generic constants (and probably statics, too).

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Oct 25, 2018

That said, I definitely think this RFC is targeting a real need.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Nov 3, 2018

This has complexity for the impl

I think the implementation would only be a few lines, switching from:

  • type_check(X) uses type_of(X) as the expected type

to

  • type_of(X) uses type_check(X) to get the initializer type

when X is missing its type.

To also support _ somewhere in the type, the presence of _ would cause the dependency reversal, and type_check(X) would lower the expected type of X without getting it from type_of(X), instead of leaving it unrestricted (as you can when the entire type is _ or not written out at all).

@llogiq llogiq force-pushed the llogiq:elide-array-size branch from ad1e870 to f1d1d4e Nov 3, 2018

@llogiq llogiq force-pushed the llogiq:elide-array-size branch from f1d1d4e to e205d75 Nov 13, 2018

@llogiq

This comment has been minimized.

Copy link
Contributor

llogiq commented Nov 13, 2018

I've rewritten some parts of the RFC to hopefully observe the changes @eddyb and others have suggested, In short:

  • We support full inference for the length
  • However, we can lint cases where it isn't obvious (either in rustc or clippy)
@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Nov 13, 2018

The new version of this seems much clearer.

@rust-lang/lang, should we discuss this in the next meeting, or do we potentially have consensus to P-FCP this?

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Dec 20, 2018

@rfcbot merge

@rfcbot

This comment has been minimized.

Copy link

rfcbot commented Dec 20, 2018

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Dec 20, 2018

@rfcbot concern rfc-is-vague-or-maybe-niko-is-just-lazy

The reference-level text states:

This feature is a simple extension of the parser plus AST transformation. In its proposed form, there are no corner cases, because the only thing it enables that is currently disallowed is to put a _ wildcard in place of the number of elements in array types in lets, consts and statics.

I felt a bit unclear what was really being proposed here. Are we proposing that we will type the body of the static/const and then infer the resulting _ from that?

What happens in the event of cycles?

Can this be used in more complex types, e.g., ([u8; _], [u16; _])? Or must the type be precisely an array?

Sorry if these questions are all answered in the RFC, I don't have time this second for a detailed reading.

@Centril

This comment has been minimized.

Copy link
Contributor

Centril commented Dec 20, 2018

To add to @nikomatsakis's point, can I write const FOO: [u8; { _ }]= expr; (note the _ as a placeholder expression)? My overall feeling is that not much has changed and that this is too focused on arrays specifically rather than inferring constrained values.

@rfcbot concern general-value-inference

@cramertj

This comment has been minimized.

Copy link
Member

cramertj commented Dec 21, 2018

One thing that occurred to me that I hadn't considered before is that this sets a precedence for something we currently don't have (much of) in Rust-- expressions changing the public type signature of an item. For example, adding an element to a pub const array is a breaking change in a library, but users might not realize this since the type of the array itself is inferred. However, adding an element to a pub const &[u8] is not a (compile-time) breaking change, and pushing users to realize that asymmetry seems potentially valuable.

@oli-obk

This comment has been minimized.

Copy link
Contributor

oli-obk commented Dec 21, 2018

For example, adding an element to a pub const array is a breaking change in a library, but users might not realize this since the type of the array itself is inferred.

This reminds me of the impl Trait discussions. Maybe we can make [T; _] be its very own kind of impl Trait, so basically impl [T], which hides the actual size from the user, but allows it to be inferred if the type implements Deref<Target = [T]>? Well.. or just outright have [T] be valid wherever trait bounds are valid and thus get impl [T] "for free" (language wise, the compiler impls will be fun). This opens the question for whether impl [T] allows moving out of elements (e.g. via into_iter being owned iteration)

@Centril

This comment has been minimized.

Copy link
Contributor

Centril commented Dec 27, 2018

@cramertj To solve that concern, we could limit the inference to max(crate, $vis) privacy. This could also work for crate type Foo = _; (i.e. we could tweak #2524).

@oli-obk

This reminds me of the impl Trait discussions. Maybe we can make [T; _] be its very own kind of impl Trait, so basically impl [T], which hides the actual size from the user, but allows it to be inferred if the type implements Deref<Target = [T]>?

At least not with the syntax impl [T] as that is mixing the bound and type categories in a confusing way.
Besides, you can write impl Deref<Target = [T]> directly...

Well.. or just outright have [T] be valid wherever trait bounds are valid and thus get impl [T] "for free" (language wise, the compiler impls will be fun).

If [T] is a bound then you get [T] with the meaning type expression meaning dyn [T] which then conflicts with [T].

Adding special syntax for this doesn't seem well motivated imo in terms of complexity.

@Centril Centril removed the I-nominated label Jan 3, 2019

@scottmcm

This comment has been minimized.

Copy link
Member

scottmcm commented Jan 5, 2019

I have to say I'm torn here. I like the let x: [u8; _] = form (to parallel with let x: [_; 18] =). But I also feel like impl Deref<Target = [T]> is a cool way to not over-promise in statics.

@Centril

This comment has been minimized.

Copy link
Contributor

Centril commented Jan 5, 2019

@scottmcm

I have to say I'm torn here. I like the let x: [u8; _] = form (to parallel with let x: [_; 18] =).

But I also feel like impl Deref<Target = [T]> is a cool way to not over-promise in statics.

I think these two can be mutually compatible; i.e. permit let x: [u8; _] = expr and let x: Matrix<u8, _, _> as a means of value inference, but do a semantic check in typeck disallowing _ anywhere in the type of static or const items (particularly if they are public... but maybe allow it if the items are private / in an fn-body?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment