Change array syntax to prevent ambiguity introduced by RFC 439 #520

quantheory · 2014-12-14T01:50:57Z

Update: This request is now about syntax resembling [x; N] rather than [N of x]. I am still open to changing this if needed, but we need a decision very soon. The other leading candidate is [x for N].

This is an alternative to #498.

The change is not complex; most of the length of the RFC is due to my getting carried away when listing alternatives. I'm still open to changing details (e.g. [x by N] or [x for N] instead of [N of x]), but this is my current preference, and I feel that something along these lines is preferable to fixing this via a change to the RFC 439 range syntax.

Rendered View

mahkoh · 2014-12-14T02:01:11Z

This is better than the .. syntax even if there were no ambiguity.

quantheory · 2014-12-14T02:35:44Z

Sorry for many little commits. I have a bad habit of reading through, thinking I caught all the typos, committing, and then finding one more 10 minutes later.

Where credit is due, the idea for "of" in particular was from @mdinger here.

mdinger · 2014-12-14T02:52:46Z

text/0000-new-array-repeat-syntax.md

+arguments (or for slice patterns under the `advanced_slice_patterns` feature
+gate). This RFC does not change that. In a match pattern, `..` will always be
+interpreted as a wildcard, and never as sugar for a range constructor. This
+restriction may be lifted backwards-compatibly in the future, if it becomes


This sentence is kinda awkward.

I'm not sure what I was trying to say, and any change to match patterns is supposed to be "unresolved" anyway, so I just took this sentence out.

mdinger · 2014-12-14T03:06:09Z

One nice thing is it isn't larger; it is exactly the same length as before. It does read nicely:

let a: [uint, ..2] = [0u, ..2];
let a: [2 of uint] = [2 of 0u];

petrochenkov · 2014-12-14T14:17:27Z

FWIW, [N of x] nicely aligns with Vec::from_elem(N, x)

nrc · 2014-12-14T20:10:04Z

I much prefer [T for n] and [expr for n] because it means not adding a keyword, it is a smaller change because the order doesn't change (s/, ../for would get us 90% of the way there), and because of the resemblance to Python comprehensions (I see this as an advantage, not a disadvantage).

glaebhoerl · 2014-12-14T21:19:46Z

But it doesn't make any literal sense. "An array containing five of this thing" is a meaningful expression. "An array containing this thing for five" isn't. Am I missing something?

nrc · 2014-12-14T21:39:40Z

@glaebhoerl I don't think you're missing anything. It's just that PL syntax is usually not read so literally. for is just associated with iteration, which makes more sense for the repetition syntax than the type syntax, but I think the association in most programmers' heads is clear enough.

However, now that I think about types a bit more, we use for to mean 'for all' in higher ranked types, I'm not sure if that makes the use here clearer or less clear.

Kimundi · 2014-12-14T21:51:54Z

You can just spin it as "repeated for 5 times".

One thing I didn't see mentioned here explicitly is how it interacts pedagogically with unsized array coercions.

Right now, a [T, ..N] can be coerced into a [T], which can be understood as "throwing away the ..N part". While the same intuition can apply to the [N of T] syntax, I feel like having the number to the right just reads better. So I'd be in favor of [T for N] just for that reason.

mdinger · 2014-12-14T23:28:44Z

Technically you're somewhat limited by English here, I suppose. I buy 3 apples or 7 peaches but never apples 3 or peaches 7. Even so, we're trying to write it int for 7, which subverts the normal spoken order. Some other languages work differently, I'm pretty sure.

You can read [int for 6] as an int array ~~sized~~ for 6 elements. [int by 6] might be better than [int for 6] but honestly, both read fine.

Why avoid a new keyword? Aren't words like in, on, at, and, or, is, and by better used as keywords than as function names? I'm just curious. I'm actually surprised these aren't already all reserved.

quantheory · 2014-12-15T00:36:22Z

As you might guess from the RFC, I think that avoiding too much semantic overloading on for is more important than avoiding a (relatively innocuous) new keyword.

"throwing away the ..N part"

Hmm, I guess that wasn't my mental metaphor at all. I think of the N as being a modifier on the bracket syntax, i.e. something you should see up front because it changes the way you interpret the whole expression. The coercion isn't throwing away the length, it just gets "sucked into" the brackets, which hide it at run time but will give it up if you use len.
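As a rough illustration of the coercion being described, here is a minimal sketch written in the [T; N] form mentioned in the update at the top of this thread, not in the syntax that was current when the comment was posted. The function names `slice_len` and `lengths_before_and_after` are made up for this example.

```rust
/// Accepts an unsized slice: the length is no longer part of the type,
/// but it is still stored alongside the pointer and available via `len`.
fn slice_len(xs: &[i32]) -> usize {
    xs.len()
}

/// Returns the length seen through the array type and through the slice.
fn lengths_before_and_after() -> (usize, usize) {
    let fixed: [i32; 4] = [7; 4]; // length 4 is part of the type
    let erased: &[i32] = &fixed; // unsized coercion: the type no longer mentions 4
    (fixed.len(), slice_len(erased))
}
```

Both lengths come out the same; the coercion only moves the length from the type into the slice value.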

That said, I still don't have a strong opinion on of/for/by/in/whatever. I would just like to see some option be broadly acceptable to most people.

I don't think that swapping the order is that much harder to script. I'm probably mixing up regex dialects here (the escapes for brackets are ugly), but something similar to s/\[([^[]+),\s*\.\.([^\]]+)]/\[\2 of \1]/ will probably do it for most cases?

Edit: Actually, this is a smarter perl-ish regex, but you'd have to run it multiple times for files with nested arrays:
s/\[(([^[\]]|\[[^[\]]*])*),\s*\.\.([^[\]]*)]/\[\3 of \1]/g
If that doesn't seem confusing enough, you could use sed:
s/\[\(\([^][]\|\[[^][]*]\)*\),\s*\.\.\([^][]*\)]/\[\3 of \1]/g
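For comparison, the same rewrite can be sketched without regular expressions by matching brackets explicitly, which handles nested arrays in a single pass. This is only an illustrative sketch: the names `rewrite`, `convert`, and `rewrite_body` are hypothetical, and it assumes the input has no brackets inside string literals or comments.

```rust
/// Rewrites the inside of one bracket pair: `elem, ..len` becomes `len of elem`.
/// Only a comma at nesting depth 0 counts, so commas inside nested brackets are skipped.
fn rewrite_body(body: &str) -> String {
    let mut depth = 0usize;
    for (i, c) in body.char_indices() {
        match c {
            '[' => depth += 1,
            ']' => depth = depth.saturating_sub(1),
            ',' if depth == 0 => {
                let rest = body[i + 1..].trim_start();
                if let Some(len) = rest.strip_prefix("..") {
                    // A bare `..` (the wildcard in patterns) has no length; leave it alone.
                    if !len.trim().is_empty() {
                        return format!("{} of {}", len.trim(), body[..i].trim());
                    }
                }
            }
            _ => {}
        }
    }
    body.to_string()
}

/// Copies characters to `out` until an unmatched `]` or the end of input,
/// recursing on every `[` so that inner arrays are rewritten before outer ones.
fn convert(chars: &[char], pos: &mut usize, out: &mut String) {
    while *pos < chars.len() {
        match chars[*pos] {
            '[' => {
                *pos += 1;
                let mut body = String::new();
                convert(chars, pos, &mut body);
                out.push('[');
                out.push_str(&rewrite_body(&body));
                if *pos < chars.len() {
                    out.push(']'); // consume the matching close bracket
                    *pos += 1;
                }
            }
            ']' => return,
            c => {
                out.push(c);
                *pos += 1;
            }
        }
    }
}

/// Rewrites every `[elem, ..len]` in `src` to `[len of elem]`.
fn rewrite(src: &str) -> String {
    let chars: Vec<char> = src.chars().collect();
    let (mut pos, mut out) = (0, String::new());
    while pos < chars.len() {
        convert(&chars, &mut pos, &mut out);
        if pos < chars.len() {
            out.push(']'); // stray close bracket at top level: copy it through
            pos += 1;
        }
    }
    out
}
```

Under these assumptions, calling `rewrite` on the line `let a: [uint, ..2] = [0u, ..2];` should produce the `[2 of uint]` form shown earlier in the thread.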

sinistersnare · 2014-12-15T08:08:16Z

+1 for [T for N]. It only mildly changes the current syntax and does not require use of another reserved word.

nikomatsakis · 2014-12-15T21:48:15Z

One big problem with this suggestion is that, in the type position, we don't know whether we are parsing a type or an expression ([expr of type] vs [type]). This is traditionally a challenge for us: it's important to know whether we have a type or an expression at each point.

quantheory · 2014-12-15T22:52:31Z

Can you elaborate? Specifically:

Do you mean that the reversed order in particular is what would make parsing more difficult than with the current syntax?
Do you foresee a conflict between [T for N] and the use of the for keyword for HRTB (probably not an actual ambiguity, but maybe a similar difficulty in parsing)?

I ask because I want to know if this could be addressed by changing the proposal to either [T for N] or [T by N] (or [T in N], etc.).

nikomatsakis · 2014-12-16T10:55:14Z

On Mon, Dec 15, 2014 at 02:52:32PM -0800, Sean Patrick Santos wrote:

Can you elaborate? Specifically:

Do you mean that the reversed order in particular is what would
make parsing more difficult than with the current syntax?

Yes, and specifically in the type position (not expression). The
problem is that the type and expression grammars are currently pretty
disjoint, so we usually try to ensure that we know at all times what
we are looking at, but in this case, after consuming the [, we have
two possibilities and we'd need arbitrary lookahead to disambiguate
them:

[Type]
[Expr of Type]

Do you foresee a conflict between [T for N] and the use of the
for keyword for HRTB (either an actual ambiguity, or a similar
difficulty in parsing)?

No, because for appears at the beginning of a type, not as a binary
operator.
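To make the point about position concrete, here is a small sketch of a higher-ranked trait bound in present-day Rust, which postdates this thread. The `for<'a>` introduces the bound at its very start and is never an infix separator. The names `apply_to` and `first_word` are invented for this example.

```rust
/// `for<'a>` opens the bound: `f` must work for every lifetime `'a`.
fn apply_to<F>(f: F, s: &str) -> usize
where
    F: for<'a> Fn(&'a str) -> &'a str,
{
    f(s).len()
}

/// Returns the first whitespace-separated word, or "" if there is none.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}
```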

I ask because I want to know if this could be addressed by changing the proposal to either [T for N] or [T by N] (or [T in N], etc.).

Those options would address the concern I raised, yes.

That said, I am not particularly keen on any of those choices, though,
as they don't feel particularly intuitive. I guess I don't think we
would pick those keywords except that we happen to have them "lying
around". I admit I don't have a better suggestion at the moment other
than using x as a contextual keyword ([0 x 1024], [T x N]),
which is something we've traditionally shied away from, though I'm not
actually sure there is a particularly good reason for that.

quantheory · 2014-12-16T17:31:33Z

I guess I don't think we would pick those keywords except that we happen to have them "lying around".

Well, I think it's just genuinely hard to come up with something better. Other words that come to mind are "number", "times", "across", "repeat", "duplicate", "multiply", "extend", "expand", "over", "at", "size", "length", "dimension" and "on". Not all of these are clearly better, most are too long unless abbreviated, and most seem more likely to be troublesome as reserved keywords.

(Actually, for some reason I do like at.)

using x as a contextual keyword

I think that would be a bit weird. If it was contextual rather than reserved, that allows funny expressions like [5 * x x 5]. Perhaps more importantly, even if you avoid silly expressions like that, it's a lot easier to see what's going on if separators can be picked out with syntax highlighting, and contextual keywords complicate that.

My inclination is to replace this proposal with one for for (since it seems to be a favorite), or for [T#N] (since honestly, all this is just making me feel like there's no good keyword, so let's just use the symbol for "number").

nikomatsakis · 2014-12-16T21:59:11Z

I agree that [5 + x x 1024] reads badly, but then [(5 + x) x 1024] reads tolerably well, and it's clear that there is no optimal syntax. I could probably live with for or #, though I do have this feeling that the meaning of [0 for 1024] is just really unclear. If we're going to change this, though, we clearly have to make a call soon.

nrc · 2014-12-16T22:00:56Z

We had a bit of a discussion about this at the weekly meeting this week. The room was broadly in favour of some change and we agree there is some urgency in making this decision. No one was super keen on any particular proposal though. We did decide context-sensitive x was probably a bit too weird.

Therefore I would like to propose [T for n] and [e for n] as the syntax for fixed length arrays and repeating arrays. Anyone hate this? Have better suggestions? Comments? (re by and in as alternative keywords, but keeping the same format, they don't seem better enough to justify adding an extra keyword, unless there is some strong reason to avoid for).

alexcrichton · 2014-12-16T22:08:09Z

Sprinkling some more sigils (perhaps other options):

[22 @ 1024]   [int @ 1024]
[22 ~ 1024]   [int ~ 1024]
[22 by 1024]  [int by 1024]
[22, ...1024] [int, ...1024]

Purely just food for thought!

sfackler · 2014-12-16T22:09:26Z

[T for n] seems fine to me. IMO, [0, ..1024] isn't significantly more clear than [0 for 1024] anyway.

I'm not a huge fan of by since [10 by 12] seems more like a type declaration of a 10x12 array than a 12 element array of 10s.

ben0x539 · 2014-12-16T22:10:50Z

[22; 1024] [int; 1024]

sfackler · 2014-12-16T22:11:02Z

The [foo @ 100] syntax is potentially ambiguous if we expand the pattern syntax to allow @ bindings inside of a pattern.

tikue · 2014-12-18T00:51:09Z

I find [x; ..n] easier to parse in those examples.

mcpherrinm · 2014-12-18T00:52:23Z

I am strongly opposed to [T; ..n]. Having written a bunch of code with fixed length arrays, the four character sequence , .. is a nuisance, and I'd much rather write [T; n]. I agree there's some possibility of confusion with an expression discarding T, but I think that case is uncommon. I don't want to sacrifice the usual case for confusion with an uncommon idiom.

quantheory · 2014-12-18T01:07:54Z

@nick29581: True. It seems that [x; N] only barely has a lead over [x; ..N] in terms of positive versus negative comments, so here is another slapped-together variant. (Of course, no option has a really strong majority of positive comments; otherwise this would have been much easier...)

xgalaxy · 2014-12-18T01:08:55Z

I prefer the by syntax myself. But since that doesn't appear to be an option anymore I guess [T; n] is the best. That weird x suggestion is just.. I don't even know.

engstad · 2014-12-18T01:12:36Z

Please note that in OCaml, you write lists like this:

let a = [ 1; 2; 3 ]

For me, and anyone else used to OCaml, this would be a huge surprise.

rkjnsn · 2014-12-18T01:18:08Z

I like [T; n] for the type syntax. It's concise and it makes sense, to me. I'm not sure I like [24i; 3] as much for repetition, but it probably makes sense for them to match.

glaebhoerl · 2014-12-18T01:52:37Z

I don't think trying to "count votes" based on this discussion thread makes very much sense, because many of the options were introduced midway through, and it also wasn't advertised as being a vote. If ever there was an occasion for the core team to hold a meeting and make an arbitrary decision, this seems to be it.

sinistersnare · 2014-12-18T06:54:11Z

@engstad note that this is type and repetition syntax. The syntax you mention (when translated to rust) is still valid.

let xs: [u8, ..3] = [1, 2, 3]; // fixed sized array containing [1, 2, 3]
let ys: [u8, ..3] = [1, ..3]; // fixed sized array containing [1, 1, 1]
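For comparison, the same two declarations can be sketched in the [x; N] form named in the update at the top of this thread. This uses present-day Rust; the function names `listed` and `repeated` are made up for this example.

```rust
/// An array literal listing each element: commas separate the values.
fn listed() -> [u8; 3] {
    [1, 2, 3]
}

/// A repeat expression: the value, a semicolon, then the count.
fn repeated() -> [u8; 3] {
    [1; 3]
}
```

The type is written the same way in both cases; only the expression on the right distinguishes listing elements from repeating one.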

engstad · 2014-12-18T11:49:37Z

@sinistersnare I was arguing against the [1;2] syntax, which is too close to [1,2] in my opinion.

I think the main cause of debate here is the repetition syntax, not the type syntax. Why do we even have repetition syntax? Can't we use macros for it?

As far as type syntax goes, I also feel that we should just get rid of it. Let's just use Array<T, N>. We do not use sigils for boxes anymore, and closures have quite minimal syntax. Why should fixed-size arrays be any different?

pnkfelix · 2014-12-18T14:08:39Z

@engstad If by Array<T, N> you mean some syntax that is treated by the parser just like any other generic type-constructor applied to some arguments, there is a problem there that Rust currently does not support integer constants as input arguments to type constructors. (We only support regions 'a and type expressions in such contexts.) Rust might support it in the future, but I do not anticipate it for 1.0.

(You may instead mean that Array<T, N> should be a special case, treated specially by the parser even though it looks very much like any other type; however I would oppose making such a special case in the parser for this.)

quantheory · 2014-12-18T16:13:50Z

Why do we even have repetition syntax? Can't we use macros for it?

This was discussed in the other proposal, and it turns out that the macro system can't replace the repeat syntax in constant expressions, which is a very common use case.
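As a minimal sketch of the use case in question, here are repeat expressions used as constant initializers, written in present-day Rust with the [x; N] form. The item and function names (`ZEROED`, `ONES`, `zeroed_len`, `ones_sum`) are invented for illustration.

```rust
// Repeat expressions are permitted where a constant expression is required.
const ZEROED: [u8; 16] = [0; 16];
static ONES: [u32; 4] = [1; 4];

/// Length of the constant buffer.
fn zeroed_len() -> usize {
    ZEROED.len()
}

/// Sum of the static table's elements.
fn ones_sum() -> u32 {
    ONES.iter().sum()
}
```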

netvl · 2014-12-18T21:21:27Z

I really do like the [value x 10]/[T x 10] variant. [value for 10]/[T for 10] is nice too. [value; ..10]/[T; ..10] is not that nice, as it only resolves the ambiguity with ranges but does not really improve the syntax. Personally I don't use fixed-size arrays much, so it will be fine for me, but I can see the point of those who dislike it.

eddyb · 2014-12-18T21:29:22Z

@netvl I've always found LLVM's [N x T] syntax quite neat, but I didn't think it would fit Rust that well.
However, reversing the order does make it better suited for Rust's grammar.
The only remaining issue is the special-casing of x - if I didn't know Rust had a policy against Unicode abuse I'd propose × instead: let xs: [T × N] = [x × N];.

dlesco · 2014-12-18T22:00:08Z

It's too bad @ may be confused with match pattern binding. A current meaning of the @ symbol is the French word 'à' (or Italian/Spanish/etc word 'a'), and that word's meaning includes 'to' or 'by'. They are descended from the Latin word 'ad', as in 'ad infinitum.' So [T @ N] would mean 'T by N', or 'T to memory location N'. It's clearer from this definition of the Latin 'ad':

ad denotes, first, the direction toward an object; then the reaching of or attaining to it; and finally, the being at or near it.

That sounds like a vector to me.

nrc · 2014-12-19T00:44:14Z

This RFC has been accepted, r=nikomatsakis.

Tracking issue

I'm meant to write a bit more about the discussion that led to acceptance here, but I believe that has mostly been played out in the comments. This is really a case of making a call and picking the least bad syntax, so there is no super strong argument against the other suggestions, just that [T; n] was preferred.

tikue · 2014-12-19T00:55:24Z

\o/

dobkeratops · 2014-12-26T17:15:38Z

I see this has been accepted. I was going to mention that the absence of ';' in types makes simplified parsing (e.g. in syntax highlighters?) easier. I think it's possible to simply and cheaply disambiguate < .. > for type params by scanning ahead from a '<': if you don't find certain characters ({ } ;), it's a good assumption that it's a type param.
I know that Rust currently uses :: to disambiguate, but you might be closing a door or making this harder. (I'm looking to do this as an option in my pet language with Rust-derived syntax: optionally get rid of the ugly :: in cases where it's trivial to disambiguate. It's one less thing stabbing a C++ user in the eye.)

[T x N] looked like the best proposal to me above, similar to LLVM syntax. It's 'longer', but the space is easy to type.

I think the [ ] with ";"'s inside has other interesting potential uses better than an array type.

tikue · 2014-12-26T17:19:07Z

I seriously doubt [T; N] is so hard to parse that syntax highlighters will not be able to.

dobkeratops · 2014-12-26T17:21:16Z

It's not impossible, but it closes off other uses of the ';' character.
In many editors, syntax highlighting is just done with regex. The fact that Rust is currently so friendly to simplified searches with regex is a really nice property.

Currently the absence of semicolons in types is nice, IMO. Designing my own language syntax (layered on Rust), it seems really helpful to use the combinations of {} () [] and internal delimiters , ; for 'other things' (map literals? list comprehensions? whatever). I think the syntax has better uses than this, and it closes future doors.

And generally, since experimenting, I seem to have learned that semicolons help with disambiguating type params (as in, separating types from statements). I think this is a 'happy accident' that kept <T> popular and copied. (Rust currently compromises with ::, and I think it's possible to escape that in simple cases.)

When you know types don't include ';', I think it's possible to make a simplified lookahead without needing a whole GLR parser.

';' says statements, or data separator.
'[ x ]' already says array, in LLVM. It's much nicer, IMO.

eddyb · 2014-12-26T20:25:24Z

You don't need to disambiguate type params with a hack, but I guess the lack of an official grammar doesn't help with writing proper syntax highlighters. Using ; that way involves arbitrary lookahead and/or backtracking, which is less efficient and probably not as widely supported as partial grammars are.
IMO, the as keyword is way more damaging because the type is not enclosed in any delimiters and there are special rules catering for the common cases.


quantheory added 2 commits December 13, 2014 18:26

Array repetition expression proposal.

2d2d81a

Minor edits for clarity/linking.

7a3df4d

quantheory added 2 commits December 13, 2014 19:24

Typo corrections and small edits for clarity.

c402452

One more typo.

734e123

mdinger reviewed Dec 14, 2014

Edits suggested by mdinger, quote code correctly.

e58a632

nrc mentioned this pull request Dec 15, 2014

Add range notation rust-lang/rust#19794

Closed

nrc self-assigned this Dec 16, 2014

nrc merged commit d4eec1f into rust-lang:master Dec 19, 2014

blaenk mentioned this pull request Dec 26, 2014

range syntax doesn't work with non-literal endpoints in for statements rust-lang/rust#20241

Closed

jedisct1 pushed a commit to jedisct1/sodiumoxide that referenced this pull request Oct 20, 2015

Update array syntax to conform to RFC 520

bc50639

Change all instances of the [_, ..n] array syntax to the new [_; n] syntax. See rust-lang/rfcs#520 and rust-lang/rust#19999

chriskrycho mentioned this pull request Dec 31, 2016

Document all features in the reference rust-lang/rust#38643

Closed

17 tasks

chriskrycho mentioned this pull request Mar 11, 2017

Document all features rust-lang/reference#9

Closed

48 tasks

Centril added A-syntax Syntax related proposals & ideas A-array Array related proposals & ideas A-patterns Pattern matching related proposals & ideas A-expressions Term language related proposals & ideas labels Nov 23, 2018

crlf0710 mentioned this pull request Jan 8, 2020

Tracking issue for RFC 2044: dual-MIT/Apache2 licensing rust-lang/rust#43461

Open

7 tasks
