Change array syntax to prevent ambiguity introduced by RFC 439 #520

Merged
merged 7 commits into from Dec 19, 2014

@quantheory
Contributor

commented Dec 14, 2014

Update: This request is now about syntax resembling [x; N] rather than [N of x]. I am still open to changing this if needed, but we need a decision very soon. The other leading candidate is [x for N].

This is an alternative to #498.

The change is not complex; most of the length of the RFC is due to my getting carried away when listing alternatives. I'm still open to changing details (e.g. [x by N] or [x for N] instead of [N of x]), but this is my current preference, and I feel that something along these lines is preferable to fixing this via a change to the RFC 439 range syntax.


@mahkoh

Contributor

commented Dec 14, 2014

This is better than the .. syntax even if there were no ambiguity.

@quantheory

Contributor Author

commented Dec 14, 2014

Sorry for many little commits. I have a bad habit of reading through, thinking I caught all the typos, committing, and then finding one more 10 minutes later.

Where credit is due, the idea for "of" in particular was from @mdinger here.

arguments (or for slice patterns under the `advanced_slice_patterns` feature
gate). This RFC does not change that. In a match pattern, `..` will always be
interpreted as a wildcard, and never as sugar for a range constructor. This
restriction may be lifted backwards-compatibly in the future, if it becomes

@mdinger

mdinger Dec 14, 2014

Contributor

This sentence is kinda awkward.

@quantheory

quantheory Dec 14, 2014

Author Contributor

I'm not sure what I was trying to say, and any change to match patterns is supposed to be "unresolved" anyway, so I just took this sentence out.

`..j`) while retaining other features of RFC 439. This is the simplest
resolution, but removes some convenience from the language. It is also
counterintuitive, because `RangeFrom` (i.e. `i..`) is retained, and because `..`
still has several different meanings in the language (ranges, repitition, and

@mdinger

mdinger Dec 14, 2014

Contributor

s/repitition/repetition/

@mdinger

Contributor

commented Dec 14, 2014

One nice thing is it isn't larger; it is exactly the same length as before. It does read nicely:

let a: [uint, ..2] = [0u, ..2];
let a: [2 of uint] = [2 of 0u];
@petrochenkov

Contributor

commented Dec 14, 2014

FWIW, [N of x] nicely aligns with Vec::from_elem(N, x)
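For comparison, a minimal sketch in current Rust: `Vec::from_elem(N, x)` put the count first, but the usual spelling today is `vec![x; n]`, which mirrors the `[x; N]` array syntax this RFC ended up accepting (value first, count second). The `repeated` helper is hypothetical, for illustration only.

```rust
/// Hypothetical helper: the same value repeated in a fixed-size array and in a Vec.
fn repeated(x: u8, n: usize) -> ([u8; 4], Vec<u8>) {
    let fixed = [x; 4];    // length is part of the type, so it must be a constant
    let heap = vec![x; n]; // length may be a run-time value
    (fixed, heap)
}

fn main() {
    let (fixed, heap) = repeated(9, 3);
    println!("{:?} {:?}", fixed, heap);
}
```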

@nrc

Member

commented Dec 14, 2014

I much prefer [T for n] and [expr for n] because it means not adding a keyword, it is a smaller change because the order doesn't change (s/, ../for would get us 90% of the way there), and because of the resemblance to Python comprehensions (I see this as an advantage, not a disadvantage).

@glaebhoerl

Contributor

commented Dec 14, 2014

But it doesn't make any literal sense. "An array containing five of this thing" is a meaningful expression. "An array containing this thing for five" isn't. Am I missing something?

@nrc

Member

commented Dec 14, 2014

@glaebhoerl I don't think you're missing anything. It's just that PL syntax is usually not read so literally. for is just associated with iteration, which makes more sense for the repetition syntax than the type syntax, but I think the association in most programmers' heads is clear enough.

However, now that I think about types a bit more, we use for to mean 'for all' in higher ranked types, I'm not sure if that makes the use here clearer or less clear.

@Kimundi

Member

commented Dec 14, 2014

You can just spin it as "repeated for 5 times".

One thing I didn't see mentioned here explicitly is how it interacts pedagogically with unsized array coercions.

Right now, a [T, ..N] can be coerced into a [T], which can be understood as "throwing away the ..N part". While the same intuition can apply to the [N of T] syntax, I feel like having the number on the right just reads better. So I'd be in favor of [T for N] just for that reason.
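In current Rust this unsized coercion happens behind a reference: `&[T; N]` coerces to `&[T]`, which drops `; N` from the type while the length stays available at run time through `len()`. A minimal sketch; `total` is a hypothetical helper.

```rust
/// Hypothetical helper that accepts any slice, whatever its length.
fn total(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn main() {
    let arr: [i32; 5] = [1; 5];
    // `&[i32; 5]` coerces to `&[i32]`: the length leaves the type...
    let slice: &[i32] = &arr;
    // ...but travels with the reference and can still be queried.
    println!("len = {}, total = {}", slice.len(), total(slice));
}
```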

@mdinger

Contributor

commented Dec 14, 2014

Technically you're somewhat limited by English here, I suppose. I buy 3 apples or 7 peaches, but never apples 3 or peaches 7. Even so, we're trying to write it as int for 7, which subverts the normal spoken order. Some other languages work differently, I'm pretty sure.

You can read [int for 6] as an int array sized for 6 elements. [int by 6] might be better than [int for 6] but honestly, both read fine.

Why avoid a new keyword? Aren't words like in, on, at, and, or, is, and by better used as keywords than as functions? I'm just curious. I'm actually surprised these aren't already all reserved.

@nrc nrc referenced this pull request Dec 15, 2014

Closed

Add range notation #19794

@quantheory

Contributor Author

commented Dec 15, 2014

As you might guess from the RFC, I think that avoiding too much semantic overloading on for is more important than avoiding a (relatively innocuous) new keyword.

"throwing away the ..N part"

Hmm, I guess that wasn't my mental metaphor at all. I think of the N as being a modifier on the bracket syntax, i.e. something you should see up front because it changes the way you interpret the whole expression. The coercion isn't throwing away the length; it just gets "sucked into" the brackets, which hide it at run time but will give it up if you use len.

That said, I still don't have a strong opinion on of/for/by/in/whatever. I would just like to see some option be broadly acceptable to most people.

I don't think that swapping the order is that much harder to script. I'm probably mixing up regex dialects here (the escapes for brackets are ugly), but something similar to s/\[([^[]+),\s*\.\.([^\]]+)]/\[\2 of \1]/ will probably do it for most cases?

Edit: Actually, this is a smarter perl-ish regex, but you'd have to run it multiple times for files with nested arrays:
s/\[(([^[\]]|\[[^[\]]*])*),\s*\.\.([^[\]]*)]/\[\3 of \1]/g
If that doesn't seem confusing enough, you could use sed:
s/\[\(\([^][]\|\[[^][]*]\)*\),\s*\.\.\([^][]*\)]/\[\3 of \1]/g
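Instead of running a regex repeatedly for nested arrays, a small bracket-matching pass can rewrite inner arrays first in one go. This is a hedged sketch, not a tool from the thread: the function names are hypothetical, it targets the `[x; N]` form that was eventually accepted rather than `[N of x]`, and it is purely textual (it ignores string literals, comments, and macros, and only skips the `[a, ..]` wildcard because it has no count).

```rust
/// Rewrites every `[X, ..N]` in `src` to `[X; N]`, including nested arrays.
fn migrate(src: &str) -> String {
    let bytes = src.as_bytes();
    let mut out = String::with_capacity(src.len());
    let mut copied_up_to = 0; // start of the text not yet copied to `out`
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'[' {
            if let Some(close) = matching_bracket(bytes, i) {
                out.push_str(&src[copied_up_to..i]);
                out.push('[');
                out.push_str(&rewrite_inner(&src[i + 1..close]));
                out.push(']');
                i = close + 1;
                copied_up_to = i;
                continue;
            }
        }
        i += 1;
    }
    out.push_str(&src[copied_up_to..]);
    out
}

/// Byte index of the `]` matching the `[` at `open`, if any.
fn matching_bracket(bytes: &[u8], open: usize) -> Option<usize> {
    let mut depth = 0usize;
    for (k, &b) in bytes.iter().enumerate().skip(open) {
        match b {
            b'[' => depth += 1,
            b']' => {
                depth -= 1;
                if depth == 0 {
                    return Some(k);
                }
            }
            _ => {}
        }
    }
    None
}

/// Splits `X, ..N` at its top-level comma; returns None for plain lists.
fn split_repeat(inner: &str) -> Option<(&str, &str)> {
    let mut depth = 0i32;
    for (idx, ch) in inner.char_indices() {
        match ch {
            '(' | '[' | '{' => depth += 1,
            ')' | ']' | '}' => depth -= 1,
            ',' if depth == 0 => {
                let rest = inner[idx + 1..].trim_start();
                if let Some(count) = rest.strip_prefix("..") {
                    // Leave the `[a, ..]` slice-pattern wildcard (no count) alone.
                    if !count.trim().is_empty() {
                        return Some((&inner[..idx], count));
                    }
                }
            }
            _ => {}
        }
    }
    None
}

/// Rewrites the text between one pair of brackets, recursing into both halves.
fn rewrite_inner(inner: &str) -> String {
    match split_repeat(inner) {
        Some((elem, count)) => {
            format!("{}; {}", migrate(elem.trim_end()), migrate(count.trim()))
        }
        None => migrate(inner),
    }
}

fn main() {
    println!("{}", migrate("let a: [[u8, ..4], ..2] = [[0u8, ..4], ..2];"));
}
```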

@sinistersnare

commented Dec 15, 2014

+1 for [T for N]. It only mildly changes the current syntax and does not require use of another reserved word.

@nikomatsakis

Contributor

commented Dec 15, 2014

One big problem with this suggestion is that, in the type position, we don't know whether we are parsing a type or an expression ([expr of type] vs [type]). This is traditionally a challenge for us: it's important to know whether we have a type or an expression at each point.

@quantheory

Contributor Author

commented Dec 15, 2014

Can you elaborate? Specifically:

  1. Do you mean that the reversed order in particular is what would make parsing more difficult than with the current syntax?
  2. Do you foresee a conflict between [T for N] and the use of the for keyword for HRTB (probably not an actual ambiguity, but maybe a similar difficulty in parsing)?

I ask because I want to know if this could be addressed by changing the proposal to either [T for N] or [T by N] (or [T in N], etc.).

@nikomatsakis

Contributor

commented Dec 16, 2014

On Mon, Dec 15, 2014 at 02:52:32PM -0800, Sean Patrick Santos wrote:

Can you elaborate? Specifically:

  1. Do you mean that the reversed order in particular is what would
    make parsing more difficult than with the current syntax?

Yes, and specifically in the type position (not expression). The
problem is that the type and expression grammars are currently pretty
disjoint, so we usually try to ensure that we know at all times what
we are looking at, but in this case, after consuming the [, we have
two possibilities and we'd need arbitrary lookahead to disambiguate
them:

[Type]
[Expr of Type]
  2. Do you foresee a conflict between [T for N] and the use of the
    for keyword for HRTB (either an actual ambiguity, or a similar
    difficulty in parsing)?

No, because for appears at the beginning of a type, not as a binary
operator.

I ask because I want to know if this could be addressed by changing the proposal to either [T for N] or [T by N] (or [T in N], etc.).

Those options would address the concern I raised, yes.

That said, I am not particularly keen on any of those choices, though,
as they don't feel particularly intuitive. I guess I don't think we
would pick those keywords except that we happen to have them "lying
around". I admit I don't have a better suggestion at the moment other
than using x as a contextual keyword ([0 x 1024], [T x N]),
which is something we've traditionally shied away from, though I'm not
actually sure there is a particularly good reason for that.
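The lookahead concern can be made concrete with a toy recursive-descent parser. This is a hypothetical sketch, not rustc's parser: the grammar, the `Ty`/`parse_ty`/`parse` names, and the whitespace-separated tokens are assumptions. With `[T; N]` (or `[T for N]`), the parser can always parse a type right after `[` and then pick slice vs. array from a single token; with `[N of T]`, what follows `[` may be an expression, which a type-first parser misreads.

```rust
/// Toy type grammar (hypothetical):
///   Ty := Name | "[" Ty "]" | "[" Ty ";" Len "]"
#[derive(Debug, PartialEq)]
enum Ty {
    Named(String),
    Slice(Box<Ty>),
    Array(Box<Ty>, String), // length kept as a raw token for simplicity
}

fn parse_ty(toks: &[&str], pos: &mut usize) -> Result<Ty, String> {
    let tok = *toks.get(*pos).ok_or("unexpected end of input")?;
    *pos += 1;
    if tok != "[" {
        return Ok(Ty::Named(tok.to_string()));
    }
    // After `[`, the parser can always commit to parsing a *type*...
    let elem = parse_ty(toks, pos)?;
    // ...and one token of lookahead then picks slice vs. array.
    match toks.get(*pos).copied() {
        Some("]") => {
            *pos += 1;
            Ok(Ty::Slice(Box::new(elem)))
        }
        Some(";") => {
            let len = toks.get(*pos + 1).copied().ok_or("missing length")?;
            if toks.get(*pos + 2).copied() != Some("]") {
                return Err("expected `]` after array length".to_string());
            }
            *pos += 3;
            Ok(Ty::Array(Box::new(elem), len.to_string()))
        }
        other => Err(format!("unexpected token {:?}", other)),
    }
}

/// Parses a whitespace-separated token string such as "[ u8 ; 4 ]".
fn parse(src: &str) -> Result<Ty, String> {
    let toks: Vec<&str> = src.split_whitespace().collect();
    let mut pos = 0;
    let ty = parse_ty(&toks, &mut pos)?;
    if pos != toks.len() {
        return Err("trailing tokens".to_string());
    }
    Ok(ty)
}

fn main() {
    println!("{:?}", parse("[ [ u8 ; 4 ] ]"));
    // `4` is read as a type name, then `of` is rejected.
    println!("{:?}", parse("[ 4 of u8 ]"));
}
```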

@quantheory

Contributor Author

commented Dec 16, 2014

I guess I don't think we would pick those keywords except that we happen to have them "lying around".

Well, I think it's just genuinely hard to come up with something better. Other words that come to mind are "number", "times", "across", "repeat", "duplicate", "multiply", "extend", "expand", "over", "at", "size", "length", "dimension" and "on". Not all of these are clearly better, most are too long unless abbreviated, and most seem more likely to be troublesome as reserved keywords.

(Actually, for some reason I do like at.)

using x as a contextual keyword

I think that would be a bit weird. If it was contextual rather than reserved, that allows funny expressions like [5 * x x 5]. Perhaps more importantly, even if you avoid silly expressions like that, it's a lot easier to see what's going on if separators can be picked out with syntax highlighting, and contextual keywords complicate that.

My inclination is to replace this proposal with one for for (since it seems to be a favorite), or for [T#N] (since honestly, all this is just making me feel like there's no good keyword, so let's just use the symbol for "number").

@nrc nrc self-assigned this Dec 16, 2014

@nikomatsakis

Contributor

commented Dec 16, 2014

I agree that [5 + x x 1024] reads badly, but then [(5 + x) x 1024] reads tolerably well, and it's clear that there is no optimal syntax. I could probably live with for or #, though I do have this feeling that the meaning of [0 for 1024] is just really unclear. If we're going to change this, though, we clearly have to make a call soon.

@nrc

Member

commented Dec 16, 2014

We had a bit of a discussion about this at the weekly meeting this week. The room was broadly in favour of some change and we agree there is some urgency in making this decision. No one was super keen on any particular proposal though. We did decide context-sensitive x was probably a bit too weird.

Therefore I would like to propose [T for n] and [e for n] as the syntax for fixed length arrays and repeating arrays. Anyone hate this? Have better suggestions? Comments? (re by and in as alternative keywords, but keeping the same format, they don't seem better enough to justify adding an extra keyword, unless there is some strong reason to avoid for).

@alexcrichton

Member

commented Dec 16, 2014

Sprinkling some more sigils (perhaps other options):

[22 @ 1024]   [int @ 1024]
[22 ~ 1024]   [int ~ 1024]
[22 by 1024]  [int by 1024]
[22, ...1024] [int, ...1024]

Purely just food for thought!

@sfackler

Member

commented Dec 16, 2014

[T for n] seems fine to me. IMO, [0, ..1024] isn't significantly clearer than [0 for 1024] anyway.

I'm not a huge fan of by since [10 by 12] seems more like a type declaration of a 10x12 array than a 12 element array of 10s.

@ben0x539

commented Dec 16, 2014

[22; 1024] [int; 1024]

@sfackler

Member

commented Dec 16, 2014

The [foo @ 100] syntax is potentially ambiguous if we expand the pattern syntax to allow @ bindings inside of a pattern.
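A minimal sketch of the concern in current Rust, where `@` bindings are in fact allowed inside slice patterns: `[foo @ 100]` is already a valid pattern (a one-element slice whose element equals `100`, bound to `foo`), so a repeat expression spelled the same way would look exactly like this pattern while meaning something different. `describe` is a hypothetical function.

```rust
/// Hypothetical classifier using `@` bindings inside slice patterns.
fn describe(xs: &[u32]) -> String {
    match xs {
        // A one-element slice whose element is 100, bound to `foo`.
        [foo @ 100] => format!("just {}", foo),
        // Binds `n` while also testing a range pattern.
        [n @ 1..=9, ..] => format!("starts small: {}", n),
        // Binds the tail as a subslice.
        [first, rest @ ..] => format!("{} then {} more", first, rest.len()),
        [] => String::from("empty"),
    }
}

fn main() {
    println!("{}", describe(&[100]));
    println!("{}", describe(&[42, 1, 2]));
}
```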

@comex

commented Dec 18, 2014

I really really don't like [T; N] because, in addition to looking like ,, it also resembles [{T; N}] (i.e. discarding T), and looks little like an array. [T; ..N] and [T for N] are less ugly but much worse than an array syntax should be. I'm an outsider, but if it's too hard to implement the type inference thing I described, I would implement [T * n] for types and any of the ugly options for values, with the intent that the latter be deprecated in favor of just using Default someday when there are constexpr functions.

Well, at least I can say I told you so.

@tikue

commented Dec 18, 2014

I find [x; ..n] easier to parse in those examples.

@mcpherrinm

commented Dec 18, 2014

I am strongly opposed to [T; ..n]. Having written a bunch of code with fixed length arrays, the four character sequence , .. is a nuisance, and I'd much rather write [T; n]. I agree there's some possibility of confusion with an expression discarding T, but I think that case is uncommon. I don't want to sacrifice the usual case for confusion with an uncommon idiom.

@quantheory

Contributor Author

commented Dec 18, 2014

@nick29581: True. It seems that [x; N] only barely has a lead over [x; ..N] in terms of positive versus negative comments, so here is another slapped-together variant. (Of course, no option has a really strong majority of positive comments; otherwise this would have been much easier...)

@xgalaxy

commented Dec 18, 2014

I prefer the by syntax myself. But since that doesn't appear to be an option anymore I guess [T; n] is the best. That weird x suggestion is just.. I don't even know.

@engstad

commented Dec 18, 2014

Please note that in OCaml, you write lists like this:

let a = [ 1; 2; 3 ]

For me, and anyone else used to OCaml, this would be a huge surprise.

@rkjnsn

Contributor

commented Dec 18, 2014

I like [T; n] for the type syntax. It's concise and it makes sense, to me. I'm not sure I like [24i; 3] as much for repetition, but it probably makes sense for them to match.

@glaebhoerl

Contributor

commented Dec 18, 2014

I don't think trying to "count votes" based on this discussion thread makes very much sense, because many of the options were introduced midway through, and it also wasn't advertised as being a vote. If ever there was an occasion for the core team to hold a meeting and make an arbitrary decision, this seems to be it.

@sinistersnare

commented Dec 18, 2014

@engstad note that this is type and repetition syntax. The syntax you mention (when translated to rust) is still valid.

let xs: [u8, ..3] = [1, 2, 3]; // fixed sized array containing [1, 2, 3]
let ys: [u8, ..3] = [1, ..3]; // fixed sized array containing [1, 1, 1]
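The same two declarations in the `[T; N]` syntax that was eventually accepted, with `[u8, ..3]` becoming `[u8; 3]`. The `list_and_repeat` wrapper is hypothetical; the last lines show how a single character separates a repeat expression from a list, which is the kind of confusion raised below.

```rust
/// The two declarations above, rewritten in `[T; N]` syntax.
fn list_and_repeat() -> ([u8; 3], [u8; 3]) {
    let xs: [u8; 3] = [1, 2, 3]; // list: three distinct elements
    let ys: [u8; 3] = [1; 3];    // repeat: the value 1, three times
    (xs, ys)
}

fn main() {
    let (xs, ys) = list_and_repeat();
    println!("{:?} {:?}", xs, ys);
    // `;` vs `,`: [1, 1] is not the same array as [1, 2].
    let (a, b) = ([1; 2], [1, 2]);
    assert_ne!(a, b);
}
```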
@engstad

commented Dec 18, 2014

@sinistersnare I was arguing against the [1;2] syntax, which is too close to [1,2] in my opinion.

I think the main cause of debate here is the repetition syntax, not the type syntax. Why do we even have repetition syntax? Can't we use macros for it?

As far as type syntax goes, I also feel that we should just get rid of it. Let's just use Array<T, N>. We do not use sigils for boxes anymore, and closures have quite minimal syntax. Why should fixed-size arrays be any different?

@pnkfelix

Member

commented Dec 18, 2014

@engstad If by Array<T, N> you mean some syntax that is treated by the parser just like any other generic type-constructor applied to some arguments, there is a problem there that Rust currently does not support integer constants as input arguments to type constructors. (We only support regions 'a and type expressions in such contexts.) Rust might support it in the future, but I do not anticipate it for 1.0.

(You may instead mean that Array<T, N> should be a special case, treated specially by the parser even though it looks very much like any other type; however I would oppose making such a special case in the parser for this.)
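Rust has since gained const generics (stable since Rust 1.51), so an ordinary generic type can now take an integer parameter. The sketch below defines a hypothetical `Array<T, N>` wrapper (it is not a standard library type) around the built-in `[T; N]`; the built-in bracket syntax remains the primary form.

```rust
/// Hypothetical wrapper showing that `Array<T, N>` is expressible with
/// const generics; it simply stores a built-in `[T; N]`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Array<T, const N: usize>([T; N]);

impl<T: Copy, const N: usize> Array<T, N> {
    /// Fills every slot with `x`, using the repeat expression internally.
    fn filled(x: T) -> Self {
        Array([x; N])
    }

    /// The length comes from the type parameter, not from run-time data.
    fn len(&self) -> usize {
        N
    }
}

fn main() {
    let a: Array<u8, 4> = Array::filled(7);
    println!("{:?} has {} elements", a, a.len());
}
```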

@quantheory

Contributor Author

commented Dec 18, 2014

Why do we even have repetition syntax? Can't we use macros for it?

This was discussed in the other proposal, and it turns out that the macro system can't replace the repeat syntax in constant expressions, which is a very common use case.
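As a minimal illustration of that use case: repeat expressions are accepted in `const` and `static` initializers, whereas `vec!` allocates on the heap and cannot initialize a `const`. Without the repeat syntax, `ZEROS` below would need sixteen literal zeros. The item names are hypothetical.

```rust
// Repeat expressions in constant contexts (names are illustrative).
const ZEROS: [u8; 16] = [0; 16];
static TABLE: [[u16; 4]; 2] = [[7; 4]; 2];

fn main() {
    println!("{} zeros, TABLE[1][3] = {}", ZEROS.len(), TABLE[1][3]);
}
```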

@netvl

commented Dec 18, 2014

I really do like the [value x 10]/[T x 10] variant. [value for 10]/[T for 10] is nice too. [value; ..10]/[T; ..10] is not that nice, as it only resolves the ambiguity with ranges but does not really improve the syntax. Personally I don't use fixed-size arrays much, so it will be fine for me, but I can see the point of those who dislike it.

@eddyb

Member

commented Dec 18, 2014

@netvl I've always found LLVM's [N x T] syntax quite neat, but I didn't think it would fit Rust that well.
However, reversing the order does make it better suited for Rust's grammar.
The only remaining issue is the special-casing of x - if I didn't know Rust had a policy against Unicode abuse I'd propose × instead: let xs: [T × N] = [x × N];.

@dlesco

commented Dec 18, 2014

It's too bad @ may be confused with match pattern binding. A current meaning of the @ symbol is the French word 'à' (or Italian/Spanish/etc word 'a'), and that word's meaning includes 'to' or 'by'. They are descended from the Latin word 'ad', as in 'ad infinitum.' So [T @ N] would mean 'T by N', or 'T to memory location N'. It's clearer from this definition of the Latin 'ad':

ad denotes, first, the direction toward an object; then the reaching of or attaining to it; and finally, the being at or near it.

That sounds like a vector to me.

@nrc

Member

commented Dec 19, 2014

This RFC has been accepted, r=nikomatsakis.

Tracking issue

I'm meant to write a bit more about the discussion that led to acceptance here, but I believe that has mostly been played out in the comments. This is really a case of making a call and picking the least bad syntax, so there is no super strong argument against the other suggestions, just that [T; n] was preferred.

@nrc nrc merged commit d4eec1f into rust-lang:master Dec 19, 2014

@tikue

commented Dec 19, 2014

\o/

@dobkeratops

commented Dec 26, 2014

I see this has been accepted. I was going to mention that the absence of ';' in types makes simplified parsing (e.g. in syntax highlighters?) easier. I think it's possible to simply and cheaply disambiguate < .. > for type params by scanning ahead from a '<': if you don't find various characters ({ } ;), it's a good assumption that it's a type param.
I know that Rust currently uses :: to disambiguate, but you might be closing a door or making this harder. (I'm looking to do this as an option in my pet language with Rust-derived syntax: optionally get rid of the ugly :: in cases where it's trivial to disambiguate. It's one less thing stabbing a C++ user in the eye.)
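The scanning heuristic described above can be sketched as follows; the function name and the exact stop-character set are assumptions, and `open` must point at a `<`. It also shows the concern: once `[T; N]` is the array type syntax, a type such as `Vec<[u8; 4]>` contains a `;`, so the heuristic misclassifies it.

```rust
/// Hypothetical highlighter heuristic: treat the `<` at byte offset `open`
/// as opening a type-parameter list unless `{`, `}` or `;` appears before
/// its matching `>`.
fn looks_like_type_params(src: &str, open: usize) -> bool {
    let mut depth = 0usize;
    for ch in src[open..].chars() {
        match ch {
            '<' => depth += 1,
            '>' => {
                depth = depth.saturating_sub(1);
                if depth == 0 {
                    return true;
                }
            }
            '{' | '}' | ';' => return false,
            _ => {}
        }
    }
    false
}

fn main() {
    // Old syntax: no `;` inside the type, so the heuristic says "type params".
    println!("{}", looks_like_type_params("Vec<[u8, ..4]>", 3));
    // New syntax: the `;` inside `[u8; 4]` trips the heuristic.
    println!("{}", looks_like_type_params("Vec<[u8; 4]>", 3));
}
```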

[T x N] looked like the best proposal to me above, similar to LLVM syntax. It's 'longer', but the space is easy to type.

I think the [ ] with ";"'s inside has other interesting potential uses better than an array type.

@tikue

commented Dec 26, 2014

I seriously doubt [T; N] is so hard to parse that syntax highlighters will not be able to.

@dobkeratops

commented Dec 26, 2014

It's not impossible, but it closes off other uses of the ';' character.
In many editors, syntax highlighting is just done with regexes. The fact that Rust is currently so friendly to simplified searches with regexes is a really nice property.

Currently the absence of semicolons in types is nice, IMO. Designing my own language syntax (layered on Rust), it seems really helpful to use the combinations of {} () [] and the internal delimiters , ; for 'other things' (map literals? list comprehensions? whatever...). I think the syntax has better uses than this, and it closes future doors.

And generally, since experimenting, I seem to have learned that semicolons help with disambiguating type params (as in, separating types from statements)... I think this is a 'happy accident' that kept <T> popular and copied. (Rust currently compromises with ::, and I think it's possible to escape that in simple cases.)

When you know types don't include ';', I think it's possible to make a simplified lookahead without needing a whole GLR parser.

';' says statements, or data separator.
'[ x ]' already says array, as in LLVM; it's much nicer, IMO.

@eddyb

Member

commented Dec 26, 2014

You don't need to disambiguate type params with a hack, but I guess the lack of an official grammar doesn't help with writing proper syntax highlighters. Using ; that way involves arbitrary lookahead and/or backtracking, which is less efficient and probably not as widely supported as partial grammars are.
IMO, as is way more damaging because the type is not enclosed in any delimiters and there are special rules catering for the common cases.

jedisct1 pushed a commit to jedisct1/sodiumoxide that referenced this pull request Oct 20, 2015

Update array syntax to conform to RFC 520
Change all instances of the [_, ..n] array syntax to the new [_; n]
syntax. See rust-lang/rfcs#520 and rust-lang/rust#19999

@chriskrycho chriskrycho referenced this pull request Dec 31, 2016

Closed

Document all features in the reference #38643

0 of 17 tasks complete

@chriskrycho chriskrycho referenced this pull request Mar 11, 2017

Closed

Document all features #9

18 of 48 tasks complete