Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closure syntax requires arbitrary lookahead #664

Closed
thomcc opened this issue Apr 19, 2018 · 9 comments
Closed

Closure syntax requires arbitrary lookahead #664

thomcc opened this issue Apr 19, 2018 · 9 comments
Milestone

Comments

@thomcc
Copy link

@thomcc thomcc commented Apr 19, 2018

You need unbounded lookahead to distinguish between function parameters and a list. This makes syntax highlighting and other parsing or analysis much more difficult.

Specifically, when you encounter [ you don't know if the items inside are function parameters or members of a list until you hit the ]{. This is generally something you want to avoid, especially in cases where you're defining symbols into a scope, or doing other things that tooling might want to understand.

As a result, syntax highlighting this correctly for elvish isn't really possible using common methods (tmLanguage, sublime-syntax, etc) without fragile hacks. It also means that it's harder to report errors if a user puts a space between ] and { (you have no way of knowing if it was intentional or not).

Many parser generators also have trouble with cases like these, as well (although I'm not sure when someone would be implementing a parser generator for the language they use with their shell).

The easiest fix would probably be adding a token that comes before the parameter list. fn[a b c]{ ... } would be somewhat consistent with function declaration. A sigil already used by the language like $, @, & in the place of fn would also work as well.

There are a lot of other choices as well, and I have no preference for which you end up going with (so long as it doesn't also have the same issue).

That said, this would require breaking a large amount of existing elvish code, so I'd understand being reluctant about changing it. A tool to fix code like that automatically would not be hard to implement though.

@zzamboni
Copy link
Contributor

@zzamboni zzamboni commented Apr 19, 2018

@xiaq commented about this a few days ago in the Telegram group. Search for "unapologetically haskellesque" in the group to find the discussion :)

@zzamboni
Copy link
Contributor

@zzamboni zzamboni commented Apr 19, 2018

Personally I like the current syntax. It reminds me of Clojure's argument lists - I agree from a purely formal point of view what you say is an issue, but in practice it seems to work well - hence the working syntax-highlighting support already existing for multiple editors.

@thomcc
Copy link
Author

@thomcc thomcc commented Apr 19, 2018

hence the working syntax-highlighting support already existing for multiple editors.

Do they work if a new line is in the middle of the argument list? You can do some hacks where match the whole decl as a regex, and pull it apart, but at least in editors that use tmLanguage (sublime text also has sublime-syntax, but that has this restriction as well), those regexps cannot go over multiple lines (there are other issues with doing this as well, but this is the biggest one).

Maybe that's enough of an edge case where it doesn't matter though, although not being able to offer good help for someone who accidentally separates the ] and { with a space like ] { also is an issue caused by this, which might matter more to some.

@zzamboni
Copy link
Contributor

@zzamboni zzamboni commented Apr 19, 2018

I agree about the thing with ] {, and it has bitten me more than once. IIRC, that was one of the points mentioned by @xiaq in the discussion about maybe changing the syntax for closure declarations.

The Elvish mode in Emacs uses regex-based syntax highlighting, so many cases are not properly handled - e.g. the new-line-in-the-middle (note the color change):
image
image

@xiaq
Copy link
Member

@xiaq xiaq commented Apr 23, 2018

Some thoughts:

  • Keyword-based syntax doesn't work in shells, at least not in Elvish, due to barewords. In most cases, new syntax needs to be introduced by using metacharacters. @ is not really a metacharacter in Elvish - it only has special meaning in the context of variable names, akin to :. That leaves your proposal to $[a b c]{ ... } and &[a b c]{ ... }. Both are technically acceptable designs and serve to resolve the problem you posed. But I am still waiting for a new syntax that would also eliminate the awkward ]{ combination.

  • In terms of syntax highlighting, if you are interested in highlighting command blocks, you could probably just look for { or ]{, no? In most editors I have used, syntax highlighting is not so advanced for reliably recognizing closures to matter. Of course with Elvish we do not restrict ourselves to the convention, but it's not clear to me what is needed here.

@thomcc
Copy link
Author

@thomcc thomcc commented Apr 23, 2018

Keyword-based syntax doesn't work in shells, at least not in Elvish, due to barewords. In most cases, new syntax needs to be introduced by using metacharacters. @ is not really a metacharacter in Elvish - it only has special meaning in the context of variable names, akin to :.

Yeah, that makes sense. I feel like it could be made work in either case, but would likely introduce other confusing aspects to the grammar.

That leaves your proposal to $[a b c]{ ... } and &[a b c]{ ... }. Both are technically acceptable designs and serve to resolve the problem you posed. But I am still waiting for a new syntax that would also eliminate the awkward ]{ combination.

Well, really the point of my proposal wasn't to propose new syntax so much as to indicate that the current syntax was not good for these reasons. But yeah, I gave examples so as not to simply be complaining without offering solutions.

That said, either of these could eliminate the requirement for no spaces between ]{: For example, for $, the token sequence $ [ doesn't appear anywhere else in the grammar, and so you know if you see it, you expect to parse the rest of the list, then any amount of whitespace, then a block. I believe the same is true of for &. That is to say, both $[a b c] { ... } and &[a b c] { ... } are ambiguous closure syntaxes, aren't they?

@xiaq
Copy link
Member

@xiaq xiaq commented Oct 5, 2021

While trying to implement #1279 I realized that the syntax of using arg=default to indicate that an argument is optional is quite hard to parse, exactly because of this issue. When Elvish sees [arg=default, it doesn't know whether the = should be a metacharacter or not, until it sees the closing ] and the character after it.

This has led me to revisit the issue. Unfortunately I'm still going to reject the $[a b c]{ ... } and &[a b c]{ ... } proposals, because I want to keep the syntax for thunks (argument-less closures) just { ... }, and requiring an additional metacharacter when you add an argument list feels unnatural.

The syntax I'm considering now is { |a b c| ... }. This works because | is not allowed in braced lists, and can't appear at the start of a chunk, so there is no ambiguity whatsoever.

This is how it can look like, from tight to spacey:

fn f {|a b c|echo $a $b $c}
fn f { |a b c| echo $a $b $c }
fn f { |a b c|
  echo $a $b $c
}
fn f {
  | a b c |
  echo $a $b $c
}

put a b c | each {|x|echo $x}
put a b c | each { |x| echo $x }
put a b c | each { |x|
  echo $x
}
put a b c | each {
  | x |
  echo $x
}

After this change, the condition for parsing a lambda becomes:

If { is followed by whitespace or |, parse a lambda. Otherwise parse a braced list.

There is no longer any overlap between parsing of lambdas and lists.

FWIW it appears that this looks the same as Ruby's block syntax. I was thinking of Smalltalk's block syntax consciously but I may have seen Ruby's syntax somewhere before.

@xiaq xiaq added this to the 0.17.0 milestone Oct 12, 2021
@xiaq xiaq removed the maybe label Oct 12, 2021
xiaq added a commit that referenced this issue Oct 13, 2021
@xiaq
Copy link
Member

@xiaq xiaq commented Oct 13, 2021

The new syntax is now supported. Next steps are

  • Update documentation and release notes
  • Extend the upgrader to migrate the syntax
  • Deprecate the old syntax from 0.17

xiaq added a commit that referenced this issue Oct 14, 2021
@xiaq
Copy link
Member

@xiaq xiaq commented Oct 14, 2021

Nothing more to do for 0.17.

@xiaq xiaq removed this from the 0.17.0 milestone Oct 14, 2021
@xiaq xiaq added this to the 0.18.0 milestone Oct 14, 2021
krader1961 added a commit to krader1961/elvish that referenced this issue Nov 28, 2021
I was surprised to see so many legacy lambda syntax examples in the
documentation. This replaces all of them with the new syntax -- excluding
the handful of cases meant to explicitly verify the legacy form is still
valid. This also adds a link to the issue in the release notes which
documents the change in syntax.

Related elves#664
xiaq added a commit that referenced this issue Nov 28, 2021
I was surprised to see so many legacy lambda syntax examples in the
documentation. This replaces all of them with the new syntax -- excluding
the handful of cases meant to explicitly verify the legacy form is still
valid. This also adds a link to the issue in the release notes which
documents the change in syntax.

Related #664
@xiaq xiaq closed this in 003557c Jan 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants