Line-expressions #114

Open
wants to merge 50 commits into
base: master

Conversation

jeapostrophe
Collaborator

Here's yet another proposal for syntax. This one comes with a real parser and a full test suite of examples that are in the write-up. I really like this one a lot.

@rocketnia

I think I noticed a discrepancy: Right after you say "dots are left-associative," you illustrate that with (#%dot x (#%dot y z)), which would actually indicate they're right-associative. It looks like "left-associative" really is what you have in mind and that there's just a simple typo in the example.

@jeapostrophe
Collaborator Author

Thanks @rocketnia I fixed the mistake.

@gus-massa

I think that (1 + 2 - 3 + 4) and (1 * 2 / 3 * 4) are parsed in an unexpected way.

Does this allow unary -?

@jeapostrophe
Collaborator Author

jeapostrophe commented Sep 2, 2019

I've added those two examples to the document.

Lexprs ONLY support binary operators. I'll clarify that in an update.

@rocketnia

Oh, you don't intend for dots to be left-associative? If they're meant for field and method lookup as in Java, that would usually make them left-associative (since the thing on the right of the dot isn't even an expression, let alone another dot expression). If you're making them right-associative, what use cases do you have in mind?

Meanwhile, I think it's a mistake for (1 + 2 - 3 + 4) to mean anything too different from (((1 + 2) - 3) + 4). It'll mess people up when they try to apply the math they learned in school. If it's an error to write (1 + 2 - 3 + 4) and people have to write a more explicit (((1 + 2) - 3) + 4) or (1 + 2 + neg(3) + 4), that's probably something people can work with, but if it's a non-error that results in something similar to ((1 + 2) - (3 + 4)), that'll be pretty surprising.
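To make the arithmetic concrete, here is a minimal Python sketch (not part of the Lexpr implementation; the function names are made up for illustration) that evaluates the same flat token list under a left-grouping and a right-grouping reading. It only handles + and -, and it assumes a well-formed alternating list of numbers and operators.

```python
import operator

# Hypothetical helper table for this illustration only.
OPS = {"+": operator.add, "-": operator.sub}

def eval_left_assoc(tokens):
    """Evaluate [n, op, n, op, n, ...] grouping to the left: ((a op b) op c)."""
    result = tokens[0]
    for i in range(1, len(tokens), 2):
        result = OPS[tokens[i]](result, tokens[i + 1])
    return result

def eval_right_assoc(tokens):
    """Evaluate the same list grouping to the right: (a op (b op c))."""
    if len(tokens) == 1:
        return tokens[0]
    return OPS[tokens[1]](tokens[0], eval_right_assoc(tokens[2:]))

tokens = [1, "+", 2, "-", 3, "+", 4]
left = eval_left_assoc(tokens)    # (((1 + 2) - 3) + 4)
right = eval_right_assoc(tokens)  # (1 + (2 - (3 + 4)))
split = (1 + 2) - (3 + 4)         # the "surprising" reading mentioned above
print(left, right, split)
```

The left-grouped reading gives 4, while both of the other groupings give -4, which is the kind of silent difference the comment warns about.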

@jeapostrophe
Collaborator Author

jeapostrophe commented Sep 2, 2019

@rocketnia Re: infix ---- You've made me realize that I always naively interpreted PEMDASFLTR to mean something different than what people actually do. Dumb me. I've changed it to allow */% and +- to be combined while denying others.

@rocketnia Re: dots ---- I was imagining it would work like Remix where the thing on the left of a dot gets to decide what things on the right are valid. It may, for instance, allow another dot expression and look at the first thing. In other words, I'm imagining #%dot as being a macro rather than a function, so (#%dot x (#%dot y z)) makes more sense because of the order of evaluation of macros. I thought about maybe making it (#%dot x y z) which is a bit more agnostic.
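The three candidate shapes for x.y.z discussed here can be compared side by side. The following Python sketch is only an illustration: it models s-expressions as nested tuples, and the function names dot_right, dot_left, and dot_flat are hypothetical, not anything defined by Lexprs or Remix.

```python
def dot_right(names):
    """Right-nested: x.y.z -> (#%dot x (#%dot y z)); x sees the rest as its body."""
    if len(names) == 1:
        return names[0]
    return ("#%dot", names[0], dot_right(names[1:]))

def dot_left(names):
    """Left-nested: x.y.z -> (#%dot (#%dot x y) z); Java-style field lookup."""
    result = names[0]
    for name in names[1:]:
        result = ("#%dot", result, name)
    return result

def dot_flat(names):
    """Flat: x.y.z -> (#%dot x y z); leaves the grouping to #%dot itself."""
    return ("#%dot", *names)

names = "x.y.z".split(".")
print(dot_right(names))
print(dot_left(names))
print(dot_flat(names))
```

In the right-nested and flat shapes, x is the first thing #%dot sees, which fits the idea that the left-hand side decides how to interpret what follows.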

@mflatt
Member

mflatt commented Sep 2, 2019

@jeapostrophe A few small questions:

Is the strict spacing requirement a question of style (i.e., ensuring that all programs have the same shape) or is it important to parsing?

Is it important for . to be an identifier? (Offhand, that looks like trouble.) Along similar lines, is it important for 1+2 to parse as an identifier, instead of being an addition?

Do I understand correctly that an if using this notation would be required to take up 2-4 lines? That is, it wouldn't work to remove the newline before else in if form? (The answer here is probably obvious to someone who has written Python programs.)

There seems to be a kind of ambiguity in parsing : as a follower continued on the same line versus : as an identifier/operator in a line. Are there other ambiguities that you've noticed that may not be obvious?

I worry about having to use parentheses to get infix operators. For example, I don't think the let example in "Line follower: Bar" is as intended, because I think you meant to add x and y (so the source should have (x + y)). It also seems that parentheses disable indentation-based forms within the parentheses. Is that right, or am I misunderstanding?

@jeapostrophe
Collaborator Author

@mflatt

Re: spaces --- I think that all places that require 1 space could be changed to "at least 1" space, except of course that indentation needs to be well-defined. I don't think that there's any place where fewer spaces would be okay, or where you could have more or fewer newlines.

Re: . --- This is just so that ... is not a special case.

Re: +/-/etc in identifiers --- This is so that existing Racket identifiers like take-right and system* and so on work okay. I think it is better to say infix only happens in strict circumstances than to have fewer options for identifiers.

Re: if --- No. First, the first line after a : can be before the newline, so a double-armed if like

if (x < y) : f(x)
else : g(y)

is allowed. (I'll add this example.) Second, there's nothing about Lexprs that restricts the if macro from accepting an AST like (if cond true false) in addition to (if cond (#%indent true) else (#%indent false)). However, it would be very awkward for the if macro to discover an else inside the true #%indent (which would happen if you removed the newline before the else :).

Re: ambiguity of : --- Line followers are not disallowed inside of symbols (so :, &, etc are okay), so else: is a valid symbol, just like :. However, a position inside a line cannot be a symbol unless it does not start with a line follower. Thus, the only way to include a : symbol inside a line is by wrapping it in parens. I'll add an example for this.

Re: infix --- You are correct that I intended to get (#%line (+ x y)) not (#%line x + y), although I copied that example from something else so didn't think about it too much. I think you are correct that this rule is subtle and will require getting used to. I fell into this because I originally wanted something like let x = 1 + 2 in x + x to be (let (= x (+ 1 2)) in (+ x x)), but I don't know how to make that happen in a consistent way without knowing something about let and in. I realized that things would be greatly simplified at the line level if there was no precedence, provided that precedence was trivial to enable anywhere and enabled by default in most positions (like in function application sequences).

Re: infix and indentation --- You are not correct. A group (the inside of parens) is a series of units separated by spaces---thus newlines are not included. If you put a newline in, then you have a mismatched ), because the line ended without the ) appearing.

@rocketnia

@jeapostrophe

Re: dots ---- I was imagining it would work like Remix where the thing on the left of a dot gets to decide what things on the right are valid. It may, for instance, allow another dot expression and look at the first thing. In other words, I'm imagining #%dot as being a macro rather than a function, so (#%dot x (#%dot y z)) makes more sense because of the order of evaluation of macros. I thought about maybe making it (#%dot x y z) which is a bit more agnostic.

Ah, I see. The x in x.y.z could be a macro that interpreted y.z in a certain way. I agree this is a reasonable approach, but maybe for complex reasons. In case your reasoning diverges from mine at some point, or in case my points spark some inspiration, I'll lay out my reasoning explicitly....

Even as a fan of s-expressions, I find the dot notation particularly compelling for things that are like namespaces. Specifically, I think situations come up where a big section of code uses "variables" that aren't quite like the usual variables in the language. Often this means every variable (say, foo) in a piece of code needs to be surrounded with some operation (say, ns). Writing this operation using delimiters like (ns foo) or ns(foo) or ns"foo" is substantially less readable, in my opinion, than writing it like ns.foo. For something as common as a variable reference, putting delimiters on both sides just creates too much visual noise.

Extrapolating from that, I'd say chained dots are particularly helpful for namespace-like things that are looked up from other namespace-like things. Essentially, the expression x.y.z can be intended as merely a variable reference. It only becomes a lookup of z from x.y if we want to get nitty-gritty. And it only becomes a lookup of z from the namespace resulting from looking up y from namespace x if we want to get nitty-grittier!

I think a natural way to look at this situation is that the result of a macro call (in this case, the result of a namespace lookup) isn't always an expression; it's sometimes another macro (in this case, namespace) that can be called again. I've explored this approach before in Penknife, a language I wrote where a macro result was a "fork" object that could be converted into an expression or called as a macro.

But in the context of Racket, macros are (more or less) s-expression-to-s-expression transformers, and the thing in the macro position is (almost?) never a compound expression like x.y. (The expression macroexpander treats ((x y) z) as a function call regardless of what (x y) expands to.) Since the expansion of x.y.z must ultimately begin by calling macro x, it's understandable to conclude, in the context of Racket, that the x in x.y.z should be thought of as being directly in the macro position and the ".y.z" part should be thought of as its body.

@97jaz

97jaz commented Sep 3, 2019

@jeapostrophe The line continuations (the slash line followers) in match l \ and mac timed \ seem visually odd to me, seeming to suggest that the match expression (or macro definition) only extends to the next (editor) line because it wouldn't fit on the current one. (No doubt this is at least partly because of the way I've seen \ used as a line continuation in other contexts.)

The upshot (I think) is that the newline+indentation after match l is optional; you could also write this as:

match l | empty : 
             0
        | [...]

whereas the newline+indentation+bar after the colon is required. I don't have a better way to describe this than to say it feels odd to me.

@jeapostrophe
Collaborator Author

@97jaz I agree that \ in Lexprs is different than \ in shells, because it mandates indentation. & is more like what a shell does. But both are like a shell in that they just allow something to be spread across multiple lines.

I used \ in my examples because I felt like the | looked like it was too far to the right and thought it looked cooler. I think an & in these examples looks nice too. I think that we don't know what tastes will emerge for when and why to use \ and &.

@mflatt
Member

mflatt commented Sep 27, 2019

@jeapostrophe To refresh my memory and improve my understanding of Lexprs, I tried converting "demo.sap" to Lexprs:

https://gist.github.com/mflatt/aa09fd2ac9e445bbddd99f6243a21458

Does that look like the kind of syntax that you would choose? Search ??? for places where I felt especially uncertain and/or had specific questions.

As a general observation, my naive expectations were often defeated by the way that a :-continued line also continues after a sequence of indented lines. For example, I'd write something like

if (x = 1) :
  println("one")
else :
  println("other")
println("done")

and expect it to be like

(if (= x 1)
    (println "one")
    else (println "other"))
(println "done")

but it parses as

(if (= x 1)
    (println "one")
    else (println "other")
    (println "done"))

@jeapostrophe
Collaborator Author

@mflatt

Your question on top about : shows up a few times in the file. The done is part of the if for the same reason the else is. : always includes the line after the indented block ends, and you have a second : on the line where the else is.
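The rule described here can be imitated with a toy grouping pass. The Python sketch below is an assumption-laden model, not the real Lexpr parser: it only understands lines that end in a trailing :, treats each line's text as an opaque string, ignores same-line continuations such as `if (x < y) : f(x)`, and uses made-up function names. A form that ends in : takes the following indented block and then keeps going with the next line at the same indentation.

```python
def indent_of(line):
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))

def parse_forms(lines, start, indent):
    """Parse consecutive forms at exactly `indent`; return (forms, next_index)."""
    forms = []
    i = start
    while i < len(lines) and indent_of(lines[i]) == indent:
        form, i = parse_form(lines, i, indent)
        forms.append(form)
    return forms, i

def parse_form(lines, i, indent):
    """Parse one form; a trailing ':' absorbs the indented block AND the next line."""
    parts = []
    while True:
        text = lines[i].strip()
        i += 1
        if not text.endswith(":"):
            parts.append(text)
            return parts, i            # no ':' on this line, so the form ends here
        parts.append(text[:-1].rstrip())
        if i < len(lines) and indent_of(lines[i]) > indent:
            block, i = parse_forms(lines, i, indent_of(lines[i]))
            parts.append(block)        # the indented block belongs to this form
        if i >= len(lines) or indent_of(lines[i]) != indent:
            return parts, i            # nothing at this indentation to continue with

source = [
    'if (x = 1) :',
    '  println("one")',
    'else :',
    '  println("other")',
    'println("done")',
]
forms, _ = parse_forms(source, 0, 0)
print(forms)
```

Under this model the five lines produce a single top-level form whose last part is println("done"), matching the parse reported above rather than the one that was expected.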

In general, I assume that people won't use [] in normal code, but will only use it in special circumstances, like defining a macro that consumes a line. You used it a lot in your code.

1: I'd expect let to use a |, so:

let | x = 1
    | y = 2
in :
  x + y

or

let | x : 1
    | y : 2
in :
  x + y

19 vs 26: I like 19.

48: You can't break it across lines inside of ()s. I'm willing to relax this rule.

55 & 62: I don't understand what these things are supposed to be doing, but it is an application of the : rule

71: I thought that structs would be written something like:

struct posn :
  x mutable
  y default(7)
where :
  methods equality :
    define equal(a, b) : 
      (is_posn(b) => (a.x == b.x && a.y == b.y))

101: Ya, there's no way to end most lines without a newline

104: I like that usage of []. If you wanted to not do that, then you could have

define go() :
  define helper(n) :
    list(n, n)
  ;
  define more(m) :
    if (m = 0) : "done"
    else : more(m - 1)
  ;
  helper(more(9))

120: Ya, you've got two :s so the v can have either 1 level (like you have) and it goes with the first : or 2 levels and it goes with the lambda.

127: () by itself would be an empty parenthesized expression, which is an error because parens are not visible in the output and there is no "empty" lexpr leader. In this case, these positions are not expressions, but function calls, so you have discovered a cute way to write a function call where you don't care what the name is. Another way to do case lambda might be:

define approx_thunk(x) :
  match x \
    | something(v) : [lambda \
                        | : v
                        | n : (v + n)]
    | nothing : [lambda \
                   | : 0
                   | n : n]

where each item to the left of : is a different argument.

148: I think I'd write it like:

define dictionary : dict :
    foo : 17
    bar : string
    baz : true

153: No, for the same reason as above, and I'm okay with changing that. I might want to do something like enforcing Haskell style , alignment :)

162: I think I'd use |:

define show_zip(l, l2) :
  for | x : in_list(l)
      | x2 : in_list(l2)
  in :
    print(x)
    print_string(" ")
    print(x2)
    newline()

(This revealed an error in the code, which I've updated.)

@mflatt
Member

mflatt commented Sep 28, 2019

Ah, I missed that a line continues after a last |, similar to the way that a :-continued line continues after nested indentation. (Does the description say that?) And I now see that the description includes the let example.

Yes, I used [] too much, falling back to it when I couldn't figure out a better way of grouping.

I updated the Gist with my improved understanding of the right style.

@jeapostrophe
Collaborator Author

I'll add that to the description. It didn't say it, except for :. I looked at a lot of your new examples and find them quite appealing (the fors at the bottom for instance.)

@mflatt
Member

mflatt commented Sep 28, 2019

I updated the examples slightly, replacing most \s with :s (which I think looks better and parses fine).

@jeapostrophe changed the title from "RFC: Line-expressions for beautiful Racket2 syntax" to "Line-expressions" on Sep 30, 2019
@gus-massa mentioned this pull request on Oct 20, 2019
@jackfirth added the "surface syntax" label (Related to possible surface syntaxes for Rhombus) on Mar 3, 2021
6 participants