Shrubbery notation #122
I think the link to the RFC discussion is wrong. Can you add something comparing it to sapling notation? |
Fixed and added. The short answer in comparison to sapling notation is that shrubbery notation is indentation-sensitive. |
See In I think this a lot and am willing to move forward with this. For next steps:
|
I think forbidding About delaying the precedence to another level, I think it is necessary to support user-defined operators (my example in a recent thread was the |
@jeapostrophe I think it was confusing to have the "a colon could go here" comment in the middle of the first example, since that's not the first place where a colon is optional. I changed the text before the example to explain @gus-massa I think the C and the Lisp ends of the identifier spectrum make sense, but I'm skeptical of in-between points. With Lisp-style identifiers, you have to put space around any operator (including something like |
Also, Rust and JavaScript now both use |
I think Ruby allows (This is not even to mention how people who have never programmed before wouldn't feel |
A clarification on Ruby: Ruby allows I am not aware of any formal study that would confirm whether |
As much as I don't like this, I agree 100% that people are constantly asking if |
About the main part of this proposal: I like the idea of meaningful indentation. It reduces the number of parentheses a lot, like in my examples in saplings. I don't use paredit, so I rely on the parentheses to detect mismatches. I sometimes use I'd like to add the magic I'm not convinced of the optional I like the two spaces before the I'm still not sure why the
|
Now that I think about it, I've probably also had to explain that dozens of times. It would be nice to just... not have to do that. I'm coming around on the idea of C-style identifiers for Rhombus. |
@gus-massa On the optional I also share your concern about the special rule for operators to continue a line within I'm not initially enthusiastic about |
+1 for required I'm not sure I like the idea of preserving commas on the parse. I'd prefer that either:
When would a group that mixes use or absence of commas be desirable?
|
@michaelballantyne Requiring commas in |
I've updated the description and parser:
Extra |
I really like all of these changes, because I like the greater specificity. Thank you! |
Are characters used as "operators" like "+", "-" not allowed to be shown in an identifier? I see some identifiers like |
@zamora You're right — that description is backwards. The implementation is the other way around with |
Leaving the behavior of `#//` on its own line intact, add support for `#//` at the start of a group or just before an inline `|`. For example, the new rules make `f(#//x, y)` equivalent to `f(y)` and `if a | 1 #// | 1.5 | 2` equivalent to `if a | 1 | 2`.
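To make the intent of the `#//` rule concrete, here is a minimal Python sketch that models it on flat text. It is only an illustration under stated assumptions: `#//` is read as commenting out the group or alternative that follows it, nesting is ignored, and the function names `drop_commented_groups` and `drop_commented_alternatives` are hypothetical helpers, not part of the shrubbery parser.

```python
def drop_commented_groups(args):
    """Drop comma-separated groups that start with `#//`.

    Assumption: `args` is the flat text between parentheses, with no nesting.
    """
    groups = [g.strip() for g in args.split(",")]
    return ", ".join(g for g in groups if not g.startswith("#//"))


def drop_commented_alternatives(line):
    """Drop each `|` alternative that is preceded by an inline `#//`.

    Assumption: `line` is a single line with flat, non-nested alternatives.
    """
    kept = []
    skip = False  # True when the previous part ended with `#//`
    for part in line.split("|"):
        part = part.strip()
        comment_next = part.endswith("#//")
        if comment_next:
            # Remove the `#//` marker itself from the surviving text.
            part = part[: -len("#//")].strip()
        if not skip:
            kept.append(part)
        skip = comment_next
    return " | ".join(kept)


print(drop_commented_groups("a, #//b, c"))
print(drop_commented_alternatives("if a | 1 #// | 1.5 | 2"))
```

The real parser works on tokens and nested groups, so this text-level model only mirrors the two flat cases described in the commit message.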
Each `{}` argument is a separate list. For example, `f[0]{a}{b}` turns into `f(0, ["a"], ["b"])`.
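The "one list per `{}` argument" rule can be sketched in Python as a text-to-text rewrite. This is an illustrative model only: it handles a single name, an optional `[...]` argument part, and flat (non-nested) `{...}` bodies, and the helper name `expand_braces` is hypothetical rather than anything in the implementation.

```python
import re

# name, optional [args], then zero or more flat {body} arguments
FORM = re.compile(r"(\w+)(?:\[(.*?)\])?((?:\{[^{}]*\})*)")


def expand_braces(form):
    """Rewrite `f[0]{a}{b}` style text so each `{}` body is its own list."""
    m = FORM.fullmatch(form)
    if m is None:
        raise ValueError("unsupported form: " + form)
    name, args, bodies = m.group(1), m.group(2), m.group(3)
    parts = []
    if args:
        parts.append(args)
    for body in re.findall(r"\{([^{}]*)\}", bodies):
        # Each body becomes a separate one-element list of text.
        parts.append('["%s"]' % body)
    return "%s(%s)" % (name, ", ".join(parts))


print(expand_braces("f[0]{a}{b}"))
```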
Would it be a good idea to syntactically disallow an empty block after a
the only possible indentation for the next line is more indented, and some indentation problems would be flagged even earlier. Unlike the case of empty groups, I don't have in mind disallowing the representation of empty blocks. They are useful, for example, in representing an empty sequence from expanding a definition macro or, similarly, an empty sequence of generated alternatives for |
I'm not sure if this is a good idea, but... One problem that I have in Python is when I comment out the only instruction in some code that I use to debug or to have a verbose output, but I don't want it in the final version. (And I don't want to remove it, because I may need more debugging later.)
|
Also, add `;« ... »` as a group-sequence splice form. The idea here is that `«»` can be used to fully bracket groups, and then program text is armored so that line and indentation changes do not change the way the text parses.
The latest version is an experiment changing In some ways, this change brings us full circle to proposals that advocate a choice of equivalent notations, where one is indentation-sensitive and the other is not. A difference here, though, is that the indentation-insensitive notation isn't meant to be particularly convenient to type or pretty on its own, and so it can be closer to the indentation-sensitive syntax. That is, it's meant as a kind of "armor" mode to minimally adjust program text while ensuring that accidental line or indentation changes won't change the way the text parses (except for text in One possible use of armoring is just before copying some program text to move it into a different context, where the text could be pasted and reindented in the target context. Unarmoring in the target context then ensures that it parses the same as before it was copied. The https://gist.github.com/mflatt/b932084a4b2489abbe115022b2b81b9b The next step would be to make Meta-A toggle armored text to unarmored. To support a sequence of groups that is not in a block, a |
Correct me if I'm wrong, but mainly, there are two ways to style a colon.
An opening paren-like (brace, bracket, paren) at EOL increases the current indentation by some amount (let's say 2). For example:
Question
In JS, people usually write something along these lines:
Shrubbery notation can read this code just fine. But the style doesn't follow the "rules" I wrote above, and the current indenter will fight against this style. I guess the main question I have is, is this style endorsed? And if so, how should the "rules" be adjusted? |
@sorawee That output might be implemented as "two more than whatever indentation starts the line with the opener". But that choice also leads to an example like this:
instead of
which seems like it might be a bad idea. Special-casing indentation for an opener–closer pair at the end of a block might work, but indentation is currently determined only by looking before the line to indent. I'm not at all sure those are the only possibilities. |
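For reference, the rule floated above ("two more than whatever indentation starts the line with the opener") can be sketched in a few lines of Python. This is only a model of that one rule, not the actual indenter: the name `next_line_indent` and the `step` parameter are hypothetical, and, like the current indenter, it looks only at the line before the one being indented.

```python
OPENERS = ("(", "[", "{")


def next_line_indent(previous_line, step=2):
    """Indentation for the line that follows `previous_line`.

    If the previous line ends with an opener, indent `step` columns past
    the indentation that starts that line; otherwise keep its indentation.
    """
    current = len(previous_line) - len(previous_line.lstrip(" "))
    if previous_line.rstrip().endswith(OPENERS):
        return current + step
    return current


print(next_line_indent("  foo(bar, function () {"))
print(next_line_indent("x = 1"))
```

Because the result depends only on where the opener's line starts, an opener in the middle of a long line gives the same indentation as one at the start, which is the trade-off being questioned above.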
Consider:
Here's the raw information on each node:
I will need to think more, but my first impression is that I'd really love for EDITED: FWIW, in |
Disallowing an empty block seems to enable clearer/earlier errors. Empty blocks can still be represented, though, so an explicit `«»` provides a way to write an empty block.
Indenter nit: The interaction of the indenter and automatic closer insertion is not great. Here are some examples, where
Example 1 looks incorrect. There is no reason that
Example 2 might appear to indent correctly, but in practice, programmers expect that after entering a newline, they will be able to continue coding right away. As it currently is, they would need to back up one line and enter a newline again to achieve:
which is not ergonomic. Most code editors will produce the above editor state right away. Note that the above observation ("programmers expect that after entering a newline, they will be able to continue coding right away") also validates:
since they can add more code right away. |
I've changed the indenter for example 1. For example 2, the behavior is still currently the same. Possibly the right interaction there is for a programmer to type the start of whatever goes in parentheses and then hit Tab again. This sort of thing happens with |
Thanks! Speaking of |
@sorawee I've (slooooowly) been fixing that in DrRacket and I'll try to get something pushed soon that fixes that. |
Trying to find a balance between empty blocks being confusing and some sensible uses of empty blocks, such as `'(:)` to represent an empty definition-macro expansion or a `:` by itself to put a REPL in multi-line mode.
@sorawee I pushed the change to keep same-line whitespace and comments with the preceding token as a You mentioned in person that it would be useful to preserve the parsed structure of commented-out groups and alternatives. I'd rather not have that as the default, because the idea is that plain strings might be lightweight enough to preserve in syntax objects for compiled form (to be used later when reporting errors about macro expansions). So, there would need to be a special mode — and in that mode, would it be better to just leave the commented-out group as non-comments in the parse? The |
An option to preserve comments in the parse tree would definitely make the life of fmt, resyntax and other custom tools that modify source code much easier! |
Conceptually, I think quoting and commenting can interact. Within a deeply nested quotation, a comment can be intended to be preserved through several levels of quotation before the comment is supposed to kick in and erase it for the rest. This is a niche concern—virtually all mainstream string syntaxes don't even have a single way to write comments inside them, much less ones that interact carefully with nesting—but it's something that's been on my mind in my designs, and now seems like an apropos time to mention it for Rhombus. For this to work in my design notes, I've treated comments as being a kind of escape sequence, and I've given escape sequences a prefix that allows them to be annotated with simple information relevant to determining the depth they apply at. (In the most straightforward cases this would be a number written in unary, but in my designs, I want to be able to specify labels to jump directly to certain nesting levels.) Unfortunately, the presence of these prefixes makes it hard to determine what kind of escape sequence is coming up by peeking for it, so for instance it's hard to write something that skips whitespace and comments but doesn't read into any other escape sequence. I've considered making up for this by giving them another prefix that acts as a forward declaration of what kind of escape sequence is coming up. Frankly, it's messy, and the resulting comment notations are probably too verbose. But I thought it could be useful to share this experience report in case some of these concerns are relevant. More to the point, if comments are part of the AST, then at some point someone might ask for comments that are really comments rather than being part of the AST. And then if that exists, someone might ask for a quoting operator that really quotes things, so that those comments are part of the AST too. 
I think the idea of different comments having different escaping depths can help explain how these competing concerns can interact and coexist. |
@sorawee I made a further change that I think you'll approve of, but mentioning in case not: When a block is followed by a comment that starts on the same column as the block's content, then the comment and intervening whitespace are kept with the block as a tail, instead of left to be a tail of an enclosing form or a prefix on the next form. (This rule makes a Scribble |
@rocketnia I'm not sure it's deeply related to what you have in mind, but something like that happens with |
@mflatt Thanks! I totally agree that this is better. |
Oh yeah, I learned about that feature recently (in regular Racket-based Scribble rather than Rhombus), and I really like that it's available as a technique. I think it is related, not just in superficial terms but potentially in more intricate ways as well. I think the essential difference is that Scribble's labels apply to lexer syntaxes rather than quotation levels. The difference has been pretty subtle for me as I've worked on Punctaffy, which deals mainly with higher-dimensional analogues of lexer syntax, but has an application as a building block for quasiquotation syntaxes. I've mixed up these concerns more than I'd like to admit. I think in the majority of practical cases, Scribble's syntax labels are sufficient for suppressing the syntactic features of arbitrary code, in the same way that a preprocessor is often a sufficient substitute for a function. And just like preprocessors and functions, these techniques each serve their own purposes.
Statically scoped quotation labels
The mental model I have in mind is that each comment (whether labeled or not) should be statically associated with a quotation level, in a way that's as reliable as static lexical binding. If a comment is associated with a quotation level that's at a nonzero depth, that comment is suppressed; it represents its source text rather than doing anything comment-like. Whether or not a comment is suppressed this way, it still begins and ends in the same places. We might consider those beginning and ending points to be determined before scope resolution figures out what level the comment's label refers to. The code of a suppressed comment might even be syntax-highlighted in a way that reflects its status as a quoted comment, like I started to describe in racket/drracket#512 (comment). Relevant excerpt:
Statically scoped lexemes
Anyhow, if I understand right, the In the following example, some code initially works because some erroneous (or perhaps malicious) notation is commented out. Once it's moved into another quotation that suppresses the comment, the notation contained in the comment gets interpreted with a different semantics that makes it actually do something:
```racket
#lang at-exp racket
; Here's an example:
(displayln
@~a{
hello
@; Commented-out }>>|}) (displayln (string-upcase "code injection")) (displayln @~a{@~a|<<{ code.
world})
; Here it is in a quoted string:
(displayln
@~a{
; Here's an example:
@~a|<<{
(displayln
@~a{
hello
@; Commented-out }>>|}) (displayln (string-upcase "code injection")) (displayln @~a{@~a|<<{ code.
world})}>>|})
```
Output:
Code injection?
Using "code injection" in that example might be catastrophizing. I mean, my concern is to be able to move code around without adjusting its escape sequences. If my concern here were security, just about every language under the sun would be vulnerable to something like this on account of the popularity of double-quoted strings (and single-quoted ones, etc.):
And if everyone's vulnerable to this, then someone must have thought hard about why double-quoted strings are actually okay... right? Or is that wrong? Well, personally, I don't know how to tell people it's a catastrophe even if it is, and what I do know is that I prefer the alternative quotation syntax approach I'm describing, which doesn't share the same usability gotchas.
Shortcomings of my approach; and tying back to Scribble and Rhombus
I wouldn't say my statically scoped quotation levels are necessarily free of gotchas altogether. I think most of the gotchas are herded into a particular corner, though: Moving code around is still dangerous if it contains unmatched brackets or free variables which might interact differently with their new surroundings. Of course, if nothing else, the code pasted into the string could simply contain an unmatched string-ending bracket sequence, which can match up in a potentially surprising way with the string boundary itself. The need to avoid unmatched brackets is more pervasive than that, though. Strings that track quotation depth need to recognize nested quotation forms so they can adjust the depth, and they need to be able to match up brackets to figure out where those nested quotations begin and end. Unmatched or mismatched brackets can interfere with this, so they need to be meticulously escaped. And if these unmatched or mismatched brackets are intended to affect the quoting level, then I suppose their effect on the quoting level and the extent of that effect has to be meticulously explained in another escape sequence (one I haven't ever prototyped yet). If I'm in a scenario where I have a whole mess of unmatched brackets to deal with, I'll probably prefer wrapping them in Scribble-style labeled brackets rather than escaping them all individually. That's why I think of these techniques as each serving their own purposes.
Another issue that's come up in my explorations is that when I have code that resembles English text, that becomes a problem, because it can't be distinguished as program syntax inside a string. The same concern is addressed in Scribble, where |
I am wondering if it is possible to simplify the block syntax by replacing the pipe with a preceding colon. I think it might be possible to give the sequence
I think the colon is a better allegory, because it is itself a sequence of dots. |
An indentation-sensitive notation that remixes elements of `#lang something`, Lexprs, and saplings.