Shrubbery notation #122

Closed
wants to merge 44 commits into from

mflatt
Member

@mflatt mflatt commented Oct 1, 2019

An indentation-sensitive notation that remixes elements of #lang something, Lexprs, and saplings.

Rendered

@mflatt mflatt changed the title "shrubbery notation" to "Shrubbery notation" on Oct 1, 2019
@samth
Sponsor Member

samth commented Oct 1, 2019

I think the link to the RFC discussion is wrong.

Can you add something comparing it to sapling notation?

@mflatt
Member Author

mflatt commented Oct 1, 2019

Fixed and added. The short answer in comparison to sapling notation is that shrubbery notation is indentation-sensitive.

@jeapostrophe
Collaborator

Seeing | used for if was enlightening to me! Beautiful. I think if I had thought of that, I wouldn't have felt that the line-continuation rule after the indented block was as necessary.

In print_sexp, the for has indentation but no suffix... I assume this is "visible"; it might be nice to add a comment in the first example about the indentation grouping rule.

I like this a lot and am willing to move forward with it. For next steps:

  • I think that the @ notation is hopefully obvious
  • I think the #{} idea is fine, although I think it would be fine to tweak C names a little bit to accommodate ? and maybe a few others
  • I am anxious about delaying precedence/etc. to another level, but am willing to say that shrubberies
    can do whatever, but the next language may have some particular system.

@gus-massa

I think forbidding . in identifiers is a good idea in spite of the incompatibilities that it may cause (though nobody is using it). Probably forbidding : is also good; it may cause more problems, as in exn:fail?, but it helps avoid some bugs when using syntax-parse. I'm not sure about !: I like it, but it's also a nice shorthand for not. And I really like ?.

About delaying precedence to another level, I think it is necessary in order to support user-defined operators (my examples in a recent thread were @ in Python and .* in MATLAB).

@mflatt
Member Author

mflatt commented Oct 2, 2019

@jeapostrophe I think it was confusing to have the "a colon could go here" comment in the middle of the first example, since that's not the first place where a colon is optional. I changed the text before the example to explain : as indentation up-front and ignore the detail that extra :s are allowed.

@gus-massa I think the C and the Lisp ends of the identifier spectrum make sense, but I'm skeptical of in-between points. With Lisp-style identifiers, you have to put space around any operator (including something like :), so that's why I lean toward C-style identifiers in an infix-friendly notation. I also lean toward C-style because I am tired of having to explain that the - in x-y or the ! in set! is just part of the name; it means the syntax has already confused my audience, and I'm explaining to try to undo that confusion. I can see how allowing ? in identifiers just might work ok with C-style syntax, because ? is not commonly used in operators (and we don't need the C-style ?...: since we're not going to make the mistake of if being a statement and not an expression). Still, ? won't look like part of a name to many programmers, and I would prefer not to spend time in the years ahead explaining that ? is just a part of an identifier's name.
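To make the trade-off concrete, here is a minimal Python sketch of the two ends of the identifier spectrum. The two regular expressions are hypothetical, simplified token rules chosen for illustration (they are not the shrubbery or Racket reader): under the Lisp-style rule, x-y and a:b are single identifiers, so an operator needs surrounding spaces, while under the C-style rule the same text splits into identifier, operator, identifier.

```python
import re

# Hypothetical, simplified token rules for illustration only.
# Lisp style: anything that is not whitespace or a delimiter belongs to a name.
LISP_STYLE = re.compile(r"[^\s()\[\]{},]+|[()\[\]{},]")
# C style: names are letters, digits, and underscores; characters such as
# - ! ? : always form separate operator tokens.
C_STYLE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[-+*/!?:<>=]+|[()\[\]{},]")


def tokenize(pattern: re.Pattern, text: str) -> list[str]:
    """Split text into tokens according to pattern, skipping whitespace."""
    return pattern.findall(text)


if __name__ == "__main__":
    for src in ["x-y", "x - y", "set!(x)", "a:b"]:
        print(f"{src!r:10} lisp={tokenize(LISP_STYLE, src)} c={tokenize(C_STYLE, src)}")
```

With the C-style rule, set!(x) reads as set, !, and a parenthesized x, which is exactly the reading a newcomer tends to expect.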

@samth
Sponsor Member

samth commented Oct 2, 2019

Also, Rust and JavaScript now both use ? as an operator (for related things, but not related to the C ?).

@sorawee
Contributor

sorawee commented Oct 2, 2019

Still, ? won't look like part of a name to many programmers, and I would prefer not to spend time in the years ahead explaining that ? is just a part of an identifier's name.

I think Ruby allows ! and ?, and lots of programmers really like Ruby. Is it really the case that people are surprised by this issue?

(This is not even to mention that people who have never programmed before wouldn't find ! and ? particularly weird... all of programming is weird to them!)

@mflatt
Member Author

mflatt commented Oct 2, 2019

A clarification on Ruby: Ruby allows ? and ! specifically at the end of method names (not in identifiers in general, and not in other places within the identifier). It also has ! as an operator. It seems that the expression x!(1) or even x!1 is a call to the x! method with the argument 1, while x ! 1 is a call to x with the argument false (since !1 is false).

I am not aware of any formal study that would confirm whether - or ? as part of an identifier really surprises people. Maybe there has been one. Currently, I can only report my perception from my experience (working with programmers at various levels) that it does surprise them, and it requires specific explanation in a way that a or x as part of an identifier does not.

@jeapostrophe
Collaborator

As much as I don't like this, I agree 100% that people are constantly asking if ?, -, and ! are part of the syntax of Racket identifiers, same with ^ and % if they ever get to those.

@gus-massa

About the main part of this proposal:

I like the idea of meaningful indentation. It reduces the number of parentheses a lot, as in my examples with saplings. I don't use paredit, so I rely on the parentheses to detect mismatches. I sometimes use {} for define and [] for long begin blocks to make mismatches easier to detect. I will need to be more careful when editing, and it will need more editor support to move blocks left/right as automatically as possible. I like this anyway.

I'd like to add a magic & for for/let in addition to the magic | for if/match and the magic : for everything. But I think this addition is quite orthogonal and I don't expect it to cause too many problems, so perhaps we can delay the discussion until later.

I'm not convinced by the optional :. I prefer a strict rule of a : before every new indented block that is not a | block, and never a : before a | block. It makes the code more consistent.

I like the two spaces before the | blocks. Indenting and using the | is a little redundant, but it looks nicer.

I'm still not sure why {} blocks must be different from () blocks. But if they are equivalent, I'm worried about confusion from an operator that can be unary or binary and is at the beginning of a line. I still can't make a good example, but it's something like this one. (Note the lack of ,; display is perhaps a bad choice, and I needed a unary operator that has side effects and can also be used as a binary operator, so I used --, which is the nearest candidate.)

display(x
        -- y) 

display{x
        -- y} 

@jackfirth
Sponsor Collaborator

I also lean toward C-style because I am tired of having to explain that the - in x-y or the ! in set! is just part of the name; it means the syntax has already confused my audience, and I'm explaining to try to undo that confusion.

Now that I think about it, I've probably also had to explain that dozens of times. It would be nice to just... not have to do that. I'm coming around on the idea of C-style identifiers for Rhombus.

@mflatt
Member Author

mflatt commented Oct 3, 2019

@gus-massa On the optional :, I'm sympathetic to your point. I started with the #lang something optional :, but maybe the choice there is related to how : enables indentation within (). Otherwise, I thought it might be annoying to have to delete a : when editing code in some cases. But it does seem like the reader can simply disallow : before |, {, or indentation, and providing a good error message seems straightforward, so maybe that's better.

I also share your concern about the special rule for operators to continue a line within () and [] and the way that {} is different. Within () and [], a compensating factor is that we expect , to separate groups, as you say. The remaining risk of confusion seems a worthwhile trade-off to avoid \ for breaking arithmetic across lines, but I'm not 100% certain.

I'm not initially enthusiastic about &, because getting rid of it seemed like one of the two big improvements over sapling notation (where the other is getting rid of the need for blank lines). But I encourage you to try/share examples and maybe try adjusting the parser.

@michaelballantyne
Contributor

michaelballantyne commented Oct 3, 2019

+1 for required : before indented blocks.

I'm not sure I like the idea of preserving commas on the parse. I'd prefer that either:

  1. separating, e.g., function arguments with newlines be exactly the same as using commas
  2. comma-separated groups be a special kind of group, where commas are always required, but dropped in the parse.

When would a group that mixes use or absence of commas be desirable?

f(a, b
  c, d,
  e)

@mflatt
Member Author

mflatt commented Oct 4, 2019

@michaelballantyne Requiring commas in () and [] seems likely a good idea. I avoided that requirement originally, because I wanted to make the notation flexible — deferring when possible to a language built with the notation. But enforcing a use of commas in () and [] at the shrubbery-notation level seems consistent with the way the division of responsibility has evolved.

@mflatt
Member Author

mflatt commented Oct 17, 2019

I've updated the description and parser:

  • A , is now required between each group in () or []. Extra ,s are not allowed. I like how this removes the need to represent , in parsed forms.

  • A | is now implicitly indented by half a column. Although @gus-massa liked how indentation was required for nested |, it feels a little less confusing to allow a | to line up with the "keyword" that starts a conditional form, because the shape of | makes it look a little indented already. More significantly, a half-column alignment side-steps the question of mixing groups that start with | and groups that don't. And since they can't be mixed, a block that contains | groups can be simplified to an 'alts variant of 'block in parsed form.

  • A : (or |) is now required before indentation. I'm ambivalent. It often looks noisy and feels genuinely redundant to me. But a redundant : is useful as a kind of belt-and-suspenders notation to help detect earlier when indentation goes wrong — we like indentation, but don't really trust it? — and I think that's probably where the suggestion comes from. (@gus-massa and @michaelballantyne can correct me if I'm wrong.)

Extra :s are still allowed. Requiring a : before indentation is redundant and slightly pedantic, but workable. Having the parser complain when you have an extra : (because you just inserted a newline after a :, for example) seems unnecessarily pedantic. A style guide and code-formatting tool should normalize :s as well as indentation, of course.
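The rule that a : (or |) must come before indentation can be checked line by line. Below is a minimal Python sketch under simplifying assumptions that are not the parser's: one group per line, no comments or strings, and an indentation increase accepted only after a line ending in :, after a line starting with |, or on a line that itself starts with |. The function name is hypothetical. A redundant trailing : is deliberately not reported, in line with keeping extra :s legal.

```python
def missing_colon_lines(src: str) -> list[int]:
    """Return 1-based line numbers whose indentation increases without a
    preceding ':' or '|'. Simplified: one group per line, no comments."""
    errors = []
    prev_indent, prev_text = 0, ""
    for lineno, line in enumerate(src.splitlines(), start=1):
        text = line.strip()
        if not text:
            continue  # blank lines never affect grouping in this sketch
        indent = len(line) - len(line.lstrip(" "))
        introduced = (
            prev_text.endswith(":")      # block opened by a trailing colon
            or prev_text.startswith("|")  # content of an alternative
            or text.startswith("|")       # an alternative may be indented
        )
        if indent > prev_indent and not introduced:
            errors.append(lineno)
        prev_indent, prev_text = indent, text
    return errors


if __name__ == "__main__":
    print(missing_colon_lines("fun f(x)\n  x + 1"))
```

The example prints [2]: the second line is indented, but the line before it does not end in :.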

@jeapostrophe
Collaborator

I really like all of these changes, because I like the greater specificity. Thank you!

@yfzhe

yfzhe commented Oct 18, 2019

Are characters used as "operators", like "+" and "-", not allowed in identifiers? I see identifiers like make_adder and color_posn in the demo, and in-list in (#lang) racket is transformed to in_list.

@mflatt
Member Author

mflatt commented Sep 20, 2021

@zamora You're right — that description is backwards. The implementation is the other way around with [] => 'brackets. I'll fix the description. Thanks!

Leaving the behavior of `#//` on its own line intact, add support for
`#//` at the start of a group or just before an inline `|`. For
example, the new rules make `f(#//x, y)` equivalent to `f(y)` and
`if a | 1 #// | 1.5 | 2` equivalent to `if a | 1 | 2`.
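The effect of `#//` on a group or an inline alternative can be mimicked on flat text. This is a naive Python sketch, assuming no nesting, no strings, and no commas or bars inside groups; the function names are hypothetical, and a real implementation would work on parsed groups rather than raw text.

```python
def drop_commented_args(args: str) -> list[str]:
    """Naive: split on commas and drop any group that starts with #//."""
    groups = [g.strip() for g in args.split(",")]
    return [g for g in groups if not g.startswith("#//")]


def drop_commented_alts(form: str) -> str:
    """Naive: split on '|' and drop an alternative whose '|' is preceded by #//."""
    head, *alts = form.split("|")
    kept = [head]
    for alt in alts:
        prev = kept[-1].rstrip()
        if prev.endswith("#//"):
            # The #// comments out this whole alternative; remove the marker.
            kept[-1] = prev[:-3].rstrip() + " "
        else:
            kept.append(alt)
    return "|".join(kept)


if __name__ == "__main__":
    print(drop_commented_args("x, #//y"))
    print(drop_commented_alts("if a | 1 #// | 1.5 | 2"))
```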
Each `{}` argument is a separate list. For example, `f[0]{a}{b}` turns
into `f(0, ["a"], ["b"])`.
@mflatt
Copy link
Member Author

mflatt commented Oct 29, 2021

Would it be a good idea to syntactically disallow an empty block after a :? That way, with something like

fun f(x):

the only possible indentation for the next line is more indented, and some indentation problems would be flagged even earlier.

Unlike the case of empty groups, I don't have in mind disallowing the representation of empty blocks. They are useful, for example, in representing an empty sequence resulting from expanding a definition macro or, similarly, an empty sequence of generated alternatives for cond or match. But it seems rare that you'd want to explicitly write an empty sequence. Maybe «» would be required to write an empty sequence, as in cond:«».
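A minimal Python sketch of the proposed check: flag a line that ends in : when the next non-blank line is not more indented. An explicit «», as in cond:«», is accepted because the line no longer ends in :. It assumes one group per line and no comments; the function name is hypothetical.

```python
def empty_block_lines(src: str) -> list[int]:
    """Return 1-based line numbers of lines ending in ':' whose block is empty,
    i.e., the next non-blank line is not more indented (or there is none)."""
    lines = [
        (n, len(l) - len(l.lstrip(" ")), l.strip())
        for n, l in enumerate(src.splitlines(), start=1)
        if l.strip()
    ]
    errors = []
    for k, (n, indent, text) in enumerate(lines):
        if text.endswith(":"):
            nxt = lines[k + 1] if k + 1 < len(lines) else None
            if nxt is None or nxt[1] <= indent:
                errors.append(n)
    return errors


if __name__ == "__main__":
    print(empty_block_lines("fun f(x):\nf(1)"))
    print(empty_block_lines("cond:«»\nf(1)"))
```

The first call reports line 1; the second reports nothing, because the empty block is written explicitly.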

@gus-massa

I'm not sure if this is a good idea, but...

One problem that I have in Python is when I comment out the only statement in some code that I use for debugging or for verbose output, but that I don't want in the final version. (And I don't want to remove it, because I may need more debugging later.)

# Python
for i in range(10):
    print(i)
    if i == 7:
        #print(i, i*i)
        continue # <-- why is this necessary?
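Python's own compiler shows the problem directly: once the only statement in the if body is commented out, the block is empty and compilation fails with an IndentationError (a subclass of SyntaxError), so a placeholder such as pass or continue has to stay behind. The sketch below only compiles the two variants with the built-in compile(); it does not run them.

```python
def compiles(src: str) -> bool:
    """Return True if src is syntactically valid Python (nothing is executed)."""
    try:
        compile(src, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


# The only statement in the if body is commented out, so the body is empty.
WITHOUT_PLACEHOLDER = """
for i in range(10):
    print(i)
    if i == 7:
        #print(i, i*i)
"""

# pass is Python's explicit empty statement.
WITH_PLACEHOLDER = """
for i in range(10):
    print(i)
    if i == 7:
        #print(i, i*i)
        pass
"""

if __name__ == "__main__":
    print(compiles(WITHOUT_PLACEHOLDER))  # False
    print(compiles(WITH_PLACEHOLDER))     # True
```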

Also, add `;« ... »` as a group-sequence splice form.

The idea here is that `«»` can be used to fully bracket groups, and
then program text is armored so that line and indentation changes
do not change the way the text parses.
@mflatt
Member Author

mflatt commented Nov 1, 2021

The latest version is an experiment changing « and » so that parsing the content inside is not sensitive to line breaks or indentation.

In some ways, this change brings us full circle to proposals that advocate a choice of equivalent notations, where one is indentation-sensitive and the other is not. A difference here, though, is that the indentation-insensitive notation isn't meant to be particularly convenient to type or pretty on its own, and so it can be closer to the indentation-sensitive syntax. That is, it's meant as a kind of "armor" mode to minimally adjust program text while ensuring that accidental line or indentation changes won't change the way the text parses (except for text in @ notation, for now, but likely support there can improve).

One possible use of armoring is just before copying some program text to move it into a different context, where the text could be pasted and reindented in the target context. Unarmoring in the target context then ensures that it parses the same as before it was copied.

The #lang rhombus prototype binds Meta-A to armor a selected region of a program. Here's what the current "demo.rkt" looks like after armoring:

https://gist.github.com/mflatt/b932084a4b2489abbe115022b2b81b9b

The next step would be to make Meta-A toggle armored text to unarmored.

To support a sequence of groups that is not in a block, a « can now appear just after ;. In that case, the groups between « and » are spliced into the enclosing context.
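To illustrate the idea of armoring, here is a toy Python version that handles only :-introduced blocks with one group per line: each indented block becomes :« ... » with its groups separated by ;, so the result no longer depends on line breaks or indentation. The function names are hypothetical, and this is a sketch under those assumptions, not the Meta-A implementation in the #lang rhombus prototype, which presumably must also handle |, parentheses, comments, and @ notation.

```python
def parse_blocks(lines):
    """Build (text, children) nodes from indentation; simplified."""
    items = [(len(l) - len(l.lstrip(" ")), l.strip()) for l in lines if l.strip()]

    def build(i, indent):
        nodes = []
        while i < len(items) and items[i][0] == indent:
            text = items[i][1]
            i += 1
            children = []
            # Only a trailing ':' may introduce a more-indented block.
            if text.endswith(":") and i < len(items) and items[i][0] > indent:
                children, i = build(i, items[i][0])
            nodes.append((text, children))
        return nodes, i

    tree, i = build(0, items[0][0] if items else 0)
    if i != len(items):
        raise ValueError(f"unexpected indentation at group {i + 1}")
    return tree


def render(node):
    """Render a node with its block (if any) bracketed by « and »."""
    text, children = node
    if not text.endswith(":"):
        return text
    if not children:
        return text + "«»"  # explicit empty block
    return text + "« " + "; ".join(render(c) for c in children) + " »"


def armor(src: str) -> str:
    """Rewrite indentation-based blocks as explicit «» blocks."""
    return "\n".join(render(n) for n in parse_blocks(src.splitlines()))


if __name__ == "__main__":
    print(armor("fun f(x):\n  val y = x + 1\n  y * 2\nf(3)"))
```

After armoring, the first group reads `fun f(x):« val y = x + 1; y * 2 »`, which could be reindented or joined onto one line without changing its grouping.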

@sorawee
Contributor

sorawee commented Nov 1, 2021

Correct me if I'm wrong, but mainly, there are two ways to style a colon.

  1. If the colon is at EOL, then the current indentation is increased by some amount (let's say 2). For example:
yyyyyyy:
  // current indentation = 2
  xxxxx:
    // current indentation = 4
    foo
  2. If the colon is not at EOL, then there's something after it. The current indentation is set to the column position of that "something". For example:
yyyyyy:
  // current indentation = 2
  xxxxx:     hello
             // current indentation matches hello's position

An opening paren-like (brace, bracket, paren) at EOL increases the current indentation by some amount (let's say 2). For example:

yyyyyyy:
  // current indentation = 2
  hello(
    // current indentation = 4
yyyyyyy: hello(
           // current indentation matches hello's position + 2
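The rules above can be written as a small function that predicts the indentation of the line after a given line. This Python sketch assumes a step of 2, ignores colons inside strings or comments, and uses only the first colon that has something after it; the function name is hypothetical, and it illustrates the stated rules rather than the prototype's indenter. Applied to the JS-style line "  val x: {", it yields 11 rather than 4, which is the conflict in the question that follows.

```python
OPENERS = "([{"


def next_indent(line: str, step: int = 2) -> int:
    """Predict the indentation of the line after `line` under the stated rules."""
    stripped = line.rstrip()
    current = len(line) - len(line.lstrip(" "))
    # Rule 2: a colon followed by content sets the current indentation to
    # that content's column.
    for i, ch in enumerate(stripped):
        if ch == ":" and i + 1 < len(stripped):
            rest = stripped[i + 1:]
            current = i + 1 + (len(rest) - len(rest.lstrip(" ")))
            break
    # Rule 1 and the opener rule: a colon or opener at EOL adds one step.
    if stripped.endswith(":") or (stripped and stripped[-1] in OPENERS):
        return current + step
    return current


if __name__ == "__main__":
    for text in ["yyyyyyy:", "  xxxxx:     hello", "yyyyyyy: hello(", "  val x: {"]:
        print(repr(text), next_indent(text))
```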

Question

In JS, people usually write something along this line:

yyyyyyy:
  // current indentation = 2
  val x: {
    // current indentation = 4
    a: 1,
    b: 2,
  }

Shrubbery notation can read this code just fine. But the style doesn't follow the "rules" I wrote above, and the current indenter will fight against this style.

I guess the main question I have is: is this style endorsed? And if so, how should the "rules" be adjusted?

@mflatt
Member Author

mflatt commented Nov 1, 2021

@sorawee That output might be implemented as "two more than whatever indentation starts the line with the opener". But that choice also leads to an example like this:

top:
  val x: form {
    inside
  }
         after

instead of

top:
  val x: form {
           inside
         }
         after

which seems like it might be a bad idea.

Special-casing indentation for an opener–closer pair at the end of a block might work, but indentation is currently determined only by looking before the line to indent. I'm not at all sure those are the only possibilities.

@sorawee
Contributor

sorawee commented Nov 5, 2021

Consider:

// pre
fun foo(a, b): // qwe
  // www
  a // def
   + b // qqq
// post

Here's the raw information on each node:

#<syntax (top (group fun foo (parens (group a) (group b)) (block (group a (op +) b))))>: #f #f #f

#<syntax top>: () #f (" " "// qqq" "\n" "// post")

#<syntax (group fun foo (parens (group a) (group b)) (block (group a (op +) b)))>: #f #f #f

#<syntax group>: () ("\n" "// pre" "\n") #f

#<syntax:string:3:0 fun>: "fun" #f #f

#<syntax:string:3:4 foo>: "foo" (" ") #f

#<syntax:string:3:7 (parens (group a) (group b))>: #f #f #f

#<syntax:string:3:7 parens>: "(" #f (")")

#<syntax (group a)>: #f #f #f

#<syntax group>: () #f #f

#<syntax:string:3:8 a>: "a" #f #f

#<syntax (group b)>: #f #f #f

#<syntax group>: () ("," " ") #f

#<syntax:string:3:11 b>: "b" #f #f

#<syntax:string:3:13 (block (group a (op +) b))>: #f #f #f

#<syntax:string:3:13 block>: ":" #f #f

#<syntax (group a (op +) b)>: #f #f #f

#<syntax group>: () (" " "// qwe" "\n" "  " "// www" "\n" "  ") #f

#<syntax:string:5:2 a>: "a" #f #f

#<syntax:string:6:3 (op +)>: () (" " "// def" "\n" "   ") #f

#<syntax:string:6:3 op>: () #f #f

#<syntax:string:6:3 +>: "+" #f #f

#<syntax:string:6:5 b>: "b" (" ") #f

I will need to think more, but my first impression is that I'd really love for // qqq to associate with b, // def to associate with a.

EDITED: // qwe probably should also be associated with block (:)

FWIW, in fmt, after reading a node, I essentially peeked to see whether the next token is whitespace. If so and it contains a newline, normal reading resumes. Otherwise, fmt grabs one more invisible node and puts it in a property of the current node.
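A minimal Python sketch of that peek strategy, over a hypothetical token stream of (kind, text) pairs where kind is "atom", "ws", or "comment": after each atom, same-line whitespace and comments become its suffix, stopping at the first whitespace that contains a newline; all other trivia becomes the prefix of the next atom. The token kinds and the Node class are assumptions for illustration, not fmt's or the shrubbery reader's data structures. On the a / + b lines from the example above, it attaches // def to a and // qqq to b.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    prefix: list = field(default_factory=list)  # trivia from earlier lines
    suffix: list = field(default_factory=list)  # same-line trivia after the atom


def attach_trivia(tokens):
    """Return (nodes, tail), where tail holds trivia after the last atom."""
    nodes, pending = [], []
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        i += 1
        if kind != "atom":
            pending.append(text)  # belongs to the next atom's prefix
            continue
        node = Node(text, prefix=pending)
        pending = []
        # Peek: absorb trivia until whitespace that contains a newline.
        while i < len(tokens) and tokens[i][0] != "atom" and "\n" not in tokens[i][1]:
            node.suffix.append(tokens[i][1])
            i += 1
        nodes.append(node)
    return nodes, pending


TOKENS = [
    ("atom", "a"), ("ws", " "), ("comment", "// def"), ("ws", "\n   "),
    ("atom", "+"), ("ws", " "), ("atom", "b"), ("ws", " "),
    ("comment", "// qqq"), ("ws", "\n"), ("comment", "// post"),
]

if __name__ == "__main__":
    nodes, tail = attach_trivia(TOKENS)
    for n in nodes:
        print(n)
    print("tail:", tail)
```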

Disallowing an empty block seems to enable clearer/earlier errors.
Empty blocks can still be represented, though, so an explicit `«»`
provides a way to write an empty block.
@sorawee
Contributor

sorawee commented Nov 8, 2021

Indenter nit:

The interaction of the indenter and automatic closer insertion is not great. Here are some examples, where | indicates the caret position.

//// Example 1 (there is stuff after the opener)
// before
foo(arg,|) // enter

// after
foo(arg,
|)

//// Example 2 (opener is at [to-be] EOL)
// before
foo(|) // enter

// after
foo(
|)

Example 1 looks incorrect. There is no reason that ) should be put there. Any other text would have been indented to align with arg, so I would expect the indenter to produce:

foo(arg,
    |)

Example 2 might appear to indent correctly, but in practice, programmers expect that after entering a newline, they will be able to continue coding right away. As it currently is, they would need to back up one line and enter a newline again to achieve:

foo(
  |
)

which is not ergonomic.

Most code editors will produce the above editor state right away.

Note that the above observation ("programmers expect that after entering a newline, they will be able to continue coding right away") also validates:

foo(arg,
    |)

since they can add more code right away.
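The behavior expected here can be sketched as a function from the text before and after the caret to the edited text, with | marking the new caret position as in the examples. This is a Python sketch under stated assumptions (a single line of context, no strings or comments, a step of 2); the names are hypothetical, and it is not DrRacket's or the prototype's indenter.

```python
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}"}


def innermost_open(text: str):
    """Index of the innermost opener in text that is not yet closed, or None."""
    stack = []
    for i, ch in enumerate(text):
        if ch in OPEN_TO_CLOSE:
            stack.append(i)
        elif ch in OPEN_TO_CLOSE.values() and stack:
            stack.pop()
    return stack[-1] if stack else None


def on_enter(before: str, after: str, step: int = 2) -> str:
    """Simulate pressing Enter with the caret between `before` and `after`.

    Returns the resulting text with '|' marking the caret.
    """
    base = len(before) - len(before.lstrip(" "))
    pos = innermost_open(before)
    if pos is None:
        return before + "\n" + " " * base + "|" + after
    rest = before[pos + 1:]
    if rest.strip():
        # Example 1: content follows the opener, so align with that content.
        col = pos + 1 + (len(rest) - len(rest.lstrip(" ")))
        return before + "\n" + " " * col + "|" + after
    # Example 2: the opener ends the line.
    inner = " " * (base + step)
    closer = OPEN_TO_CLOSE[before[pos]]
    if after.lstrip(" ").startswith(closer):
        # Give the closer its own line, back at the opener line's indentation.
        return before + "\n" + inner + "|\n" + " " * base + after.lstrip(" ")
    return before + "\n" + inner + "|" + after


if __name__ == "__main__":
    print(on_enter("foo(arg,", ")"))
    print(on_enter("foo(", ")"))
```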

@mflatt
Member Author

mflatt commented Nov 8, 2021

I've changed the indenter for example 1.

For example 2, the behavior is still currently the same. Possibly the right interaction there is for a programmer to type the start of whatever goes in parentheses and then hit Tab again. This sort of thing happens with |, too, where the initial indentation isn't right for |, but after typing |, you can hit Tab again.

@sorawee
Contributor

sorawee commented Nov 8, 2021

Thanks!

Speaking of |, when "Enable automatic parentheses, square brackets, and quotes" is enabled, and | is typed, it will create two |s. Is there a way to adjust this behavior based on #lang? The behavior makes sense for #lang racket, but does not make sense for #lang shrubbery/#lang rhombus.

@rfindler
Member

rfindler commented Nov 8, 2021

@sorawee I've (slooooowly) been fixing that in DrRacket and I'll try to get something pushed soon that fixes that.

Trying to find a balance between empty blocks being confusing and some
sensible uses of empty blocks, such as `'(:)` to represent an empty
definition-macro expansion or a `:` by itself to put a REPL in
multi-line mode.
@mflatt
Member Author

mflatt commented Nov 10, 2021

@sorawee I pushed the change to keep same-line whitespace and comments with the preceding token as a 'raw-suffix property.

You mentioned in person that it would be useful to preserve the parsed structure of commented-out groups and alternatives. I'd rather not have that as the default, because the idea is that plain strings might be lightweight enough to preserve in syntax objects for compiled form (to be used later when reporting errors about macro expansions). So, there would need to be a special mode — and in that mode, would it be better to just leave the commented-out group as non-comments in the parse? The #// comment token could be in the commented-out term's prefix, and maybe there should also be an extra syntax property to indicate when something is commented out.

@Metaxal
Sponsor

Metaxal commented Nov 10, 2021

An option to preserve comments in the parse tree would definitely make the life of fmt, resyntax and other custom tools that modify source codes much easier!

@rocketnia

Conceptually, I think quoting and commenting can interact. Within a deeply nested quotation, a comment can be intended to be preserved through several levels of quotation before the comment is supposed to kick in and erase it for the rest. This is a niche concern—virtually all mainstream string syntaxes don't even have a single way to write comments inside them, much less ones that interact carefully with nesting—but it's something that's been on my mind in my designs, and now seems like an apropos time to mention it for Rhombus.

For this to work in my design notes, I've treated comments as being a kind of escape sequence, and I've given escape sequences a prefix that allows them to be annotated with simple information relevant to determining the depth they apply at. (In the most straightforward cases this would be a number written in unary, but in my designs, I want to be able to specify labels to jump directly to certain nesting levels.) Unfortunately, the presence of these prefixes makes it hard to determine what kind of escape sequence is coming up by peeking for it, so for instance it's hard to write something that skips whitespace and comments but doesn't read into any other escape sequence. I've considered making up for this by giving them another prefix that acts as a forward declaration of what kind of escape sequence is coming up.

Frankly, it's messy, and the resulting comment notations are probably too verbose. But I thought it could be useful to share this experience report in case some of these concerns are relevant.

More to the point, if comments are part of the AST, then at some point someone might ask for comments that are really comments rather than being part of the AST. And then if that exists, someone might ask for a quoting operator that really quotes things, so that those comments are part of the AST too. I think the idea of different comments having different escaping depths can help explain how these competing concerns can interact and coexist.

@mflatt
Member Author

mflatt commented Nov 19, 2021

@sorawee I made a further change that I think you'll approve of, but I'm mentioning it in case not: When a block is followed by a comment that starts on the same column as the block's content, then the comment and intervening whitespace are kept with the block as a tail, instead of being left as a tail of an enclosing form or a prefix on the next form. (This rule makes a Scribble rhombusblock form work better.)

@mflatt
Copy link
Member Author

mflatt commented Nov 19, 2021

@rocketnia I'm not sure it's deeply related to what you have in mind, but something like that happens with @ comments. When you group @ content using { and }, then @// within the group is a comment that is not preserved with the rest of the content in { and } (although it's still stashed in properties). But if you wrap that whole thing as @ content using, say, |<<{ and }>>|, then the comment using @// will be preserved as text, while a comment using @|<<// would disappear.

@sorawee
Contributor

sorawee commented Nov 19, 2021

@mflatt Thanks! I totally agree that this is better.

@rocketnia

rocketnia commented Nov 20, 2021

@mflatt

But if you wrap that whole thing as @ content using, say, |<<{ and }>>|, then the comment using @// will be preserved as text, while a comment using @|<<// would disappear.

Oh yeah, I learned about that feature recently (in regular Racket-based Scribble rather than Rhombus), and I really like that it's available as a technique. I think it is related, not just in superficial terms but potentially in more intricate ways as well.

I think the essential difference is that Scribble's labels apply to lexer syntaxes rather than quotation levels. The difference has been pretty subtle for me as I've worked on Punctaffy, which deals mainly with higher-dimensional analogues of lexer syntax, but has an application as a building block for quasiquotation syntaxes. I've mixed up these concerns more than I'd like to admit. I think in the majority of practical cases, Scribble's syntax labels are sufficient for suppressing the syntactic features of arbitrary code, in the same way that a preprocessor is often a sufficient substitute for a function. And just like preprocessors and functions, these techniques each serve their own purposes.

Statically scoped quotation labels

The mental model I have in mind is that each comment (whether labeled or not) should be statically associated with a quotation level, in a way that's as reliable as static lexical binding. If a comment is associated with a quotation level that's at a nonzero depth, that comment is suppressed; it represents its source text rather than doing anything comment-like.

Whether or not a comment is suppressed this way, it still begins and ends in the same places. We might consider those beginning and ending points to be determined before scope resolution figures out what level the comment's label refers to. The code of a suppressed comment might even be syntax-highlighted in a way that reflects its status as a quoted comment, like I started to describe in racket/drracket#512 (comment).

Relevant excerpt:

[...] each level could be tagged with the language that the editor is basing its editor functionality on at that level (usually Racket), and each pair of adjacent levels could be tagged with the kind of embedding that's going on between them (commenting-out or quotation).

Statically scoped lexemes

Anyhow, if I understand right, the |<<{ |<<@ }>>| labeling in Scribble applies to lexer syntaxes. In this way, Scribble can suppress comments by changing what kind of syntax even counts as comment syntax in the first place. This prevents things from being comments so decisively that we don't even consult the comment syntax's logic to determine where these non-comments begin and end. If someone suppresses a comment this way, its source text can be given a surprising new interpretation.

In the following example, some code initially works because some erroneous (or perhaps malicious) notation is commented out. Once it's moved into another quotation that suppresses the comment, the notation contained in the comment gets interpreted with a different semantics that makes it actually do something:

#lang at-exp racket

; Here's an example:

(displayln
  @~a{
    hello
    @; Commented-out }>>|}) (displayln (string-upcase "code injection")) (displayln @~a{@~a|<<{ code.
    world})

; Here it is in a quoted string:

(displayln
  @~a{
    ; Here's an example:
    
    @~a|<<{
      (displayln
        @~a{
          hello
          @; Commented-out }>>|}) (displayln (string-upcase "code injection")) (displayln @~a{@~a|<<{ code.
          world})}>>|})

Output:

hello
world
; Here's an example:

(displayln
  @~a{
    hello
    @; Commented-out 
CODE INJECTION
 code.
world})

Code injection?

Using "code injection" in that example might be catastrophizing. I mean, my concern is to be able to move code around without adjusting its escape sequences. If my concern here were security, just about every language under the sun would be vulnerable to something like this on account of the popularity of double-quoted strings (and single-quoted ones, etc.):

; Safely prints some code we don't want to run.
(displayln "(/ 1 0)")

; Does *not* safely print some code that would safely print some code we don't
; want to run.
(displayln "(displayln "(/ 1 0)")")

And if everyone's vulnerable to this, then someone must have thought hard about why double-quoted strings are actually okay... right? Or is that wrong? Well, personally, I don't know how to tell people it's a catastrophe even if it is, and what I do know is that I prefer the alternative quotation syntax approach I'm describing, which doesn't share the same usability gotchas.

Shortcomings of my approach; and tying back to Scribble and Rhombus

I wouldn't say my statically scoped quotation levels are necessarily free of gotchas altogether. I think most of the gotchas are herded into a particular corner, though: Moving code around is still dangerous if it contains unmatched brackets or free variables which might interact differently with their new surroundings.

Of course, if nothing else, the code pasted into the string could simply contain an unmatched string-ending bracket sequence, which can match up in a potentially surprising way with the string boundary itself.

The need to avoid unmatched brackets is more pervasive than that, though. Strings that track quotation depth need to recognize nested quotation forms so they can adjust the depth, and they need to be able to match up brackets to figure out where those nested quotations begin and end. Unmatched or mismatched brackets can interfere with this, so they need to be meticulously escaped. And if these unmatched or mismatched brackets are intended to affect the quoting level, then I suppose their effect on the quoting level and the extent of that effect has to be meticulously explained in another escape sequence (one I haven't ever prototyped yet).

If I'm in a scenario where I have a whole mess of unmatched brackets to deal with, I'll probably prefer wrapping them in Scribble-style labeled brackets rather than escaping them all individually. That's why I think of these techniques as each serving their own purposes.

Another issue that's come up in my explorations is that when I have code that resembles English text, that becomes a problem, because it can't be distinguished as program syntax inside a string. The same concern is addressed in Scribble, where @ keeps a lot of Scribble notations distinguishable from other text. But I think even if Rhombus doesn't make itself pervasively distinctive the way Scribble does, there's still a viable approach to nested quotation: Users can use escape sequences to break up strings explicitly into regions of plain text and regions of nested code. Hmm, I suppose escape sequences like these might even salvage double-quoted strings. :)

@ceving

ceving commented Nov 25, 2023

I am wondering if it is possible to simplify the block syntax by replacing the pipe with a preceding colon. I think it might be possible to give the sequence newline, whitespace, : the meaning of the pipe.

if x > 0
  : do-if-1-then-1
    do-if-1-then-2
  if x < 0
    do-if-1-else-1-if-2-then-1
    : do-if-1-else-1-if-2-else-1
      do-if-1-else-1-if-2-else-2

I think the colon is a better allegory, because it is itself a sequence of dots.

@mflatt
Member Author

mflatt commented Jul 29, 2024

#527

@mflatt mflatt closed this Jul 29, 2024
Labels
surface syntax Related to possible surface syntaxes for Rhombus yep moving on