- Feature Name: Shrubbery notation
- Start Date: 2019-10-01
- RFC PR: racket/racket2-rfcs#122
Shrubbery notation is similar to S-expression notation, but instead of generating fully formed trees, it is intended to partially group input for further enforestation by another parser (e.g., as in Honu). The notation is line- and indentation-sensitive, and the parsed form of a shrubbery imposes grouping to ensure that further parsing is consistent with the shrubbery's lines and indentation.
S-expression notation imposes a grouping at the lexeme level that is all but guaranteed to be respected by further parsing via macro expansion. One consequence of this lexeme-based grouping is that programs can be pretty-printed and textually traversed in standard ways.
A traditional use of S-expression notation, however, insists that all grouping is reflected in the S-expression. Reifying all grouping at the lexeme level is so onerous that many practical deployments of S-expressions include deviations from the rule, such as keyword-based arguments or implicit grouping by position (as in various Clojure forms).
Another disadvantage of S-expressions is that many of the parentheses are redundant after the expression is pretty-printed, because indentation provides the same grouping information in a more human-readable way. That observation suggests instead relying on line breaks and indentation to impart grouping information, as in Python.
Shrubbery notation explores a point in the design space where the notation is
- line- and indentation-sensitive, and
- intended to constrain grouping but not reflect every detail of grouping.
Deferring complete grouping to another parser relieves a burden on reader-level notation. At the same time, line- and indentation-sensitive rules constrain parsing to ensure that line breaks and indentation in the source are not misleading.
Here are some example shrubberies. Each line either uses old
indentation to continue a nesting level that was started on a previous
line, starts with new indentation and follows a line that ends with
:
, or starts with new indentation and a |
on the same line. A :
or |
can also appear in the middle of a line, but that's roughly a
shorthand for starting a new indented line after the :
or before the
|
. The complete rules involve more terminology, but that's enough to
get a sense of the examples.
def identity(x): x

def fib(n):
  cond
  | n == 0: 0
  | n == 1: 1
  | else: fib(n-1) + fib(n-2)

def print_sexp(v):
  match v
  | empty: display("()")
  | cons(a, d):
      if is_list(d)
      | display("(")
        print_sexp(a)
        for (v = in_list(d)):
          display(" ")
          print_sexp(v)
        display(")")
      | display("(")
        print_sexp(a)
        display(". ")
        print_sexp(d)
        display(")")
  | v: print_atom(v)
Forms like def
, cond
, and match
are not specified by
shrubbery notation, since specifying those forms is up to a language
that is built on top of shrubbery notation. Still, shrubbery notation
is meant to accommodate a particular kind of syntax for nested blocks
(via :
and indentation) and conditional branches (via |
).
Identifiers are C-style with alphanumerics and underscores. Operators
are sequences of symbolic characters in the sense of char-symbolic?
,
roughly. No spaces are needed between operators and non-operators, so
1+2
and 1 + 2
mean the same thing. Comments are C-style, plus a
#//
group-comment form. See lexeme parsing for
more information.
The following tokens are used for grouping, in addition to line breaks and indentation:
( ) [ ] { } ; , : | « » \
Parentheses, square brackets, and curly braces are used to form groups
in the obvious way. A ;
or ,
acts as a group separator, even
within a single line. A :
or |
treats the remaining items on the same
line like a new indented line, which forms a subgroup. A guillemet
pair «
and »
can be used (probably very rarely) to explicitly
bracket subgroups formed by :
and |
without line breaks. A \
continues a line, effectively shifting all columns on the next line as
if they appeared immediately after the \
.
The main grouping rule is that sequences on different lines with the same indentation create separate groups, one for each line.
this is the first group
this is the second group
Comments and lines with only whitespace are ignored. They don't count when this document says “the previous line” or “the next line.”
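The main grouping rule can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the hypothetical helper `flat_groups` handles a flat sequence with no blocks, openers, or continuations, it recognizes only `//` line comments, and it rejects any line whose indentation differs from the first group's.

```python
from typing import List, Optional


def flat_groups(source: str) -> List[List[str]]:
    """Split a flat shrubbery fragment into one group per line.

    Assumption: the input has no blocks, openers, or continuation lines,
    and the only comments are whole-line // comments.
    """
    groups: List[List[str]] = []
    indent: Optional[int] = None
    for line in source.splitlines():
        text = line.strip()
        # Blank lines and comment lines are ignored entirely.
        if not text or text.startswith("//"):
            continue
        column = len(line) - len(line.lstrip())
        if indent is None:
            indent = column  # the first group fixes the indentation
        elif column != indent:
            raise ValueError(f"group indented incorrectly: {text!r}")
        groups.append(text.split())
    return groups
```

Splitting on whitespace stands in for real lexeme parsing, so the sketch is only meaningful for simple space-separated words.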
An opener-closer pair (
and )
, [
and ]
, or {
and }
forms a nested group that can span lines. Within the opener-closer
pair, ,
separates groups. Groups can be on separate lines at the same
indentation, but groups on separate lines still must be separated by
,
. Parsing retains whether a subgroup is formed by ()
, []
, or
{}
.
group 1
[group 2 - subgroup I,
 group 2 - subgroup II,
 (group 2 - subgroup III - subsubgroup A,
  group 2 - subgroup III - subsubgroup B,
  {group 2 - subgroup III - subsubgroup C, subsubsubgroup α,
   group 2 - subgroup III - subsubgroup C, subsubsubgroup β})]
(group 3 - subgroup I, group 3 - subgroup II,
 group 3 - subgroup III)
The following three forms are not allowed, because they are missing a
,
between two groups:
// Not allowed
(1
 2)
[1
 2]
{1
 2}
A ,
is disallowed if it would create an empty group, except that a
trailing ,
is allowed.
// Not allowed
(, 1)
(1,, 2)
// Allowed, but not standard
(1, 2,)
A trailing ,
is only standard style when the closer that follows is
on its own line.
list(
  red,
  green,
  blue,
  orange,
)
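The comma rules above can be checked mechanically. The following Python sketch is a simplified illustration, not the reference parser: the hypothetical `split_groups` takes the already-tokenized content between an opener and its closer, and it does not model line breaks, so it cannot detect a missing `,` between groups on separate lines.

```python
from typing import List


def split_groups(tokens: List[str]) -> List[List[str]]:
    """Split the tokens between an opener and closer into comma-separated groups.

    A `,` that would create an empty group is an error, except that a
    trailing `,` is accepted.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        if tok == ",":
            if not current:
                # Covers both a leading `,` and two `,`s in a row.
                raise ValueError("`,` would create an empty group")
            groups.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        groups.append(current)
    # If `current` is empty here, the last token was a trailing `,` (allowed).
    return groups
```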
A sequence of groups has a particular indentation that is determined by the first group in the sequence. Subsequent groups in a sequence must start with the same indentation as the first group.
group 1
group 2
// error, because the group is indented incorrectly:
  group 3
When a line ends with :
and the next line is more indented, then
it starts a new sequence of groups that form a block:
group:
  subgroup 1
  subgroup 2
There is no constraint on how much indentation a nested group sequence
must use, as long as the indentation is more than the enclosing group.
Also, a new line is not required after :
, but then it's as if the
:
is followed by a newline plus spaces that reach the same column as
the :
. All four of the following groups are the same, each with one
block that has two nested groups:
hello:
 world
 universe

hello:
       world
       universe

hello: world
       universe

hello:    world
          universe
Within an opener-closer pair, a nested group sequence can start at any indentation; it doesn't have to be indented to the right of the opener.
function(
  argument,
  more
)
A block that is started with :
normally cannot be empty (unless
explicit-grouping «
and »
are used as described in a later
section), so the following is ill-formed:
bad_empty: // empty block disallowed
However, :
can be used at the start of a group so that the group
contains only a block. When :
starts a group that is in the
top-level sequence or within an opener-closer pair, the block
created by :
is allowed to be empty (because that provides a way to
express an empty block in a context where it is likely to be intentional
instead of confusing). For example, the first of the following three
top-level groups has just a block that contains one group with the
single element untagged
, the second top-level group has just a
block with zero groups, and the third has a group with one parenthesized
sequence of groups where the middle one has an empty block:
: untagged
:
(1, :, 2)
When a newly indented line starts with an operator and when the
preceding line does not end with :
, then the indented line does
not form a block, and it may instead continue the previous line. The
operator-starting line continues only if the previous line was not a
continuing line; however, additional continuing lines can start with
an operator (not necessarily the same one) at the same indentation as
the original continuing line. The following two groups are the same:
f(1) + 2
  + 3 + 4
  - 5 - 6

f(1) + 2 + 3 + 4 - 5 - 6
A block is always at the end of its immediately containing group. One consequence is that an operator-starting line cannot continue a group that already has a block:
hello: world
  + 3 // bad indentation
Along those lines, there is no ambiguity when an indented line appears
after :
and starts with an operator. In that case, the indented line
is part of the block, since it cannot continue the group that contains
the block. For example, the following two groups are the same, each
with a block that has a + 3
group:
hello: + 3

hello:
  + 3
A |
is implicitly shifted half a column right (so, implicitly
nested), and it is implicitly followed by a :
that conceptually
occupies the same column as the |
. That is, like :
, a |
always
creates a nested block. Furthermore, |
starts an enclosing block
that includes the |
block plus subsequent |
blocks that are at the
same indentation. A |
that starts the enclosing block can appear at
the start of a line with new indentation. The following four groups
are the same:
hello
| world
| universe

hello
  | world
  | universe

hello | world
      | universe

hello |
        world
      |
        universe
Each of the four groups has two elements: hello
and a block. The
block has two groups, each of which is a more nested block. The first
nested block has world
in a single group, and the second nested
block has universe
in a single group.
A |
cannot be in a top-level sequence of groups or start a group
immediately within ()
, []
, or {}
, and it cannot appear just
after :
. Like :
, the content of a block after |
cannot be empty
unless explicit-grouping «
and »
are used.
If a |
appears on the same line as an earlier |
and is not more
nested inside ()
, []
, or {}
, then the |
terminates the earlier
|
's block and continues its enclosing block with a new |
group.
The intent and consequence of this rule is that multiple |
s can be
used on a single line as an alternative to starting each |
on its
own line, making the following groups the same as the above groups:
hello | world | universe
hello
| world | universe
The implicit shifting of |
by half a column is consistent with its
visual representation, and it avoids the possibility of a group
sequence that contains a mixture of |
-started groups and other kinds
of groups. Standard indentation uses no additional space of
indentation before |
relative to its enclosing block's group.
A ;
separates two groups on the same line. A ;
is allowed in any
context except between groups immediately within ()
, []
, or {}
,
where a ,
separates groups. The following three blocks
are the same:
hello:
  world
  universe

hello:
  world; universe

hello: world; universe
The ;
and ,
separators interact differently with blocks formed by
:
and |
. A ,
closes subgroups and blocks as necessary to reach
an enclosing ()
, []
, or {}
, while a ;
separates groups within a
nested group sequence. If ;
would create an empty group, it is
ignored.
For example, the following two groups are the same, and they have one parenthesized term that has a single block, and the block has two groups:
(hello: world; universe)

(hello: world
        universe)
The following two groups are also the same, where the group has one parenthesized term, but that term contains two groups, where the first group is a block that contains a single group:
(hello: world, universe)

(hello: world,
 universe)
A block can be delimited explicitly with «
and »
to disable the
use of line and column information for parsing between «
and »
. A
«
can be used immediately after :
or immediately after |
, in
which case a »
indicates the end of the block that starts after the
:
or |
. Within the block, an explicit ;
must be used to separate
groups.
A sequence of groups, either at the top level or within a block, can
be written without line and column sensitivity as ;
followed
immediately by «
, in which case a »
indicates the end of the
sequence, and groups within the sequence are separated by ;
. When
parsing, the groups within the sequence are spliced into the enclosing
context. The combination of ;
and «
is intended for entering line-
and column-insensitive mode for a single group or for representing a
sequence of groups that is not within a block.
Whitespace and block comments are allowed between a :
, |
, or ;
and its «
, but in a line-sensitive context, the «
must be on the
same line as its :
, |
, or ;
.
The following five groups are the same:
hello:
  if x
  | world
    planet
  | universe

hello: if x | world; planet | universe

hello:«
  if x
  |« world;
     planet »
  |« universe »»

hello:« if x |« world; planet » |« universe »»

;«hello
:
«
if
x
|
«
world
;
planet
»
|
«
universe
»
»
»
Using «
and »
can “armor” a shrubbery for transport from one
context to another where its line breaks or indentation might get
mangled. For example, an editor might offer an operation to armor a
range of text in preparation for moving or copying the text, and then
it can be properly indented in its destination before unarmoring.
Along similar lines, when writing code as data to be read back later,
it's easy for a printer to insert explicit «
and »
.
In rare cases, a programmer might write «
and »
directly. Although
many shrubbery forms can be written with :
, |
, and ;
on a single
line, as illustrated above, not all forms can be collapsed to a single
line without extra delimiters. For example, these six groups are all
different:
outside:
  inside: fruit
  rind

// not the same, because `rind` is within `inside:`
outside: inside: fruit; rind

if true
| if false
  | x
  | y
| z

// not the same, because there's one block with five `|` alternatives
if | true | if false | x | y | z

hello:
  if x
  | world
  | universe
  the end

// not the same, because `the end` is in the second `|`:
hello: if x | world | universe; the end
Using «
and »
can help in those cases:
outside:
  inside: fruit
  rind

outside: inside:« fruit »; rind

if true
| if false
  | x
  | y
| z

if | true |« if false | x | y » | z

hello:
  if x
  | world
  | universe
  the end

hello: if x | world |« universe »; the end
Even so, delimiting blocks with «
and »
is expected to be rare in
practice, both because programmers are likely to break things across
lines and because a language that uses shrubbery notation is likely to
allow ()
in places where grouping might be needed. For example,
assuming that if
is an expression form and ()
can wrap an
expression, a nested conditional is probably better written like this:
if | true | (if false | x | y) | z
Using ()
in this way does not produce an equivalent shrubbery to if | true |« if false | x | y »| z
, but it might represent an equivalent
expression in the language using shrubbery notation.
To stay consistent with blocks expressed through line breaks and
indentation, a block with «
and »
must still appear at the end of
its enclosing group.
// not allowed, because a block must end a group
inside:« fruit » more
As a last resort, \ can be used at the end of a line (optionally
followed by whitespace and comments on the line) to continue onto the
next line, as if the two lines were one line. The \ itself does not
appear in the parsed form. A \ that is not at the end of a line
(followed by whitespace and comments) is treated the same as
whitespace.
Lines containing only whitespace and (non-term) comments do not count
as “the next line” even for \
continuations, so any number of
whitespace and comment lines can appear between \
and the line that
it continues.
this is \
the first group
this \ is \ the \ second \ group
this is a group \
with (a,
nested,
list)
this is a group \
with (a,
\
nested,
\
list)
this is a group \
with (a,
\
/* this a comment on `nested`: */
nested,
\
list)
A #//
comments out a group or |
alternative. To comment out a
group, #//
must appear either on its own line before a group or at
the start of a group. To comment out an alternative, #//
must appear
on its own line before the alternative or just before a |
that does
not start a new line.
The interaction between #//
and indentation depends on how it is
used:
- When #// appears completely on its own line (possibly with whitespace and non-group comments), then its indentation does not matter. It comments out the next group or alternative, which might be a single-line group, block, or | block.
- When #// appears at the start of a group with more tokens afterward on the same line, it determines that group's indentation, and it must obey any constraints on the group's indentation. When #// appears immediately after an opener but with nothing else afterward on the same line, it determines indentation for the groups immediately within the opener, and it comments out the first group.
- When #// appears just before a | on the same line, then unlike the case for groups, it does not affect the column of the | as used to align alternatives on later lines. (That's because the half-column alignment of | does not fit with the column alignment of #.) Along those lines and to avoid an indentation mismatch, a #// is not allowed to start a line for commenting out a | alternative on the same line.
A #//
is not allowed without a group or alternative afterward to
comment out. Multiple #//
s do not nest (i.e., two #//
s in a row is
always an error).
The following three groups all parse the same:
{
hello:
val x: f(1, 2 + 3)
match x
| 1: 'one'
| 2: 'two'
}
{
hello:
val x:
#//
g(-1)
f(
#//
0,
1,
2 + 3,
#//
4 + 5)
#//
not included in the code
match x
#//
| 0: no
| 1: 'one'
#//
| 1.5: no
| 2: 'two'
#//
| 3: no,
#//
goodbye:
the enclosing group of the block is commented out
}
{
hello:
val x:
#// g(-1)
f(#// 0, 1, 2 + 3, #// 4 + 5)
#// not included in the code
match x #// | 0: no | 1: 'one' #// | 1.5: no
| 2: 'two' #// | 3: no,
#// goodbye:
the enclosing group of the block is commented out
}
Here are more example shrubberies. These shrubberies are not necessarily consistent with each other in the sense of sketching a single language that uses shrubbery notation; they show different potential ways of using the notation.
define pi: 3.14

define
| fib(0): 0
| fib(1): 1
| fib(n): fib(n-1) + fib(n-2)

define fib(n):
  match n
  | 0: 0
  | 1: 1
  | n: fib(n-1) + fib(n-2)

define fib(n):
  match n | 0: 0
          | 1: 1
          | n: (fib(n-1)
                  + fib(n-2))

define fib(n):
  match n
  | 0:
      0
  | 1:
      1
  | n:
      fib(n-1) + fib(n-2)

define make_adder(n):
  lambda (m):
    printf("adding to ~a\n", m)

define fourth(n: integer):
  define m: n*n
  define v: m*m
  printf("~a^4 = ~a\n", n, v)
  v

struct posn(x, y):
  property prop_equal_and_hash:
    let (hc = lambda (a: posn, hc):
                hc(a.x) + hc(a.y),
         eql = lambda (a: posn, b: posn, eql):
                 eql(a.x, b.x) && eql(a.y, b.y)):
      values(eql, hc, hc)

define go():
  define helper(n):
    list(n, n)
  define more(m):
    if m == 0 | "done"
              | more(m - 1)
  helper(more(9))

define curried:
  lambda (x):
    lambda (y):
      lambda (z):
        list(x, y, z)

let (x = 1,
     y = 2):
  printf("About to add")
  x+y

define show_zip(l, l2):
  for (x = in_list(l),
       x2 = in_list(l2)):
    print(x)
    print_string(" ")
    print(x2)
    newline()

define show_combos(l, l2):
  for (x = in_list(l)):
    then (x2 = in_list(l2)):
      print(x)
      print_string(" ")
      print(x2)
      newline()
The parse of a shrubbery can be represented by an S-expression:
- Each group is represented as a list that starts with 'group, and the rest of the list are the elements of the group.
- Atom elements are represented as “themselves” within a group, including identifiers as symbols, except that an operator is represented as a 2-list that is 'op followed by the operator name as a symbol.
- A group sequence is represented as a list of 'group lists.
- An element created by () is represented by 'parens consed onto a group-sequence list.
- An element created by [] is represented by 'brackets consed onto a group-sequence list.
- An element created by {} is represented by 'braces consed onto a group-sequence list.
- A block is represented as either 'block or 'alts consed onto a group-sequence list. The representation uses 'alts if the content of the block is a sequence of groups started with |, and it's 'block otherwise.
- A block created to follow | appears immediately in an 'alts list.
Note that a block can only appear immediately in a 'group
or 'alts
list. Note also that there is no possibility of confusion between
symbol atoms in the input and 'group
, 'block
, etc., at the start
of a list in an S-expression representation, because symbol atoms will
always appear as non-initial items in a 'group
list.
Here are some example shrubberies with their S-expression parsed representations:
define pi: 3.14
(group define pi (block (group 3.14)))
define fourth(n: integer):
  define m: n*n
  define v: m*m
  printf("~a^4 = ~a\n", n, v)
  v
(group define
fourth
(parens (group n (block (group integer))))
(block
(group define m (block (group n (op *) n)))
(group define v (block (group m (op *) m)))
(group printf
(parens (group "\"~a^4 = ~a\\n\"") (group n) (group v)))
(group v)))
if x = y
| same
| different
(group if x (op =) y (alts (block (group same))
(block (group different))))
define fib(n):
  match n
  | 0: 0
  | 1: 1
  | n: fib(n-1) + fib(n-2)
(group define
fib
(parens (group n))
(block
(group match
n
(alts
(block (group 0 (block (group 0))))
(block (group 1 (block (group 1))))
(block
(group n
(block
(group fib
(parens (group n (op -) 1))
(op +)
fib
(parens (group n (op -) 2))))))))))))
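To make the representation concrete, here is a small Python sketch that models a parsed shrubbery as nested lists and prints it in S-expression form. It is only an illustration: the helper names `op` and `to_sexp` are hypothetical, symbols are modeled as plain Python strings, and string literals are not handled.

```python
def op(name):
    """Represent an operator as the 2-list ('op name)."""
    return ["op", name]


def to_sexp(node):
    """Render a nested-list representation as S-expression text."""
    if isinstance(node, list):
        return "(" + " ".join(to_sexp(item) for item in node) + ")"
    return str(node)


# define pi: 3.14
pi_group = ["group", "define", "pi", ["block", ["group", 3.14]]]

# if x = y
# | same
# | different
if_group = [
    "group", "if", "x", op("="), "y",
    ["alts",
     ["block", ["group", "same"]],
     ["block", ["group", "different"]]],
]

print(to_sexp(pi_group))
print(to_sexp(if_group))
```

Each `|` alternative contributes its own block directly inside the `alts` list, which is why `same` and `different` end up in separate `block` lists.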
The tokens used for grouping and indentation are distinct lexemes:
( ) [ ] { } ; , : | « » \
Other lexemes are described by the grammar in the table below, where
an asterisk in the left column indicates the productions that
correspond to lexemes. Only simple forms of numbers are supported
directly (decimal integers, decimal floating point, and hexadecimal
integers, in all cases allowing _
s between digits), but a #{
...}
escape provides access to the full Racket S-expression number grammar.
Boolean literals are Racket-style, instead of reserving identifiers.
Special floating-point values similarly use a #
notation.
Identifiers are formed from Unicode alphanumeric characters plus _
,
where the initial character must not be a numeric character. An
identifier prefixed with ~
forms a keyword, analogous to prefixing an
identifier with #:
in Racket.
Operators are formed from Unicode symbolic and punctuation characters
other than the ones listed above as distinct lexemes (plus a few more,
like "
and '
), but |
or :
is also
allowed in an operator name as long as it is not by itself. A
multi-character operator cannot end in +
, -
, or .
to avoid
ambiguity in cases like 1+-2
(which is 1
plus -2
, not 1
and
2
combined with a +-
operator), unless the operator contains
only +
, -
, or .
(so ++
, --
, and ...
are allowed).
Also, multi-character operator cannot end with /
or contain //
or
/*
, because that can create ambiguities with comments.
Implicit in the grammar is the usual convention of choosing the
largest possible match at the start of a stream. Not reflected in the
grammar is a set of delimiter requirements: numbers, #true
, and
#false
must be followed by a delimiter. For example, 1x
is a
lexical error, because the x
after 1
is not a delimiter.
Non-alphanumeric characters other than _
and .
are delimiters.
Finally, the treatment of +
and -
as a number prefix versus an
operator is subject to a special rule: they are parsed as operators
when immediately preceded by an alphanumeric character, _
, )
, ]
, or }
with no whitespace in between. For
example, 1+2
is 1
plus 2
, but 1 +2
is 1
followed by the
number +2
.
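The special rule for `+` and `-` depends only on the character immediately before the sign. A minimal Python sketch, with the hypothetical helper name `sign_is_operator`, might look like this; it answers only whether the sign is forced to be an operator, and it leaves the rest of number lexing aside.

```python
def sign_is_operator(text: str, index: int) -> bool:
    """Return True if the + or - at `index` must be parsed as an operator.

    The sign is an operator when it is immediately preceded, with no
    whitespace in between, by an alphanumeric character, _, ), ], or }.
    Otherwise it may act as a number prefix if a number follows.
    """
    if text[index] not in "+-":
        raise ValueError("expected + or - at the given index")
    if index == 0:
        return False  # nothing precedes the sign
    previous = text[index - 1]
    return previous.isalnum() or previous in "_)]}"
```

For `1+2` the `+` follows `1` directly, so it is an operator; for `1 +2` the `+` follows a space, so `+2` can be read as a number.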
When a #{
...}
escape describes an identifier S-expression, it is
an identifier in the same sense as a shrubbery-notation identifier.
The same holds for numbers, booleans, strings, byte strings, and
keywords. A #{
...}
escape must not describe a pair, because
pairs are used to represent a parsed shrubbery, and allowing pairs
would create ambiguous or ill-formed representations.
A @
starts an at-expression form similar to the notation supported by
#lang at-exp
(which is oriented toward S-expressions and
readtable-based). The next subsection explains in more detail, but the
table below sketches the shape of @
forms.
|   | nonterminal |    | production | adjustment |
|---|---|---|---|---|
| * | identifier | is | alpha alphanum * | |
|   | alpha | is | an alphabetic Unicode character or _ | |
|   | alphanum | is | alpha | |
|   |   | or | a numeric Unicode character | |
| * | keyword | is | ~ identifier | |
| * | operator | is | opchar * tailopchar | not ❘ or : ... |
|   |   | or | . + | ... or containing // ... |
|   |   | or | + + | ... or containing /* |
|   |   | or | - + | |
|   | opchar | is | a symbolic Unicode character not in special | |
|   |   | or | a punctuation Unicode character not in special | |
|   |   | or | one of : ❘ | |
|   | tailopchar | is | anything in opchar except + - . / | |
|   | special | is | one of ( ) [ ] { } « » | |
|   |   | or | one of " ; , ~ # \ _ @ | |
| * | number | is | integer | |
|   |   | or | float | |
|   |   | or | hexinteger | |
|   | integer | is | sign ? nonneg | |
|   | sign | is | one of + or - | |
|   | nonneg | is | decimal usdecimal * | |
|   | decimal | is | 0 through 9 | |
|   | usdecimal | is | decimal | |
|   |   | or | _ decimal | |
|   | float | is | sign ? nonneg ? . nonneg ? exp ? | |
|   |   | or | sign ? nonneg exp | |
|   |   | or | #inf | |
|   |   | or | #neginf | |
|   |   | or | #nan | |
|   | exp | is | e sign ? nonneg | |
|   |   | or | E sign ? nonneg | |
|   | hexinteger | is | 0x hex ushex * | |
|   | hex | is | one of 0 through 9 | |
|   |   | or | one of a through f | |
|   |   | or | one of A through F | |
|   | ushex | is | hex | |
|   |   | or | _ hex | |
| * | boolean | is | #true | |
|   |   | or | #false | |
| * | string | is | " strelem * " | |
|   | strelem | is | element in Racket string | \U ≤ 6 digits |
| * | bytestring | is | #" bytestrelem * " | |
|   | bytestrelem | is | element in Racket byte string | |
| * | sexpression | is | #{ racket } | |
|   | racket | is | any non-pair Racket S-expression | |
| * | comment | is | // nonnlchar | |
|   |   | or | /* anychar */ | nesting allowed |
| * | termcomment | is | #// | |
|   | nonnlchar | is | any character other than newline | |
| * | atexpression | is | @ command ? arguments ? body ? | no space between these parts |
|   | command | is | identifier | |
|   |   | or | keyword | |
|   |   | or | operator | |
|   |   | or | number | |
|   |   | or | boolean | |
|   |   | or | string | |
|   |   | or | bytestring | |
|   |   | or | racket | |
|   |   | or | ( group * ) | usual comma-separated groups |
|   |   | or | « group » | one spliceable group, no block |
|   | arguments | is | [ group * ] | usual comma-separated groups |
|   | body | is | { text } | possible escapes in text |
|   |   | or | atopen text atclose | atclose matching atopen |
|   | atopen | is | ❘ asciisym * { | |
|   | atclose | is | } asciisym * ❘ | reverses and flips paren-like |
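As an illustration of how the integer and hexinteger productions read, here is a Python sketch using regular expressions. It is a simplified check rather than the reference lexer: the names `INTEGER`, `HEXINTEGER`, `is_integer`, and `is_hexinteger` are hypothetical, a single digit is assumed to be a valid number (as in examples such as `fib(0)`), and the delimiter requirement and the special rule for a `+` or `-` prefix are not modeled.

```python
import re

# sign ? decimal, then any number of digits that may each be preceded by one _
INTEGER = re.compile(r"[+-]?[0-9](?:_?[0-9])*")

# 0x hex, then any number of hex digits that may each be preceded by one _
HEXINTEGER = re.compile(r"0x[0-9a-fA-F](?:_?[0-9a-fA-F])*")


def is_integer(text: str) -> bool:
    """True if the whole string matches the integer production."""
    return INTEGER.fullmatch(text) is not None


def is_hexinteger(text: str) -> bool:
    """True if the whole string matches the hexinteger production."""
    return HEXINTEGER.fullmatch(text) is not None
```

Because an underscore is only allowed directly before a digit, forms such as `1__0`, `_1`, and `1_` are rejected.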
An @
form of the shape
@«command ...»[arg, ...]{ body }...
is parsed into the same representation as
command ...(arg, ..., [parsed_body, ...], ...)
That is, the command part is left at the front and spliced into its
enclosing group, while the argument and body parts are wrapped with
parentheses to make them like arguments. Each body text is parsed into
a list of string literals and escapes, and multiple body texts can
be provided in multiple {
...}
s.
The command part usually does not have «»
, and it is instead
usually written as an identifier, operator, or parenthesized term. The
argument and body parts, when present, always use []
and {}
,
respectively. Any of the three kinds of parts can be omitted, but when
multiple parts are present, they must have no space between them or
the leading @
. When the argument and body parts are both
omitted, the command part is simply spliced into its context.
The conversion to a call-like form, keeping each body in a separate
list, and allowing multiple body arguments are the three main ways
that shrubbery @
notation differs from #lang at-exp
notation. The
other differences are the use of «
...»
instead of |
...|
for
delimiting a command, and the use of @//
instead of @;
for
comments. The details are otherwise meant to be the same, and the rest
of this section is mostly a recap.
A body part is treated as literal text, except where @
is used in a
body to escape. An unescaped }
closes a body, except that an
unescaped {
must be balanced by an unescaped }
, with both treated
as part of the body text. Instead of {
, a body-starting opener can
be |
plus {
with any number of ASCII punctuation and symbol
characters (other than {
) in between; the corresponding closer is
then the same sequence in reverse, except that some characters are
flipped: {
to }
, (
to )
, )
to (
, [
to ]
, ]
to [
,
<
to >
, and >
to <
. With an |
...{
opener, an escape is
formed by using the opener followed by @
, while opener–closer pairs
balance within the body text. When multiple body parts are provided,
each can use a different opener and closer. The parsed form of the
body breaks up the body text into lines and "\n"
as separate string
literals in the parsed list form, with each escape also being its own
element in the list form. Parsed body text also has leading and
trailing whitespace adjusted the same as with #lang at-exp
.
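The relationship between a body opener and its closer can be sketched in Python. This is an illustration with a hypothetical helper name, `closer_for`; it checks only the overall shape of the opener and does not verify that the characters in between are ASCII punctuation or symbol characters.

```python
# Characters that are flipped when the opener is reversed into a closer.
FLIPS = {"{": "}", "(": ")", ")": "(", "[": "]", "]": "[", "<": ">", ">": "<"}


def closer_for(opener: str) -> str:
    """Compute the closer for a body opener such as `{` or `|<<{`."""
    if opener == "{":
        return "}"
    well_formed = (
        len(opener) >= 2
        and opener.startswith("|")
        and opener.endswith("{")
        and "{" not in opener[1:-1]  # no extra { between | and the final {
    )
    if not well_formed:
        raise ValueError(f"not a body opener: {opener!r}")
    # Reverse the opener and flip each paren-like character.
    return "".join(FLIPS.get(ch, ch) for ch in reversed(opener))
```

For example, the opener `|<<{` pairs with the closer `}>>|`, and characters without a flipped form, such as `|`, stay as they are.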
After the @
of an escape in body text, the escape has the same
form as an at-notation form that starts with @
as a shrubbery. That
is, @
forms are essentially the same whether starting in shrubbery
mode or body-text mode.
In body text, there are two additional comment forms that are not
supported in shrubbery mode. A @//{
starts a block comment that ends
with }
, and the comment form is not part of the body text. The @//
comment form must be prefixed with an opener when its enclosing body
is started with an opener that isn't just {
, and the {
after @//
can more generally be an |
...{
opener with the corresponding
closer. Opener–closer pairs must be balanced in the commented block,
the same as in body text. A @//
comment form (prefixed with an
opener as needed to form an escape) that is not followed by {
or an
|
...{
opener comments out the rest of the line, including a
comment-terminating newline.
See parse.rkt. Note that if you run it with no arguments,
the program will read from stdin. Supply one or more files to
read from those files instead of stdin. Supply --recover
to continue
parsing after indentation or closer errors.
See demo.shrb, interp.shrb, and weird.shrb for more examples.
Shrubbery notation may not be a good choice where precise and complete
grouping is needed, both because its grouping is coarse-grained and
the grouping rules generate lots of extra group
layers.
Shrubbery notation does not resolve the question of how infix
expressions parse. There is no precedence at the shrubbery level, for
example, other than the way that a :
has higher precedence (in a
sense) than |
.
The lexeme-level syntax is chosen to be familiar to programmers
generally. The sequence 1+2
is one plus two, not a strangely spelled
identifier. Tokens like (
, ,
, {
and ;
are used in familiar
ways. Shrubbery notation provides enough grouping structure that code
navigation and transformation should be useful and straightforward in an
editor.
Parentheses in shrubbery notation do not disable indentation, unlike some indentation-sensitive notations. That choice supports a language in shrubbery notation where parentheses can be added around any expression — even if the expression is written with indentation (although the expression may need to be shifted right to preserve relative indentation, depending on how parentheses are added).
The inclusion of |
in shrubbery notation reflects the fact that
conditional forms (such as if
, cond
, and match
) are important and
common. A distinct, pleasant, and uniform pattern for conditionals
deserves direct support in the notation.
Requiring a preceding :
or preceding/following |
for
block-creating indentation is mostly a kind of consistency check to
enable better and earlier errors when indentation goes wrong. It also
allows indentation that starts with an operator to continue a group;
it's possible for bad indentation to inadvertently cause an operator
to be treated as continuing a group, but hopefully that will be rare.
Always requiring a preceding :
before an indented |
line would be
consistent, but it adds extra :
s where |
already provides one
consistency check. Allowing an optional :
before |
would work, but
programmers may then choose differently on omitting or including the
:
, leading to subtly divergent conventions.
Explicit block grouping via `«` and `»` is expected to be rare. The
grouping characters were intentionally chosen from the Latin-1
extension of ASCII to avoid reserving additional ASCII characters.
Making whitespace and comment lines ignored in all contexts means that
they can be freely added without interfering with grouping. The `\`
continuation operator is somewhat unusual in that it skips blank and
comment lines to continue, as opposed to requiring `\` on every
continuing line; that, too, allows extra blank and comment lines to be
added, even amid continuing lines.
The interaction of indentation and `\` differs slightly from Python,
which does not count the space for `\` itself or any leading
whitespace on a continuing line toward indentation. Counting the
leading whitespace on a continuing line has the advantage that it can
reach an arbitrary amount of indentation within a constrained textual
width. Counting the `\` itself is consistent with ignoring `\` when it
appears within a line, so grouping stays the same whether there's a
newline or the continuing line immediately after `\`. The whitespace
role of `\` also means that spaces can be turned into `\` to “harden”
code for transfer via media (such as email) that might mangle
consecutive spaces.
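
As a rough illustration of the column-counting rule just described, the
following Python sketch computes the column that the first token of a
continuing line would receive. It encodes one reading of the paragraph
above, not the reference reader: the helper name `effective_column`, the
0-based columns, and the spaces-only treatment of leading whitespace are
all assumptions made for the example.

```python
def effective_column(line_with_backslash: str, continuing_line: str) -> int:
    """Return the 0-based column of the first token on continuing_line.

    Assumed rule (a reading of the rationale, not the reference reader):
    the continuing line is treated as if its text, leading spaces included,
    started immediately after the trailing backslash of the previous line.
    """
    stripped = line_with_backslash.rstrip()
    if not stripped.endswith("\\"):
        raise ValueError("first line must end with a backslash")
    # The backslash itself occupies one column, so the continuation starts
    # at the column just past it.
    column_after_backslash = len(stripped)
    # Leading spaces on the continuing line count toward indentation.
    leading = len(continuing_line) - len(continuing_line.lstrip(" "))
    return column_after_backslash + leading


# "a \" is three columns wide; two leading spaces on the next line push
# the token "b" out to column 5.
print(effective_column("a \\", "  b"))
```

Under this reading, a token can be placed at any column by adding leading
spaces to the continuing line, even when the text of each physical line
is short.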
Using `~` for keywords has a precedent in OCaml. Using `~` for
keywords uses up a character that might otherwise be used for
operators, but keywords seem useful enough to be worth this cost. The
notion of keywords as distinct from identifiers has been liberating
for Racket syntax (particularly since keywords can be kept distinct
from expressions more generally), and we expect similar benefits for
having keywords in shrubbery notation.
The `#{....}` escape to S-expressions bridges between shrubbery
notation and Racket identifiers. For example, `#{exact-integer?}` is
an identifier with `-` and `?` as part of the identifier. Shrubbery
notation could be adapted to support Lisp-style identifiers by
requiring more space around operators, but the rule for continuing a
group between `(` and `)` or `[` and `]` currently depends on
distinguishing operators from non-operators.
For `@`, the choice of treating `@f[arg]{text}` as `f(arg, ["text"])`
instead of `f(arg, "text")` reflects experience with S-expression `@`
notation. Although it seems convenient that, say, `@bold{x}` is treated
as `(bold "x")`, the consequence is that a function like `bold` might
be implemented at first to take a single argument; later, a use like
`@bold{Hello @name}` breaks, because two arguments are provided.
Making explicit the list that's inherent in body parsing should help
reduce such mistakes (or bad design choices) for functions that are
meant to be used with `@` notation.
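
The arity problem described above can be sketched in Python. This is
only an analogy for the two calling conventions, not an implementation
of `@` notation: `bold_spliced`, `bold_listed`, and the HTML-style output
are hypothetical names and choices made for the example.

```python
def bold_spliced(text):
    # Written with the splicing convention in mind: one piece of text.
    return "<b>" + text + "</b>"


def bold_listed(parts):
    # Written for the list convention: always one argument, a list of pieces.
    return "<b>" + "".join(str(p) for p in parts) + "</b>"


name = "World"

# A body with one piece, like @bold{x}, works under both conventions.
single_spliced = bold_spliced("x")
single_listed = bold_listed(["x"])


def spliced_call_breaks():
    # A body with two pieces, like @bold{Hello @name}, becomes two
    # arguments under the splicing convention, which bold_spliced rejects.
    try:
        bold_spliced("Hello ", name)
    except TypeError:
        return True
    return False


# Under the list convention the same body is still a single argument.
mixed_listed = bold_listed(["Hello ", name])
```

Because the list convention always passes exactly one body argument, a
function written against it keeps working when the body grows from one
piece to several.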
Indentation-sensitive parsing and the use of `:` are obviously informed
by Python.
Shrubbery notation's rules relating indentation, lines, `;`, and `:`
are originally based on the `#lang something` reader, which
also targets an underlying expander that further groups tokens.
Shrubbery notation evolved away from using `{}` for blocks, however,
because `:` was nearly always preferred in experiments with the
notation. For the very rare case that explicit grouping is needed for
a block, `«` and `»` can be used. Freeing `{}` from use for blocks,
meanwhile, allows its use for set and map notations.
Shrubbery notation is also based on Lexprs, particularly its use of
`|`. Lexprs uses mandatory `:` and `|` tokens as a prefix for
indentation, and it absorbs an additional line after an indented
section to allow further chaining of the group. Although `«»` can be
used to form multiple subgroups within a shrubbery group, the notation
discourages that style in favor of further nesting (or, in the case of
`if`, in favor of `|` notation like other conditionals).
Shrubbery notation is in some sense a follow-up to sapling notation. The primary difference is that shrubbery notation is indentation-sensitive, while sapling notation is indentation-insensitive. Indentation sensitivity and block conventions in shrubbery notation avoid some delimiters and blank lines that are needed in sapling notation.
More generally, shrubbery notation takes inspiration from
S-expressions and alternative S-expression notations. The idea that,
even in an S-expression-like setting, some parsing can be deferred to a
later parser has many precedents, including Clojure's choice of where
to put parentheses and notations that use something like `$` to escape
to infix mode.
Like other notation designs, this one leaves open exactly the way that the notation would be used to express a new programming language. The examples are meant to be suggestive and have influenced many of the notational choices, though.