hhas edited this page Feb 3, 2023 · 6 revisions

Number

( '+' | '-' )? digit+ ( '.' digit+ )?

3
+3
-3
3.14
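
The grammar above can be checked with a regular expression. The following is a minimal Python sketch, not the iris lexer itself; it assumes `digit` means ASCII 0-9, and the names `NUMBER_RE` and `is_number` are hypothetical:

```python
import re

# Assumed translation of: ( '+' | '-' )? digit+ ( '.' digit+ )?
# `digit` is taken to mean ASCII 0-9, which is an assumption.
NUMBER_RE = re.compile(r"[+-]?[0-9]+(?:\.[0-9]+)?")


def is_number(text: str) -> bool:
    """Return True if the whole of `text` matches the number grammar."""
    return NUMBER_RE.fullmatch(text) is not None
```

Under this sketch `3`, `+3`, `-3`, and `3.14` all match, while `3.` and `.5` do not, because the grammar requires digits on both sides of the decimal point.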

TO DO: exponent notation, e.g. 3.14e-10

TO DO: lexer for matching thousands notation, e.g. 12,345,678

String

( '"' | '“' | '”' ) char* ( ( '""' | '““' | '””' ) char* )* ( '"' | '“' | '”' )

Strings are delimited by double quotes (straight or typographer’s):

"Hello"
“World”

Strings can contain any character, including linebreaks. To include a double quote, type it twice (i.e. double-quote characters escape themselves):

“Bob says ““Hello”” to you.”
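
The self-escaping rule can be illustrated by decoding a string literal by hand. This is a rough Python sketch under the assumption that only a doubled, identical quote character counts as an escape (as the grammar above suggests); `decode_string_literal` is a hypothetical helper, not part of iris:

```python
QUOTES = '"“”'  # straight and typographer's double quotes


def decode_string_literal(literal: str) -> str:
    """Strip the delimiters and collapse doubled quotes into single quotes."""
    if len(literal) < 2 or literal[0] not in QUOTES or literal[-1] not in QUOTES:
        raise ValueError("not a double-quoted string literal")
    body = literal[1:-1]
    chars = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch in QUOTES:
            # A quote inside the body must be immediately repeated.
            if i + 1 < len(body) and body[i + 1] == ch:
                chars.append(ch)
                i += 2
            else:
                raise ValueError("unescaped double quote inside string")
        else:
            chars.append(ch)  # any other character, including linebreaks
            i += 1
    return "".join(chars)
```

Applied to the example above, the doubled ““ and ”” collapse to single typographer's quotes around Hello.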

Name

A name consists of one or more alphanumeric characters (the first character must not be a digit), OR one or more symbol characters:

i
document
and
+
>=
≤

Names are case-insensitive.
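
As a rough illustration of the two kinds of name and of case-insensitive matching, here is a Python sketch. The set of symbol characters is not defined on this page, so `SYMBOL_CHARS` below is a placeholder assumption, as are the helper names:

```python
# Placeholder assumption: the real set of symbol characters is defined by iris.
SYMBOL_CHARS = set("+-*/<>=≤≥≠")


def classify_name(text: str):
    """Return 'alphanumeric', 'symbolic', or None if `text` is not a valid name."""
    if not text:
        return None
    # Underscores are allowed because multi-word names use them as separators.
    if all(ch.isalnum() or ch == "_" for ch in text) and not text[0].isdigit():
        return "alphanumeric"
    if all(ch in SYMBOL_CHARS for ch in text):
        return "symbolic"
    return None


def same_name(a: str, b: str) -> bool:
    """Compare two names case-insensitively."""
    return a.casefold() == b.casefold()
```

With these definitions, `document` and `and` classify as alphanumeric, `+` and `>=` as symbolic, and `Document` and `DOCUMENT` are treated as the same name.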

Multiple words in a name should always be separated by underscores:

application_file
this_is_a_name

(Always use snake_case, not camelCase, to prevent ambiguous names and to facilitate fuzzy autocomplete, including underscore insertion, and text-to-voice conversion. The pretty printer can also de-emphasize underscores in names so that the displayed text reads more naturally.)

Alphanumeric names are assumed to be ordinary command names (i.e. unreserved names). If a library’s custom operator syntax is loaded, any names reserved by those operators will be matched according to the operators’ custom syntax rules.

Symbolic names (e.g. +) are assumed to be custom operator names (i.e. reserved names). If no custom operator syntax is found for a symbolic name, it is treated as a syntax error.

To treat a name as an ordinary command name, regardless of whether or not it is reserved, enclose it in single quotes (straight or typographer’s):

‘document’
‘and’
‘+’
‘>=’

Symbol

'#' name

Symbols (a.k.a. “hashtags”) consist of a hash character followed by a valid name:

#document
#some_tag

Reserved alphanumeric names are treated as normal names when preceded by a hash:

#and

Symbolic names must be single-quoted when preceded by a hash:

#‘+’

Ordered list

'[' ']' | '[' expr ( sep expr )* ']'

Ordered lists are delimited by square brackets and contain an ordered sequence of 0+ values (“items”):

[]
[ 1, 2, 3 ]

Multiple items must be separated by commas and/or linebreaks:

[
  1
  2
  3
]

BUG: expressions within lists are not eagerly evaluated, e.g. [ 1+1 ] currently yields [ ‘+’ {1, 1} ] but should yield [ 2 ] unless explicitly coerced to an expression.

Key-value list

'[' ':' ']' | '[' key ':' expr ( sep key ':' expr )* ']'

Key-value lists are delimited by square brackets and contain an unordered sequence of 0+ colon-delimited key: value pairs (“items”):

[:]
[ “name”: “Bob”, “age”: 42 ]

Multiple items must be separated by commas and/or linebreaks:

[
  “name”: “Bob”
  “age”: 42
]

Keys are arbitrary numbers, strings, and/or symbols.

BUG: the parser currently fails on Symbol keys.

Record

'{' '}' | '{' ( label ':' )? expr ( sep ( label ':' )? expr )* '}'

Records are delimited by curly braces and contain an ordered sequence of 0+ colon-delimited label: value pairs (“properties”):

{}
{ name: “Bob”, age: 42 }

Multiple properties must be separated by commas and/or linebreaks:

{
  name: “Bob”
  age: 42
}

Labels are names. Reserved alphanumeric names are treated as normal names. Symbolic names must be single-quoted.

Labels are optional:

{ “Bob”, 42 }

A record can contain both labeled and unlabeled properties. Unlabeled properties will be matched by position when coercing the record to a specific record type:

✎ { “Bob”, 42 } as record { name: text, age: integer }
☺︎ { name: “Bob”, age: 42 }

BUG: record coercions match property labels but do not coerce property values (the record constructor incorrectly ignores the given types and treats all properties as type anything).
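
Positional matching can be sketched as follows. This Python sketch makes a simplifying assumption that is not confirmed by this page: the property at position i is matched against the record type's field at position i, and a labeled property must carry that field's label. Value coercion is left out entirely, and `label_by_position` is a hypothetical name:

```python
from typing import Any, List, Optional, Tuple

Property = Tuple[Optional[str], Any]  # (label or None, value)


def label_by_position(properties: List[Property],
                      field_labels: List[str]) -> List[Tuple[str, Any]]:
    """Give each property the label of the record-type field at the same position."""
    if len(properties) > len(field_labels):
        raise ValueError("record has more properties than the record type")
    labeled = []
    for (label, value), expected in zip(properties, field_labels):
        # Names are case-insensitive, so compare labels with casefold().
        if label is not None and label.casefold() != expected.casefold():
            raise ValueError(f"expected label {expected!r}, found {label!r}")
        labeled.append((expected, value))
    return labeled
```

For instance, `label_by_position([(None, "Bob"), (None, 42)], ["name", "age"])` produces the labeled pairs corresponding to { name: “Bob”, age: 42 }.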

Labeled properties can be accessed by name:

✎ name of { name: “Bob”, age: 42 }
☺︎ “Bob”

TO DO: access property name and/or value by index (need to decide semantics, since LH operand is a command [name]; thus key {at: 1} of {…} may be ambiguous if record contains a property named key); main reason for this is to enable native introspection of records

Group

'(' ')' | '(' expr ( sep expr )* ')'

Parentheses provide grouping of single expressions (e.g. to override operator precedence) or sequences of 0+ expressions (to denote blocks).

(1 + 2) * 3

( say { “Hello” }, say { “World” } )

Command

name record?

Commands are values. A command consists of a name optionally followed by an argument record of 0+ properties (“arguments”):

hello
hello {}
uppercase { “Bob” }
uppercase { text: “Bob” }
‘if’ { test: expr , then: action }

An iris script is composed almost entirely of nested and/or sequential commands. (Exceptions to this rule are other value literals such as numbers and strings, code annotations, and core punctuation.) This includes library-defined operators, which apply custom syntax and precedence rules on top of library-defined commands; e.g.:

(1 + 2) * 3

is equivalent to:

‘*’ { ‘+’ {1, 2}, 3 }    
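
To make the equivalence concrete, the following Python sketch rewrites simple infix arithmetic into nested command syntax using precedence climbing. It is an illustration only: the `PRECEDENCE` table is hypothetical (in iris, precedence comes from the library's operator definitions), only unsigned numbers, binary operators, and parentheses are handled, and the output spacing differs slightly from the example above.

```python
import re

# Hypothetical precedence table; higher binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
TOKEN_RE = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")


def tokenize(source: str) -> list:
    """Split the source into number, operator, and parenthesis tokens."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def to_command_syntax(source: str) -> str:
    """Rewrite an infix expression as nested single-quoted commands."""
    tokens = tokenize(source)
    pos = 0

    def parse_operand() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == "(":
            inner = parse_expr(1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return inner
        if token == ")" or token in PRECEDENCE:
            raise SyntaxError(f"unexpected token {token!r}")
        return token

    def parse_expr(min_prec: int) -> str:
        nonlocal pos
        left = parse_operand()
        while (pos < len(tokens) and tokens[pos] in PRECEDENCE
               and PRECEDENCE[tokens[pos]] >= min_prec):
            op = tokens[pos]
            pos += 1
            # Parsing the right side at a higher minimum makes operators left-associative.
            right = parse_expr(PRECEDENCE[op] + 1)
            left = f"‘{op}’ {{{left}, {right}}}"
        return left

    result = parse_expr(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result
```

Calling `to_command_syntax("(1 + 2) * 3")` yields a `‘*’` command whose first argument is a `‘+’` command, mirroring the nesting shown above.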

Sequences of commands can be grouped in parentheses (blocks), e.g.:

( do_this, do_that )

An argument record’s { } braces and comma separators may be omitted for brevity (low-punctuation command syntax), e.g.:

‘if’ { test, then: action }

can be abbreviated to:

‘if’ test then: action

Caveat: when nesting low-punctuation commands, any labeled arguments are assumed to belong to the outermost command. If the inner command has any labeled arguments it must either be parenthesized or use explicit record syntax to prevent ambiguity, e.g.:

a_command { b_command { b_label: value }, a_label: value }

can be abbreviated to one of the following:

a_command b_command { b_label: value } a_label: value

a_command ( b_command b_label: value ) a_label: value

but not:

a_command b_command b_label: value a_label: value

as this will treat the b_label: value argument as belonging to a_command.

Operator syntax

Some commonly used commands define custom operator syntax as an alternative to standard command syntax. For example, the standard library defines a custom + operator over the + command so that:

‘+’ { LEFT_EXPR, RIGHT_EXPR }

can be written using standard arithmetic notation:

 LEFT_EXPR + RIGHT_EXPR

Similarly, it defines a custom while operator over the while command so that:

‘while’ { TEST, repeat: ACTION }

can be written as:

 while TEST repeat ACTION

Advantage: TEST can be written as a low-punctuation command. Disadvantage: while and repeat are reserved words, which cannot be used elsewhere unless single-quoted.

Caution: library-defined operator syntax reserves symbolic and/or alphanumeric names for use in that syntax. For example, the standard library reserves to, if, then, else, repeat, while, tell, and other alphanumeric names. If a reserved name is used outside of its operator syntax, a syntax or other error will occur.

For example, if operator syntax is not loaded, this is a valid command:

while { TEST, repeat: ACTION }

If operator syntax is loaded, the following are valid:

‘while’ { TEST, repeat: ACTION }

 while TEST repeat ACTION

but this will produce a syntax error as the complete while TEST repeat ACTION operator was not matched:

while { TEST, repeat: ACTION }

User caution: Take care when operator syntax is close to low-punctuation command syntax, e.g.:

 while TEST repeat: ACTION

is a valid low-punctuation command only if the while operator is not loaded. If the while operator is loaded, the colon will cause a syntax error.

Developer caution: Avoid overuse of operator syntax, especially when reserving alphanumeric names which can conflict with names of commands used in user scripts. Reserved names can significantly affect how user scripts are parsed.

When a library-defined operator syntax is imported, it [currently] applies to the entire script. Consider whether a custom operator syntax is warranted, minimize use of alphanumeric verbs and nouns, and favor command syntax where practical.

TO DO: operator import behavior needs to be stable and predictable within user scripts. For convenience, standard library commands and operators are imported by default (though one or both may be explicitly excluded). Importing a third-party library will not import its operators by default; these must be explicitly requested by the script. Library imports may/should/must be versioned to avoid ambiguity: a script may declare the versions of the libraries against which it was originally written, and it is up to the library importer to determine if a newer/older installed library is API compatible.

Core punctuation

In addition to the punctuation characters described above, the following characters are reserved by iris:

. ? ! ;

Period, question mark, and exclamation mark can be used interchangeably with comma (,) to separate or terminate expressions. Currently there is no difference in behavior, but in future, custom interpreter behaviors may be assigned to each; e.g. ? might invoke a debugger dialog upon evaluating the preceding command, and ! might force a “destructive” command to be performed without displaying an “Are you sure?” confirmation.

Semi-colons are used to “pipe” the output value of one command as the first argument to the next, e.g.:

say { read “Enter name:” }

may be sequentially written as:

read “Enter name:”; say
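
The piping rule can be modeled as a fold over a sequence of commands, where each command becomes the first argument of the one that follows it. The Python sketch below uses a hypothetical `Command` class and `pipe` function purely for illustration; it does not reflect how iris represents commands internally:

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    """Hypothetical stand-in for a command: a name plus positional arguments."""
    name: str
    arguments: list = field(default_factory=list)


def pipe(*commands: Command) -> Command:
    """Nest a left-to-right pipeline so each result feeds the next command."""
    if not commands:
        raise ValueError("a pipeline needs at least one command")
    result = commands[0]
    for following in commands[1:]:
        # The previous result is inserted as the first argument.
        result = Command(following.name, [result, *following.arguments])
    return result
```

Here `pipe(Command("read", ["Enter name:"]), Command("say"))` builds the same nested structure as say { read “Enter name:” }.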

TO DO: This behavior may change in future to allow other arguments to be substituted using _.

Annotations

Annotations are delimited by « and » characters and may contain user documentation, code comments, TODOs, etc. The parser currently discards all annotations; this will change in future.
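
Discarding annotations can be approximated with a single regular expression. This Python sketch assumes that annotations do not nest and that « and » never appear inside string literals; neither assumption is confirmed by this page, and `strip_annotations` is a hypothetical helper:

```python
import re

# Assumes non-nested annotations; [^»] also matches linebreaks,
# so multi-line annotations are removed as well.
ANNOTATION_RE = re.compile(r"«[^»]*»")


def strip_annotations(source: str) -> str:
    """Remove every «…» annotation from the source text."""
    return ANNOTATION_RE.sub("", source)
```

A real parser would need to track string literals and any nesting rules before removing annotation text.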
