Language Specification (1) : Lexical Entities

Benjamin Kowarsch edited this page Jun 16, 2023 · 15 revisions

Lexical Entities

Reserved Words

Classical Tokens

ALIAS, AND, ARGLIST, ARRAY, BEGIN, CASE, CONST, COPY, DIV, DO, ELSE, ELSIF, END, EXIT, FOR, IF, IMPLEMENTATION, IMPORT, IN, INTERFACE, LOOP, MOD, MODULE, NEW, NOP, NOT, OCTETSEQ, OF, OPAQUE, OR, POINTER, PROCEDURE, READ, RECORD, RELEASE, REPEAT, RETAIN, RETURN, SET, THEN, TO, TYPE, UNQUALIFIED, UNTIL, VAR, WHILE, WRITE;

Schroedinger's Tokens

These tokens may represent either a reserved word or an identifier. The ambiguity is resolved during syntax analysis.

ADDRESS, CAPACITY, CAST, NIL;

See also Schrödinger's token
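One way a lexer might defer this decision is to tag such words with a separate token class and leave the final choice to the parser. The following is a minimal sketch in Python, not the reference implementation: the names `RESERVED_WORDS`, `SCHROEDINGER_TOKENS` and `classify_word` are hypothetical, and exact, case-sensitive matching is an assumption.

```python
# Word lists copied from the two lists above.
RESERVED_WORDS = frozenset("""
    ALIAS AND ARGLIST ARRAY BEGIN CASE CONST COPY DIV DO ELSE ELSIF END EXIT
    FOR IF IMPLEMENTATION IMPORT IN INTERFACE LOOP MOD MODULE NEW NOP NOT
    OCTETSEQ OF OPAQUE OR POINTER PROCEDURE READ RECORD RELEASE REPEAT RETAIN
    RETURN SET THEN TO TYPE UNQUALIFIED UNTIL VAR WHILE WRITE
""".split())

SCHROEDINGER_TOKENS = frozenset({"ADDRESS", "CAPACITY", "CAST", "NIL"})


def classify_word(lexeme: str) -> str:
    """Classify a word-like lexeme (hypothetical helper).

    Assumption: lookup is exact and case-sensitive.
    """
    if lexeme in RESERVED_WORDS:
        return "reserved"
    if lexeme in SCHROEDINGER_TOKENS:
        # The lexer cannot decide; the parser resolves this from context.
        return "schroedinger"
    return "identifier"
```

The parser would then treat a `"schroedinger"` token as a reserved word where the syntax calls for one and as an identifier otherwise.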

Special Symbols

Delimiters

( ) [ ] { } ' "

Punctuation

. , : ; = # + * @ | .. := ++ -- .*

Operators

= # > >= < <= + - & \ * / :: ^ .

Comment Delimiters

! (* *)

Pragma Delimiters

<* *>

Identifiers

Identifiers denote predefined or user-defined names for syntactic entities. A standard identifier starts with a letter, which may be followed by any number of letters and digits.

StdIdent := Letter ( Letter | Digit )* ;
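As an illustration, the rule can be transcribed into a regular expression. This Python sketch assumes that `Letter` means the ASCII letters A to Z and a to z, which the text above does not state; `STD_IDENT` and `is_std_ident` are hypothetical names.

```python
import re

# StdIdent := Letter ( Letter | Digit )* ;
# Assumption: Letter is an ASCII letter, Digit is '0' .. '9'.
STD_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_std_ident(text: str) -> bool:
    """Return True if the whole of `text` is a standard identifier."""
    return STD_IDENT.fullmatch(text) is not None
```

Under this reading, `count2` is accepted, while `2count` and `foo_bar` are rejected.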

Numeric Literals

Numeric literals denote numeric values: real number values, whole number values or character code values.

Real Numbers

Real number values are always given in decimal notation. They start with an integral part, followed by an optional fractional part, followed by an optional exponent.

RealNumber := integralPart fractionalPart? exponentialPart? ;

integralPart := '0' | ( '1' .. '9' ) ( DigitSeparator? DigitSequence )? ;

fractionalPart := '.' DigitSequence ;

exponentialPart := 'e' ( '+' | '-' )? DigitSequence ;

DigitSequence := DecimalNumber ;
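The rules above can be combined into a single regular expression for experimentation. This is a sketch in Python that follows the grammar as written; the names `DIGIT_SEQ`, `INTEGRAL`, `REAL_NUMBER` and `is_real_number` are hypothetical.

```python
import re

# DigitSequence: digits, optionally grouped with ' as digit separator.
DIGIT_SEQ = r"[0-9]+(?:'[0-9]+)*"

# integralPart: a lone 0, or a non-zero digit optionally followed by
# an optional separator and a digit sequence.
INTEGRAL = rf"(?:0|[1-9](?:'?{DIGIT_SEQ})?)"

# Integral part, then optional fractional part, then optional exponent.
REAL_NUMBER = re.compile(
    rf"{INTEGRAL}(?:\.{DIGIT_SEQ})?(?:e[+-]?{DIGIT_SEQ})?"
)


def is_real_number(text: str) -> bool:
    """Return True if the whole of `text` matches the real number rule."""
    return REAL_NUMBER.fullmatch(text) is not None
```

With this transcription, `3.14` and `1'000.25e-3` match, while `01.5` (leading zero) and `1.` (empty fractional part) do not.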

Whole Numbers

Whole number values may be given in decimal, radix-2 or radix-16 notation. C-style prefixes are used to indicate the radix of non-decimal literals. Prefix 0b indicates radix-2 and prefix 0x indicates radix-16. Digits may be grouped using ' as a digit separator. A digit separator must always be preceded and followed by a digit.

WholeNumber := DecimalNumber | Base2Number | Base16Number ;

DecimalNumber := Digit+ ( DigitSeparator Digit+ )* ;

Base2Number := '0b' Base2Digit+ ( DigitSeparator Base2Digit+ )* ;

Base16Number := '0x' Base16Digit+ ( DigitSeparator Base16Digit+ )* ;

Digit := '0' .. '9' ;

Base2Digit := '0' | '1' ;

Base16Digit := Digit | 'A' .. 'F' ;

alias DigitSeparator = "'" ;
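To show how the prefixes and digit separators interact, here is a sketch in Python that validates a whole number literal against the rules above and computes its value. The names `WHOLE_NUMBER` and `whole_number_value` are hypothetical, and the function is an illustration rather than the compiler's actual routine.

```python
import re

# One named alternative per rule: Base2Number, Base16Number, DecimalNumber.
# Base16Digit admits only upper-case A .. F, as in the grammar.
WHOLE_NUMBER = re.compile(
    r"(?P<b2>0b[01]+(?:'[01]+)*)"
    r"|(?P<b16>0x[0-9A-F]+(?:'[0-9A-F]+)*)"
    r"|(?P<dec>[0-9]+(?:'[0-9]+)*)"
)


def whole_number_value(lexeme: str) -> int:
    """Return the value of a whole number literal, or raise ValueError."""
    match = WHOLE_NUMBER.fullmatch(lexeme)
    if match is None:
        raise ValueError(f"malformed whole number: {lexeme!r}")
    digits = lexeme.replace("'", "")  # separators carry no value
    if match.lastgroup == "b2":
        return int(digits[2:], 2)     # strip the 0b prefix
    if match.lastgroup == "b16":
        return int(digits[2:], 16)    # strip the 0x prefix
    return int(digits, 10)
```

Because every separator in the pattern sits between two digit groups, literals such as `1'` or `0x'FF` are rejected, which mirrors the requirement that a separator be preceded and followed by a digit.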

Character Codes

Character code values are always given in radix-16 notation with prefix 0u.

CharacterCode := '0u' Base16Digit+ ( DigitSeparator Base16Digit+ )* ;
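A character code can be checked and evaluated in the same manner as a radix-16 whole number. The Python sketch below uses the hypothetical names `CHARACTER_CODE` and `character_code_value`.

```python
import re

# CharacterCode := '0u' Base16Digit+ ( DigitSeparator Base16Digit+ )* ;
CHARACTER_CODE = re.compile(r"0u[0-9A-F]+(?:'[0-9A-F]+)*")


def character_code_value(lexeme: str) -> int:
    """Return the numeric value of a character code literal."""
    if CHARACTER_CODE.fullmatch(lexeme) is None:
        raise ValueError(f"malformed character code: {lexeme!r}")
    # Drop the 0u prefix and any digit separators before converting.
    return int(lexeme[2:].replace("'", ""), 16)
```

Mapping the resulting value to a character, for example with `chr`, presumes that the code is a Unicode code point, which the text above does not state.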

Quoted Literals

Quoted literals denote text. They are delimited by single quotes ' or double quotes ".

QuotedLiteral := SingleQuotedLiteral | DoubleQuotedLiteral ;

A single quoted literal may contain double quotes but not single quotes. A double quoted literal may contain single quotes but not double quotes. A quoted literal may contain whitespace but no tabulator, newline or other control code. As a result, it cannot span multiple lines.

SingleQuotedLiteral := "'" ( AnyPrintableExceptSingleQuote | EscSeq )* "'" ;

DoubleQuotedLiteral := '"' ( AnyPrintableExceptDoubleQuote | EscSeq )* '"' ;

Backslash escape sequences may be used to denote newline and tabulator control codes within a quoted literal. Escape sequence \n denotes newline and \t denotes tabulator. A verbatim backslash must be escaped as \\. No other escape sequences shall be supported.

EscSeq := '\' ( 'n' | 't' | '\' ) ;
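The quoting and escaping rules can be illustrated with a small decoder that turns the source form of a quoted literal into the text it denotes. This Python sketch makes two assumptions not found above: "printable" is approximated by Python's `str.isprintable`, and the name `decode_quoted_literal` is hypothetical.

```python
def decode_quoted_literal(lexeme: str) -> str:
    """Decode a single or double quoted literal, or raise ValueError."""
    if len(lexeme) < 2 or lexeme[0] not in "'\"" or lexeme[-1] != lexeme[0]:
        raise ValueError("literal must be enclosed in matching quotes")
    quote = lexeme[0]
    body = lexeme[1:-1]
    # The only escape sequences permitted: \n, \t and \\.
    escapes = {"n": "\n", "t": "\t", "\\": "\\"}
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body) or body[i + 1] not in escapes:
                raise ValueError("unsupported escape sequence")
            out.append(escapes[body[i + 1]])
            i += 2
            continue
        if ch == quote:
            raise ValueError("delimiter may not appear inside the literal")
        # Assumption: str.isprintable stands in for AnyPrintable; it
        # accepts the space character and rejects tab and newline.
        if not ch.isprintable():
            raise ValueError("control code inside literal")
        out.append(ch)
        i += 1
    return "".join(out)
```

A literal containing a raw tabulator is rejected, while the same literal written with the escape sequence `\t` decodes to a string that contains the tabulator.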

Non-Semantic Entities

Whitespace

Whitespace ASCII(0x20) terminates a symbol, except within quoted literals, pragmas and comments.

Tabulator

Tabulator ASCII(0x09) terminates a symbol, except within pragmas and comments.

Newline

Newline terminates a symbol, except within pragmas and block comments. A newline increments the line counter and resets the column counter used in compile-time messages.
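The counter behaviour can be sketched as follows in Python. The sketch assumes that newline is the single character LF, that both counters start at 1, and that every other character, including tabulator, advances the column by one; none of these details are given above, and `position_after` is a hypothetical name.

```python
def position_after(text, line=1, column=1):
    """Return the (line, column) reached after consuming `text`.

    Assumptions: newline is LF, counters are 1-based, and every
    other character advances the column by exactly one.
    """
    for ch in text:
        if ch == "\n":
            line += 1    # newline increments the line counter ...
            column = 1   # ... and resets the column counter
        else:
            column += 1
    return line, column
```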

Comments

Comments are non-semantic symbols ignored by the language processor. They may occur anywhere before or after semantic symbols and are used for documentation and annotation. There are line comments and block comments.

Line Comments

Line comments start with a ! and are terminated by newline.

LineComment := '!' ( AnyPrintable | Tabulator )* Newline ;

Block Comments

Block comments are delimited by block comment delimiters (* and *). Block comments may span multiple lines and may be nested up to a maximum of nine levels, not counting the outermost comment.

BlockComment := '(*' ( BlockComment | AnyPrintable | Tabulator | Newline )* '*)' ;
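Since the grammar rule alone does not express the nesting limit, a scanner has to count levels explicitly. The following Python sketch shows one way to do that; `MAX_NESTING` and `skip_block_comment` are hypothetical names, and the sketch does not check that the comment body is limited to printable characters.

```python
MAX_NESTING = 9  # nested levels allowed, not counting the outermost comment


def skip_block_comment(source: str, start: int) -> int:
    """Return the index just past the block comment opening at `start`."""
    if not source.startswith("(*", start):
        raise ValueError("no block comment at this position")
    depth = 0  # 0 means we are inside the outermost comment only
    i = start + 2
    while i < len(source):
        if source.startswith("(*", i):
            depth += 1
            if depth > MAX_NESTING:
                raise ValueError("block comment nesting exceeds nine levels")
            i += 2
        elif source.startswith("*)", i):
            if depth == 0:
                return i + 2  # closed the outermost comment
            depth -= 1
            i += 2
        else:
            i += 1
    raise ValueError("unterminated block comment")
```

Starting `depth` at zero for the outermost comment matches the wording that the limit of nine does not count the outermost level.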

Pragmas

Pragmas are directives to the language processor to control or influence the compilation process. They are delimited by <* and *>. They may span multiple lines but may not be nested.

Pragma := '<*' PragmaBody '*>' ;

Symbols Reserved for Other Uses

Symbols %, %%, <#, #>, /* and */ are reserved for the Modula-2 template engine; _ and $ for foreign identifiers; ? for source code transliterators and preprocessors; <<, >>, ` and ~ for future use.
