SI-6810 Spec reflects literal parsing literally
Emphasize that literal parsing accepts Unicode escapes
as if they were escaped. In particular, a newline
represented by its Unicode escape does not terminate
the line in the middle of a literal.
som-snytt committed Jun 29, 2015
1 parent aad7c67 commit ab527ce
Showing 2 changed files with 30 additions and 24 deletions.
49 changes: 27 additions & 22 deletions spec/01-lexical-syntax.md
@@ -398,40 +398,46 @@ members of type `Boolean`.
### Character Literals

```ebnf
characterLiteral ::= ‘'’ (printableChar | charEscapeSeq) ‘'’
characterLiteral ::= ‘'’ (charNoQuoteOrNewline | UnicodeEscape | charEscapeSeq) ‘'’
```

A character literal is a single character enclosed in quotes.
The character is either a printable unicode character or is described
by an [escape sequence](#escape-sequences).
The character can be any Unicode character except the single quote
delimiter or `\u000A` (LF) or `\u000D` (CR);
or any Unicode character represented by either a
[Unicode escape](01-lexical-syntax.html) or by an [escape sequence](#escape-sequences).

> ```scala
> 'a' '\u0041' '\n' '\t'
> ```
Note that `'\u000A'` is _not_ a valid character literal because
Unicode conversion is done before literal parsing and the Unicode
character `\u000A` (line feed) is not a printable
character. One can use instead the escape sequence `'\n'` or
the octal escape `'\12'` ([see here](#escape-sequences)).
Note that although Unicode conversion is done early during parsing,
so that Unicode characters are generally equivalent to their escaped
expansion in the source text, literal parsing accepts arbitrary
Unicode escapes, including the character literal `'\u000A'`,
which can also be written using the escape sequence `'\n'`.
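
As an illustration of the revised note, here is a minimal sketch that compares escaped and plain spellings of the same characters. It assumes a compiler whose scanner behaves as the wording above describes; the object and value names are hypothetical and exist only for this example.

```scala
// Sketch only: names are hypothetical, behaviour assumed to match the revised text.
object CharLiteralEscapes {
  val letterViaUnicode: Char = '\u0041'   // Unicode escape spelling of the letter A
  val lineFeedViaUnicode: Char = '\u000A' // accepted as a character literal per the revised note
  val lineFeedViaEscape: Char = '\n'      // the ordinary escape sequence for line feed

  def main(args: Array[String]): Unit = {
    assert(letterViaUnicode == 'A')
    assert(lineFeedViaUnicode == lineFeedViaEscape)
    assert(lineFeedViaUnicode.toInt == 10)
    // The same holds inside a string literal: the escaped line feed does not end the literal.
    assert("Hello,\u000AWorld!" == "Hello,\nWorld!")
    println("ok")
  }
}
```

The comments deliberately avoid spelling out a Unicode escape, since early Unicode conversion may apply to comment text as well.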

### String Literals

```ebnf
stringLiteral ::= ‘"’ {stringElement} ‘"’
stringElement ::= printableCharNoDoubleQuote | charEscapeSeq
stringElement ::= charNoDoubleQuoteOrNewline | UnicodeEscape | charEscapeSeq
```

A string literal is a sequence of characters in double quotes. The
characters are either printable unicode character or are described by
[escape sequences](#escape-sequences). If the string literal
contains a double quote character, it must be escaped,
i.e. `"\""`. The value of a string literal is an instance of
class `String`.
A string literal is a sequence of characters in double quotes.
The characters can be any Unicode character except the double quote
delimiter or `\u000A` (LF) or `\u000D` (CR);
or any Unicode character represented by either a
[Unicode escape](01-lexical-syntax.html) or by an [escape sequence](#escape-sequences).

If the string literal contains a double quote character, it must be escaped using
`"\""`.

The value of a string literal is an instance of class `String`.

> ```scala
> "Hello,\nWorld!"
> "This string contains a \" character."
> "Hello, world!\n"
> "\"Hello,\" replied the world."
> ```
#### Multi-Line String Literals
@@ -443,11 +449,10 @@ multiLineChars ::= {[‘"’] [‘"’] charNoDoubleQuote} {‘"’}

A multi-line string literal is a sequence of characters enclosed in
triple quotes `""" ... """`. The sequence of characters is
arbitrary, except that it may contain three or more consuctive quote characters
only at the very end. Characters
must not necessarily be printable; newlines or other
control characters are also permitted. Unicode escapes work as everywhere else, but none
of the escape sequences [here](#escape-sequences) are interpreted.
arbitrary, except that it may contain three or more consecutive quote characters
only at the very end. In particular, embedded newlines
are permitted. Unicode escapes work as everywhere else, but none
of the [escape sequences](#escape-sequences) are interpreted.
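
The difference from single-line string literals can be sketched as follows. This is an illustrative example with hypothetical names, not text from the spec: it checks that an escape sequence is interpreted in a single-line literal but kept verbatim in a multi-line literal, and that an embedded newline is permitted.

```scala
// Sketch only: names are hypothetical.
object MultiLineEscapes {
  val singleLine: String = "a\nb"     // escape sequence interpreted: a, line feed, b
  val multiLine: String = """a\nb"""  // not interpreted: a, backslash, n, b
  val embedded: String = """line one
line two"""                           // a newline in the source is part of the string

  def main(args: Array[String]): Unit = {
    assert(singleLine.length == 3)
    assert(multiLine.length == 4)
    assert(multiLine == "a\\nb")
    assert(embedded.indexOf('\n') >= 0)
    println("ok")
  }
}
```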

> ```scala
> """the present string
5 changes: 3 additions & 2 deletions spec/13-syntax-summary.md
@@ -57,11 +57,12 @@ floatType ::= ‘F’ | ‘f’ | ‘D’ | ‘d’
booleanLiteral ::= ‘true’ | ‘false’
characterLiteral ::= ‘'’ (printableChar | charEscapeSeq) ‘'’
characterLiteral ::= ‘'’ (charNoQuoteOrNewline | UnicodeEscape | charEscapeSeq) ‘'’
stringLiteral ::= ‘"’ {stringElement} ‘"’
| ‘"""’ multiLineChars ‘"""’
stringElement ::= (printableChar except ‘"’)
stringElement ::= charNoDoubleQuoteOrNewline
| UnicodeEscape
| charEscapeSeq
multiLineChars ::= {[‘"’] [‘"’] charNoDoubleQuote} {‘"’}