Minigrace supports Unicode programs and data. This document describes
the intended behaviour of the compiler and runtime relating to this
feature.
The JavaScript runtime does not have access to the Unicode Character
Database, and so does not support the features described here that
depend on it.
Grace source code is written in Unicode. Minigrace recognises only the
UTF-8 encoding, which is compatible with US-ASCII. Source text should
not contain byte-order marks.
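The following is a minimal Python sketch of the decoding rule, not Minigrace's own loader. The function name load_source is hypothetical. It also assumes that a byte-order mark only matters at the very start of the file.

```python
import codecs


def load_source(data: bytes) -> str:
    """Decode Grace source bytes as strict UTF-8 (hypothetical helper)."""
    # Assumption: only a leading UTF-8 byte-order mark (EF BB BF) is checked.
    if data.startswith(codecs.BOM_UTF8):
        raise ValueError("source must not start with a byte-order mark")
    # Strict decoding: malformed byte sequences raise UnicodeDecodeError,
    # which is a subclass of ValueError.
    return data.decode("utf-8")


# Pure US-ASCII bytes are already valid UTF-8.
print(load_source(b'print "hello"'))
# Non-ASCII characters such as "λ" arrive as multi-byte UTF-8 sequences.
print(load_source("var λ := 1".encode("utf-8")))
```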
String data is stored in a Unicode format. The ord method on strings
returns the first codepoint in the string as a number. The length of the
string is the number of codepoints. Combining characters are regarded as
separate codepoints for this purpose and for iteration and indexing.
Minigrace does not normalise any input and does not currently provide
any means for doing so.
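Python strings also count and index by codepoint, so they can illustrate these rules. This is an analogy in Python, not Grace code. The final line uses Python's unicodedata.normalize only to show the normalisation step that Minigrace does not perform.

```python
import unicodedata

# "é" spelled as two codepoints: LATIN SMALL LETTER E + COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
# "é" spelled as one precomposed codepoint: LATIN SMALL LETTER E WITH ACUTE.
precomposed = "\u00e9"

# Length counts codepoints, so the combining accent is counted separately.
print(len(decomposed), len(precomposed))  # 2 1
# Like ord on a Grace string, this yields only the first codepoint.
print(hex(ord(decomposed[0])))  # 0x65
# Without normalisation the two spellings compare unequal...
print(decomposed == precomposed)  # False
# ...but they are equal after explicit NFC normalisation.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```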
String literals may contain escape sequences referring to Unicode
characters. The following escape sequences are supported:
\n LINE FEED (U+000a)
\t CHARACTER TABULATION (U+0009)
\r CARRIAGE RETURN (U+000d)
\l LINE SEPARATOR (U+2028)
\b BACKSPACE (U+0008)
\f FORM FEED (U+000c)
\e ESCAPE (U+001b)
\\ REVERSE SOLIDUS (U+005c)
\" QUOTATION MARK (U+0022)
\{ LEFT CURLY BRACKET (U+007b)
\uXXXX BMP character with hexadecimal codepoint U+XXXX (lower-case hex digits)
\UXXXXXX Character with hexadecimal codepoint U+XXXXXX (lower-case hex digits)
Literal instances of any character except those with one-character
escapes are also permitted.
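The following is a minimal Python sketch of the escape table and the literal-character rule. The names decode_escapes, SIMPLE_ESCAPES and MUST_ESCAPE are hypothetical, and this is not Minigrace's lexer. The sketch takes the text between a literal's quotes and rejects any character that has a one-character escape when it appears unescaped. It does not model string interpolation, so an unescaped "{" is simply rejected.

```python
# One-character escapes from the table above, mapped to their characters.
SIMPLE_ESCAPES = {
    "n": "\n", "t": "\t", "r": "\r", "l": "\u2028", "b": "\b",
    "f": "\f", "e": "\x1b", "\\": "\\", '"': '"', "{": "{",
}
# Characters that have a one-character escape may not appear literally.
MUST_ESCAPE = set(SIMPLE_ESCAPES.values())
LOWER_HEX = set("0123456789abcdef")


def decode_escapes(body: str) -> str:
    """Expand escapes in the body of a string literal (hypothetical helper)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            if ch in MUST_ESCAPE:
                raise ValueError(f"unescaped {ch!r} at offset {i}")
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        kind = body[i + 1]
        if kind in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[kind])
            i += 2
        elif kind in ("u", "U"):
            # \u takes exactly 4 hex digits, \U exactly 6, all lower case.
            width = 4 if kind == "u" else 6
            digits = body[i + 2 : i + 2 + width]
            if len(digits) != width or not set(digits) <= LOWER_HEX:
                raise ValueError(f"bad \\{kind} escape at offset {i}")
            # chr raises ValueError for values above U+10FFFF.
            out.append(chr(int(digits, 16)))
            i += 2 + width
        else:
            raise ValueError(f"unknown escape \\{kind} at offset {i}")
    return "".join(out)


print(decode_escapes(r"caf\u00e9 \U01f600"))
```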
Ordinary identifiers must begin with members of one of the following
Unicode categories:
LC Letter, Cased
Ll Letter, Lowercase
Lm Letter, Modifier
Lo Letter, Other
Lt Letter, Titlecase
Lu Letter, Uppercase
Ordinary identifiers may also contain members of the following Unicode
categories:
Nd Number, Decimal Digit
Nl Number, Letter
No Number, Other
Identifiers may also contain these characters:
' APOSTROPHE (U+0027)
_ LOW LINE (U+005f)
Method names may additionally:
- Contain the sequence ":=" at the end of any otherwise valid
single-word name.
- Be "[]" or "[]:=".
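The identifier and method-name rules can be modelled in Python with unicodedata.category. The following is a sketch of the rules as written, with hypothetical function names. It ignores reserved words, so names such as "var" are accepted. LC never appears in the sketch because it is a grouping of Lu, Ll and Lt, not a category that the database assigns to individual characters.

```python
import unicodedata

# Categories an ordinary identifier may start with (LC = Lu | Ll | Lt).
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}
# Categories that may also appear after the first character.
CONTINUE_CATEGORIES = START_CATEGORIES | {"Nd", "Nl", "No"}
# APOSTROPHE and LOW LINE are allowed, but not as the first character.
EXTRA_CHARACTERS = {"'", "_"}


def is_identifier(name: str) -> bool:
    """Check the ordinary-identifier rule (hypothetical helper)."""
    if not name or unicodedata.category(name[0]) not in START_CATEGORIES:
        return False
    return all(
        ch in EXTRA_CHARACTERS or unicodedata.category(ch) in CONTINUE_CATEGORIES
        for ch in name[1:]
    )


def is_method_name(name: str) -> bool:
    """Single-word method names: identifier, identifier:=, [] or []:=."""
    if name in ("[]", "[]:="):
        return True
    if name.endswith(":="):
        name = name[:-2]
    return is_identifier(name)


print(is_identifier("λx"), is_identifier("1x"), is_method_name("size:="))
```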
Operators consist of one or more characters drawn from the following:
- HYPHEN-MINUS (U+002d)
& AMPERSAND (U+0026)
| VERTICAL LINE (U+007c)
: COLON (U+003a)
% PERCENT SIGN (U+0025)
^ CIRCUMFLEX ACCENT (U+005e)
* ASTERISK (U+002a)
/ SOLIDUS (U+002f)
+ PLUS SIGN (U+002b)
! EXCLAMATION MARK (U+0021)
Any member of the Unicode category Sm (Symbol, Mathematical)
The sequence ".." is also an operator.
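The following is a Python sketch of the operator rule, with a hypothetical is_operator function. Several of the listed ASCII characters, such as "+" and "|", already belong to Sm. Sm also covers "<", "=" and ">", so strings like "<=" qualify. Reserved symbols such as ":=" are not modelled here.

```python
import unicodedata

# ASCII characters listed explicitly in the table above.
OPERATOR_CHARACTERS = set("-&|:%^*/+!")


def is_operator(text: str) -> bool:
    """Check the operator-character rule (hypothetical helper)."""
    if text == "..":
        return True
    return bool(text) and all(
        ch in OPERATOR_CHARACTERS or unicodedata.category(ch) == "Sm"
        for ch in text
    )


print(is_operator("++"), is_operator("≠"), is_operator("."))
```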
Ordinary numeric literals may contain only the ASCII digits 0-9 and
U+002e FULL STOP. No other Unicode digits or numeric values are
permitted or interpreted. Literals in non-standard bases may be written
in the form:
BxNNNNN...
where B is the base written in decimal, x is a literal lower-case x,
and each N is drawn from the first B characters of
"0123456789abcdefghijklmnopqrstuvwxyz" (all the ASCII digits in order,
followed by all the lower-case ASCII letters in order). B may be any
value from 0 to 36 except 1. The special base 0 is equivalent to 16,
so 0xff and 16xff both denote 255.
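The following is a minimal Python sketch of these literal forms. The parse_number function is hypothetical, and it returns Python int or float values only for illustration. The patterns use [0-9] rather than \d, because Python's \d also matches non-ASCII digits, which these rules reject. Digits are checked by hand for the same reason: int(s, base) would also accept upper-case letters and underscores.

```python
import re

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"
DECIMAL = re.compile(r"[0-9]+(\.[0-9]+)?")
RADIX = re.compile(r"([0-9]+)x(.+)")


def parse_number(text: str):
    """Return the value of a numeric literal (hypothetical helper)."""
    if DECIMAL.fullmatch(text):
        return float(text) if "." in text else int(text)
    match = RADIX.fullmatch(text)
    if not match:
        raise ValueError(f"not a numeric literal: {text!r}")
    base = int(match.group(1))
    if base == 0:
        base = 16  # the special base 0 means hexadecimal
    if base == 1 or base > 36:
        raise ValueError(f"unsupported base in {text!r}")
    allowed = DIGITS[:base]  # the first B characters of DIGITS
    value = 0
    for ch in match.group(2):
        if ch not in allowed:
            raise ValueError(f"digit {ch!r} not valid in base {base}")
        value = value * base + allowed.index(ch)
    return value


print(parse_number("16xff"), parse_number("0xff"), parse_number("2x1010"))
```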
Programs may not contain any control or separator characters other than:
U+000a LINE FEED
U+000d CARRIAGE RETURN
U+0020 SPACE
U+2028 LINE SEPARATOR
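The following is a Python sketch of this check, with a hypothetical function name. It assumes that "control" means general category Cc and "separator" means Zs, Zl or Zp. Format characters (Cf) such as U+200B ZERO WIDTH SPACE are therefore not flagged. Under this rule, CHARACTER TABULATION and NO-BREAK SPACE are rejected.

```python
import unicodedata

# The only control or separator characters permitted in source text.
ALLOWED = {"\n", "\r", " ", "\u2028"}
# Assumed reading: Cc = control; Zs/Zl/Zp = space, line, paragraph separators.
FORBIDDEN_CATEGORIES = {"Cc", "Zs", "Zl", "Zp"}


def disallowed_characters(source: str) -> list[tuple[int, str]]:
    """Return (offset, "U+xxxx") for each forbidden character (hypothetical)."""
    problems = []
    for offset, ch in enumerate(source):
        if ch not in ALLOWED and unicodedata.category(ch) in FORBIDDEN_CATEGORIES:
            problems.append((offset, f"U+{ord(ch):04x}"))
    return problems


print(disallowed_characters("x\t:= 1\n"))
```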
The unicode module contains methods for dealing with Unicode data:
category(char : String) -> String
bidirectional(char : String) -> String
combining(char : String) -> Number
mirrored(char : String) -> Boolean
name(char : String) -> String
iscategory(char : String, category : String) -> Boolean
isSeparator(char : String) -> Boolean
isControl(char : String) -> Boolean
isLetter(char : String) -> Boolean
isNumber(char : String) -> Boolean
isSymbolMathematical(char : String) -> Boolean
create(codepoint : Number) -> String
lookup(name : String) -> String
All methods return the corresponding property from the Unicode Character
Database, except create and lookup, which return a string of size 1
consisting of the character with the given codepoint or name.
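For comparison, the following Python sketch maps each method onto Python's unicodedata module. It mirrors the Grace names but is not the minigrace implementation. The predicate semantics are assumptions: iscategory is taken as an exact two-letter match, isSeparator, isLetter and isNumber as the Z, L and N major classes, and isControl as Cc only. Python's results also follow the database version bundled with the interpreter, so they may differ from Minigrace's tables.

```python
import unicodedata

# Direct Unicode Character Database properties.
def category(char: str) -> str:
    return unicodedata.category(char)          # e.g. "Lu"

def bidirectional(char: str) -> str:
    return unicodedata.bidirectional(char)     # e.g. "L", "EN"

def combining(char: str) -> int:
    return unicodedata.combining(char)         # canonical combining class

def mirrored(char: str) -> bool:
    return unicodedata.mirrored(char) == 1

def name(char: str) -> str:
    return unicodedata.name(char)

# Category predicates (assumed semantics, see the lead-in).
def iscategory(char: str, cat: str) -> bool:
    return unicodedata.category(char) == cat

def isSeparator(char: str) -> bool:
    return unicodedata.category(char).startswith("Z")

def isControl(char: str) -> bool:
    return unicodedata.category(char) == "Cc"

def isLetter(char: str) -> bool:
    return unicodedata.category(char).startswith("L")

def isNumber(char: str) -> bool:
    return unicodedata.category(char).startswith("N")

def isSymbolMathematical(char: str) -> bool:
    return unicodedata.category(char) == "Sm"

# Constructing characters from a codepoint or a character name.
def create(codepoint: int) -> str:
    return chr(codepoint)

def lookup(char_name: str) -> str:
    return unicodedata.lookup(char_name)


print(category("A"), name("λ"), lookup("GREEK SMALL LETTER LAMDA"))
```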