REBOL [
	Title:   "Red Lexical Scanner"
	Author:  "Nenad Rakocevic"
	File:    %lexer.r
	Rights:  "Copyright (C) 2011 Nenad Rakocevic. All rights reserved."
	License: "BSD-3 - https://github.com/dockimbel/Red/blob/master/BSD-3-License.txt"
]

lexer: context [
	verbose: 0

	line:   none                ;-- source code lines counter
	lines:  []                  ;-- offsets of newline markers in current block
	count?: yes                 ;-- if TRUE, lines counter is enabled
	pos:    none                ;-- source input position (error reporting)
	s:      none                ;-- mark start position of new value
	e:      none                ;-- mark end position of new value
	value:  none                ;-- new value
	fail?:  none                ;-- used for failing some parsing rules
	type:   none                ;-- datatype of the new value

	;====== Parsing rules ======

	digit: charset "0123456789"
	hexa:  union digit charset "ABCDEF"

	;-- UTF-8 encoding rules from: http://tools.ietf.org/html/rfc3629#section-4
	UTF-8-BOM: #{EFBBBF}
	ws-ASCII: charset " ^-^M"                   ;-- common ASCII whitespace chars
	ws-U+2k: charset [#"^(80)" - #"^(8A)"]      ;-- Unicode spaces in the U+2000-U+200A range

	UTF8-tail: charset [#"^(80)" - #"^(BF)"]

	UTF8-1: charset [#"^(00)" - #"^(7F)"]

	UTF8-2: reduce [
		charset [#"^(C2)" - #"^(DF)"]
		UTF8-tail
	]

	UTF8-3: reduce [
		#{E0} charset [#"^(A0)" - #"^(BF)"] UTF8-tail
		'| charset [#"^(E1)" - #"^(EC)"] 2 UTF8-tail
		'| #{ED} charset [#"^(80)" - #"^(9F)"] UTF8-tail
		'| charset [#"^(EE)" - #"^(EF)"] 2 UTF8-tail
	]

	UTF8-4: reduce [
		#{F0} charset [#"^(90)" - #"^(BF)"] 2 UTF8-tail
		'| charset [#"^(F1)" - #"^(F3)"] 3 UTF8-tail
		'| #{F4} charset [#"^(80)" - #"^(8F)"] 2 UTF8-tail
	]

	UTF8-char: [pos: UTF8-1 | UTF8-2 | UTF8-3 | UTF8-4]
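	;-- Illustrative byte sequences, hand-encoded for reference (not a test run):
	;--   U+00E9 (e acute)  #{C3A9}      matches UTF8-2 (C2-DF + 1 tail byte)
	;--   U+20AC (euro)     #{E282AC}    matches UTF8-3 (E1-EC + 2 tail bytes)
	;--   U+1F600 (emoji)   #{F09F9880}  matches UTF8-4 (F0 + 90-BF + 2 tail bytes)
	;-- Overlong forms (C0/C1 leads, E0 80-9F, F0 80-8F), UTF-16 surrogates
	;-- (ED A0-BF) and codepoints above U+10FFFF (F4 90+, F5-FF) match none of
	;-- the rules above.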

	not-word-char:  charset {/\^^,'[](){}"#%$@:;}
	not-word-1st:   union not-word-char digit
	not-file-char:  charset {[](){}"%@:;}
	not-str-char:   #"^""
	not-mstr-char:  #"}"
	caret-char:     charset [#"@" - #"_"]
	printable-char: charset [#"^(20)" - #"^(7E)"]
	char-char:      exclude printable-char charset {"^^}
	integer-end:    charset {^{"])}
	stop: none

	UTF8-filtered-char: [
		[pos: stop :pos (fail?: [end skip]) | UTF8-char e: (fail?: none)]
		fail?
	]
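	;-- How the STOP / FAIL? pair works (explanatory note, assuming REBOL 2 PARSE):
	;--   1. STOP is tried as a look-ahead (pos: stop :pos), so it consumes nothing.
	;--   2. If it matches, FAIL? becomes [end skip], a rule that can never
	;--      succeed, so UTF8-filtered-char fails and the enclosing ANY/SOME ends.
	;--   3. Otherwise one UTF-8 char is consumed, E is updated and FAIL? is
	;--      reset to NONE, which lets the rule pass.
	;-- E.g. with STOP set to [not-word-char | ws-no-count] (as in SYMBOL-RULE),
	;-- scanning foo: stops before the colon, leaving E on the ":".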

	;-- Whitespace characters list from: http://en.wikipedia.org/wiki/Whitespace_character
	ws: [
		pos: #"^/" (
			if count? [
				line: line + 1
				append/only lines stack/tail?
			]
		)
		| ws-ASCII                      ;-- only the common ASCII whitespace chars are matched
		| #{C2} [
			#{85}                       ;-- U+0085 (Newline)
			| #{A0}                     ;-- U+00A0 (No-break space)
		]
		| #{E1} [
			#{9A80}                     ;-- U+1680 (Ogham space mark)
			| #{A08E}                   ;-- U+180E (Mongolian vowel separator)
		]
		| #{E2} [
			#{80} [
				ws-U+2k                 ;-- U+2000-U+200A range
				| #{A8}                 ;-- U+2028 (Line separator)
				| #{A9}                 ;-- U+2029 (Paragraph separator)
				| #{AF}                 ;-- U+202F (Narrow no-break space)
			]
			| #{819F}                   ;-- U+205F (Medium mathematical space)
		]
		| #{E38080}                     ;-- U+3000 (Ideographic space)
	]

	newline-char: [
		#"^/"
		| #{C285}                       ;-- U+0085 (Newline)
		| #{E280} [
			#{A8}                       ;-- U+2028 (Line separator)
			| #{A9}                     ;-- U+2029 (Paragraph separator)
		]
	]
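	;-- Illustrative inputs (derived by reading the rules, not from a test run):
	;-- "^-" (tab), #{C2A0} (U+00A0) and #{E38080} (U+3000) all match WS, but only
	;-- a plain "^/" bumps LINE and records a newline marker, and only while
	;-- COUNT? is set. NEWLINE-CHAR is used as a stop condition instead: it keeps
	;-- a "..." string or a #"..." char literal from spanning a line break, be it
	;-- LF or one of the Unicode line breaks U+0085, U+2028 and U+2029.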

	counted-newline: [pos: #"^/" (line: line + 1)]

	ws-no-count: [                      ;-- match WS without counting lines
		(count?: no) ws (count?: yes)
		| (count?: yes) end skip        ;-- WS failed: restore COUNT?, then fail
	]

	any-ws: [pos: any ws]

	symbol-rule: [
		(stop: [not-word-char | ws-no-count])
		some UTF8-filtered-char e:
	]

	begin-symbol-rule: [
		(stop: [not-word-1st | ws-no-count])   ;-- 1st char is restricted
		UTF8-filtered-char
		opt symbol-rule
	]

	path-rule: [some [slash [begin-symbol-rule | paren-rule]] e:]

	word-rule: [
		(type: word!) s: begin-symbol-rule
		opt [path-rule (type: path!)]
		opt [#":" (type: either type = word! [set-word!][set-path!])]
	]
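	;-- Illustrative inputs and the datatype WORD-RULE leaves in TYPE
	;-- (hand-traced from the rules above):
	;--   foo        => word!
	;--   foo:       => set-word!
	;--   foo/bar    => path!
	;--   foo/bar:   => set-path!
	;-- In every case E stops before the trailing ":", so COPY/PART S E holds
	;-- only the word or path text.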

	get-word-rule: [#":" (type: get-word!) s: begin-symbol-rule]

	lit-word-rule: [
		#"'" (type: lit-word!) s: begin-symbol-rule
		opt [path-rule (type: lit-path!)]
	]

	issue-rule: [#"#" (type: issue!) s: symbol-rule]

	refinement-rule: [slash (type: refinement!) s: symbol-rule]

	slash-rule: [s: [slash opt slash] e:]

	integer-rule: [
		(type: integer!)
		opt #"-" digit any [digit | #"'" digit] e:
		pos: [                          ;-- guard against typos like a word stuck to the number (12abc)
			[integer-end | ws-no-count | end] (fail?: none)
			| skip (fail?: [end skip])
		] :pos
		fail?
	]
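	;-- Illustrative inputs for INTEGER-RULE (hand-traced, not from a test run):
	;--   42  -7  1'000'000   matched when followed by whitespace or one of {"])
	;--                       (the quote acts as a digit separator)
	;--   42]  42)            matched; the delimiter is left for the caller
	;--   12abc               rejected by the guard, so it is not silently split
	;--                       into the integer 12 and the word abc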

	block-rule: [#"[" (stack/push block!) any-value #"]" (value: stack/pop block!)]

	paren-rule: [#"(" (stack/push paren!) any-value #")" (value: stack/pop paren!)]

	escaped-char: [
		"^^(" [
			s: [6 hexa | 4 hexa | 2 hexa] e: (    ;-- Unicode values allowed up to 10FFFFh
				value: encode-UTF8-char s e
			)
			| [
				"null"   (value: #"^(00)")
				| "back" (value: #"^(08)")
				| "tab"  (value: #"^(09)")
				| "line" (value: #"^(0A)")
				| "page" (value: #"^(0C)")
				| "esc"  (value: #"^(1B)")
				| "del"  (value: #"^(7F)")
			]
		] #")"
		| #"^^" [
			s: caret-char (value: to char! s/1 - #"@")
			| [
				#"/"   (value: #"^/")
				| #"-" (value: #"^-")
				| #"?" (value: #"^(del)")
			]
		]
	]
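	;-- Illustrative escape sequences and the VALUE they are expected to produce
	;-- (hand-derived, not from a test run):
	;--   ^(41)     => #"A"        2 hex digits, ASCII range
	;--   ^(20AC)   => #{E282AC}   4 hex digits, UTF-8 encoded by ENCODE-UTF8-CHAR
	;--   ^(tab)    => #"^-"       named escape; ^(line) gives #"^/"
	;--   ^A        => #"^A"       caret + a char in @.._ gives 00h-1Fh
	;--   ^/ ^- ^?  => newline, tab, DEL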

	char-rule: [
		{#"} (type: char! fail?: none) [
			s: char-char (value: to char! s/1)          ;-- allowed UTF8-1 (printable ASCII) chars
			| newline-char (fail?: [end skip])          ;-- fail rule
			| copy value [UTF8-2 | UTF8-3 | UTF8-4]     ;-- allowed Unicode chars
			| escaped-char
		] fail? {"}
	]
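	;-- Illustrative char literals (hand-traced from the alternatives above):
	;--   #"a"          plain printable ASCII, via CHAR-CHAR
	;--   #"^/"         newline, via ESCAPED-CHAR
	;--   #"^(20AC)"    via ESCAPED-CHAR; VALUE holds the UTF-8 bytes #{E282AC}
	;--   a raw LF, U+0085, U+2028 or U+2029 right after #" makes the rule fail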

	line-string: [
		{"} s: (type: string! stop: [not-str-char | newline-char])
		any UTF8-filtered-char
		e: {"}
	]

	multiline-string: [
		#"{" s: (type: string! stop: not-mstr-char)
		any [counted-newline | "^^}" | UTF8-filtered-char]
		e: #"}"
	]

	string-rule: [line-string | multiline-string]
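	;-- Illustrative string literals (hand-traced from the rules above):
	;--   "abc"      single-line: ends at the next quote; a raw line break
	;--              (LF, U+0085, U+2028, U+2029) before it makes the rule fail
	;--   {a^/b}     multi-line: may span lines, each LF bumps LINE, and the
	;--              scanner skips over "^}" so it does not end the string
	;-- Only S and E are recorded here; escapes are decoded later by LOAD-STRING.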

	binary-rule: [
		"#{" (type: binary!)
		s: any [counted-newline | 2 hexa | ws-no-count | comment-rule]
		e: #"}"
	]
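	;-- Illustrative input (hand-traced): hex pairs may be separated by
	;-- whitespace, newlines and ; comments, e.g.
	;--   #{DEAD BEEF ; note
	;--     00FF}
	;-- is matched here, and LOAD-BINARY then keeps only the hex pairs,
	;-- giving #{DEADBEEF00FF}.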

	file-rule: [
		#"%" (type: file! stop: [not-file-char | ws-no-count])
		s: some UTF8-filtered-char e:
	]

	escaped-rule: [
		"#[" any-ws [
			s: [                            ;-- datatypes are tried first, so that "none!"
				"none!" | "logic!" | "block!" | "integer!" | "word!"   ;-- is not taken as "none"
				| "set-word!" | "get-word!" | "lit-word!" | "refinement!"
				| "binary!" | "string!" | "char!" | "bitset!" | "path!"
				| "set-path!" | "lit-path!" | "native!" | "action!"
				| "issue!" | "paren!" | "function!"
			] e: (value: get to word! copy/part s e)
			| "none"  (value: none)
			| "true"  (value: true)
			| "false" (value: false)
		] any-ws #"]"
	]
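	;-- Illustrative serialized values (hand-traced from the rule above):
	;--   #[none]  #[true]  #[false]   => NONE, TRUE, FALSE
	;--   #[block!]  #[ string! ]      => the matching datatype! value
	;-- Spaces are allowed inside the brackets (ANY-WS), and datatype names are
	;-- resolved with GET in the host REBOL, so only the listed names are accepted.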

	comment-rule: [#";" to #"^/"]

	multiline-comment-rule: [
		"comment" any-ws #"{" (stop: not-mstr-char) any [
			counted-newline | "^^}" | UTF8-filtered-char
		] #"}"
	]

	wrong-delimiters: [
		pos: [
			#"]" (value: #"[") | #")" (value: #"(")
			| #"[" (value: #"]") | #"(" (value: #")")
		] :pos
		(throw-error/with ["missing matching" value])
	]

	literal-value: [
		pos: (e: none) s: [
			comment-rule
			| multiline-comment-rule
			| integer-rule    (stack/push load-integer copy/part s e)
			| word-rule       (stack/push to type copy/part s e)
			| lit-word-rule   (stack/push to type copy/part s e)
			| get-word-rule   (stack/push to get-word! copy/part s e)
			| refinement-rule (stack/push to refinement! copy/part s e)
			| slash-rule      (stack/push to word! copy/part s e)
			| issue-rule      (stack/push to issue! copy/part s e)
			| file-rule       (stack/push to file! copy/part s e)
			| char-rule       (stack/push value)
			| block-rule      (stack/push value)
			| paren-rule      (stack/push value)
			| escaped-rule    (stack/push value)
			| string-rule     (stack/push load-string s e)
			| binary-rule     (stack/push load-binary s e)
		]
	]

	any-value: [pos: any [literal-value | ws]]

	header: [any-ws pos: "Red" any-ws block-rule]

	program: [
		pos: opt UTF-8-BOM
		header
		any-value
		opt wrong-delimiters
	]

	;====== Helper functions ======

	stack: context [
		stk: []

		push: func [value][
			either any [value = block! value = paren!][
				insert/only tail stk value: make value 1
				value
			][
				insert/only tail last stk :value
			]
		]

		pop: func [type [datatype!]][
			if type <> type? last stk [
				throw-error/with ["invalid" mold type "closing delimiter"]
			]
			also last stk remove back tail stk
		]

		tail?: does [tail last stk]
		reset: does [clear stk]
	]
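	;-- Illustrative trace of STK while scanning "[1 (2)]", starting from an
	;-- empty STK (in RUN the root block is already on it); hand-derived:
	;--   push block!  ->  [[]]
	;--   push 1       ->  [[1]]
	;--   push paren!  ->  [[1] ()]
	;--   push 2       ->  [[1] (2)]
	;--   pop paren!   ->  [[1]]         returns (2), which LITERAL-VALUE then
	;--   push (2)     ->  [[1 (2)]]     pushes into the enclosing block
	;--   pop block!   ->  []            returns [1 (2)]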

	throw-error: func [/with msg [string! block!]][
		print rejoin [
			"*** Syntax Error: " either with [
				uppercase/part reform msg 1
			][
				reform ["Invalid" mold type "value"]
			]
			"^/*** line: " line
			"^/*** at: " mold copy/part pos 40
		]
		halt
	]

	add-line-markers: func [blk [block!]][
		foreach pos lines [new-line pos yes]
		clear lines
	]

	encode-UTF8-char: func [s [string!] e [string!] /local c code new][
		c: trim/head debase/base copy/part s e 16
		code: to integer! c

		case [
			code <= 127  [new: to char! last c]      ;-- c <= 7Fh
			code <= 2047 [                           ;-- c <= 07FFh
				new: copy #{0000}
				new/1: #"^(C0)"
					or (shift/left to integer! (either code <= 255 [0][c/1]) and 7 2)
					or shift/logical to integer! last c 6
				new/2: #"^(80)" or (63 and last c)
			]
			code <= 65535 [                          ;-- c <= FFFFh
				new: copy #{E00000}
				new/1: #"^(E0)" or shift/logical to integer! c/1 4
				new/2: #"^(80)"
					or (shift/left to integer! c/1 and 15 2)
					or shift/logical to integer! c/2 6
				new/3: #"^(80)" or (c/2 and 63)
			]
			code <= 1114111 [                        ;-- c <= 10FFFFh
				new: copy #{F0000000}
				new/1: #"^(F0)" or shift/logical to integer! c/1 2   ;-- top 3 bits, needed from 40000h up
				new/2: #"^(80)"
					or (shift/left to integer! c/1 and 3 4)
					or (shift/logical to integer! c/2 4)
				new/3: #"^(80)"
					or (shift/left to integer! c/2 and 15 2)
					or shift/logical to integer! c/3 6
				new/4: #"^(80)" or (c/3 and 63)
			]
			'else [
				throw-error/with "Codepoints above U+10FFFF are not supported"
			]
		]
		new
	]
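	;-- Worked examples of the bit layout, computed by hand for reference:
	;--   U+20AC = 0010 0000 1010 1100b, 3-byte form 1110xxxx 10xxxxxx 10xxxxxx
	;--     1110 0010 | 10 000010 | 10 101100            =>  #{E282AC}
	;--   U+1F600 = 0 0001 1111 0110 0000 0000b, 4-byte form 11110xxx + 3 x 10xxxxxx
	;--     11110 000 | 10 011111 | 10 011000 | 10 000000  =>  #{F09F9880}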

	load-integer: func [s [string!]][
		unless attempt [s: to integer! s][throw-error]
		s
	]

	load-string: func [s [string!] e [string!] /local new][
		new: make string! offset? s e            ;-- allocated size close to final size

		parse/all/case s [
			some [
				escaped-char (insert tail new value)
				| s: UTF8-filtered-char e: (        ;-- already set to right filter
					insert/part tail new s e
				)
			]                                    ;-- exit on matching " or }
		]
		new
	]

	load-binary: func [s [string!] e [string!] /local new byte][
		new: make binary! (offset? s e) / 2     ;-- allocated size above final size

		parse/all/case s [
			some [
				copy byte 2 hexa (insert tail new debase/base byte 16)
				| ws-no-count | comment-rule        ;-- newlines were already counted by BINARY-RULE
				| #"}" end skip
			]
		]
		new
	]

	run: func [src [string! binary!] /local blk][
		line: 1
		count?: yes

		blk: stack/push block!                   ;-- root block

		unless parse/all/case src program [throw-error]

		add-line-markers blk
		stack/reset
		blk
	]
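
	;-- Usage sketch, assuming REBOL 2 and that this file was loaded with
	;-- DO %lexer.r (not from a recorded session):
	;--   probe lexer/run {Red [Title: "demo"]^/print "hello"^/}
	;-- should return a block along the lines of [print "hello"]; the header
	;-- block is parsed but not kept. On a syntax error, THROW-ERROR prints the
	;-- message with the current line and input position, then HALTs.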
]