ocamllex: better support for union of character sets (#11166)
nojb committed Sep 27, 2022
1 parent 477d6bb commit edd5432
Showing 5 changed files with 35 additions and 2 deletions.
4 changes: 4 additions & 0 deletions Changes
@@ -81,6 +81,10 @@ Working version
   This allows to see optimized bytecode with -dlambda.
   (Jacques Garrigue, review by Gabriel Scherer)
 
+- #11166: ocamllex: the union of two character sets "cset1 | cset2" can now be
+  used in any context where a character set is expected.
+  (Nicolás Ojeda Bär, Martin Jambon, review by Sébastien Hinderer)
+
 ### Manual and documentation:
 
 - #9430, #11291: Document the general desugaring rules for binding operators.
3 changes: 2 additions & 1 deletion lex/parser.mly
@@ -41,8 +41,9 @@ let rec remove_as = function
   | Alternative (e1, e2) -> Alternative (remove_as e1, remove_as e2)
   | Repetition e -> Repetition (remove_as e)
 
-let as_cset = function
+let rec as_cset = function
   | Characters s -> s
+  | Alternative (e1, e2) -> Cset.union (as_cset e1) (as_cset e2)
   | _ -> raise Cset.Bad
 
 %}
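The effect of the parser change can be modeled outside ocamllex. The following is a minimal OCaml sketch, not ocamllex's code: the `regexp` type, the `CS` module (built with the standard library's `Set.Make (Char)` rather than ocamllex's own `Cset`), `exception Bad`, and the helpers `sharp` and `range` are hypothetical stand-ins. It assumes, as the new test suggests, that the `#` (difference) rule calls `as_cset` on both operands, so an `Alternative` of two character sets is now accepted there instead of raising.

```ocaml
(* Hypothetical model of the parser change; names are illustrative only. *)
module CS = Set.Make (Char)

(* Simplified regular-expression AST (assumed, not ocamllex's Syntax). *)
type regexp =
  | Characters of CS.t
  | Sequence of regexp * regexp
  | Alternative of regexp * regexp
  | Repetition of regexp

exception Bad

(* Before the commit only [Characters] was accepted; now an [Alternative]
   whose branches both reduce to character sets yields their union. *)
let rec as_cset = function
  | Characters s -> s
  | Alternative (e1, e2) -> CS.union (as_cset e1) (as_cset e2)
  | _ -> raise Bad

(* Hypothetical model of [r1 # r2]: both operands must be character sets. *)
let sharp r1 r2 = Characters (CS.diff (as_cset r1) (as_cset r2))

(* Build the set of characters in the inclusive range [lo, hi]. *)
let range lo hi =
  let rec go c acc =
    if c > Char.code hi then acc else go (c + 1) (CS.add (Char.chr c) acc)
  in
  go (Char.code lo) CS.empty

let digit = Characters (range '0' '9')
let alpha = Characters (range 'a' 'z')

(* Mirrors [let alpha' = (digit | alpha) # digit] from csets.mll. *)
let alpha' = sharp (Alternative (digit, alpha)) digit
```

The `rec` is needed because `as_cset` now calls itself on each branch, which also covers nested unions such as `a | b | c`. Anything that is not a character set, such as a `Sequence`, still raises `Bad`.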
5 changes: 4 additions & 1 deletion manual/src/cmds/lexyacc.etex
@@ -203,7 +203,10 @@ strings that match @regexp@.
 (option) Match the empty string, or a string matching @regexp@.
 
 \item[@regexp_1 '|' regexp_2@]
-(alternative) Match any string that matches @regexp_1@ or @regexp_2@
+(alternative) Match any string that matches @regexp_1@ or @regexp_2@.
+If both @regexp_1@ and @regexp_2@ are character sets, this construction
+produces another character set, obtained by taking the union of @regexp_1@ and
+@regexp_2@.
 
 \item[@regexp_1 regexp_2@]
 (concatenation) Match the concatenation of two strings, the first
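The manual's new sentence says that the union of two character sets is itself a character set. As an illustration, and assuming a representation of sorted, disjoint, inclusive code ranges (a hypothetical choice here, not a claim about how ocamllex's `Cset` module is written), the union can be computed by merging both lists by start point and coalescing ranges that overlap or touch:

```ocaml
(* Hypothetical representation: sorted, disjoint, inclusive (lo, hi) ranges
   of character codes. *)
type cset = (int * int) list

let union (s1 : cset) (s2 : cset) : cset =
  (* Merge the two sorted lists by range start. *)
  let rec merge a b =
    match a, b with
    | [], l | l, [] -> l
    | x :: a', y :: b' ->
      if fst x <= fst y then x :: merge a' b else y :: merge a b'
  in
  (* Sweep once, fusing neighbours that overlap or are adjacent. *)
  let rec coalesce = function
    | (lo1, hi1) :: (lo2, hi2) :: rest when lo2 <= hi1 + 1 ->
      coalesce ((lo1, max hi1 hi2) :: rest)
    | r :: rest -> r :: coalesce rest
    | [] -> []
  in
  coalesce (merge s1 s2)

(* The set ['lo'-'hi'] as a single range. *)
let range lo hi : cset = [ (Char.code lo, Char.code hi) ]

(* ['0'-'9'] | ['a'-'z'] stays two ranges: [(48, 57); (97, 122)] *)
let digit_or_alpha = union (range '0' '9') (range 'a' 'z')

(* ['a'-'m'] | ['k'-'z'] overlaps and collapses to [(97, 122)] *)
let merged = union (range 'a' 'm') (range 'k' 'z')
```

Keeping the ranges sorted and disjoint makes union a linear merge, and the result is again in the same normal form, so it can be fed to further unions or to a difference such as `#`.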
22 changes: 22 additions & 0 deletions testsuite/tests/tool-lexyacc/csets.mll
@@ -0,0 +1,22 @@
(* TEST
ocamllex_flags = " -q "
*)

let digit = ['0'-'9']
let alpha = ['a'-'z']
let alpha' = (digit | alpha) # digit

rule read = parse
| alpha'+ as lxm { Some lxm }
| digit+ as lxm { Some lxm }
| eof { None }

{
let () =
  let rec aux lexbuf =
    match read lexbuf with
    | Some x -> x :: aux lexbuf
    | None -> []
  in
  List.iter print_endline (aux (Lexing.from_string "abc0345ghz"))
}
3 changes: 3 additions & 0 deletions testsuite/tests/tool-lexyacc/csets.reference
@@ -0,0 +1,3 @@
abc
0345
ghz
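Since `alpha'` denotes exactly `['a'-'z']`, the `read` rule in `csets.mll` splits its input into maximal runs of letters or of digits, which is why `csets.reference` contains three lines. The function below is a hypothetical hand-written OCaml equivalent of that rule, for illustration only; it is not the code ocamllex generates. It assumes the two classes are disjoint, so longest match alone decides each token, and it mimics the `Failure "lexing: empty token"` raised by ocamllex-generated lexers when no rule matches.

```ocaml
(* Character classes from csets.mll: alpha' = ['a'-'z'], digit = ['0'-'9']. *)
let is_alpha c = c >= 'a' && c <= 'z'
let is_digit c = c >= '0' && c <= '9'

(* Hypothetical equivalent of rule [read]: repeatedly take the longest
   run of alpha' or of digit characters until the end of input. *)
let tokens s =
  let n = String.length s in
  (* Index just past the longest run starting at i whose chars satisfy p. *)
  let rec run p i = if i < n && p s.[i] then run p (i + 1) else i in
  let rec loop i acc =
    if i >= n then List.rev acc (* eof: the rule returns None *)
    else
      let p =
        if is_alpha s.[i] then is_alpha
        else if is_digit s.[i] then is_digit
        else failwith "lexing: empty token"
      in
      let j = run p i in
      loop j (String.sub s i (j - i) :: acc)
  in
  loop 0 []
```

Printing the result with `List.iter print_endline (tokens "abc0345ghz")` gives the same three lines as `csets.reference`.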
