Camllexer is an enhanced lexer for OCaml-like source.
The lexer has been extracted from the Camlp4 (> 3.10) lexer, which in turn was reimplemented as a derivative of the lexer from the compiler.
This lexer has the following particularities:
- Correct and complete: as far as testing has gone (~800_000 distinct lines over ~3_000_000 lines of OCaml-like files).
- Supports most OCaml dialects:
  - By reusing the lexer of Camlp4, this lexer works on any extension of the OCaml language made with Camlp4. In particular, it supports quotations and anti-quotations.
  - Works fine on lexer and parser sources (ocamllex, ocamlyacc), except when they use C-style comments.
- Lossless: every single bit of the input file is kept. Blanks, comments, newlines, lexical conventions for writing literals, all of it is kept in the returned token stream. Undesired information can easily be thrown out of the stream.
- Keyword independent: while there is a token for keywords, it is not generated by this lexer. It is up to you to apply a keyword table to turn some LIDENTs and some SYMBOLs into KEYWORDs.
- Fault tolerant: errors are part of the token stream, which makes it possible to write fault-tolerant translations.
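
The last three points can be illustrated with a minimal sketch. It does not use the real Camllexer interface: the token type below is a simplified, hypothetical stand-in (only LIDENT, SYMBOL and KEYWORD are named in this description; the other constructors, the keyword list and the function names are assumptions made for the example). It shows layout tokens being dropped from a lossless stream, a keyword table being applied afterwards, and an error token passing through like any other token.

```ocaml
(* Hypothetical token type, for illustration only. The actual Camllexer
   token type has more constructors and may use different payloads. *)
type token =
  | KEYWORD of string
  | LIDENT of string
  | SYMBOL of string
  | BLANKS of string
  | COMMENT of string
  | NEWLINE
  | ERROR of string

(* A tiny keyword table; a real one would cover the whole dialect. *)
let keywords = ["let"; "in"; "="; "->"]

(* Turn the LIDENTs and SYMBOLs listed in the table into KEYWORDs. *)
let apply_keywords = function
  | (LIDENT s | SYMBOL s) when List.mem s keywords -> KEYWORD s
  | tok -> tok

(* Layout tokens are kept by a lossless lexer but are often not needed. *)
let is_layout = function
  | BLANKS _ | COMMENT _ | NEWLINE -> true
  | _ -> false

(* Drop layout, then apply the keyword table. ERROR tokens are left in
   place so that a later pass can decide how to recover. *)
let simplify tokens =
  tokens
  |> List.filter (fun t -> not (is_layout t))
  |> List.map apply_keywords

let example =
  simplify
    [ LIDENT "let"; BLANKS " "; LIDENT "x"; BLANKS " "; SYMBOL "=";
      BLANKS " "; ERROR "unterminated string"; NEWLINE ]
```

Here `example` keeps `x` as a LIDENT, turns `let` and `=` into KEYWORDs, and still contains the ERROR token, so a translation can continue past the faulty input instead of aborting.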