Thanks for taking a look at Lexie.
Note: I use this code on a regular basis. The `cli/cli` main program, with the `django2.lex` input specification, supports about 20 websites at this point. It is run like this:
```
$ ./cli -l ../in/django3.lex -i ./site/www_pschlump_com/index.html -o ./www/www_pschlump_com/index.html >,a
```
Lexie is designed to generate fast lexical analyzers. It transforms a set of regular expressions into a nondeterministic finite automaton (NFA) and then converts that NFA into a deterministic finite automaton (DFA). Multiple DFAs can be generated, and the scanner can switch between them or push and pop them, which allows context-sensitive scanning of input.
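The NFA-to-DFA step is the classic subset construction. The Go sketch below illustrates that technique in isolation. It is not Lexie's code: the `NFA`/`DFA` types, the use of rune 0 for epsilon moves, and the function names are assumptions made for the example. Each DFA state corresponds to the epsilon-closure of a set of NFA states, so the hand-built NFA for `ab*c` becomes a four-state DFA.

```go
package main

import (
	"fmt"
	"sort"
)

// NFA and DFA are hypothetical minimal types for this sketch. In the NFA,
// Trans[s][r] lists the states reachable from s on rune r, and rune 0 is
// reserved to mean an epsilon (empty) move. Each DFA state stands for a
// set of NFA states.
type NFA struct {
	Start  int
	Accept map[int]bool
	Trans  map[int]map[rune][]int
}

type DFA struct {
	Start  int
	Accept map[int]bool
	Trans  map[int]map[rune]int
}

// closure returns the sorted epsilon-closure of a set of NFA states.
func closure(n NFA, states []int) []int {
	seen := map[int]bool{}
	var out []int
	for stack := append([]int(nil), states...); len(stack) > 0; {
		s := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if !seen[s] {
			seen[s] = true
			out = append(out, s)
			stack = append(stack, n.Trans[s][0]...)
		}
	}
	sort.Ints(out)
	return out
}

// Determinize converts an NFA into an equivalent DFA (subset construction).
func Determinize(n NFA) DFA {
	d := DFA{Accept: map[int]bool{}, Trans: map[int]map[rune]int{}}
	ids := map[string]int{} // NFA state set (printed) -> DFA state number
	var sets [][]int        // DFA state number -> NFA state set
	add := func(set []int) int {
		k := fmt.Sprint(set)
		if id, ok := ids[k]; ok {
			return id
		}
		id := len(sets)
		ids[k], d.Trans[id] = id, map[rune]int{}
		sets = append(sets, set)
		for _, s := range set {
			d.Accept[id] = d.Accept[id] || n.Accept[s]
		}
		return id
	}
	d.Start = add(closure(n, []int{n.Start}))
	// sets grows inside the loop, so newly discovered sets are processed too.
	for i := 0; i < len(sets); i++ {
		moves := map[rune][]int{}
		for _, s := range sets[i] {
			for r, targets := range n.Trans[s] {
				if r != 0 {
					moves[r] = append(moves[r], targets...)
				}
			}
		}
		for r, targets := range moves {
			d.Trans[i][r] = add(closure(n, targets))
		}
	}
	return d
}

// Match reports whether the DFA accepts the whole input.
func (d DFA) Match(input string) bool {
	s := d.Start
	for _, r := range input {
		next, ok := d.Trans[s][r]
		if !ok {
			return false
		}
		s = next
	}
	return d.Accept[s]
}

func main() {
	// Hand-built NFA for ab*c; the edge 1 -> 2 on rune 0 is an epsilon move.
	nfa := NFA{Start: 0, Accept: map[int]bool{3: true}, Trans: map[int]map[rune][]int{
		0: {'a': {1}}, 1: {0: {2}}, 2: {'b': {2}, 'c': {3}},
	}}
	dfa := Determinize(nfa)
	for _, in := range []string{"ac", "abbbc", "ab", "abcx"} {
		fmt.Println(in, dfa.Match(in))
	}
}
```

Keying state sets by their printed form keeps the sketch short but is not the fastest choice; a real generator would likely use a cheaper set representation.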
Lexie has the ability to change the specification and then regenerate the NFA and DFA at run time.
An example: you want to specify the scanner for a template language
that starts templates with `{{` and ends them with `}}`. Inside the
template you want to recognize strings, and you can embed `{{` and `}}`
inside those strings. This is the context-dependent part of Lexie.
You also need to be able to change `{{` and `}}` to some other
tokens. For example, you may want your tool to work with AngularJS,
which already uses `{{` and `}}` as delimiters. So at runtime you want to be able to say
`{{changequote "{{" "-=[" "}}" "]=-"}}` and change the opening template marker from
`{{` to `-=[`, and the closing template marker from `}}` to `]=-`.
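To make the context-dependent behaviour concrete, here is a small hand-written Go sketch of the `changequote` idea. It does not use Lexie's specification format or API: the `Scanner` type, `ChangeQuote`, the token kinds, and the reading of the four `changequote` arguments as old-open, new-open, old-close, new-close are all assumptions. Quoted strings inside a template are skipped as a unit, which is why the `"}}"` argument does not end the directive.

```go
package main

import (
	"fmt"
	"strings"
)

// Token kinds for this hypothetical sketch.
const (
	Text     = "TEXT"
	Template = "TEMPLATE"
)

// Token is one piece of scanned input.
type Token struct {
	Kind, Value string
}

// Scanner holds the current template delimiters, which can change at runtime.
type Scanner struct {
	Open, Close string
}

// ChangeQuote replaces the opening and closing template delimiters.
func (s *Scanner) ChangeQuote(openDelim, closeDelim string) {
	s.Open, s.Close = openDelim, closeDelim
}

// Scan splits input into text and template tokens. Inside a template,
// double-quoted strings are skipped as a unit, so a delimiter inside a
// string does not end the template (escapes are not handled here). A
// template of the form: changequote "old-open" "new-open" "old-close" "new-close"
// switches delimiters for the rest of the input instead of producing a token.
func (s *Scanner) Scan(input string) ([]Token, error) {
	var toks []Token
	for len(input) > 0 {
		start := strings.Index(input, s.Open)
		if start < 0 {
			toks = append(toks, Token{Text, input})
			break
		}
		if start > 0 {
			toks = append(toks, Token{Text, input[:start]})
		}
		body := input[start+len(s.Open):]
		end, inString := -1, false
		for i := 0; i < len(body) && end < 0; i++ {
			if body[i] == '"' {
				inString = !inString
			} else if !inString && strings.HasPrefix(body[i:], s.Close) {
				end = i
			}
		}
		if end < 0 {
			return toks, fmt.Errorf("unterminated template: missing %q", s.Close)
		}
		tmpl := strings.TrimSpace(body[:end])
		input = body[end+len(s.Close):]
		// Arguments are split on spaces, so quoted delimiters must not contain spaces.
		if f := strings.Fields(tmpl); len(f) == 5 && f[0] == "changequote" {
			s.ChangeQuote(strings.Trim(f[2], `"`), strings.Trim(f[4], `"`))
			continue
		}
		toks = append(toks, Token{Template, tmpl})
	}
	return toks, nil
}

func main() {
	sc := &Scanner{Open: "{{", Close: "}}"}
	input := `Hi {{ name }} {{changequote "{{" "-=[" "}}" "]=-"}}{{ angular }} -=[ "a }} b" ]=-`
	toks, err := sc.Scan(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range toks {
		fmt.Printf("%-8s %q\n", t.Kind, t.Value)
	}
}
```

After the directive, `{{ angular }}` passes through as plain text, which is what the AngularJS case needs.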
Lexie can scan languages that involve nested items. For example,
you can specify a C-like comment and make it nest and contain
other C-like comments. This makes `/* commented out /* comment */ nests */`
legitimate input. This is easy to do, and an example of it
is in the `./examples` directory.
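A plain regular expression cannot match arbitrarily deep nesting, so a scanner has to count depth (or push a comment state for each opener). The Go sketch below shows the depth-counting idea on its own; the function name is hypothetical, and this is not the code in the examples directory.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// SkipNestedComment expects input to start with "/*" and returns the length
// of the complete, possibly nested, comment. Each "/*" increases the depth,
// each "*/" decreases it, and the comment ends when the depth returns to 0.
func SkipNestedComment(input string) (int, error) {
	if !strings.HasPrefix(input, "/*") {
		return 0, errors.New("input does not start with a comment")
	}
	depth := 0
	for i := 0; i+1 < len(input); {
		switch input[i : i+2] {
		case "/*":
			depth++
			i += 2
		case "*/":
			depth--
			i += 2
			if depth == 0 {
				return i, nil
			}
		default:
			i++
		}
	}
	return 0, errors.New("unterminated comment")
}

func main() {
	src := "/* commented out /* comment */ nests */ x = 1;"
	n, err := SkipNestedComment(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("comment: %q\nrest:    %q\n", src[:n], src[n:])
}
```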
Lexie also has a concept of reserved words, so that a pattern match can
pick out word tokens and then look up specific values of that word as
reserved words. Example: the pattern `[a-zA-Z_][a-zA-Z_0-9]*`
matches all identifiers in a language. After the match, a check
can be made to see whether the identifier is one of the reserved words,
such as `or` and `and`, and a different token returned for these.
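A minimal Go sketch of the reserved-word check follows. It uses Go's standard `regexp` package rather than Lexie's matcher, and the `TokenKind` values and the `NextWord` function are hypothetical. The identifier is matched first and then looked up in a table, which is why `order` is not mistaken for the reserved word `or`.

```go
package main

import (
	"fmt"
	"regexp"
)

// TokenKind is a hypothetical token type for this sketch.
type TokenKind int

const (
	Ident TokenKind = iota
	KwOr
	KwAnd
)

// identPattern is the identifier pattern from the text, anchored at the start.
var identPattern = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z_0-9]*`)

// reserved maps reserved words to their own token kinds.
var reserved = map[string]TokenKind{"or": KwOr, "and": KwAnd}

// NextWord matches one identifier at the start of input. The whole word is
// matched first and only then looked up, so "order" stays an identifier.
func NextWord(input string) (string, TokenKind, bool) {
	word := identPattern.FindString(input)
	if word == "" {
		return "", Ident, false
	}
	if kind, ok := reserved[word]; ok {
		return word, kind, true
	}
	return word, Ident, true
}

func main() {
	for _, in := range []string{"order + 1", "or b", "and_then", "9lives"} {
		word, kind, ok := NextWord(in)
		fmt.Printf("%-12q word=%q kind=%d ok=%v\n", in, word, kind, ok)
	}
}
```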
Lexie is used inside Ringo, the template/macro processor that implements a superset of Django Templates in Go. Ringo is based on pongo2, and I am thankful for the wonderful work on pongo2 that led to building this tool. No code in Lexie is taken from pongo2.
Lexie is also used for a fast user-agent identification library, ua-quick (not open source yet, but I am working on it). This has improved the speed of parsing and identifying user agents by a factor of 10,000.
The set of regular expressions that Lexie understands is limited but growing. It is adequate to specify simple languages. This is not a Perl-compatible regular expression matcher, nor is it POSIX. That said, it is fast and usable.
- Works with UTF-8 / Unicode.
- Runs multiple sets of pattern matchers in a context-dependent fashion.
- Changeable on the fly at runtime.
- Clear error reporting.
- Embeddable: can be used inside another program.
- Standalone: can be run as a fast pattern-matching tool in a standalone configuration.
- Fast. Fast to perform pattern matches. Fast to modify an existing matcher.
- Clear error messages if a modification of a pattern matcher breaks the matcher.
- Context-sensitive matching with multiple machines and push/pop of machines and states.
- Ability to push back onto input and re-scan input if necessary.
- Extensible matchers that allow for non-text pattern matching (think greenhouse control and real-time systems control).
- Runs as a goroutine, which improves performance.
- Has a cute name and a cute mascot.
- State machines can be cached in Redis so that they do not need to be regenerated every time.
- Can directly generate state machines in Go code for fast static state machines.
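Push/pop of machines can be pictured as a stack of scanning modes. In this Go sketch the "machines" are just hypothetical mode names standing in for compiled DFAs, and `ScanWithStack` and its marker rules are assumptions for illustration. The machine on top of the stack decides which markers are special, so `}}` inside a string is ordinary text.

```go
package main

import (
	"fmt"
	"strings"
)

// Segment is a run of input scanned while one machine was on top of the stack.
type Segment struct {
	Machine, Text string
}

// ScanWithStack is a hypothetical sketch of context-sensitive scanning with a
// stack of machines. Rules: in "text", "{{" pushes "template"; in "template",
// "}}" pops and a double quote pushes "string"; in "string", a double quote pops.
func ScanWithStack(input string) ([]Segment, error) {
	stack := []string{"text"}
	var segs []Segment
	var buf strings.Builder
	// flush emits buffered text, labelled with the machine that scanned it.
	flush := func() {
		if buf.Len() > 0 {
			segs = append(segs, Segment{stack[len(stack)-1], buf.String()})
			buf.Reset()
		}
	}
	for i := 0; i < len(input); {
		top, rest := stack[len(stack)-1], input[i:]
		switch {
		case top == "text" && strings.HasPrefix(rest, "{{"):
			flush()
			stack = append(stack, "template")
			i += 2
		case top == "template" && strings.HasPrefix(rest, "}}"):
			flush()
			stack = stack[:len(stack)-1]
			i += 2
		case top == "template" && rest[0] == '"':
			flush()
			stack = append(stack, "string")
			i++
		case top == "string" && rest[0] == '"':
			flush()
			stack = stack[:len(stack)-1]
			i++
		default:
			buf.WriteByte(input[i])
			i++
		}
	}
	flush()
	if len(stack) != 1 {
		return segs, fmt.Errorf("input ended inside %q", stack[len(stack)-1])
	}
	return segs, nil
}

func main() {
	segs, err := ScanWithStack(`a {{ x "b }} c" }} d`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range segs {
		fmt.Printf("%-8s %q\n", s.Machine, s.Text)
	}
}
```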
"The name Lexie is an American baby name. In American the meaning of the name Lexie is: defender of mankind. ... 1. People with this name tend to initiate events, to be leaders rather than followers, with powerful personalities." From: www.sheknows.com/baby-names/name/lexie
This is not meant to be a comprehensive list. These are the tools that I use.
Lex is now quite an old tool. I still refer to its documentation when I am working in C or C++. It can be used with other languages. The newer replacement, flex, is a much better choice. Lex appears to me to have a number of serious defects (at least 20+ years ago it had defects; then I switched to flex).
Flex is the open-source version of Lex, with lots of fixes. Lots of languages are supported. It generates static tables and supports multiple input states. Unicode support is basically missing. Dynamic reconfiguration is not really an option. It works best in C and C++ with Bison as the parser generator.
- Be able to output the DFA into a file for quick re-reading, so that it does not need to be rebuilt every time.
- Output should be JSON or `.go` code.
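One possible shape for the JSON output is sketched below with Go's standard `encoding/json` package. The `DFATable` layout and the `Save`/`Load`/`Next` helpers are hypothetical, not an existing Lexie format; the table is keyed by state number and by input character, and the round trip shows that a saved table can be read back without rebuilding it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DFATable is a hypothetical file layout for a generated DFA.
// Trans[state][char] is the next state. encoding/json writes the integer
// state keys as JSON strings and reads them back as ints.
type DFATable struct {
	Start  int                    `json:"start"`
	Accept []int                  `json:"accept"`
	Trans  map[int]map[string]int `json:"trans"`
}

// Save encodes the table as indented JSON.
func Save(t DFATable) ([]byte, error) {
	return json.MarshalIndent(t, "", "  ")
}

// Load decodes a table previously written by Save.
func Load(data []byte) (DFATable, error) {
	var t DFATable
	err := json.Unmarshal(data, &t)
	return t, err
}

// Next looks up the transition from state s on rune r.
func (t DFATable) Next(s int, r rune) (int, bool) {
	next, ok := t.Trans[s][string(r)]
	return next, ok
}

func main() {
	// DFA for ab*c: 0 -a-> 1, 1 -b-> 1, 1 -c-> 2, and state 2 accepts.
	table := DFATable{
		Start:  0,
		Accept: []int{2},
		Trans:  map[int]map[string]int{0: {"a": 1}, 1: {"b": 1, "c": 2}},
	}
	data, err := Save(table)
	if err != nil {
		fmt.Println("save:", err)
		return
	}
	fmt.Println(string(data))
	loaded, err := Load(data)
	if err != nil {
		fmt.Println("load:", err)
		return
	}
	fmt.Println(loaded.Next(1, 'c'))
}
```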