My (WIP) attempt at an understandable scanner generator. The final result should be about 500 lines of boring and mostly 'obvious' C, such that once the basic concept is understood, re-typing a full implementation from memory should be doable for most programmers.
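This program-in-progress is built entirely around Brzozowski's derivative, an operation on regular expressions that 'removes' a single leading character from the language they represent. The derivative of a regex with respect to a character c matches exactly those strings w for which cw is matched by the original regex. The program takes the derivative of every regex with respect to every character, and of every regex produced along the way, and caches the results. That cache then represents the DFA for the regex. Each distinct regex is a state, and each cached derivative is a transition. A state is accepting when its regex matches the empty string.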
Features and Progress
✔️ AND and NOT operations on regular expressions
✔️ 'marks' in an expression which can call native code
✔️ Arbitrary lookahead
❌ Automatically identify erroneous or overlapping patterns
❌ Full Unicode support