[WIP] An understandable scanner generator for extended regular expressions
Languages: C 99.9%, Other 0.1%

Files

README.md
backtracking.c
class.c
compiled.c
construct.c
construct.h
dfa.c
frama-test.c
int.c
int2.c
lib-int.c
lib.c
lib.py
lib2.c
lib3.c
libn.c
make.h
parse.c
parse.h
parse.hs
reg.h
regdx-flat.c
regdx.c
regdx2.c
regdx3.c
regdx4.c
regdx5.c
regdx6.c
util.c

README.md

regdx

My (WIP) attempt at an understandable scanner generator. The final result should be about 500 lines of boring and mostly 'obvious' C, such that once the basic concept is understood, re-typing a full implementation from memory should be doable for most programmers.

The Method

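This program-in-progress is built entirely around Brzozowski's derivative, an operation on regular expressions which 'removes' a single leading character from the language they represent: the derivative of a regex with respect to a character c matches exactly the remainders of the strings that begin with c. By computing every derivative of every regex and caching the results, the cache becomes a representation of the DFA for the regex: each distinct regex is a state, and each cached derivative is a transition.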
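
To make the derivative concrete, here is a minimal sketch in C. It is not taken from the regdx sources: the `Re` type and every `re_*` function name are hypothetical, chosen only for illustration. It matches a string by taking one derivative per input character and then checking whether the remaining regex accepts the empty string. It does no caching or simplification, so it is a matcher rather than a DFA builder, and it never frees the nodes it allocates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical AST for extended regular expressions (not regdx's own types). */
typedef enum { NONE, EPS, CHR, CAT, ALT, AND, NOT, STAR } Kind;

typedef struct Re {
    Kind kind;
    char c;                  /* the character, used only by CHR */
    const struct Re *a, *b;  /* operands; b is unused by NOT and STAR */
} Re;

/* Allocate a node. Nodes are never freed in this sketch. */
const Re *re_mk(Kind kind, char c, const Re *a, const Re *b) {
    Re *r = malloc(sizeof *r);
    assert(r != NULL);
    r->kind = kind; r->c = c; r->a = a; r->b = b;
    return r;
}

const Re *re_none(void) { return re_mk(NONE, 0, NULL, NULL); }  /* matches nothing */
const Re *re_eps(void)  { return re_mk(EPS, 0, NULL, NULL); }   /* matches only "" */
const Re *re_chr(char c) { return re_mk(CHR, c, NULL, NULL); }
const Re *re_cat(const Re *a, const Re *b) { return re_mk(CAT, 0, a, b); }
const Re *re_alt(const Re *a, const Re *b) { return re_mk(ALT, 0, a, b); }
const Re *re_and(const Re *a, const Re *b) { return re_mk(AND, 0, a, b); }
const Re *re_not(const Re *a)  { return re_mk(NOT, 0, a, NULL); }
const Re *re_star(const Re *a) { return re_mk(STAR, 0, a, NULL); }

/* Does the language of r contain the empty string? */
bool re_nullable(const Re *r) {
    switch (r->kind) {
    case NONE: case CHR: return false;
    case EPS:  case STAR: return true;
    case CAT:  case AND: return re_nullable(r->a) && re_nullable(r->b);
    case ALT:  return re_nullable(r->a) || re_nullable(r->b);
    case NOT:  return !re_nullable(r->a);
    }
    return false;
}

/* Brzozowski derivative: a regex matching every s such that r matches c followed by s. */
const Re *re_deriv(const Re *r, char c) {
    switch (r->kind) {
    case NONE: case EPS: return re_none();
    case CHR:  return r->c == c ? re_eps() : re_none();
    case ALT:  return re_alt(re_deriv(r->a, c), re_deriv(r->b, c));
    case AND:  return re_and(re_deriv(r->a, c), re_deriv(r->b, c));
    case NOT:  return re_not(re_deriv(r->a, c));
    case STAR: return re_cat(re_deriv(r->a, c), r);
    case CAT: {
        /* c is consumed by a, or by b when a can match the empty string. */
        const Re *left = re_cat(re_deriv(r->a, c), r->b);
        return re_nullable(r->a) ? re_alt(left, re_deriv(r->b, c)) : left;
    }
    }
    return re_none();
}

/* Match by repeated derivation: one derivative per input character. */
bool re_matches(const Re *r, const char *s) {
    for (; *s; s++)
        r = re_deriv(r, *s);
    return re_nullable(r);
}
```

For example, with `re_star(re_cat(re_chr('a'), re_chr('b')))` standing for `(ab)*`, `re_matches` accepts `"abab"` and rejects `"aba"`. AND and NOT need no special machinery, because the derivative simply distributes over them. Turning this into a DFA would additionally require recognising when two derivatives are the same regex, so that the cache stays finite; that step is left out here.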

Features and Progress

  • ✔️ AND and NOT operations on regular expressions
  • ✔️ 'marks' in an expression which can call native code
  • ✔️ Arbitrary lookahead
  • Automatically identify erroneous or overlapping patterns
  • Full Unicode support