A tiny React + UIkit website written in TypeScript. It uses parsimmon to parse a regular expression, refa to build a finite automaton from that expression, and viz.js to display the automaton.
The site is hosted on GitHub Pages, and built from source using a GitHub Workflow.
This project was created as part of a master's course in computer science at UIBK.
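
To make the last two stages of that pipeline concrete, here is a minimal, self-contained sketch of a finite automaton that can test a string and be turned into Graphviz DOT text, the graph description format that viz.js renders. The `Dfa` interface, the hand-written automaton for `(a|b)*c`, and the helpers `accepts` and `toDot` are all hypothetical names chosen for illustration; none of this is refa's API or this project's actual code.

```typescript
// Illustrative DFA shape; not the data structure used by refa.
interface Dfa {
  start: string;
  accepting: string[];
  transitions: Record<string, Record<string, string>>;
}

// Hand-written automaton for (a|b)*c.
const dfa: Dfa = {
  start: "q0",
  accepting: ["q1"],
  transitions: {
    q0: { a: "q0", b: "q0", c: "q1" },
    q1: {},
  },
};

// Runs the automaton over the input; a missing transition rejects.
function accepts(automaton: Dfa, input: string): boolean {
  let state = automaton.start;
  for (let i = 0; i < input.length; i++) {
    const next = automaton.transitions[state][input[i]];
    if (next === undefined) {
      return false;
    }
    state = next;
  }
  return automaton.accepting.indexOf(state) !== -1;
}

// Produces Graphviz DOT text with accepting states drawn as double circles.
function toDot(automaton: Dfa): string {
  const lines = ["digraph dfa {", "  rankdir=LR;"];
  for (const state of Object.keys(automaton.transitions)) {
    const isAccepting = automaton.accepting.indexOf(state) !== -1;
    lines.push(`  ${state} [shape=${isAccepting ? "doublecircle" : "circle"}];`);
    const edges = automaton.transitions[state];
    for (const symbol of Object.keys(edges)) {
      lines.push(`  ${state} -> ${edges[symbol]} [label="${symbol}"];`);
    }
  }
  lines.push("}");
  return lines.join("\n");
}

console.log(accepts(dfa, "abac"));
console.log(toDot(dfa));
```

The resulting DOT string is what a Graphviz renderer would take as input; how the site actually hands automata to viz.js is not shown here.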
Entered regular expressions can be used to test a string and support the following syntax (in order of precedence):
<regex> ::= <union> | <intersection> | <concatenation> | <repetition> | <negation> | <group> | <text>
<union> ::= <regex> "|" <regex>
<intersection> ::= <regex> "&" <regex>
<concatenation> ::= <regex> <regex>
<repetition> ::= <regex> ("*" | "+" | "?" | "{" (NUMBER | [NUMBER] "," [NUMBER]) "}")
<negation> ::= "!" <regex>
<group> ::= "(" <regex> ")"
<text> ::= (<range> | <class> | <char> | ".")+
<range> ::= "[" ["^"] <char> "-" <char> "]"
<class> ::= "\" ("d" | "D" | "w" | "W" | "s" | "S")
<char> ::= NON_META | "\" META
`NON_META` and `META` characters depend on the context:
- If used in a `<range>`, the following `META` characters need to be escaped: `\`, `^`, `-` and `]`
- If used anywhere else, the following `META` characters need to be escaped: `\`, `[`, `]`, `(`, `)`, `{`, `}`, `|`, `&`, `*`, `+`, `?`, `!` and `.`

All other characters are considered `NON_META`.
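
The two escaping contexts can be captured in a small helper that prepares literal text for use in an expression. This is only a sketch: `escapeLiteral` and the two constants are hypothetical names, not part of this project, and the character sets are copied from the lists above.

```typescript
// META characters per context, as listed above.
const RANGE_META = "\\^-]";
const DEFAULT_META = "\\[](){}|&*+?!.";

// Hypothetical helper: prefixes every META character with a backslash.
// Pass inRange = true when the text is placed inside a <range>.
function escapeLiteral(text: string, inRange: boolean = false): string {
  const meta = inRange ? RANGE_META : DEFAULT_META;
  return text
    .split("")
    .map((ch) => (meta.indexOf(ch) !== -1 ? "\\" + ch : ch))
    .join("");
}

console.log(escapeLiteral("1.5*(x|y)"));
console.log(escapeLiteral("a-z", true));
```

Note that `-` is only escaped in the range context, while `.` and `*` are only escaped outside of it.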
There are some small but important differences in the way regular expressions are validated:
- The regular expression `a|b|` is not considered valid by this parser; use `(a|b)?` instead. The same holds for `a||b` or `a~b~`.
- Regular expressions like `[ab^c]` or `ab[c`, where `^` and `[` are treated like ordinary characters, are considered invalid. Instead, use the escaped form, i.e. `[ab\^c]` or `ab\[c`.
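
For comparison, JavaScript's built-in `RegExp` does accept `a|b|` and `[ab^c]`. The sketch below uses it to check that the stricter forms recommended above accept the same strings on a few samples. The helpers `fullMatch` and `sameOnSamples` are hypothetical, and the check says nothing about how this project's parser works internally.

```typescript
// Anchors the pattern so that the whole input has to match.
function fullMatch(pattern: string, input: string): boolean {
  return new RegExp("^(?:" + pattern + ")$").test(input);
}

const samples = ["", "a", "b", "c", "^", "ab"];

// Returns true if both patterns accept exactly the same sample strings.
function sameOnSamples(left: string, right: string): boolean {
  return samples.every((s) => fullMatch(left, s) === fullMatch(right, s));
}

console.log(sameOnSamples("a|b|", "(a|b)?"));
console.log(sameOnSamples("[ab^c]", "[ab\\^c]"));
```

Both comparisons should report that the rewritten expression behaves the same way, so switching to the stricter form does not change which strings match.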
The following characters can always be entered in escaped form:
- horizontal tab: `\t`
- carriage return: `\r`
- linefeed: `\n`
- vertical tab: `\v`
- form-feed: `\f`
- backspace: `\b`
- NUL character: `\0`
- arbitrary Unicode character: `\uXXXX`, where `XXXX` is a char code from `0000` to `FFFF`
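
As a rough illustration of what these escapes denote, the following sketch maps a single escape sequence to its character. `decodeEscape` and `SIMPLE_ESCAPES` are hypothetical names, not taken from this project, and accepting lowercase hex digits in `\uXXXX` is an assumption.

```typescript
// Single-letter escapes from the list above.
const SIMPLE_ESCAPES: Record<string, string> = {
  t: "\t",
  r: "\r",
  n: "\n",
  v: "\v",
  f: "\f",
  b: "\b",
  "0": "\0",
};

// Hypothetical helper: turns one escape sequence into the character it denotes.
// Returns undefined for anything that is not in the list above.
function decodeEscape(escape: string): string | undefined {
  // Assumption: hex digits are accepted in either case.
  const unicode = /^\\u([0-9a-fA-F]{4})$/.exec(escape);
  if (unicode) {
    return String.fromCharCode(parseInt(unicode[1], 16));
  }
  if (escape.length === 2 && escape[0] === "\\") {
    return SIMPLE_ESCAPES[escape[1]];
  }
  return undefined;
}

console.log(decodeEscape("\\u0041"));
console.log(JSON.stringify(decodeEscape("\\t")));
```

Four hex digits cover the range `0000` to `FFFF`, which is exactly what a single UTF-16 code unit can hold, so `String.fromCharCode` is sufficient here.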
For a description of the available character classes, see the MDN documentation.