Skip to content
/ regex Public

A regex parser using FSM(Finite State Machine)

License

Notifications You must be signed in to change notification settings

elinx/regex

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Introduction

A regex parser using FSM(Finite State Machine)

Project Structure

Scanner.py: Convert regular expression into tokens stream

Parser.py: Parse tokens stream, generate a abstract syntax tree for specific regex

NFA.py: A non-deterministic automaton implementation including three basic operations:

  • |(or operator): alternative operation
  • &(and operator): concatenating operation
  • ~(invert operator): repeating operation

NFABuilder.py: A builder wrapper for nfa

tests: A bunch of unittest files

Example

if __name__ == '__main__':
    scanner = Scanner('(a|b)*abb')
    scanner.lex()

    parser = Parser(scanner.tokens)
    parser.parse()
    print(parser.ast)

    nfa = NFABuilder.ast_to_nfa(parser.ast)
    print(nfa)

output:

  • abstract syntax tree:
Cat
	Cat
		Cat
			Star
				|
					a
					b
			a
		b
	b
  • nfa:
size: 11 start: 0, final: 10, transactions:
 from 0 to 1 when input is EPS
from 0 to 7 when input is EPS
from 1 to 2 when input is EPS
from 1 to 4 when input is EPS
from 2 to 3 when input is a
from 3 to 6 when input is EPS
from 4 to 5 when input is b
from 5 to 6 when input is EPS
from 6 to 1 when input is EPS
from 6 to 7 when input is EPS
from 7 to 8 when input is a
from 8 to 9 when input is b
from 9 to 10 when input is b

About

A regex parser using FSM(Finite State Machine)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages