Reimplementation of NFA to DFA, and Thompson NFA to Glushkov NFA conversions #193

katef · 2019-12-26T21:52:31Z

Here I've rewritten these FSM conversions mostly for the sake of performance, but also for simplicity. The approach focuses on doing operations in bulk, rather than performing a sequence of steps for each item in turn. This alone gives about a 3x speed-up over the previous code, but more importantly it sets the scene for future optimisations, in particular for operating on sets of items in parallel.

The Glushkov construction in particular now operates in-situ, rather than by constructing a new FSM.
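As an illustration of what operating in-situ can look like, here is a minimal sketch that removes epsilon edges from a small NFA by rewriting it in place rather than building a second machine. It is not the code in glushkov.c: the `toy_nfa` struct, the `toy_remove_epsilons()` name, the 64-state limit and the two-symbol alphabet are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_STATES 64 /* assumption: one bit per state in a uint64_t */
#define NSYMS 2       /* assumption: toy alphabet of symbol indices 0..1 */

typedef uint64_t stateset; /* bit i set means state i is a member */

/* Hypothetical toy representation; not libfsm's struct fsm. */
struct toy_nfa {
	unsigned nstates;
	stateset eps[MAX_STATES];        /* epsilon edges out of each state */
	stateset sym[MAX_STATES][NSYMS]; /* labelled edges out of each state */
	stateset accepting;
};

void
toy_remove_epsilons(struct toy_nfa *nfa)
{
	stateset closure[MAX_STATES];
	unsigned i, j, c;
	int changed;

	assert(nfa->nstates <= MAX_STATES);

	/* Epsilon closures for all states, iterated to a fixed point. */
	for (i = 0; i < nfa->nstates; i++) {
		closure[i] = ((stateset) 1 << i) | nfa->eps[i];
	}
	do {
		changed = 0;
		for (i = 0; i < nfa->nstates; i++) {
			stateset merged = closure[i];
			for (j = 0; j < nfa->nstates; j++) {
				if (closure[i] & ((stateset) 1 << j)) {
					merged |= closure[j];
				}
			}
			if (merged != closure[i]) {
				closure[i] = merged;
				changed = 1;
			}
		}
	} while (changed);

	/*
	 * Rewrite in place: each state absorbs the labelled edges and the
	 * accepting status of everything in its epsilon closure, then drops
	 * its epsilon edges. Reading an already-rewritten state is harmless,
	 * because its closure is a subset of the current state's closure.
	 */
	for (i = 0; i < nfa->nstates; i++) {
		for (j = 0; j < nfa->nstates; j++) {
			if (j == i || !(closure[i] & ((stateset) 1 << j))) {
				continue;
			}
			for (c = 0; c < NSYMS; c++) {
				nfa->sym[i][c] |= nfa->sym[j][c];
			}
		}
		if (closure[i] & nfa->accepting) {
			nfa->accepting |= (stateset) 1 << i;
		}
		nfa->eps[i] = 0;
	}
}
```

Because the state indices never change, callers holding a state number before the rewrite can keep using it afterwards, which is one practical benefit of working in place.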

The diff here is super confusing; GitHub shows this as a modification to the existing determinise.c. Of course that's nonsense; this is a rewrite from scratch (and in fact I did it in separate files before removing the previous implementations). If you're reading this, I suggest looking at determinise.c and glushkov.c in isolation.

This builds on a lot of previous work; in particular introducing numeric state indices (so we can make assumptions about states created in one FSM being equivalent to states in another), and undertaking epsilon closures in advance.
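To make the idea of computing epsilon closures in advance concrete, here is a rough sketch using numeric state indices as bit positions. It computes the closure of every state in one pass to a fixed point, then uses those closures for a symbol closure over a whole set of states at once. The names (`toy_edge`, `toy_epsilon_closures()`, `toy_symbol_closure()`) and the 64-state limit are assumptions for illustration, not libfsm's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STATES 64    /* assumption: one bit per state in a uint64_t */
#define TOY_EPSILON (-1) /* assumption: marker for an epsilon edge */

typedef uint64_t stateset; /* bit i set means state i is a member */

/* Hypothetical edge list; not libfsm's representation. */
struct toy_edge {
	unsigned from, to;
	int symbol; /* TOY_EPSILON or a character */
};

/*
 * Compute the epsilon closure of every state at once. closures[i] starts
 * as {i}; each epsilon edge then pulls the target's closure into the
 * source's closure, repeating until nothing changes.
 */
void
toy_epsilon_closures(const struct toy_edge *edges, size_t nedges,
	unsigned nstates, stateset *closures)
{
	unsigned i;
	size_t e;
	int changed;

	assert(nstates <= MAX_STATES);

	for (i = 0; i < nstates; i++) {
		closures[i] = (stateset) 1 << i;
	}

	do {
		changed = 0;
		for (e = 0; e < nedges; e++) {
			stateset merged;
			if (edges[e].symbol != TOY_EPSILON) {
				continue;
			}
			merged = closures[edges[e].from] | closures[edges[e].to];
			if (merged != closures[edges[e].from]) {
				closures[edges[e].from] = merged;
				changed = 1;
			}
		}
	} while (changed);
}

/*
 * Symbol closure of a set of states: follow every edge labelled `symbol`
 * out of a member of `set`, and include the precomputed epsilon closure
 * of each target.
 */
stateset
toy_symbol_closure(const struct toy_edge *edges, size_t nedges,
	const stateset *closures, stateset set, int symbol)
{
	stateset out = 0;
	size_t e;

	for (e = 0; e < nedges; e++) {
		if (edges[e].symbol != symbol) {
			continue;
		}
		if (set & ((stateset) 1 << edges[e].from)) {
			out |= closures[edges[e].to];
		}
	}

	return out;
}
```

In a subset construction, each DFA state would be one `stateset`, and the result of `toy_symbol_closure()` is the DFA state reached on that symbol. Whole sets are merged with a single OR instead of by inserting states one at a time.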

katef added 14 commits December 22, 2019 20:26

Have a symbol closure include a subsequent epsilon closure.

17b1b20

This is unfortunately relatively expensive, but does have some potential for optimisation.

Expose fsm_state_cmpedges() as a convenience.

5dabf2d

First cut at a new implementation of determinisation.

2306131

A reimplementation of conversion from Thompson to Glushkov NFA.

eadf69f

This is written in terms of the bulk epsilon and symbol closures, in the same style as the corresponding rewrite for NFA to DFA conversion.

Retire the previous implementation of DFA to NFA conversion, and Glushkov construction.

81e4e0c

Farewell, code. And thank you to everybody who helped work on it.

Naming; replace the previous determinise and glushkovise implementations.

3055bc7

Hide away single-state epsilon closure; it's only needed for this one place.

cc000f7

Naming; no need for the _bulk suffix here.

59fb85a

NFA to DFA conversion now operates on Glushkov NFA, resolving epsilons beforehand.

e7815f8

Missing allocation wrappers.

6169c7b

A very complicated way to avoid introducing epsilons.

4b74d6a

Centralise predicates on state sets.

a774ec6

Clarification.

9b2ec66

Cruft; these are no longer used for the determinisation implementation.

25e7144

katef merged commit 9c3321f into master Dec 27, 2019

katef deleted the kate/more-determination branch December 27, 2019 01:02

silentbicycle mentioned this pull request Jul 16, 2020

Incomplete minimization via Brzozowski's #220

Closed
