In [None]:
%%HTML
<style>
.container { width: 100% }
</style>

# Converting a Deterministic <span style="font-variant:small-caps;">Fsm</span> into a Regular Expression

In [None]:
def arb(S):
    """Return an arbitrary element of the set S (None if S is empty)."""
    for x in S:
        return x

The function `dfa_2_regexp` takes a deterministic <span style="font-variant:small-caps;">Fsm</span> $A$ and computes a regular expression $r$ that describes the same language as $A$, i.e. we have
$$ L(A) = L(r). $$
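The implementation shown here applies almost no algebraic simplification: the only shortcut is that `rpq` returns $\varepsilon$ instead of $\varepsilon + \emptyset$ for a state without a self-loop, so $r$ can contain redundant subexpressions.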

In [None]:
def dfa_2_regexp(A):
    States, Sigma, delta, q0, Accepting = A
    r = regexp_sum({ rpq(q0, p, Sigma, delta, States) for p in Accepting })
    return r

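The function `regexp_sum` takes a set $S = \{ r_1, \cdots, r_n \}$ of regular expressions
as its argument.  It returns the regular expression
$$ r_1 + \cdots + r_n. $$
If $S$ is empty, it returns `0`, which serves as the neutral element of the sum, i.e. the regular expression $\emptyset$.
Since sets are unordered, the order of the summands in the result is arbitrary.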

In [None]:
def regexp_sum(S):
    n = len(S)
    if n == 0:
        return 0
    elif n == 1:
        return arb(S)
    else:
        r  = arb(S)
        S1 = S - { r }
        return ('+', r, regexp_sum(S1))
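
To make the shape of the result concrete, here is a minimal sketch that repeats `arb` and `regexp_sum` in condensed form so that it runs on its own. The helper `summands` is hypothetical and only serves to inspect the nested tuple; it is not part of the notebook.

```python
def arb(S):
    for x in S:
        return x

def regexp_sum(S):
    n = len(S)
    if n == 0:
        return 0
    elif n == 1:
        return arb(S)
    else:
        r = arb(S)
        return ('+', r, regexp_sum(S - {r}))

def summands(r):
    """Hypothetical helper: collect the operands of a right-nested sum."""
    if isinstance(r, tuple) and r[0] == '+':
        return {r[1]} | summands(r[2])
    return {r}

empty  = regexp_sum(set())            # 0, the empty sum
single = regexp_sum({'a'})            # just 'a', no '+' node is created
triple = regexp_sum({'a', 'b', 'c'})  # ('+', x, ('+', y, z)) for some ordering x, y, z

print(empty, single, triple)
print(summands(triple))
```

Because the order in which `arb` picks elements depends on the set's iteration order, the exact nesting of `triple` can differ between runs; only the set of summands is fixed.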

The function `rpq` assumes there is some <span style="font-variant:small-caps;">Fsm</span>
$$ A = \langle \texttt{States}, \texttt{Sigma}, \texttt{delta}, \texttt{q0}, \texttt{Accepting} \rangle $$
given and takes five arguments:
- `p1` and `p2` are states of the <span style="font-variant:small-caps;">Fsm</span> $A$,
- `Sigma` is the alphabet of the <span style="font-variant:small-caps;">Fsm</span>,
- `delta` is the transition function of the <span style="font-variant:small-caps;">Fsm</span>, and
- `Allowed` is a subset of the set `States`.

The function `rpq` computes a regular expression that describes those strings that take the 
<span style="font-variant:small-caps;">Fsm</span> $A$ from the state `p1` to the state `p2`.
On the way from `p1` to `p2`, only states in the set `Allowed` may be visited as intermediate states; `p1` and `p2` themselves need not be elements of `Allowed`.
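
The computation can be summarized as a recurrence on the set of allowed states. The notation $r^{(Q)}_{p_1,p_2}$ for the result of `rpq(p1, p2, Sigma, delta, Q)` is introduced here only for illustration. If no intermediate states are allowed and $\{c_1, \cdots, c_n\} = \{ c \in \Sigma \mid \delta(p_1, c) = p_2 \}$, then
$$ r^{(\emptyset)}_{p_1,p_2} = \begin{cases} c_1 + \cdots + c_n & \text{if } p_1 \neq p_2, \\ \varepsilon + c_1 + \cdots + c_n & \text{if } p_1 = p_2, \end{cases} $$
where the empty sum is $\emptyset$ and $\varepsilon + \emptyset$ is returned as $\varepsilon$. Otherwise, some state $q_k \in Q$ is chosen and, with $Q' = Q \setminus \{q_k\}$,
$$ r^{(Q)}_{p_1,p_2} = r^{(Q')}_{p_1,p_2} + r^{(Q')}_{p_1,q_k} \cdot \bigl(r^{(Q')}_{q_k,q_k}\bigr)^* \cdot r^{(Q')}_{q_k,p_2}. $$
The first summand covers the paths that avoid $q_k$. The second covers the paths that reach $q_k$, return to it any number of times, and then continue to $p_2$.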

In [None]:
def rpq(p1, p2, Sigma, delta, Allowed):
    if Allowed == set():
        all_chars = { c for c in Sigma 
                        if delta.get((p1, c)) == p2 
                    }
        r = regexp_sum(all_chars)
        if p1 == p2:
            if all_chars == set():
                return ''
            else:
                return ('+', '', r)
        else:
            return r
    else:
        qk = arb(Allowed)
        NewAllowed = Allowed - { qk }
        rkp1p2 = rpq(p1, p2, Sigma, delta, NewAllowed)
        rkp1qk = rpq(p1, qk, Sigma, delta, NewAllowed)
        rkqkqk = rpq(qk, qk, Sigma, delta, NewAllowed)
        rkqkp2 = rpq(qk, p2, Sigma, delta, NewAllowed)
        return ('+', rkp1p2, ('&', ('&', rkp1qk, ('*', rkqkqk)), rkqkp2))
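
As a rough end-to-end check, the following sketch runs the conversion on a two-state automaton that accepts the strings over $\{a, b\}$ ending in `a`. The functions are condensed copies of the cells above so that the block is self-contained. The matcher `matches` and the example automaton are assumptions made for this illustration; they are not part of the notebook. The matcher interprets `0` as $\emptyset$, `''` as $\varepsilon$, `'+'` as sum, `'&'` as concatenation, and `'*'` as Kleene star.

```python
from functools import lru_cache
from itertools import product

# Condensed copies of the notebook's functions.
def arb(S):
    for x in S:
        return x

def regexp_sum(S):
    if len(S) == 0:
        return 0
    r = arb(S)
    if len(S) == 1:
        return r
    return ('+', r, regexp_sum(S - {r}))

def rpq(p1, p2, Sigma, delta, Allowed):
    if Allowed == set():
        all_chars = {c for c in Sigma if delta.get((p1, c)) == p2}
        r = regexp_sum(all_chars)
        if p1 == p2:
            return '' if all_chars == set() else ('+', '', r)
        return r
    qk  = arb(Allowed)
    New = Allowed - {qk}
    return ('+', rpq(p1, p2, Sigma, delta, New),
                 ('&', ('&', rpq(p1, qk, Sigma, delta, New),
                             ('*', rpq(qk, qk, Sigma, delta, New))),
                       rpq(qk, p2, Sigma, delta, New)))

def dfa_2_regexp(A):
    States, Sigma, delta, q0, Accepting = A
    return regexp_sum({rpq(q0, p, Sigma, delta, States) for p in Accepting})

# Hypothetical brute-force matcher for the tuple representation (illustration only).
@lru_cache(maxsize=None)
def matches(r, s):
    if isinstance(r, str):          # '' matches only the empty string, a character only itself
        return r == s
    if r == 0:                      # the empty language matches nothing
        return False
    if r[0] == '+':
        return matches(r[1], s) or matches(r[2], s)
    if r[0] == '&':                 # try every split of s into a prefix and a suffix
        return any(matches(r[1], s[:i]) and matches(r[2], s[i:])
                   for i in range(len(s) + 1))
    if r[0] == '*':                 # peel off a non-empty prefix, then recurse on the rest
        return s == '' or any(matches(r[1], s[:i]) and matches(r, s[i:])
                              for i in range(1, len(s) + 1))
    raise ValueError(f'unknown regular expression: {r!r}')

# Example automaton (an assumption): state 1 is reached exactly after reading an 'a'.
States    = {0, 1}
Sigma     = {'a', 'b'}
delta     = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 1, (1, 'b'): 0}
A         = (States, Sigma, delta, 0, {1})
r         = dfa_2_regexp(A)
words     = [''.join(t) for n in range(5) for t in product('ab', repeat=n)]
agreement = all(matches(r, w) == w.endswith('a') for w in words)
print(r)
print(agreement)
```

The printed expression is considerably larger than the textbook answer $(a+b)^* \cdot a$ and its exact form depends on the iteration order of the sets involved, which is why the check compares languages on all strings of length at most 4 instead of comparing expressions syntactically.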