This repository has been archived by the owner on Jun 6, 2018. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 6
/
FSA.txt
215 lines (181 loc) · 7.38 KB
/
FSA.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
FSA
---
This module defines an FSA class, for representing and operating on
finite-state automata (FSAs). FSAs can be used to represent regular
expressions and to test sequences for membership in the languages
described by regular expressions.
FSAs can be deterministic or nondeterministic, and they can contain
epsilon transitions. Methods to determinize an automaton (also
eliminating its epsilon transitions), and to minimize an automaton,
are provided.
The transition labels for an FSA can be symbols from an alphabet, as
in the standard formal definition of an FSA, but they can also be
instances which represent predicates. If these instances implement
instance.matches(), then the FSA nextState() function and accepts()
predicate can be used. If they implement instance.complement() and
instance.intersection(), the FSA can be be determinized and minimized,
to find a minimal deterministic FSA that accepts an equivalent
language.
Quick Start
-----------
Instances of FSA can be created out of labels (for instance, strings)
by the singleton() function, and combined to create more complex FSAs
through the complement(), closure(), concatenation(), union(), and
other constructors. For example, concatenation(singleton('a'),
union(singleton('b'), closure(singleton('c')))) creates an FSA that
accepts the strings 'a', 'ab', 'ac', 'acc', 'accc', and so on.
Instances of FSA can also be created with the compileRE() function,
which compiles a simple regular expression (using only '*', '?', '+',
'|', '(', and ')' as metacharacters) into an FSA. For example,
compileRE('a(b|c*)') returns an FSA equivalent to the example in the
previous paragraph.
FSAs can be determinized, to create equivalent FSAs (FSAs accepting
the same language) with unique successor states for each input, and
minimized, to create an equivalent deterministic FSA with the smallest
number of states. FSAs can also be complemented, intersected, unioned,
and so forth as described under 'FSA Functions' below.
FSA Methods
-----------
The class FSA defines the following methods.
Acceptance
``````````
fsa.nextStates(state, input)
returns a list of states
fsa.nextState(state, input)
returns None or a single state if
nextStates <= 1, otherwise it raises an exception
fsa.nextStateSet(states, input)
returns a list of states
fsa.accepts(sequence)
returns true or false
Accessors and predicates
````````````````````````
isEmpty()
returns true iff the language accepted by the FSA is the empty language
labels()
returns a list of labels that are used in any transition
nextAvailableState()
returns an integer n such that no states in the FSA
are numeric values >= n
Reductions
``````````
sorted(initial=0)
returns an equivalent FSA whose states are numbered
upwards from 0
determinized()
returns an equivalent deterministic FSA
minimized()
returns an equivalent minimal FSA
trimmed()
returns an equivalent FSA that contains no unreachable or dead
states
Presentation
````````````
toDotString()
returns a string suitable as \*.dot file for the 'dot'
program from AT&T GraphViz
view()
views the FSA with a gs viewer, if gs and dot are installed
FSA Functions
-------------
Construction from FSAs
``````````````````````
complement(a)
returns an fsa that accepts exactly those sequences that its
argument does not
closure(a)
returns an fsa that accepts sequences composed of zero or more
concatenations of sequences accepted by the argument
concatenation(a, b)
returns an fsa that accepts sequences composed of a
sequence accepted by a, followed by a sequence accepted by b
containment(a, occurrences=1)
returns an fsa that accepts sequences that
contain at least occurrences occurrences of a subsequence recognized by the
argument.
difference(a, b)
returns an fsa that accepts those sequences accepted by a
but not b
intersection(a, b)
returns an fsa that accepts sequences accepted by both a
and b
iteration(a, min=1, max=None)
returns an fsa that accepts sequences
consisting of from min to max (or any number, if max is None) of sequences
accepted by its first argument
option(a)
equivalent to union(a, EMPTY_STRING_FSA)
reverse(a)
returns an fsa that accepts strings whose reversal is accepted by
the argument
union(a, b)
returns an fsa that accepts sequences accepted by both a and b
Predicates
``````````
equivalent(a, b)
returns true iff a and b accept the same language
Reductions (these equivalent to the similarly-named methods)
````````````````````````````````````````````````````````````
determinize(fsa)
returns an equivalent deterministic FSA
minimize(fsa)
returns an equivalent minimal FSA
sort(fsa, initial=0)
returns an equivalent FSA whose states are numbered from
initial
trim(fsa)
returns an equivalent FSA that contains no dead or unreachable
states
Construction from labels
````````````````````````
compileRE(string)
returns an FSA that accepts the language described by
string, where string is a list of symbols and '*', '+', '?', and '|'
operators, with '(' and ')' to control precedence.
sequence(sequence)
returns an fsa that accepts sequences that are matched by
the elements of the argument. For example, sequence('abc') returns an fsa that
accepts 'abc' and ['a', 'b', 'c'].
singleton(label)
returns an fsa that accepts singletons whose elements are
matched by label. For example, singleton('a') returns an fsa that accepts only
the string 'a'.
FSA Constants
-------------
EMPTY_STRING_FSA is an FSA that accepts the language consisting only
of the empty string.
NULL_FSA is an FSA that accepts the null language.
UNIVERSAL_FSA is an FSA that accepts S*, where S is any object.
FSA instance creation
---------------------
FSA is initialized with a list of states, an alphabet, a list of
transition, an initial state, and a list of final states. If fsa is an
FSA, fsa.tuple() returns these values in that order, i.e. (states,
alphabet, transitions, initialState, finalStates). They're also
available as fields of fsa with those names.
Each element of transition is a tuple of a start state, an end state,
and a label: (startState, endSTate, label).
If the list of states is None, it's computed from initialState,
finalStates, and the states in transitions.
If alphabet is None, an open alphabet is used: labels are assumed to
be objects that implements label.matches(input), label.complement(),
and label.intersection() as follows:
- label.matches(input) returns true iff label matches input
- label.complement() returnseither a label or a list of labels which,
together with the receiver, partition the input alphabet
- label.intersection(other) returns either None (if label and other don't
both match any symbol), or a label that matches the set of symbols that
both label and other match
As a special case, strings can be used as labels. If a strings 'a' and
'b' are used as a label and there's no alphabet, '~a' and '~b' are
their respective complements, and '~a&~b' is the intersection of '~a'
and '~b'. (The intersections of 'a' and 'b', 'a' and '~b', and '~a'
and 'b' are, respectively, None, 'a', and 'b'.)
Goals
-----
Design Goals:
- easy to use
- easy to read (simple implementation, direct expression of algorithms)
- extensible
Non-Goals:
- efficiency