New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add automaton for word reduction in rewriting systems #14735

Merged
merged 29 commits into from Jul 8, 2018

Conversation

Projects
None yet
6 participants
@RavicharanN
Copy link
Contributor

RavicharanN commented May 23, 2018

Implement an automaton for word reduction.

References to other Issues or PRs

Brief description of what is fixed or changed

A custom class for a StateMachine and the State of the finite state machine have been added in a new file rewriting_fsm.
Two new methods, construct_automaton and reduce_using_automaton have been added to the RewritingSystem class.

Other comments

Currently, the word is reduced using automaton as:

>>> from sympy.combinatorics.fp_groups import FpGroup
>>> from sympy.combinatorics.free_groups import free_group
>>> F, a, b = free_group("a, b")
>>> G = FpGroup(F, [a*b*a**-1*b**-1])
>>> a, b = G.generators
>>> R = G._rewriting_system
>>> R.reduce_using_automaton(b**2*a*b**3)
a*b**5
'''

generators = [x for x in self.alphabet]

This comment has been minimized.

@jksuom

jksuom May 23, 2018

Member

list(self.alphabet) would probably be more efficient.

This comment has been minimized.

@RavicharanN

RavicharanN May 23, 2018

Contributor

Okay, I've used this in other places too. Will make the changes

generators = [x for x in self.alphabet]
for gen in generators:
if not gen**-1 in generators:
generators.append(gen**-1)

This comment has been minimized.

@jksuom

jksuom May 23, 2018

Member

Could this loop be replaced by generators += [gen**-1 for gen in generators]?

@asmeurer

This comment has been minimized.

Copy link
Member

asmeurer commented May 23, 2018

You should also add more tests, for more than just the one group.

@RavicharanN

This comment has been minimized.

Copy link
Contributor

RavicharanN commented May 23, 2018

Sure, I'll add more tests and make the changes.

generators.append(gen**-1)

# Contains the alphabets that will be used for state transitions.
self.automaton_alphabets = generators

This comment has been minimized.

@valglad

valglad May 23, 2018

Contributor

A minor point: should this be automaton_alphabet in singular rather than in plural? It seems to be just one alphabet and if it is, then alphabets is a bit confusing.

# Store the complete left hand side of the rules - dead states.
for r in self.rules:
if not r in self.left_hand_rules:
self.left_hand_rules.append(r)

This comment has been minimized.

@valglad

valglad May 23, 2018

Contributor

self.left_hand_rules = list(self.rules) should do the trick


def reduce_using_automaton(self, word):
'''
The method for word reduction using automaton is mentioned in the section 13.1.3 of the Handook.

This comment has been minimized.

@valglad

valglad May 23, 2018

Contributor

Handbook (typo)

This is repeated until the word reached the end and the automaton stays in the accept state.
'''
fsm = self.construct_automaton()

This comment has been minimized.

@valglad

valglad May 23, 2018

Contributor

You should probably have an attribute self.reduction_automaton so that you can store the automaton once constructed, then you won't have to reconstruct it every time you reduce a word and you won't need self.automaton_alphabets, self.left_hand_rules or self.proper_prefixes. Or was there some particular reason you did it this way?

This comment has been minimized.

@RavicharanN

RavicharanN May 24, 2018

Contributor

Yeah, I was thinking of making fsm an attribute of the class. And instead of constructing when the reduce _word is called, can we just construct/modify it once a new rule is added?

This comment has been minimized.

@valglad

valglad May 24, 2018

Contributor

Modifying it will be faster I expect but immutable objects are safer. Hmm. But yeah, if you do modify it at any point, it would need to be done in functions that add new rules. Are you doing anything about rules being removed? This happens in make_confluent while new rules are being added.

This comment has been minimized.

@RavicharanN

RavicharanN May 24, 2018

Contributor

I think modifying might increase the complexity. 'Coz whenever a new rule is added we have to check for the redundant ones and this might as well involve a few states to be deleted. Also the transitions need to be verified again at every existing state (to check if the there is a transition to the newly added state).

So, instead of this, I think it would be better to dump the previous StateMachine object instance and create a new one (by calling create_automaton) whenever the states are added/removed. And I think efficiency for both would be the same as either way it requires iteration through all the states.

This comment has been minimized.

@valglad

valglad May 24, 2018

Contributor

Fair enough. In that case yes, definitely best to keep it immutable and just construct a new one when needed.

This method is to add a transition from the current state to a new state.
Arguments:
alphabet: The alphabet the current state reads to make the state transition.

This comment has been minimized.

@valglad

valglad May 23, 2018

Contributor

letter or symbol or alphabet_element would be more suitable names, since a transition depends on a single element of the alphabet, not all of it.

(That said, are you perhaps misunderstanding what "alphabet" means? An alphabet is a collection of symbols, not a member of such a collection).

This comment has been minimized.

@RavicharanN

RavicharanN May 24, 2018

Contributor

Sure, I'll change them to letter


def set_start(self):
self.is_start = True
self.state_type = "start"

This comment has been minimized.

@valglad

valglad May 23, 2018

Contributor

Is there actually a situation when you might want to change the type of a state?

This comment has been minimized.

@RavicharanN

RavicharanN May 24, 2018

Contributor

Well, currently this isn't required but if someone wishes to use the FSM implementation again I thought this might be useful.

This comment has been minimized.

@valglad

valglad May 24, 2018

Contributor

I feel that until such times, it'd be better to leave it out. And in any case, State is a small object and keeping it nice and immutable isn't going to be time-costly, so if a user does need a state of a different type, they can just construct a new instance.

This comment has been minimized.

@RavicharanN

RavicharanN May 24, 2018

Contributor

Sure

Attribute:
states (list of States): Collection of all registered.
name (str): Name of the state machine.

This comment has been minimized.

@valglad

valglad May 23, 2018

Contributor

Do we really need the name? You are only using it for printing and I'm not sure it would ever come up.

This comment has been minimized.

@RavicharanN

RavicharanN May 24, 2018

Contributor

Again the same, not currently required because we are just using one FSM here, but I think it would be useful for differentiating between the FSMs if at all more than one is required.

This comment has been minimized.

@valglad

valglad May 24, 2018

Contributor

I suppose that would make sense. Though in that case, maybe you could call the reduction FMS here something more informative, like "word reduction fms" or something, instead of just "fms"?

self.states = [] # Contains all the states in the machine.
self.state_names = []

def add_state(self, state_name, is_start=False, is_dead=False, is_accept=False):

This comment has been minimized.

@valglad

valglad May 23, 2018

Contributor

I think it would it be better to pass a dictionary or some other structure containing states directly to the machine constructor instead of adding states externally via this method. You could keep the method and call it from the constructor though.

transisitons (OrderedDict): Represents all the transitions of the state object.
is_start (boolean): To mark if the state is a start state.
is_dead (boolean): To mark if the state is a dead state.
is_accept (boolean): To mark if te state is an accept state.

This comment has been minimized.

@smichr

smichr May 23, 2018

Member

te->the

The method for word reduction using automaton is mentioned in the section 13.1.3 of the Handook.
All the elements of the automaton are stored in an array and are given as the input to the automaton.
If the automaton reaches a dead state that subword is replaced and the automaton is run from the beginning.
This is repeated until the word reached the end and the automaton stays in the accept state.

This comment has been minimized.

@smichr

smichr May 23, 2018

Member

reached->reaches

@RavicharanN RavicharanN force-pushed the RavicharanN:automata branch 3 times, most recently from 3312008 to 0a00e46 May 26, 2018

@RavicharanN

This comment has been minimized.

Copy link
Contributor

RavicharanN commented May 30, 2018

ping @jksuom @valglad @asmeurer. I've made the changes, added a few tests and its stable for all the Python versions now. So, I think this can be reviewed.

The method to modify the automaton(on the addition of new states) is also done, but there are a few doubts to be clarified(posted on the channel) before I push that code.

@@ -285,3 +296,133 @@ def reduce(self, word, exclude=None):
if new != prev:
again = True
return new

def compute_inverse_rules(self):
'''

This comment has been minimized.

@valglad

valglad May 31, 2018

Contributor

I think this should be part of construct_automaton, perhaps as a subfunction there - there is no reason for it to be among the RewritingSystem methods

This comment has been minimized.

@RavicharanN

RavicharanN Jun 1, 2018

Contributor

I had to make use of this method in the reduce_using_automaton method as well, so, I defined it as a RewritingSystem method.

'''
for r in self.rules:
rule_key_inverse = r.inverse()

This comment has been minimized.

@valglad

valglad May 31, 2018

Contributor

r**-1 is shorter. Come to think of it, I'm not sure why there is a separate inverse method at all.

rule_value_inverse = (self.rules[r]).inverse()
if (rule_value_inverse < rule_key_inverse):
self.inverse_rules[rule_key_inverse] = rule_value_inverse
elif (rule_key_inverse < rule_value_inverse):

This comment has been minimized.

@valglad

valglad May 31, 2018

Contributor

Simply else should be fine, because rule_key_inverse == rule_value_inverse iff rule_key == rule_value but such a rule would never be in the rules dictionary anyway.

This comment has been minimized.

@RavicharanN

RavicharanN May 31, 2018

Contributor

Just put that so that it would be clear. Will update it

# compute all_rules when the automaton is constucted.
rules = self.rules
inverse_rules = self.inverse_rules
all = rules

This comment has been minimized.

@valglad

valglad May 31, 2018

Contributor

all is a name of a Python built-in function (hence why it's highlighted in blue here). It'd be better to use a different name to avoid confusion. You could always just update rules and use that (so actually, is there any need for having all_rules as a class attribute when you can simply use the local variable rules)?

current_state = next_state
# Break if the whole word is read and no dead state is encountered.
if flag == 1:
break

This comment has been minimized.

@valglad

valglad May 31, 2018

Contributor

I think this loop would look better if you defined flag = 1 just before it, and changed while True to while flag - then you wouldn't need this break statement.


def add_transition(self, letter, state):
"""
This method is to add a transition from the current state to a new state.

This comment has been minimized.

@valglad

valglad May 31, 2018

Contributor

A note on docstrings. Have a look at this wiki page. In particular, it suggests that these conventions should be followed for docstring writing. Here's a quote from there:

The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

So this docstring should be something like Add a transition from the current state to a new state. And the same applies to all other docstrings here (except classes I suppose; class docstring tend to be nouns with descriptions).

This comment has been minimized.

@valglad

valglad Jun 5, 2018

Contributor

You still need to change the docstrings to the right format

This comment has been minimized.

@RavicharanN

RavicharanN Jun 5, 2018

Contributor

Yes, I will push it along with these changes

This instantiates a state object and stores this in the 'states' list.
Arguments:
Same as the __init__ function of the State class.

This comment has been minimized.

@valglad

valglad May 31, 2018

Contributor

Would be better to expand on this. Docstrings are often intended for users and search the source code for the __init__ of the state class could be tiresome.

class StateMachine(object):
""" The state machine class which manages the states and their transisitons of the automata.
Attribute:

This comment has been minimized.

@valglad

valglad May 31, 2018

Contributor

typo: Attributes

def __init__(self, name):
self.name = name
self.states = [] # Contains all the states in the machine.
self.state_names = []

This comment has been minimized.

@valglad

valglad May 31, 2018

Contributor

Could you have self.states being a dictionary of the form state_name: state?

This comment has been minimized.

@RavicharanN

RavicharanN May 31, 2018

Contributor

Yes, sure

if state == next_state_name:
next_state = self.reduction_automaton.states[state]
if next_state.state_type == "dead":
subst = all_rules[next_state_name]

This comment has been minimized.

@valglad

valglad Jun 1, 2018

Contributor

Perhaps, this information could be stored in the reduction automaton. Maybe each state could have a subst attribute, containing the right hand side of the corresponding rule. It would be more efficient than computing all inverse rules again, when you already do this in construct_automaton.

This comment has been minimized.

@valglad

valglad Jun 3, 2018

Contributor

Have you thought about storing right hand sides of rules in the automaton? Then you wouldn't be recomputing all_rules in reduce_using_automaton (you already did it in constuct_automaton) and wouldn't need to have inverse_rules as an attribute of the RewritingSystem class.

This comment has been minimized.

@RavicharanN

RavicharanN Jun 3, 2018

Contributor

Currently, I made a few changes in which inverse_rules is not an attribute of the RewritingSystemclass. And I'm computing the all_rules again because, if at all make_confluent is called anywhere by the user it throws a key_error for a few cases while substituting the new word. So, yes I think we can have subst as an attribute for each state.

@RavicharanN RavicharanN force-pushed the RavicharanN:automata branch from dd86d07 to 2695919 Jun 2, 2018


# All the transition symbols in the automaton
generators = list(self.alphabet)
generators += [gen**-1 for gen in generators]

This comment has been minimized.

@valglad

valglad Jun 27, 2018

Contributor

This is going to be the same every time - I think it would make sense for StateMachine to have alphabet as an attribute - you could add if self.reduction_automaton.alphabet = None before these two lines then.

This comment has been minimized.

@RavicharanN

RavicharanN Jun 27, 2018

Contributor

I think these 2 lines can be removed if alphabet is added as an attribute to the fsm.

This comment has been minimized.

@valglad

valglad Jun 27, 2018

Contributor

You need them for construct_automaton. I suppose you could actually move these two lines there, just before calling _add_to_automaton

This comment has been minimized.

@RavicharanN

RavicharanN Jun 28, 2018

Contributor

reduction_automaton.alphabet can be used instead of the generators when _add_to_automaton is called from construct_automaton. So, I meant to say we don't need to compute the generators if we set the attribute alphabet when the start state is initialized.

rules (dictionary) -- Dictionary of the newly added rules.
'''
# Automaton vairables

This comment has been minimized.

@valglad

valglad Jun 27, 2018

Contributor

variables

elif self.state_type == 'a':
self.state_type = "accept"
elif self.state_type == 'd':
self.state_type = "dead"

This comment has been minimized.

@valglad

valglad Jun 27, 2018

Contributor

I think it makes more sense to favor the shorter representation. So the user may enter "start" but we should be storing it as just "s". In fact, you can just say self.state_type = state_type[0] and not have any of these if statements at all.

Also, probably worth checking if state_type entered by the user is actually valid.

This comment has been minimized.

@RavicharanN

RavicharanN Jun 28, 2018

Contributor

So the user may enter "start" but we should be storing it as just "s".

But I thought that when someone tries to understand the working of the code, the lines like if automaton.state_type == "start" would be more readable, right?

This comment has been minimized.

@valglad

valglad Jun 28, 2018

Contributor

I don't think it's going to be less readable with just the first letter. You could always add a comment along the lines of state_type = 's' means a 'start' state before the first line where you use it in other functions if you wanted.

state -- This will be an instance of the State object which represents a new state after in the transition after the alphabet is read.
'''
self.transitions[letter] = state

This comment has been minimized.

@valglad

valglad Jun 27, 2018

Contributor

I feel that a state should have a state machine associated with it, because it doesn't make sense to have a state without a machine. Maybe as a state_machine attribute? Then if you also add an alphabet attribute to StateMachine, you could check here whether letter is in state_machine.alphabet.

# dictionary of reductions
self.rules = {}
self.rules_cache = deque([], 50)
self._init_rules()

# Create a finite state machine as an instance of the StateMachine object
self.reduction_automaton = StateMachine('fsm')

This comment has been minimized.

@valglad

valglad Jun 27, 2018

Contributor

Would probably be better if the name was something like 'Reduction automaton for ' + repr(G)' where G is the group.

elif self.state_type == "accept":
self.state_type = 'a'
elif self.state_type == "dead":
self.state_type = 'd'

This comment has been minimized.

@valglad

valglad Jun 29, 2018

Contributor

I think may have missed a part of my comment from last time: this can be more compactly written as self.state_type = state_type[0], whatever the user's input.

assert R.reduce_using_automaton(a**3*b**-2*a**2*b) == a**-1*b**-1
# Check after adding a rule
R.add_rule(a**2, b)
assert R.reduce_using_automaton(a**2*b**-2*a**2*b) == b**-1

This comment has been minimized.

@valglad

valglad Jun 29, 2018

Contributor

Just noticed this: is this correct? With the rule you added, we should have R.reduce_using_automaton(a**2*b**-2*a**2*b) == b - even if b**-1 is equivalent to b under the rule, b is smaller in the ordering that the rewriting system enforces on generators. What does reduce return in this case? Does G.equals(b, b**-1) work? You could also try running make_confluent after adding the rule and seeing if that makes a difference.

This comment has been minimized.

@RavicharanN

RavicharanN Jun 29, 2018

Contributor

What does reduce return in this case?

reduce returns b**-1 in Python3 and b in Python 2.

This comment has been minimized.

@valglad

valglad Jun 29, 2018

Contributor

Even after make_confluent?

This comment has been minimized.

@RavicharanN

RavicharanN Jun 29, 2018

Contributor

They return identity after make confluent

This comment has been minimized.

@valglad

valglad Jun 30, 2018

Contributor

Okay, this does make sense. If reduce_using_automaton gives the identity after make_confluent as well, then this is fine.

for elem in prop_prefix:
if not elem in self.reduction_automaton.states:
self.reduction_automaton.add_state(elem, state_type='a')
accept_states.append(elem)

This comment has been minimized.

@valglad

valglad Jul 4, 2018

Contributor

(Okay, I know I said that there shouldn't be any more changes to make but now that I've noticed multiple small inefficiencies in this part of the code, I can't just let them pass. I promise what I suggest here will be the absolutely last changes and this will be good to merge)

The last 4 for loops all iterate over all_rules - it can all be done in one loop where what follows after for elem in prop_prefix is inserted into where the prefixes are calculated to avoid yet another loop. Do you see this?

for state in self.reduction_automaton.states:
current_state_name = state
current_state_type = self.reduction_automaton.states[state].state_type
# Transitions will be modified only when current_state belongs to the proper_perfixes of the new rules.

This comment has been minimized.

@valglad

valglad Jul 4, 2018

Contributor

prefixes - typo. Also, you only need a suffix of current_state to be in proper_prefixes, not the whole word

# Add accept states.
if current_state_type == 'a':
for letter in automaton_alphabet:
next = current_state_name*letter

This comment has been minimized.

@valglad

valglad Jul 4, 2018

Contributor

next is a python function, it doesn't seem to cause problems but I'd still rename it to e.g. _next

else:
self.reduction_automaton.states[state].add_transition(letter, current_state_name)
else:
# Store subwords of current state in an array.

This comment has been minimized.

@valglad

valglad Jul 4, 2018

Contributor

suffixes is a more precise description

current_state_name_array.append(next_sub)
next_sub = next_sub.subword(1, len(next_sub))
# Check if the transition to any new state in posible.
for sub_word in current_state_name_array:

This comment has been minimized.

@valglad

valglad Jul 4, 2018

Contributor

No need to store the suffixes in current_state_name_array - you can just replace this for loop with while len(next_sub) and compute sub_word just before you do the rest.

for sub_word in current_state_name_array:
if sub_word in accept_states:
# Add accept states.
if current_state_type == 'a':

This comment has been minimized.

@valglad

valglad Jul 4, 2018

Contributor

I think you can avoid a bit of computation if you move this check to the last else to make it elif current_state_type == 'a'

This comment has been minimized.

@RavicharanN

RavicharanN Jul 4, 2018

Contributor

But, this check prevents the computation for dead states, 'coz adding the transitions for dead states isn't needed anyways. Removing that would iterate through the automaton_alphabet even for dead states.

This comment has been minimized.

@valglad

valglad Jul 4, 2018

Contributor

I'm not suggesting to remove the check, only to move it several lines above

This comment has been minimized.

@RavicharanN

RavicharanN Jul 4, 2018

Contributor

Alright 👍

self.reduction_automaton.states[state].add_transition(letter, 'start')
break
else:
if next in self.reduction_automaton.states:

This comment has been minimized.

@valglad

valglad Jul 4, 2018

Contributor

I think you could replace while True up there with while len(next) > 1 or not next in self.reduction_automaton.states, only do next = next.subword(1, len(next)) inside the while loop, and do

if next in self.reduction_automaton.states:
    self.reduction_automaton.states[state].add_transition(letter, next)
else:
    self.reduction_automaton.states[state].add_transition(letter, 'start')

once right after the loop. This is equivalent to what you have but would look neater.

else:
if next in self.reduction_automaton.states:
self.reduction_automaton.states[state].add_transition(letter, next)
break

This comment has been minimized.

@valglad

valglad Jul 4, 2018

Contributor

Same comments as above

if _next in self.reduction_automaton.states:
self.reduction_automaton.states[state].add_transition(letter, _next)
break
elif len(_next) == 1:

This comment has been minimized.

@valglad

valglad Jul 5, 2018

Contributor

Should this not be len(_next) == 0?
Also, if you were to do

while len(_next) and _next not in self.reduction_automaton.states:
    _next = _next.subword(1, len(_next))
if not len(_next):
    _next = 'start'
self.reduction_automaton.states[state].add_transition(letter, _next)

that would be a bit shorter.

This comment has been minimized.

@RavicharanN

RavicharanN Jul 5, 2018

Contributor

Alright 👍

next = next.subword(1, len(next))
elif len(next) == 1:
self.reduction_automaton.states[state].add_transition(letter, 'start')
next = next.subword(1, len(next))

This comment has been minimized.

@valglad

valglad Jul 5, 2018

Contributor

The variable here is still called next

if len(self.reduction_automaton.automaton_alphabet) != len(automaton_alphabet):
for state in proper_prefixes:
for state in accept_states:

This comment has been minimized.

@valglad

valglad Jul 5, 2018

Contributor

What about the dead states?

This comment has been minimized.

@RavicharanN

RavicharanN Jul 5, 2018

Contributor

Dead state transitions could be avoided anyway, right?

This comment has been minimized.

@valglad

valglad Jul 5, 2018

Contributor

Ah, I see, you bring it back to start state in reduce_using_automaton. Yes, that's fine.


automaton_alphabet = set(automaton_alphabet)
# Add accept states
for rule in all_rules:

This comment has been minimized.

@valglad

valglad Jul 6, 2018

Contributor

Why does this need to be in a separate loop?

This comment has been minimized.

@RavicharanN

RavicharanN Jul 7, 2018

Contributor

I posted it on the channel, the assertion error caused in the test cases might have been because of this. It works fine if we add the accept states after all the dead states are added. There must've been an overlap in the case with the assertion error

This comment has been minimized.

@valglad

valglad Jul 7, 2018

Contributor

Okay, I see. So there must be an overlap between the dead states and the accept states (is that what you meant on the channel?), and whenever both are possible we should be making them dead - because then we actually need to make a substitution and that's important. Is it possible to rewrite the state type? Because then you could have everything in one loop but run self.reduction_automaton.add_state(rule, state_type='d', rh_rule=all_rules[rule]) even if rule is already in self.reduction_automaton.states (as an accept state) - maybe you could modify add_state to rewrite a state with new information in that case?

self.reduction_automaton.states[state].add_transition(letter, 'start')
next = next.subword(1, len(next))
_next = current_state_name*letter
len_next_word = len(_next)

This comment has been minimized.

@valglad

valglad Jul 6, 2018

Contributor

You don't use this variable anywhere else.

if rule in accept_states:
self.reduction_automaton.states[rule].state_type = 'd'
self.reduction_automaton.states[rule].rh_rule = all_rules[rule]
accept_states.remove(rule)

This comment has been minimized.

@valglad

valglad Jul 8, 2018

Contributor

Not quite what I had in mind but I suppose this works too - the error is gone and hopefully this is slightly more efficient what we had before.

@valglad

This comment has been minimized.

Copy link
Contributor

valglad commented Jul 8, 2018

I think this is ready as well - good job! 👍

@valglad valglad merged commit fa5e1ac into sympy:master Jul 8, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment