Avoid duplicate output in acyclic_breadth_first #3245
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fix #3244.
Duplicate elimination is a central theme in the literature on the transitive closure of relational operators.
With cyclic cases, however, duplicate elimination becomes also crucial for termination, as seen in #2314.
Currently, the closure operation in wordnet.py uses the acyclic_breadth_first() function imported from nltk/util.py, which is supposed to prevent cycles. However, this function presently only detects cyclic children branches, but it does not properly eliminate some duplicates, that are accessible through different parent paths.
Consider for ex. the transitive closure of the hypernyms of 'calamagrostis.n.01', which presently duplicates the nodes that are accessible by two different paths.
With the present PR, the same call does not output duplicates:
[Synset('gramineae.n.01'), Synset('monocot_genus.n.01'), Synset('monocot_family.n.01'), Synset('genus.n.02'), Synset('family.n.06'), Synset('taxonomic_group.n.01'), Synset('biological_group.n.01'), Synset('group.n.01'), Synset('abstraction.n.06'), Synset('entity.n.01')]