Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Comparing regexps #44856

Closed
thomasda mannequin opened this issue Apr 16, 2007 · 5 comments
Closed

Feature request: Comparing regexps #44856

thomasda mannequin opened this issue Apr 16, 2007 · 5 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@thomasda
Copy link
Mannequin

thomasda mannequin commented Apr 16, 2007

BPO 1701452
Nosy @rhettinger

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2007-04-19.06:44:11.000>
created_at = <Date 2007-04-16.12:28:16.000>
labels = ['type-feature', 'library']
title = 'Feature request: Comparing regexps'
updated_at = <Date 2007-04-19.06:44:11.000>
user = 'https://bugs.python.org/thomasda'

bugs.python.org fields:

activity = <Date 2007-04-19.06:44:11.000>
actor = 'rhettinger'
assignee = 'niemeyer'
closed = True
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2007-04-16.12:28:16.000>
creator = 'thomasda'
dependencies = []
files = []
hgrepos = []
issue_num = 1701452
keywords = []
message_count = 5.0
messages = ['55057', '55058', '55059', '55060', '55061']
nosy_count = 3.0
nosy_names = ['rhettinger', 'niemeyer', 'thomasda']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = None
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue1701452'
versions = []

@thomasda
Copy link
Mannequin Author

thomasda mannequin commented Apr 16, 2007

It would be very nice with a function in the re module to compare two regular expressions and see how they overlap.
Return value would perhaps be an a constant like DISJOINT, SUBSET etc.

@thomasda thomasda mannequin closed this as completed Apr 16, 2007
@thomasda thomasda mannequin assigned niemeyer Apr 16, 2007
@thomasda thomasda mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Apr 16, 2007
@thomasda thomasda mannequin closed this as completed Apr 16, 2007
@thomasda thomasda mannequin assigned niemeyer Apr 16, 2007
@thomasda thomasda mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Apr 16, 2007
@rhettinger
Copy link
Contributor

Can this be done in the existing implementation by comparing the racetrack diagrams (character transitions)?

Thomas, do you know of anywhere this have been done (third-party modules or in other languages)?

@thomasda
Copy link
Mannequin Author

thomasda mannequin commented Apr 17, 2007

I found this page with the function to do so: http://terpstra.ca/compare.html

I also found this thread: http://bumppo.net/lists/fun-with-perl/1999/09/msg00000.html which discusses how to do this with some canonical formed expressions.

A function like this would really be able to speed some applications up, which matches a lot of strings with a number of expressions. If you know which ones are disjoint, you can break the test when one test matches.

@thomasda
Copy link
Mannequin Author

thomasda mannequin commented Apr 18, 2007

I talked with the guy who wrote the ZZ comparator.

"""I can give you the source code under the GPL if you like. However, I
think it would be difficult to port to python. It's 1100 lines of
very SML-style mostly-uncommented code. Do you know SML?

It's available at svn://mlton.org/mltonlib/trunk/ca/terpstra/regexp.
I would expect credit for the algorithm. :-) Let me know if you
succeed in porting it."""

I don't know any SML though.
If anybody does or can write a python equaliant of the algorithm:

"""1. Parse the regular expression (in GNU regular expression syntax)
2. Convert that parse tree into an NFA
3. Convert the NFA into a DFA by an algorithm I invented (it's why I
wrote this program; I thought of the algorithm and found it amusing
to see it in action)

For comparing regular expressions I take the following additional
steps
4. Combine the two DFA's (with the cross product)
4a. For finding common words, I intersect them
4b. For finding words in A, but not B, I intersect A with the negated
DFA of B
4c. ...
5. Find the minimal DFA corresponding to this intersection
6. Run a weighted depth-first search (similar to Dijkstra's) to find
the least-weighted path from the initial state to an accepting state
(the weighting makes it prefer human readable characters in the
examples)"""

It could be cool.
Otherwise I can also try to learn sml and port it.

@rhettinger
Copy link
Contributor

Am closing the feature request because it has very little chance of making it into the core distribution without being a third-party module first. The existing implementation is already somewhat difficult to maintain and we don't want to make that worse. Also, the new functionality is of interest to only a small subset of the re module users.

If you do write it, respect the GPL license on the SML code and take some care to make sure you can license your new code in any way you see fit.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

1 participant