Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange behaviour #31

Closed
reverofevil opened this issue May 11, 2016 · 8 comments
Closed

Strange behaviour #31

reverofevil opened this issue May 11, 2016 · 8 comments

Comments

@reverofevil
Copy link

reverofevil commented May 11, 2016

>>> print(parse("(\d{2})+") & parse("(\d{3})+") == parse("(\d{6})+"))
False

I don't see how \d(\d{6})*\d{5} differs from \d{6} (neither I see why it's computed to be a minimal regex).

@qntm
Copy link
Owner

qntm commented May 11, 2016

Yeah, the two are equivalent:

>>> (lego.parse("(\\d{2})+") & lego.parse("(\\d{3})+")).equivalent(lego.parse("(\\d{6})+"))
True

but obviously that's not the minimal regex. In fact computing a minimal regex is insanely hard computationally, I believe. I don't know if there's an effective procedure for doing that.

I'll consider an extra reduction rule which can do that simplification.

@reverofevil
Copy link
Author

@ferno The computationally hard (worst-case exponential) but easy to implement in greenery is Brzozowski's algorithm, where edges of DFA are reversed, resulting NFA is compiled into DFA, and then its edges are reversed once again. But nevertheless there're better alternatives. Hopcroft's algorithm is no harder than sorting, and (luckily) well covered in the "dragon book".

I didn't really expect == to behave differently from equivalent. Thank you!

@qntm
Copy link
Owner

qntm commented May 13, 2016

Finding a minimal DFA using Brzozowski's algorithm is already part of greenery... but sadly a minimal DFA is not the same thing as a minimal regular expression! I don't know of any algorithm to find the latter. (Other than iterating over every possible regular expression in order of length.)

@reverofevil
Copy link
Author

@ferno I forgot that you have to convert DFA back to regex somehow. I see it was totally my misunderstanding.

@reverofevil
Copy link
Author

Well, it seems I still have some stupid questions to ask.

>>> (lego.parse("(\\d{2})+") & lego.parse("(\\d{3})+")).equivalent(lego.parse("(
\\d{6})+"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'conc' object has no attribute 'equivalent'

@reverofevil reverofevil reopened this May 27, 2016
@reverofevil
Copy link
Author

Oh, please, update the version available in pip.

@qntm
Copy link
Owner

qntm commented May 31, 2016

Yes, I will. Sorry, I got involved in a long series of breaking changes and new features (and bumped the version number) but then I never really finished them, so I never published the new stuff to pip.

@qntm
Copy link
Owner

qntm commented Jun 5, 2016

Version 3.0 of greenery is now available on PyPI, this is the version which pip install greenery should retrieve for you next time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants