Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Are you interested in contributions? #37

Closed
Zac-HD opened this issue Apr 21, 2018 · 5 comments
Closed

Are you interested in contributions? #37

Zac-HD opened this issue Apr 21, 2018 · 5 comments

Comments

@Zac-HD
Copy link

Zac-HD commented Apr 21, 2018

I finally picked up greenery this afternoon, and have had a lot of fun with it. Before I spend much more time on it though I thought I'd check if you're interested in contributions:

  • I wrote an alternative string-to-lego parser which leans heavily on the CPython sre_parse module - master...Zac-HD:pyregex. It supports every construct that can be re.compiled, but has somewhat worse errors at the moment (eg no context given when bailing on a groupref). Also needs more testing for eg repeats 😄

  • I also wrote some property-based tests with Hypothesis, master...Zac-HD:property-tests. This has already turned up a few bugs of the form x != parse(str(x)) for some lego object x, but there's little point looking for more if you don't consider this a bug worth fixing. (I originally started this on the parser branch, but similar problems seem to exist on master too)

Either way, thanks for a great little library and a fun evening poking at it!

@qntm
Copy link
Owner

qntm commented Apr 22, 2018

Hi! I'm glad you found greenery to have some entertainment value. I consider this to be a completed project at the moment, and it's been in "if it ain't broke, don't touch it" mode for several years.

If it does have bugs, then let's fix them. I can think of one case where x == parse(str(x)) would return False but I would be unconcerned, for example when x is charclass('a') but parse(str(x)) is a full-blown pattern object, pattern(conc(mult(charclass('a'), multiplier(bound(1), bound(1))))). I don't know if this is the kind of thing you're talking about. But if you have some examples which look more like bugs, we can take a look at fixing them.

I see in one case you've suggested changing a test where parsing "a.b()()" would result in a lego object where the empty subpatterns had been actively suppressed. In my opinion this introduces a x != str(parse(x)) bug because the empty subpatterns in the original have been removed after a round trip. In general I would prefer to keep parsing and reducing regular expressions as separate operations, and in fact not reduceing the returned lego object at parsing time was an intentional decision.

@Zac-HD
Copy link
Author

Zac-HD commented Apr 23, 2018

Alright then! In that case I'll keep new features in my own copy, but bring tests and bugs upstream. Glad to set expectations up front 😄

I'd encourage you to have a look at the property-tests branch to reproduce this (and check that this isn't the only bug), but here's the test case:

a = conc(charclass('0'), pattern(charclass('1'), charclass('0')))
# str(a) == '00|1'  # <- there's the bug - should be '0(0|1)'!
b = parse(str(a))
assert a.reduce() == b.reduce()  # Fails

Note that str(parse('0(0|1)')) == '0(0|1)' - it's not obvious to me that you can get this bug in a chunk of lego parsed from a string, but it's certainly possible from direct construction and I suspect from reduction too.

I would like to use the lego.equivalent method for the assertion, but it's incredibly slow on non-trivial inputs - I didn't leave it running overnight, but otherwise the test wouldn't finish 😞 Possibly another issue to look into here?

(And I should note: happy to look into these myself if you have some tips about where to start!)

@qntm
Copy link
Owner

qntm commented Apr 23, 2018

Right, I think the problem there is probably that pattern(charclass('1'), charclass('0')) isn't throwing an exception. A pattern should be a sequence of conc objects, it shouldn't be permissible to use a charclass there. I see that even pattern('a', 'b') and pattern(782, 0) seem to have no problems and they all strify just fine. It looks like there's almost no input validation going on in the lego constructors. Which I think might have been intentional? Possibly I wasn't thinking about people building these objects by hand except when I do it internally for testing purposes? The constructors themselves are not documented, only lego.parse is.

@qntm
Copy link
Owner

qntm commented Apr 23, 2018

lego.equivalent first converts the two lego objects to finite state machines, which is potentially very time-consuming when large numbers of possible characters are involved. It then computes the symmetric difference of the two FSMs, also potentially time-consuming, and finally checks that the result is empty, which is effectively instantaneous. However, in this case, I find that parse('00|1').equivalent(parse('0(0|1)')) returns False pretty much instantly.

@Zac-HD
Copy link
Author

Zac-HD commented Apr 24, 2018

Sounds like usage errors on my end then! Thanks again for greenery, and for your help here 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants