Generated code for quantifier accepts incorrect sequence? #358

Closed
dkegel-fastly opened this issue Jun 1, 2021 · 2 comments


@dkegel-fastly

re -k str -l c -r pcre 'a+$'

generates a function which accepts the sequence a\na, which seems rather wrong...?

This is with the tip of main. Also happens with -l go.

@katef
Owner

katef commented Jun 2, 2021

Thanks for the report.

In PCRE, $ (with the default line-ending flag) means "end of text, with an optional newline before it". Contrast this with \Z, which means "end of text" without a newline.
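A quick way to see this behaviour outside of libfsm is with Python's re module. This is only a sketch under an assumption: Python's engine is not PCRE, but its default $ (without re.MULTILINE) also matches at the end of the string or just before a single trailing newline, and Python's \Z matches only at the very end of the string. The sample strings are made up for illustration.

```python
import re

# Python's default `$` (no re.MULTILINE): end of string, or just before
# a single newline that is the last character of the string.
dollar = re.compile(r"a+$")
# Python's `\Z`: only the very end of the string.
strict = re.compile(r"a+\Z")

# search() is unanchored at the start, like the generated matcher in the report.
samples = ["a", "a\n", "a\na", "a\nb"]
results = {s: (bool(dollar.search(s)), bool(strict.search(s))) for s in samples}

for s, (d, z) in results.items():
    print(f"{s!r:8} $={d} \\Z={z}")
```

Note that a\na is accepted by both anchors here: the match is simply the final a, and the earlier a\n is skipped by the unanchored search.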

The NFA constructed is:

[image: diagram of the constructed NFA]

Because the \n is optional for this particular regular expression, the acceptingness of ((1)) is visible back through the epsilon transitions from (5), (4), and (7). Those states all accept.

This NFA can match by reading a\n entirely on the "any" transition for (2), then matching the literal a at the end of your input (which has no trailing newline), and then accepting on (7).
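The walk described above can be sketched as a small set-of-states simulation. This is a hand-built, hypothetical NFA for an unanchored a+$, not the one libfsm constructs: the state numbers 0, 1, 2 are illustrative and do not correspond to the numbers in the diagram, and epsilon transitions are already folded away.

```python
def accepts(text: str) -> bool:
    # Hypothetical NFA (state numbers are illustrative only):
    #   0: start; loops on any character (the unanchored prefix)
    #   1: one or more 'a' read; accepting (end of text)
    #   2: optional trailing '\n' read after the a's; accepting
    accepting = {1, 2}
    current = {0}
    for ch in text:
        nxt = set()
        for state in current:
            if state == 0:
                nxt.add(0)          # stay on the "any" loop
                if ch == "a":
                    nxt.add(1)      # or start matching a+
            elif state == 1:
                if ch == "a":
                    nxt.add(1)      # more a's
                elif ch == "\n":
                    nxt.add(2)      # the optional newline
            # state 2 has no outgoing transitions
        current = nxt
    return bool(current & accepting)


print(accepts("a\na"))   # a\n is consumed on the "any" loop, then a is matched
print(accepts("a\nb"))
```

For the input a\na, the simulation keeps state 0 alive through a\n, so the final a can still move to the accepting state, which is the path described in the comment above.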

This matches the behaviour for pcregrep:

; echo -n 'a\na' | pcregrep 'a+$'
a
a

@katef
Owner

katef commented Jun 2, 2021

Closing this because I believe the behaviour here is correct; please re-open if that's wrong.

@katef closed this as completed Jun 2, 2021