New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document S05 treatment of alternations as atoms for ratchet #1962

Open
ronaldxs opened this Issue Apr 25, 2018 · 0 comments

Comments

Projects
None yet
4 participants
@ronaldxs
Contributor

ronaldxs commented Apr 25, 2018

S05 explains that | and || alternatives are atomic with respect to ratchet restriction on backtracking. In a discussion with @moritz on IRC #perl6 he seemed to explain that Perl 6 alternatives do not use backtracking at all (https://irclog.perlgeek.de/perl6/2018-04-12#i_16038420). After further research and testing I believe that to explain the problem discussed on IRC #perl6, the atomicity of alternatives with respect to ratchet discussed in S05 needs to be in the public facing documentation which I do not believe to be true now. I try to make the case below that there was adequate reason for me to be confused about the backtracking behavior of alternatives with ratchet. I am sorry if it is not so easy to read but it was not easy for me to write either and I felt it was worth the trouble.

The question of backtracking and alternatives came up for me in practice for the Perl 6 Grammar::Modelica parser discussed in an advent calendar posting. A Modelica feature called "named arguments" allows a function to be called either as CircleArea(x) or CircleArea(radius=x). The grammar failed on the second form until I changed a grammar rule which was an alternative, similar to /\w+ || \w+ '=' \w+/, into a grammar regex which can backtrack. Reversing the alternatives so that alternative atomicity for backtracking parses "named arguments" with a rule seems to be at the heart of the matter.

The example below fails because the first alternative matches and ratchet prevents backtracking to match the end of the string. If you remove the ratchet modifier the match succeeds.

> say so "a=b" ~~ /:r ^ [ <[ab]>+ || \w+ '=' \w+ ] $ /
False

If we switch the order of alternatives both sides work fine with ratchet.

> say so "a=b" ~~ /:r ^ [ \w+ '=' \w+ || <[ab]>+ ] $ /
True
> say so "a" ~~ /:r ^ [ \w+ {say 'alt1'} '=' \w+ || {say 'alt2'} <[ab]>+ ] $ /
alt1
alt2
True

For "a=b" there is now no need to look at the second alternative so there is no question of backtracking. For "a" the regex seems to first match \w+, fail on '=', and then move back its position in the string so it says 'alt2' before matching the second alternative. @moritz seemed to claim there is no backtracking here.

A DFA, which makes one transition between states on each input symbol, could do the matching without backtracking. The transition from after \w+ to rematch the a with the second alternative does not look like DFA processing although adding the code blocks might effect the parsing engine and there might be some clever way to execute the code blocks in the right order and still only make one state transition. I have read that NFA engines can backtrack as is the case of the regex with the first ordering that parses "a=b" only without ratchet. In the Regex Mechanics" chapter (Chapter 6) of @moritz fine book, "Parsing with Perl 6 Regexes and Grammars", it says that | disjunctions are handled by an NFA engine which can't handle || sequential disjunctions. Perl 5 backtracking for alternatives is also mentioned in the "Grouping things" section of the Perl regular expressions tutorial.

It seems reasonable to me for a programmer to wonder about the possibility that in the last code example, accepting "a", the regex might be backtracking. Documenting the relevant ratchet paragraph from Synopsis 5#Modifiers seems to allow individual regex alternatives to backtrack if helpful and cleanly fix the confusion:

The new :r or :ratchet modifier causes this regex to not backtrack by
default. (Generally you do not use this modifier directly, since it's
implied by token and rule declarations.) The effect of this modifier is
to imply a : after every atom, including but not limited to *, +, and
? quantifiers, as well as alternations.

I think the feature should be explained in the public facing documentation. If my arguments are accepted I will later open an issue requesting roast tests for this feature in S05-modifier/ratchet.t.

@ronaldxs ronaldxs assigned moritz and unassigned moritz Apr 25, 2018

moritz added a commit that referenced this issue Apr 25, 2018

ronaldxs added a commit to ronaldxs/Grammar-Modelica that referenced this issue Apr 25, 2018

Update Expressions.pm6 to use rule instead of backtracking regex base…
…d on

alternatives being atomic to backtracking to be documented and tested
resulting from new perl6 doc issue #1962:
perl6/doc#1962

@TisonKun TisonKun self-assigned this Apr 27, 2018

@TisonKun TisonKun removed their assignment May 12, 2018

@JJ JJ added docs update labels May 14, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment