Commit 71b822c

Merge pull request #1640 from perl6/W4anD0eR96-patch-1
Rewrite the document of Longest Alternation: C<|>
2 parents a10c20d + 7e878dd commit 71b822c

1 file changed, +39 -3 lines changed

doc/Language/regexes.pod6

Lines changed: 39 additions & 3 deletions
@@ -512,10 +512,46 @@ string of non-whitespace characters.
512512
513513
=head1 X<Longest Alternation: C<|>|regex,|>
514514
515-
In regexes branches separated by C<|>, the longest match wins, independent of
516-
the lexical ordering in the regexes.
515+
In short, among regex branches separated by C<|>, the longest token match wins,
516+
independent of their textual order in the regex. However, C<|> does
517+
more than that.
517518
518-
say ('abc' ~~ / a | .b /).Str; # OUTPUT: «ab␤»
519+
C<|> does not pick the winning branch after the whole match has finished;
520+
instead it follows the L<longest token matching (LTM)|
521+
https://design.perl6.org/S05.html#Longest-token_matching> strategy.
522+
523+
Briefly, what C<|> does is this:
524+
525+
=item1 First, select the branch which has the longest declarative prefix.
526+
527+
say "abc" ~~ /ab | a.* /; # Output: 「abc」
528+
say "abc" ~~ /ab | a {} .* /; # Output: 「ab」
529+
say "if else" ~~ / if | if <.ws> else /; # Output: 「if」
530+
say "if else" ~~ / if | if \s+ else /; # Output: 「if else」
531+
532+
As shown above, C<a.*> is entirely declarative, while in C<a {} .*> the
533+
declarative prefix ends at C<{}>, so it is just C<a>. Any non-declarative atom
534+
terminates the declarative prefix. This matters when you use
535+
C<|> in a C<rule>, which automatically enables C<:s>: the implicit C«<.ws>»
536+
then terminates the declarative prefix.
537+
538+
=item1 If it's a tie, select the match with the highest specificity.
539+
540+
say "abc" ~~ /a. | ab { print "win" } /; # Output: win「ab」
541+
542+
When two alternatives match at the same length, the tie is broken by
543+
specificity. That is, C<ab>, as an exact match, counts as more specific than
544+
C<a.>, which uses the character class C<.>.
545+
546+
=item1 If it's still a tie, use additional tie-breakers.
547+
548+
say "abc" ~~ /a\w| a. { print "lose" } /; # Output: 「ab」
549+
550+
If the tie-breaker above does not settle it, the textually earlier alternative
551+
takes precedence.
552+
553+
For more details, see
554+
L<the LTM strategy|https://design.perl6.org/S05.html#Longest-token_matching>.
519555
520556
=head1 X<Conjunction: C<&&>|regex,&&>
521557

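The three selection steps described in the added text can be tried with a short script. This is a minimal sketch and not part of the commit: the helper name `show` is hypothetical, and the results noted in the comments are the ones stated in the diff, not independently verified output.

```raku
# Hypothetical helper (not from the commit): print the matched text,
# or a note when nothing matched.
sub show(Str $label, $match) {
    say "$label: ", $match ?? $match.Str !! '(no match)';
}

# Step 1: the branch with the longest declarative prefix is tried first.
show 'longest prefix',   "abc" ~~ / ab | a.* /;      # diff states: abc
show 'prefix cut by {}', "abc" ~~ / ab | a {} .* /;  # diff states: ab

# Step 2: at equal length, the more specific (literal) branch is preferred.
show 'specificity',      "abc" ~~ / a. | ab /;       # diff states: ab

# Step 3: if still tied, the textually earlier branch is preferred.
show 'textual order',    "abc" ~~ / a\w | a. /;      # diff states: ab
```

Steps 2 and 3 produce the same matched text whichever branch wins, which is why the diff attaches a `print` block to one branch to make the winner visible.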