@@ -512,10 +512,46 @@ string of non-whitespace characters.
512
512
513
513
=head1 X<Longest Alternation: C<|>|regex,|>
514
514
515
- In regexes branches separated by C<|>, the longest match wins, independent of
516
- the lexical ordering in the regexes.
515
+ In short, in regex branches separated by C<|>, the longest token match wins,
516
+ independent of the textual ordering in the regex. However, C<|> does more
517
+ than that.
517
518
518
-     say ('abc' ~~ / a | .b /).Str; # OUTPUT: «ab»
519
+ C<|> does not wait until every branch has finished matching to pick a winner;
520
+ instead it follows the L<longest-token matching (LTM)|
521
+ https://design.perl6.org/S05.html#Longest-token_matching> strategy.
522
+
523
+ Briefly, C<|> proceeds as follows:
524
+
525
+ =item1 First, select the branch which has the longest declarative prefix.
526
+
527
+     say "abc" ~~ /ab | a.* /; # Output: 「abc」
528
+     say "abc" ~~ /ab | a {} .* /; # Output: 「ab」
529
+     say "if else" ~~ / if | if <.ws> else /; # Output: 「if」
530
+     say "if else" ~~ / if | if \s+ else /; # Output: 「if else」
531
+
532
+ As shown above, C<a.*> is entirely a declarative prefix, while in C<a {} .*> the
533
+ prefix ends at C<{}>, leaving only C<a>. Any non-declarative atom ends the
534
+ declarative prefix. This matters when you use C<|> in a C<rule>, which
535
+ automatically enables C<:s>: the implicit C«<.ws>» then ends the declarative
536
+ prefix, which is easy to overlook.
537
+
538
+ =item1 If it's a tie, select the match with the highest specificity.
539
+
540
+     say "abc" ~~ /a. | ab { print "win" } /; # Output: win「ab」
541
+
542
+ When two alternatives match at the same length, the tie is broken by
543
+ specificity. That is, C<ab>, being an exact match, counts as more specific
544
+ than C<a.>, which uses a character class.
545
+
546
+ =item1 If it's still a tie, use additional tie-breakers.
547
+
548
+     say "abc" ~~ /a\w| a. { print "lose" } /; # Output: 「ab」
549
+
550
+ If specificity does not break the tie either, the textually earlier
551
+ alternative takes precedence.
552
+
553
+ For more details, see
554
+ L<the LTM strategy|https://design.perl6.org/S05.html#Longest-token_matching>.
519
555
520
556
=head1 X<Conjunction: C<&&>|regex,&&>
521
557
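
The effect of the declarative prefix described in the first item of this hunk can be checked with a short Raku script. This is a minimal sketch and not part of the commit: the helper name `winner` is hypothetical, and the four regexes are copied from the examples in the added text.

```raku
# Hypothetical helper (an assumption, not part of any documented API):
# returns the text matched by $re against $input, or an empty string
# when nothing matches.
sub winner(Str $input, Regex $re --> Str) {
    (($input ~~ $re) // '').Str
}

# a.* is declarative throughout, so its prefix is longer than "ab".
say winner('abc', / ab | a.* /);

# The empty block {} ends the declarative prefix after "a",
# so the "ab" branch now has the longer prefix.
say winner('abc', / ab | a {} .* /);

# <.ws> ends the declarative prefix; \s+ does not.
say winner('if else', / if | if <.ws> else /);
say winner('if else', / if | if \s+ else /);
```

Each call prints the matched text on its own line, without the corner brackets that `say` adds around a `Match` object. The printed strings should correspond, in order, to the four outputs quoted in the hunk.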