Adds lookaround assertions, closes #2009

JJ · JJ · commit 633932784baa · 2019-07-22T09:36:31.000+02:00
diff --git a/doc/Language/regexes.pod6 b/doc/Language/regexes.pod6
@@ -193,12 +193,12 @@ alphabet) match C<\d>, but also digits from other scripts.
 
 Examples for digits are:
 
-    =begin code :lang<text>
-    U+0035 5 DIGIT FIVE
-    U+0BEB ௫ TAMIL DIGIT FIVE
-    U+0E53 ๓ THAI DIGIT THREE
-    U+17E5 ៥ KHMER DIGIT FIVE
-    =end code
+=begin code :lang<text>
+U+0035 5 DIGIT FIVE
+U+0BEB ௫ TAMIL DIGIT FIVE
+U+0E53 ๓ THAI DIGIT THREE
+U+17E5 ៥ KHMER DIGIT FIVE
+=end code
 
 =head3 X<C<\w> and C<\W>|regex,\w;regex,\W>
 
@@ -425,15 +425,15 @@ which takes a single L<Int|/type/Int> or a L<Range|/type/Range> on the right-han
 the number of times to match. If L<Range|/type/Range> is specified, the end-points specify
 the minimum and maximum number of times to match.
 
-    =begin code
-    say 'abcdefg' ~~ /\w ** 4/;      # OUTPUT: «｢abcd｣␤»
-    say 'a'       ~~ /\w **  2..5/;  # OUTPUT: «Nil␤»
-    say 'abc'     ~~ /\w **  2..5/;  # OUTPUT: «｢abc｣␤»
-    say 'abcdefg' ~~ /\w **  2..5/;  # OUTPUT: «｢abcde｣␤»
-    say 'abcdefg' ~~ /\w ** 2^..^5/; # OUTPUT: «｢abcd｣␤»
-    say 'abcdefg' ~~ /\w ** ^3/;     # OUTPUT: «｢ab｣␤»
-    say 'abcdefg' ~~ /\w ** 1..*/;   # OUTPUT: «｢abcdefg｣␤»
-    =end code
+=begin code
+say 'abcdefg' ~~ /\w ** 4/;      # OUTPUT: «｢abcd｣␤»
+say 'a'       ~~ /\w **  2..5/;  # OUTPUT: «Nil␤»
+say 'abc'     ~~ /\w **  2..5/;  # OUTPUT: «｢abc｣␤»
+say 'abcdefg' ~~ /\w **  2..5/;  # OUTPUT: «｢abcde｣␤»
+say 'abcdefg' ~~ /\w ** 2^..^5/; # OUTPUT: «｢abcd｣␤»
+say 'abcdefg' ~~ /\w ** ^3/;     # OUTPUT: «｢ab｣␤»
+say 'abcdefg' ~~ /\w ** 1..*/;   # OUTPUT: «｢abcdefg｣␤»
+=end code
 
 Only basic literal syntax for the right-hand side of the quantifier
 is supported, to avoid ambiguities with other regex constructs. If you need
@@ -550,16 +550,16 @@ single letter to match the C<\w+> expression at the end of the line.
 
 By default, quantifiers request a greedy match:
 
-    =begin code
-    'abababa' ~~ /a .* a/ && say ~$/;   # OUTPUT: «abababa␤»
-    =end code
+=for code
+'abababa' ~~ /a .* a/ && say ~$/;   # OUTPUT: «abababa␤»
+
 
 You can attach a C<?> modifier to the quantifier to enable frugal
 matching:
 
-    =begin code
-    'abababa' ~~ /a .*? a/ && say ~$/;   # OUTPUT: «aba␤»
-    =end code
+=for code
+'abababa' ~~ /a .*? a/ && say ~$/;   # OUTPUT: «aba␤»
+
 
 You can also enable frugal matching for general quantifiers:
 
@@ -888,6 +888,38 @@ lookahead and lookbehind assertions.
 Technically, anchors are also zero-width assertions, and they can look
 both ahead and behind.
 
+=head2 X«Lookaround assertions|regex,positive lookaround assertion;regex,negative lookaround assertion»
+
+Lookaround assertions work both ways. They test for a match at the current
+position, but they don't consume any characters.
+
+=begin code
+my regex key {^^ <![#-]> \d+ }
+say "333" ~~ &key;                  # OUTPUT: «｢333｣␤»
+say '333$' ~~ m/ \d+ <?[$]>/;       # OUTPUT: «｢333｣␤»
+say '$333' ~~ m/^^ <?[$]> . \d+ /;  # OUTPUT: «｢$333｣␤»
+=end code
+
+They can be positive or negative: C<![]> is negative, while C<?[]> is
+positive; the square brackets contain the characters or backslashed
+character classes to be tested.
+
+You can also use predefined character classes directly, and Unicode properties
+preceded by a colon:
+
+=for code
+say '333' ~~ m/^^ <?alnum> \d+ /;          # OUTPUT: «｢333｣␤»
+say '333' ~~ m/^^ <?:Nd> \d+ /;            # OUTPUT: «｢333｣␤»
+say '333' ~~ m/^^ <!:L> \d+ /;             # OUTPUT: «｢333｣␤»
+say '333' ~~ m/^^ \d+ <!:Script<Tamil>> /; # OUTPUT: «｢33｣␤»
+
+
+In the first two cases, the character class matches, but does not consume,
+the first digit, which is then consumed by the expression; in the third, the
+negative lookaround assertion behaves in the same way. In the fourth
+statement the last digit is matched but not consumed, thus the match includes
+only the first two digits.
+
 =head2 X<Lookahead assertions|regex,before>
 
 To check that a pattern appears before another pattern, use a