Commit 6339327

Adds lookaround assertions closes #2009
1 parent 616e6b0 commit 6339327

1 file changed: +53 -21 lines changed

doc/Language/regexes.pod6

Lines changed: 53 additions & 21 deletions
@@ -193,12 +193,12 @@ alphabet) match C<\d>, but also digits from other scripts.
 
 Examples for digits are:
 
-=begin code :lang<text>
-U+0035 5 DIGIT FIVE
-U+0BEB ௫ TAMIL DIGIT FIVE
-U+0E53 ๓ THAI DIGIT THREE
-U+17E5 ៥ KHMER DIGIT FIVE
-=end code
+=begin code :lang<text>
+U+0035 5 DIGIT FIVE
+U+0BEB ௫ TAMIL DIGIT FIVE
+U+0E53 ๓ THAI DIGIT THREE
+U+17E5 ៥ KHMER DIGIT FIVE
+=end code
 
 =head3 X<C<\w> and C<\W>|regex,\w;regex,\W>
 
@@ -425,15 +425,15 @@ which takes a single L<Int|/type/Int> or a L<Range|/type/Range> on the right-han
 the number of times to match. If L<Range|/type/Range> is specified, the end-points specify
 the minimum and maximum number of times to match.
 
-=begin code
-say 'abcdefg' ~~ /\w ** 4/; # OUTPUT: «「abcd」␤»
-say 'a' ~~ /\w ** 2..5/; # OUTPUT: «Nil␤»
-say 'abc' ~~ /\w ** 2..5/; # OUTPUT: «「abc」␤»
-say 'abcdefg' ~~ /\w ** 2..5/; # OUTPUT: «「abcde」␤»
-say 'abcdefg' ~~ /\w ** 2^..^5/; # OUTPUT: «「abcd」␤»
-say 'abcdefg' ~~ /\w ** ^3/; # OUTPUT: «「ab」␤»
-say 'abcdefg' ~~ /\w ** 1..*/; # OUTPUT: «「abcdefg」␤»
-=end code
+=begin code
+say 'abcdefg' ~~ /\w ** 4/; # OUTPUT: «「abcd」␤»
+say 'a' ~~ /\w ** 2..5/; # OUTPUT: «Nil␤»
+say 'abc' ~~ /\w ** 2..5/; # OUTPUT: «「abc」␤»
+say 'abcdefg' ~~ /\w ** 2..5/; # OUTPUT: «「abcde」␤»
+say 'abcdefg' ~~ /\w ** 2^..^5/; # OUTPUT: «「abcd」␤»
+say 'abcdefg' ~~ /\w ** ^3/; # OUTPUT: «「ab」␤»
+say 'abcdefg' ~~ /\w ** 1..*/; # OUTPUT: «「abcdefg」␤»
+=end code
 
 Only basic literal syntax for the right-hand side of the quantifier
 is supported, to avoid ambiguities with other regex constructs. If you need
@@ -550,16 +550,16 @@ single letter to match the C<\w+> expression at the end of the line.
 
 By default, quantifiers request a greedy match:
 
-=begin code
-'abababa' ~~ /a .* a/ && say ~$/; # OUTPUT: «abababa␤»
-=end code
+=for code
+'abababa' ~~ /a .* a/ && say ~$/; # OUTPUT: «abababa␤»
+
 
 You can attach a C<?> modifier to the quantifier to enable frugal
 matching:
 
-=begin code
-'abababa' ~~ /a .*? a/ && say ~$/; # OUTPUT: «aba␤»
-=end code
+=for code
+'abababa' ~~ /a .*? a/ && say ~$/; # OUTPUT: «aba␤»
+
 
 You can also enable frugal matching for general quantifiers:
 
@@ -888,6 +888,38 @@ lookahead and lookbehind assertions.
 Technically, anchors are also zero-width assertions, and they can look
 both ahead and behind.
 
+=head2 X«Lookaround assertions|regex,positive lookaround assertion;regex,negative lookaround assertion»
+
+Lookaround assertions work both ways. They match, but they don't consume a
+character.
+
+=begin code
+my regex key {^^ <![#-]> \d+ }
+say "333" ~~ &key; # OUTPUT: «「333」␤»
+say '333$' ~~ m/ \d+ <?[$]>/; # OUTPUT: «「333」␤»
+say '$333' ~~ m/^^ <?[$]> . \d+ /; # OUTPUT: «「$333」␤»
+=end code
+
+They can be positive or negative: C<![]> is negative, while C<?[]> is
+positive; the square brackets enclose the characters or backslashed
+character classes that are going to be matched.
+
+You can use predefined character classes directly, and Unicode properties
+preceded by a colon:
+
+=for code
+say '333' ~~ m/^^ <?alnum> \d+ /; # OUTPUT: «「333」␤»
+say '333' ~~ m/^^ <?:Nd> \d+ /; # OUTPUT: «「333」␤»
+say '333' ~~ m/^^ <!:L> \d+ /; # OUTPUT: «「333」␤»
+say '333' ~~ m/^^ \d+ <!:Script<Tamil>> /; # OUTPUT: «「33」␤»
+
+
+In the first two cases, the character class matches, but does not consume,
+the first digit, which is then consumed by the expression; in the third, the
+negative lookaround assertion behaves in the same way. In the fourth
+statement the last digit is matched but not consumed, thus the match includes
+only the first two digits.
+
 =head2 X<Lookahead assertions|regex,before>
 
 To check that a pattern appears before another pattern, use a
