
Commit 775f199

committed
[regexes] describe capturing and groups
1 parent 7fba5cf commit 775f199


1 file changed

+62
-1
lines changed


lib/Language/regexes.pod

Lines changed: 62 additions & 1 deletion
@@ -347,7 +347,68 @@ non-whitespace characters.

=head1 Grouping and Capturing

350-
TODO
In regular (non-regex) Perl 6, you can use parentheses to group things
together, usually to override operator precedence:

    say 1 + 4 * 2;      # 9, because it is parsed as 1 + (4 * 2)
    say (1 + 4) * 2;    # 10

The same grouping facility is available in regexes:

    / a || b c /        # matches 'a' or 'bc'
    / ( a || b ) c /    # matches 'ac' or 'bc'

Grouping also controls what a quantifier applies to:

    / a b+ /        # matches an 'a' followed by one or more 'b's
    / (a b)+ /      # matches one or more sequences of 'ab'
    / (a || b)+ /   # matches a sequence of 'a's and 'b's, at least one character long
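
The effect of grouping on quantifiers can be checked with a short sketch. It assumes a current Perl 6 (Raku) compiler; the variable names are illustrative only, and the C<^> and C<$> anchors are added here so that the whole string has to match.

```raku
# Anchored with ^ and $ so that the entire string must match.
my $plain   = ('abbb' ~~ / ^ ab+ $ /).so;          # 'a' followed by one or more 'b's
my $no-rep  = ('abab' ~~ / ^ ab+ $ /).so;          # the + applies to 'b' only
my $grouped = ('abab' ~~ / ^ (ab)+ $ /).so;        # the + applies to the whole group
my $either  = ('abba' ~~ / ^ (a || b)+ $ /).so;    # any run of 'a's and 'b's

say $plain;     # True
say $no-rep;    # False
say $grouped;   # True
say $either;    # True
```

Without the parentheses the quantifier binds only to the atom directly before it, which is why C<'abab'> fails against the ungrouped pattern.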

=head2 Capturing

Round parentheses don't just group, they also I<capture>; that is, they
make the string matched by the grouped part available:

    my $str = 'number 42';
    if $str ~~ /'number ' (\d+) / {
        say "The number is $0";
    }

Pairs of parentheses are numbered left to right, starting from zero.

    if 'abc' ~~ /(a) b (c)/ {
        say "0: $0; 1: $1";     # 0: a; 1: c
    }

The C<$0>, C<$1>, etc. syntax is just shorthand; these captures
are canonically available from the match object C<$/> by using it as a list,
so C<$0> is a short way to write C<$/[0]>.
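
As a small illustration of that equivalence, the following sketch reads the same captures through both forms. The variable names are hypothetical, and the captures are stringified with C<~> so they can be compared as plain strings.

```raku
my ($first, $second, $by-index);
if 'abc' ~~ / (a) b (c) / {
    $first    = ~$0;       # shorthand form
    $by-index = ~$/[0];    # explicit positional access on the match object
    $second   = ~$/[1];    # the same capture that $1 refers to
}

say $first eq $by-index;   # True
say "$first, $second";     # a, c
```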

Coercing the match object to a list gives an easy way to programmatically
access all elements:

    if 'abc' ~~ /(a) b (c)/ {
        say $/.list.join: ', '  # a, c
    }

=head2 Non-capturing grouping

The parentheses in regexes perform a double role: they group the regex
elements inside, and they capture what is matched by the sub-regex inside.

To get only the grouping behavior, you can use square brackets C<[ ... ]> instead.

    if 'abc' ~~ / [a||b] (c) / {
        say ~$0;                # c
    }

If you do not need the captures, using non-capturing groups provides three
benefits: it communicates the intent more clearly, it makes it easier to count
the capturing groups that you do care about, and it is a bit faster.
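
The counting benefit can be seen by comparing the number of positional captures in both variants. This is only a sketch with illustrative variable names; it does not measure the speed difference.

```raku
my $with-parens   = 'abc' ~~ / (a || b) (c) /;
my $with-brackets = 'abc' ~~ / [a || b] (c) /;

say $with-parens.list.elems;     # 2
say ~$with-parens[1];            # c
say $with-brackets.list.elems;   # 1
say ~$with-brackets[0];          # c
```

With the square brackets, the capture of interest is the only one, so it is available as index 0 rather than index 1.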

=head2 Capture Numbers

TODO: describe how alternations affect capture numbers; nested captures

=head1 Adverbs

