Commit 79b69a8

[regexes] named captures, subrules
1 parent e3fec75 commit 79b69a8

1 file changed: +83 -0 lines changed

lib/Language/regexes.pod

Lines changed: 83 additions & 0 deletions
@@ -433,6 +433,89 @@ Captures can be nested, in which case they are numbered per level
        say "Inner: $0[0] and $0[1]"; # Inner: b and c
    }

=head2 Named Captures

Instead of numbering captures, you can also give them names. The generic,
and slightly verbose, way of naming a capture looks like this:

    if 'abc' ~~ / $<myname> = [ \w+ ] / {
        say ~$<myname>;     # abc
    }

Accessing the named capture with C<< $<myname> >> is a shortcut for indexing
the match object as a hash, that is, C<$/{ 'myname' }> or C<< $/<myname> >>.
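
As a small sketch of that equivalence (the extra C<say> lines are
illustrative additions, not part of the example above), each of the
following forms should print the same captured text:

    if 'abc' ~~ / $<myname> = [ \w+ ] / {
        say ~$<myname>;         # shortcut form
        say ~$/{ 'myname' };    # explicit hash subscript on the match object
        say ~$/<myname>;        # quoted-key hash subscript on the match object
    }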

Coercing the match object to a hash gives you easy programmatic access to all
named captures:

    if 'count=23' ~~ / $<variable>=\w+ '=' $<value>=\w+ / {
        my %h = $/.hash;
        say %h.keys.sort.join: ', ';        # value, variable
        say %h.values.sort.join: ', ';      # 23, count
        for %h.kv -> $k, $v {
            say "Found value '$v' with key '$k'";
            # outputs two lines (hash order is not guaranteed):
            #   Found value 'count' with key 'variable'
            #   Found value '23' with key 'value'
        }
    }

A more convenient way to get named captures is discussed in the next
section.

=head1 Subrules

Just as you can put pieces of code into subroutines, you can put pieces of
regex into named rules.

    my regex line { \N*\n }
    if "abc\ndef" ~~ /<line> def/ {
        say "First line: ", $<line>.chomp;      # First line: abc
    }

A named regex is declared with C<my regex thename { body here }> and
called with C<< <thename> >>. Calling a named regex also installs a named
capture with the same name.

To give the capture a different name, use the syntax
C<< <capturename=regexname> >>. If no capture at all is desired, a
leading dot will suppress it: C<< <.regexname> >>.
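
A minimal sketch of capture renaming, reusing the C<line> regex from the
example above; the capture name C<first> is made up for illustration:

    my regex line { \N*\n }

    # <first=line> calls the regex 'line' but stores the result under 'first'
    if "abc\ndef" ~~ / <first=line> def / {
        say $<first>.chomp;         # abc
        say $<line>.defined;        # False, no capture named 'line' was installed
    }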

Here is a more complete (yet still fairly limited) example that parses INI
files:

    my regex header { \s* '[' (\w+) ']' \h* \n+ }
    my regex identifier { \w+ }
    my regex kvpair { \s* <key=identifier> '=' <value=identifier> \n+ }
    my regex section {
        <header>
        <kvpair>*
    }

    my $contents = q:to/EOI/;
    [passwords]
    jack=password1
    joy=muchmoresecure123
    [quotas]
    jack=123
    joy=42
    EOI

    my %config;
    if $contents ~~ /<section>*/ {
        for $<section>.list -> $section {
            my %section;
            for $section<kvpair>.list -> $p {
                %section{ $p<key> } = ~$p<value>;
            }
            %config{ $section<header>[0] } = %section;
        }
    }
    say %config.perl;
    # ("passwords" => {"jack" => "password1", "joy" => "muchmoresecure123"},
    #    "quotas" => {"jack" => "123", "joy" => "42"}).hash

=head1 Adverbs

TODO
