
Commit 092c748

New regex interpolation trap section
1 parent 98f809b commit 092c748

1 file changed: +59 -0 lines changed

doc/Language/traps.pod6

Lines changed: 59 additions & 0 deletions
@@ -921,6 +921,65 @@ of the assignment operators instead:
=head1 Regexes

=head2 Interpolation constructs

Perl 6 offers several constructs for generating regexes at runtime through
interpolation (see the
L<regex interpolation documentation|/language/regexes#Regex_interpolation>
for details). When the generated regex contains only characters that match
themselves, some of these constructs behave identically, as if they were
interchangeable. As soon as the generated regex contains metacharacters,
however, they behave differently, which may come as a confusing surprise.

The first two constructs that are easily confused with each other are
C«$variable» and C«<$variable>». The former matches the (stringified) variable
literally, while the latter matches the (stringified) variable as a regex. As
long as the variable contains only characters that match themselves in a regex
(i.e. alphanumeric characters and the underscore), the two constructs behave
the same:

    my $variable = 'camelia';
    say ‘I ♥ camelia’ ~~ / $variable /;   # OUTPUT: 「camelia」
    say ‘I ♥ camelia’ ~~ / <$variable> /; # OUTPUT: 「camelia」

But when the variable contains regex metacharacters, i.e. characters that are
neither alphanumeric nor the underscore C<_>, the outputs differ:

    my $variable = '#camelia';
    say ‘I ♥ #camelia’ ~~ / $variable /;   # OUTPUT: 「#camelia」
    say ‘I ♥ #camelia’ ~~ / <$variable> /; # !! Error: malformed regex

What happens here is that the string C<#camelia> contains the metacharacter
C<#>. In a regex, this character must be quoted to match literally; unquoted,
the C<#> is parsed as the start of a comment that runs until the end of the
line, which in turn leaves the regex unterminated and thus malformed.
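
One way to make C«<$variable>» match the same text as C«$variable» is to quote
the metacharacter inside the string that is interpolated as a regex. The
following is a minimal sketch; the variable names C<$literal> and C<$as-regex>
are illustrative and not part of the original example:

```raku
my $literal  = '#camelia';   # for $variable: every character is matched literally
my $as-regex = '\#camelia';  # for <$variable>: the backslash quotes the '#'

say 'I ♥ #camelia' ~~ / $literal /;      # OUTPUT: 「#camelia」
say 'I ♥ #camelia' ~~ / <$as-regex> /;   # OUTPUT: 「#camelia」
```

The quoting has to live in the string itself, because C«<$variable>» parses the
string's contents as regex source.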

Two other constructs that must similarly be distinguished from one another are
C«$(code)» and C«<{code}>». The former runs user-specified code within the
regex and interpolates the (stringified) return value literally. The latter
also runs user-specified code within the regex, but interpolates the
(stringified) return value as a regex. As before, as long as the return value
contains only characters that match literally in a regex, the two behave the
same:

    my $variable = 'ailemac';
    say ‘I ♥ camelia’ ~~ / $($variable.flip) /;   # OUTPUT: 「camelia」
    say ‘I ♥ camelia’ ~~ / <{$variable.flip}> /;  # OUTPUT: 「camelia」

But when the return value contains regex metacharacters, the outputs diverge:

    my $variable = 'ailema.';
    say ‘I ♥ camelia’ ~~ / $($variable.flip) /;   # OUTPUT: Nil
    say ‘I ♥ camelia’ ~~ / <{$variable.flip}> /;  # OUTPUT: 「camelia」

In this case the return value of the code is the string C<.amelia>, which
contains the metacharacter C<.>. The attempt by C«$(code)» to match the dot
literally fails; the attempt by C«<{code}>» to match the dot as a regex
wildcard succeeds. Hence the different outputs.
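
The same difference can be seen by varying the input instead of the pattern.
The sketch below reuses C<$variable> from the example above; the two input
strings are made up for illustration:

```raku
my $variable = 'ailema.';

# Interpolated as a regex, '.amelia' lets the dot match any character,
# so an input containing 'pamelia' matches as well.
say 'I ♥ pamelia' ~~ / <{$variable.flip}> /;   # OUTPUT: 「pamelia」

# Interpolated literally, '.amelia' needs an actual dot in the input.
say 'I ♥ .amelia' ~~ / $($variable.flip) /;    # OUTPUT: 「.amelia」
```

In other words, C«<{code}>» may match more than intended when the returned
string is meant as plain text, while C«$(code)» only ever matches that exact
text.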

=head2 C<|> vs C<||>: which branch will win

To match one of several possible alternatives, C<||> or C<|> will be used. But
