Commit

Merge remote-tracking branch 'upstream/traps'
threadless-screw committed Aug 13, 2019
2 parents f06c83c + 092c748 commit 78193a4
Showing 1 changed file with 58 additions and 40 deletions.
98 changes: 58 additions & 40 deletions doc/Language/traps.pod6
@@ -921,46 +921,64 @@ of the assignment operators instead:
=head1 Regexes
=head2 C«<{$x}>» vs C«$($x)»: Implicit EVAL
Sometimes you may need to match a generated string in a regex. This can be done
using C<$(…)> or C«<{…}>» syntax:
=for code
my $x = ‘ailemac’;
say ‘I ♥ camelia’ ~~ / $($x.flip) /; # OUTPUT: «「camelia」␤»
say ‘I ♥ camelia’ ~~ / <{$x.flip}> /; # OUTPUT: «「camelia」␤»
However, the latter only works I<sometimes>.
Internally C«<{…}>» EVAL-s the given string inside an anonymous regex, while
C<$(…)> lexically interpolates the given string. So C«<{…}>» immediately breaks
with more complicated inputs. For example:
=for code
my $x = ‘ailemac#’;
say ‘I ♥ #camelia’ ~~ / $($x.flip) /; # OUTPUT: «「#camelia」␤»
# ⚠ ↓↓ WRONG ↓↓ ⚠
say ‘I ♥ #camelia’ ~~ / <{$x.flip}> /;
# OUTPUT:
# ===SORRY!===
# Regex not terminated.
# at EVAL_0:1
# ------> anon regex { #camelia}⏏<EOL>
# Malformed regex
# at EVAL_0:1
# ------> anon regex { #camelia}⏏<EOL>
# expecting any of:
# infix stopper
Therefore, try not to use C«<{}>» unless you really need EVAL.
Note that even though EVAL is normally considered unsafe, in this case
it is restricted to a set of safe operations (which is why it works
without L<MONKEY-SEE-NO-EVAL|
/language/pragmas#index-entry-MONKEY-SEE-NO-EVAL-MONKEY-SEE-NO-EVAL> pragma).
In theory, careless use of C«<{}>» will only result in an exception being
thrown, and should not introduce security issues.
=head1 Interpolation constructs
Perl 6 offers several constructs to generate regexes at runtime through
interpolation (see their detailed description
L<here|/language/regexes#Regex_interpolation>). When a regex generated this way
contains only characters that match themselves, some of these constructs behave
identically, as if they were equivalent alternatives. As soon as the generated
regex contains metacharacters, however, they behave differently, which may come
as a confusing surprise.
The first two constructs that may easily be confused with each other are
C«$variable» and C«<$variable>». The former causes the (stringified) variable to
match literally, while the latter causes the (stringified) variable to match as
a regex. As long as the variable comprises only characters that, in a regex,
match themselves (i.e. alphanumeric characters and the underscore), there is no
distinction between the constructs:
    my $variable = 'camelia';
    say ‘I ♥ camelia’ ~~ / $variable /;   # OUTPUT: 「camelia」
    say ‘I ♥ camelia’ ~~ / <$variable> /; # OUTPUT: 「camelia」
But when the variable is changed to comprise regex metacharacters, i.e.
characters that are neither alphanumeric nor the underscore C<_>, the outputs
become different:
    my $variable = '#camelia';
    say ‘I ♥ #camelia’ ~~ / $variable /;   # OUTPUT: 「#camelia」
    say ‘I ♥ #camelia’ ~~ / <$variable> /; # !! Error: malformed regex
What happens here is that the string C<#camelia> contains the metacharacter
C<#>. In the context of a regex, this character must be quoted to match
literally; without quoting, the C<#> is parsed as the start of a comment that
runs until the end of the line. The comment swallows the closing delimiter of
the generated regex, which leaves the regex unterminated and thus malformed.
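The role of quoting can be illustrated with a minimal sketch. It assumes that
the content of the variable is under your control; the names C<$quoted>,
C<$literal> and C<$as-regex> are hypothetical and introduced here for
illustration only. Wrapping the text in single quotes turns the regex source
into a quoted string, so the C<#> should no longer start a comment:

=begin code
my $variable = '#camelia';
# Hypothetical helper: the same text wrapped in single quotes, so that the
# regex source handed to <...> is '#camelia' (quotes included).
my $quoted = "'" ~ $variable ~ "'";
# $variable is interpolated literally; no quoting is needed.
my $literal  = 'I ♥ #camelia' ~~ / $variable /;
# <$quoted> is interpolated as a regex whose only atom is a quoted string.
my $as-regex = 'I ♥ #camelia' ~~ / <$quoted> /;
say $literal;
say $as-regex;
=end code

Under these assumptions both statements should report the same match.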
Two other constructs that must similarly be distinguished from one another are
C«$(code)» and C«<{code}>». The former construct runs user-specified code within
the regex and interpolates the (stringified) return value literally. The latter
also runs user-specified code within the regex, but interpolates the
(stringified) return value as a regex. As before, as long as the return
value comprises only characters that match literally in a regex, there is no
distinction between the two:
    my $variable = 'ailemac';
    say ‘I ♥ camelia’ ~~ / $($variable.flip) /;   # OUTPUT: 「camelia」
    say ‘I ♥ camelia’ ~~ / <{$variable.flip}> /;  # OUTPUT: 「camelia」
But when the return value is changed to comprise regex metacharacters, the
outputs diverge:
    my $variable = 'ailema.';
    say ‘I ♥ camelia’ ~~ / $($variable.flip) /;   # OUTPUT: Nil
    say ‘I ♥ camelia’ ~~ / <{$variable.flip}> /;  # OUTPUT: 「camelia」
In this case the return value of the code is the string C<.amelia>, which
contains the metacharacter C<.>. The above attempt by C«$(code)» to match the
dot literally fails; the attempt by C«<{code}>» to match the dot as a regex
wildcard succeeds. Hence the different outputs.
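The two pairs of constructs can also be compared side by side. The following
is a sketch rather than a definitive reference; the names C<$pattern> and
C<$text> are hypothetical. It applies all four constructs to the same string
and reports only whether a match was found:

=begin code
my $variable = 'ailema.';
my $pattern  = $variable.flip;   # the string '.amelia'
my $text     = 'I ♥ camelia';
# Literal interpolation: the dot would have to occur verbatim in $text.
say so $text ~~ / $pattern /;             # False
say so $text ~~ / $($variable.flip) /;    # False
# Regex interpolation: the dot acts as a wildcard for the 'c'.
say so $text ~~ / <$pattern> /;           # True
say so $text ~~ / <{$variable.flip}> /;   # True
=end code

The constructs without angle brackets agree with each other, as do the
constructs with angle brackets, regardless of whether the string comes from a
variable or from code.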
=head2 C<|> vs C<||>: which branch will win