Commit 424e055

Further revision; clarification of cascading interpolation
1 parent 553d4bc commit 424e055

doc/Language/regexes.pod6

Lines changed: 63 additions & 47 deletions
@@ -1603,7 +1603,7 @@ pattern, which may be summarized as follows:
 | stringified return value literally.
 C«<$variable>» | Interpolates stringified contents of variable as a regex.
 C«<{code}>» | Runs Perl6 code inside the regex, and interpolates the
-| return value as a regex.
+| stringified return value as a regex.
 
 =end table
 
@@ -1614,26 +1614,33 @@ value isn't a L<Regex object|/type/Regex>. If the value is a Regex, it will not
 be stringified, but instead be interpolated as such. 'Literally' means
 I<strictly literally>, that is: as if the respective stringified value is quoted
 with a basic C<Q> string L<C<Q[...]>|/language/quoting#Literal_strings:_Q>.
-Consequently, the stringified value will not itself undergo any (second-level)
+Consequently, the stringified value will not itself undergo any further
 interpolation.
 
-my $string = 'Is this a regex or a string: 123\w+False ?';
+my $string = 'Is this a regex or a string: 123\w+False$pattern1 ?';
 my $pattern1 = 'string';
 my $pattern2 = '\w+';
 my $pattern3 = 'gnirts';
+my $pattern4 = '$pattern1';
 my $number = 123;
 my $bool = True;
+my $regex = /\w+/;
+my sub f1 { return Q[$pattern1] };
 
-say $string.match: / 'string' /; # [1] OUTPUT: 「string」
-say $string.match: / $pattern1 /; # [2] OUTPUT: 「string」
-say $string.match: / $pattern2 /; # [3] OUTPUT: 「\w+」
-say $string.match: / $number /; # [4] OUTPUT: 「123」
+say $string.match: / 'string' /; # [1] OUTPUT: 「string」
+say $string.match: / $pattern1 /; # [2] OUTPUT: 「string」
+say $string.match: / $pattern2 /; # [3] OUTPUT: 「\w+」
+say $string.match: / $regex /; # [4] OUTPUT: 「Is」
+say $string.match: / $number /; # [5] OUTPUT: 「123」
 
-say $string.match: / $pattern3.flip /; # [5] OUTPUT: Nil
-say $string.match: / "$pattern3.flip()" /; # [6] OUTPUT: 「string」
-say $string.match: / $($pattern3.flip) /; # [7] OUTPUT: 「string」
-say $string.match: / $([~] $pattern3.comb.reverse) /; # [8] OUTPUT: 「string」
-say $string.match: / $(!$bool) /; # [9] OUTPUT: 「False」
+say $string.match: / $pattern3.flip /; # [6] OUTPUT: Nil
+say $string.match: / "$pattern3.flip()" /; # [7] OUTPUT: 「string」
+say $string.match: / $($pattern3.flip) /; # [8] OUTPUT: 「string」
+say $string.match: / $([~] $pattern3.comb.reverse) /; # [9] OUTPUT: 「string」
+say $string.match: / $(!$bool) /; # [10] OUTPUT: 「False」
+
+say $string.match: / $pattern4 /; # [11] OUTPUT: 「$pattern1」
+say $string.match: / $(f1) /; # [12] OUTPUT: 「$pattern1」
 
 In this example, the statements C<[1]> and C<[2]> are equivalent and meant to
 illustrate a plain case of regex interpolation. Since unescaped/unquoted
@@ -1643,62 +1650,71 @@ to emphasize the correspondence between the first two statements. Statement
 C<[3]> unambiguously shows that the string pattern held by C<$pattern2> is
 interpreted literally, and not as a regex. In case it would have been
 interpreted as a regex, it would have matched the first word of C<$string>, i.e.
-C<「Is」>. Statement C<[4]> shows how the stringified number is used as a match
-pattern.
+C<「Is」>, as can be seen in statement <[4]>. Statement C<[5]> shows how the
+stringified number is used as a match pattern.
 
-Statement C<[5]> does not work as intended. To the human reader, the dot C<.>
-may seem to represent the L<method call operator|/language/operators#methodop_.>,
-but given the regex context the compiler will parse it as the regex wildcard
+Statement C<[6]> does not work as probably intended. To the human reader, the
+dot C<.> may seem to represent the L<method call operator|/language/operators#methodop_.>,
+but since a dot is not a valid character for an L<ordinary identifier|/language/syntax#Ordinary_identifiers>,
+and given the regex context, the compiler will parse it as the regex wildcard
 L<.|/language/regexes#Wildcards> that matches any character. The apparent
 ambiguity may be resolved in various ways, for instance through the use of
-straightforward L<string interpolation|/language/quoting#Interpolation:_qq> from
-the regex as in statement C<[6]> (note that the inclusion of the call operator
-C<()> is key here), or by using the second syntax form from the above table as
-in statement C<[7]>, in which case the match pattern C<'string'> first emerges
-as the return value of the C<flip> method call. Since general Perl6 code may be
-run from within the parentheses of C<$( )>, the same effect can also be achieved
-with a bit more effort, like in statement C<[8]>. Statement C<[9]> illustrates
-how the stringified version of the code's return value (the boolean value
-C<False>) is matched literally.
+straightforward L<string interpolation|/language/quoting#Interpolation:_qq>
+from the regex as in statement C<[7]> (note that the inclusion of the call
+operator C<()> is key here), or by using the second syntax form from the above
+table as in statement C<[8]>, in which case the match pattern C<string> first
+emerges as the return value of the C<flip> method call. Since general Perl6
+code may be run from within the parentheses of C<$( )>, the same effect can
+also be achieved with a bit more effort, like in statement C<[9]>. Statement
+C<[10]> illustrates how the stringified version of the code's return value (the
+boolean value C<False>) is matched literally.
+
+Finally, statements C<[11]> and C<[12]> show how the value of C<$pattern4> and
+the return value of C<f1> are I<not> subject to a further round of
+interpolation. Hence, in general, after possible stringification, C«$variable»
+and C«$(code)» provide for a strictly literal match of the variable or return
+value.
 
 Now consider the second two syntactical forms from the table above:
 C«<$variable>» and C«<${code}>». These forms will stringify the value of the
 variable or the return value of the code and interpolate it as a regex. If the
-respective value is a Regex, it is interpolated as such. 'Interpolated as a
-regex' means interpolated/inserted into the target Regex without protective
-quoting. Consequently, the further evaluation of the target Regex may trigger
-the (second-level) interpolation of any variables it contains.
+respective value is a Regex, it is interpolated as such:
 
 my $string = 'Is this a regex or a string: 123\w+$x ?';
 my $pattern1 = '\w+';
 my $number = 123;
 my sub f1 { return /s\w+/ };
-my sub f2 (Str $x) { return /$x x/ };
-my sub f3 { return Q[$x] };
 
 say $string.match: / <$pattern1> /; # [1] OUTPUT: 「Is」
 say $string.match: / <$number> /; # [2] OUTPUT: 「123」
 say $string.match: / <{ f1 }> /; # [3] OUTPUT: 「string」
 
-my $x = "rege";
-say $string.match: / <{ f2($x) }> /; # [4] OUTPUT: 「regex」
-say $string.match: / <{ f3 }> /; # [5] OUTPUT: 「rege」
-
-In statement C<[4]> use is made of the function C<f2>, which acts as a (very
-simple) "regex factory": you can pass it a string variable, and it will return a
-Regex object into which the variable has been interpolated. In this case, C<f2>
-appends the letter 'x' to whatever string it is passed. The Regex that is
-returned by C<f2> is in turn inserted into target Regex by the C<{...}>
-construct. Statement C<[5]> illustrates another case of two-fold regex
-interpolation. When the target Regex is constructed, the strictly literal string
-value C<$x> is interpolated into it by the C<{...}> construct. When the Regex is
-evaluated further, the unprotected variable C<$x> is interpolated, i.e. replaced
-by the string value C<rege>, which explains the match.
+Importantly, 'interpolated as a regex' means interpolated/inserted into the
+target Regex without protective quoting. Consequently, if the value of the
+variable C<$variable1> is itself of the form C<$variable2>, evaluation of
+C«<$variable1>» or C«<{ $variable1 }>» inside a target regex C</.../> will cause
+the target regex to assume the form C</$variable2/>. As described above, the
+evaluation of this regex will then trigger further interpolation of
+C<$variable2>:
+
+my $string = Q[Mindfuck \w+ $variable1 $variable2];
+my $variable1 = Q[\w+];
+my $variable2 = Q[$variable1];
+my sub f1 { return Q[$variable2] };
+
+# /<{ f1 }>/ ==> /$variable2/ ==> / '$variable1' /
+say $string.match: / <{ f1 }> /; # OUTPUT: 「$variable1」
+
+# /<$variable2>/ ==> /$variable1/ ==> / '\w+' /
+say $string.match: /<$variable2>/; # OUTPUT: 「\w+」
+
+# /<$variable1>/ ==> /\w+/
+say $string.match: /<$variable1>/; # OUTPUT: 「Mindfuck」
 
 Note: it may be desired to run arbitrary code from within the regex I<without>
 making use of its return value inside the regex. This may, for instance, come in
 handy when debugging a regex or figuring out just how it matches. In such a
-case, rather than (ab)using either C«$($pattern) or C«<{$pattern}>», you may
+case, rather than (ab)using either C«$($pattern)» or C«<{$pattern}>», you may
 simply use C<{ }> to insert a code block:
 
 my sub nplus1($n) {$n +1}