Minor amendments
threadless-screw committed Aug 5, 2019
1 parent 424e055 commit 856e194
Showing 1 changed file with 35 additions and 49 deletions.
84 changes: 35 additions & 49 deletions doc/Language/regexes.pod6
@@ -1588,7 +1588,7 @@
Instead of using a literal pattern for a regex match, you can use a variable
that holds that pattern. This variable can then be 'interpolated' into a regex,
such that its appearance in the regex is replaced with the pattern that it
holds. The advantage of using interpolation this way, is that the pattern need
-not be hardcoded in the source of your Perl6 program, but may instead be
+not be hardcoded in the source of your Perl 6 program, but may instead be
variable and generated at runtime.
There are four different ways of interpolating a variable into a regex as a
@@ -1599,49 +1599,38 @@
pattern, which may be summarized as follows:
Syntax | Description
===============+===========================================================
C«$variable» | Interpolates stringified contents of variable literally.
-C«$(code)» | Runs Perl6 code inside the regex, and interpolates the
+C«$(code)» | Runs Perl 6 code inside the regex, and interpolates the
| stringified return value literally.
C«<$variable>» | Interpolates stringified contents of variable as a regex.
-C«<{code}>» | Runs Perl6 code inside the regex, and interpolates the
+C«<{code}>» | Runs Perl 6 code inside the regex, and interpolates the
| stringified return value as a regex.
=end table
Let's start with the first two syntactical forms: C«$variable» and C«$(code)».
These forms will interpolate the stringified value of the variable or the
stringified return value of the code literally, provided that the respective
-value isn't a L<Regex object|/type/Regex>. If the value is a Regex, it will not
-be stringified, but instead be interpolated as such. 'Literally' means
+value isn't a L<C<Regex>|/type/Regex> object. If the value is a C<Regex>, it
+will not be stringified, but instead be interpolated as such. 'Literally' means
I<strictly literally>, that is: as if the respective stringified value is quoted
with a basic C<Q> string L<C<Q[...]>|/language/quoting#Literal_strings:_Q>.
Consequently, the stringified value will not itself undergo any further
interpolation.
For C«$variable» this means the following:
my $string = 'Is this a regex or a string: 123\w+False$pattern1 ?';
my $pattern1 = 'string';
my $pattern2 = '\w+';
-my $pattern3 = 'gnirts';
-my $pattern4 = '$pattern1';
my $number = 123;
-my $bool = True;
my $regex = /\w+/;
-my sub f1 { return Q[$pattern1] };
say $string.match: / 'string' /; # [1] OUTPUT: 「string」
say $string.match: / $pattern1 /; # [2] OUTPUT: 「string」
say $string.match: / $pattern2 /; # [3] OUTPUT: 「\w+」
say $string.match: / $regex /; # [4] OUTPUT: 「Is」
say $string.match: / $number /; # [5] OUTPUT: 「123」
-say $string.match: / $pattern3.flip /; # [6] OUTPUT: Nil
-say $string.match: / "$pattern3.flip()" /; # [7] OUTPUT: 「string」
-say $string.match: / $($pattern3.flip) /; # [8] OUTPUT: 「string」
-say $string.match: / $([~] $pattern3.comb.reverse) /; # [9] OUTPUT: 「string」
-say $string.match: / $(!$bool) /; # [10] OUTPUT: 「False」
-say $string.match: / $pattern4 /; # [11] OUTPUT: 「$pattern1」
-say $string.match: / $(f1) /; # [12] OUTPUT: 「$pattern1」
In this example, the statements C<[1]> and C<[2]> are equivalent and meant to
illustrate a plain case of regex interpolation. Since unescaped/unquoted
alphabetic characters in a regex match literally, the single quotes in the regex
@@ -1650,9 +1639,27 @@
to emphasize the correspondence between the first two statements. Statement
C<[3]> unambiguously shows that the string pattern held by C<$pattern2> is
interpreted literally, and not as a regex. In case it would have been
interpreted as a regex, it would have matched the first word of C<$string>, i.e.
-C<「Is」>, as can be seen in statement <[4]>. Statement C<[5]> shows how the
+C<「Is」>, as can be seen in statement C<[4]>. Statement C<[5]> shows how the
stringified number is used as a match pattern.
+This code exemplifies the use of the C«$(code)» syntax:
+my $string = 'Is this a regex or a string: 123\w+False$pattern1 ?';
+my $pattern1 = 'string';
+my $pattern3 = 'gnirts';
+my $pattern4 = '$pattern1';
+my $bool = True;
+my sub f1 { return Q[$pattern1] };
+say $string.match: / $pattern3.flip /; # [6] OUTPUT: Nil
+say $string.match: / "$pattern3.flip()" /; # [7] OUTPUT: 「string」
+say $string.match: / $($pattern3.flip) /; # [8] OUTPUT: 「string」
+say $string.match: / $([~] $pattern3.comb.reverse) /; # [9] OUTPUT: 「string」
+say $string.match: / $(!$bool) /; # [10] OUTPUT: 「False」
+say $string.match: / $pattern4 /; # [11] OUTPUT: 「$pattern1」
+say $string.match: / $(f1) /; # [12] OUTPUT: 「$pattern1」
Statement C<[6]> does not work as probably intended. To the human reader, the
dot C<.> may seem to represent the L<method call operator|/language/operators#methodop_.>,
but since a dot is not a valid character for an L<ordinary identifier|/language/syntax#Ordinary_identifiers>,
@@ -1663,7 +1670,7 @@
straightforward L<string interpolation|/language/quoting#Interpolation:_qq>
from the regex as in statement C<[7]> (note that the inclusion of the call
operator C<()> is key here), or by using the second syntax form from the above
table as in statement C<[8]>, in which case the match pattern C<string> first
-emerges as the return value of the C<flip> method call. Since general Perl6
+emerges as the return value of the C<flip> method call. Since general Perl 6
code may be run from within the parentheses of C<$( )>, the same effect can
also be achieved with a bit more effort, like in statement C<[9]>. Statement
C<[10]> illustrates how the stringified version of the code's return value (the
@@ -1676,21 +1683,21 @@
and C«$(code)» provide for a strictly literal match of the variable or return
value.
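The strictly literal behaviour summarized above is easiest to see with a pattern that contains regex metacharacters. The following is only a minimal sketch: the variable C<$needle> and the sample strings are hypothetical and do not come from the documentation being changed.

```raku
# Hypothetical pattern containing '+', which would be a quantifier in regex syntax.
my $needle = '1+1';

# $variable: the '+' is matched as a literal character.
say '1+1=2' ~~ / $needle /;            # OUTPUT: 「1+1」

# No literal '1+1' occurs in this string, so the match fails,
# even though the regex /1+1/ would have matched '111'.
say '111=3' ~~ / $needle /;            # OUTPUT: Nil

# $(code): the return value of the code is likewise matched literally.
say '1+1=2' ~~ / $( '1' ~ '+1' ) /;    # OUTPUT: 「1+1」
```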
Now consider the second two syntactical forms from the table above:
-C«<$variable>» and C«<${code}>». These forms will stringify the value of the
+C«<$variable>» and C«<{code}>». These forms will stringify the value of the
variable or the return value of the code and interpolate it as a regex. If the
-respective value is a Regex, it is interpolated as such:
+respective value is a C<Regex>, it is interpolated as such:
my $string = 'Is this a regex or a string: 123\w+$x ?';
my $pattern1 = '\w+';
my $number = 123;
my sub f1 { return /s\w+/ };
-say $string.match: / <$pattern1> /; # [1] OUTPUT: 「Is」
-say $string.match: / <$number> /; # [2] OUTPUT: 「123」
-say $string.match: / <{ f1 }> /; # [3] OUTPUT: 「string」
+say $string.match: / <$pattern1> /; # OUTPUT: 「Is」
+say $string.match: / <$number> /; # OUTPUT: 「123」
+say $string.match: / <{ f1 }> /; # OUTPUT: 「string」
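Because these two forms treat the stringified value as regex source, a pattern can be assembled at runtime. A minimal sketch, assuming a hypothetical C<$digits> count, a hypothetical C<$pattern> variable and a made-up sample string, none of which appear in the original text:

```raku
# Hypothetical: number of consecutive digits to require, known only at runtime.
my $digits = 3;

# <{code}>: the code returns the string '\d\d\d', which is interpolated as a regex.
say 'ab1234' ~~ / <{ '\d' x $digits }> /;    # OUTPUT: 「123」

# <$variable>: the same pattern, stored in a variable first.
my $pattern = '\d' x $digits;
say 'ab1234' ~~ / <$pattern> /;              # OUTPUT: 「123」

# For contrast, plain $variable looks for the literal characters '\d\d\d'.
say 'ab1234' ~~ / $pattern /;                # OUTPUT: Nil
```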
-Importantly, 'interpolated as a regex' means interpolated/inserted into the
-target Regex without protective quoting. Consequently, if the value of the
+Importantly, 'to interpolate as a regex' means to interpolate/insert into the
+target regex without protective quoting. Consequently, if the value of the
variable C<$variable1> is itself of the form C<$variable2>, evaluation of
C«<$variable1>» or C«<{ $variable1 }>» inside a target regex C</.../> will cause
the target regex to assume the form C</$variable2/>. As described above, the
@@ -1711,32 +1718,11 @@
C<$variable2>:
# /<$variable1>/ ==> /\w+/
say $string.match: /<$variable1>/; # OUTPUT: 「Mindfuck」
-Note: it may be desired to run arbitrary code from within the regex I<without>
-making use of its return value inside the regex. This may, for instance, come in
-handy when debugging a regex or figuring out just how it matches. In such a
-case, rather than (ab)using either C«$($pattern)» or C«<{$pattern}>», you may
-simply use C<{ }> to insert a code block:
-my sub nplus1($n) {$n +1}
-my regex nnplus1 { ^ (\d+) { say $0 } <{ nplus1($0) }> $ }
-say so '23' ~~ &nnplus1; # Output: 「23」, 「2」, True
-say so '22' ~~ &nnplus1; # Output: 「22」, 「2」, False
-Here the C<{ say $0 }> block was included to provide insight into the matching
-process: it prints the first positional capture C<$0> each time regex evaluation
-reaches its position, and so reveals that the matcher is
-L<backtracking|/language/regexes#Backtracking> in its attempts to match the
-regex to the target strings. Of course, the C<say> statement could alternatively
-have been included in the C«<{ }>» construct, notably I<before> the function
-call C<nplus1($0)> to avoid its own return value (i.e C<True>) from being
-returned from the C«<{ }>» construct instead of the return value of C<nplus1()>,
-but for illustrative purposes the two constructs are shown side by side.
When an array variable is interpolated into a regex, the regex engine handles it
like a C<|> alternative of the regex elements (see the documentation on
L<embedded lists|/language/regexes#Quoted_lists_are_LTM_matches>, above). The
interpolation rules for individual elements are the same as for scalars, so
-strings and numbers match literally, and L<Regex|/type/Regex> objects match as
+strings and numbers match literally, and L<C<Regex>|/type/Regex> objects match as
regexes. Just as with ordinary C<|> interpolation, the longest match succeeds:
my @a = '2', 23, rx/a.+/;
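A minimal sketch of the array interpolation described above, using hypothetical arrays (C<@words>, C<@ops>) and sample strings that are not taken from the original example:

```raku
# Hypothetical list of alternatives; they behave like 'ab' | 'abcd' | 'abc'.
my @words = 'ab', 'abcd', 'abc';

# The longest alternative that matches wins, regardless of its position in the array.
say 'abcdef' ~~ / @words /;        # OUTPUT: 「abcd」

# String elements match literally, so '+' and '*' are not quantifiers here.
my @ops = '+', '*';
say '2*3' ~~ / \d @ops \d /;       # OUTPUT: 「2*3」
```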