@@ -1813,7 +1813,7 @@ This leaves us with
which is fine if you are only processing one line. But if you're processing
a whole file, suddenly the regex parses

-=begin code :skip-test
+=begin code :lang<text>
[with a
newline in between]
=end code
@@ -1921,7 +1921,7 @@ the rest.
Since it is possible to execute a piece of code within a regular expression, it is also possible
to inspect the L<Match|/type/Match> object within the regular expression itself:

-=begin code :preamble<my $string = '';>
+=begin code :preamble<my $string = '';>
my $iteration = 0;
sub show-captures( Match $m ){
    my Str $result_split;
@@ -1937,63 +1937,63 @@ sub show-captures( Match $m ){
$string ~~ /(.+)(SQL) (.+) $1 (.+) { show-captures( $/ ); }/;
=end code

-The C<show_captures> method will dump all the elements of C<$/> producing
+The C<show-captures> sub dumps all the elements of C<$/>, producing
the following output:

-=for code :lang<output>
-=== Iteration 1 ===
-Capture 0 = Postgre
-Capture 1 = SQL
-Capture 2 = is an
-[Postgre][SQL][ is an ]
+=for code :lang<text>
+=== Iteration 1 ===
+Capture 0 = Postgre
+Capture 1 = SQL
+Capture 2 = is an
+[Postgre][SQL][ is an ]

showing that the string has been split around the second occurrence of I<SQL>, that
is, the repetition of the second capture (C<$/[1]>).

With that in place, it is now possible to see how the engine backtracks
-to find the above match: it does suffice to move the C<show_captures>
+to find the above match: it suffices to move the C<show-captures> call
into the middle of the regular expression, right before the repetition of the
second capture, C<$1>, to see it in action:

-=begin code
+=begin code :preamble<my $string = '';>
my $iteration = 0;
-sub show_captures( Match $m ){
-    my Str $result_split;
+sub show-captures( Match $m ){
+    my Str $result-split;
    say "\n=== Iteration {++$iteration} ===";
    for $m.list.kv -> $i, $capture {
        say "Capture $i = $capture";
-        $result_split ~= '[' ~ $capture ~ ']';
+        $result-split ~= '[' ~ $capture ~ ']';
    }

-    say $result_split;
+    say $result-split;
}

-$string ~~ / (.+)(SQL) (.+) { show_captures( $/ ); } $1 /;
+$string ~~ / (.+)(SQL) (.+) { show-captures( $/ ); } $1 /;
=end code

The output will be much more verbose and will show several iterations, with the last one
being the I<winning> one. The following is an excerpt of the output:

-=begin code :skip-test
-=== Iteration 1 ===
-Capture 0 = PostgreSQL is an
-Capture 1 = SQL
-Capture 2 = database!
-[PostgreSQL is an ][SQL][ database!]
-
-=== Iteration 2 ===
-Capture 0 = PostgreSQL is an
-Capture 1 = SQL
-Capture 2 = database
-[PostgreSQL is an ][SQL][ database]
-
-...
-
-=== Iteration 24 ===
-Capture 0 = Postgre
-Capture 1 = SQL
-Capture 2 = is an
-[Postgre][SQL][ is an ]
+=begin code :lang<text>
+=== Iteration 1 ===
+Capture 0 = PostgreSQL is an
+Capture 1 = SQL
+Capture 2 = database!
+[PostgreSQL is an ][SQL][ database!]
+
+=== Iteration 2 ===
+Capture 0 = PostgreSQL is an
+Capture 1 = SQL
+Capture 2 = database
+[PostgreSQL is an ][SQL][ database]
+
+...
+
+=== Iteration 24 ===
+Capture 0 = Postgre
+Capture 1 = SQL
+Capture 2 = is an
+[Postgre][SQL][ is an ]
=end code

In the first iteration the I<SQL> part of I<PostgreSQL> is kept within the word: that is not what
@@ -2005,7 +2005,7 @@ After several iterations, the final result is match.
It is worth noting that the final iteration is number I<24>, which is one more than
the distance, in number of chars, from the end of the string to the first I<SQL> occurrence:

-=begin code
+=begin code :preamble<my $string = '';>
say $string.chars - $string.index: 'SQL'; # OUTPUT: 23
=end code
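A lighter way to observe the same backtracking, sketched in Raku: instead of dumping every capture, the embedded block only counts how often the engine reaches it. This assumes the sample string is C<PostgreSQL is an SQL database!> (inferred from the captures printed above, since these hunks never spell it out), and the variable names C<$iterations> and C<$m> are illustrative.

```raku
# Assumption: the sample string used throughout this section.
my $string = 'PostgreSQL is an SQL database!';
my $iterations = 0;

# The block runs each time the engine reaches it, i.e. once for every
# candidate split it tries before checking the back-reference $1.
my $m = $string ~~ / (.+)(SQL) (.+) { $iterations++ } $1 /;

say ~$m;          # PostgreSQL is an SQL
say $iterations;  # number of attempts the engine needed
```

The section reports 24 attempts for the C<show-captures> version. The exact figure reflects the order in which Rakudo backtracks, so treat the printed count as illustrative rather than guaranteed.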
@@ -2019,7 +2019,7 @@ it in those cases where the matching can be found I<forward> only.
With regard to the above example, disabling backtracking means
the regular expression will not have any chance to match:

-=begin code
+=begin code :preamble<my $string = '';>
say $string ~~ /(.+)(SQL) (.+) $1/; # OUTPUT: 「PostgreSQL is an SQL」
say $string ~~ / :r (.+)(SQL) (.+) $1/; # OUTPUT: Nil
=end code
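The remark that disabling backtracking only pays off when the match can be found I<forward> can be illustrated with a small Raku sketch. It uses the same assumed sample string, and the variable names are illustrative. A greedy C<.+> must give characters back before C<SQL> can match, while C<\w+> stops on its own.

```raku
# Assumption: the sample string used throughout this section.
my $string = 'PostgreSQL is an SQL database!';

# Greedy (.+) first swallows the rest of the string; (SQL) can then only
# match if (.+) gives characters back, which :r forbids, so this fails.
my $needs-backtracking = $string ~~ / :r (.+)(SQL) /;

# \w+ stops by itself at the first non-word character, so nothing ever
# has to be given back and :r does not get in the way.
my $forward-only = $string ~~ / :r (\w+) \s (\w+) /;

say so $needs-backtracking;  # False
say ~$forward-only;          # PostgreSQL is
```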
@@ -2046,25 +2046,25 @@ match fails.
It is possible, again, to inspect what the engine does by
introducing a piece of code that dumps the captures inside the regular expression:

-=begin code
+=begin code :preamble<my $string = '';>
my $iteration = 0;
-sub show_captures( Match $m ){
-    my Str $result_split;
+sub show-captures( Match $m ){
+    my Str $result-split;
    say "\n=== Iteration {++$iteration} ===";
    for $m.list.kv -> $i, $capture {
        say "Capture $i = $capture";
-        $result_split ~= '[' ~ $capture ~ ']';
+        $result-split ~= '[' ~ $capture ~ ']';
    }

-    say $result_split;
+    say $result-split;
}

-$string ~~ / (SQL) (.+) { show_captures( $/ ); } $1 /;
+$string ~~ / (SQL) (.+) { show-captures( $/ ); } $1 /;
=end code

which produces a rather simple output:

-=begin code :skip-test
+=begin code :lang<text>
=== Iteration 1 ===
Capture 0 = SQL
Capture 1 = is an SQL database!
@@ -2079,25 +2079,25 @@ Capture 1 = database!
Even using the L<:r|/language/regexes#ratchet> adverb to prevent backtracking will not
change things:

-=begin code
+=begin code :preamble<my $string = '';>
my $iteration = 0;
-sub show_captures( Match $m ){
-    my Str $result_split;
+sub show-captures( Match $m ){
+    my Str $result-split;
    say "\n=== Iteration {++$iteration} ===";
    for $m.list.kv -> $i, $capture {
        say "Capture $i = $capture";
-        $result_split ~= '[' ~ $capture ~ ']';
+        $result-split ~= '[' ~ $capture ~ ']';
    }

-    say $result_split;
+    say $result-split;
}

-$string ~~ / :r (SQL) (.+) { show_captures( $/ ); } $1 /;
+$string ~~ / :r (SQL) (.+) { show-captures( $/ ); } $1 /;
=end code

and the output will remain the same:

-=begin code :skip-test
+=begin code :lang<text>
=== Iteration 1 ===
Capture 0 = SQL
Capture 1 = is an SQL database!
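Whether or not backtracking is allowed, this particular pattern has no way to succeed. The Raku sketch below compares both forms. It uses the same assumed sample string, and the variable names are illustrative. Backtracking only lets the engine try more splits, and none of them is followed by a copy of itself.

```raku
# Assumption: the sample string used throughout this section.
my $string = 'PostgreSQL is an SQL database!';

# (.+) would have to be followed by an exact copy of itself ($1), which
# never happens here: backtracking only makes the engine try more splits.
my $with-backtracking    = $string ~~ / (SQL) (.+) $1 /;
my $without-backtracking = $string ~~ / :r (SQL) (.+) $1 /;

say so $with-backtracking;     # False
say so $without-backtracking;  # False
```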