Commit dd8a19e

Update regexes.pod6
1 parent fad52e2 commit dd8a19e

1 file changed

+52
-52
lines changed

doc/Language/regexes.pod6

Lines changed: 52 additions & 52 deletions
@@ -1813,7 +1813,7 @@ This leaves us with
18131813
which is fine if you are only processing one line. But if you're processing
18141814
a whole file, suddenly the regex parses
18151815
1816-
=begin code :skip-test
1816+
=begin code :lang<text>
18171817
[with a
18181818
newline in between]
18191819
=end code
@@ -1921,7 +1921,7 @@ the rest.
19211921
Since it is possible to execute a piece of code within a regular expression, it is also possible
19221922
to inspect the L<Match|/type/Match> object within the regular expression itself:
19231923
1924-
=begin code :preamble<my $string = '';>
1924+
=begin code :preamble<my $string = '';>
19251925
my $iteration = 0;
19261926
sub show-captures( Match $m ){
19271927
my Str $result_split;
@@ -1937,63 +1937,63 @@ sub show-captures( Match $m ){
19371937
$string ~~ /(.+)(SQL) (.+) $1 (.+) { show-captures( $/ ); }/;
19381938
=end code
19391939
1940-
The C<show_captures> method will dump all the elements of C<$/> producing
1940+
The C<show-captures> method will dump all the elements of C<$/> producing
19411941
the following output:
19421942
1943-
=for code :lang<output>
1944-
=== Iteration 1 ===
1945-
Capture 0 = Postgre
1946-
Capture 1 = SQL
1947-
Capture 2 = is an
1948-
[Postgre][SQL][ is an ]
1943+
=for code :lang<text>
1944+
=== Iteration 1 ===
1945+
Capture 0 = Postgre
1946+
Capture 1 = SQL
1947+
Capture 2 = is an
1948+
[Postgre][SQL][ is an ]
19491949
19501950
showing that the string has been split around the second occurrence of I<SQL>, that
19511951
is the repetition of the first capture (C<$/[1]>).
19521952
19531953
With that in place, it is now possible to see how the engine backtracks
1954-
to find the above match: it does suffice to move the C<show_captures>
1954+
to find the above match: it does suffice to move the C<show-captures>
19551955
in the middle of the regular expression, in particular before the repetition of the
19561956
first capture C<$1> to see it in action:
19571957
1958-
=begin code
1958+
=begin code :preamble<my $string = '';>
19591959
my $iteration = 0;
1960-
sub show_captures( Match $m ){
1961-
my Str $result_split;
1960+
sub show-captures( Match $m ){
1961+
my Str $result-split;
19621962
say "\n=== Iteration {++$iteration} ===";
19631963
for $m.list.kv -> $i, $capture {
19641964
say "Capture $i = $capture";
1965-
$result_split ~= '[' ~ $capture ~ ']';
1965+
$result-split ~= '[' ~ $capture ~ ']';
19661966
}
19671967
1968-
say $result_split;
1968+
say $result-split;
19691969
}
19701970
1971-
$string ~~ / (.+)(SQL) (.+) { show_captures( $/ ); } $1 /;
1971+
$string ~~ / (.+)(SQL) (.+) { show-captures( $/ ); } $1 /;
19721972
=end code
19731973
19741974
The output will be much more verbose and will show several iterations, with the last one
19751975
being the I<winning>. The following is an excerpt of the output:
19761976
1977-
=begin code :skip-test
1978-
=== Iteration 1 ===
1979-
Capture 0 = PostgreSQL is an
1980-
Capture 1 = SQL
1981-
Capture 2 = database!
1982-
[PostgreSQL is an ][SQL][ database!]
1983-
1984-
=== Iteration 2 ===
1985-
Capture 0 = PostgreSQL is an
1986-
Capture 1 = SQL
1987-
Capture 2 = database
1988-
[PostgreSQL is an ][SQL][ database]
1989-
1990-
...
1991-
1992-
=== Iteration 24 ===
1993-
Capture 0 = Postgre
1994-
Capture 1 = SQL
1995-
Capture 2 = is an
1996-
[Postgre][SQL][ is an ]
1977+
=begin code :lang<text>
1978+
=== Iteration 1 ===
1979+
Capture 0 = PostgreSQL is an
1980+
Capture 1 = SQL
1981+
Capture 2 = database!
1982+
[PostgreSQL is an ][SQL][ database!]
1983+
1984+
=== Iteration 2 ===
1985+
Capture 0 = PostgreSQL is an
1986+
Capture 1 = SQL
1987+
Capture 2 = database
1988+
[PostgreSQL is an ][SQL][ database]
1989+
1990+
...
1991+
1992+
=== Iteration 24 ===
1993+
Capture 0 = Postgre
1994+
Capture 1 = SQL
1995+
Capture 2 = is an
1996+
[Postgre][SQL][ is an ]
19971997
=end code
19981998
19991999
In the first iteration the I<SQL> part of I<PostgreSQL> is kept within the word: that is not what
@@ -2005,7 +2005,7 @@ After several iterations, the final result is match.
20052005
It is worth noting that the final iteration is number I<24>, and that such number is exactly
20062006
the distance, in number of chars, from the end of the string to the first I<SQL> occurrence:
20072007
2008-
=begin code
2008+
=begin code :preamble<my $string = '';>
20092009
say $string.chars - $string.index: 'SQL'; # OUTPUT: 23
20102010
=end code
20112011
@@ -2019,7 +2019,7 @@ it in those cases where the matching can be found I<forward> only.
20192019
With regards to the above example, disabling backtracking means
20202020
the regular expression will not have any chance to match:
20212021
2022-
=begin code
2022+
=begin code :preamble<my $string = '';>
20232023
say $string ~~ /(.+)(SQL) (.+) $1/; # OUTPUT: 「PostgreSQL is an SQL」
20242024
say $string ~~ / :r (.+)(SQL) (.+) $1/; # OUTPUT: Nil
20252025
=end code
@@ -2046,25 +2046,25 @@ match fails.
20462046
It is possible, again, to inspect what the engine performs
20472047
introducing a dumping piece of code within the regular expression:
20482048
2049-
=begin code
2049+
=begin code :preamble<my $string = '';>
20502050
my $iteration = 0;
2051-
sub show_captures( Match $m ){
2052-
my Str $result_split;
2051+
sub show-captures( Match $m ){
2052+
my Str $result-split;
20532053
say "\n=== Iteration {++$iteration} ===";
20542054
for $m.list.kv -> $i, $capture {
20552055
say "Capture $i = $capture";
2056-
$result_split ~= '[' ~ $capture ~ ']';
2056+
$result-split ~= '[' ~ $capture ~ ']';
20572057
}
20582058
2059-
say $result_split;
2059+
say $result-split;
20602060
}
20612061
2062-
$string ~~ / (SQL) (.+) { show_captures( $/ ); } $1 /;
2062+
$string ~~ / (SQL) (.+) { show-captures( $/ ); } $1 /;
20632063
=end code
20642064
20652065
that produces a rather simple output:
20662066
2067-
=begin code :skip-test
2067+
=begin code :lang<text>
20682068
=== Iteration 1 ===
20692069
Capture 0 = SQL
20702070
Capture 1 = is an SQL database!
@@ -2079,25 +2079,25 @@ Capture 1 = database!
20792079
Even using the L<:r|/language/regexes#ratchet> adverb to prevent backtracking will not
20802080
change things:
20812081
2082-
=begin code
2082+
=begin code :preamble<my $string = '';>
20832083
my $iteration = 0;
2084-
sub show_captures( Match $m ){
2085-
my Str $result_split;
2084+
sub show-captures( Match $m ){
2085+
my Str $result-split;
20862086
say "\n=== Iteration {++$iteration} ===";
20872087
for $m.list.kv -> $i, $capture {
20882088
say "Capture $i = $capture";
2089-
$result_split ~= '[' ~ $capture ~ ']';
2089+
$result-split ~= '[' ~ $capture ~ ']';
20902090
}
20912091
2092-
say $result_split;
2092+
say $result-split;
20932093
}
20942094
2095-
$string ~~ / :r (SQL) (.+) { show_captures( $/ ); } $1 /;
2095+
$string ~~ / :r (SQL) (.+) { show-captures( $/ ); } $1 /;
20962096
=end code
20972097
20982098
and the output will remain the same:
20992099
2100-
=begin code :skip-test
2100+
=begin code :lang<text>
21012101
=== Iteration 1 ===
21022102
Capture 0 = SQL
21032103
Capture 1 = is an SQL database!
