Commit 81b64c4

Some redactional changes and corrections
1 parent 2377cab commit 81b64c4

1 file changed: +31 -25 lines changed

doc/Language/regexes.pod6

Lines changed: 31 additions & 25 deletions
@@ -16,20 +16,22 @@ matching those patterns to actual text.
 Perl 6 has special syntax for literal regexes:

 m/abc/; # a regex that is immediately matched against $_
-rx/abc/; # a Regex object; allow adverbs to be used before regex
+rx/abc/; # a Regex object; 'rx' may be followed by regex adverbs
 /abc/; # a Regex object; shorthand version of 'rx/ /' operator

 For the first two examples, delimiters other than the slash can be used:

 m{abc};
 rx[abc];

-Note that neither the colon nor round parentheses can be delimiters; the colon
-is forbidden because it clashes with adverbs, such as C<rx:i/abc/>
-(case insensitive regexes), and round parentheses indicate a function call
-instead.
+Note that neither the colon C<:> nor parentheses C<()> can be delimiters. The
+colon is forbidden because it clashes with adverbs, such as in C<rx:i/abc/>
+(case insensitive regex). Parentheses are used to indicate a subroutine call;
+e.g. in C<rx()> the L<call operator|/language/operators#postcircumfix_(_)>
+C<()> invokes the subroutine C<rx>.

-Example of difference between C<m/ /> and C</ /> operators:
+Here's an example that illustrates the difference between the C<m/ /> and C</ />
+operators:

 my $match;
 $_ = "abc";
@@ -39,25 +41,25 @@ Example of difference between C<m/ /> and C</ /> operators:
 Whitespace in literal regexes is generally ignored (except with the C<:s> or,
 completely, C<:sigspace> adverb).

-Comments work within a regular expression:
+Comments are allowed within a regular expression:

 / word #`(match lexical "word") /

 as long as the syntax for
 L<embedded comments|/language/syntax#Multi-line_/_embedded_comments>, with a
-backtick following the hash sign and enclosing delimiters, is used.
+backtick and enclosing delimiters following the hash sign, is used.

 =head1 Literals

-The simplest case for a regex is a match against a string literal:
+The simplest use case for a regex is a match against a string literal:

 if 'properly' ~~ / perl / {
 say "'properly' contains 'perl'";
 }

-Alphanumeric characters and the underscore C<_> are matched literally. All
-other characters must either be escaped with a backslash (for example, C<\:>
-to match a colon), or be within quotes:
+Alphanumeric characters, including the underscore C<_> which is considered
+alphabetic, are matched literally. All other characters must either be escaped
+with a backslash (for example, C<\:> to match a colon), or be within quotes:

 / 'two words' /; # matches 'two words' including the blank
 / "a:b" /; # matches 'a:b' including the colon
@@ -74,9 +76,10 @@ matches the regex:
 say $/.to; # OUTPUT: «22␤»
 };

+
 Match results are always stored in the C<$/> variable and are also returned from
 the match. They are both of type L<Match|/type/Match> if the match was
-successful; otherwise it is L<Nil|/type/Nil>.
+successful; otherwise both are of type L<Nil|/type/Nil>.


 =head1 X<Wildcards|regex, .>
@@ -90,25 +93,24 @@ So, these all match:
 'perl' ~~ / pe.l /; # the . matches the r
 'speller' ~~ / pe.l/; # the . matches the first l

-This doesn't match:
+while this doesn't match:

 'perl' ~~ /. per /;

 because there's no character to match before C<per> in the target string.

-Note that C<.> now does match B<any> single character, that is, it matches
-C<\n>. So the text below match:
+Notably C<.> also matches the newline character C<\n>:

 my $text = qq:to/END/
 Although I am a
 multi-line text,
-now can be matched
+I can be matched
 with /.*/.
 END
 ;

 say $text ~~ / .* /;
-# OUTPUT «「Although I am a␤multi-line text,␤now can be matched␤with /.*/␤」»
+# OUTPUT «「Although I am a␤multi-line text,␤I can be matched␤with /.*/.␤」»

 =head1 Character classes

@@ -119,14 +121,18 @@ written with an upper-case letter, C<\W>.

 =head3 X<C<\n> and C<\N>|regex,\n;regex,\N>

-C<\n> matches a single, logical newline character. C<\N> matches a single
-character that's not a logical newline.
+C<\n> matches a logical newline. C<\N> matches a single character that's not a
+logical newline.
+
+The definition of what constitutes a logical newline follows the L<Unicode
+definition of a line boundary|https://unicode.org/reports/tr18/#Line_Boundaries>
+and includes in particular all of: a line feed (LF) C<\U+000A>, a vertical tab
+(VT) C<\U+000B>, a form feed (FF) C<\U+000C>, a carriage return (CR) C<\U+000D>,
+and the Microsoft Windows style newline sequence CRLF.
+
+The interpretation of C<\n> in regexes is independent of the value of the
+variable C<$?NL> controlled by the L<newline pragma|/language/pragmas#newline>.

-What is considered as a single newline character is defined via the compile time
-variable L«C<$?NL>|/language/variables#index-entry-$?NL», and the
-L<newline pragma|/language/pragmas>; therefore, C<\n> is supposed to be able to
-match either a Unix-like newline C<"\n">, a Microsoft Windows style one
-C<"\r\n">, or one in the Mac style C<"\r">.

 =head3 X<C<\t> and C<\T>|regex,\t;regex,\T>

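
The behavior described in the last two hunks (the C<.> wildcard matching a newline, and C<\n> matching a logical newline such as CRLF) can be checked with a minimal sketch. It is not part of commit 81b64c4; the variable names are illustrative only, and it assumes an implementation that behaves as the revised text states.

```raku
# Sketch only; not part of commit 81b64c4. All variable names are illustrative.
my $text = "first line\nsecond line";

# '.' matches any single character, including the newline between the lines
my $dot-crosses-newline = ?( $text ~~ / line . second / );

# \n matches a logical newline; CRLF is treated as one logical newline
my $crlf-is-newline = ?( "a\r\nb" ~~ / ^ a \n b $ / );

# \N matches a single character that is not a logical newline,
# so a string consisting of a lone newline does not match it
my $N-rejects-newline = !( "\n" ~~ / \N / );

say $dot-crosses-newline;
say $crlf-is-newline;
say $N-rejects-newline;
```

If the documented behavior holds, all three values should be True.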
