@@ -16,20 +16,22 @@ matching those patterns to actual text.
Perl 6 has special syntax for literal regexes:

m/abc/; # a regex that is immediately matched against $_
- rx/abc/; # a Regex object; allow adverbs to be used before regex
+ rx/abc/; # a Regex object; 'rx' may be followed by regex adverbs
/abc/; # a Regex object; shorthand version of 'rx/ /' operator

For the first two examples, delimiters other than the slash can be used:

m{abc};
rx[abc];

- Note that neither the colon nor round parentheses can be delimiters; the colon
- is forbidden because it clashes with adverbs, such as C<rx:i/abc/>
- (case insensitive regexes), and round parentheses indicate a function call
- instead.
+ Note that neither the colon C<:> nor parentheses C<()> can be delimiters. The
+ colon is forbidden because it clashes with adverbs, such as in C<rx:i/abc/>
+ (case-insensitive regex). Parentheses are used to indicate a subroutine call;
+ e.g. in C<rx()> the L<call operator|/language/operators#postcircumfix_(_)>
+ C<()> invokes the subroutine C<rx>.
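Here is a minimal sketch of these rules. It assumes a recent Rakudo compiler, and the variable names and target strings are arbitrary. It uses the bracketing delimiters shown above, plus the C<:i> adverb that a colon directly after C<rx> introduces:

```raku
# Sketch only: assumes a recent Rakudo; names and target strings are arbitrary
my $plain  = rx/abc/;    # slashes
my $braces = rx{abc};    # curly braces
my $square = rx[abc];    # square brackets

say so 'xabcx' ~~ $plain;    # True
say so 'xabcx' ~~ $braces;   # True
say so 'xabcx' ~~ $square;   # True

# The colon after 'rx' starts an adverb: :i makes the match case-insensitive
say so 'ABC' ~~ rx:i/abc/;   # True
say so 'ABC' ~~ $plain;      # False
```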

- Example of difference between C<m/ /> and C</ /> operators:
+ Here's an example that illustrates the difference between the C<m/ /> and C</ />
+ operators:

my $match;
$_ = "abc";
@@ -39,25 +41,25 @@ Example of difference between C<m/ /> and C</ /> operators:
Whitespace in literal regexes is generally ignored (except with the C<:s> adverb,
or C<:sigspace> in full).

- Comments work within a regular expression:
+ Comments are allowed within a regular expression:

/ word #`(match lexical "word") /

as long as the syntax for
L<embedded comments|/language/syntax#Multi-line_/_embedded_comments>, with a
- backtick following the hash sign and enclosing delimiters, is used.
+ backtick and enclosing delimiters following the hash sign, is used.

=head1 Literals

- The simplest case for a regex is a match against a string literal:
+ The simplest use case for a regex is a match against a string literal:

if 'properly' ~~ / perl / {
say "'properly' contains 'perl'";
}

- Alphanumeric characters and the underscore C<_> are matched literally. All
- other characters must either be escaped with a backslash (for example, C<\:>
- to match a colon), or be within quotes:
+ Alphanumeric characters, including the underscore C<_>, which is considered
+ alphabetic, are matched literally. All other characters must either be escaped
+ with a backslash (for example, C<\:> to match a colon), or be within quotes:

/ 'two words' /; # matches 'two words' including the blank
/ "a:b" /; # matches 'a:b' including the colon
@@ -74,9 +76,10 @@ matches the regex:
say $/.to; # OUTPUT: «22»
};

+
Match results are always stored in the C<$/> variable and are also returned from
the match. They are both of type L<Match|/type/Match> if the match was
- successful; otherwise it is L<Nil|/type/Nil>.
+ successful; otherwise both are of type L<Nil|/type/Nil>.
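Here is a minimal sketch of both outcomes. It assumes a recent Rakudo, and C<$m> and the target strings are illustrative only. The successful match is a C<Match> that also ends up in C<$/>. The failed one returns C<Nil>, and afterwards C<$/> holds no defined C<Match>:

```raku
# Sketch only: assumes a recent Rakudo; $m and the strings are illustrative
# A successful match returns a Match object and also stores it in $/
my $m = 'properly' ~~ / perl /;
say $m.WHAT;               # (Match)
say ~$/;                   # perl
say $/.from, ' ', $/.to;   # 3 7

# A failed match returns Nil, and $/ no longer holds a defined Match
say ('properly' ~~ / python /).raku;   # Nil
say $/.defined;                        # False
```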


=head1 X<Wildcards|regex, .>
@@ -90,25 +93,24 @@ So, these all match:
'perl' ~~ / pe.l /; # the . matches the r
'speller' ~~ / pe.l/; # the . matches the first l

- This doesn't match:
+ while this doesn't match:

'perl' ~~ /. per /;

because there's no character to match before C<per> in the target string.

- Note that C<.> now does match B<any> single character, that is, it matches
- C<\n>. So the text below match:
+ Notably, C<.> also matches the newline character C<\n>:

my $text = qq:to/END/
Although I am a
multi-line text,
- now can be matched
+ I can be matched
with /.*/.
END
;

say $text ~~ / .* /;
- # OUTPUT «「Although I am amulti-line text,now can be matchedwith /.*/」»
+ # OUTPUT «「Although I am amulti-line text,I can be matchedwith /.*/. 」»

=head1 Character classes
@@ -119,14 +121,18 @@ written with an upper-case letter, C<\W>.

=head3 X<C<\n> and C<\N>|regex,\n;regex,\N>

- C<\n> matches a single, logical newline character. C<\N> matches a single
- character that's not a logical newline.
+ C<\n> matches a logical newline. C<\N> matches a single character that's not a
+ logical newline.
+
+ The definition of what constitutes a logical newline follows the L<Unicode
+ definition of a line boundary|https://unicode.org/reports/tr18/#Line_Boundaries>
+ and includes in particular all of: a line feed (LF) C<U+000A>, a vertical tab
+ (VT) C<U+000B>, a form feed (FF) C<U+000C>, a carriage return (CR) C<U+000D>,
+ and the Microsoft Windows style newline sequence CRLF.
+
+ The interpretation of C<\n> in regexes is independent of the value of the
+ variable C<$?NL> controlled by the L<newline pragma|/language/pragmas#newline>.
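Here is a minimal sketch. It assumes a recent Rakudo, and the sample strings are arbitrary. It matches LF, CR and CRLF against the same pattern. Perl 6 treats C<"\r\n"> as a single grapheme, so it counts as one character. The sketch also shows that C<\N+> stops at the first newline:

```raku
# Sketch only: assumes a recent Rakudo; the sample strings are arbitrary
# \n matches one logical newline, whichever convention the text uses
for "a\nb", "a\rb", "a\r\nb" -> $text {
    say so $text ~~ / ^ a \n b $ /;   # True for each of the three
}

# "\r\n" is a single grapheme, so \n consumes it as one character
say "a\r\nb".chars;                   # 3

# \N matches any character that is not a logical newline
say ~("first line\nsecond line" ~~ / \N+ /);   # first line
```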

- What is considered as a single newline character is defined via the compile time
- variable L«C<$?NL>|/language/variables#index-entry-$?NL», and the
- L<newline pragma|/language/pragmas>; therefore, C<\n> is supposed to be able to
- match either a Unix-like newline C<"\n">, a Microsoft Windows style one
- C<"\r\n">, or one in the Mac style C<"\r">.

=head3 X<C<\t> and C<\T>|regex,\t;regex,\T>