@@ -12,20 +12,13 @@ matching those patterns to actual text.

=head1 X<Lexical conventions|quote,/ /;quote,rx;quote,m>

- Fundamentally, regexes are very much like subroutines: both are code objects,
- and just as you can have anonymous subs and named subs, you can have anonymous
- and named regexes.
+ Fundamentally, Perl 6 regexes are very much like subroutines: both are code
+ objects, and just as you can have anonymous subs and named subs, you can have
+ anonymous and named regexes.

A regex, whether anonymous or named, is represented by a L<C<Regex>|/type/Regex>
- object. The syntax for constructing anonymous and named C<Regex> objects
- differs, as do their intended uses.
-
- In short, anonymous regexes may be used anywhere where a regex is needed with
- the exception of L<C<Grammars>|/type/Grammar>, which are the domain of named
- regexes. Named regexes form the building blocks of grammars, in which they serve
- as methods (also known as 'subrules') that can be called from other regexes to
- effectively parse textual data.
-
+ object. Yet, the syntax for constructing anonymous and named C<Regex> objects
+ differs. We will therefore discuss them in turn.

=head2 Anonymous regex definition syntax

@@ -34,7 +27,7 @@ An anonymous regex may be constructed in one of the following ways:
rx/pattern/; # an anonymous Regex object; 'rx' stands for 'regex'
/pattern/; # an anonymous Regex object; shorthand for 'rx/.../'

- regex { pattern } # keyword-declared anonymous regex; this form is
+ regex { pattern }; # keyword-declared anonymous regex; this form is
# intended for defining named regexes and is discussed
# in that context in the next section

@@ -43,8 +36,8 @@ The C<rx/ /> form has two advantages over the bare shorthand form C</ />.
Firstly, it enables the use of delimiters other than the slash, which may be
used to improve the readability of the regex definition:

- rx{ '/tmp/'.* } # the use of curly braces as delimiters makes this first
- rx/ '/tmp/'.* / # definition somewhat easier on the eyes than the second
+ rx{ '/tmp/'.* }; # the use of curly braces as delimiters makes this first
+ rx/ '/tmp/'.* /; # definition somewhat easier on the eyes than the second

Although the choice is vast, not every character may be chosen as an alternative
regex delimiter:
@@ -87,8 +80,8 @@ given a name by putting them inside a named variable, after which they can be
referenced, e.g. directly or by means of
L<interpolation|/language/regexes#Regex_interpolation>:

- my $regex = / k \w+ /;
- say "Made in a low firing kiln " ~~ $regex; # OUTPUT: 「kiln 」
+ my $regex = / R \w+ /;
+ say "Zen Buddhists like Raku too " ~~ $regex; # OUTPUT: 「Raku 」

my $regex = /pottery/;
"Japanese pottery rocks!" ~~ / <$regex> /; # Interpolation of $regex into /.../
@@ -98,7 +91,7 @@ L<interpolation|/language/regexes#Regex_interpolation>:

A named regex may be constructed using the C<regex> declarator as follows:

- regex R { pattern } # a named Regex object, named 'R'
+ regex R { pattern }; # a named Regex object, named 'R'

Unlike with the C<rx> form, you cannot choose your preferred delimiter: curly
braces are mandatory. In this regard it should be noted that the definition of a
@@ -111,28 +104,34 @@ of a subroutine:
which emphasizes the fact that a L<C<Regex>|/type/Regex> object represents code
rather than data:

- &S ~~ Code # OUTPUT: True
+ &S ~~ Code; # OUTPUT: True

- &R ~~ Code # OUTPUT: True
- &R ~~ Method # OUTPUT: True (A Regex is really a Method!)
+ &R ~~ Code; # OUTPUT: True
+ &R ~~ Method; # OUTPUT: True (A Regex is really a Method!)

Also unlike with the C<rx> form for defining an anonymous regex, the definition
of a named regex using the C<regex> form does not allow for adverbs to be
inserted before the opening delimiter. Instead, adverbs that are to modify the
entire regex pattern may be included first thing within the curly braces:

- regex R { :i pattern } # :i (:ignorecase), renders pattern case insensitive
+ regex R { :i pattern }; # :i (:ignorecase), renders pattern case insensitive

Alternatively, by way of shorthand, it is also possible (and recommended) to use
the C<rule> and C<token> variants of the C<regex> declarator for defining a
C<Regex> when the C<:ratchet> and C<:sigspace> adverbs are of interest:

- regex R { :r pattern } # apply :r (:ratchet) to entire pattern
- token R { pattern } # same thing: 'token' implies ':r'
+ regex R { :r pattern }; # apply :r (:ratchet) to entire pattern
+ token R { pattern }; # same thing: 'token' implies ':r'

- regex R { :r :s pattern } # apply :r (:ratchet) and :s (:sigspace) to pattern
- rule R { pattern } # same thing: 'rule' implies ':r:s'
+ regex R { :r :s pattern }; # apply :r (:ratchet) and :s (:sigspace) to pattern
+ rule R { pattern }; # same thing: 'rule' implies ':r:s'

+ Named regexes may be used as building blocks for other regexes, as they are
+ methods that may be called from within other regexes using the C«<regex-name>»
+ syntax. When they are used this way, they are often referred to as 'subrules';
+ see L<here|/language/regexes#Subrules> for more details on their use.
+ L<C<Grammars>|/type/Grammar> are the natural niche for subrules, but many common
+ predefined character classes are also implemented as named regexes.

=head2 Regex readability: whitespace and comments
