Commit 195a4b9

wip
1 parent f819c26 commit 195a4b9

1 file changed: +25 -26 lines changed

doc/Language/regexes.pod6

Lines changed: 25 additions & 26 deletions
Original file line number | Diff line number | Diff line change
@@ -12,20 +12,13 @@ matching those patterns to actual text.
1212
1313
=head1 X<Lexical conventions|quote,/ /;quote,rx;quote,m>
1414
15-
Fundamentally, regexes are very much like subroutines: both are code objects,
16-
and just as you can have anonymous subs and named subs, you can have anonymous
17-
and named regexes.
15+
Fundamentally, Perl 6 regexes are very much like subroutines: both are code
16+
objects, and just as you can have anonymous subs and named subs, you can have
17+
anonymous and named regexes.
1818
1919
A regex, whether anonymous or named, is represented by a L<C<Regex>|/type/Regex>
20-
object. The syntax for constructing anonymous and named C<Regex> objects
21-
differs, as do their intended uses.
22-
23-
In short, anonymous regexes may be used anywhere where a regex is needed with
24-
the exception of L<C<Grammars>|/type/Grammar>, which are the domain of named
25-
regexes. Named regexes form the building blocks of grammars, in which they serve
26-
as methods (also known as 'subrules') that can be called from other regexes to
27-
effectively parse textual data.
28-
20+
object. Yet, the syntax for constructing anonymous and named C<Regex> objects
21+
differs. We will therefore discuss them in turn.
2922
3023
=head2 Anonymous regex definition syntax
3124
@@ -34,7 +27,7 @@ An anonymous regex may be constructed in one of the following ways:
3427
rx/pattern/; # an anonymous Regex object; 'rx' stands for 'regex'
3528
/pattern/; # an anonymous Regex object; shorthand for 'rx/.../'
3629
37-
regex { pattern } # keyword-declared anonymous regex; this form is
30+
regex { pattern }; # keyword-declared anonymous regex; this form is
3831
# intended for defining named regexes and is discussed
3932
# in that context in the next section
4033
@@ -43,8 +36,8 @@ The C<rx/ /> form has two advantages over the bare shorthand form C</ />.
4336
Firstly, it enables the use of delimiters other than the slash, which may be
4437
used to improve the readability of the regex definition:
4538
46-
rx{ '/tmp/'.* } # the use of curly braces as delimiters makes this first
47-
rx/ '/tmp/'.* / # definition somewhat easier on the eyes than the second
39+
rx{ '/tmp/'.* }; # the use of curly braces as delimiters makes this first
40+
rx/ '/tmp/'.* /; # definition somewhat easier on the eyes than the second
4841
4942
Although the choice is vast, not every character may be chosen as an alternative
5043
regex delimiter:
@@ -87,8 +80,8 @@ given a name by putting them inside a named variable, after which they can be
8780
referenced, e.g. directly or by means of
8881
L<interpolation|/language/regexes#Regex_interpolation>:
8982
90-
my $regex = / k \w+ /;
91-
say "Made in a low firing kiln" ~~ $regex; # OUTPUT: 「kiln」
83+
my $regex = / R \w+ /;
84+
say "Zen Buddhists like Raku too" ~~ $regex; # OUTPUT: 「Raku」
9285
9386
my $regex = /pottery/;
9487
"Japanese pottery rocks!" ~~ / <$regex> /; # Interpolation of $regex into /.../
@@ -98,7 +91,7 @@ L<interpolation|/language/regexes#Regex_interpolation>:
9891
9992
A named regex may be constructed using the C<regex> declarator as follows:
10093
101-
regex R { pattern } # a named Regex object, named 'R'
94+
regex R { pattern }; # a named Regex object, named 'R'
10295
10396
Unlike with the C<rx> form, you cannot choose your preferred delimiter: curly
10497
braces are mandatory. In this regard it should be noted that the definition of a
@@ -111,28 +104,34 @@ of a subroutine:
111104
which emphasizes the fact that a L<C<Regex>|/type/Regex> object represents code
112105
rather than data:
113106
114-
&S ~~ Code # OUTPUT: True
107+
&S ~~ Code; # OUTPUT: True
115108
116-
&R ~~ Code # OUTPUT: True
117-
&R ~~ Method # OUTPUT: True (A Regex is really a Method!)
109+
&R ~~ Code; # OUTPUT: True
110+
&R ~~ Method; # OUTPUT: True (A Regex is really a Method!)
118111
119112
Also unlike with the C<rx> form for defining an anonymous regex, the definition
120113
of a named regex using the C<regex> form does not allow for adverbs to be
121114
inserted before the opening delimiter. Instead, adverbs that are to modify the
122115
entire regex pattern may be included first thing within the curly braces:
123116
124-
regex R { :i pattern } # :i (:ignorecase), renders pattern case insensitive
117+
regex R { :i pattern }; # :i (:ignorecase), renders pattern case insensitive
125118
126119
Alternatively, by way of shorthand, it is also possible (and recommended) to use
127120
the C<rule> and C<token> variants of the C<regex> declarator for defining a
128121
C<Regex> when the C<:ratchet> and C<:sigspace> adverbs are of interest:
129122
130-
regex R { :r pattern } # apply :r (:ratchet) to entire pattern
131-
token R { pattern } # same thing: 'token' implies ':r'
123+
regex R { :r pattern }; # apply :r (:ratchet) to entire pattern
124+
token R { pattern }; # same thing: 'token' implies ':r'
132125
133-
regex R { :r :s pattern } # apply :r (:ratchet) and :s (:sigspace) to pattern
134-
rule R { pattern } # same thing: 'rule' implies ':r:s'
126+
regex R { :r :s pattern }; # apply :r (:ratchet) and :s (:sigspace) to pattern
127+
rule R { pattern }; # same thing: 'rule' implies ':r:s'
135128
129+
Named regexes may be used as building blocks for other regexes, as they are
130+
methods that may be called from within other regexes using the C«<regex-name>»
131+
syntax. When they are used this way, they are often referred to as 'subrules';
132+
see L<here|/language/regexes#Subrules> for more details on their use.
133+
L<C<Grammars>|/type/Grammar> are the natural niche for subrules, but many common
134+
predefined character classes are also implemented as named regexes.
136135
137136
=head2 Regex readability: whitespace and comments
138137
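
The paragraph added at the end of the last hunk describes named regexes being called as subrules. The following is a minimal sketch of that idea and is not part of the commit. The names word and digits are hypothetical, and the 'my' declarator is an assumption made here so that the regexes can be declared outside a grammar.

```raku
# Hypothetical named regexes, lexically scoped with 'my' so that they
# can be declared outside a grammar.
my regex word   { \w+ }
my token digits { \d+ }

# Call both as subrules from an anonymous regex using the <name> syntax.
my $match = "abc 123" ~~ / <word> \s+ <digits> /;

say ~$match<word>;      # the text captured by the 'word' subrule
say ~$match<digits>;    # the text captured by the 'digits' subrule
say &word ~~ Regex;     # checks that the named regex is a Regex object
```

Each subrule call also creates a named capture, which is why the matched pieces can be read back from the match object by name.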
