Commit b24a42b

Update regexes.pod6
1 parent b5a674d commit b24a42b

1 file changed, +36 -36 lines changed

doc/Language/regexes.pod6

Lines changed: 36 additions & 36 deletions
@@ -28,8 +28,8 @@ the colon is forbidden because it clashes with adverbs, such as C<rx:i/abc/>
 (case insensitive regexes), and round parentheses indicate a function call
 instead.

-Whitespace in regexes is generally ignored (except with the C<:s> or
-C<:sigspace> adverb).
+Whitespace in regexes is generally ignored (except with the C<:s> or,
+completely, C<:sigspace> adverb).

 As with Perl 6, in general, comments in regexes start with a hash character
 C<#> and go to the end of the line.
@@ -62,8 +62,8 @@ part of the string matches the regex:
 };

 Match results are stored in the C<$/> variable and are also returned from
-the match. The result is of L<type Match|/type/Match> if the match was successful;
-otherwise it's L<Nil|/type/Nil>.
+the match. The result is of L<type Match|/type/Match> if the match was
+successful; otherwise it's L<Nil|/type/Nil>.

 =head1 Wildcards and character classes

@@ -89,7 +89,7 @@ because there's no character to match before C<per> in the target string.
 There are predefined character classes of the form C<\w>. Its negation is
 written with an upper-case letter, C<\W>.

-=item X<\d and \D|regex,\d;regex,\D>
+=item X<C<\d> and C<\D>|regex,\d;regex,\D>

 C<\d> matches a single digit (Unicode property C<N>) and C<\D> matches a
 single character that is not a digit.
@@ -102,21 +102,21 @@ match C<\d>, but also digits from other scripts.

 Examples for digits are:

-=begin code :skip-test
+=begin code :lang<text>
 U+0035 5 DIGIT FIVE
-U+07C2 ߂ NKO DIGIT TWO
+U+0BEB ௫ TAMIL DIGIT FIVE
 U+0E53 ๓ THAI DIGIT THREE
-U+1B56 ᭖ BALINESE DIGIT SIX
+U+17E5 ៥ KHMER DIGIT FIVE
 =end code

-=item X<\h and \H|regex,\h;regex,\H>
+=item X<C<\h> and C<\H>|regex,\h;regex,\H>

 C<\h> matches a single horizontal whitespace character. C<\H> matches a
 single character that is not a horizontal whitespace character.

 Examples for horizontal whitespace characters are

-=begin code :skip-test
+=begin code :lang<text>
 U+0020 SPACE
 U+00A0 NO-BREAK SPACE
 U+0009 CHARACTER TABULATION
@@ -126,14 +126,14 @@ Examples for horizontal whitespace characters are
 Vertical whitespace like newline characters are explicitly excluded; those
 can be matched with C<\v>, and C<\s> matches any kind of whitespace.

-=item X<\n and \N|regex,\n;regex,\N>
+=item X<C<\n> and C<\N>|regex,\n;regex,\N>

 C<\n> matches a single, logical newline character. C<\n> is supposed to also
 match a Windows CR LF codepoint pair; though it's unclear whether the magic
 happens at the time that external data is read, or at regex match time.
 C<\N> matches a single character that's not a logical newline.

-=item X<\s and \S|regex,\s;regex,\S>
+=item X<C<\s> and C<\S>|regex,\s;regex,\S>

 C<\s> matches a single whitespace character. C<\S> matches a single
 character that is not whitespace.
@@ -142,20 +142,20 @@ character that is not whitespace.
 say ~$/; # OUTPUT: «word␤»
 }

-=item X<\t and \T|regex,\t;regex,\T>
+=item X<C<\t> and C<\T>|regex,\t;regex,\T>

 C<\t> matches a single tab/tabulation character, C<U+0009>. (Note that
 exotic tabs like the C<U+000B VERTICAL TABULATION> character are not
 included here). C<\T> matches a single character that is not a tab.

-=item X<\v and \V|regex,\v;regex,\V>
+=item X<C<\v> and C<\V>|regex,\v;regex,\V>

 C<\v> matches a single vertical whitespace character. C<\V> matches a single
 character that is not vertical whitespace.

 Examples for vertical whitespace characters:

-=begin code :skip-test
+=begin code :lang<text>
 U+000A LINE FEED
 U+000B VERTICAL TABULATION
 U+000C FORM FEED
@@ -167,15 +167,15 @@ Examples for vertical whitespace characters:

 Use C<\s> to match any kind of whitespace, not just vertical whitespace.

-=item X<\w and \W|regex,\w;regex,\W>
+=item X<C<\w> and C<\W>|regex,\w;regex,\W>

 C<\w> matches a single word character; i.e., a letter (Unicode category L), a
 digit or an underscore. C<\W> matches a single character that isn't a word
 character.

 Examples of word characters:

-=begin code :skip-test
+=begin code :lang<text>
 0041 A LATIN CAPITAL LETTER A
 0031 1 DIGIT ONE
 03B4 δ GREEK SMALL LETTER DELTA
@@ -185,37 +185,37 @@ Examples of word characters:

 Predefined subrules:

-=begin code :skip-test
-<alnum> \w 'alpha' plus 'digit'
+=begin code :lang<text>
 <alpha> <:L> Alphabetic characters
-<blank> \h Horizontal whitespace
-<cntrl> Control characters
 <digit> \d Decimal digits
-<graph> 'alnum' plus 'punct'
-<lower> <:Ll> Lowercase characters
-<print> 'graph' plus 'space', but no 'cntrl'
+<xdigit> Hexadecimal digit [0-9A-Fa-f]
+<alnum> \w 'alpha' plus 'digit'
 <punct> Punctuation and Symbols (only Punct beyond ASCII)
+<graph> 'alnum' plus 'punct'
 <space> \s Whitespace
+<cntrl> Control characters
+<print> 'graph' plus 'space', but no 'cntrl'
+<blank> \h Horizontal whitespace
+<lower> <:Ll> Lowercase characters
 <upper> <:Lu> Uppercase characters
 <?same> Matches between two identical characters
 <?wb> Word Boundary (zero-width assertion, ? suppress capture)
 <?ww> Within Word (zero-width assertion, ? suppress capture)
-<xdigit> Hexadecimal digit [0-9A-Fa-f]
 =end code

 =head2 X«Unicode properties|regex,<:property>»

 The character classes mentioned so far are mostly for convenience; another
-approach is to use Unicode character properties. These come in the form C<<
-<:property> >>, where C<property> can be a short or long Unicode General
+approach is to use Unicode character properties. These come in the form
+C«<:property>», where C<property> can be a short or long Unicode General
 Category name. These use pair syntax.

 To match against a Unicode Property:

 "a".uniprop('Script'); # OUTPUT: «Latin␤»
-"a" ~~ / <:Script<Latin>> /;
+"a" ~~ / <:Script<Latin>> /; # OUTPUT: «「a」␤»
 "a".uniprop('Block'); # OUTPUT: «Basic Latin␤»
-"a" ~~ / <:Block('Basic Latin')> /;
+"a" ~~ / <:Block('Basic Latin')> /; # OUTPUT: «「a」␤»

 The following list of Unicode General Categories is stolen from the Perl 5
 L<perlunicode|http://perldoc.perl.org/perlunicode.html> documentation:
@@ -267,9 +267,9 @@ L<perlunicode|http://perldoc.perl.org/perlunicode.html> documentation:

 =end table

-For example, C<< <:Lu> >> matches a single, upper-case letter.
+For example, C«<:Lu>» matches a single, upper-case letter.

-It's negation is this: C<< <:!property> >>. So, C<< <:!Lu> >> matches a single
+It's negation is this: C«<:!property>». So, C«<:!Lu>» matches a single
 character that isn't an upper-case letter.

 Categories can be used together, with an infix operator:
@@ -287,7 +287,7 @@ Categories can be used together, with an infix operator:
 =end table

 To match either a lower-case letter or a number, write
-C<< <:Ll+:N> >> or C<< <:Ll+:Number> >> or C<< <+ :Lowercase_Letter + :Number> >>.
+C«<:Ll+:N>» or C«<:Ll+:Number>» or C«<+ :Lowercase_Letter + :Number>».

 It's also possible to group categories and sets of categories with
 parentheses; for example:
@@ -297,20 +297,20 @@ parentheses; for example:
 =head2 X«Enumerated character classes and ranges|regex,<[ ]>;regex,<-[ ]>»

 Sometimes the pre-existing wildcards and character classes are not enough.
-Fortunately, defining your own is fairly simple. Within C<< <[ ]> >>, you
+Fortunately, defining your own is fairly simple. Within C«<[ ]>», you
 can put any number of single characters and ranges of characters (expressed
 with two dots between the end points), with or without whitespace.

 "abacabadabacaba" ~~ / <[ a .. c 1 2 3 ]> /;
 # Unicode hex codepoint range
 "ÀÁÂÃÄÅÆ" ~~ / <[ \x[00C0] .. \x[00C6] ]> /;
 # Unicode named codepoint range
-"ÀÁÂÃÄÅÆ" ~~ / <[ \c[LATIN CAPITAL LETTER A WITH GRAVE] .. \c[LATIN CAPITAL LETTER AE] ]> /;
+"αβγ" ~~ /<[\c[GREEK SMALL LETTER ALPHA]..\c[GREEK SMALL LETTER GAMMA]]>/;

-Within the C<< < > >> you can use C<+> and C<-> to add or
+Within the C«< >» you can use C<+> and C<-> to add or
 remove multiple range definitions and
 even mix in some of the unicode categories above. You can also
-write the backslashed forms for character classes between the C< [ ] >.
+write the backslashed forms for character classes between the C<[ ]>.

 / <[\d] - [13579]> /;
 # starts with \d and removes odd ASCII digits, but not quite the same as
