@@ -79,7 +79,7 @@ Otherwise it is L<Nil>.

 =head1 Wildcards and character classes

-=head2 Dot to match any character
+=head2 X<Dot to match any character|regex syntax,.>

 An unescaped dot C<.> in a regex matches any single character.

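The smartmatches below sketch how the dot behaves, written in the same `say so ... ~~ /.../` style the document uses for its anchor examples. They assume a reasonably recent Rakudo; the sample strings are arbitrary.

```raku
# '.' consumes exactly one character, whatever it is
say so 'a'     ~~ /./;          # True
say so ''      ~~ /./;          # False: there is no character to consume
say so "\n"    ~~ /./;          # True: the dot also matches a newline
say so 'perl'  ~~ / . per /;    # False: nothing precedes 'per'
say so 'super' ~~ / . per /;    # True: the dot matches the 'u'
```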
@@ -101,7 +101,7 @@ because there is no character to match before C<per> in the target string.
 There are predefined character classes of the form C<\w>. Its negation is
 written with an upper-case letter, C<\W>.

-=item \d and \D
+=item X<\d and \D|regex syntax,\d;regex syntax,\D>

 C<\d> matches a single digit (Unicode property C<N>), and C<\D> matches a
 single character that is not a digit.
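A short sketch of C<\d> and C<\D>, assuming a recent Rakudo. The Thai digit is written as an escape (`\x[0E53]`, one of the example digits listed in the document) so the snippet does not depend on source-file encoding.

```raku
# \d is not limited to ASCII 0-9
say so '5'        ~~ /\d/;   # True
say so "\x[0E53]" ~~ /\d/;   # True: U+0E53 THAI DIGIT THREE
say so 'x'        ~~ /\d/;   # False
# \D is the complement: any single character that is not a digit
say so 'x'        ~~ /\D/;   # True
say so '7'        ~~ /\D/;   # False
```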
@@ -119,7 +119,7 @@ Examples for digits are
     U+0E53 ๓ THAI DIGIT THREE
     U+1B56 ᭖ BALINESE DIGIT SIX

-=item \h and \H
+=item X<\h and \H|regex syntax,\h;regex syntax,\H>

 C<\h> matches a single horizontal whitespace character. C<\H> matches a
 single character that is not a horizontal whitespace character.
@@ -134,27 +134,27 @@ Examples for horizontal whitespace characters are
 Vertical whitespaces like newline characters are explicitly excluded; those
 can be matched with C<\v>, and C<\s> matches any kind of whitespace.

-=item \n and \N
+=item X<\n and \N|regex syntax,\n;regex syntax,\N>

 C<\n> matches a single, logical newline character. C<\n> is supposed to also
 match a Windows CR LF codepoint pair, though it is unclear whether the magic
 happens at the time that external data is read, or at regex match time. C<\N>
 matches a single character that's not a logical newline.

-=item \s and \S
+=item X<\s and \S|regex syntax,\s;regex syntax,\S>

 C<\s> matches a single whitespace character. C<\S> matches a single
 character that is not whitespace.

 TODO: examples

-=item \t and \T
+=item X<\t and \T|regex syntax,\t;regex syntax,\T>

 C<\t> matches a single tab/tabulation character, C<U+0009>. (Note that
 exotic tabs like the C<U+000B VERTICAL TABULATION> character are not included
 here.) C<\T> matches a single character that is not a tab.

-=item \v and \V
+=item X<\v and \V|regex syntax,\v;regex syntax,\V>

 C<\v> matches a single vertical whitespace character. C<\V> matches a single
 character that is not vertical whitespace.
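The whitespace classes above can be contrasted with a few smartmatches. This is a sketch assuming a recent Rakudo; it deliberately avoids the CR LF case, whose behavior the text itself leaves open.

```raku
# Horizontal vs. vertical whitespace
say so "a\tb" ~~ /\h/;       # True: a tab is horizontal whitespace
say so "a\nb" ~~ /\h/;       # False: a newline is vertical, not horizontal
say so "a\nb" ~~ /\v/;       # True
say so "a\nb" ~~ /\s/;       # True: \s covers both kinds
say so "a\nb" ~~ /a \N b/;   # False: \N refuses the newline
say so " \t"  ~~ /\S/;       # False: only whitespace here
# \t is only U+0009; a vertical tab (U+000B) counts as vertical whitespace
say so "\x[0B]" ~~ /\t/;     # False
say so "\x[0B]" ~~ /\v/;     # True
```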
@@ -169,7 +169,7 @@ Examples for vertical whitespace characters:

 Use C<\s> to match any kind of whitespace, not just vertical whitespace.

-=item \w and \W
+=item X<\w and \W|regex syntax,\w;regex syntax,\W>

 C<\w> matches a single word character, that is, a letter (Unicode category L),
 a digit or an underscore. C<\W> matches a single character that isn't a word
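A sketch of C<\w> and C<\W>, assuming a recent Rakudo. The Cyrillic letter comes from the document's own list of word characters and is written as an escape.

```raku
say so 'under_score' ~~ /^ \w+ $/;   # True: letters and an underscore
say so "\x[0409]"    ~~ /\w/;        # True: CYRILLIC CAPITAL LETTER LJE
say so 'a-b'         ~~ /^ \w+ $/;   # False: '-' is not a word character
say so '-'           ~~ /\W/;        # True
```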
@@ -183,7 +183,7 @@ Examples of word characters:
     03F3 ϳ GREEK LETTER YOT
     0409 Љ CYRILLIC CAPITAL LETTER LJE

-=head2 Unicode properties
+=head2 X«Unicode properties|regex syntax,<:property>»

 The character classes so far are mostly for convenience; a more systematic
 approach is the use of Unicode properties. They are called in the form
@@ -265,7 +265,7 @@ C<< <:Ll+:N> >> or C<< <:Ll+:Number> >> or C<< <+ :Lowercase_Letter + :Number> >
 (Grouping of set operations with round parens inside character classes is
 supposed to work, but is not supported by Rakudo at the time of writing.)

-=head2 Enumerated character classes and ranges
+=head2 X«Enumerated character classes and ranges|regex syntax,<[ ]>;regex assertion,<-[ ]>»

 Sometimes the pre-existing wildcards and character classes are just not
 enough. Fortunately, defining your own is simple enough. Between C<< <[ ]> >>,
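The property union C<< <:Ll+:N> >> quoted in the hunk header and the C<< <[ ]> >> / C<< <-[ ]> >> syntax named in the new index entry can be exercised as follows. This is a sketch assuming a recent Rakudo; the character sets chosen are arbitrary.

```raku
# Unicode property: one uppercase letter (category Lu)
say so 'Perl' ~~ /^ <:Lu> /;            # True
# Union of properties: lowercase letters or numbers
say so 'x9'   ~~ /^ <:Ll+:N>+ $/;       # True
say so 'X9'   ~~ /^ <:Ll+:N>+ $/;       # False: 'X' is uppercase
# Enumerated class with ranges (whitespace inside <[ ]> is ignored)
say so 'f00d' ~~ /^ <[a..f 0..9]>+ $/;  # True: hexadecimal digits only
# Negated class: any character except a, b or c
say so 'cab'  ~~ /^ <-[abc]>+ $/;       # False: every character is excluded
```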
@@ -312,7 +312,7 @@ Quantifiers bind tighter than concatenation, so C<ab+> matches one C<a>
 followed by one or more C<b>s. This is different for quotes, so C<'ab'+>
 matches the strings C<ab>, C<abab>, C<ababab> etc.

-=head2 One or more: +
+=head2 X<One or more: +|regex syntax,+>

 The C<+> quantifier makes the preceding atom match one or more times, with
 no upper limit.
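The difference between C<ab+> and C<'ab'+> described above shows up directly in what gets matched. A sketch, assuming a recent Rakudo:

```raku
# ab+ : one 'a' followed by one or more 'b's
say ('abbbc' ~~ /ab+/).Str;     # abbb
say so 'ac' ~~ /ab+/;           # False: at least one 'b' is required
# 'ab'+ : the quoted string is repeated as a unit
say ('ababc' ~~ /'ab'+/).Str;   # abab
```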
@@ -322,7 +322,7 @@ like this:

     / \w+ '=' \w+ /

-=head2 Zero or more: *
+=head2 X<Zero or more: *|regex syntax,*>

 The C<*> quantifier makes the preceding atom match zero or more times, with
 no upper limit.
@@ -331,19 +331,19 @@ For example to optional whitespace between C<a> and C<b> you can write

     / a \s* b /

-=head2 Zero or one match: ?
+=head2 X<Zero or one match: ?|regex syntax,?>

 The C<?> quantifier makes the preceding atom match zero or one time.

-=head2 General quantifier: ** min..max
+=head2 X<General quantifier: ** min..max|regex quantifier,**>

 To quantify an atom an arbitrary number of times, you can say for example
 C<a ** 2..5> to match the character C<a> at least twice and at most 5 times.

 If the minimal and maximal numbers of matches are the same, a single integer
 is possible: C<a ** 5> to match C<a> exactly five times.

-=head1 Alternation
+=head1 X<Alternation|regex syntax,||>

 To match one of several possible alternatives, separate them by C<||>; the
 first matching alternative wins.
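The quantifiers C<*>, C<?> and C<**> and the C<||> alternation described above can be checked with anchored matches. A sketch assuming a recent Rakudo; the words are arbitrary.

```raku
say so 'ab'     ~~ /^ a \s* b $/;     # True: zero whitespace characters
say so 'a   b'  ~~ /^ a \s* b $/;     # True
say so 'color'  ~~ /^ colou?r $/;     # True: the 'u' is optional
say so 'colour' ~~ /^ colou?r $/;     # True
say so 'aaa'    ~~ /^ a ** 2..5 $/;   # True
say so 'a'      ~~ /^ a ** 2..5 $/;   # False: fewer than two
say so 'aaaaa'  ~~ /^ a ** 5 $/;      # True
# || tries alternatives in order; the first one that matches wins
say ('foobar' ~~ / foo || foobar /).Str;   # foo
```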
@@ -379,7 +379,7 @@ match.
 Anchors need to match successfully in order for the whole regex to match, but
 they do not use up characters while matching.

-=head2 C<^>, Start of String
+=head2 X«C<^>, Start of String|regex syntax,^»

 The C<^> assertion only matches at the start of the string.

@@ -388,7 +388,7 @@ The C<^> assertion only matches at the start of the string.
     say so 'perly' ~~ /^ perl/; # True
     say so 'perl' ~~ /^ perl/; # True

-=head2 C<^^>, Start of Line and C<$$>, End of Line
+=head2 X«C<^^>, Start of Line and C<$$>, End of Line|regex syntax,^^;regex syntax,$$»

 The C<^^> assertion matches at the start of a logical line. That is, either at
 the start of the string, or after a newline character.
@@ -420,7 +420,7 @@ leading space, and the third and fourth lines have two leading spaces each).
     # and the end of line)
     say so $str ~~ / '."' $$/; # True (at the last line)

-=head2 C<<< << >>> and C<<< >> >>>, left and right word boundary
+=head2 X<<<< C<<< << >>> and C<<< >> >>>, left and right word boundary|regex syntax,<<;regex syntax,>>;regex syntax,«;regex syntax,» >>>>

 C<<< << >>> matches a left word boundary, so positions where at the left there is
 a non-word character (or the start of the string), and to the right there is a
@@ -438,7 +438,7 @@ the end of the string.
     say so $str ~~ /<< own/; # False
     say so $str ~~ /own >>/; # True

-=head1 Grouping and Capturing
+=head1 X«Grouping and Capturing|regex syntax,( );regex syntax,[ ];regex syntax,$<capture> =»

 In regular (non-regex) Perl 6, you can use parentheses to group things
 together, usually to override operator precedence:
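The three constructs named in the new index entry for this section, C<( )>, C<[ ]> and C<< $<capture> = >>, behave differently with respect to capturing. A sketch assuming a recent Rakudo; the name C<k> is an arbitrary choice for illustration.

```raku
# Round parentheses group and capture; captures are numbered from 0
if 'key=value' ~~ / (\w+) '=' (\w+) / {
    say ~$0;   # key
    say ~$1;   # value
}
# Square brackets only group, so no positional capture is produced
say ('ababab' ~~ /^ [ab]+ $/).list.elems;   # 0
# $<name>=[...] stores the group under a name (here the illustrative 'k')
if 'key=value' ~~ / $<k>=[\w+] '=' / {
    say ~$<k>;   # key
}
```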
@@ -561,7 +561,7 @@ named captures:
 But there is a more convenient way to get named captures, discussed in the
 next section.

-=head1 Subrules
+=head1 X<Subrules|declarator,regex>

 Just as you can put pieces of code into subroutines, you can also put
 pieces of regex into named rules.
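A minimal sketch of a named rule being called from another regex, assuming a recent Rakudo. The regex name C<ident> and its definition are hypothetical, chosen only for illustration.

```raku
# A hypothetical named regex: a lowercase letter followed by word characters
my regex ident { <[a..z]> \w* }

if 'let total = 42' ~~ / 'let' \s+ <ident> / {
    say ~$<ident>;   # total: the subrule's match is captured under its name
}
```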
@@ -649,7 +649,7 @@ like C<:overlap> go along with the matching:
     # aA


-=head2 Regex Adverbs
+=head2 X<Regex Adverbs|regex adverb,:ignorecase;regex adverb,:i>

 Adverbs that appear at the time of a regex declaration are part of the actual regex,
 and influence how the Perl 6 compiler translates the regex into binary code.
@@ -677,7 +677,7 @@ Brackets and parenthesis limit the scope of an adverb:
     / (:i a b) c / # matches 'ABc' but not 'ABC'
     / [:i a b] c / # matches 'ABc' but not 'ABC'

-=head3 Ratchet
+=head3 X<Ratchet|regex adverb,:ratchet;regex adverb,:r>

 The C<:ratchet> or C<:r> adverb causes the regex engine not to backtrack.

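The effect of C<:r> is easiest to see when a greedy quantifier would normally have to give characters back. A sketch assuming a recent Rakudo:

```raku
say so 'abc' ~~ / \w+ c /;      # True: \w+ backtracks and gives back the final 'c'
say so 'abc' ~~ / :r \w+ c /;   # False: \w+ keeps everything it matched, so 'c' never matches
```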
@@ -710,7 +710,7 @@ to declaring ratcheting regex:
     # short for
     my regex thing { :r ... }

-=head3 Sigspace
+=head3 X<Sigspace|regex adverb,:sigspace;regex adverb,:s>

 The B<C<:sigspace>> or B<C<:s>> adverb makes whitespace significant in a regex.

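The same regex text can match differently depending on C<:s>. A sketch assuming a recent Rakudo and its default whitespace rule:

```raku
say so 'a b' ~~ / a b /;      # False: without :s the space is ignored, so this looks for "ab"
say so 'a b' ~~ / :s a b /;   # True: the space in the regex now matches whitespace
say so 'ab'  ~~ / :s a b /;   # False: whitespace is required between two word characters
```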
@@ -809,7 +809,7 @@ matching adverbs only make sense while matching a string against a regex.
 They can never appear inside a regex, only on the outside - either as part of
 an C<m/.../> match, or as arguments to a match method.

-=head3 Continue
+=head3 X<Continue|matching adverb,:continue;matching adverb,:c>

 The C<:continue> or short C<:c> adverb takes an argument. The argument is the
 position where the regex should start to search. By default, it searches from
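Passed as an argument to the C<.match> method, C<:c> moves the starting point of the search. A sketch assuming a recent Rakudo:

```raku
my $str = 'abcabc';
say $str.match(/abc/).from;          # 0: by default the search starts at position 0
say $str.match(/abc/, :c(1)).from;   # 3: searching starts at 1 and finds the next 'abc'
```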
@@ -824,7 +824,7 @@ the start of the string, but C<:c> overrides that.

 TODO

-=head2 Global
+=head3 X<Global|regex adverb,:global;regex adverb,:g>

 Instead of searching for just one match and returning a L<Match|/type/Match>, search
 for every non-overlapping match and return them in a L<List|/type/List>.
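With C<:g>, the C<.match> method returns all non-overlapping matches instead of the first one. A sketch assuming a recent Rakudo; the input string is arbitrary.

```raku
my @matches = 'a1b22c333'.match(/\d+/, :g);
say @matches.elems;                   # 3
say @matches.map(*.Str).join(',');    # 1,22,333
```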
@@ -837,7 +837,7 @@ for every non-overlapping match and returns them in a L<List|/type/List>.

 C<:g> is a shortcut for C<:global>.

-=head3 Pos
+=head3 X<Pos|regex adverb,:pos;regex adverb,:p>

 Anchor the match at a specific position in the string:

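Unlike C<:c>, the C<:p> argument does not search forward; the match must begin exactly at the given position. A sketch assuming a recent Rakudo:

```raku
my $str = 'abcabc';
say so $str.match(/abc/, :p(3));   # True: a match begins exactly at position 3
say so $str.match(/abc/, :p(1));   # False: :p does not scan forward to position 3
```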