Commit a3701a0

Index the various bits of regex syntax
1 parent e058fc8 commit a3701a0

1 file changed: +26 -26 lines changed

lib/Language/regexes.pod

Lines changed: 26 additions & 26 deletions
@@ -79,7 +79,7 @@ Otherwise it is L<Nil>.
 
 =head1 Wildcards and character classes
 
-=head2 Dot to match any character
+=head2 X<Dot to match any character|regex syntax,.>
 
 An unescaped dot C<.> in a regex matches any single character.
 
@@ -101,7 +101,7 @@ because there is no character to match before C<per> in the target string.
 There are predefined character classes of the form C<\w>. Its negation is
 written with an upper-case letter, C<\W>.
 
-=item \d and \D
+=item X<\d and \D|regex syntax,\d;regex syntax,\D>
 
 C<\d> matches a single digit (Unicode property C<N>), and C<\D> matches a
 single character that is not a digit.
@@ -119,7 +119,7 @@ Examples for digits are
 U+0E53 ๓ THAI DIGIT THREE
 U+1B56 ᭖ BALINESE DIGIT SIX
 
-=item \h and \H
+=item X<\h and \H|regex syntax,\h;regex syntax,\H>
 
 C<\h> matches a single horizontal whitespace character. C<\H> matches a
 single character that is not a horizontal whitespace character.
@@ -134,27 +134,27 @@ Examples for horizontal whitespace characters are
 Vertical whitespaces like newline characters are explicitly excluded; those
 can be matched with C<\v>, and C<\s> matches any kind of whitespace.
 
-=item \n and \N
+=item X<\n and \N|regex syntax,\n;regex syntax,\N>
 
 C<\n> matches a single, logical newline character. C<\n> is supposed to also
 match a Windows CR LF codepoing pair; though it is unclear whether the magic
 happens at the time that external data is read, or at regex match time. C<\N>
 matches a single character that's not a logical newline.
 
-=item \s and \S
+=item X<\s and \S|regex syntax,\s;regex syntax,\S>
 
 C<\s> matches a single whitespace character. C<\S> matches a single
 character that is not a whitspace.
 
 TODO: examples
 
-=item \t and \T
+=item X<\t and \T|regex syntax,\t;regex syntax,\T>
 
 C<\t> matches a single tab/tabulation character, C<U+0009>. (Note that
 exotic tabs like the C<U+000B VERTICAL TABULATION> character are not included
 here). C<\T> matches a single character that is not a tab.
 
-=item \v and \V
+=item X<\v and \V|regex syntax,\v;regex syntax,\V>
 
 C<\v> matches a single vertical whitespace character. C<\V> match a single
 character that is not a vertical whitspace.
@@ -169,7 +169,7 @@ Examples for vertical whitespace characters:
 
 Use C<\s> to match any kind of whitespace, not just vertical whitespace
 
-=item \w and \W
+=item X<\w and \W|regex syntax,\w;regex syntax,\W>
 
 C<\w> matches a single word character, that is a letter (Unicode category L),
 a digit or an underscore. C<\W> matches a single character that isn't a word
@@ -183,7 +183,7 @@ Examples of word characters:
 03F3 ϳ GREEK LETTER YOT
 0409 Љ CYRILLIC CAPITAL LETTER LJE
 
-=head2 Unicode properties
+=head2 X«Unicode properties|regex syntax,<:property>»
 
 The character classes so far are mostly for convenience; a more systematic
 approach is the use of Unicode properties. They are called in the form
@@ -265,7 +265,7 @@ C<< <:Ll+:N> >> or C<< <:Ll+:Number> >> or C<< <+ :Lowercase_Letter + :Number> >
 (Grouping of set operations with round parens inside character classes is
 supposed to work, but not supported by Rakudo at the time of writing).
 
-=head2 Enumerated character classes and ranges
+=head2 X«Enumerated character classes and ranges|regex syntax,<[ ]>;regex assertion,<-[ ]>»
 
 Sometimes the pre-existing wildcards and character classes are just not
 enough. Fortunately, defining your own is simple enough. Between C<< <[ ]> >>,
@@ -312,7 +312,7 @@ Quantifiers bind tighter than concatenation, so C<ab+> matches one C<a>
 followed by one or more C<b>s. This is different for quotes, so C<'ab'+>
 matches the strings C<ab>, C<abab>, C<ababab> etc.
 
-=head2 One or more: +
+=head2 X<One or more: +|regex syntax,+>
 
 The C<+> quantifier makes the preceding atom match one or more times, with
 no upper limit.
@@ -322,7 +322,7 @@ like this:
 
 / \w+ '=' \w+ /
 
-=head2 Zero or more: *
+=head2 X<Zero or more: *|regex syntax,*>
 
 The C<*> quantifier makes the preceding atom match zero or more times, with
 no upper limit.
@@ -331,19 +331,19 @@ For example to optional whitespace between C<a> and C<b> you can write
 
 / a \s* b /
 
-=head2 Zero or one match: ?
+=head2 X<Zero or one match: ?|regex syntax,?>
 
 The C<?> quantifier makes the preceding atom match zero or one time.
 
-=head2 General quantifier: ** min..max
+=head2 X<General quantifier: ** min..max|regex quantifier,**>
 
 To quantifier an atom an arbitrary number of times, you can say for example
 C<a ** 2..5> to match the character C<a> at least twice and at most 5 times
 
 If minimal and maximal number of matches are the same, a single integer
 is possible: C<a ** 5> to match C<a> exactly five times.
 
-=head1 Alternation
+=head1 X<Alternation|regex syntax,||>
 
 To match one of several possible alternatives, separate them by C<||>; the
 first matching alternative wins.
@@ -379,7 +379,7 @@ match.
 Anchors need to match successfully in order for the whole regex to match, but
 they do not use up characters while matching.
 
-=head2 C<^>, Start of String
+=head2 X«C<^>, Start of String|regex syntax,^»
 
 The C<^> assertion only matches at the start of the string.
 
@@ -388,7 +388,7 @@ The C<^> assertion only matches at the start of the string.
 say so 'perly' ~~ /^ perl/; # True
 say so 'perl' ~~ /^ perl/; # True
 
-=head2 C<^^>, Start of Line and C<$$>, End of Line
+=head2 X«C<^^>, Start of Line and C<$$>, End of Line|regex syntax,^^;regex syntax,$$»
 
 The C<^^> assertion matches at the start of a logical line. That is, either at
 the start of the string, or after a newline character.
@@ -420,7 +420,7 @@ leading space, and the third and fourth lines have two leading spaces each).
 # and the end of line)
 say so $str ~~ / '."' $$/; # True (at the last line)
 
-=head2 C<<< << >>> and C<<< >> >>>, left and right word boundary
+=head2 X<<<<C<<< << >>> and C<<< >> >>>, left and right word boundary|regex syntax,<<;regex syntax,>>;regex syntax,«;regex syntax,»>>>>
 
 C<<< << >>> matches a left word boundary, so positions where at the left there
 a non-word character (or the start of the string), and to the right there is a
@@ -438,7 +438,7 @@ the end of the string.
 say so $str ~~ /<< own/; # False
 say so $str ~~ /own >>/; # True
 
-=head1 Grouping and Capturing
+=head1 X«Grouping and Capturing|regex syntax,( );regex syntax,[ ];regex syntax,$<capture> =»
 
 In regular (non-regex) Perl 6, you can use parenthesis to group things
 together, usually to override operator precedence:
@@ -561,7 +561,7 @@ named captures:
 But there is a more convenient way to get named captures, discussed in the
 next section.
 
-=head1 Subrules
+=head1 X<Subrules|declarator,regex>
 
 Just like you can put pieces of code into subroutines, so you can also put
 pieces of regex into named rules.
@@ -649,7 +649,7 @@ like C<:overlap> go along with the matching:
 # aA
 
 
-=head2 Regex Adverbs
+=head2 X<Regex Adverbs|regex adverb,:ignorecase;regex adverb,:i>
 
 Adverbs that appear at the time of a regex declaration are part of the actual regex,
 and influences how the Perl 6 compiler translates the regex into binary code.
@@ -677,7 +677,7 @@ Brackets and parenthesis limit the scope of an adverb:
 / (:i a b) c / # matches 'ABc' but not 'ABC'
 / [:i a b] c / # matches 'ABc' but not 'ABC'
 
-=head3 Ratchet
+=head3 X<Ratchet|regex adverb,:ratchet;regex adverb,:r>
 
 The C<:ratchet> or C<:r> adverb causes the regex engine not to backtrack.
 
@@ -710,7 +710,7 @@ to declaring ratcheting regex:
 # short for
 my regex thing { :r ... }
 
-=head3 Sigspace
+=head3 X<Sigspace|regex adverb,:sigspace;regex adverb,:s>
 
 The B<C<:sigspace>> or B<C<:s>> adverb makes whitespace significant in a regex.
 
@@ -809,7 +809,7 @@ matching adverbs only make sense while matching a string against a regex.
 They can never appear inside a regex, only on the outside - either as part of
 an C<m/.../> match, or as arguments to a match method.
 
-=head3 Continue
+=head3 X<Continue|matching adverb,:continue;matching adverb,:c>
 
 The C<:continue> or short C<:c> adverb takes an argument. The argument is the
 position where the regex should start to search. By default, it searches from
@@ -824,7 +824,7 @@ the start of the string, but C<:c> overrides that.
 
 TODO
 
-=head2 Global
+=head3 X<Global|regex adverb,:global;regex adverb,:g>
 
 Instead of search just one match, and returning a L<Match|/type/Match>, search
 for every non-overlapping match and returns them in a L<List|/type/List>.
@@ -837,7 +837,7 @@ for every non-overlapping match and returns them in a L<List|/type/List>.
 
 C<:g> is a shortcut for C<:global>.
 
-=head3 Pos
+=head3 X<Pos|regex adverb,:pos;regex adverb,:p>
 
 Anchor the match at a specific position in the string:
 
