Update regexes.pod6

tisonkun · web-flow · commit b24a42baee49 · 2017-10-30T02:17:28.000-05:00
diff --git a/doc/Language/regexes.pod6 b/doc/Language/regexes.pod6
@@ -28,8 +28,8 @@ the colon is forbidden because it clashes with adverbs, such as C<rx:i/abc/>
 (case insensitive regexes), and round parentheses indicate a function call
 instead.
 
-Whitespace in regexes is generally ignored (except with the C<:s> or
-C<:sigspace> adverb).
+Whitespace in regexes is generally ignored (except with the C<:s> adverb or
+its long form, C<:sigspace>).
 
 As with Perl 6, in general, comments in regexes start with a hash character
 C<#> and go to the end of the line.
@@ -62,8 +62,8 @@ part of the string matches the regex:
     };
 
 Match results are stored in the C<$/> variable and are also returned from
-the match. The result is of L<type Match|/type/Match> if the match was successful;
-otherwise it's L<Nil|/type/Nil>.
+the match. The result is of L<type Match|/type/Match> if the match was
+successful; otherwise it's L<Nil|/type/Nil>.
 
 =head1 Wildcards and character classes
 
@@ -89,7 +89,7 @@ because there's no character to match before C<per> in the target string.
 There are predefined character classes of the form C<\w>. Its negation is
 written with an upper-case letter, C<\W>.
 
-=item X<\d and \D|regex,\d;regex,\D>
+=item X<C<\d> and C<\D>|regex,\d;regex,\D>
 
 C<\d> matches a single digit (Unicode property C<N>) and C<\D> matches a
 single character that is not a digit.
@@ -102,21 +102,21 @@ match C<\d>, but also digits from other scripts.
 
 Examples for digits are:
 
-    =begin code :skip-test
+    =begin code :lang<text>
     U+0035 5 DIGIT FIVE
-    U+07C2 ߂ NKO DIGIT TWO
+    U+0BEB ௫ TAMIL DIGIT FIVE
     U+0E53 ๓ THAI DIGIT THREE
-    U+1B56 ᭖ BALINESE DIGIT SIX
+    U+17E5 ៥ KHMER DIGIT FIVE
     =end code
 
-=item X<\h and \H|regex,\h;regex,\H>
+=item X<C<\h> and C<\H>|regex,\h;regex,\H>
 
 C<\h> matches a single horizontal whitespace character. C<\H> matches a
 single character that is not a horizontal whitespace character.
 
 Examples for horizontal whitespace characters are
 
-    =begin code :skip-test
+    =begin code :lang<text>
     U+0020 SPACE
     U+00A0 NO-BREAK SPACE
     U+0009 CHARACTER TABULATION
@@ -126,14 +126,14 @@ Examples for horizontal whitespace characters are
 Vertical whitespace like newline characters are explicitly excluded; those
 can be matched with C<\v>, and C<\s> matches any kind of whitespace.
 
-=item X<\n and \N|regex,\n;regex,\N>
+=item X<C<\n> and C<\N>|regex,\n;regex,\N>
 
 C<\n> matches a single, logical newline character. C<\n> is supposed to also
 match a Windows CR LF codepoint pair; though it's unclear whether the magic
 happens at the time that external data is read, or at regex match time.
 C<\N> matches a single character that's not a logical newline.
 
-=item X<\s and \S|regex,\s;regex,\S>
+=item X<C<\s> and C<\S>|regex,\s;regex,\S>
 
 C<\s> matches a single whitespace character. C<\S> matches a single
 character that is not whitespace.
@@ -142,20 +142,20 @@ character that is not whitespace.
         say ~$/;        # OUTPUT: «word␤»
     }
 
-=item X<\t and \T|regex,\t;regex,\T>
+=item X<C<\t> and C<\T>|regex,\t;regex,\T>
 
 C<\t> matches a single tab/tabulation character, C<U+0009>. (Note that
 exotic tabs like the C<U+000B VERTICAL TABULATION> character are not
 included here). C<\T> matches a single character that is not a tab.
 
-=item X<\v and \V|regex,\v;regex,\V>
+=item X<C<\v> and C<\V>|regex,\v;regex,\V>
 
 C<\v> matches a single vertical whitespace character. C<\V> matches a single
 character that is not vertical whitespace.
 
 Examples for vertical whitespace characters:
 
-    =begin code :skip-test
+    =begin code :lang<text>
     U+000A LINE FEED
     U+000B VERTICAL TABULATION
     U+000C FORM FEED
@@ -167,15 +167,15 @@ Examples for vertical whitespace characters:
 
 Use C<\s> to match any kind of whitespace, not just vertical whitespace.
 
-=item X<\w and \W|regex,\w;regex,\W>
+=item X<C<\w> and C<\W>|regex,\w;regex,\W>
 
 C<\w> matches a single word character; i.e., a letter (Unicode category L), a
 digit or an underscore. C<\W> matches a single character that isn't a word
 character.
 
 Examples of word characters:
 
-    =begin code :skip-test
+    =begin code :lang<text>
     0041 A LATIN CAPITAL LETTER A
     0031 1 DIGIT ONE
     03B4 δ GREEK SMALL LETTER DELTA
@@ -185,37 +185,37 @@ Examples of word characters:
 
 Predefined subrules:
 
-    =begin code :skip-test
-    <alnum>   \w       'alpha' plus 'digit'
+    =begin code :lang<text>
     <alpha>   <:L>     Alphabetic characters
-    <blank>   \h       Horizontal whitespace
-    <cntrl>            Control characters
     <digit>   \d       Decimal digits
-    <graph>            'alnum' plus 'punct'
-    <lower>   <:Ll>    Lowercase characters
-    <print>            'graph' plus 'space', but no 'cntrl'
+    <xdigit>           Hexadecimal digit [0-9A-Fa-f]
+    <alnum>   \w       'alpha' plus 'digit'
     <punct>            Punctuation and Symbols (only Punct beyond ASCII)
+    <graph>            'alnum' plus 'punct'
     <space>   \s       Whitespace
+    <cntrl>            Control characters
+    <print>            'graph' plus 'space', but no 'cntrl'
+    <blank>   \h       Horizontal whitespace
+    <lower>   <:Ll>    Lowercase characters
     <upper>   <:Lu>    Uppercase characters
     <?same>            Matches between two identical characters
     <?wb>              Word Boundary (zero-width assertion, ? suppress capture)
     <?ww>              Within Word (zero-width assertion, ? suppress capture)
-    <xdigit>           Hexadecimal digit [0-9A-Fa-f]
     =end code
 
 =head2 X«Unicode properties|regex,<:property>»
 
 The character classes mentioned so far are mostly for convenience; another
-approach is to use Unicode character properties. These come in the form C<<
-<:property> >>, where C<property> can be a short or long Unicode General
+approach is to use Unicode character properties. These come in the form
+C«<:property>», where C<property> can be a short or long Unicode General
 Category name. These use pair syntax.
 
 To match against a Unicode Property:
 
     "a".uniprop('Script');                 # OUTPUT: «Latin␤»
-    "a" ~~ / <:Script<Latin>> /;
+    "a" ~~ / <:Script<Latin>> /;           # OUTPUT: «｢a｣␤»
     "a".uniprop('Block');                  # OUTPUT: «Basic Latin␤»
-    "a" ~~ / <:Block('Basic Latin')> /;
+    "a" ~~ / <:Block('Basic Latin')> /;    # OUTPUT: «｢a｣␤»
 
 The following list of Unicode General Categories is stolen from the Perl 5
 L<perlunicode|http://perldoc.perl.org/perlunicode.html> documentation:
@@ -267,9 +267,9 @@ L<perlunicode|http://perldoc.perl.org/perlunicode.html> documentation:
 
 =end table
 
-For example, C<< <:Lu> >> matches a single, upper-case letter.
+For example, C«<:Lu>» matches a single, upper-case letter.
 
-It's negation is this: C<< <:!property> >>. So, C<< <:!Lu> >> matches a single
+Its negation is this: C«<:!property>». So, C«<:!Lu>» matches a single
 character that isn't an upper-case letter.
 
 Categories can be used together, with an infix operator:
@@ -287,7 +287,7 @@ Categories can be used together, with an infix operator:
 =end table
 
 To match either a lower-case letter or a number, write
-C<< <:Ll+:N> >> or C<< <:Ll+:Number> >> or C<< <+ :Lowercase_Letter + :Number> >>.
+C«<:Ll+:N>» or C«<:Ll+:Number>» or C«<+ :Lowercase_Letter + :Number>».
 
 It's also possible to group categories and sets of categories with
 parentheses; for example:
@@ -297,20 +297,20 @@ parentheses; for example:
 =head2 X«Enumerated character classes and ranges|regex,<[ ]>;regex,<-[ ]>»
 
 Sometimes the pre-existing wildcards and character classes are not enough.
-Fortunately, defining your own is fairly simple. Within C<< <[ ]> >>, you
+Fortunately, defining your own is fairly simple. Within C«<[ ]>», you
 can put any number of single characters and ranges of characters (expressed
 with two dots between the end points), with or without whitespace.
 
     "abacabadabacaba" ~~ / <[ a .. c 1 2 3 ]> /;
     # Unicode hex codepoint range
     "ÀÁÂÃÄÅÆ" ~~ / <[ \x[00C0] .. \x[00C6] ]> /;
     # Unicode named codepoint range
-    "ÀÁÂÃÄÅÆ" ~~ / <[ \c[LATIN CAPITAL LETTER A WITH GRAVE] .. \c[LATIN CAPITAL LETTER AE] ]> /;
+    "αβγ" ~~ /<[\c[GREEK SMALL LETTER ALPHA]..\c[GREEK SMALL LETTER GAMMA]]>/;
 
-Within the C<< < > >> you can use C<+> and C<-> to add or
+Within the C«< >» you can use C<+> and C<-> to add or
 remove multiple range definitions and
 even mix in some of the unicode categories above. You can also
-write the backslashed forms for character classes between the C< [ ] >.
+write the backslashed forms for character classes between the C<[ ]>.
 
     / <[\d] - [13579]> /;
     # starts with \d and removes odd ASCII digits, but not quite the same as