Commit fbff02f

Upgrade of Literals section (#2946)

* Upgrade of Literals section
* Some additional amendments to updated Literals section
1 parent 1b907cc commit fbff02f

1 file changed: +57 -9 lines changed

doc/Language/regexes.pod6

Lines changed: 57 additions & 9 deletions
@@ -61,21 +61,69 @@ and L<multi line/embedded comments|
 
     say '2015-12-25'.match($regex); # OUTPUT: «「2015-12-25」␤»
 
-=head1 Literals
+=head1 Literals and metacharacters
 
-The simplest use case for a regex is a match against a string literal:
+A regex describes a pattern to be matched in terms of literals and
+metacharacters. Alphanumeric characters and the underscore C<_> constitute the
+literals: these characters match themselves and nothing else. Other characters
+act as metacharacters and may, as such, have a special meaning, either by
+themselves (such as the dot C<.>, which serves as a wildcard) or together with
+other characters in larger metasyntactic constructs (such as C«<?before ...>»,
+which defines a lookahead assertion). But before looking at metacharacters and
+their particular uses, let's first explore the relation between literals and
+metacharacters in some more detail.
+
+In its simplest form a regex comprises only literals:
 
     if 'properly' ~~ / perl / {
-        say "'properly' contains 'perl'";
+        say "'properly' contains 'perl'"; # OUTPUT: «'properly' contains 'perl'␤»
     }
 
-Alphanumeric characters and the underscore _ are matched literally. All other
-characters must either be escaped with a backslash (for example, C<\:> to match
-a colon), or be within quotes:
+If you want a regex to literally match one or more characters that normally act
+as metacharacters, these characters must either be escaped using a backslash, or
+be quoted using single or double quotes.
+
+The backslash serves as a switch. It switches a single metacharacter into a
+literal, and vice versa:
+
+    / \# /          # matches the hash metacharacter literally
+    / \w /          # turns literal 'w' into a character class (see below)
+    /Hallelujah\!/  # matches string 'Hallelujah!' incl. exclamation mark
+
+Even if a metacharacter does not (yet) have a special meaning in Perl 6,
+escaping (or quoting) it is required to ensure that the regex compiles and
+matches the character literally. This allows the clear distinction between
+literals and metacharacters to be maintained:
+
+    / \, /  # matches a literal comma ','
+    / , /   # !! error: a yet meaningless/unrecognized metacharacter
+            # does not automatically match literally
+
+While an escaping backslash exerts its effect on the next individual character,
+single I<and multiple> metacharacters may be turned into literally matching
+strings by quoting them using single or double quotes:
+
+    / "abc" /          # you may quote literals like this, but it has no effect
+    / "Hallelujah!" /  # yet, this form is generally preferred over /Hallelujah\!/
+
+    / "two words" /    # quoting a space renders it significant, so this matches
+                       # the string 'two words' including the intermediate space
+
+    / '#!:@' /         # this regex matches the string of metacharacters '#!:@'
+
+Quoting does not turn every metacharacter into a literal, however. This is due
+to the fact that quotes allow for backslash-escapes and interpolation.
+Specifically: in single quotes, the backslash may be used to escape single
+quotes and the backslash itself; double quotes additionally enable the
+interpolation of variables, and of code blocks of the form C<{...}>:
+
+    / '\\\'' /          # matches a backslash followed by a single quote: \'
+    / '\' /             # !! error: this is NOT the way to literally match a
+                        # backslash because now it escapes the second quote
+    my $x = 'Hi';
+    / "$x there!" /     # matches the string 'Hi there!'
 
-    / 'two words' /; # matches 'two words' including the blank
-    / "a:b" /; # matches 'a:b' including the colon
-    / \# /; # matches a hash character
+    / "1 + 1 = {1+1}" / # matches the string '1 + 1 = 2'
 
 Strings are searched left to right, so it is enough if only part of the string
 matches the regex: