@@ -61,21 +61,69 @@ and L<multi line/embedded comments|

     say '2015-12-25'.match($regex); # OUTPUT: «「2015-12-25」»

-=head1 Literals
+=head1 Literals and metacharacters

-The simplest use case for a regex is a match against a string literal:
+A regex describes a pattern to be matched in terms of literals and
+metacharacters. Alphanumeric characters and the underscore C<_> constitute the
+literals: these characters match themselves and nothing else. Other characters
+act as metacharacters and may, as such, have a special meaning, either by
+themselves (such as the dot C<.>, which serves as a wildcard) or together with
+other characters in larger metasyntactic constructs (such as C«<?before ...>»,
+which defines a lookahead assertion). But before looking at metacharacters and
+their particular uses, let's first explore the relation between literals and
+metacharacters in some more detail.
+
+In its simplest form a regex comprises only literals:

     if 'properly' ~~ / perl / {
-        say "'properly' contains 'perl'";
+        say "'properly' contains 'perl'"; # OUTPUT: «'properly' contains 'perl'»
     }

-Alphanumeric characters and the underscore _ are matched literally. All other
-characters must either be escaped with a backslash (for example, C<\:> to match
-a colon), or be within quotes:
+If you want a regex to literally match one or more characters that normally act
+as metacharacters, these characters must either be escaped using a backslash, or
+be quoted using single or double quotes.
+
+The backslash serves as a switch. It switches a single metacharacter into a
+literal, and vice versa:
+
+    / \# /           # matches the hash metacharacter literally
+    / \w /           # turns literal 'w' into a character class (see below)
+    /Hallelujah\!/   # matches string 'Hallelujah!' incl. exclamation mark
+
+Even if a metacharacter does not (yet) have a special meaning in Perl 6,
+escaping (or quoting) it is required to ensure that the regex compiles and
+matches the character literally. This allows the clear distinction between
+literals and metacharacters to be maintained:
+
+    / \, /    # matches a literal comma ','
+    / , /     # !! error: a yet meaningless/unrecognized metacharacter
+              # does not automatically match literally
+
+While an escaping backslash exerts its effect on the next individual character,
+single I<and multiple> metacharacters may be turned into literally matching
+strings by quoting them using single or double quotes:
+
+    / "abc" /         # you may quote literals like this, but it has no effect
+    / "Hallelujah!" / # yet, this form is generally preferred over /Hallelujah\!/
+
+    / "two words" /   # quoting a space renders it significant, so this matches
+                      # the string 'two words' including the intermediate space
+
+    / '#!:@' /        # this regex matches the string of metacharacters '#!:@'
+
+Quoting does not turn every metacharacter into a literal, however. This is due
+to the fact that quotes allow for backslash-escapes and interpolation.
+Specifically: in single quotes, the backslash may be used to escape single
+quotes and the backslash itself; double quotes additionally enable the
+interpolation of variables, and of code blocks of the form C<{...}>:
+
+    / '\\\'' /          # matches a backslash followed by a single quote: \'
+    / '\' /             # !! error: this is NOT the way to literally match a
+                        # backslash because now it escapes the second quote
+    my $x = 'Hi';
+    / "$x there!" /     # matches the string 'Hi there!'

-    / 'two words' /;     # matches 'two words' including the blank
-    / "a:b" /;           # matches 'a:b' including the colon
-    / \# /;              # matches a hash character
+    / "1 + 1 = {1+1}" / # matches the string '1 + 1 = 2'

 Strings are searched left to right, so it is enough if only part of the string
 matches the regex:
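
The escaping and quoting rules added in this hunk can be cross-checked with a short script. What follows is a minimal sketch, not part of the patch, and it assumes a Rakudo compiler is available. The check labels, the sample subject strings and the `@checks` array are hypothetical names introduced for illustration; only the regexes themselves are taken from the added documentation lines. If the documentation is accurate, every check should report `True`.

```raku
# Hypothetical variable used only for the interpolation check below.
my $x = 'Hi';

# Each pair maps a label to the Boolean result of one smartmatch.
# The regexes are copied from the added documentation lines; the
# subject strings on the left of ~~ are illustrative assumptions.
my @checks =
    'escaped hash'           => so('a#b'         ~~ / \# /),
    'escaped exclamation'    => so('Hallelujah!' ~~ /Hallelujah\!/),
    'escaped comma'          => so('a,b'         ~~ / \, /),
    'quoted space'           => so('two words'   ~~ / "two words" /),
    'quoted metacharacters'  => so('#!:@'        ~~ / '#!:@' /),
    'backslash plus quote'   => so(Q[\']         ~~ / '\\\'' /),
    'variable interpolation' => so('Hi there!'   ~~ / "$x there!" /),
    'code interpolation'     => so('1 + 1 = 2'   ~~ / "1 + 1 = {1+1}" /);

# Print one line per check, e.g. "escaped hash: True".
for @checks -> $check {
    say "{$check.key}: {$check.value}";
}
```

The two forms the documentation marks as errors (`/ , /` and `/ '\' /`) are left out on purpose, since they are described as failing to compile and would abort the whole script.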