@@ -180,6 +180,94 @@ Examples of word characters:
03F3 ϳ GREEK LETTER YOT
0409 Љ CYRILLIC CAPITAL LETTER LJE

+ =head2 Unicode properties
+
+ The character classes so far are mostly for convenience; a more systematic
+ approach is the use of Unicode properties. They are written in the form
+ C<< <:property> >>, where C<property> can be a short or long Unicode property
+ name.
+
+ The following list is taken from the Perl 5
+ L<perlunicode|http://perldoc.perl.org/perlunicode.html> documentation:
+
+ =begin table
+
+     Short   Long
+     =====   =====
+     L       Letter
+     LC      Cased_Letter
+     Lu      Uppercase_Letter
+     Ll      Lowercase_Letter
+     Lt      Titlecase_Letter
+     Lm      Modifier_Letter
+     Lo      Other_Letter
+
+     M       Mark
+     Mn      Nonspacing_Mark
+     Mc      Spacing_Mark
+     Me      Enclosing_Mark
+
+     N       Number
+     Nd      Decimal_Number (also Digit)
+     Nl      Letter_Number
+     No      Other_Number
+
+     P       Punctuation (also Punct)
+     Pc      Connector_Punctuation
+     Pd      Dash_Punctuation
+     Ps      Open_Punctuation
+     Pe      Close_Punctuation
+     Pi      Initial_Punctuation
+             (may behave like Ps or Pe depending on usage)
+     Pf      Final_Punctuation
+             (may behave like Ps or Pe depending on usage)
+     Po      Other_Punctuation
+
+     S       Symbol
+     Sm      Math_Symbol
+     Sc      Currency_Symbol
+     Sk      Modifier_Symbol
+     So      Other_Symbol
+
+     Z       Separator
+     Zs      Space_Separator
+     Zl      Line_Separator
+     Zp      Paragraph_Separator
+
+     C       Other
+     Cc      Control (also Cntrl)
+     Cf      Format
+     Cs      Surrogate
+     Co      Private_Use
+     Cn      Unassigned
+
+ =end table
+
+ So for example C<< <:Lu> >> matches a single upper-case letter.
+
+ Negation works as C<< <:!category> >>, so C<< <:!Lu> >> matches a single
+ character that isn't an upper-case letter.
+
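The matching and negation forms described above can be tried out with a small sketch like the following. The helper names C<is-upper> and C<is-not-upper> are hypothetical and exist only for this illustration; the anchors C<^> and C<$> are added so that each pattern has to cover the whole one-character string.

```raku
# Hypothetical helpers for illustration only; not part of any library.

# True if the string consists of exactly one upper-case letter (category Lu).
sub is-upper(Str $c --> Bool)     { so $c ~~ /^ <:Lu> $/ }

# True if the string consists of exactly one character that is NOT in Lu.
sub is-not-upper(Str $c --> Bool) { so $c ~~ /^ <:!Lu> $/ }

say is-upper('A');      # True
say is-upper('Љ');      # True  (CYRILLIC CAPITAL LETTER LJE)
say is-upper('a');      # False
say is-not-upper('a');  # True
say is-not-upper('7');  # True  (a digit is not an upper-case letter)
```

Note that the negated form still consumes one character; it does not match the empty string.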
+ Several categories can be combined with one of these infix operators:
+
+ =begin table
+
+     Operator    Meaning
+     ========    =======
+     +           set union
+     |           set union
+     &           set intersection
+     -           set difference (first minus second)
+     ^           symmetric set difference / XOR
+
+ =end table
+
+ So for example, to match either a lower-case letter or a number, one can write
+ C<< <:Ll+:N> >> or C<< <:Ll+:Number> >> or C<< <+ :Lowercase_Letter + :Number> >>.
+
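As a minimal sketch of combining categories, the following uses only the C<+> (union) and C<-> (difference) operators with short property names; whether the other operators in the table are available depends on the implementation. The helper names C<lower-or-number> and C<letter-not-upper> are hypothetical.

```raku
# Hypothetical helpers for illustration only.

# Union: exactly one lower-case letter (Ll) or one number of any kind (N).
sub lower-or-number(Str $c --> Bool)  { so $c ~~ /^ <:Ll+:N> $/ }

# Difference: exactly one letter (L) that is not an upper-case letter (Lu).
sub letter-not-upper(Str $c --> Bool) { so $c ~~ /^ <:L - :Lu> $/ }

say lower-or-number('a');    # True
say lower-or-number('7');    # True
say lower-or-number('A');    # False
say letter-not-upper('a');   # True
say letter-not-upper('A');   # False
say letter-not-upper('7');   # False (not a letter at all)
```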
+ (Grouping of set operations with round parens inside character classes is
+ supposed to work, but is not supported by Rakudo at the time of writing.)
+
=head2 Enumerated character classes and ranges

TODO