Commit a27ba65

Rewrite section on ** regex quantifier
- Reduce prose on Range variants (examples show them already)
- Make examples show the actual matched content rather than True/False
- Explain how to write more complex Ranges and values
- Document newly-implemented semantics for edge cases:
    Rakudo impl: rakudo/rakudo@681d6be974
    Spec: Raku/roast@99c822abde
1 parent d19bedd commit a27ba65

1 file changed

+39
-18
lines changed

doc/Language/regexes.pod6

Lines changed: 39 additions & 18 deletions
@@ -370,38 +370,59 @@ The C<?> quantifier makes the preceding atom match zero or once.
 
 =head2 X<General quantifier: C<** min..max>|regex quantifier,**>
 
-To quantify an atom an arbitrary number of times, you can write something like
-C<a ** 2..5> to match the character C<a> at least twice and at most 5 times.
+To quantify an atom an arbitrary number of times, use the C<**> quantifier.
+The quantifier takes a single L<Int> or a L<Range> on the right hand side
+that specifies the number of times to match. If a L<Range> is specified,
+the end-points specify the minimum and maximum number of times to match.
 
 =begin code
-say so 'a' ~~ /a ** 2..5/; # OUTPUT: «False␤»
-say so 'aaa' ~~ /a ** 2..5/; # OUTPUT: «True␤»
+say 'abcdefg' ~~ /\w ** 4/; # OUTPUT: «「abcd」␤»
+say 'a' ~~ /\w ** 2..5/; # OUTPUT: «Nil␤»
+say 'abc' ~~ /\w ** 2..5/; # OUTPUT: «「abc」␤»
+say 'abcdefg' ~~ /\w ** 2..5/; # OUTPUT: «「abcde」␤»
+say 'abcdefg' ~~ /\w ** 2^..^5/; # OUTPUT: «「abcd」␤»
+say 'abcdefg' ~~ /\w ** ^3/; # OUTPUT: «「ab」␤»
 =end code
 
-If the minimal and maximal number of matches are the same, a single integer
-is possible: C<a ** 5> matches C<a> exactly five times.
+Only basic literal syntax for the right hand side of the quantifier
+is supported, to avoid ambiguities with other regex constructs. If you need
+to use a more complex expression, for example a L<Range> made from
+variables, enclose the L<Range> in curly braces:
 
 =begin code
-say so 'aaaaa' ~~ /a ** 5/; # OUTPUT: «True␤»
+my $start = 3;
+say 'abcdefg' ~~ /\w ** {$start .. $start+2}/; # OUTPUT: «「abcde」␤»
+say 'abcdefg' ~~ /\w ** {π.Int}/; # OUTPUT: «「abc」␤»
 =end code
 
-It's also possible to use non inclusive ranges using a caret:
+Negative values are treated like zero:
 
 =begin code
-say so 'a' ~~ /a ** 1^..^6/; # OUTPUT: «False␤» -- there are 2 to 5 'a's in a row
-say so 'aaaa' ~~ /a ** 1^..^6/; # OUTPUT: «True␤»
+say 'abcdefg' ~~ /\w ** {-Inf}/; # OUTPUT: «「」␤»
+say 'abcdefg' ~~ /\w ** {-42}/; # OUTPUT: «「」␤»
+say 'abcdefg' ~~ /\w ** {-10..-42}/; # OUTPUT: «「」␤»
+say 'abcdefg' ~~ /\w ** {-42..-10}/; # OUTPUT: «「」␤»
 =end code
 
-This includes the numeric ranges starting from 0:
+If the resulting value is C<Inf>
+or C<NaN>, or the resulting L<Range> is empty, non-Numeric, contains C<NaN>
+end-points, or has C<Inf> as its minimum effective end-point, the
+C<X::Syntax::Regex::QuantifierValue> exception will be thrown:
 
 =begin code
-say so 'aaa' ~~ /a ** ^6/; # OUTPUT: «True␤» -- there are 0 to 5 'a's in a row
-=end code
-
-or a Whatever C<*> operator for an infinite range with a non inclusive minimum:
-
-=begin code
-say so 'aaaa' ~~ /a ** 1^..*/; # OUTPUT: «True␤» -- there are 2 or more 'a's in a row
+(try say 'abcdefg' ~~ /\w ** {42..10}/ ) orelse say ($!.^name, $!.empty-range);
+# OUTPUT: «(X::Syntax::Regex::QuantifierValue True)␤»
+(try say 'abcdefg' ~~ /\w ** {Inf..Inf}/) orelse say ($!.^name, $!.inf);
+# OUTPUT: «(X::Syntax::Regex::QuantifierValue True)␤»
+(try say 'abcdefg' ~~ /\w ** {NaN..42}/ ) orelse say ($!.^name, $!.non-numeric-range);
+# OUTPUT: «(X::Syntax::Regex::QuantifierValue True)␤»
+(try say 'abcdefg' ~~ /\w ** {"a".."c"}/) orelse say ($!.^name, $!.non-numeric-range);
+# OUTPUT: «(X::Syntax::Regex::QuantifierValue True)␤»
+
+(try say 'abcdefg' ~~ /\w ** {Inf}/) orelse say ($!.^name, $!.inf);
+# OUTPUT: «(X::Syntax::Regex::QuantifierValue True)␤»
+(try say 'abcdefg' ~~ /\w ** {NaN}/) orelse say ($!.^name, $!.non-numeric);
+# OUTPUT: «(X::Syntax::Regex::QuantifierValue True)␤»
 =end code
 
 =head2 X<Modified quantifier: C<%>|regex,%;regex,%%>
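
As a rough illustration of the semantics this change documents, the sketch below builds the quantifier's Range from variables and then triggers the empty-range error on purpose. It is not part of the commit; the names $min, $max, $match and $err are made up for the example, and it assumes a Rakudo build that includes the implementation referenced in the commit message.

```raku
# Hypothetical bounds chosen for illustration.
my ($min, $max) = 2, 4;

# A Range built from variables has to be wrapped in curly braces.
my $match = 'abcdefg' ~~ /\w ** {$min .. $max}/;
say $match.Str;   # abcd (greedy, so it takes the maximum of 4)

# Swapping the end-points produces an empty Range, which the
# quantifier rejects with X::Syntax::Regex::QuantifierValue.
try 'abcdefg' ~~ /\w ** {$max .. $min}/;
my $err = $!;
if $err ~~ X::Syntax::Regex::QuantifierValue {
    say "rejected: ", $err.^name, " (empty-range: ", $err.empty-range, ")";
}
```

Checking the exception type rather than only testing for a failed match keeps a genuine non-match (Nil) distinct from an invalid quantifier value.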
