@@ -526,8 +526,9 @@ This can be useful for augmenting an existing regex. For example if you have
 a regex C<quoted> that matches a quoted string, then C</ <quoted> && <-[x]>* />
 matches a quoted string that does not contain the character C<x>.
 
-Note that you cannot easily obtain the same behavior with a look-ahead, because
-a look-ahead doesn't stop looking when the quoted string stops matching.
+Note that you cannot easily obtain the same behavior with a look-ahead, that
+is, a regex that doesn't consume characters, because a look-ahead doesn't stop
+looking when the quoted string stops matching.
 
 =begin code
 say 'abc' ~~ / <?before a> && . /; # OUTPUT: «Nil»
@@ -590,65 +591,93 @@ The following is a multi-line string:
         and keep it safe
         EOS
 
-    say so $str ~~ /safe $/; # OUTPUT: «True» -- 'safe' is at the end of the string
-    say so $str ~~ /secret $/; # OUTPUT: «False» -- 'secret' is at the end of a line -- not the string
-    say so $str ~~ /^Keep /; # OUTPUT: «True» -- 'Keep' is at the start of the string
-    say so $str ~~ /^and /; # OUTPUT: «False» -- 'and' is at the start of a line -- not the string
+    # 'safe' is at the end of the string
+    say so $str ~~ /safe $/; # OUTPUT: «True»
+
+    # 'secret' is at the end of a line, not the string
+    say so $str ~~ /secret $/; # OUTPUT: «False»
+
+    # 'Keep' is at the start of the string
+    say so $str ~~ /^Keep /; # OUTPUT: «True»
+
+    # 'and' is at the start of a line -- not the string
+    say so $str ~~ /^and /; # OUTPUT: «False»
 
 =head2 X«C<^^>, Start of Line and C<$$>, End of Line|regex,^^;regex,$$»
 
 The C<^^> assertion matches at the start of a logical line. That is, either
-at the start of the string, or after a newline character. However, it does not match
-at the end of the string, even if it ends with a newline character.
+at the start of the string, or after a newline character. However, it does not
+match at the end of the string, even if it ends with a newline character.
 
 C<$$> matches only at the end of a logical line, that is, before a newline
 character, or at the end of the string when the last character is not a
 newline character.
 
 (To understand the following example, it's important to know that the
-C<q:to/EOS/...EOS> "heredoc" syntax removes leading indention to the same
-level as the C<EOS> marker, so that the first, second and last lines have no
-leading space and the third and fourth lines have two leading spaces each).
+C<q:to/EOS/...EOS> L<heredoc|/language/quoting#Heredocs:_:to> syntax removes
+leading indention to the same level as the C<EOS> marker, so that the first,
+second and last lines have no leading space and the third and fourth lines have
+two leading spaces each).
 
-=begin code
-my $str = q:to/EOS/;
-    There was a young man of Japan
-    Whose limericks never would scan.
-      When asked why this was,
-      He replied "It's because
-    I always try to fit as many syllables into the last line as ever I possibly can."
-    EOS
-
-say so $str ~~ /^^ There/; # OUTPUT: «True» -- start of string
-say so $str ~~ /^^ limericks/; # OUTPUT: «False» -- not at the start of a line
-say so $str ~~ /^^ I/; # OUTPUT: «True» -- start of the last line
-say so $str ~~ /^^ When/; # OUTPUT: «False» -- there are blanks between
-                          # start of line and the "When"
-
-say so $str ~~ / Japan $$/; # OUTPUT: «True» -- end of first line
-say so $str ~~ / scan $$/; # OUTPUT: «False» -- there's a . between "scan"
-                           # and the end of line
-say so $str ~~ / '."' $$/; # OUTPUT: «True» -- at the last line
-=end code
+=begin code
+my $str = q:to/EOS/;
+    There was a young man of Japan
+    Whose limericks never would scan.
+      When asked why this was,
+      He replied "It's because I always try to fit
+    as many syllables into the last line as ever I possibly can."
+    EOS
+
+# 'There' is at the start of the string
+say so $str ~~ /^^ There/; # OUTPUT: «True»
+
+# 'limericks' is not at the start of a line
+say so $str ~~ /^^ limericks/; # OUTPUT: «False»
+
+# 'as' is at the start of the last line
+say so $str ~~ /^^ as/; # OUTPUT: «True»
+
+# there are blanks between start of line and the "When"
+say so $str ~~ /^^ When/; # OUTPUT: «False»
+
+# 'Japan' is at the end of the first line
+say so $str ~~ / Japan $$/; # OUTPUT: «True»
+
+# there's a . between "scan" and the end of line
+say so $str ~~ / scan $$/; # OUTPUT: «False»
+
+# matched at the last line
+say so $str ~~ / '."' $$/; # OUTPUT: «True»
+=end code
 
 
 =head2 X«C«<|w>» and C«<!|w>», word boundary|regex, <|w>;regex, <!|w>»
 
 To match any word boundary, use C«<|w>». This is similar to other
-languages’ X«C<\b>|regex deprecated,\b».
-To match not a word boundary, use <!|w>, similar to other languages X<C<\B>|regex deprecated, \B>.
+languages' X«C<\b>|regex deprecated,\b».
+
+To match not a word boundary, use <!|w>. This is similar to other
+languages' X<C<\B>|regex deprecated, \B>.
+
 These are both zero width assertions.
 
-=head2 X<<<< C<<< << >>> and C<<< >> >>>, left and right word boundary|regex,<<;regex,>>;regex,«;regex,» >>>>
+    say "two-words" ~~ / "two"<|w>"-"<|w>"words" /; # OUTPUT: «「two-words」»
+    say "two-words" ~~ / "two"<!|w>"-"<!|w>"words" /; # OUTPUT: «Nil»
+
+=head2 C«<<» and C«>>», left and right word boundary
 
-C<<< << >>> matches a left word boundary. It matches positions where there
+X«|regex, <<; regex, >>; regex, «; regex, » »
+
+C«<<» matches a left word boundary. It matches positions where there
 is a non-word character at the left (or the start of the string) and a word
 character to the right.
 
-C<<< >> >>> matches a right word boundary. It matches positions where there
+C«>>» matches a right word boundary. It matches positions where there
 is a word character at the left and a non-word character at the right (or
 the end of the string).
 
+These are both zero width assertions.
+
     my $str = 'The quick brown fox';
     say so $str ~~ /br/; # OUTPUT: «True»
     say so $str ~~ /<< br/; # OUTPUT: «True»
@@ -663,34 +692,34 @@ You can also use the variants C<«> and C<»>:
     say so $str ~~ /« own/; # OUTPUT: «False»
     say so $str ~~ /own »/; # OUTPUT: «True»
 
-=head1 X«Grouping and Capturing|regex,( );regex,[ ];regex,$<capture> =»
+=head1 Grouping and Capturing
 
 In regular (non-regex) Perl 6, you can use parentheses to group things
 together, usually to override operator precedence:
 
-    say 1 + 4 * 2; # 9, parsed as 1 + (4 * 2)
-    say (1 + 4) * 2; # OUTPUT: «10»
+    say 1 + 4 * 2; # OUTPUT: «9», parsed as 1 + (4 * 2)
+    say (1 + 4) * 2; # OUTPUT: «10»
 
 The same grouping facility is available in regexes:
 
-    / a || b c /; # matches 'a' or 'bc'
-    / ( a || b ) c /; # matches 'ac' or 'bc'
+    / a || b c /; # matches 'a' or 'bc'
+    / ( a || b ) c /; # matches 'ac' or 'bc'
 
 The same grouping applies to quantifiers:
 
-    / a b+ /; # matches an 'a' followed by one or more 'b's
-    / (a b)+ /; # matches one or more sequences of 'ab'
-    / (a || b)+ /; # matches a sequence of 'a's and 'b's, at least one long
+    / a b+ /; # matches an 'a' followed by one or more 'b's
+    / (a b)+ /; # matches one or more sequences of 'ab'
+    / (a || b)+ /; # matches a non-empty string of 'a's and 'b's
 
 An unquantified capture produces a L<Match> object. When a capture is
 quantified (except with the C<?> quantifier) the capture becomes a list of
 L<Match> objects instead.
 
-=head2 Capturing
+=head2 X«Capturing|regex,( )»
 
 The round parentheses don't just group, they also I<capture>; that is, they
 make the string matched within the group available as a variable, and also as
-an element of the resulting L<Match|/type/Match> object:
+an element of the resulting L<Match> object:
 
     my $str = 'number 42';
     if $str ~~ /'number ' (\d+) / {
@@ -716,7 +745,7 @@ access all elements:
         say $/.list.join: ', ' # OUTPUT: «a, c»
     }
 
-=head2 Non-capturing grouping
+=head2 X«Non-capturing grouping|regex,[ ]»
 
 The parentheses in regexes perform a double role: they group the regex
 elements inside and they capture what is matched by the sub-regex inside.
@@ -728,9 +757,10 @@ instead.
         say ~$0; # OUTPUT: «c»
     }
 
-If you do not need the captures, using non-capturing groups provides three
-benefits: they more cleanly communicate the regex intent; they make it easier to
-count the capturing groups that you do care about; and matching is bit faster.
+If you do not need the captures, using non-capturing groups provides
+three benefits: they more cleanly communicate the regex intent; they
+make it easier to count the capturing groups that you do care about;
+and matching is a bit faster.
 
 =head2 Capture numbers
 
@@ -749,21 +779,16 @@ Alternations reset the capture count:
 Example:
 
     if 'abc' ~~ /(x)(y) || (a)(.)(.)/ {
-        say ~$1; # b
+        say ~$1; # OUTPUT: «b»
     }
 
 If two (or more) alternations have a different number of captures,
 the one with the most captures determines the index of the next capture:
 
-=begin code
-$_ = 'abcd';
-
-if / a [ b (.) || (x) (y) ] (.) / {
-#          $0     $0  $1    $2
-    say ~$2; # d
-}
-=end code
-
+    if 'abcd' ~~ / a [ b (.) || (x) (y) ] (.) / {
+    #                    $0     $0  $1    $2
+        say ~$2; # OUTPUT: «d»
+    }
 
 Captures can be nested, in which case they are numbered per level
 
@@ -783,23 +808,24 @@ it in a variable first:
     say "11" ~~ /(\d) {} :my $c = $0; ($c)/;
     # OUTPUT: «「11」 0 => 「1」 1 => 「1」»
 
-=head2 Named captures
+=head2 X<Named captures|regex, Named captures>
 
-Instead of numbering captures, you can also give them names. The generic --
-and slightly verbose -- way of naming captures is like this:
+Instead of numbering captures, you can also give them names. The generic,
+and slightly verbose, way of naming captures is like this:
 
     if 'abc' ~~ / $<myname> = [ \w+ ] / {
         say ~$<myname> # OUTPUT: «abc»
     }
 
-The access to the named capture, C<< $<myname> >>, is a shorthand for indexing
-the match object as a hash, in other words: C<$/{ 'myname' }> or C<< $/<myname> >>.
+The access to the named capture, C«$<myname>», is a shorthand for indexing
+the match object as a hash, in other words: C<$/{ 'myname' }> or C«$/<myname>».
 
 Named captures can also be nested using regular capture group syntax:
 
     if 'abc-abc-abc' ~~ / $<string>=( [ $<part>=[abc] ]* % '-' ) / {
-        say ~$<string>; # OUTPUT: «abc-abc-abc»
-        say ~$<string><part>; # OUTPUT: «abc abc abc»
+        say ~$<string>; # OUTPUT: «abc-abc-abc»
+        say ~$<string><part>; # OUTPUT: «abc abc abc»
+        say ~$<string><part>[0]; # OUTPUT: «abc»
     }
 
 Coercing the match object to a hash gives you easy programmatic access to
@@ -818,12 +844,21 @@ all named captures:
     }
 
 A more convenient way to get named captures is discussed in
-the Subrules section.
+the L<Subrules|#Subrules> section.
+
 =head2 X«Capture markers: C«<( )>»|regex,<( )>»
 
-A C«<(» token indicates the start of the match's overall capture, while the corresponding C«)>»
-token indicates its endpoint. The C«<(» is similar to other languages X<\K|regex deprecated,\K> to discard any matches
-found before the C<\K>.
+A C«<(» token indicates the start of the match's overall capture, while the
+corresponding C«)>» token indicates its endpoint. The C«<(» is similar to other
+languages' X<\K|regex deprecated,\K>, which discards anything matched before
+the C<\K>.
+
+    say 'abc' ~~ / a <( b )> c/; # OUTPUT: «「b」»
+    say 'abc' ~~ / <(a <( b )> c)>/; # OUTPUT: «「bc」»
+
+As the example above shows, C«<(» sets the start point and C«)>» sets the
+endpoint. The two markers are independent of each other.
+
 
 =head1 Substitution
 