=begin pod
=TITLE class Str
=SUBTITLE String of characters
class Str is Cool does Stringy { }
Built-in class for strings. Objects of type C<Str> are immutable, but
L<read the FAQ to understand precisely what this means|/language/faq#If_Str_is_immutable,_how_does_s///_work?_If_Int_is_immutable,_how_does_$i%2B%2B_work?>.
=head1 Methods
=head2 routine chop
multi method chop(Str:D:)
multi method chop(Str:D: Int() $chopping)
Returns the string with C<$chopping> characters removed from the end (one
character if C<$chopping> is omitted).
say "Whateverable".chop(3.6); # OUTPUT: «Whatevera␤»
my $string = "Whateverable";
say $string.chop("3"); # OUTPUT: «Whatevera␤»
The C<$chopping> positional is converted to C<Int> before being applied to the
string.
=head2 routine chomp
Defined as:
multi sub chomp(Str:D --> Str:D)
multi method chomp(Str:D: --> Str:D)
Returns the string with a logical newline (any codepoint that has the
C<NEWLINE> property) removed from the end.
Examples:
say chomp("abc\n"); # OUTPUT: «abc␤»
say "def\r\n".chomp; # OUTPUT: «def␤» NOTE: \r\n is a single grapheme!
say "foo\r".chomp; # OUTPUT: «foo␤»
=head2 method contains
Defined as:
multi method contains(Str:D: Cool:D $needle --> Bool:D)
multi method contains(Str:D: Str:D $needle --> Bool:D)
multi method contains(Str:D: Cool:D $needle, Int(Cool:D) $pos --> Bool:D)
multi method contains(Str:D: Str:D $needle, Int:D $pos --> Bool:D)
Coerces the invocant (the I<haystack>, represented in the signature by
C<Str:D:>) and the first argument (C<$needle>) to L<C<Str>|/type/Str> where
they are not strings already (that is, in the first and third C<multi> forms),
then searches for C<$needle> in the haystack, starting C<$pos> characters into
the string if C<$pos> is given. Returns C<True> if C<$needle> is found.
C<$pos> is optional; if it is not present, C<contains> searches from the
beginning of the string (using the first two forms of the C<multi>).
say <Hello, World>.contains('Hello', 0); # OUTPUT: «True␤»
say "Hello, World".contains('Hello'); # OUTPUT: «True␤»
say "Hello, World".contains('hello'); # OUTPUT: «False␤»
say "Hello, World".contains('Hello', 1); # OUTPUT: «False␤»
say "Hello, World".contains(','); # OUTPUT: «True␤»
say "Hello, World".contains(',', 3); # OUTPUT: «True␤»
say "Hello, World".contains(',', 10); # OUTPUT: «False␤»
In the first example, coercion is used to convert a C<List> to a C<Str>.
In the fourth example, C<'Hello'> is not found because the search
starts at the second character of the haystack (index 1). Note that
because of how a L<List|/type/List> or L<Array|/type/Array> is
L<coerced|/type/List#method_Str> into a L<Str|/type/Str>, the results
may sometimes be surprising. See
L<traps|/language/traps#Lists_become_strings,_so_beware_.contains()>.
=head2 routine lc
Defined as:
multi sub lc(Str:D --> Str:D)
multi method lc(Str:D: --> Str:D)
Returns a lower-case version of the string.
Examples:
lc("A"); # RESULT: «"a"»
"A".lc; # RESULT: «"a"»
=head2 routine uc
multi sub uc(Str:D --> Str:D)
multi method uc(Str:D: --> Str:D)
Returns an uppercase version of the string.
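A short illustration of both the sub and the method form (the sample string
is arbitrary):

=begin code
say uc("camelia");    # OUTPUT: «CAMELIA␤»
say "Camelia".uc;     # OUTPUT: «CAMELIA␤»
=end code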
=head2 routine fc
multi sub fc(Str:D --> Str:D)
multi method fc(Str:D: --> Str:D)
Does a Unicode "fold case" operation suitable for doing caseless
string comparisons. (In general, the returned string is unlikely to
be useful for any purpose other than comparison.)
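As a minimal sketch of such a comparison: the two arbitrary sample strings
below differ only in case, so their folded forms compare equal with C<eq>,
while the unfolded strings do not.

=begin code
say "CAMELIA".fc eq "camelia".fc;    # OUTPUT: «True␤»
say "Camelia" eq "camelia";          # OUTPUT: «False␤»
=end code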
=head2 routine tc
multi sub tc(Str:D --> Str:D)
multi method tc(Str:D: --> Str:D)
Does a Unicode "titlecase" operation; that is, changes the first character in
the string to title case, or to upper case if the character has no title case
mapping.
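For example (sample strings are arbitrary), only the first character changes;
the rest of the string is left as it was:

=begin code
say "camelia".tc;        # OUTPUT: «Camelia␤»
say "hello WORLD".tc;    # OUTPUT: «Hello WORLD␤»
=end code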
=head2 routine tclc
multi sub tclc(Str:D --> Str:D)
multi method tclc(Str:D: --> Str:D)
Turns the first character to title case, and all other characters to lower
case.
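A short illustration with arbitrary sample strings; note that, unlike C<tc>,
the remainder of the string is lowercased:

=begin code
say "hello WORLD".tclc;    # OUTPUT: «Hello world␤»
say tclc("cAMELIA");       # OUTPUT: «Camelia␤»
=end code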
=head2 routine wordcase
=for code
multi sub wordcase(Cool $x --> Str)
multi sub wordcase(Str:D $x --> Str)
multi method wordcase(Str:D: :&filter = &tclc, Mu :$where = True --> Str)
Returns a string in which C<&filter> has been applied to all the words
that match C<$where>. By default, this means that the first letter of
every word is capitalized, and all the other letters lowercased.
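A small sketch of the default behavior and of the two named arguments. The
sample strings and the "longer than three characters" condition are arbitrary
choices for illustration:

=begin code
say "hello WORLD".wordcase;    # OUTPUT: «Hello World␤»

# Apply &uc instead of &tclc, and only to words longer than three characters
say "have fun working on perl".wordcase(:filter(&uc), :where({ .chars > 3 }));
# OUTPUT: «HAVE fun WORKING on PERL␤»
=end code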
=head2 method unival
multi method unival(Str:D: --> Numeric)
Returns the numeric value that the first codepoint in the invocant represents,
or C<NaN> if it's not numeric.
say '4'.unival; # OUTPUT: «4␤»
say '¾'.unival; # OUTPUT: «0.75␤»
say 'a'.unival; # OUTPUT: «NaN␤»
=head2 method univals
multi method univals(Str:D: --> List)
Returns a list of numeric values represented by each codepoint in the invocant
string, and C<NaN> for non-numeric characters.
say "4a¾".univals; # OUTPUT: «(4 NaN 0.75)␤»
=head2 routine chars
multi sub chars(Cool $x --> Int:D)
multi sub chars(Str:D $x --> Int:D)
multi sub chars(str $x --> int)
multi method chars(Str:D: --> Int:D)
Returns the number of characters in the string, counted in graphemes. On the
JVM, this currently erroneously returns the number of codepoints instead.
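A brief sketch of what counting in graphemes means, assuming a backend that
counts graphemes correctly (that is, not the JVM case mentioned above):

=begin code
say "abc".chars;         # OUTPUT: «3␤»
say "\r\n".chars;        # OUTPUT: «1␤»  (CR LF is a single grapheme)
say "e\x[301]".chars;    # OUTPUT: «1␤»  (base letter plus combining mark)
=end code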
=head2 method encode
multi method encode(Str:D $encoding = 'utf8', :$replacement, Bool() :$translate-nl = False, :$strict)
Returns a L<Blob|/type/Blob> which represents the original string in the given
encoding and normal form. The actual return type is as specific as
possible, so C<$str.encode('UTF-8')> returns a C<utf8> object,
C<$str.encode('ISO-8859-1')> a C<buf8>. If C<:translate-nl> is set to
C<True>, it will translate newlines from C<\n> to C<\r\n>, but only on
Windows. C<$replacement> indicates how characters are going to be
replaced in the case they are not available in the current encoding,
while C<$strict> indicates whether unmapped codepoints will still
decode; for instance, codepoint 129 which does not exist in
C<windows-1252>.
my $str = "Þor is mighty";
say $str.encode("ascii", :replacement('Th')).decode("ascii");
# OUTPUT: «Thor is mighty␤»
In this case, any unknown character is going to be substituted by C<Th>. We know
in advance that the only character not available in the C<ascii> encoding is
C<Þ>, so we substitute its Latin equivalent, C<Th>. If no replacement string is
given, C<:replacement> is treated as a C<Bool>:
=for code :preamble<my $str = "Þor is mighty";>
say $str.encode("ascii", :replacement).decode("ascii"); # OUTPUT: «?or is mighty␤»
If C<:replacement> is not set or assigned a value, the error C<Error encoding
ASCII string: could not encode codepoint 222> will be issued (in this case,
since Þ is codepoint 222).
Since the C<Blob> returned by C<encode> is the original string in normal form,
and every element of a C<Blob> is a byte, you can obtain the length in bytes of
a string by calling a method that returns the size of the C<Blob> on it:
=for code
say "þor".encode.bytes; # OUTPUT: «4␤»
say "þor".encode.elems; # OUTPUT: «4␤»
=head2 routine index
multi method index(Str:D: Cool:D $needle --> Int:D)
multi method index(Str:D: Str:D $needle --> Int:D)
multi method index(Str:D: Cool:D $needle, Cool:D $pos --> Int:D)
multi method index(Str:D: Str:D $needle, Int:D $pos --> Int:D)
Searches for C<$needle> in the string starting from C<$pos> (if present). It
returns the offset into the string where C<$needle> was found, or C<Nil> if it
was not found.
Examples:
say index "Camelia is a butterfly", "a"; # OUTPUT: «1␤»
say index "Camelia is a butterfly", "a", 2; # OUTPUT: «6␤»
say index "Camelia is a butterfly", "er"; # OUTPUT: «17␤»
say index "Camelia is a butterfly", "Camel"; # OUTPUT: «0␤»
say index "Camelia is a butterfly", "Onion"; # OUTPUT: «Nil␤»
say index("Camelia is a butterfly", "Onion").defined ?? 'OK' !! 'NOT'; # OUTPUT: «NOT␤»
Other forms of index, including a sub, are
L<inherited from C<Cool>|/type/Cool#routine_index>. Check them there.
=head2 routine rindex
multi sub rindex(Str:D $haystack, Str:D $needle, Int $startpos = $haystack.chars --> Int)
multi method rindex(Str:D $haystack: Str:D $needle, Int $startpos = $haystack.chars --> Int)
Returns the last position of C<$needle> in C<$haystack> not after C<$startpos>.
Returns C<Nil> if C<$needle> wasn't found.
Examples:
say rindex "Camelia is a butterfly", "a"; # OUTPUT: «11␤»
say rindex "Camelia is a butterfly", "a", 10; # OUTPUT: «6␤»
=head2 method indices
Defined as:
multi method indices(Str:D: Str:D $needle, :$overlap --> List:D)
multi method indices(Str:D: Str:D $needle, Int:D $start, :$overlap --> List:D)
Searches for all occurrences of C<$needle> in the string starting from position
C<$start>, or zero if it is not specified, and returns a C<List> with all offsets
in the string where C<$needle> was found, or an empty list if it was not found.
If the optional parameter C<:overlap> is specified, the search continues from the
index directly following the start of the previous match; otherwise the search
continues after the end of the previous match.
say "banana".indices("a"); # OUTPUT: «(1 3 5)␤»
say "banana".indices("ana"); # OUTPUT: «(1)␤»
say "banana".indices("ana", :overlap); # OUTPUT: «(1 3)␤»
say "banana".indices("ana", 2); # OUTPUT: «(3)␤»
=head2 method match
method match($pat, :continue(:$c), :pos(:$p), :global(:$g), :overlap(:$ov), :exhaustive(:$ex), :st(:$nd), :rd(:$th), :$nth, :$x --> Match)
Performs a match of the string against C<$pat> and returns a
L<Match|/type/Match> object if there is a successful match; it returns C<(Any)>
otherwise. Matches are stored in the L<default match variable
C<$/>|/language/variables#index-entry-match_variable>. If C<$pat> is not a
L<Regex|/type/Regex> object, match will coerce the argument to a Str and then
perform a literal match against C<$pat>.
A number of optional named parameters can be specified, which alter how the match is performed.
=item :continue
The C<:continue> adverb takes as an argument the position where the regex should
start to search. If no position is specified for C<:c> it will default to 0
unless C<$/> is set, in which case it defaults to C<$/.to>.
=item :pos
Takes a position as an argument. Fails if the regex cannot be matched from that position, unlike C<:continue>.
=item :global
Instead of searching for just one match and returning a C<Match> object, search
for every non-overlapping match and return them in a C<List>.
=item :overlap
Finds all matches including overlapping matches, but only returns one match from
each starting position.
=item :exhaustive
Finds all possible matches of a regex, including overlapping matches and matches
that start at the same position.
=item :st, :nd, :rd, :nth
Returns the nth match in the string. The argument can be a L<Numeric|/type/Numeric> or
an L<Iterable|/type/Iterable> producing monotonically increasing numbers (that is, the
next produced number must be larger than the previous one). The L<Iterable|/type/Iterable>
will be lazily L<reified|/language/glossary#index-entry-Reify>, and if a
non-monotonic sequence is encountered an exception will be thrown.
If an L<Iterable|/type/Iterable> argument is provided, the return value and the C<$/> variable
will be set to a possibly-empty L<List|/type/List> of L<Match|/type/Match> objects.
=item :x
Takes as an argument the number of matches to return, stopping once the
specified number of matches has been reached. The value must be a L<Numeric|/type/Numeric> or
a L<Range|/type/Range>; other values will cause C<.match> to return a L<Failure|/type/Failure> containing
an C<X::Str::Match::x> exception.
Examples:
=begin code
say "properly".match('perl'); # OUTPUT: «「perl」␤»
say "properly".match(/p.../); # OUTPUT: «「prop」␤»
say "1 2 3".match([1,2,3]); # OUTPUT: «「1 2 3」␤»
say "a1xa2".match(/a./, :continue(2)); # OUTPUT: «「a2」␤»
say "abracadabra".match(/ a .* a /, :exhaustive);
# OUTPUT: «(「abracadabra」 「abracada」 「abraca」 「abra」 「acadabra」 「acada」 「aca」 「adabra」 「ada」 「abra」)␤»
say 'several words here'.match(/\w+/,:global); # OUTPUT: «(「several」 「words」 「here」)␤»
say 'abcdef'.match(/.*/, :pos(2)); # OUTPUT: «「cdef」␤»
say "foo[bar][baz]".match(/../, :1st); # OUTPUT: «「fo」␤»
say "foo[bar][baz]".match(/../, :2nd); # OUTPUT: «「o[」␤»
say "foo[bar][baz]".match(/../, :3rd); # OUTPUT: «「ba」␤»
say "foo[bar][baz]".match(/../, :4th); # OUTPUT: «「r]」␤»
say "foo[bar][baz]bada".match('ba', :x(2)); # OUTPUT: «(「ba」 「ba」)␤»
=end code
=head2 method Numeric
Defined as:
method Numeric(Str:D: --> Numeric:D)
Coerces the string to L<Numeric|/type/Numeric> using semantics equivalent to the L<val|/routine/val> routine.
L<Fails|/routine/fail> with C<X::Str::Numeric> if the coercion to a number cannot be done.
Only Unicode characters with property C<Nd>, as well as leading and trailing
whitespace are allowed, with the special case of the empty string being coerced
to C<0>. Synthetic codepoints (e.g. C<"7\x[308]">) are forbidden.
While C<Nl> and C<No> characters can be used as numeric literals
in the language, their conversion via C<Str.Numeric> will fail, by design.
See L«unival|/routine/unival» if you need to coerce such characters to
C<Numeric>.
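The rules above can be sketched as follows. The sample strings are arbitrary;
calling C<.defined> on the returned L<Failure|/type/Failure> is just one way
of checking for it without throwing:

=begin code
say "42".Numeric;              # OUTPUT: «42␤»
say " 3.14 ".Numeric;          # OUTPUT: «3.14␤»  (surrounding whitespace is allowed)
say "".Numeric;                # OUTPUT: «0␤»     (special case of the empty string)
say "four".Numeric.defined;    # OUTPUT: «False␤» (the coercion failed)
=end code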
=head2 method Int
Defined as:
method Int(Str:D: --> Int:D)
Coerces the string to L<Int|/type/Int>, using the same rules as
L«C<Str.Numeric>|/type/Str#method_Numeric».
=head2 method Rat
Defined as:
method Rat(Str:D: --> Rational:D)
Coerces the string to a L<Rat|/type/Rat> object, using the same rules as
L«C<Str.Numeric>|/type/Str#method_Numeric». If the denominator is larger
than 64 bits, it is still kept and no degradation to L<Num|/type/Num> occurs.
=head2 method Bool
Defined as:
method Bool(Str:D: --> Bool:D)
Returns C<False> if the string is empty, C<True> otherwise.
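Note that, by this rule, strings such as C<"0"> or a single space are
non-empty and therefore true. A minimal illustration:

=begin code
say "".Bool;     # OUTPUT: «False␤»
say "0".Bool;    # OUTPUT: «True␤»
say " ".Bool;    # OUTPUT: «True␤»
say so "abc";    # OUTPUT: «True␤»
=end code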
=head2 routine parse-base
multi sub parse-base(Str:D $num, Int:D $radix --> Numeric)
multi method parse-base(Str:D $num: Int:D $radix --> Numeric)
Performs the reverse of L«C<base>|/routine/base» by converting a string
with a base-C<$radix> number to its L«C<Numeric>|/type/Numeric»
equivalent. Will L«C<fail>|/routine/fail» if radix is not in range C<2..36>
or if the string being parsed contains characters that are not valid
for the specified base.
1337.base(32).parse-base(32).say; # OUTPUT: «1337␤»
'Perl6'.parse-base(30).say; # OUTPUT: «20652936␤»
'FF.DD'.parse-base(16).say; # OUTPUT: «255.863281␤»
See also: L«:16<FF> syntax for number literals|/syntax/Number%20literals»
=head2 routine parse-names
sub parse-names(Str:D $names --> Str:D)
method parse-names(Str:D $names: --> Str:D)
B<DEPRECATED>. Use L<uniparse|/routine/uniparse> instead. This routine existed in the Rakudo
implementation as a proof of viability before being renamed, and will be removed when the 6.e language is released.
=head2 routine uniparse
sub uniparse(Str:D $names --> Str:D)
method uniparse(Str:D $names: --> Str:D)
Takes a string with comma-separated Unicode names of characters and
returns a string composed of those characters. Will L«C<fail>|/routine/fail»
if any of the characters' names are empty or not recognized. Whitespace
around character names is ignored.
say "I {uniparse 'TWO HEARTS'} Perl"; # OUTPUT: «I 💕 Perl␤»
'TWO HEARTS, BUTTERFLY'.uniparse.say; # OUTPUT: «💕🦋␤»
Note that unlike the C<\c[...]> construct available in string interpolation,
C<uniparse> does not accept decimal numerical values. Use the L<chr|/routine/chr> routine to
convert those:
say "\c[1337]"; # OUTPUT: «Թ␤»
say '1337'.chr; # OUTPUT: «Թ␤»
I<Note:> before being standardized in 2017.12, this routine was known
under its working name of L<parse-names>. This denomination will be removed in the 6.e version.
=head2 routine split
=for code :method
multi sub split( Str:D $delimiter, Str:D $input, $limit = Inf,
:$skip-empty, :$v, :$k, :$kv, :$p)
=for code :method
multi sub split(Regex:D $delimiter, Str:D $input, $limit = Inf,
:$skip-empty, :$v, :$k, :$kv, :$p)
=for code :method
multi sub split(List:D $delimiters, Str:D $input, $limit = Inf,
:$skip-empty, :$v, :$k, :$kv, :$p)
=for code :method
multi method split(Str:D: Str:D $delimiter, $limit = Inf,
:$skip-empty, :$v, :$k, :$kv, :$p)
=for code :method
multi method split(Str:D: Regex:D $delimiter, $limit = Inf,
:$skip-empty, :$v, :$k, :$kv, :$p)
=for code :method
multi method split(Str:D: List:D $delimiters, $limit = Inf,
:$skip-empty, :$v, :$k, :$kv, :$p)
Splits a string up into pieces based on delimiters found in the string.
If C<$delimiter> is a string, it is searched for literally and not treated
as a regex. If C<$delimiter> is the empty string, it effectively returns all
characters of the string separately (plus an empty string at the beginning and at
the end). If C<$delimiter> is a regular expression, then that will be used
to split up the string. If C<$delimiters> is a list, then all of its elements
will be considered a delimiter (either a string or a regular expression) to
split the string on.
The optional C<$limit> indicates in how many segments the string should be
split, if possible. It defaults to B<Inf> (or B<*>, whichever way you look at
it), which means "as many as possible". Note that specifying negative limits
will not produce any meaningful results.
A number of optional named parameters can be specified, which alter the
result being returned. The C<:v>, C<:k>, C<:kv> and C<:p> named parameters
all perform a special action with regards to the delimiter found.
=item :skip-empty
If specified, do not return empty strings before or after a delimiter.
=item :v
Also return the delimiter. If the delimiter was a regular expression, then
this will be the associated C<Match> object. Since this stringifies as the
delimiter string found, you can always assume it is the delimiter string if
you're not interested in further information about that particular match.
=item :k
Also return the B<index> of the delimiter. Only makes sense if a list of
delimiters was specified: in all other cases, this will be B<0>.
=item :kv
Also return both the B<index> of the delimiter, as well as the delimiter.
=item :p
Also return the B<index> of the delimiter and the delimiter as a C<Pair>.
Examples:
say split(";", "a;b;c").perl; # OUTPUT: «("a", "b", "c").Seq␤»
say split(";", "a;b;c", :v).perl; # OUTPUT: «("a", ";", "b", ";", "c").Seq␤»
say split(";", "a;b;c", 2).perl; # OUTPUT: «("a", "b;c").Seq␤»
say split(";", "a;b;c", 2, :v).perl; # OUTPUT: «("a", ";", "b;c").Seq␤»
say split(";", "a;b;c,d").perl; # OUTPUT: «("a", "b", "c,d").Seq␤»
say split(/\;/, "a;b;c,d").perl; # OUTPUT: «("a", "b", "c,d").Seq␤»
say split(<; ,>, "a;b;c,d").perl; # OUTPUT: «("a", "b", "c", "d").Seq␤»
say split(/<[;,]>/, "a;b;c,d").perl; # OUTPUT: «("a", "b", "c", "d").Seq␤»
say split(<; ,>, "a;b;c,d", :k).perl; # OUTPUT: «("a", 0, "b", 0, "c", 1, "d").Seq␤»
say split(<; ,>, "a;b;c,d", :kv).perl; # OUTPUT: «("a", 0, ";", "b", 0, ";", "c", 1, ",", "d").Seq␤»
say "".split("x").perl; # OUTPUT: «("",).Seq␤»
say "".split("x", :skip-empty).perl; # OUTPUT: «().Seq␤»
say "abcde".split("").perl; # OUTPUT: «("", "a", "b", "c", "d", "e", "").Seq␤»
say "abcde".split("",:skip-empty).perl; # OUTPUT: «("a", "b", "c", "d", "e").Seq␤»
=head2 routine comb
multi sub comb(Str:D $matcher, Str:D $input, $limit = Inf)
multi sub comb(Regex:D $matcher, Str:D $input, $limit = Inf, Bool :$match)
multi sub comb(Int:D $size, Str:D $input, $limit = Inf)
multi method comb(Str:D $input:)
multi method comb(Str:D $input: Str:D $matcher, $limit = Inf)
multi method comb(Str:D $input: Regex:D $matcher, $limit = Inf, Bool :$match)
multi method comb(Str:D $input: Int:D $size, $limit = Inf)
Searches for C<$matcher> in C<$input> and returns a L<Seq|/type/Seq> of non-overlapping
matches limited to at most C<$limit> matches. If C<$matcher> is a Regex,
each C<Match> object is converted to a C<Str>, unless C<$match> is set.
If no matcher is supplied, a Seq of characters in the string is returned,
as if the matcher was C<rx/./>.
Examples:
say "abc".comb.perl; # OUTPUT: «("a", "b", "c").Seq␤»
say 'abcdefghijk'.comb(3).perl; # OUTPUT: «("abc", "def", "ghi", "jk").Seq␤»
say 'abcdefghijk'.comb(3, 2).perl; # OUTPUT: «("abc", "def").Seq␤»
say comb(/\w/, "a;b;c").perl; # OUTPUT: «("a", "b", "c").Seq␤»
say comb(/\N/, "a;b;c").perl; # OUTPUT: «("a", ";", "b", ";", "c").Seq␤»
say comb(/\w/, "a;b;c", 2).perl; # OUTPUT: «("a", "b").Seq␤»
say comb(/\w\;\w/, "a;b;c", 2).perl; # OUTPUT: «("a;b",).Seq␤»
say comb(/.<(.)>/, "<>[]()").perl; # OUTPUT: «(">", "]", ")").Seq␤»
If the matcher is an integer value, C<comb> behaves as if the matcher
was C<rx/ . ** {1..$matcher} />, but is optimized to be much faster.
Note that a Regex matcher may control which portion of the matched text
is returned by using features which explicitly set the top-level capture.
=head2 routine lines
Defined as:
multi method lines(Str:D: $limit)
multi method lines(Str:D:)
Returns a list of lines (without trailing newline characters), i.e. the
same as a call to C<$input.comb( / ^^ \N* /, $limit )> would.
Examples:
say lines("a\nb").perl; # OUTPUT: «("a", "b").Seq␤»
say lines("a\nb").elems; # OUTPUT: «2␤»
say "a\nb".lines.elems; # OUTPUT: «2␤»
say "a\n".lines.elems; # OUTPUT: «1␤»
You can limit the
number of lines returned by setting the C<$limit> variable to a non-zero,
non-C<Infinity> value:
say <not there yet>.join("\n").lines( 2 ); # OUTPUT: «(not there)␤»
B«DEPRECATED as of C<6.d> language», the C<:count> argument was used
to return the total number of lines:
say <not there yet>.join("\n").lines( :count ); # OUTPUT: «3␤»
Use L<elems|/routine/elems> call on the returned L<Seq|/type/Seq> instead:
say <not there yet>.join("\n").lines.elems; # OUTPUT: «3␤»
=head2 routine words
multi method words(Str:D: $limit)
multi method words(Str:D:)
Returns a list of non-whitespace bits, i.e. the same as a call to
C<$input.comb( / \S+ /, $limit )> would.
Examples:
say "a\nb\n".words.perl; # OUTPUT: «("a", "b").Seq␤»
say "hello world".words.perl; # OUTPUT: «("hello", "world").Seq␤»
say "foo:bar".words.perl; # OUTPUT: «("foo:bar",).Seq␤»
say "foo:bar\tbaz".words.perl; # OUTPUT: «("foo:bar", "baz").Seq␤»
It can also be used as a subroutine, turning the first argument into the
invocant. C<$limit> is optional, but if it is provided (and not equal to
C<Inf>), it will return only the first C<$limit> words.
say words("I will be very brief here", 2); # OUTPUT: «(I will)␤»
=head2 routine flip
multi sub flip(Str:D --> Str:D)
multi method flip(Str:D: --> Str:D)
Returns the string reversed character by character.
Examples:
"Perl".flip; # RESULT: «lreP»
"ABBA".flip; # RESULT: «ABBA»
=head2 method starts-with
multi method starts-with(Str:D: Str(Cool) $needle --> Bool:D)
Returns C<True> if the invocant is identical to or starts with C<$needle>.
say "Hello, World".starts-with("Hello"); # OUTPUT: «True␤»
say "https://perl6.org/".starts-with('ftp'); # OUTPUT: «False␤»
=head2 method ends-with
multi method ends-with(Str:D: Str(Cool) $needle --> Bool:D)
Returns C<True> if the invocant is identical to or ends with C<$needle>.
say "Hello, World".ends-with('Hello'); # OUTPUT: «False␤»
say "Hello, World".ends-with('ld'); # OUTPUT: «True␤»
=head2 method subst
multi method subst(Str:D: $matcher, $replacement, *%opts)
Returns the invocant string where C<$matcher> is replaced by C<$replacement>
(or the original string, if no match was found).
There is an in-place syntactic variant of C<subst> spelled
L«C<s/matcher/replacement/>|/syntax/s$SOLIDUS%20$SOLIDUS%20$SOLIDUS», with
adverbs placed either after the C<s> or inside the matcher.
C<$matcher> can be a L<Regex|/type/Regex>, or a literal C<Str>. Non-Str matcher arguments of
type L<Cool|/type/Cool> are coerced to C<Str> for literal matching. If a L<Regex|/type/Regex>
C<$matcher> is used, the L«C<$/> special variable|/syntax/$$SOLIDUS» will be set
to C<Nil> (if no matches occurred), a L<Match|/type/Match> object, or a L<List|/type/List> of L<Match|/type/Match>
objects (if multi-match options like C<:g> are used).
=head3 Literal replacement substitution
my $some-string = "Some foo";
my $another-string = $some-string.subst(/foo/, "string"); # gives 'Some string'
$some-string.=subst(/foo/, "string"); # in-place substitution. $some-string is now 'Some string'
=head3 Callable
The replacement can be a L<Callable|/type/Callable> in which the current L<Match|/type/Match> object will
be placed in the C<$/> variable, as well as the C<$_> topic variable. Using a
L<Callable|/type/Callable> as replacement is how you can refer to any of the captures created
in the regex:
# Using capture from $/ variable (the $0 is the first positional capture)
say 'abc123defg'.subst(/(\d+)/, { " before $0 after " });
# OUTPUT: «abc before 123 after defg␤»
# Using capture from $/ variable (the $<foo> is a named capture)
say 'abc123defg'.subst(/$<foo>=\d+/, { " before $<foo> after " });
# OUTPUT: «abc before 123 after defg␤»
# Using WhateverCode to operate on the Match given in $_:
say 'abc123defg'.subst(/(\d+)/, "[ " ~ *.flip ~ " ]");
# OUTPUT: «abc[ 321 ]defg␤»
# Using a Callable to generate substitution without involving current Match:
my $i = 41;
my $str = "The answer is secret.";
say $str.subst(/secret/, {++$i}); # The answer to everything
# OUTPUT: «The answer is 42.␤»
=head3 Adverbs
The following adverbs are supported:
=begin table
short | long | meaning
============================+=============+================
:g | :global | tries to match as often as possible
:nth(Int|Callable|Whatever) | | only substitute the nth match; aliases: :st, :nd, :rd, and :th
:ss | :samespace | preserves whitespace on substitution
:ii | :samecase | preserves case on substitution
:mm | :samemark | preserves character marks (e.g. 'ü' replaced with 'o' will result in 'ö')
:x(Int|Range|Whatever) | | substitute exactly $x matches
=end table
Note that only in the C<s///> form C<:ii> implies C<:i> and C<:ss> implies
C<:s>. In the method form, the C<:s> and C<:i> modifiers must be added to the
regex, not the C<subst> method call.
=head3 More Examples
Here are other examples of usage:
my $str = "Hey foo foo foo";
$str.subst(/foo/, "bar", :g); # global substitution; returns «Hey bar bar bar»
$str.subst(/foo/, "no subst", :x(0)); # :x sets the number of substitutions; with 0 the string is returned unmodified
$str.subst(/foo/, "bar", :x(1)); # replaces just the first occurrence; returns «Hey bar foo foo»
$str.subst(/foo/, "bar", :nth(3)); # replaces only the third match; returns «Hey foo foo bar»
The C<:nth> adverb has readable English-looking variants:
say 'ooooo'.subst: 'o', 'x', :1st; # OUTPUT: «xoooo␤»
say 'ooooo'.subst: 'o', 'x', :2nd; # OUTPUT: «oxooo␤»
say 'ooooo'.subst: 'o', 'x', :3rd; # OUTPUT: «ooxoo␤»
say 'ooooo'.subst: 'o', 'x', :4th; # OUTPUT: «oooxo␤»
=head2 method subst-mutate
B<NOTE:> I<<< C<.subst-mutate> is deprecated in the 6.d version, and will be
removed in future ones. You can use L<subst|/routine/subst> with L«C<.=> method
call assignment operator|/routine/.=» or L«C<s///> substitution
operator|/syntax/s$SOLIDUS$SOLIDUS$SOLIDUS» instead. >>>
Where L<subst|/routine/subst> returns the modified string and leaves the
original unchanged, it is possible to mutate the original string by using
C<subst-mutate>. If the match is successful, the method returns a
L<Match|/type/Match> object representing the successful match, otherwise returns
L<Nil|/type/Nil>. If C<:nth> (or one of its aliases) with
L<Iterable|/type/Iterable> value, C<:g>, C<:global>, or C<:x> arguments are
used, returns a L<List|/type/List> of L<Match|/type/Match> objects, or an empty
L<List|/type/List> if no matches occurred.
my $some-string = "Some foo";
my $match = $some-string.subst-mutate(/foo/, "string");
say $some-string; # OUTPUT: «Some string␤»
say $match; # OUTPUT: «「foo」␤»
$some-string.subst-mutate(/<[oe]>/, '', :g); # remove every o and e, notice the :g named argument from .subst
If a L<Regex|/type/Regex> C<$matcher> is used, the
L«C<$/> special variable|/syntax/$$SOLIDUS» will be set to C<Nil> (if no
matches occurred), a L<Match|/type/Match> object, or a L<List|/type/List> of L<Match|/type/Match> objects (if
multi-match options like C<:g> are used).
=head2 routine substr
multi sub substr(Str:D $s, $from, $chars? --> Str:D)
multi sub substr(Str:D $s, Range $from-to --> Str:D)
multi method substr(Str:D $s: $from, $chars? --> Str:D)
multi method substr(Str:D $s: Range $from-to --> Str:D)
Returns a substring of the original string, between the indices specified by
C<$from-to>'s endpoints (coerced to L<Int|/type/Int>) or from index C<$from> and of
length C<$chars>.
Both C<$from> and C<$chars> can be specified as L<Callable|/type/Callable>, which will be
invoked with the L<length|/routine/chars> of the original string and the
returned value will be used as the value for the argument. If C<$from> or
C<$chars> are not L<Callable|/type/Callable>, they'll be coerced to L<Int|/type/Int>.
If C<$chars> is omitted or is larger than the available characters,
the string from C<$from> until the end of the string is returned.
If C<$from-to>'s starting index or C<$from> is less than
zero, an C<X::OutOfRange> exception is thrown. The ending index of C<$from-to> is
permitted to extend past the end of the string, in which case it will be equivalent
to the index of the last character.
say substr("Long string", 3..6); # OUTPUT: «g st␤»
say substr("Long string", 6, 3); # OUTPUT: «tri␤»
say substr("Long string", 6); # OUTPUT: «tring␤»
say substr("Long string", 6, *-1); # OUTPUT: «trin␤»
say substr("Long string", *-3, *-1); # OUTPUT: «in␤»
=head2 method substr-eq
multi method substr-eq(Str:D: Str(Cool) $test-string, Int(Cool) $from --> Bool)
multi method substr-eq(Cool:D: Str(Cool) $test-string, Int(Cool) $from --> Bool)
Returns C<True> if the C<$test-string> exactly matches the C<Str> object,
starting from the given initial index C<$from>. For example, beginning with
the string C<"foobar">, the substring C<"bar"> will match from index 3:
my $string = "foobar";
say $string.substr-eq("bar", 3); # OUTPUT: «True␤»
However, the substring C<"barz"> starting from index 3 won't match even
though the first three letters of the substring do match:
my $string = "foobar";
say $string.substr-eq("barz", 3); # OUTPUT: «False␤»
Naturally, to match the entire string, one merely matches from index 0:
my $string = "foobar";
say $string.substr-eq("foobar", 0); # OUTPUT: «True␤»
Since this method is inherited from the C<Cool> type, it also works on
integers. Thus the integer C<42> will match the value C<342> starting from
index 1:
my $integer = 342;
say $integer.substr-eq(42, 1); # OUTPUT: «True␤»
As expected, one can match the entire value by starting at index 0:
my $integer = 342;
say $integer.substr-eq(342, 0); # OUTPUT: «True␤»
Using a different value or an incorrect starting index won't match either:
my $integer = 342;
say $integer.substr-eq(42, 3); # OUTPUT: «False␤»
say $integer.substr-eq(7342, 0); # OUTPUT: «False␤»
=head2 method substr-rw
method substr-rw($from, $length = *)
A version of C<substr> that returns a L<Proxy|/type/Proxy> functioning as a
writable reference to a part of a string variable. Its first argument, C<$from>,
specifies the index in the string from which a substitution should occur, and
its last argument, C<$length>, specifies how many characters are to be replaced.
If not specified, C<$length> defaults to the rest of the string.
For example, in its method form, if one wants to take the string C<"abc">
and replace the second character (at index 1) with the letter C<"z">, then
one does this:
my $string = "abc";
$string.substr-rw(1, 1) = "z";
$string.say; # OUTPUT: «azc␤»
Note that new characters can be inserted as well:
my $string = 'azc';
$string.substr-rw(2, 0) = "-Zorro-"; # insert new characters BEFORE the character at index 2
$string.say; # OUTPUT: «az-Zorro-c␤»
C<substr-rw> also has a function form, so the above examples can also be
written like so:
my $string = "abc";
substr-rw($string, 1, 1) = "z";
$string.say; # OUTPUT: «azc␤»
substr-rw($string, 2, 0) = "-Zorro-";
$string.say; # OUTPUT: «az-Zorro-c␤»
It is also possible to alias the writable reference returned by C<substr-rw>
for repeated operations:
my $string = "A character in the 'Flintstones' is: barney";
$string ~~ /(barney)/;
my $ref := substr-rw($string, $0.from, $0.to-$0.from);
$string.say;
# OUTPUT: «A character in the 'Flintstones' is: barney␤»
$ref = "fred";
$string.say;
# OUTPUT: «A character in the 'Flintstones' is: fred␤»
$ref = "wilma";
$string.say;
# OUTPUT: «A character in the 'Flintstones' is: wilma␤»
=head2 routine samemark
multi sub samemark(Str:D $string, Str:D $pattern --> Str:D)
method samemark(Str:D: Str:D $pattern --> Str:D)
Returns a copy of C<$string> with the mark/accent information for each
character changed such that it matches the mark/accent of the corresponding
character in C<$pattern>. If C<$string> is longer than C<$pattern>, the
remaining characters in C<$string> receive the same mark/accent as the last
character in C<$pattern>. If C<$pattern> is empty, no changes will be made.
Examples:
say 'åäö'.samemark('aäo'); # OUTPUT: «aäo␤»
say 'åäö'.samemark('a'); # OUTPUT: «aao␤»
say samemark('Pêrl', 'a'); # OUTPUT: «Perl␤»
say samemark('aöä', ''); # OUTPUT: «aöä␤»
=head2 method succ
method succ(Str:D: --> Str:D)
Returns the string incremented by one.
String increment is "magical". It searches for the last alphanumeric
sequence that is not preceded by a dot, and increments it.
'12.34'.succ; # RESULT: «13.34»
'img001.png'.succ; # RESULT: «img002.png»
The actual increment step works by mapping the last alphanumeric
character to a character range it belongs to, and choosing the next
character in that range, carrying to the previous letter on overflow.
'aa'.succ; # RESULT: «ab»
'az'.succ; # RESULT: «ba»
'109'.succ; # RESULT: «110»
'α'.succ; # RESULT: «β»
'a9'.succ; # RESULT: «b0»
String increment is Unicode-aware, and generally works for scripts where a
character can be uniquely classified as belonging to one range of characters.
=head2 method pred
method pred(Str:D: --> Str:D)
Returns the string decremented by one.
String decrementing is "magical" just like string increment (see
L<succ|/routine/succ>). It fails on underflow.
=for code
'b0'.pred; # RESULT: «a9»
'a0'.pred; # Failure
'img002.png'.pred; # RESULT: «img001.png»
=head2 routine ord
multi sub ord(Str:D --> Int:D)
multi method ord(Str:D: --> Int:D)
Returns the codepoint number of the base character of the first grapheme
in the string.
Example:
ord("A"); # 65
"«".ord; # 171
=head2 method ords
multi method ords(Str:D: --> Seq)
Returns a list of Unicode codepoint numbers that describe the codepoints making up the string.
Example:
"aå«".ords; # (97 229 171)
Strings are represented as graphemes. If a character in the string is represented by multiple
codepoints, then all of those codepoints will appear in the result of C<ords>. Therefore, the
number of elements in the result may not always be equal to L<chars|/routine/chars>, but will be equal to
L<codes|/routine/codes>. Since L<codes|/routine/codes> counts the codepoints in a different way, calling it might be faster than counting the elements of C<ords>.
The codepoints returned will represent the string in L<NFC|/type/NFC>. See the L<NFD|/type/NFD>, L<NFKC|/type/NFKC>, and
L<NFKD|/type/NFKD> methods if other forms are required.
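As a small illustration of the difference between graphemes and codepoints,
the following sketch uses C<"\r\n">, which is a single grapheme made up of two
codepoints (the variable name C<$crlf> is only illustrative):

=begin code
my $crlf = "\r\n";
say $crlf.chars; # OUTPUT: «1␤»
say $crlf.codes; # OUTPUT: «2␤»
say $crlf.ords;  # OUTPUT: «(13 10)␤»
=end code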
=head2 method trans
multi method trans(Str:D: Pair:D \what, *%n --> Str)
multi method trans(Str:D: *@changes, :complement(:$c), :squash(:$s), :delete(:$d) --> Str)
Replaces one or many characters with one or many characters. Ranges are
supported, both for keys and values. Regexes work as keys. In case a list of
keys and values is used, substrings can be replaced as well. When called with
C<:complement> anything but the matched value or range is replaced with a
single value; with C<:delete> the matched characters without corresponding
replacement are removed. Combining C<:complement> and C<:delete> will remove
anything but the matched values, I<unless replacement characters have been
specified>, in which case, C<:delete> would be ignored. The adverb C<:squash>
will reduce repeated matched characters to a single character.
Example:
my $str = 'say $x<b> && $y<a>';
$str.=trans( '<' => '«' );
$str.=trans( '<' => '«', '>' => '»' );
$str.=trans( [ '<' , '>' , '&' ] =>
[ '&lt;', '&gt;', '&amp;' ]);
$str.=trans( ['a'..'y'] => ['A'..'z'] );
"abcdefghij".trans(/<[aeiou]> \w/ => ''); # RESULT: «cdgh»
"a123b123c".trans(['a'..'z'] => 'x', :complement); # RESULT: «axxxbxxxc»
"a123b123c".trans('23' => '', :delete); # RESULT: «a1b1c»
"aaa1123bb123c".trans('a'..'z' => 'A'..'Z', :squash); # RESULT: «A1123B123C»
"aaa1123bb123c".trans('a'..'z' => 'x', :complement, :squash); # RESULT: «aaaxbbxc»
Please note that the behavior of the two versions of the multi method is
slightly different. The first form will transpose only one character if the
origin is also one character:
=begin code
say "abcd".trans( "a" => "zz" ); # OUTPUT: «zbcd␤»
say "abcd".trans( "ba" => "yz" ); # OUTPUT: «zycd␤»
=end code
In the second case, behavior is as expected, since the origin is more than one
char long. However, if the C<Pair> in the multi method does not have a C<Str> as
an origin or target, it is handed to the second multi method, and behavior
changes:
say "abcd".trans: ["a"] => ["zz"]; # OUTPUT: «zzbcd␤»
In this case, neither origin nor target in the C<Pair> is a C<Str>; the method
with the C<Pair> signature then calls the second, making the call above
equivalent to C«"abcd".trans: ["a"] => ["zz"], » (with a trailing comma, making
it a C<Positional> instead of a C<Pair>), resulting in the behavior shown as
output.
=head2 method indent
multi method indent(Int $steps where { $_ == 0 } )
multi method indent(Int $steps where { $_ > 0 } )
multi method indent($steps where { .isa(Whatever) || .isa(Int) && $_ < 0 } )
Indents each line of the string by C<$steps>. If C<$steps> is negative,
it outdents instead. If C<$steps> is L<C<*>|*>, then the string is
outdented to the margin:
"  indented by 2 spaces\n    indented even more".indent(*)
eq "indented by 2 spaces\n  indented even more"
=head2 method trim
method trim(Str:D: --> Str)
Removes leading and trailing whitespace. It can be used both as a method
on strings and as a function. When used as a method it will return
the trimmed string. In order to do in-place trimming, one needs to write
C<.=trim>.
my $line = ' hello world ';
say '<' ~ $line.trim ~ '>'; # OUTPUT: «<hello world>␤»
say '<' ~ trim($line) ~ '>'; # OUTPUT: «<hello world>␤»
$line.trim;
say '<' ~ $line ~ '>'; # OUTPUT: «< hello world >␤»
$line.=trim;
say '<' ~ $line ~ '>'; # OUTPUT: «<hello world>␤»
See also L<trim-trailing|/routine/trim-trailing> and L<trim-leading|/routine/trim-leading>.
=head2 method trim-trailing
method trim-trailing(Str:D: --> Str)
Removes the whitespace characters from the end of a string. See also L<trim|/routine/trim>.
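A minimal sketch of the effect, using an illustrative variable C<$padded> with
two spaces on each side; only the trailing spaces are removed:

=begin code
my $padded = "  hello world  ";
say '<' ~ $padded.trim-trailing ~ '>'; # OUTPUT: «<  hello world>␤»
=end code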
=head2 method trim-leading
method trim-leading(Str:D: --> Str)
Removes the whitespace characters from the beginning of a string. See also L<trim|/routine/trim>.
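A minimal sketch of the effect, using an illustrative variable C<$padded> with
two spaces on each side; only the leading spaces are removed:

=begin code
my $padded = "  hello world  ";
say '<' ~ $padded.trim-leading ~ '>'; # OUTPUT: «<hello world  >␤»
=end code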
=head2 method NFC
method NFC(Str:D: --> NFC:D)
Returns a codepoint string in L<NFC|/type/NFC> format (Unicode
Normalization Form C / Composed).
=head2 method NFD
method NFD(Str:D: --> NFD:D)
Returns a codepoint string in L<NFD|/type/NFD> format (Unicode
Normalization Form D / Decomposed).
=head2 method NFKC
method NFKC(Str:D: --> NFKC:D)
Returns a codepoint string in L<NFKC|/type/NFKC> format (Unicode Normalization
Form KC / Compatibility Composed).
=head2 method NFKD
method NFKD(Str:D: --> NFKD:D)
Returns a codepoint string in L<NFKD|/type/NFKD> format (Unicode Normalization
Form KD / Compatibility Decomposed).
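The four normalization methods can be compared with a short sketch. It assumes
that the returned objects expose their codepoints through C<.list>, and the
variable names are only illustrative. C<U+00E9> (é) has a canonical
decomposition, while the ligature C<U+FB01> (ﬁ) has only a compatibility
decomposition:

=begin code
my $e-acute = "\x[E9]";    # LATIN SMALL LETTER E WITH ACUTE
say $e-acute.NFC.list;     # OUTPUT: «(233)␤»
say $e-acute.NFD.list;     # OUTPUT: «(101 769)␤»

my $ligature = "\x[FB01]"; # LATIN SMALL LIGATURE FI
say $ligature.NFD.list;    # OUTPUT: «(64257)␤»
say $ligature.NFKD.list;   # OUTPUT: «(102 105)␤»
=end code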
=head2 method ACCEPTS
multi method ACCEPTS(Str:D: $other)
Returns C<True> if the string is L<the same as|eq> C<$other>.
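Since C<ACCEPTS> is the method the smartmatch operator calls on its right-hand
side, the comparison can be sketched as follows:

=begin code
say "abc" ~~ "abc";       # OUTPUT: «True␤»
say "abc" ~~ "abd";       # OUTPUT: «False␤»
say "abc".ACCEPTS("abc"); # OUTPUT: «True␤»
=end code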
=head2 method Capture
Defined as:
method Capture()
Throws C<X::Cannot::Capture>.
=head2 routine val
multi sub val(Str:D $MAYBEVAL, :$val-or-fail)
Given a C<Str> that may be parsable as a numeric value, it will
attempt to construct the appropriate L<allomorph|/language/glossary#Allomorph>,
returning one of L<IntStr|/type/IntStr>, L<NumStr|/type/NumStr>, L<RatStr|/type/RatStr>
or L<ComplexStr|/type/ComplexStr> or a plain C<Str> if a numeric value cannot
be parsed. If the C<:val-or-fail> adverb is provided it will return an
L<X::Str::Numeric|/type/X::Str::Numeric> rather than the original string if it
cannot parse the string as a number.
say val("42").^name; # OUTPUT: «IntStr␤»
say val("42e0").^name; # OUTPUT: «NumStr␤»
say val("42.0").^name; # OUTPUT: «RatStr␤»
say val("42+0i").^name; # OUTPUT: «ComplexStr␤»
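As a further sketch, a string that cannot be parsed as a number comes back as
a plain C<Str>, while a successfully created allomorph can be used as both a
number and a string (the variable name C<$answer> is only illustrative):

=begin code
say val("abc").^name;  # OUTPUT: «Str␤»
my $answer = val("42");
say $answer + 1;       # OUTPUT: «43␤»
say $answer ~ "!";     # OUTPUT: «42!␤»
=end code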
While characters belonging to the Unicode categories C<Nl> (number letters)
and C<No> (other numbers) can be used as numeric literals in the language,
they will not be converted to a number by C<val>, by design.
See L«unival|/routine/unival» if you need to convert such characters to
C<Numeric>.
=end pod
# vim: expandtab softtabstop=4 shiftwidth=4 ft=perl6