Closed

Description
This proposal is for use with #43651. I propose to define a new package, byteseq, that will provide simple generic functions to manipulate UTF-8 encoded strings and byte slices.
Goals of this proposal:
- Provide a safe generic API without string <-> []byte conversion overhead.
- Reduce code duplication between the strings and bytes packages.
- Enforce immutability in the API using type constraints. The ~string | ~[]byte constraint denotes that functions should not mutate their arguments.
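To make these goals concrete, here is a minimal sketch of how one generic function could serve both strings and byte slices. The HasPrefix body is an assumption written for illustration, not the proposal's implementation. It only reads its arguments by index, so no conversion or copy takes place, and the compiler rejects element assignment because the type set includes string types.

```go
package main

import "fmt"

// Byteseq mirrors the constraint proposed in this issue.
type Byteseq interface {
	~string | ~[]byte
}

// HasPrefix is a hypothetical sketch of a byteseq function.
// It reads s and prefix by index only, so neither argument is
// converted to another type or copied.
func HasPrefix[S, Prefix Byteseq](s S, prefix Prefix) bool {
	if len(prefix) > len(s) {
		return false
	}
	for i := 0; i < len(prefix); i++ {
		if s[i] != prefix[i] {
			return false
		}
	}
	// s[0] = 'x' would not compile here: the type set of S
	// includes string types, so elements of s cannot be assigned to.
	return true
}

func main() {
	// The two arguments may have different types.
	fmt.Println(HasPrefix("golang", []byte("go")))   // true
	fmt.Println(HasPrefix([]byte("golang"), "lang")) // false
}
```

A byte-by-byte loop is used here only for clarity; a real implementation would likely rely on optimized comparison routines.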
API description:
// Byteseq represents a generic UTF-8 byte string.
type Byteseq interface {
~string | ~[]byte
}
// Compare returns an integer comparing two strings lexicographically.
// The result will be 0 if a==b, -1 if a < b, and +1 if a > b.
func Compare[A, B Byteseq](a A, b B) int
// Contains reports whether subslice is within b.
func Contains[B, SubSlice Byteseq](b B, subslice SubSlice) bool
// ContainsAny reports whether any of the UTF-8-encoded code points in chars are within b.
func ContainsAny[B, Chars Byteseq](b B, chars Chars) bool
// Count counts the number of non-overlapping instances of sep in s.
// If sep is empty, Count returns 1 + the number of UTF-8-encoded code points in s.
func Count[S, Sep Byteseq](s S, sep Sep) int
// Equal reports whether a and b are the same length and contain the same bytes.
func Equal[A, B Byteseq](a A, b B) bool
// EqualFold reports whether s and t, interpreted as UTF-8 strings, are equal under Unicode case-folding,
// which is a more general form of case-insensitivity.
func EqualFold[S, T Byteseq](s S, t T) bool
// Fields splits the string s around each instance of one or more consecutive white space
// characters, as defined by unicode.IsSpace, returning a slice of substrings of s or an
// empty slice if s contains only white space.
func Fields[S Byteseq](s S) []S
// FieldsFunc splits the string s at each run of Unicode code points c satisfying f(c)
// and returns an array of slices of s. If all code points in s satisfy f(c) or the
// string is empty, an empty slice is returned.
//
// FieldsFunc makes no guarantees about the order in which it calls f(c)
// and assumes that f always returns the same value for a given c.
func FieldsFunc[S Byteseq](s S, f func(rune) bool) []S
// HasPrefix tests whether the string s begins with prefix.
func HasPrefix[S, Prefix Byteseq](s S, prefix Prefix) bool
// HasSuffix tests whether the string s ends with suffix.
func HasSuffix[S, Suffix Byteseq](s S, suffix Suffix) bool
// Index returns the index of the first instance of substr in s, or -1 if substr is not present in s.
func Index[S, Substr Byteseq](s S, substr Substr) int
// IndexAny returns the index of the first instance of any Unicode code point
// from chars in s, or -1 if no Unicode code point from chars is present in s.
func IndexAny[S, Chars Byteseq](s S, chars Chars) int
// IndexByte returns the index of the first instance of c in s, or -1 if c is not present in s.
func IndexByte[S Byteseq](s S, c byte) int
// IndexFunc returns the index into s of the first Unicode
// code point satisfying f(c), or -1 if none do.
func IndexFunc[S Byteseq](s S, f func(rune) bool) int
// IndexRune returns the index of the first instance of the Unicode code point
// r, or -1 if rune is not present in s.
// If r is utf8.RuneError, it returns the first instance of any
// invalid UTF-8 byte sequence.
func IndexRune[S Byteseq](s S, r rune) int
// LastIndex returns the index of the last instance of substr in s, or -1 if substr is not present in s.
func LastIndex[S, Substr Byteseq](s S, substr Substr) int
// LastIndexAny returns the index of the last instance of any Unicode code
// point from chars in s, or -1 if no Unicode code point from chars is
// present in s.
func LastIndexAny[S, Chars Byteseq](s S, chars Chars) int
// LastIndexByte returns the index of the last instance of c in s, or -1 if c is not present in s.
func LastIndexByte[S Byteseq](s S, c byte) int
// LastIndexFunc returns the index into s of the last
// Unicode code point satisfying f(c), or -1 if none do.
func LastIndexFunc[S Byteseq](s S, f func(rune) bool) int
// Split slices s into all substrings separated by sep and returns a slice of
// the substrings between those separators.
//
// If s does not contain sep and sep is not empty, Split returns a
// slice of length 1 whose only element is s.
//
// If sep is empty, Split splits after each UTF-8 sequence. If both s
// and sep are empty, Split returns an empty slice.
//
// It is equivalent to SplitN with a count of -1.
func Split[S, Sep Byteseq](s S, sep Sep) []S
// SplitAfter slices s into all substrings after each instance of sep and
// returns a slice of those substrings.
//
// If s does not contain sep and sep is not empty, SplitAfter returns
// a slice of length 1 whose only element is s.
//
// If sep is empty, SplitAfter splits after each UTF-8 sequence. If
// both s and sep are empty, SplitAfter returns an empty slice.
//
// It is equivalent to SplitAfterN with a count of -1.
func SplitAfter[S, Sep Byteseq](s S, sep Sep) []S
// SplitAfterN slices s into substrings after each instance of sep and
// returns a slice of those substrings.
//
// The count determines the number of substrings to return:
//   n > 0: at most n substrings; the last substring will be the unsplit remainder.
//   n == 0: the result is nil (zero substrings)
//   n < 0: all substrings
//
// Edge cases for s and sep (for example, empty strings) are handled
// as described in the documentation for SplitAfter.
func SplitAfterN[S, Sep Byteseq](s S, sep Sep, n int) []S
// SplitN slices s into substrings separated by sep and returns a slice of
// the substrings between those separators.
//
// The count determines the number of substrings to return:
//   n > 0: at most n substrings; the last substring will be the unsplit remainder.
//   n == 0: the result is nil (zero substrings)
//   n < 0: all substrings
//
// Edge cases for s and sep (for example, empty strings) are handled
// as described in the documentation for Split.
func SplitN[S, Sep Byteseq](s S, sep Sep, n int) []S
// Trim returns a slice of the string s with all leading and
// trailing Unicode code points contained in cutset removed.
func Trim[S, Cutset Byteseq](s S, cutset Cutset) S
// TrimFunc returns a slice of the string s with all leading
// and trailing Unicode code points c satisfying f(c) removed.
func TrimFunc[S Byteseq](s S, f func(rune) bool) S
// TrimLeft returns a slice of the string s with all leading
// Unicode code points contained in cutset removed.
//
// To remove a prefix, use TrimPrefix instead.
func TrimLeft[S, Cutset Byteseq](s S, cutset Cutset) S
// TrimLeftFunc returns a slice of the string s with all leading
// Unicode code points c satisfying f(c) removed.
func TrimLeftFunc[S Byteseq](s S, f func(rune) bool) S
// TrimPrefix returns s without the provided leading prefix string.
// If s doesn't start with prefix, s is returned unchanged.
func TrimPrefix[S, Prefix Byteseq](s S, prefix Prefix) S
// TrimRight returns a slice of the string s, with all trailing
// Unicode code points contained in cutset removed.
//
// To remove a suffix, use TrimSuffix instead.
func TrimRight[S, Cutset Byteseq](s S, cutset Cutset) S
// TrimRightFunc returns a slice of the string s with all trailing
// Unicode code points c satisfying f(c) removed.
func TrimRightFunc[S Byteseq](s S, f func(rune) bool) S
// TrimSpace returns a slice of the string s, with all leading
// and trailing white space removed, as defined by Unicode.
func TrimSpace[S Byteseq](s S) S
// TrimSuffix returns s without the provided trailing suffix string.
// If s doesn't end with suffix, s is returned unchanged.
func TrimSuffix[S, Suffix Byteseq](s S, suffix Suffix) S
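As an illustration of the signatures above, the following sketch shows how a function that returns S could preserve the caller's type, including named types admitted by the ~ in the constraint. The TrimPrefix body and the Path type are assumptions made for this example; they are not taken from the proposal.

```go
package main

import "fmt"

// Byteseq mirrors the constraint proposed in this issue.
type Byteseq interface {
	~string | ~[]byte
}

// TrimPrefix is a hypothetical sketch, not the proposal's implementation.
// The result is a slice of s, so it has the same type as s and no new
// string or byte slice is built.
func TrimPrefix[S, Prefix Byteseq](s S, prefix Prefix) S {
	if len(prefix) > len(s) {
		return s
	}
	for i := 0; i < len(prefix); i++ {
		if s[i] != prefix[i] {
			return s
		}
	}
	return s[len(prefix):]
}

// Path is a hypothetical named string type used only for illustration.
type Path string

func main() {
	// A named string type with a []byte prefix: the result is still a Path.
	p := TrimPrefix(Path("/usr/local/bin"), []byte("/usr"))
	fmt.Printf("%T %q\n", p, p) // main.Path "/local/bin"

	// A []byte value with a string prefix: the result is still a []byte.
	b := TrimPrefix([]byte("v1.2.3"), "v")
	fmt.Printf("%T %q\n", b, b) // []uint8 "1.2.3"
}
```

Because the result shares memory with the argument when S is a byte slice, callers would need the same care as with bytes.TrimPrefix today.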
Note that the API proposed above does not include functions like strings.Map or strings.Join that build a new string. The reason is to avoid a dependency on strings.Builder.
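To show why building functions are a different case, here is a rough sketch of what a generic Join might look like without strings.Builder. This Join is hypothetical and is not part of the proposal. It assembles the result in a temporary []byte and converts it at the end, which costs an extra copy whenever S is a string type; strings.Join in the standard library uses strings.Builder to avoid that copy.

```go
package main

import "fmt"

// Byteseq mirrors the constraint proposed in this issue.
type Byteseq interface {
	~string | ~[]byte
}

// Join is a hypothetical sketch of a building function that the
// proposal deliberately leaves out.
func Join[S, Sep Byteseq](elems []S, sep Sep) S {
	var buf []byte
	for i, e := range elems {
		if i > 0 {
			// Append the separator byte by byte.
			for j := 0; j < len(sep); j++ {
				buf = append(buf, sep[j])
			}
		}
		// Append the element byte by byte.
		for j := 0; j < len(e); j++ {
			buf = append(buf, e[j])
		}
	}
	// The conversion copies buf when S is a string type.
	return S(buf)
}

func main() {
	fmt.Println(Join([]string{"a", "b", "c"}, []byte(", "))) // a, b, c
}
```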