proposal: byteseq: add a generic byte string manipulation package #48643

@ghost

Description

This proposal is for use with #43651. I propose to define a new package, byteseq, that will provide simple generic
functions to manipulate UTF-8 encoded strings and byte slices.

Goals of this proposal:

  • Provide a safe generic API without string <-> []byte conversion overhead.
  • Reduce code duplication between strings and bytes packages.
  • Enforce immutability in the API using type constraints. The ~string | ~[]byte constraint signals that functions
    should not mutate their arguments.
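
To make the goals above concrete, here is a minimal sketch of how one function from the proposed API could be written directly against the constraint. The body is an illustrative assumption, not the proposal's implementation, and the `Token` type is hypothetical. It relies only on `len` and indexing, which Go permits on a type parameter whose type set is `~string | ~[]byte`, so no string <-> []byte conversion takes place.

```go
package main

import "fmt"

// Byteseq mirrors the constraint defined in the proposal.
type Byteseq interface {
	~string | ~[]byte
}

// HasPrefix is an illustrative implementation (an assumption, not the
// proposal's code). It compares byte by byte, so neither argument is
// converted or copied.
func HasPrefix[S, Prefix Byteseq](s S, prefix Prefix) bool {
	if len(s) < len(prefix) {
		return false
	}
	for i := 0; i < len(prefix); i++ {
		// Reading s[i] is allowed. Writing s[i] = 'x' would not compile,
		// because the type set of S includes string types.
		if s[i] != prefix[i] {
			return false
		}
	}
	return true
}

// Token is a hypothetical named byte-slice type, used to show that ~[]byte
// also admits defined types.
type Token []byte

func main() {
	fmt.Println(HasPrefix("golang", []byte("go")))  // string searched with a []byte prefix
	fmt.Println(HasPrefix(Token("golang"), "lang")) // named []byte searched with a string prefix
}
```

Because element assignment is rejected at compile time for this type set, the constraint itself gives callers some assurance that the function body only reads its arguments.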

API description:

// Byteseq represents a generic UTF-8 byte string.
type Byteseq interface {
     ~string | ~[]byte
}

// Compare returns an integer comparing two strings lexicographically. 
// The result will be 0 if a==b, -1 if a < b, and +1 if a > b.
func Compare[A, B Byteseq](a A, b B) int

// Contains reports whether subslice is within b.
func Contains[B, SubSlice Byteseq](b B, subslice SubSlice) bool

// ContainsAny reports whether any of the UTF-8-encoded code points in chars are within b.
func ContainsAny[B, Chars Byteseq](b B, chars Chars) bool

// Count counts the number of non-overlapping instances of sep in s. 
// If sep is empty, Count returns 1 + the number of UTF-8-encoded code points in s.
func Count[S, Sep Byteseq](s S, sep Sep) int

// Equal reports whether a and b are the same length and contain the same bytes.
func Equal[A, B Byteseq](a A, b B) bool

// EqualFold reports whether s and t, interpreted as UTF-8 strings, are equal under Unicode case-folding, 
// which is a more general form of case-insensitivity.
func EqualFold[S, T Byteseq](s S, t T) bool

// Fields splits the string s around each instance of one or more consecutive white space
// characters, as defined by unicode.IsSpace, returning a slice of substrings of s or an
// empty slice if s contains only white space.
func Fields[S Byteseq](s S) []S

// FieldsFunc splits the string s at each run of Unicode code points c satisfying f(c)
// and returns an array of slices of s. If all code points in s satisfy f(c) or the
// string is empty, an empty slice is returned.
// 
// FieldsFunc makes no guarantees about the order in which it calls f(c)
// and assumes that f always returns the same value for a given c.
func FieldsFunc[S Byteseq](s S, f func (rune) bool) []S

// HasPrefix tests whether the string s begins with prefix.
func HasPrefix[S, Prefix Byteseq](s S, prefix Prefix) bool

// HasSuffix tests whether the string s ends with suffix.
func HasSuffix[S, Suffix Byteseq](s S, suffix Suffix) bool

// Index returns the index of the first instance of substr in s, or -1 if substr is not present in s.
func Index[S, Substr Byteseq](s S, substr Substr) int

// IndexAny returns the index of the first instance of any Unicode code point
// from chars in s, or -1 if no Unicode code point from chars is present in s.
func IndexAny[S, Chars Byteseq](s S, chars Chars) int

// IndexByte returns the index of the first instance of c in s, or -1 if c is not present in s.
func IndexByte[S Byteseq](s S, c byte) int

// IndexFunc returns the index into s of the first Unicode
// code point satisfying f(c), or -1 if none do.
func IndexFunc[S Byteseq](s S, f func (rune) bool) int

// IndexRune returns the index of the first instance of the Unicode code point
// r, or -1 if rune is not present in s.
// If r is utf8.RuneError, it returns the first instance of any
// invalid UTF-8 byte sequence.
func IndexRune[S Byteseq](s S, r rune) int

// LastIndex returns the index of the last instance of substr in s, or -1 if substr is not present in s.
func LastIndex[S, Substr Byteseq](s S, substr Substr) int

// LastIndexAny returns the index of the last instance of any Unicode code
// point from chars in s, or -1 if no Unicode code point from chars is
// present in s.
func LastIndexAny[S, Chars Byteseq](s S, chars Chars) int

// LastIndexByte returns the index of the last instance of c in s, or -1 if c is not present in s.
func LastIndexByte[S Byteseq](s S, c byte) int

// LastIndexFunc returns the index into s of the last
// Unicode code point satisfying f(c), or -1 if none do.
func LastIndexFunc[S Byteseq](s S, f func (rune) bool) int

// Split slices s into all substrings separated by sep and returns a slice of
// the substrings between those separators.
// 
// If s does not contain sep and sep is not empty, Split returns a
// slice of length 1 whose only element is s.
// 
// If sep is empty, Split splits after each UTF-8 sequence. If both s
// and sep are empty, Split returns an empty slice.
// 
// It is equivalent to SplitN with a count of -1.
func Split[S, Sep Byteseq](s S, sep Sep) []S

// SplitAfter slices s into all substrings after each instance of sep and
// returns a slice of those substrings.
// 
// If s does not contain sep and sep is not empty, SplitAfter returns
// a slice of length 1 whose only element is s.
// 
// If sep is empty, SplitAfter splits after each UTF-8 sequence. If
// both s and sep are empty, SplitAfter returns an empty slice.
// 
// It is equivalent to SplitAfterN with a count of -1.
func SplitAfter[S, Sep Byteseq](s S, sep Sep) []S

// SplitAfterN slices s into substrings after each instance of sep and
// returns a slice of those substrings.
// 
// The count determines the number of substrings to return:
//   n > 0: at most n substrings; the last substring will be the unsplit remainder.
//   n == 0: the result is nil (zero substrings)
//   n < 0: all substrings
// 
// Edge cases for s and sep (for example, empty strings) are handled
// as described in the documentation for SplitAfter.
func SplitAfterN[S, Sep Byteseq](s S, sep Sep, n int) []S

// SplitN slices s into substrings separated by sep and returns a slice of
// the substrings between those separators.
// 
// The count determines the number of substrings to return:
//   n > 0: at most n substrings; the last substring will be the unsplit remainder.
//   n == 0: the result is nil (zero substrings)
//   n < 0: all substrings
// 
// Edge cases for s and sep (for example, empty strings) are handled
// as described in the documentation for Split.
func SplitN[S, Sep Byteseq](s S, sep Sep, n int) []S

// Trim returns a slice of the string s with all leading and
// trailing Unicode code points contained in cutset removed.
func Trim[S, Cutset Byteseq](s S, cutset Cutset) S

// TrimFunc returns a slice of the string s with all leading
// and trailing Unicode code points c satisfying f(c) removed.
func TrimFunc[S Byteseq](s S, f func (rune) bool) S

// TrimLeft returns a slice of the string s with all leading
// Unicode code points contained in cutset removed.
// 
// To remove a prefix, use TrimPrefix instead.
func TrimLeft[S, Cutset Byteseq](s S, cutset Cutset) S

// TrimLeftFunc returns a slice of the string s with all leading
// Unicode code points c satisfying f(c) removed.
func TrimLeftFunc[S Byteseq](s S, f func (rune) bool) S

// TrimPrefix returns s without the provided leading prefix string.
// If s doesn't start with prefix, s is returned unchanged.
func TrimPrefix[S, Prefix Byteseq](s S, prefix Prefix) S

// TrimRight returns a slice of the string s, with all trailing
// Unicode code points contained in cutset removed.
// 
// To remove a suffix, use TrimSuffix instead.
func TrimRight[S, Cutset Byteseq](s S, cutset Cutset) S

// TrimRightFunc returns a slice of the string s with all trailing
// Unicode code points c satisfying f(c) removed.
func TrimRightFunc[S Byteseq](s S, f func (rune) bool) S

// TrimSpace returns a slice of the string s, with all leading
// and trailing white space removed, as defined by Unicode.
func TrimSpace[S Byteseq](s S) S

// TrimSuffix returns s without the provided trailing suffix string.
// If s doesn't end with suffix, s is returned unchanged.
func TrimSuffix[S, Suffix Byteseq](s S, suffix Suffix) S
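
Several signatures above return `S` or `[]S`, so the result keeps the caller's type. The following sketch shows how that could work for `TrimPrefix`. Both function bodies and the `Path` type are assumptions made for illustration. It depends on slicing a `~string | ~[]byte` type parameter yielding a value of the same type parameter.

```go
package main

import "fmt"

// Byteseq mirrors the constraint defined in the proposal.
type Byteseq interface {
	~string | ~[]byte
}

// Equal is an illustrative implementation: same length, same bytes.
func Equal[A, B Byteseq](a A, b B) bool {
	if len(a) != len(b) {
		return false
	}
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// TrimPrefix is an illustrative implementation. Slicing s produces a value
// of type S, so the result shares memory with s and keeps its type.
func TrimPrefix[S, Prefix Byteseq](s S, prefix Prefix) S {
	if len(s) >= len(prefix) && Equal(s[:len(prefix)], prefix) {
		return s[len(prefix):]
	}
	return s
}

// Path is a hypothetical named string type.
type Path string

func main() {
	p := TrimPrefix(Path("/usr/local"), []byte("/usr"))
	fmt.Printf("%T %q\n", p, p) // the result is still a Path

	b := TrimPrefix([]byte("hello"), "he")
	fmt.Printf("%T %s\n", b, b) // the result is still a []byte
}
```

Note that the prefix may be of a different Byteseq type than `s`; only the type of `s` determines the result type.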

Note that the API proposed above does not include functions such as strings.Map or strings.Join that build a new string.
The reason is to avoid a dependency on strings.Builder.
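
For context, here is a sketch of what a generic string-building function might look like without strings.Builder. This `Join` is hypothetical and is not part of the proposal. It assumes a plain []byte buffer followed by a conversion to `S`. When `S` is a string type, that final conversion copies the buffer, which is the copy that strings.Builder avoids in the standard library.

```go
package main

import "fmt"

// Byteseq mirrors the constraint defined in the proposal.
type Byteseq interface {
	~string | ~[]byte
}

// Join is a hypothetical generic counterpart of strings.Join, shown only to
// illustrate the allocation involved. It is NOT part of the proposed API.
func Join[S, Sep Byteseq](elems []S, sep Sep) S {
	// Compute the final length up front so the buffer is allocated once.
	n := 0
	for i, e := range elems {
		if i > 0 {
			n += len(sep)
		}
		n += len(e)
	}
	buf := make([]byte, 0, n)
	for i, e := range elems {
		if i > 0 {
			buf = append(buf, sep...)
		}
		buf = append(buf, e...)
	}
	// For string types this conversion copies buf a second time.
	return S(buf)
}

func main() {
	fmt.Println(Join([]string{"a", "b", "c"}, []byte(", ")))
	fmt.Printf("%s\n", Join([][]byte{[]byte("x"), []byte("y")}, "-"))
}
```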