proposal: byteseq: add a generic byte string manipulation package #48643

This proposal is for use with #43651. I propose to define a new package, `byteseq`, that will provide simple generic
functions to manipulate UTF-8 encoded strings and byte slices.

Goals of this proposal:

- Provide a safe generic API without `string` <-> `[]byte` conversion overhead.
- Reduce code duplication between `strings` and `bytes` packages.
- Enforce immutability in the API using type constraints. The `~string | ~[]byte` constraint signals that functions
  must not mutate their arguments.
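
To illustrate the last goal, here is a minimal sketch of how a function constrained by `~string | ~[]byte` can read its arguments but not write to them. The `HasPrefix` body is my own illustration, not the proposed implementation, and it assumes a Go toolchain with generics support (1.18 or later).

```go
package main

import "fmt"

// Byteseq mirrors the constraint described in this proposal.
type Byteseq interface {
	~string | ~[]byte
}

// HasPrefix is an illustrative implementation only. It reads its
// arguments through len and indexing, both of which are permitted
// for every type in the constraint's type set.
func HasPrefix[S, Prefix Byteseq](s S, prefix Prefix) bool {
	if len(prefix) > len(s) {
		return false
	}
	for i := 0; i < len(prefix); i++ {
		if s[i] != prefix[i] {
			return false
		}
	}
	// An assignment such as s[0] = 'x' would not compile here:
	// the type set includes string types, so elements of s are
	// not assignable inside the generic function.
	return true
}

func main() {
	// Mixed argument types need no conversion at the call site.
	fmt.Println(HasPrefix("byteseq", []byte("byte"))) // true
	fmt.Println(HasPrefix([]byte("byteseq"), "seq"))  // false
}
```

The same body serves `string` and `[]byte` callers, which is the duplication the second goal aims to remove.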

API description:

```go
// Byteseq represents a generic UTF-8 byte string.
type Byteseq interface {
	~string | ~[]byte
}

// Compare returns an integer comparing two strings lexicographically. 
// The result will be 0 if a==b, -1 if a < b, and +1 if a > b.
func Compare[A, B Byteseq](a A, b B) int

// Contains reports whether subslice is within b.
func Contains[B, SubSlice Byteseq](b B, subslice SubSlice) bool

// ContainsAny reports whether any of the UTF-8-encoded code points in chars are within b.
func ContainsAny[B, Chars Byteseq](b B, chars Chars) bool

// Count counts the number of non-overlapping instances of sep in s. 
// If sep is empty, Count returns 1 + the number of UTF-8-encoded code points in s.
func Count[S, Sep Byteseq](s S, sep Sep) int

// Equal reports whether a and b are the same length and contain the same bytes.
func Equal[A, B Byteseq](a A, b B) bool

// EqualFold reports whether s and t, interpreted as UTF-8 strings, are equal under Unicode case-folding, 
// which is a more general form of case-insensitivity.
func EqualFold[S, T Byteseq](s S, t T) bool

// Fields splits the string s around each instance of one or more consecutive white space
// characters, as defined by unicode.IsSpace, returning a slice of substrings of s or an
// empty slice if s contains only white space.
func Fields[S Byteseq](s S) []S

// FieldsFunc splits the string s at each run of Unicode code points c satisfying f(c)
// and returns an array of slices of s. If all code points in s satisfy f(c) or the
// string is empty, an empty slice is returned.
// 
// FieldsFunc makes no guarantees about the order in which it calls f(c)
// and assumes that f always returns the same value for a given c.
func FieldsFunc[S Byteseq](s S, f func(rune) bool) []S

// HasPrefix tests whether the string s begins with prefix.
func HasPrefix[S, Prefix Byteseq](s S, prefix Prefix) bool

// HasSuffix tests whether the string s ends with suffix.
func HasSuffix[S, Suffix Byteseq](s S, suffix Suffix) bool

// Index returns the index of the first instance of substr in s, or -1 if substr is not present in s.
func Index[S, Substr Byteseq](s S, substr Substr) int

// IndexAny returns the index of the first instance of any Unicode code point
// from chars in s, or -1 if no Unicode code point from chars is present in s.
func IndexAny[S, Chars Byteseq](s S, chars Chars) int

// IndexByte returns the index of the first instance of c in s, or -1 if c is not present in s.
func IndexByte[S Byteseq](s S, c byte) int

// IndexFunc returns the index into s of the first Unicode
// code point satisfying f(c), or -1 if none do.
func IndexFunc[S Byteseq](s S, f func(rune) bool) int

// IndexRune returns the index of the first instance of the Unicode code point
// r, or -1 if r is not present in s.
// If r is utf8.RuneError, it returns the first instance of any
// invalid UTF-8 byte sequence.
func IndexRune[S Byteseq](s S, r rune) int

// LastIndex returns the index of the last instance of substr in s, or -1 if substr is not present in s.
func LastIndex[S, Substr Byteseq](s S, substr Substr) int

// LastIndexAny returns the index of the last instance of any Unicode code
// point from chars in s, or -1 if no Unicode code point from chars is
// present in s.
func LastIndexAny[S, Chars Byteseq](s S, chars Chars) int

// LastIndexByte returns the index of the last instance of c in s, or -1 if c is not present in s.
func LastIndexByte[S Byteseq](s S, c byte) int

// LastIndexFunc returns the index into s of the last
// Unicode code point satisfying f(c), or -1 if none do.
func LastIndexFunc[S Byteseq](s S, f func(rune) bool) int

// Split slices s into all substrings separated by sep and returns a slice of
// the substrings between those separators.
// 
// If s does not contain sep and sep is not empty, Split returns a
// slice of length 1 whose only element is s.
// 
// If sep is empty, Split splits after each UTF-8 sequence. If both s
// and sep are empty, Split returns an empty slice.
// 
// It is equivalent to SplitN with a count of -1.
func Split[S, Sep Byteseq](s S, sep Sep) []S

// SplitAfter slices s into all substrings after each instance of sep and
// returns a slice of those substrings.
// 
// If s does not contain sep and sep is not empty, SplitAfter returns
// a slice of length 1 whose only element is s.
// 
// If sep is empty, SplitAfter splits after each UTF-8 sequence. If
// both s and sep are empty, SplitAfter returns an empty slice.
// 
// It is equivalent to SplitAfterN with a count of -1.
func SplitAfter[S, Sep Byteseq](s S, sep Sep) []S

// SplitAfterN slices s into substrings after each instance of sep and
// returns a slice of those substrings.
// 
// The count determines the number of substrings to return:
//   n > 0: at most n substrings; the last substring will be the unsplit remainder.
//   n == 0: the result is nil (zero substrings)
//   n < 0: all substrings
// 
// Edge cases for s and sep (for example, empty strings) are handled
// as described in the documentation for SplitAfter.
func SplitAfterN[S, Sep Byteseq](s S, sep Sep, n int) []S

// SplitN slices s into substrings separated by sep and returns a slice of
// the substrings between those separators.
// 
// The count determines the number of substrings to return:
//   n > 0: at most n substrings; the last substring will be the unsplit remainder.
//   n == 0: the result is nil (zero substrings)
//   n < 0: all substrings
// 
// Edge cases for s and sep (for example, empty strings) are handled
// as described in the documentation for Split.
func SplitN[S, Sep Byteseq](s S, sep Sep, n int) []S

// Trim returns a slice of the string s with all leading and
// trailing Unicode code points contained in cutset removed.
func Trim[S, Cutset Byteseq](s S, cutset Cutset) S

// TrimFunc returns a slice of the string s with all leading
// and trailing Unicode code points c satisfying f(c) removed.
func TrimFunc[S Byteseq](s S, f func(rune) bool) S

// TrimLeft returns a slice of the string s with all leading
// Unicode code points contained in cutset removed.
// 
// To remove a prefix, use TrimPrefix instead.
func TrimLeft[S, Cutset Byteseq](s S, cutset Cutset) S

// TrimLeftFunc returns a slice of the string s with all leading
// Unicode code points c satisfying f(c) removed.
func TrimLeftFunc[S Byteseq](s S, f func(rune) bool) S

// TrimPrefix returns s without the provided leading prefix string.
// If s doesn't start with prefix, s is returned unchanged.
func TrimPrefix[S, Prefix Byteseq](s S, prefix Prefix) S

// TrimRight returns a slice of the string s, with all trailing
// Unicode code points contained in cutset removed.
// 
// To remove a suffix, use TrimSuffix instead.
func TrimRight[S, Cutset Byteseq](s S, cutset Cutset) S

// TrimRightFunc returns a slice of the string s with all trailing
// Unicode code points c satisfying f(c) removed.
func TrimRightFunc[S Byteseq](s S, f func(rune) bool) S

// TrimSpace returns a slice of the string s, with all leading
// and trailing white space removed, as defined by Unicode.
func TrimSpace[S Byteseq](s S) S

// TrimSuffix returns s without the provided trailing suffix string.
// If s doesn't end with suffix, s is returned unchanged.
func TrimSuffix[S, Suffix Byteseq](s S, suffix Suffix) S
```
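
As a rough sketch of how these signatures behave at a call site, the example below implements `Index` and `TrimPrefix` naively and calls them with mixed argument types. The function bodies and the `Path` type are hypothetical and exist only for this illustration; a real implementation would presumably reuse the optimized routines behind `strings` and `bytes`.

```go
package main

import "fmt"

// Byteseq mirrors the constraint from the API description.
type Byteseq interface {
	~string | ~[]byte
}

// Index is a naive byte-by-byte search, written only to make the
// example self-contained.
func Index[S, Substr Byteseq](s S, substr Substr) int {
	n := len(substr)
	for i := 0; i+n <= len(s); i++ {
		match := true
		for j := 0; j < n; j++ {
			if s[i+j] != substr[j] {
				match = false
				break
			}
		}
		if match {
			return i
		}
	}
	return -1
}

// TrimPrefix returns a value of the caller's own type S, so a named
// string or byte slice type survives the call. Slicing s yields S.
func TrimPrefix[S, Prefix Byteseq](s S, prefix Prefix) S {
	if len(prefix) <= len(s) && Index(s[:len(prefix)], prefix) == 0 {
		return s[len(prefix):]
	}
	return s
}

// Path is a hypothetical named string type used only in this example.
type Path string

func main() {
	// The prefix is a []byte, the subject is a named string type,
	// and the result keeps the subject's type.
	rest := TrimPrefix(Path("/usr/local/bin"), []byte("/usr"))
	fmt.Printf("%T %q\n", rest, rest)

	fmt.Println(Index([]byte("chicken"), "ken")) // 4
}
```

Using separate type parameters for each argument, as the signatures above do, is what allows a `string` and a `[]byte` to be passed to the same call without a conversion.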

Note that the API proposed above does not include functions such as `strings.Map` or `strings.Join` that build a new
string. The reason is to avoid a dependency on `strings.Builder`.
