Description
Background
utf16.Encode always allocates a []uint16 large enough to fit the UTF-16 encoded sequence, which is really ergonomic but forces one allocation.
Proposal
Update, May 27 2022: The proposed API has changed (see #51896 (comment)) to:
// AppendRune appends the UTF-16 encoding of the Unicode code point r
// to the end of p and returns the extended buffer. If the rune is not
// a valid Unicode code point, it appends the encoding of U+FFFD.
func AppendRune(p []uint16, r rune) []uint16
Update, May 27 2022: The following functions have been superseded by AppendRune above.
For cases where the extra allocation matters, unicode/utf16 could provide an additional encoding function which accepts a pre-allocated (and large enough) backing slice.
The signature would look like this:
// EncodeInto writes into a (which must be large enough) the UTF-16 encoding
// of the Unicode code point sequence s.
func EncodeInto(a []uint16, s []rune) []uint16
Optionally, in order to know the minimum size of the backing array, unicode/utf16 could provide an additional function which counts the number of UTF-16 code units needed to encode a code point sequence.
It would look something like this:
// Count returns the number of UTF-16 code units needed to encode s.
// Invalid code points are counted as one code unit, the width of U+FFFD.
func Count(s []rune) int {
	n := len(s)
	for _, v := range s {
		// Only valid code points at or above surrSelf (0x10000) need a surrogate pair.
		if surrSelf <= v && v <= maxRune {
			n++
		}
	}
	return n
}
It is worth noting that utf16.Encode could then be implemented using utf16.Count and utf16.EncodeInto.
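To make that last observation concrete, here is a minimal sketch of Encode built from the two proposed pieces. The lowercase count, encodeInto and encode are hypothetical stand-ins written for illustration; they are not part of unicode/utf16, and the constants are redeclared locally because the package does not export them.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// Local copies of values that unicode/utf16 keeps unexported.
const (
	surrSelf        = 0x10000
	maxRune         = '\U0010FFFF'
	replacementChar = '\uFFFD'
)

// count is a hypothetical stand-in for the proposed utf16.Count.
func count(s []rune) int {
	n := len(s)
	for _, v := range s {
		if surrSelf <= v && v <= maxRune {
			n++ // valid supplementary code points need a surrogate pair
		}
	}
	return n
}

// encodeInto is a hypothetical stand-in for the proposed utf16.EncodeInto.
// It assumes len(a) >= count(s) and returns the written prefix of a.
func encodeInto(a []uint16, s []rune) []uint16 {
	n := 0
	for _, v := range s {
		switch {
		case surrSelf <= v && v <= maxRune:
			r1, r2 := utf16.EncodeRune(v)
			a[n], a[n+1] = uint16(r1), uint16(r2)
			n += 2
		case v < 0 || v > maxRune || (0xd800 <= v && v < 0xe000):
			// Out-of-range values and lone surrogates become U+FFFD.
			a[n] = replacementChar
			n++
		default:
			a[n] = uint16(v)
			n++
		}
	}
	return a[:n]
}

// encode mirrors utf16.Encode: size the buffer once, then fill it.
func encode(s []rune) []uint16 {
	return encodeInto(make([]uint16, count(s)), s)
}

func main() {
	in := []rune("a\U0001F600b")
	fmt.Println(encode(in))
	fmt.Println(utf16.Encode(in)) // standard library result, for comparison
}
```

The single allocation stays in encode, so callers who already own a buffer could call encodeInto directly and skip it.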
Examples
My specific use case is to allow x/sys/windows/mkwinsyscall to generate syscall wrappers that accept string arguments without allocating, at least for short strings. Check this comment for more context.
If I had utf16.EncodeInto, I could implement a non-allocating wrapper as follows:
func Foo(s string) {
	p := []rune(s + "\x00")
	l := utf16.Count(p)
	var a []uint16
	if l < 32 {
		a = make([]uint16, 32)
	} else {
		a = make([]uint16, l)
	}
	a = utf16.EncodeInto(a, p)
	syscall.Syscall6(fnAddr(), 6, 0, uintptr(unsafe.Pointer(&a[0])), 0, 0, 0, 0)
}
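For comparison, the same idea can be sketched with the AppendRune signature from the update at the top of this proposal. This assumes a Go release that ships utf16.AppendRune (Go 1.21 or later); appendUTF16 is a hypothetical helper name, and the syscall itself is left out so the sketch runs on any platform.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// appendUTF16 is a hypothetical helper: it appends the NUL-terminated
// UTF-16 encoding of s to buf and returns the extended slice.
func appendUTF16(buf []uint16, s string) []uint16 {
	for _, r := range s {
		buf = utf16.AppendRune(buf, r)
	}
	return append(buf, 0)
}

func main() {
	// A fixed-size array serves as the backing store for short strings;
	// append only grows a new slice when the result does not fit.
	var backing [32]uint16
	p := appendUTF16(backing[:0], "héllo")
	fmt.Println(len(p), p)
}
```

Ranging over the string directly also avoids the intermediate []rune conversion used in the EncodeInto version.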