unicode/utf16: add AppendRune #51896

@qmuntal

Background

utf16.Encode always allocates a new []uint16 large enough to hold the UTF-16 encoded sequence. This is convenient, but it forces one allocation per call.

Proposal

Update, May 27 2022: The proposed API has changed (see #51896 (comment)) to:

// AppendRune appends the UTF-16 encoding of the Unicode code point r
// to the end of p and returns the extended buffer. If the rune is not
// a valid Unicode code point, it appends the encoding of U+FFFD.
func AppendRune(p []uint16, r rune) []uint16
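
A minimal sketch of how AppendRune behaves for the three interesting cases (a BMP code point, a code point that needs a surrogate pair, and an invalid rune). It assumes a toolchain where utf16.AppendRune is available with the signature above (Go 1.20 or later); the `demo` function name is only for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// demo is an illustrative helper, not part of the proposal.
func demo() []uint16 {
	var buf []uint16
	buf = utf16.AppendRune(buf, 'a')          // one code unit: 0x0061
	buf = utf16.AppendRune(buf, '\U0001F600') // surrogate pair: 0xD83D 0xDE00
	buf = utf16.AppendRune(buf, -1)           // invalid rune: U+FFFD
	return buf
}

func main() {
	fmt.Printf("%04x\n", demo())
}
```

Because the function appends, the caller decides where the storage comes from: a nil slice, a reused buffer, or a slice of a fixed-size array.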

Update, May 27 2022: The following functions have been superseded by AppendRune above.

For cases where the extra allocation matters, unicode/utf16 could provide an additional encoding function that accepts a pre-allocated (and large enough) backing slice.

The signature would look like this:

// EncodeInto writes into a (which must be large enough) the UTF-16 encoding
// of the Unicode code point sequence s.
func EncodeInto(a []uint16, s []rune) []uint16

Optionally, in order to know the minimum size of the backing array, unicode/utf16 could provide an additional function which counts the number of code units in a code point sequence.

It would look something like this:

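// Count returns the number of UTF-16 code units needed to encode s.
// Invalid code points count as one code unit, since they are encoded as U+FFFD.
func Count(s []rune) int {
	n := len(s)
	for _, v := range s {
		// Only valid code points at or above surrSelf need a surrogate pair.
		if surrSelf <= v && v <= maxRune {
			n++
		}
	}
	return n
}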

It is worth noting that utf16.Encode could then be implemented using utf16.Count and utf16.EncodeInto.

Examples

My specific use case is to allow x/sys/windows/mkwinsyscall to generate syscall wrappers that accept string arguments without allocating, at least for short strings. Check this comment for more context.

If I had utf16.EncodeInto I could implement a non-allocating wrapper as follows:

func Foo(s string) {
	// Append the NUL terminator before converting to runes.
	p := []rune(s + "\x00")
	l := utf16.Count(p)
	var a []uint16
	if l < 32 {
		// Constant size, so the compiler can keep the buffer on the stack.
		a = make([]uint16, 32)
	} else {
		a = make([]uint16, l)
	}
	a = utf16.EncodeInto(a, p)
	syscall.Syscall6(fnAddr(), 6, 0, uintptr(unsafe.Pointer(&a[0])), 0, 0, 0, 0)
}
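
For comparison, the same use case can be sketched with the updated AppendRune API. The helper name `appendUTF16Z` and the 32-element buffer are assumptions for illustration, and the syscall itself is left out so the snippet runs on any platform. Whether the array really stays on the stack depends on the compiler's escape analysis.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// appendUTF16Z is a hypothetical helper. It appends the NUL-terminated
// UTF-16 encoding of s to buf and returns the extended slice.
func appendUTF16Z(buf []uint16, s string) []uint16 {
	for _, r := range s {
		buf = utf16.AppendRune(buf, r)
	}
	return append(buf, 0) // NUL terminator
}

func main() {
	var small [32]uint16 // fixed-size storage for short strings
	p := appendUTF16Z(small[:0], "héllo")
	// For short inputs the result still points into the fixed-size array;
	// longer inputs make append fall back to a heap allocation.
	fmt.Println(len(p), &p[0] == &small[0])
}
```

In a generated wrapper, `&p[0]` would then be passed to the syscall in place of `&a[0]`.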
