Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/text/encoding/charmap: values should have type *Charmap, not encoding.Encoding #19584

Closed
nigeltao opened this issue Mar 17, 2017 · 6 comments
Closed

Comments

@nigeltao
Copy link
Contributor

The golang.org/x/text/encoding/charmap package exports values of type encoding.Encoding (an interface, https://godoc.org/golang.org/x/text/encoding#Encoding):

var Macintosh encoding.Encoding = etc

Being an interface (defined in another package) means that we can't add extra methods to these variables.

The (API-breaking) proposal is to change these variable's types from encoding.Encoding to *Charmap (the existing charmap type would be exported as Charmap), and then add two methods:

func (*Charmap) DecodeByte(b byte) rune
func (*Charmap) EncodeRune(r rune) (b byte, ok bool)

The motivation is that https://go.googlesource.com/image/+/master/font/sfnt/cmap.go contains:

// TODO: for this closure to be goroutine-safe, the
// golang.org/x/text/encoding/charmap API needs to allocate a new
// Encoder and new []byte buffers, for every call to this closure, even
// though all we want to do is to encode one rune as one byte. We could
// possibly add some fields in the Buffer struct to re-use these
// allocations, but a better solution is to improve the charmap API.
var dst, src [utf8.UTFMax]byte
n := utf8.EncodeRune(src[:], r)
_, _, err = charmap.Macintosh.NewEncoder().Transform(dst[:], src[:n], true)
if err != nil {
	// The source rune r is not representable in the Macintosh-Roman encoding.
	return 0, nil
}
doStuffWith(dst[0])

With this proposal, this code would become:

b, ok := charmap.Macintosh.EncodeRune(r)
if !ok {
	// The source rune r is not representable in the Macintosh-Roman encoding.
	return 0, nil
}
doStuffWith(b)

@mpvl

@gopherbot gopherbot added this to the Proposal milestone Mar 20, 2017
@mpvl
Copy link
Contributor

mpvl commented Mar 22, 2017

@nigeltao generally I think this is a good idea. This has the other big benefit that the linker can now shed unused encodings. This is probably something that should be done for all encodings.

One caveat with this API is the user now needs to be aware of text normalization. The encoding packages are the only ones in the text repo where the input needs to be normalized to get consistent encodings. I plan to fix this at some point and the current API supports that. Using this API there is no way around it that the user needs to normalize to NFC first. But I guess in this use case that was already lost. In fact, this API will make documenting this possible.

Furthermore, I recall seeing some of the "single-byte" encodings allowing for a multi-rune decoding, but I must be wrong. I ran
grep SBCS * | cut -d: -f1 | xargs -n 1 grep -e "^<.*|3"
on http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/
to verify that there are no exceptions.

@bcmills
Copy link
Contributor

bcmills commented Mar 22, 2017

I think the change from interface to concrete type is a good idea independent of the proposed EncodeRune and DecodeByte additions. I have no particular opinion on the latter, but I am strongly in favor of the former: if nothing else, it sets a better example for Go users who encounter the API and design their own similar APIs.

@rsc
Copy link
Contributor

rsc commented Mar 27, 2017

Since some combination of @nigeltao and @mpvl own this package and they're both in favor, proposal accepted.

@rsc rsc closed this as completed Mar 27, 2017
@rsc rsc reopened this Mar 27, 2017
@rsc rsc changed the title proposal: golang.org/x/text/encoding/charmap values should have type *Charmap, not encoding.Encoding x/text/encoding/charmap: values should have type *Charmap, not encoding.Encoding Mar 27, 2017
@gopherbot
Copy link

CL https://golang.org/cl/38873 mentions this issue.

@mattn
Copy link
Member

mattn commented Mar 31, 2017

Seems breaking compatibility. mattn/go-encoding#1

@mattn
Copy link
Member

mattn commented Mar 31, 2017

But I understand why this change was required. So ignore this. Sorry.

@golang golang locked and limited conversation to collaborators Mar 31, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants