Description
What version of Go are you using (go version)?
$ go version
go version devel go1.21-f90b4cd655 Fri May 26 03:21:41 2023 +0000 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
N/A
What did you do?
Read the encoding/base64 and encoding/base32 documentation for WithPadding.
Run this code:
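package main

import (
	"encoding/base64"
	"fmt"
	"unicode/utf8"
)

func main() {
	// 'é' is U+00E9, which is below 0x100, so WithPadding accepts it.
	s := base64.StdEncoding.WithPadding('é').EncodeToString([]byte("a"))
	// Report whether the encoded string is well-formed UTF-8.
	fmt.Println(utf8.ValidString(s))
}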
What did you expect to see?
The WithPadding signature (and documentation) takes a rune argument, which generally implies that the padding will be encoded as a valid UTF-8 character. The documentation does stipulate that it must be less than 0x100, but that still allows some non-ASCII characters (U+0080 through U+00FF).
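A small sketch of where that limit sits, assuming the current encoding/base64 behaviour of panicking on padding above 0xFF. The helper `tryPadding` is hypothetical and only exists to recover from that panic; 'é' (U+00E9) is accepted while 'ā' (U+0101) is rejected.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// tryPadding is a hypothetical helper: it reports whether WithPadding
// accepts pad, recovering from the panic raised for invalid padding.
func tryPadding(pad rune) (ok bool) {
	defer func() {
		if recover() != nil {
			ok = false
		}
	}()
	base64.StdEncoding.WithPadding(pad)
	return true
}

func main() {
	fmt.Println(tryPadding('é')) // U+00E9: below 0x100, accepted
	fmt.Println(tryPadding('ā')) // U+0101: above 0xFF, panics inside WithPadding
}
```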
However, when a non-ASCII rune is used, the result is not well-formed UTF-8, because the padding "rune" is actually written as a single byte rather than as a character.
From a strict reading of the documentation, I'd expect the pad character to be encoded as a valid multi-byte UTF-8 sequence, and therefore the program above to print true.
What did you see instead?
The program above prints false because the resulting string is not valid UTF-8.
I realise that this behaviour cannot be changed at this point, but we could at least update the documentation to make it clear that the "rune" is actually treated as a byte.