
encoding/base{32,64}: WithPadding is misleading #60689

@rogpeppe

What version of Go are you using (go version)?

$ go version
go version devel go1.21-f90b4cd655 Fri May 26 03:21:41 2023 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

N/A

What did you do?

Read the encoding/base64 and encoding/base32 documentation for WithPadding.

Run this code:

package main

import (
	"encoding/base64"
	"fmt"
	"unicode/utf8"
)

func main() {
	s := base64.StdEncoding.WithPadding('é').EncodeToString([]byte("a"))
	fmt.Println(utf8.ValidString(s))
}

What did you expect to see?

WithPadding takes a rune argument (in both its signature and its documentation), which generally implies that the padding will be encoded as a valid UTF-8 character. The documentation stipulates that it must be less than 0x100, but that still allows some non-ASCII characters (U+0080 to U+00FF).
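
As a quick check of that 0x100 limit, the sketch below probes which runes WithPadding accepts. It assumes that WithPadding panics when given an invalid padding character (which is what the current implementation does); the helper paddingAccepted is hypothetical and exists only for this illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// paddingAccepted is a hypothetical helper: it reports whether
// base64.StdEncoding.WithPadding accepts r, by recovering from the
// panic WithPadding raises for padding it rejects.
func paddingAccepted(r rune) (ok bool) {
	defer func() {
		if recover() != nil {
			ok = false
		}
	}()
	base64.StdEncoding.WithPadding(r)
	return true
}

func main() {
	// U+00E9 'é' is non-ASCII but below 0x100, so it is accepted.
	fmt.Println(paddingAccepted('é'))
	// U+0100 is the first rune above the 0xff limit, so it is rejected.
	fmt.Println(paddingAccepted(0x100))
}
```

So a non-ASCII rune such as 'é' passes the documented check, which is exactly the case that produces the surprising output.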

However, when a non-ASCII rune is used, the result is a string that is not well formed (it's not valid UTF-8) because the padding "rune" is actually treated as a byte, not as a character.

From a strict reading of the documentation, I'd expect the pad character to be encoded as a valid multibyte UTF-8 character, so the above program should print true.

What did you see instead?

The above code prints false because the resulting string is not valid UTF-8.

I realise that this behaviour cannot be changed at this point, but we could at least update the documentation to make it clear that the "rune" is actually treated as a byte.
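
The byte treatment is at least self-consistent, which is part of why it can't change now. The sketch below (a minimal illustration, assuming the current decoder behaviour) shows that decoding the encoder's own output round-trips, while the "well-formed" spelling with a UTF-8 'é' is rejected as corrupt input.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// enc uses 'é' (U+00E9) as padding; the encoder and decoder both
// treat it as the single byte 0xe9.
var enc = base64.StdEncoding.WithPadding('é')

func main() {
	s := enc.EncodeToString([]byte("a"))

	// Decoding the encoder's own output works: the pad matches byte-for-byte.
	b, err := enc.DecodeString(s)
	fmt.Printf("%q %v\n", b, err)

	// "YQéé" in Go source is UTF-8, so each 'é' is the bytes c3 a9;
	// 0xc3 is neither in the alphabet nor equal to the pad byte.
	_, err = enc.DecodeString("YQéé")
	fmt.Println(err != nil)
}
```

A documentation note that the padding is written and matched as a single byte would cover both directions.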

Metadata

Assignees: No one assigned

Labels: Documentation (issues describing a change to documentation), FrozenDueToAge, NeedsFix (the path to resolution is known, but the work has not been done)