Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/hex: DecodeString wastes memory #29802

Open
rillig opened this issue Jan 18, 2019 · 7 comments
Open

encoding/hex: DecodeString wastes memory #29802

rillig opened this issue Jan 18, 2019 · 7 comments

Comments

@rillig
Copy link
Contributor

@rillig rillig commented Jan 18, 2019

What version of Go are you using?

go version go1.11.4 windows/amd64

What did you do?

https://play.golang.org/p/pbZJx5AZHpJ

What did you expect to see?

The byte slice returned by hex.DecodeString has len == cap.

What did you see instead?

The byte slice returned by hex.DecodeString is twice as large as strictly necessary.

See this rillig/pkglint commit for the real-life example where I discovered this. In a commit message shortly before that, I discovered strange sequences of hex digits in the heap dump.

@josharian

This comment has been minimized.

Copy link
Contributor

@josharian josharian commented Jan 18, 2019

I’m not sure it’s worth the required code duplication to use a smaller buffer (i.e. use less memory). It might be worth setting cap==len in the returned slice (return src[:n:n]), although that would be a performance penalty for anyone who appends to the return value.

@rillig

This comment has been minimized.

Copy link
Contributor Author

@rillig rillig commented Jan 18, 2019

What about this:

func DecodeString(s string) ([]byte, error) {
	dstLen := DecodedLen(len(s))
	dst := make([]byte, dstLen, dstLen)
	n, err := Decode(dst, []byte(s))
	return dst[:n], err
}

There's no code duplication here, yet the output slice is sized as expected (by me).

The Go compiler should be smart enough to figure out that []byte(s) can be a no-op in this case.

Users of this function who want to append later probably already use Decode(dst, src) instead. Appending to the result of DecodeString feels like an edge case to me, and at least in the Go source tree I didn't find a single example for that. Therefore I was surprised to see that DecodeString wastes memory in the common case where the output is expected to be complete as it is.

@bronze1man

This comment has been minimized.

Copy link
Contributor

@bronze1man bronze1man commented Jan 18, 2019

@rillig

The Go compiler should be smart enough to figure out that []byte(s) can be a no-op in this case.

I use runtime.ReadMemStats to confirm that when I pass in a 2048 bytes string, it will make two allocs and total size is 3072 bytes with go version go1.11.4 darwin/amd64.

So the Go compiler is not smart enough in your option at least in go1.11.4

@rillig

This comment has been minimized.

Copy link
Contributor Author

@rillig rillig commented Jan 18, 2019

@josharian Trimming the return value using src[:n:n] would not help the GC since it still does not free the second half of the byte array. I tested this using the following program:

package main

import (
	"fmt"
	"log"
	"os"
	"runtime"
	"runtime/debug"
)

func main() {
	large := make([]byte, 0, 1<<24)

	dump("001.heapdump") // 16 MB

	small := large[0 : 1<<10 : 1<<10]
	large = nil

	dump("002.heapdump") // still 16 MB, even though large is not accessible anymore

	fmt.Println(small[0:1])
	small = nil

	dump("003.heapdump") // 272 kB
}

func dump(filename string) {
	f, err := os.Create(filename)
	if err != nil {
		log.Fatalln(err)
	}

	runtime.GC()
	debug.WriteHeapDump(f.Fd())

	err = f.Close()
	if err != nil {
		log.Fatalln(err)
	}
}
@josharian

This comment has been minimized.

Copy link
Contributor

@josharian josharian commented Jan 18, 2019

The Go compiler should be smart enough to figure out that []byte(s) can be a no-op in this case.

But it's not. I filed #29810 for this.

Trimming the return value using src[:n:n] would not help the GC

Indeed. The only value to doing so is that it helps avoid surprise of the kind expressed in this issue. @rillig also expressed concern about "leaking data", but given that the caller already had the hex input, it's not so clear to me that this is a major concern. On balance, I don't think that we should make this change.

I use runtime.ReadMemStats to confirm

FWIW, this kind of thing is easier to measure using benchmarks.

@robpike

This comment has been minimized.

Copy link
Contributor

@robpike robpike commented Jan 18, 2019

A side remark: encoding/hex is close to pointless. Printf and friends do hex conversion just fine, with careful memory management.

@valyala

This comment has been minimized.

Copy link
Contributor

@valyala valyala commented Jan 20, 2019

Yet another side remark: it would be great if standard encoding/* packages had Append*(dst []byte, args...) []byte methods with semantics similar to strconv.Append*. Such methods could be easily used in zero-alloc mode. There is an outdated proposal for encoding/base64 at #19366 .

@bcmills bcmills added the Performance label Jan 29, 2019
@bcmills bcmills added this to the Unplanned milestone Jan 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants
You can’t perform that action at this time.