Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: provide no-copy conversion from []byte to string #25484

Closed
tliron opened this issue May 22, 2018 · 14 comments
Closed

Feature: provide no-copy conversion from []byte to string #25484

tliron opened this issue May 22, 2018 · 14 comments

Comments

@tliron
Copy link

tliron commented May 22, 2018

If you read bytes from an io.Reader, but need them as a string, then you're going to have to convert. And that will cost you memory as data is copied between quite different variables.

Annoyed by this waste, some of us has been doing things like this:

func BytesToString(bytes []byte) string {
	sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
	stringHeader := reflect.StringHeader{Data: sliceHeader.Data, Len: sliceHeader.Len}
	return *(*string)(unsafe.Pointer(&stringHeader))
}

Of course, proper use requires you to never change the original byte array, but that's a reasonable contract in many cases.

First, I would like to know if there are any repercussions to this usage. I'm always nervous about using unsafe, but in this case it seems to adhere to the spec, and also I can't foresee any problems with garbage collection. But maybe I'm missing something important. I've been using this a lot and have not seen any issues.

Second, assuming this code is A-OK, then why not provide something like this in the standard library? (Probably in the bytes package.) It could allow for significant memory savings for programs that read a lot of text.

@randall77
Copy link
Contributor

That's not quite the right way to do it. See the description at the end of package unsafe's description. (TL;DR don't declare variables of type reflect.SliceHeader or reflect.StringHeader, use variables of type *reflect.SliceHeader and *reflect.StringHeader instead.) But in general, if you want to do this in your code, go for it. #19367 is a proposal to make this a bit nicer/safer.

We don't want to provide this functionality (unsafe-converting a []byte to string) in the standard library because it lets people create mutable strings. Lots of packages assume strings are immutable, and may behave badly if the strings are mutable. For instance, maps with mutable string keys will behave badly. We'd rather not encourage those kinds of bugs, even if you can do it yourself.

If and when we do #19367, adding copy-free []byte to string would be an easy additional unsafe operation. But maybe we just keep letting people do that themselves.

@bcmills
Copy link
Contributor

bcmills commented May 22, 2018

As @randall77 notes, this would have to live in the unsafe package (because it is unsafe in general). As such, closing as a dup of #19367.

@bcmills bcmills closed this as completed May 22, 2018
@tliron
Copy link
Author

tliron commented May 22, 2018

I understand the concern, but this just seems to be such a common use case with very obvious gains. I understand that Go is focused more on engineering practicality than language correctness.

Since we already have an unsafe package, how about adding this function there? Is that the point of #19367? By having it in unsafe users should be clear as to what they're getting into (potentially immutable strings).

Here's my new way to do it by avoiding an instance of reflect.StringHeader. Does this look right?

func BytesToString(bytes []byte) string {
	sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
	return *(*string)(unsafe.Pointer(&reflect.StringHeader{
		Data: sliceHeader.Data,
		Len:  sliceHeader.Len,
	}))
}

It's actually subtly tricky to ensure that the byte slice doesn't get GC'ed. I took into account the comment here to do it all in one line.

@randall77
Copy link
Contributor

I think this is safer.

func BytesToString(bytes []byte) (s string) {
	slice := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
	str := (*reflect.StringHeader)(unsafe.Pointer(&s))
	str.Data = slice.Data
	str.Len = slice.Len
}

That way you're never allocating an instance of reflect.StringHeader.

@tliron
Copy link
Author

tliron commented May 22, 2018

Thanks, but that doesn't compile in go1.10.2. Is it even possible to do & on implicit returns? This does work:

func BytesToString(bytes []byte) string {
	var s string
	sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
	stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
	stringHeader.Data = sliceHeader.Data
	stringHeader.Len = sliceHeader.Len
	return s
}

@randall77
Copy link
Contributor

Ah, I forgot a final "return".
Yes, you can take the address of named return values.
Your version is fine as well.

@bcmills
Copy link
Contributor

bcmills commented May 22, 2018

With the missing return, it seems to work:
https://play.golang.org/p/36sWGSvYFyV

FWIW, you can also write it without referring to reflect.SliceHeader:
https://play.golang.org/p/0if5Nj9RH1z

@tliron
Copy link
Author

tliron commented May 22, 2018

Thanks! Here's my final version with one more little optimization.

See, the subtleties of getting this right are exactly why this should be a std func.

@zigo101
Copy link

zigo101 commented May 23, 2018

The BytesToString function above is not safe under current unsafe rules.
It needs a KeepAlive line:

func BytesToString(bytes []byte) (s string) {
	slice := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
	str := (*reflect.StringHeader)(unsafe.Pointer(&s))
	str.Data = slice.Data
	str.Len = slice.Len
	runtime.KeepAlive(&bytes) // this line is essential.
	return s
}

@tliron
Copy link
Author

tliron commented May 23, 2018

Thanks very much, go101.

For a closed issue this one is getting a lot of input. ;) Again, I imagine very, very few people out there are getting this critical code right. See this blog and this blog for wrong ways of doing this.

Folk, I think this should be in Go std lib, if only to save people from themselves. Many people will try to do this and mess up.

@bcmills
Copy link
Contributor

bcmills commented May 23, 2018

@go101 Unless I have missed something, the version without *reflect.SliceHeader (https://play.golang.org/p/0if5Nj9RH1z) does not require a runtime.KeepAlive. Before the Data pointer is written it is reachable via an unsafe.Pointer, and after it is written it is reachable via the s string variable.

@bcmills
Copy link
Contributor

bcmills commented May 23, 2018

@tliron

Again, I imagine very, very few people out there are getting this critical code right.

If #19367 is accepted, then it will be much easier to get right: that proposal uses unsafe.Pointer instead of uintptr for the Data field, which means that the garbage collector will correctly trace slices that are only reachable via a SliceHeader or StringHeader.

The only reason this is tricky today is because the headers are in reflect instead of unsafe, and as such cannot use the correct unsafe.Pointer type for the Data field (see #20171).

@zigo101
Copy link

zigo101 commented May 23, 2018

@bcmills
yes, the way used by strings.Builder

func ByteSlice2String(bs []byte) string {
	return *(*string)(unsafe.Pointer(&bs))
}

is a more efficient way.

@tliron
Copy link
Author

tliron commented May 24, 2018

Thanks, go101. For reference, this is the source referred to.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants