Question about zero-copy conversion between string and []byte #7

Closed
konkona opened this issue Dec 24, 2019 · 7 comments · Fixed by #15

konkona commented Dec 24, 2019

func string2bytes(s string) []byte {
	stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))

	bh := reflect.SliceHeader{
		Data: stringHeader.Data,
		Len:  stringHeader.Len,
		Cap:  stringHeader.Len,
	}

	return *(*[]byte)(unsafe.Pointer(&bh))
}

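1. Here Data is a uintptr, i.e. a plain integer. When stringHeader.Data is copied by value into a new SliceHeader, won't the GC later move or reclaim the memory that this uintptr points to?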

2. The official documentation describes it like this:

// In general, reflect.SliceHeader and reflect.StringHeader should be used
// only as *reflect.SliceHeader and *reflect.StringHeader pointing at actual
// slices or strings, never as plain structs.
// A program should not declare or allocate variables of these struct types.
//	// INVALID: a directly-declared header will not hold Data as a reference.
//	var hdr reflect.StringHeader
//	hdr.Data = uintptr(unsafe.Pointer(p))
//	hdr.Len = n
//	s := *(*string)(unsafe.Pointer(&hdr)) // p possibly already lost

Wouldn't it be better to rewrite your conversion function as follows?

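func string2bytes(s string) []byte {
	stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))

	// Write through a *SliceHeader that points at a real slice variable,
	// so the GC treats b's data field as a pointer.
	var b []byte
	pbytes := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	pbytes.Data = stringHeader.Data
	pbytes.Len = stringHeader.Len
	pbytes.Cap = stringHeader.Len // StringHeader has no Cap field

	return b
}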
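A minimal, runnable sketch of the pointer-to-header pattern, assuming a Go release where reflect.StringHeader and reflect.SliceHeader still exist (they are deprecated since Go 1.21 but not removed). StringHeader has no Cap field, so Cap is taken from Len. The names string2bytesViaHeader and sameData are hypothetical; sameData compares the two Data fields to confirm that no copy was made.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// string2bytesViaHeader (hypothetical name) builds a []byte that shares the
// string's memory. It writes through a *reflect.SliceHeader that points at a
// real slice variable, so the GC sees b's data field as a pointer.
func string2bytesViaHeader(s string) []byte {
	sh := (*reflect.StringHeader)(unsafe.Pointer(&s))

	var b []byte
	bh := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	bh.Data = sh.Data
	bh.Len = sh.Len
	bh.Cap = sh.Len // StringHeader has no Cap; use Len so cap(b) == len(b)
	return b
}

// sameData (hypothetical helper) reports whether s and b start at the same address.
func sameData(s string, b []byte) bool {
	sh := (*reflect.StringHeader)(unsafe.Pointer(&s))
	bh := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	return sh.Data == bh.Data
}

func main() {
	s := "qcrao/Go-Questions"
	b := string2bytesViaHeader(s)
	fmt.Println(string(b), len(b), cap(b), sameData(s, b))
	// Never write to b: it aliases read-only string memory.
}
```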

changkun commented Feb 11, 2020

  1. Yes, the official documentation already points out this problem: the Data field is not sufficient to guarantee the data it references will not be garbage collected, so programs must keep a separate, correctly typed pointer to the underlying data. -- https://golang.org/pkg/reflect/#SliceHeader
    So the original code is wrong.

  2. It doesn't need to be that complicated. You can convert directly with an unsafe cast, which is also more efficient:

func string2bytes(s string) []byte {
	return *(*[]byte)(unsafe.Pointer(&s))
}
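The direct cast causes trouble only in the string → []byte direction, where a two-word string header is read as a three-word slice header. Going the other way, the string header {data, len} matches the first two fields of the slice header {data, len, cap}, so reinterpreting &b as *string reads only memory that belongs to b. A minimal sketch; bytes2stringCast is a hypothetical name, and the result is only safe as long as b is never modified afterwards.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytes2stringCast (hypothetical name) reinterprets a slice header as a
// string header. Only the first two words of b's header (data, len) are
// read, so nothing outside b's header is touched.
// The caller must not modify b afterwards: strings are assumed immutable.
func bytes2stringCast(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

func main() {
	b := []byte("Roger")
	s := bytes2stringCast(b)
	fmt.Println(s, len(s))
}
```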

Appendix: performance comparison

// main.go
package main

import (
	"reflect"
	"unsafe"
)

func string2bytes1(s string) []byte {
	stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))

	var b []byte
	pbytes := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	pbytes.Data = stringHeader.Data
	pbytes.Len = stringHeader.Len
	pbytes.Cap = stringHeader.Len

	return b
}

func string2bytes2(s string) []byte {
	return *(*[]byte)(unsafe.Pointer(&s))
}
// main_test.go
package main

import (
	"fmt"
	"math/rand"
	"reflect"
	"testing"
	"time"
)

func TestString2Bytes(t *testing.T) {
	s := "qcrao/Go-Questions"
	if string(string2bytes2(s)) != s {
		t.Fatalf("string2bytes2 is not properly implemented")
	}
	if !reflect.DeepEqual(string2bytes1(s), string2bytes2(s)) {
		t.Fatalf("string2bytes implementations do not match")
	}
}

func init() {
	rand.Seed(time.Now().UnixNano())
}

var letterRunes = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

func genstring(n int) string {
	b := make([]rune, n)
	for i := range b {
		b[i] = letterRunes[rand.Intn(len(letterRunes))]
	}
	return string(b)
}

func BenchmarkString2Bytes(b *testing.B) {
	funcs := map[string]func(string) []byte{
		"string2bytes1": string2bytes1,
		"string2bytes2": string2bytes2,
	}

	for name, f := range funcs {
		for i := 1; i < 10000; i *= 10 {
			s := genstring(i)
			b.Run(fmt.Sprintf("%v-%v", name, i), func(b *testing.B) {
				for i := 0; i < b.N; i++ {
					f(s)
				}
			})
		}

	}
}
$ go test -v -run=none -bench=. -benchmem -count=10 . | tee bench.txt
$ benchstat bench.txt

name                                 time/op
String2Bytes/string2bytes1-1-12      3.07ns ± 1%
String2Bytes/string2bytes1-10-12     3.08ns ± 2%
String2Bytes/string2bytes1-100-12    3.08ns ± 1%
String2Bytes/string2bytes1-1000-12   3.08ns ± 0%
String2Bytes/string2bytes1-10000-12  3.07ns ± 1%
String2Bytes/string2bytes2-1-12      1.95ns ± 2%
String2Bytes/string2bytes2-10-12     1.95ns ± 2%
String2Bytes/string2bytes2-100-12    1.94ns ± 1%
String2Bytes/string2bytes2-1000-12   1.95ns ± 1%
String2Bytes/string2bytes2-10000-12  1.96ns ± 3%

name                                 alloc/op
String2Bytes/string2bytes1-1-12       0.00B     
String2Bytes/string2bytes1-10-12      0.00B     
String2Bytes/string2bytes1-100-12     0.00B     
String2Bytes/string2bytes1-1000-12    0.00B     
String2Bytes/string2bytes1-10000-12   0.00B     
String2Bytes/string2bytes2-1-12       0.00B     
String2Bytes/string2bytes2-10-12      0.00B     
String2Bytes/string2bytes2-100-12     0.00B     
String2Bytes/string2bytes2-1000-12    0.00B     
String2Bytes/string2bytes2-10000-12   0.00B

changkun added a commit to changkun/Go-Questions that referenced this issue Feb 11, 2020
qcrao closed this as completed in #15 Feb 11, 2020
@luojiego

@changkun Strictly speaking, the string2bytes2 conversion function is incorrect, because the conversion never assigns cap properly.

package main

import (
	"fmt"
	"reflect"
	"runtime"
	"unsafe"
)


func string2bytes1(s string) []byte {
	stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))

	var b []byte
	pBytes := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	pBytes.Data = stringHeader.Data
	pBytes.Len = stringHeader.Len
	pBytes.Cap = stringHeader.Len

	runtime.KeepAlive(s)
	return b
}

func string2bytes2(s string) []byte {
	return *(*[]byte)(unsafe.Pointer(&s))
}

func main() {
	s1 := string2bytes1("Roger")
	fmt.Println(s1)
	fmt.Println(len(s1))
	fmt.Println(cap(s1))
	s2 := string2bytes2("Roger")
	fmt.Println(s2)
	fmt.Println(len(s2))
	fmt.Println(cap(s2))
}

The cap printed for s2 will be an arbitrary (garbage) value:

[82 111 103 101 114]
5
5
[82 111 103 101 114]
5
4840475

@changkun

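@luojiego Sorry, but I see this as the implementer's design decision rather than a question of right or wrong. If we are talking "strictly speaking", you should not write this kind of implementation at all: either do an honest conversion with a copy, or use the standard library's bytes.Buffer.

Also, the runtime.KeepAlive(s) in string2bytes1 is unnecessary.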

@luojiego

OK, thank you very much!

douglarek commented Feb 16, 2021

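One problem with the forced cast is that the resulting byte slice gets a bogus, often very large cap, which is bad; see for example https://play.golang.org/p/_tqfAgxlZAv . So the crude forced cast is not advisable, because it cannot obtain a proper cap for the byte slice. An implementation with good performance is the one in fasthttp ( https://github.com/valyala/fasthttp/blob/c48d3735fa9864a7c1724168812f3571c8313581/bytesconv.go#L387 ).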
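Newer Go releases offer a direct way around the cap problem. Assuming Go 1.20 or later (unsafe.StringData, unsafe.String and unsafe.SliceData need 1.20; unsafe.Slice needs 1.17), unsafe.Slice(ptr, n) builds a slice whose len and cap are both n, so no header struct is touched at all. This is a sketch, not the fasthttp code linked above; stringToBytes and bytesToString are hypothetical names. The usual rule still applies: the bytes alias the string and must never be modified.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringToBytes (hypothetical name) returns a []byte sharing s's memory.
// unsafe.Slice sets both len and cap to len(s). Requires Go 1.20+.
// The result must never be written to.
func stringToBytes(s string) []byte {
	if s == "" {
		return nil // unsafe.StringData is unspecified for empty strings
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// bytesToString (hypothetical name) returns a string sharing b's memory.
// b must not be modified while the string is in use. Requires Go 1.20+.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := stringToBytes("Roger")
	fmt.Println(b, len(b), cap(b))
	fmt.Println(bytesToString(b))
}
```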

@techone577

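Why is the cap value so large? Judging from the assembly, the cap seems to be the address value of the string's Data, but this doesn't reproduce consistently.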

@luojiego

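src/reflect/value.go defines the underlying structs for string and []byte (StringHeader and SliceHeader). A []byte header has one more field than a string header, Cap, so converting a string directly to a slice with the unsafe package inevitably leaves Cap without a proper value: it is read from whatever memory happens to follow the string header.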

// StringHeader is the runtime representation of a string.
// It cannot be used safely or portably and its representation may
// change in a later release.
// Moreover, the Data field is not sufficient to guarantee the data
// it references will not be garbage collected, so programs must keep
// a separate, correctly typed pointer to the underlying data.
type StringHeader struct {
    Data uintptr
    Len  int
}

// SliceHeader is the runtime representation of a slice.
// It cannot be used safely or portably and its representation may
// change in a later release.
// Moreover, the Data field is not sufficient to guarantee the data
// it references will not be garbage collected, so programs must keep
// a separate, correctly typed pointer to the underlying data.
type SliceHeader struct {
    Data uintptr
    Len  int
    Cap  int
}
