New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
unsafe: add StringData, String, SliceData #53003
Comments
Thanks, but I don't see an actual proposal here. If you want to discuss ideas, please use golang-nuts. The proposal process should be for a proposal. That is, suggest some specific functions that we should introduce. Thanks. |
Sure, good point. I've edited my comment with an actual proposal of |
One of the main use cases of |
CC @mdempsky |
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
Filed #53079. Perhaps we should undeprecate them for Go 1.19, since we don't have a replacement for all valid use cases yet. Thanks. |
This proposal has been added to the active column of the proposals project |
FWIW, the proposed functions closely parallel the That package also provides best-effort mutation detection, since Go |
I've seen that package, and the one thing I'd not do is the mutation detection (which you gate behind the I was thinking to link the package in my original writeup, since it is further evidence of the use case. |
At the very least, I think it's important for the documentation for the proposed functions to explicitly call out the lifetime issues, especially for |
To fully replace For reading, we can already extract all the elements of a slice, via For writing, we can create a new slice setting all the elements, via So to me that suggests, in package unsafe, // StringData returns a pointer to the bytes of a string.
// The bytes must not be modified; doing so can cause
// the program to crash or behave unpredictably.
func StringData(string) *byte
// String constructs a string value from a pointer and a length.
// The bytes passed to String must not be modified;
// doing so can cause the program to crash or behave unpredictably.
func String(*byte, int) string These functions should remain meaningful even if we somehow change the representation of slices or strings in the future. The restrictions on changing the bytes are unfortunate, but the fact is that people are doing these kinds of transformations today. Omitting these functions from the unsafe package doesn't mean that Go programs won't do them, it just means that they will do in ways that are sometimes even less safe. It should be feasible to add a dynamic detector for any modifications to these bytes. This could perhaps be enabled when using the race detector. This would not be perfect but would detect egregious misuses. The functions suggested above would then be written as func StringToBytes(s string) []byte {
return unsafe.Slice(unsafe.StringData(s), len(s))
}
func BytesToString(b []byte) string {
return unsafe.String(&b[0], len(b))
} (To be clear, the reverse is also possible: we can write |
It does seem like unsafe.String and unsafe.StringData match unsafe.Slice a bit better and are more fundamental operations |
I'd like to suggest we reconsider the original proposal from #19367: to add new Concretely, I propose adding:
and allowing conversions between Converting an invalid N.B., my original #19367 proposal also allowed conversions between |
My concern about What can we do with |
After thinking about it more, I'm inclined to agree that |
It sounds like we've converged on considering unsafe.StringData, unsafe.String, and unsafe.SliceData. |
Based on the discussion above, this proposal seems like a likely accept. |
No change in consensus, so accepted. 🎉 |
Change https://go.dev/cl/428757 mentions this issue: |
Change https://go.dev/cl/429755 mentions this issue: |
As discussed in CL 401434 there are substantial misuses of these in the wild, and they are a potential source of unsafety even for code that does not use them directly. Since proposal #53003 has already been implemented, now is the right time to deprecate reflect.{SliceHeader, StringHeader}. For #53003. Change-Id: I724cf46d4b22d2ed3cbf2b948e6aac5ee4bf0f6e Reviewed-on: https://go-review.googlesource.com/c/go/+/428757 Run-TryBot: hopehook <hopehook@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Matthew Dempsky <mdempsky@google.com>
…ice, String} After we deprecated reflect.{SliceHeader, StringHeader}, it is recommended to use unsafe.{Slice, String} to replace its work. However, the compiler and linker cannot be migrated for the time being. As a temporary strategy, using the "internal/unsafeheader" package like other code is the most suitable choice at present. For #53003. Change-Id: I69d0ef72e2d95caabd0706bbb247a719d225c758 Reviewed-on: https://go-review.googlesource.com/c/go/+/429755 Auto-Submit: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: hopehook <hopehook@golangcn.org>
@mdempsky is this going to go in for 1.20? I've tried to follow the large number of associated PRs but I can't really tell what the current state is. |
@cespare As far as I know, they are in. What's missing is the docs. I just pinged that CL. If there is anything else missing, please let me know. Thanks. |
Updates #53003. Change-Id: I076d1eb4bd0580002ad8008f3ca213c5edc951ee Reviewed-on: https://go-review.googlesource.com/c/go/+/427095 Run-TryBot: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
Change https://go.dev/cl/449537 mentions this issue: |
…Data For #53003. Change-Id: If5d76c7b8dfcbcab919cad9c333c0225fc155859 Reviewed-on: https://go-review.googlesource.com/c/go/+/449537 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Robert Griesemer <gri@google.com> Auto-Submit: Robert Griesemer <gri@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@google.com>
Both the spec and the unsafe package documentation are now done. Closing. |
|
I still miss that import "unsafe"
func String2ByteSlice(str string) []byte {
if str == "" {
return nil
}
return unsafe.Slice(unsafe.StringData(str), len(str))
}
func ByteSlice2String(bs []byte) string {
if len(bs) == 0 {
return ""
}
return unsafe.String(unsafe.SliceData(bs), len(bs))
} Those edge cases with empty strings/slices make current API very verbose as you have to check for those edge cases. Could those two functions be added to standard library? On a related note, should examples like |
Hi @mitar, FWIW, I don't know that the standard library would want to make that more convenient. Note that Go 1.22 will have some nice enhancements in this area, including recognizing in many cases when a []byte is not modified after a string->[]byte conversion to enable a safe zero-copy conversion (issue #2205). Here is an example (with comparison of 1.21 vs. tip via godbolt): func f() {
s := "hello"
x := []byte(s) // zero-copy string->[]byte conversion
_ = x
} |
@mitar I don't think these checks are needed.
So it's perfectly safe to feed the returned pointer directly to |
@database64128 package main
import "unsafe"
func main() {
{
var s = ""
println(s == "") // true
var x = unsafe.Slice(unsafe.StringData(s), len(s))
println(x == nil) // true
}
{
var s = "go"[:0]
println(s == "") // true
var x = unsafe.Slice(unsafe.StringData(s), len(s))
println(x == nil) // false
}
} It might satisfy the needs of some cases, but not all. The situations of converting from zero-length slices to strings are comparatively simpler. package main
import "unsafe"
func ByteSlice2String(bs []byte) string {
if len(bs) == 0 {
return ""
}
return unsafe.String(unsafe.SliceData(bs), len(bs))
}
func main() {
var x = make([]byte, 0, 1024)
var s = unsafe.String(unsafe.SliceData(x), len(x))
println(s == "") // true
println(*(*uintptr)(unsafe.Pointer(&s))) // <a non-zero addr>
var s2 = ByteSlice2String(x)
println(s2 == "") // true
println(*(*uintptr)(unsafe.Pointer(&s2))) // 0
} |
Now that
reflect.StringHeader
andreflect.SliceHeader
are officially deprecated, I think it's time to revisit adding function that satisfy the reason people have used these types. AFAICT, the reason for deprecation is thatreflect.SliceHeader
andreflect.StringHeader
are commonly misused. As well, the types have always been documented as unstable and not to be relied upon.We can see in Github code search that usage of these types is ubiquitous. The most common use cases I've seen are:
[]byte
tostring
string
to[]byte
Data
pointer field for ffi or some other niche useThe first use case can also commonly be seen as
*(*string)(unsafe.Pointer(&mySlice))
, which is never actually officially documented anywhere as something that can be relied upon. Under the hood, the shape of a string is less than a slice, so this seems valid per unsafe rule (1), but this is all relying on undocumented behavior. The second use case is commonly seen as*(*[]byte)(unsafe.Pointer(&string))
, which is by-default broken because the Cap field can be past the end of a page boundary (example here, in widely used code) -- this violates unsafe rule (1).Regardless of the thought that people should never rely upon these types, people do, and they do so all over. People also rely on invalid conversions because Go has never made this easy. Part of the discussion on #19367 was about all the ways that people misuse these types today. These conversions are small tricks that can alleviate memory pressure and improve latencies and CPU usage in real programs. The use cases are real, and Go provides just enough unsafe and buggy ways of working around these problems such that now there is a large ecosystem of technically invalid code that just so happens to work.
Rather than consistently saying "don't use this",
go vet
ing somewhat, and then ducking all responsibility for buggy programs, Go should provide actual safe(ish) APIs that people can rely on in perpetuity. New functions can live inunsafe
and have well documented rules around their use cases, and then Go can finally document what to do when people want this common escape hatch.Concrete proposal
The following APIs in the unsafe package:
eliminating, because realistically a person can just dofunc DataPointer[T ~string|~[]E, E any](t T) unsafe.Pointer
&slice[0]
(although a corresponding analogue does not exist for strings)I think
unsafe.Slice
covers the use case for converting between slices of different types, although I'm not 100% sure what the use case ofunsafe.Slice
is.The text was updated successfully, but these errors were encountered: