New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: crash in hybrid barrier initializing C memory #19928

Open
rsc opened this Issue Apr 11, 2017 · 29 comments

Comments

Projects
None yet
10 participants
@rsc
Contributor

rsc commented Apr 11, 2017

Received report via email of code crashing in write barrier finding a bad pointer (pointing to a free span). Code looks like:

keys := (*[1 << 20]*C.char)(unsafe.Pointer(C.malloc(C.size_t(uintptr(n)) * C.size_t(unsafe.Sizeof((*C.char)(nil))))))[:0:n]
values := (*[1 << 20]*C.char)(unsafe.Pointer(C.malloc(C.size_t(uintptr(n)) * C.size_t(unsafe.Sizeof((*C.char)(nil))))))[:0:n]
for key, vals := range req.Header {
	for _, value := range vals {
		keys = append(keys, C.CString(key))
		values = append(values, C.CString(value))
	}
}

The problem appears to be that C.malloc returns uninitialized memory. Then the initialization of that memory during append overwrites uninitialized pointer slots, and the new hybrid write barrier tries to follow the uninitialized pointers as if they were real pointers.

Prior to the hybrid barrier, the old data being overwritten for off-heap memory was never considered. Now it is. This will invalidate code like the above. It took a long time to notice, so apparently this is rare.

One workaround is to make C.malloc return zeroed memory, which would have a performance cost people might not like (especially since it doesn't seem to matter to most people?). Another workaround is to tell people to use C.calloc for allocating C data structures with pointers, or maybe rewriting just the C.malloc calls that mention pointer sizes.

Really the best fix would be for the GC not to follow these pointers. @aclements says there are some cases where do want the write barrier to follow pointers in non-managed-memory, specifically non-global off-heap runtime structures, but maybe we can distinguish these from other kinds of memory (like C memory) somehow.

If there is a simple fix and we receiver any reports of external problems along these lines, we can consider including the fix in Go 1.8.2.

@rsc rsc added this to the Go1.8.2 milestone Apr 11, 2017

@rsc

This comment has been minimized.

Contributor

rsc commented Apr 11, 2017

Also, @ianlancetaylor, do you think this code should work as far as the Go pointer rules are concerned? There are no Go-managed data structures anywhere in sight, so it seems to me like they should.

@rsc

This comment has been minimized.

Contributor

rsc commented Apr 11, 2017

/cc @RLH

@aclements

This comment has been minimized.

Member

aclements commented Apr 11, 2017

Really the best fix would be for the GC not to follow these pointers.

One downside of this is that it will increase the cost of the write barrier, since we'll have to add a check for if the slot is in the heap or a global.

@aclements says there are some cases where do want the write barrier to follow pointers in non-managed-memory, specifically non-global off-heap runtime structures, but maybe we can distinguish these from other kinds of memory (like C memory) somehow.

I'm not sure off the top of my head how many of these there are. It's possible we could manually annotate the writes to have write barriers (with a write barrier that bypasses the slot check), though that seems error-prone, especially going forward. Or we could compile all write barriers in the runtime to the slot-check-bypassing write barrier, using the knowledge that the runtime never writes a pointer to an uninitialized structure.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Apr 11, 2017

I agree that according to the current Go pointer rules this code is OK. Programs aren't permitted to store a Go pointer in C memory, and indeed that is not happening. Apparently the memory happens to contain a bit pattern that looks like a Go pointer, but it is not actually a Go pointer.

Changing C.malloc to return zeroed memory would fix this case but it wouldn't fix all cases of the general problem. It's currently permitted to call C code that allocates an array on the stack, fails to initialize it, and then calls a Go function with a pointer that the Go function uses to fill in elements. The Go function is not permitted to store Go pointers into the array, but it is permitted to store C pointers. That would invoke the write barrier on uninitialized memory from the C stack, which could potentially contain bit patterns that look like Go pointers.

We could extend the Go pointer rules to say that C memory that has a pointer type must be explicitly initialized if it is visible from Go. To make that simpler we could modify C.malloc to zero out memory, but that would only be necessary, and need only be done, when invoked with a pointer type. I don't see how to check that rule with cgocheck=1 but we could check it with cgocheck=2 (which does cgo checking at every write barrier).

@aclements

This comment has been minimized.

Member

aclements commented Apr 11, 2017

To summarize in-person discussion: we're all concerned about increasing the write barrier cost for something that appears to almost never be a problem in practice. So, we're leaning toward tweaking the rules to say something like: if Go writes to pointers in the memory, it has to be initialized (this is already the case for pointer reads; this would extend it to pointer writes) and possibly extending the cgo interface to make the rules easy to stay in (perhaps providing C.new that takes a type and zeroes memory). But we'll wait to see if this comes up again before diving into a rule change.

@bcmills

This comment has been minimized.

Member

bcmills commented Apr 16, 2017

We could probably get at least some mileage out of whether the pointer has a C type or a Go type.

It seems reasonable to require that all memory accessed through Go types must be initialized (since Go values are normally zero-inititalized), but much less so for C types (since C out-parameters are often left uninitialized in idiomatic C code). With #16623 it might be more plausible for the compiler to be aware of whether the type of a given slot is C or Go.

We could vet the C memory for Go pointers at the time of conversion to Go types (e.g. in a C.AsSlice builtin per #13656 (comment)) by pre-scanning all of the pointer slots to ensure that they are valid.

Then we could apply the more expensive write barrier only to pointers of C types, for which assignments are relatively rare (and often somewhat expensive anyway because they tend to correlate with transitions between Go and C stacks).

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Apr 16, 2017

@bcmills Perhaps, but I'm somewhat reluctant to further complicate the cgo pointer rules by adding rules that depend on the type used to access the memory, especially since when using cgo that type often varies between C code and Go code. For example, C code will often use void* where Go code will use unsafe.Pointer, so what is the correct handling when C code allocates an array of void* and calls a Go function that receives a *unsafe.Pointer and converts it to a []unsafe.Pointer? I'm not saying we can't write a rule but I'm concerned that we can't write a rule that people can actually understand and use.

@bcmills

This comment has been minimized.

Member

bcmills commented Apr 16, 2017

I think we would have to declare that unsafe.Pointer (by itself or with any number of pointer indirections) is a potentially non-Go type (that is, needs the more expensive write barrier). But it might be prudent to use a more robust write barrier for unsafe.Pointer anyway, since its use in application code increases the risk of subtle pointer errors.

So the rules would be, roughly:

  • Each pointer to a Go type must contain nil, a pointer to a valid object in the Go heap, or a pointer to a valid, fully-initialized object outside the Go heap.
  • A "Go type" for this purpose is defined as:
    • A type of the form C.t in a cgo source file is not a Go type.
    • unsafe.Pointer is not a Go type.
    • A pointer is a Go type only if its base type is a Go type.
    • An array is a Go type only if its element type is a Go type.
    • All other types are Go types. Notably:
      • A struct type declared in Go code is always a Go type.
      • A slice type is always a Go type.

Note that the code in Russ's example would be invalid under this rule: *[1 << 20]*C.char is not a Go type, but []*C.char is. (We would presumably detect it as invalid while evaluating the slice expression if GODEBUG=cgocheck is set to a sufficiently high threshold.)

I will leave it up to you to decide whether that rule is too complicated.

@rsc

This comment has been minimized.

Contributor

rsc commented Apr 17, 2017

The current type system does not distinguish between C memory and Go memory and cannot be reinterpreted to do so.

@bcmills

This comment has been minimized.

Member

bcmills commented Apr 18, 2017

@rsc I don't think anyone is proposing that the type system should distinguish between C memory and Go memory. The rule I described above constrains initialization, not object location.

It's quite possible that there is a better choice of phrases than "Go type", but it wasn't obvious to me. The general idea is that a value of a "Go type" is always initialized by a Go statement or expression, whereas a value of a "non-Go type" may or may not be.

The current type system does distinguish between pointers and non-pointers, particularly with the use of unsafe.Pointer and across cgo boundaries. In the case of C out-parameters we have a third kind of data ("non-pointer stored in a pointer type"), and the proposed rule is an attempt to fit that into the existing type system (by allowing non-pointers to be stored in pointer slots of "non-Go" types).

@bradfitz bradfitz modified the milestones: Go1.8.2, Go1.8.3 May 18, 2017

@broady

This comment has been minimized.

Member

broady commented May 23, 2017

Bumping to Go 1.8.4 (or maybe Go 1.9)

@rsc

This comment has been minimized.

Contributor

rsc commented Oct 13, 2017

Are we just not going to fix this for Go 1.8, @aclements?

@aclements

This comment has been minimized.

Member

aclements commented Oct 13, 2017

Since we haven't received any external reports of this being a problem and there are ways to work around it, I would lean toward not fixing this for 1.8.

It's also still unclear to me what the fix would be. We decided in April that we didn't want to make the write barrier more expensive (and complicate the runtime) to detect whether the slot is outside Go-managed memory, especially without evidence that this is actually a problem. So that leaves changing the rules (and maybe making C.malloc zero memory and/or introducing C.new).

Either way, doesn't seem like a 1.8 issue, so I'm re-milestoning it to 1.10.

@aclements aclements modified the milestones: Go1.8.5, Go1.10 Oct 13, 2017

@bcmills

This comment has been minimized.

Member

bcmills commented Oct 16, 2017

So that leaves changing the rules (and maybe making C.malloc zero memory and/or introducing C.new).

I'm more concerned about the implications for passed-in buffers than for the Go C.malloc wrapper itself: it's easy enough for Go callers to use calloc explicitly if they know they need to, but if we're exporting a Go API to C, it won't necessarily be obvious what kind of transitive zero-initialization requirements that would impose on C callers.

@odeke-em

This comment has been minimized.

Member

odeke-em commented Jul 7, 2018

Hello @aclements, how's it going? Neither have we received any reports, in between the Go1.9 to Go1.11 cycles. Perhaps do you think we should bump this for Go1.12 or later?

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jul 11, 2018

Another likely example of this problem in #26288.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jul 11, 2018

#26288 is in fact exactly the kind of case @bcmills discussed in #19928 (comment) .

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jul 11, 2018

Perhaps we could partially alleviate, if not actually fix, this problem by adding an additional check step before we complain about a bad pointer. We can check where we found the bad pointer: if we found it in memory that is not on the Go heap, then we just discard it.

@RLH

This comment has been minimized.

Contributor

RLH commented Jul 11, 2018

The write barrier currently discards the slot before this check is done. We would have to add the slot to the write barrier buffer or perhaps do the check earlier. Either way comes with some per-write barrier cost.

@chfast

This comment has been minimized.

chfast commented Jul 12, 2018

Please at least add a not about this issue to the cgo documentation.

@aclements

This comment has been minimized.

Member

aclements commented Jul 12, 2018

The write barrier currently discards the slot before this check is done. We would have to add the slot to the write barrier buffer or perhaps do the check earlier. Either way comes with some per-write barrier cost.

Thinking about this a bit more, here's a possibility: rather than recording the "old" and "new" in the write barrier buffer, record "old" and "slot". This may actually make the write barrier fast path slightly faster since it un-constrains one register around all pointer writes. Then, during flushing, we read the "new" value from the recorded slot. This is likely to be slightly slower since it's less likely to be in cache at this point (though only slightly). We have to resolve this address to an object anyway, so if this resolution fails, we further scrutinize the slot address. If the slot address is outside Go memory, we also ignore the corresponding "old" pointer. This leans on the rule that you're not allowed to write a Go pointer into C memory.

One remaining loose end would be nil pointers. Is Go code allowed to write a nil pointer into C memory? If so, do we need to scrutinize the slot address when we see a nil write?

@bcmills

This comment has been minimized.

Member

bcmills commented Jul 12, 2018

Is Go code allowed to write a nil pointer into C memory?

Yes.

If so, do we need to scrutinize the slot address when we see a nil write?

Not sure.

@aclements

This comment has been minimized.

Member

aclements commented Jul 12, 2018

To be a little more concrete, here's pseudo-code for what I'm proposing:

have old, slot from write barrier buffer
new := *slot
newObj := findObject(new)
if newObj == nil {
    if findObject(slot) == nil && slot is not in Go data segment {
        Non-Go pointer stored into non-Go memory. Ignore new and old.
        continue
    }
}
shade(newObj)
shade(findObject(old))

Notably, this never inspects old or tries to guess whether it looks like a garbage value. Pointer writes that could lead to a garbage old are discarded on the basis of writing a non-Go pointer (the new value) into non-Go memory (the slot address). We don't even try to shade old.

Unfortunately, while this avoids the slow-path slot check for most non-nil non-C pointer writes, it still triggers that path for writes of nil to Go memory, which are obviously very common, as well as writes of pointers to Go globals into Go heap memory.

@gopherbot

This comment has been minimized.

gopherbot commented Jul 12, 2018

Change https://golang.org/cl/123658 mentions this issue: cmd/cgo: add note about bug writing C pointers to uninitialized C memory

@chfast

This comment has been minimized.

chfast commented Jul 12, 2018

Please note that a C function struct S f() is transformed by the compiler to struct S* f(struct S*) and actually called as

struct S _;
f(&_);

Users don't have control over it, so this means that returning C structs from "exported" Go functions can also cause troubles.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jul 12, 2018

@chfast I'm likely missing something but I don't see how that can cause a problem. The transformation that you describe is done implicitly by the C compiler. The problems described here only arise in Go code. The Go code is going to return a struct in the usual way for Go code, with the usual zero initialization of the result as neeed. The fact that the C code will then copy that into uninitialized C memory doesn't matter.

@chfast

This comment has been minimized.

chfast commented Jul 12, 2018

Sorry, I incorrectly assumed that "exported" Go functions have C ABI.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jul 12, 2018

@chfast Behind the scenes, C code that calls an exported Go function actually calls a C wrapper, generated by cgo, that calls the Go function using the Go ABI and translates the results into the C ABI.

gopherbot pushed a commit that referenced this issue Jul 16, 2018

cmd/cgo: add note about bug writing C pointers to uninitialized C memory
Describe the problem as a bug, since it is not implied by the rest of
the pointer passing rules, and it may be possible to fix it.

Updates #19928

Change-Id: I2d336e7336b2a215c0b8cf909a203201ef1b054e
Reviewed-on: https://go-review.googlesource.com/123658
Reviewed-by: Austin Clements <austin@google.com>
@rsc

This comment has been minimized.

Contributor

rsc commented Aug 17, 2018

Nothing more here for Go 1.11.

@rsc rsc modified the milestones: Go1.11, Go1.12 Aug 17, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment