Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/cgo: jmethodID/jfieldID is not mapped to uintptr if building with the Android NDK #40954

Closed
steeve opened this issue Aug 21, 2020 · 35 comments · May be fixed by #40955
Closed

cmd/cgo: jmethodID/jfieldID is not mapped to uintptr if building with the Android NDK #40954

steeve opened this issue Aug 21, 2020 · 35 comments · May be fixed by #40955
Milestone

Comments

@steeve
Copy link
Contributor

@steeve steeve commented Aug 21, 2020

This is a variant of #26213

What version of Go are you using (go version)?

$ go version
go version go1.13.8 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/steeve/Library/Caches/go-build"
GOENV="/Users/steeve/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/steeve/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.14.5/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.14.5/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/steeve/code/github.com/znly/zenly-client-core/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/td/p8xt8d953g37w9f841byty940000gn/T/go-build199341614=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

package blah

/*
#include <jni.h>
*/
import "C"

var blah C.jmethodID = 0
var bleh C.jfieldID = 0

with the Android NDK:

$ CC=/path/to/ndk-standalone-x86_64/bin/clang CGO_ENABLED=1 GOOS=android go build program.go

What did you expect to see?

No error.

What did you see instead?

# command-line-arguments
./program.go:8: cannot use 0 (type int) as type _Ctype_jmethodID in assignment

Notes

This issue is the same as #26213, only with jmethodID and jfieldID. It seems the bug was always there, but for some reason it now reliably crashes on Android 11 with the following:

08-21 16:33:21.842 15438     0 E Go      : runtime: bad pointer in frame github.com/mypkg.func1 at 0x40000bfc70: 0x71
08-21 16:33:21.842 15438     0 E Go      : fatal error: invalid pointer found on stack
08-21 16:33:21.898 15438     0 E Go      : 
08-21 16:33:21.898 15438     0 E Go      : runtime stack:
08-21 16:33:21.899 15438     0 E Go      : runtime.throw(0x7b60f3d00a, 0x1e)
08-21 16:33:21.899 15438     0 E Go      :  GOROOT/src/runtime/panic.go:847 +0x4c fp=0x7b4c4ab3e0 sp=0x7b4c4ab3b0 pc=0x7b612d594c
08-21 16:33:21.899 15438     0 E Go      : runtime.adjustpointers(0x40000bfc68, 0x7b4c4ab4c8, 0x7b4c4ab8b8, 0x7b62c0acd0, 0x7b630d0780)
08-21 16:33:21.899 15438     0 E Go      :  GOROOT/src/runtime/stack.go:591 +0x1dc fp=0x7b4c4ab430 sp=0x7b4c4ab3e0 pc=0x7b612ecc7c
08-21 16:33:21.899 15438     0 E Go      : runtime.adjustframe(0x7b4c4ab7c0, 0x7b4c4ab8b8, 0x7b630d0780)
08-21 16:33:21.899 15438     0 E Go      :  GOROOT/src/runtime/stack.go:633 +0x260 fp=0x7b4c4ab4f0 sp=0x7b4c4ab430 pc=0x7b612ecef0
08-21 16:33:21.899 15438     0 E Go      : runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0x40000b3500, 0x0, 0x0, 0x7fffffff, 0x7b625ed288, 0x7b4c4ab8b8, 0x0, ...)
08-21 16:33:21.899 15438     0 E Go      :  GOROOT/src/runtime/traceback.go:334 +0xd4c fp=0x7b4c4ab820 sp=0x7b4c4ab4f0 pc=0x7b612f705c
08-21 16:33:21.899 15438     0 E Go      : runtime.copystack(0x40000b3500, 0x1000, 0x7ca8b3bc01)
08-21 16:33:21.899 15438     0 E Go      :  GOROOT/src/runtime/stack.go:886 +0x1b4 fp=0x7b4c4ab9e0 sp=0x7b4c4ab820 pc=0x7b612ed484
08-21 16:33:21.899 15438     0 E Go      : runtime.newstack()
08-21 16:33:21.899 15438     0 E Go      :  GOROOT/src/runtime/stack.go:1055 +0x24c fp=0x7b4c4abb80 sp=0x7b4c4ab9e0 pc=0x7b612ed7bc
08-21 16:33:21.899 15438     0 E Go      : runtime.morestack()
08-21 16:33:21.899 15438     0 E Go      :  src/runtime/asm_arm64.s:310 +0x70 fp=0x7b4c4abb80 sp=0x7b4c4abb80 pc=0x7b613003b0

We are running a modified 1.13.8 and adding jmethodID and jfieldID to the list of badJNI types fixes the issue.

They are defined like so in Android NDK:

struct _jfieldID;                       /* opaque structure */
typedef struct _jfieldID* jfieldID;     /* field IDs */

struct _jmethodID;                      /* opaque structure */
typedef struct _jmethodID* jmethodID;   /* method IDs */

I'll open a PR with the patch.

steeve added a commit to znly/go that referenced this issue Aug 21, 2020
In Android's NDK, jmethodID and jfieldID are defined as:

	struct _jfieldID;                       /* opaque structure */
	typedef struct _jfieldID* jfieldID;     /* field IDs */
	struct _jmethodID;                      /* opaque structure */
	typedef struct _jmethodID* jmethodID;   /* method IDs */

Since Go seems them as pointers, a crash can occur at runtime if the GC
tries to dereference those pointers. This is very reproducible in
Android 11 for some reason.

Add those two types to the badJNI types and rework the check logic a
bit.

Fixes golang#40954

Signed-off-by: Steeve Morin <steeve@zen.ly>
@gopherbot
Copy link

@gopherbot gopherbot commented Aug 21, 2020

Change https://golang.org/cl/249744 mentions this issue: cmd/cgo: add JNI's jmethodID/jfieldID to uintptr for Android NDK

steeve added a commit to znly/go that referenced this issue Aug 21, 2020
In Android's NDK, jmethodID and jfieldID are defined as:

	struct _jfieldID;                       /* opaque structure */
	typedef struct _jfieldID* jfieldID;     /* field IDs */
	struct _jmethodID;                      /* opaque structure */
	typedef struct _jmethodID* jmethodID;   /* method IDs */

Since Go seems them as pointers, a crash can occur at runtime if the GC
tries to dereference those pointers. This is very reproducible in
Android 11 for some reason.

Add those two types to the badJNI types and rework the check logic a
bit.

Fixes golang#40954

Signed-off-by: Steeve Morin <steeve@zen.ly>
@eliasnaur
Copy link
Contributor

@eliasnaur eliasnaur commented Aug 21, 2020

Nice find. Consider backporting this to the Go 1.15 (and Go 1.14?) given that it's easily reproduced on Android 11, which is due soon.

This issue is the same as #26213, only with jmethodID and jfieldID. It seems the bug was always there, but for some reason it now reliably crashes on Android 11 with the following:

Perhaps jmethodID/jfieldID values became IDs, not pointers, in Android 11?

@ianlancetaylor we keep having to whack these moles, each time leading to an API change. In particular, comparing with nil must change to comparing with 0. I'm sure there are many other C APIs out there that misuse pointers like this. Can we handle them better in Cgo?

@steeve
Copy link
Contributor Author

@steeve steeve commented Aug 21, 2020

I am indeed in strong favor of a backport in Go 1.15 and Go 1.14 since this is a crasher and Android 11 will be out soon (early Sept if we look at the previous versions)

For what it's worth, I just cherry-picked the commit as-is in our Go 1.13 branch.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Aug 21, 2020

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Aug 21, 2020

In Go it's absolutely essential that we be able to distinguish a pointer and an integer. I don't see a way around that.

In cgo we do want C names that are typedefs for pointer types to be usable in Go, and it seems natural to use them as pointers. I don't see a way around that.

I don't fully understand the problem but I don't see any available options. Go requires completely strict typing for pointers, and C does not.

@randall77
Copy link
Contributor

@randall77 randall77 commented Aug 21, 2020

Yes, it looks like we have to add those two types to the list that need to use uintptr representation.

Does anyone have a pointer to code in Android 11 that makes these pointer-not-pointers? I'm curious what changed and why.

Longer term:

It would be nice if we could distinguish types that are intended to be opaque from ones that are not. The JNI headers do things like:

struct S; // opaque
typedef struct S* T; 

If we could detect pointers like this, ones where there's no possible way you could dereference them, or allocate one in Go (what does cgo do with S? Can you do new(C.S)?), we could autotranslate them to uintptr instead of *C.S or unsafe.Pointer or whatever. There would be a one-time pain of lots of nil->0 conversions on the Go side when we did that, but then maybe not many more going forward.

The other option is to make a special cgo type that's known to the compiler. Syntactically it looks like a pointer (initialized with nil, etc.) but doesn't get added to any pointer maps. I'm not sure how that would work, but maybe possible. It has the same problem as the previous idea though, in that once we have one of these special types we have to make sure they can't be allocated in the Go heap.

What about go:notinheap? We have that comment already, which currently just ensures that we never allocate one in the heap (new(T) fails at compile time if T is marked that way). What if, in addition, it means that pointers to go:notinheap types don't get recorded as pointers in pointer bitmaps? Maybe then we can annotate the appropriate cgo types with go:notinheap.

@gopherbot
Copy link

@gopherbot gopherbot commented Aug 22, 2020

Change https://golang.org/cl/249917 mentions this issue: cmd/compile: don't allow go:notinheap on the heap or stack

@mdempsky
Copy link
Member

@mdempsky mdempsky commented Aug 23, 2020

I filed #40507 a little bit ago to look at incomplete struct/union types. I think //go:notinheap is a decent approximation of this (and useful as an optimization, since it allows removing write barriers and GC bits), but it won't prevent users from declaring incomplete C types as global variables.

@cagedmantis cagedmantis added this to the Go1.16 milestone Aug 24, 2020
@gopherbot gopherbot closed this in d9a6bdf Aug 25, 2020
@randall77 randall77 reopened this Aug 25, 2020
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Aug 25, 2020

@randall77 Nice idea, by the way.

@gopherbot
Copy link

@gopherbot gopherbot commented Aug 26, 2020

Change https://golang.org/cl/250939 mentions this issue: cmd/compile: allow aliases to go:notinheap types

@gopherbot
Copy link

@gopherbot gopherbot commented Aug 26, 2020

Change https://golang.org/cl/250940 mentions this issue: cmd/cgo: use go:notinheap for anonymous structs

@randall77
Copy link
Contributor

@randall77 randall77 commented Aug 26, 2020

I think my series of CLs solves this issue, and maybe future such issues going forward as well.
The only thing I worry about is that these CLs are a bit subtle, so backporting them is a nontrivial risk. Do we need to backport? Is there another possible workaround? Can Android 11 wait for 1.16 to come out?

@steeve
Copy link
Contributor Author

@steeve steeve commented Aug 27, 2020

I would advise against waiting for 1.16, as Android 11 is due in a few weeks and existing programs may be crashing (at least ours does).

Also, CL 249744 could be easily backported in 1.15 and 1.14 while the go:notinheap fixes it for good in 1.16. WDYT ?

@randall77
Copy link
Contributor

@randall77 randall77 commented Aug 27, 2020

The problem with CL 249744 is that the initializers of jfieldID and jmethodID need to be rewritten from nil to 0. Then for 1.16 they'd need to be rewritten back to nil. That's the kind of code churn we really don't want to put users through.

@eliasnaur
Copy link
Contributor

@eliasnaur eliasnaur commented Aug 27, 2020

We could do both in Go 1.16 (convert jfieldID and jmethodID to uintptr, and your notinheap fix). That's once churn (but still in a point release).

However, I'd love for all the special cases (JNI types, CFTypes, EGL* types) to go away now that the longer term fix is in. How about doing CL 249744 for Go 1.15.1 and then remove all special cases at once in Go 1.16? That's more churn, but at least we're getting rid of a Cgo wart. If that's too drastic, only remove the special cases when "go 1.16" or later is in go.mod.

Note that code may not need to be rewritten, if you're careful. For example, I'm using var nilEGLDisplay C.EGLDisplay to compare with the zero value, not 0 or nil, to be compatible across the release that changed the types for EGLDisplay. See https://git.sr.ht/~eliasnaur/gio/tree/master/app/internal/egl/egl.go#L38.

@mdempsky
Copy link
Member

@mdempsky mdempsky commented Aug 27, 2020

The problem with CL 249744 is that the initializers of jfieldID and jmethodID need to be rewritten from nil to 0. Then for 1.16 they'd need to be rewritten back to nil. That's the kind of code churn we really don't want to put users through.

We could make 0 (the untyped constant) assignable to pointer types (like in C) or make nil assignable to uintptr, as needed. This would be most robust in the compiler, but we could probably recognize some common cases even just within cgo.

BTW, what should checkptr semantics be for notinheap types? (Didn't have to worry about this before when it was runtime only.) I think converting *nih to unsafe.Pointer needs to be an error if it points into the heap. What about the reverse?

@randall77
Copy link
Contributor

@randall77 randall77 commented Aug 27, 2020

BTW, what should checkptr semantics be for notinheap types? (Didn't have to worry about this before when it was runtime only.) I think converting *nih to unsafe.Pointer needs to be an error if it points into the heap. What about the reverse?

Converting *nih to unsafe.Pointer, sounds bad regardless of the target. Just because it doesn't point into the heap currently, doesn't mean it won't in the future (if the heap grows). But we probably want the exception that uintptr(unsafe.Pointer(x)) where x is *nih should be ok. (Not sure if anyone would use that, but maybe.)

Going from unsafe.Pointer to *nih should be ok, if the unsafe.Pointer wasn't pointing to the heap (e.g. it pointed to a global).

@mdempsky
Copy link
Member

@mdempsky mdempsky commented Aug 27, 2020

Just because it doesn't point into the heap currently, doesn't mean it won't in the future (if the heap grows).

Yeah, though that's a concern for uintptr to unsafe.Pointer conversions too, which checkptr doesn't disallow currently. I think disallowing values that point into the currently reserved heap arena would be reasonable; proactively guarding in case of heap growth seems like it's going to have false positives though.

I think if we want to detect pointers becoming heap pointers due to heap growth, it would be more robust to detect that during heap growth itself. E.g., allocate the new heap memory, but flag it as not quite ready yet; trigger a full GC run; and during heap marking, crash if we find any pointers that point into this newly allocated memory.

@gopherbot
Copy link

@gopherbot gopherbot commented Aug 27, 2020

Change https://golang.org/cl/251158 mentions this issue: cmd/compile: make go:notinheap error message friendlier for cgo

@randall77
Copy link
Contributor

@randall77 randall77 commented Aug 27, 2020

Yeah, though that's a concern for uintptr to unsafe.Pointer conversions too, which checkptr doesn't disallow currently. I think disallowing values that point into the currently reserved heap arena would be reasonable; proactively guarding in case of heap growth seems like it's going to have false positives though.

Yes, I guess that makes sense. The false negative I was worried about seems very rare.

@eliasnaur
Copy link
Contributor

@eliasnaur eliasnaur commented Sep 6, 2020

Gentle ping. What to do for Go 1.15, and can we get rid of the special cases in Go 1.16 with go:notinheap?

@eliasnaur
Copy link
Contributor

@eliasnaur eliasnaur commented Sep 8, 2020

@steeve
Copy link
Contributor Author

@steeve steeve commented Sep 10, 2020

Gentle ping too. I'm in favour of @eliasnaur's plan: merge this one for 1.15 and go:notinheap for 1.16. Go 1.15.2 just got released and it would have been pretty easy to have this one in it, even 1.15.1.
It's also pretty easy to back port on 1.14.

After all it's fixing a very real crash, and Android 11 is out now.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 10, 2020

I think the suggested plan is:

  1. change cgo to treat jmethodID and jfieldID like jobject (https://golang.org/cl/249744)
  2. backport that change to the 1.15 release branch and possibly the 1.14 release branch
  3. change cgo (in 1.16 only) to mark empty structs as notinheap (https://golang.org/cl/250940)

Does that sound right to everyone?

@randall77 Any objections to this plan?

@randall77
Copy link
Contributor

@randall77 randall77 commented Sep 10, 2020

I was kind of hoping that we could backport #3 (consisting of CLs 250939, 250940, and maybe 251158). That way no user would have to change their code (nil -> 0 initializer is the main pain point).
If we're uncomfortable backporting those CLs, then your suggested plan is fine. It would mean jmethodID and jfieldID would need to be handled as "legacy" bad pointers instead of "improved" bad pointers.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 11, 2020

I'm OK with backporting those three CLs to the 1.15 release branch.

@randall77
Copy link
Contributor

@randall77 randall77 commented Sep 11, 2020

Ok, that would be better. I worry that just doing #1 on the 1.15 release branch means we would make code that did var x C.jmethodID = nil not compile any more. That seems ok for a major release but not a point release.

gopherbot pushed a commit that referenced this issue Sep 16, 2020
The alias doesn't need to be marked go:notinheap. It gets its
notinheap-ness from the target type.

Without this change, the type alias test in the notinheap.go file
generates these two errors:

notinheap.go:62: misplaced compiler directive
notinheap.go:63: type nih must be go:notinheap

The first is a result of go:notinheap pragmas not applying
to type alias declarations.
The second is the result of then trying to match the notinheap-ness
of the alias and the target type.

Add a few more go:notinheap tests while we are here.

Update #40954

Change-Id: I067ec47698df6e9e593e080d67796fd05a1d480f
Reviewed-on: https://go-review.googlesource.com/c/go/+/250939
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Trust: Keith Randall <khr@golang.org>
@gopherbot gopherbot closed this in 42b023d Sep 16, 2020
gopherbot pushed a commit that referenced this issue Sep 16, 2020
Update #40954

Change-Id: Ifaab7349631ccb12fc892882bbdf7f0ebf3d845f
Reviewed-on: https://go-review.googlesource.com/c/go/+/251158
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Keith Randall <khr@golang.org>
@randall77
Copy link
Contributor

@randall77 randall77 commented Sep 16, 2020

@gopherbot please open a backport issue for 1.15.

@gopherbot
Copy link

@gopherbot gopherbot commented Sep 16, 2020

Backport issue(s) opened: #41432 (for 1.15).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@gopherbot
Copy link

@gopherbot gopherbot commented Sep 16, 2020

Change https://golang.org/cl/255321 mentions this issue: cmd/cgo: use go:notinheap for anonymous structs

@gopherbot
Copy link

@gopherbot gopherbot commented Sep 17, 2020

Change https://golang.org/cl/255637 mentions this issue: cmd/compile: propagate go:notinheap implicitly

gopherbot pushed a commit that referenced this issue Sep 17, 2020
//go:notinheap
type T int

type U T

We already correctly propagate the notinheap-ness of T to U.  But we
have an assertion in the typechecker that if there's no explicit
//go:notinheap associated with U, then report an error. Get rid of
that error so that implicit propagation is allowed.

Adjust the tests so that we make sure that uses of types like U
do correctly report an error when U is used in a context that might
cause a Go heap allocation.

Fixes #41451

Update #40954
Update #41432

Change-Id: I1692bc7cceff21ebb3f557f3748812a40887118d
Reviewed-on: https://go-review.googlesource.com/c/go/+/255637
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
@gopherbot
Copy link

@gopherbot gopherbot commented Sep 18, 2020

Change https://golang.org/cl/255337 mentions this issue: cmd/compile: allow aliases to go:notinheap types

@gopherbot
Copy link

@gopherbot gopherbot commented Sep 18, 2020

Change https://golang.org/cl/255320 mentions this issue: cmd/compile: don't allow go:notinheap on the heap or stack

@gopherbot
Copy link

@gopherbot gopherbot commented Sep 18, 2020

Change https://golang.org/cl/255338 mentions this issue: cmd/compile: make go:notinheap error message friendlier for cgo

@steeve
Copy link
Contributor Author

@steeve steeve commented Sep 19, 2020

Thank again folks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

7 participants
You can’t perform that action at this time.