New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/cgo: make identical C types identical Go types across packages #13467

Open
bcmills opened this Issue Dec 3, 2015 · 53 comments

Comments

Projects
None yet
@bcmills
Member

bcmills commented Dec 3, 2015

https://golang.org/cmd/cgo/ says:

Cgo translates C types into equivalent unexported Go types. Because the translations are unexported, a Go package should not expose C types in its exported API: a C type used in one Go package is different from the same C type used in another.

While that's a convenient workaround for allowing access to struct fields and other names that would otherwise be inaccessible as Go identifiers, it greatly complicates the process of writing Go APIs for export to C callers. The Go code to produce and/or manipulate C values must be essentially confined to a single package.

It would be nice to remove that restriction: instead of treating C types as unexported local types, we should treat them as exported types in the "C" package (and similarly export the lower-case names that would otherwise be unexported).

@ianlancetaylor ianlancetaylor changed the title from cgo: make identical C types identical Go types across packages to cmd/cgo: make identical C types identical Go types across packages Dec 3, 2015

@mdempsky

This comment has been minimized.

Member

mdempsky commented Dec 3, 2015

Somewhat related: does C require types to be defined the same way across translation units? Reading through C99, it seems to only require that objects and functions with external linkage need to have the same type across translation units (6.2.7), but I can't find anything per se that disallows for example "typedef int foo;" in one translation unit and "typedef unsigned foo;" in another (assuming they don't in turn lead to incompatible object/function declarations).

(Not to say that cgo needs to support that.)

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Dec 3, 2015

C and C++ permit the same name to designate different types in different compilation units.

@joegrasse

This comment has been minimized.

joegrasse commented Dec 8, 2015

I definitely agree that this restriction should be lifted. It makes it very hard to break up your code to promote maintainability. It would make more sense for a C.int, etc to be a C.int everywhere, just as an int is an int everywhere.

@rsc

This comment has been minimized.

Contributor

rsc commented Dec 28, 2015

I don't believe we should lift this restriction. It is explicitly not a goal to make it possible to expose C types directly in Go package APIs. As Ian said, it's not even clear this is sound.

@rsc rsc closed this Dec 28, 2015

@bcmills

This comment has been minimized.

Member

bcmills commented Jan 4, 2016

It is explicitly not a goal to make it possible to expose C types directly in Go package APIs.

The point of this request is not to write general-purpose Go packages using C types. It is to enable the creation of support libraries for Go packages that call C functions and/or Go packages that export C APIs with richer structure than primitive pointers and integers. (For example: one might want to return a protocol buffer from a Go function to a C or C++ caller without making more than one copy of the marshaled data. That operation is complex enough that it needs a support library, and because it needs to manipulate Go types it must be written in Go.)

As Ian said, it's not even clear this is sound.

The request is to make "identical" C types identical Go types, not to make C types "with the same name" identical Go types. I believe it is sound provided that we enforce that the C types are actually identical.

@rsc

This comment has been minimized.

Contributor

rsc commented Jan 4, 2016

OK, I'm happy to reopen this, but I have no idea how to do it. It seems fundamentally at odds with Go's package system. I'm happy to look at implementation proposals though.

@rsc rsc reopened this Jan 4, 2016

@rsc rsc added this to the Unplanned milestone Jan 4, 2016

@14rcole

This comment has been minimized.

14rcole commented Jun 27, 2016

For the time being, is there a workaround for this? Maybe using an interface that can represent the same struct from different packages?

@bcmills

This comment has been minimized.

Member

bcmills commented Jun 27, 2016

You can work around it in one direction (going from the C types in another package to the C types in the current package) using reflect and unsafe.Pointer. The same technique may be possible in the other direction too.

If you want to add back some of the type-safety at run time, you can use the reflect package to iterate over the struct fields to verify that they're compatible.

@joegrasse

This comment has been minimized.

joegrasse commented Dec 2, 2016

Just pondering if aliases or whatever comes out of #18130 would help with this problem.

@bcmills

This comment has been minimized.

Member

bcmills commented Dec 2, 2016

@joegrasse I don't think type aliases per se would help with the general problem: either way, you end up needing one "canonical definition" for each type, and if you've already got a canonical definition then you don't need to be able to refer to it by different names.

However, it might at least solve the subproblem of making the Go types for C typedefs have the same aliasing structure as the C types. (I'm honestly not sure whether that's currently the case: it hadn't even occurred to me to check.)

@rsc

This comment has been minimized.

Contributor

rsc commented Dec 6, 2016

@joegrasse, probably not, but if we do #16623 (let compiler know more about cgo) then the compiler would be in a position to resolve this, if we wanted to.

(Fixed issue number, sorry.)

@joegrasse

This comment has been minimized.

joegrasse commented Dec 6, 2016

@rsc, do you mind double checking the issue number. I think you might have mistyped it.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Dec 6, 2016

@joegrasse Russ meant #16623.

@joegrasse

This comment has been minimized.

joegrasse commented Dec 6, 2016

Thanks

@14rcole

This comment has been minimized.

14rcole commented Dec 6, 2016

@bcmills Forgive my backtracking, but I don't understand why aliasing wouldn't solve the problem. I thought that the problem with C structs in Go was that the compiler views a C struct within a package differently as the same C struct outside of the package. Therefore you don't have a "cannonical definition" for that type. Wouldn't aliasing the C struct as a Go struct help with the problem?

@bcmills

This comment has been minimized.

Member

bcmills commented Dec 6, 2016

@14rcole Consider this program:

foo.h:

typedef struct {
  int i;
} Foo;

foo/foo.go:

package foo

// #include "foo.h"
import "C"

func Frozzle(x *C.Foo) {
  …
}

bar.h:

typedef struct {
  int i;
} Bar;

bar/bar.go:

package bar

// #include "bar.h"
import "C"

func Bozzle(y *C.Bar) { foo.Frozzle(y) }

This program should compile: C.Foo in package bar is the same C type (a typedef of a struct with the same definition) as C.Bar in package foo. However, that would require cgo to write the definitions of C.Foo and C.Bar such that they are aliases for the same underlying type. Since the type includes the x field which is currently unexported, there is no package in which that type could be defined.

There are other possible ways to solve the problem (e.g. by rewriting field names so that they are always exported and combining all of the C declarations into one package), but they involve more than just a suitable application of aliases.

@rsc

This comment has been minimized.

Contributor

rsc commented Dec 6, 2016

Also, an alias has to exist in one package P and point to another package Q. That implies P imports Q (to point at it). In the general version of the problem in this issue, both P and Q define some C type and don't know about each other at all. Then some other package M (for main) imports both and tries to mix one with the other. There's no way for aliases per se to solve this problem, because P and Q need to continue not knowing about each other, and M can't change the definitions in P and Q.

@mdempsky

This comment has been minimized.

Member

mdempsky commented Dec 6, 2016

Following the latest proposed solution in #16623 (but not strictly dependent upon it), the compilers could treat declarations for _Cfoo_bar as though they were declared from a synthetic "C" package. I believe we could also easily turn off symbol visibility rules for this package (e.g., so that lowercase struct fields are still accessible).

Then I think usual type identity rules would just work as desired, and usual type-reexporting information would help to catch ODR (one-definition rule) violations across C compilation units.

@bcmills

This comment has been minimized.

Member

bcmills commented Dec 6, 2016

@mdempsky

the compilers could treat declarations for _Cfoo_bar as though they were declared from a synthetic "C" package

It can't be just one package, unfortunately. In a valid program, a type C.X can legitimately mean two different things in two different compilation units.

We could perhaps do some sort of name-mangling to disambiguate, though. For example, we could encode the complete C type definition in the mangled name and have the compiler treat all C-mangled names as being in the same package.

The remaining concern with that approach is what to do with reflect. (If we've mangled the names to avoid collisions, should reflect report the mangled names, the colliding names, or something else entirely?)

@minux

This comment has been minimized.

Member

minux commented Dec 7, 2016

@joegrasse

This comment has been minimized.

joegrasse commented Dec 7, 2016

Here is a very contrived example of how I came across this issue. I believe it to be a more simplistic case then what @bcmills and @rsc have both discussed above (although I could be wrong).

Consider the package:
cp/cp.go

package cp

import "C"

func CTest(name *C.char) string {
	return C.GoString(name)
}

and the program:
ct/main.go

package main

import (
	"fmt"

	"cp"
)
import "C"

func main() {
	s := C.CString("Hello World") // This needs to be freed later
	fmt.Println(cp.CTest(s))
}

When you try and build ct/main.go, you get the following error.

# ct
./main.go:12: cannot use s (type *C.char) as type *cp.C.char in argument to cp.CTest

Before coming across this isssue, I would have thought that this program should compile, because cp.CTest takes a *C.char and s is a *C.char. I am not creating any new types, just using the basic C char type. For some reason though, *C.char in package cp becomes a *cp.C.char.

@rsc

This comment has been minimized.

Contributor

rsc commented Dec 7, 2016

After #16623 we can figure out what the semantics should be here. It could be that we only support this for built-in C types like char/int/etc.

@AlexRouSg

This comment has been minimized.

Contributor

AlexRouSg commented Jun 1, 2017

After #16623 we can figure out what the semantics should be here. It could be that we only support this for built-in C types like char/int/etc.

What about for types in external C libs? For external libs we can be pretty sure C.foo will always be C.foo, maybe even prefix it MyLib so like C.MyLib.foo or C.MyLib_foo?

@asottile

This comment has been minimized.

Contributor

asottile commented Sep 8, 2017

I've opened #21809

@gopherbot

This comment has been minimized.

gopherbot commented Sep 12, 2017

Change https://golang.org/cl/63277 mentions this issue: cmd/cgo: use type aliases for primitive types

@gopherbot

This comment has been minimized.

gopherbot commented Sep 12, 2017

Change https://golang.org/cl/63276 mentions this issue: misc/cgo/errors: port test.bash to Go

gopherbot pushed a commit that referenced this issue Sep 13, 2017

misc/cgo/errors: port test.bash to Go
This makes the test easier to run in isolation and easier to change,
and simplifies the code to run the tests in parallel.

updates #13467

Change-Id: I5622b5cc98276970347da18e95d071dbca3c5cc1
Reviewed-on: https://go-review.googlesource.com/63276
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@gopherbot

This comment has been minimized.

gopherbot commented Sep 13, 2017

Change https://golang.org/cl/63692 mentions this issue: errors_test: fix erroneous regexp detection

@gopherbot

This comment has been minimized.

gopherbot commented Sep 13, 2017

Change https://golang.org/cl/63730 mentions this issue: misc/cgo/errors: test that the Go rune type is not identical to C.int

gopherbot pushed a commit that referenced this issue Sep 14, 2017

misc/cgo/errors: fix erroneous regexp detection
I had passed 1 instead of 2 to the SplitAfterN call in
errorstest.check, so all of the cases were erroneously falling through
to the non-regexp case (and passing even if the actual error didn't
match).

Now, we use bytes.HasSuffix to check for the non-regexp case, so we
will not incorrectly match a regexp comment to the non-regexp case.

updates #13467

Change-Id: Ia6be928a495425f2b7bae5001bd01346e115dcfa
Reviewed-on: https://go-review.googlesource.com/63692
Reviewed-by: Ian Lance Taylor <iant@golang.org>

gopherbot pushed a commit that referenced this issue Sep 14, 2017

misc/cgo/errors: test that the Go rune type is not identical to C.int
rune has a well-defined size, but C.int is implementation-specified.
Using one as the other should require an explicit conversion.

updates #13467

Change-Id: I53ab2478427dca790efdcc197f6b8d9fbfbd1847
Reviewed-on: https://go-review.googlesource.com/63730
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@bcmills

This comment has been minimized.

Member

bcmills commented Sep 16, 2017

I think I have a partial solution to this for struct and union types. As expected, Go type aliases are the key.

Caveats:

  • All of the C struct fields must begin with a capital letter (so that they become exported fields in Go).
  • The C types must either be complete in both packages or incomplete in both packages (so that they have the same size and fields in both packages).
  • We need to figure out a solution for primitives to make it work: otherwise the individual field types will conflict.

We start by defining each converted type as an alias for its underlying Go struct type. Now the same types are identical, but too many types are identical: types with the same layout but different C tags are erroneously aliased to the same type.

To fix that problem, we can use Go struct field tags to encode the C struct tags! Because struct tags still count for type identity (#16085 notwithstanding), if we apply a Go tag containing the C tag to the first field on each struct, the two Go types will be mutually convertible but not identical. If the Go struct type does not have any fields, we add a zero-size field named _ and apply the tag to that.

https://play.golang.org/p/Dq9icy_BlH illustrates the general approach.

@bcmills

This comment has been minimized.

Member

bcmills commented Sep 16, 2017

A simple solution for primitives would be to add some package to the standard library containing declarations for all of the C types:

package ctypes

// #cgo CGO_NOALIAS=1
import "C"

type (
	Int = C.int
	Uint = C.uint
	…
)

Then cgo would be able to rewrite all of the local types to be aliases for that:

package usercode

import "ctypes"

type (
	_Ctype_int = ctypes.Int
	...
)

For certain C types with sizes defined by the standard (e.g., int32_t), we would instead emit typedefs directly to the corresponding Go types (e.g., int32).

@AlexRouSg

This comment has been minimized.

Contributor

AlexRouSg commented Sep 16, 2017

@bcmills
Do you think there would be a workaround for the all fields must start with a capital letter requirement? Cause passing around third party C structs where you can't rename the fields could be very useful.

Maybe have a tag to tell cgo to caps the first letter in go?

@bcmills

This comment has been minimized.

Member

bcmills commented Sep 19, 2017

Do you think there would be a workaround for the all fields must start with a capital letter requirement?

The only alternative I can see, short of renaming fields, would be a language change to allow lower-case names to be exported anyway, which would potentially require changes in associated tooling (godoc, the ast` package, and likely others).

I doubt that this use-case is compelling enough to justify such a change.

Maybe have a tag to tell cgo to caps the first letter in go?

That could work, or to add a prefix (such as "C_") to each field. That wouldn't be source-compatible with existing cgo files, but it could be viable as an explicit option (e.g. specified in the cgo prelude).

@rsc

This comment has been minimized.

Contributor

rsc commented Sep 19, 2017

I'm unclear about what problem people are talking about solving at this point.

@bcmills

This comment has been minimized.

Member

bcmills commented Sep 19, 2017

I'm unclear about what problem people are talking about solving at this point.

The same one this issue has always been about: making identical C types (in a cgo-using Go program) identical Go types across Go packages.

Comment 329947340 addresses the subproblem of translating identical numeric C types to identical Go types.

Comment 329946826 addresses the subproblem of translating identical struct and union C types to identical Go types. The solution proposed in that comment requires that the names of the C members start with a capital letter (so that they becomes exported fields of the Go struct type). Workarounds for that requirement are discussed in comments 329969062 and 330593346.

@rsc

This comment has been minimized.

Contributor

rsc commented Sep 20, 2017

I have no interest in solving the general problem, nor in the associated complexity. Packages should not be exporting, say, *C.FILE in their APIs. If two different packages export *C.FILE and those are different types, that's OK.

I am slightly more sympathetic to *C.char, but even there I don't understand why the package API doesn't just use appropriate Go types instead (like []byte).

@bcmills

This comment has been minimized.

Member

bcmills commented Sep 20, 2017

Oh, I see what you're saying now. I tried to address that question in comment 168719378, but apparently I was not convincing enough.

I am slightly more sympathetic to *C.char, but even there I don't understand why the package API doesn't just use appropriate Go types instead (like []byte).

*C.char is honestly one of the least problematic types, because it already loosely corresponds to at least three idiomatic Go types ([]byte, *byte, or unsafe.Pointer, depending on usage).

C.long is a better example for a primitive, because there is no Go type to which it portably corresponds.

Packages should not be exporting, say, *C.FILE in their APIs.

Agreed. As I noted previously, “The point of this request is not to write general-purpose Go packages using C types. It is to enable the creation of support libraries for Go packages that call C functions.”

To give some concrete examples:

  • If the package returns the Go type time.Time, it may need a way to obtain one from a *C.struct_tm.
  • If the package returns the Go type *os.File, it may need a way to obtain one from a *C.FILE.
  • If the package returns the Go type string, it may need a way to obtain one from a *C.wchar_t.
  • If the package uses the Go x/text libraries, it may need a way to convert those types to and from a *C.struct_lconv.
  • If the package returns a Go proto.Message, it may need to obtain one from a *C.ProtobufCMessage

...and so on. Most of these conversions involve struct types and require non-trivial boilerplate, and some are quite subtle to implement correctly.

At the moment, either each package must implement its own copy of these conversions (inefficient and error-prone), or the exported API of the conversion helper-package must rely on error-prone unsafe.Pointer conversions.

@joegrasse

This comment has been minimized.

joegrasse commented Sep 20, 2017

@rsc Here is a very basic example of my problem and interest in this issue. As you stated here, I would really only care about the basic C types.

@bcmills

This comment has been minimized.

Member

bcmills commented Sep 20, 2017

@joegrasse Honestly, I think that example only undermines my point. It isn't at all obvious why your cp package needs to accept a parameter of type *C.char instead of the idiomatic Go []byte or string type, considering that you can easily construct the former from the latter (as illustrated in https://golang.org/cl/56530):

package cp

import "C"

import "unsafe"

func CTest(name string) {
	b := make([]byte, len(name)+1)
	copy(b, name)
	p := (*C.char)(unsafe.Pointer(&b[0]))
	C.use(p)
}

You can apply the transformation in the reverse direction using the workaround library described in #13656 (comment), or perhaps its eventual replacement described in #19367.

To reiterate: I really don't think *C.char is a compelling example for this issue at all.

@joegrasse

This comment has been minimized.

joegrasse commented Sep 20, 2017

@bcmills That example was a very contrived example just to demonstrate the problem. I could have chosen any basic C type to display the problem.

@bcmills

This comment has been minimized.

Member

bcmills commented Sep 20, 2017

@joegrasse, part of Russ's point is that this problem is not worth solving if it only affects contrived examples. I think we all understand the nature of the problem: what we need to understand is its importance. (See https://blog.golang.org/toward-go2 for a much more in-depth discussion on this point.)

@FlorianUekermann

This comment has been minimized.

Contributor

FlorianUekermann commented Sep 20, 2017

In case the relevance of this issue is actually unclear and someone would benefit from an real world example, let me help out. Otherwise please ignore this comment, I have nothing technical to add to the discussion.

There are a lot of C libraries that will give you an instance of a non-basic C type, which you then use in a different C library. Last time I ran into this was while using Vulkan (the low level OpenGL "successor"), so here we go:

If you want to do GPU accelerated graphics stuff, you would typically use a library like Glfw to handle the OS dependent details, like window creation and input. There are go bindings for that, which is nice. Glfw will do the OS specific incantations for you and return an instance of VkSurfaceKHR, a Vulkan type.

Now you want to draw something into your window, so you need to pass the VkSurfaceKHR to Vulkan and do something with it.

But since the Vulkan and Glfw bindings are in different packages, you can't just get the C.VkSurfaceKHR from Glfw and use it in a Vulkan function call.

You can't put both bindings into one package, because Glfw supports m graphics apis and there are n platform abstraction libraries that support Vulkan. So you would end up with m*n go packages.

This is a very real problem I encounter every couple of weeks in different contexts.

@AlexRouSg

This comment has been minimized.

Contributor

AlexRouSg commented Sep 20, 2017

@MaVo159

Was just about to describe a very similar problem and you beat me to it.

@jimmyfrasche

This comment has been minimized.

Member

jimmyfrasche commented Sep 20, 2017

A related issue is one large library that has a number of "modules" defined by optional header files.

It's natural to want to make these true separate packages in Go. This is doable with an internal/ package that implements everything which is used by packages that expose the actual API.

But this means that the implementation of every optional module is included in the build artifact regardless of what actually gets imported, which depending on the library/module can be rather large.

It's not a show-stopping issue, generally, but it sounds like this would fix it.

@AlexRouSg

This comment has been minimized.

Contributor

AlexRouSg commented Sep 20, 2017

My use case is more like an union of the problems described by @MaVo159 and @jimmyfrasche.

A package using multiple intercommunicating C libraries that wants users to expand on it's functionality by calling the C library's functions directly. Something similar to plugin behaviour.

@AlexRouSg

This comment has been minimized.

Contributor

AlexRouSg commented Sep 21, 2017

Just thought of a problem that will affect the implementation and its usefulness.

It is common for some C libraries to hide the fields of some structs. For example SDL defines typedef struct SDL_Window SDL_Window; in the public headers and struct SDL_Window{...} in the internal headers so you can't directly access the fields.

I think in these cases we would have no choice but to use unsafe.Pointer if we want to pass them around? Unless there is some way cgo can get the strut definition without including the internal headers as we would most likely be missing the proper #defines for it.

@bcmills

This comment has been minimized.

Member

bcmills commented Sep 21, 2017

@AlexRouSg

I think in these cases we would have no choice but to use unsafe.Pointer if we want to pass them around?

Those should be fine, as long as the compiler treats blank-named fields as exported for the purpose of computing type identity (but I should verify that).

The we can define an “opaque” struct type for each incomplete type, and as long as the type is incomplete in all of the Go packages they'll remain equivalent.

The generated code would look something like:

type _Ctype_struct_SDL_Window = struct {
	_ struct{} `cgo: struct SDL_Window`
}

(But see also #19487: defining Go types at all for incomplete types is a bit of a thorny problem.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment