Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: reflect: allow creation of recursive types #39528

Closed
TheCount opened this issue Jun 11, 2020 · 17 comments
Closed

proposal: reflect: allow creation of recursive types #39528

TheCount opened this issue Jun 11, 2020 · 17 comments

Comments

@TheCount
Copy link

TheCount commented Jun 11, 2020

Introduction

In #20013, @ben-clayton asked for runtime creation of recursive structs. The analogous issue arises with several other constructions which are legal at compile time, e. g.:

type A [42]*A
type C chan C
type F func(F) F
type M map[string]M
type P *P
type S []S

While these specific definitions are not as useful at the top-level as a struct, combinations can be, e. g.

type L struct {
  Value interface{}
  Next *L
  Snitch chan L
}

or

type T []struct{ Edge interface{}; Out T }

The issue is related to #16522, where @crawshaw proposes an API to create a named type with a method set. In that issue, @cosmos72 observed that the proposed API would not allow for the creation of recursive types, and also referenced https://golang.org/pkg/go/types/#Named, where recursive named types can be represented in a two-step process. In #16522, it was also observed that the implementation of a runtime method set is met with certain challenges, so this proposal will concentrate on recursive types while leaving the door open for the core concern of #16522.

Proposal

This issue proposes an API for a two-step process of runtime recursive type creation similar to, but not quite analogous to https://golang.org/pkg/go/types/#Named. The first step is to create an incomplete named type which can only be used in certain restricted contexts until it is completed in the second step by binding the name to an underlying type specification, in analogy to a compile-time type declaration.

// Named creates a new incomplete type with the specified valid name.
// An invalid name causes Named to panic.
//
// The resulting type can be used as an argument to ArrayOf, ChanOf, FuncOf, MapOf,
// PtrTo, SliceOf, and StructOf whenever the analogous compile-time type specification
// in a type declaration would result in a valid recursive type. Invalid combinations
// cause Named to panic. E. g., passing an incomplete type to ArrayOf is invalid,
// but passing an incomplete type to PtrTo and then that type to ArrayOf is valid.
//
// Any such type, including the incomplete named type itself, may not be used in any
// other context until it has been completed with Bind. A future version may relax
// this requirement (e. g., allow use in MakeFunc before calling Bind).
func Named(name string) Type

// Bind binds the specified incomplete named type to underlying akin to a valid type
// declaration and returns a completed version of named. The underlying type may
// indirectly reference the named type, but no
// other incomplete type created with Named. A Bind which represents an illegal
// type declaration panics.
//
// Currently, adding a method set to named is not supported and methods must be nil.
// This limitation may be lifted in a future version.
func Bind(named Type, underlying Type, methods []Method) Type

(edit: changed return value of Bind)

Here, the methods would only be added at the Bind stage (in a future version), so that they can refer to named.

Open questions

  • What is a valid name for Named? The spec suggests letter { letter | unicode_digit }, but that may be expensive to check. Also, should the name be the name of an exported type?
  • What should the PkgPath() method of the created type return? Probably an empty string, but could also be a pseudo package name where all the types created at runtime end up in.
  • What should the Kind() method of the named type return before Bind? Of the existing kinds, only Invalid makes sense. Could introduce a new kind, or use the tflags to mark the type incomplete.
  • (edit: added) goroutine safety: probably the new type won't be intrinsically goroutine-safe until Bind has been called on it.
@cosmos72
Copy link

cosmos72 commented Jun 11, 2020

Good to hear that I am not the only one interested in adding this feature! :)

Some days ago I had almost exactly the same idea, with some minor differences:

// create a new, incomplete, named Type with Kind = Invalid.
// the parameter 'pkgpath string' is stored in the type, but does not grant any special permission:
// for example, creating a type in package "runtime" does not give it access to the package's
// unexported symbols.
// TBD: decide: accept or reject pkgpath+name equal to a pre-existing type?
func NewNamed(pkgpath string, name string) Type 

// must be called exactly once before Complete()
func SetUnderlying(named Type, underlying Type)

// cannot be implemented yet because of API/ABI limitations... I would keep this for later
// func AddMethod(named Type, method Method)

// must be called exactly once to "freeze" the type and make it immutable,
// internally it will mark the type as complete and make it fully usable as any other reflect.Type
func Complete(named Type)

I think it's better than having a single "do-everything" function

func Bind(named Type, underlying Type, methods []Method)

for three reasons:

  1. each function above has a single purpose
  2. it's more similar to go/types API
  3. it allows to postpone the API to add methods until there is a technical solution to actually implement them. I am not fond of adding an API which cannot currently be implemented (the methods []Method part)

If this proposal (or the above one, or something similar enough) is accepted, I can help implementing it

@ianlancetaylor ianlancetaylor changed the title reflect: allow creation of recursive types proposal: reflect: allow creation of recursive types Jun 11, 2020
@gopherbot gopherbot added this to the Proposal milestone Jun 11, 2020
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Jun 11, 2020
@TheCount
Copy link
Author

@cosmos72 The creation process is already quite involved, what with the invening calls to the PtrTo and SomethingOf functions, so I don't see a problem splitting my Bind up into the functions in your proposal.

The important thing, I think, is that there is a point when the new type is final (my Bind or your Complete); I've added an open question regarding goroutine-safety to my original proposal.

Maybe the next logical step is to see whether our proposals can be implemented without too much hassle.

@TheCount
Copy link
Author

TheCount commented Jun 12, 2020

I had a cursory glance at reflect/type.go and learned that while the dynamic type of a valid Type is always *rtype, the object behind the pointer can be different in size for different types, so I think from an implementation standpoint, it's easier if Bind returns a completely new type instead of trying to alter the named argument. I've edited the proposal accordingly.

@cosmos72
Copy link

cosmos72 commented Jun 12, 2020

Yes, reflect.Type internals are quite complicated because they are heavily optimized to reduce memory usage and the number of distinct allocations.

There is also a property of reflect.Type that would cause a lot of problems with your last suggestion - namely, that Bind() should return a completely new type instead of trying to alter the named argument:

https://golang.org/pkg/reflect/#Type states:
"Type values are comparable, such as with the == operator, so they can be used as map keys. Two Type values are equal if they represent identical types."

I hope I'm wrong, but trying to imagine how Named() and Bind() interact with such guarantee, I think you will see that having around two objects representing the same type:

  • one returned by Named() and possibly used to construct other types, for example with PtrTo() or one of the *Of() functions
  • the other returned by Bind()

causes many conflicts with such guarantee

@TheCount
Copy link
Author

True. One possible solution is that this guarantee simply does not hold until Bind returns. So the Type returned by Named would be just a placeholder, to be used in the *Of functions only, not to be used as map key or in any other context. After Bind, the placeholder cannot be used any more, the new returned Type has to be used instead.

This also means that Bind has to walk recursively through the underlying type and replace all references to the placeholder with the real thing, possibly updating the type cache on the way.

I guess a similar situation arises with goroutine-safety. Probably the involved types do not become goroutine-safe until Bind returns.

@cosmos72
Copy link

cosmos72 commented Jun 13, 2020

What about Go 1.x compatibility promise ?
I am quite sure it also covers the statement cited above:

https://golang.org/pkg/reflect/#Type
"Type values are comparable, such as with the == operator, so they can be used as map keys. Two Type values are equal if they represent identical types."

which you suggest to break.

That's why I preferred your original proposal:
it required internal changes to reflect.Type to allow creating a *reflect.rtype without knowing in advance whether it will describe a struct, map, func or one of the other possible kinds - and instead deciding it later, with Bind or SetUnderlying.
As a consequence, the current technique allocating in a single step reflect.rtype plus its auxiliary (kind-dependent) information could probably not be reused in this case, and an additional layout would probably be needed - a kind of "rtypeUnion" with the ability to contain all the information needed by the different kinds.
But whatever the required changes are - and they could be complicated - they would be internal to the package reflect, and not visible from outside.

I am worried that your idea "Bind has to walk recursively through the underlying type and replace all references to the placeholder with the real thing, possibly updating the type cache on the way"

  • is quite expensive CPU-wise
  • falls short on preserving Go compatibility promise, because the placeholder type could be already in the hands of user code, thus breaking the guarantee that Type is comparable with ==

@TheCount
Copy link
Author

What about Go 1.x compatibility promise ?

Is there any way to gain access to a reflect.Type returned by Named which is outside the control of the caller of Named? If not, I'm not worried about compatibility issues. Named and Bind are a new API with a special, restrictive contract: users of this API have to keep the reflect.Type returned by Named and all derived types close, but once Bind has returned, the returned type as well as the derived types enjoy all the usual guarantees and can be given to other user code. The named argument must never get into the hands of user code and should be thrown away and forgotten after Bind, though.

This way, older source will continue to compile and run as laid out in the compatibility guidelines.

I am worried that your idea "Bind has to walk recursively through the underlying type and replace all references to the placeholder with the real thing, possibly updating the type cache on the way"

  • is quite expensive CPU-wise

I've had another look at the code and apparently these caches are for the benefit of the *Of functions only. So I don't think the cache has to be touched. That should make the process no more CPU intensive than the work a compiler has to do when it parses a recursive type definition.


If my assumption above about being able to keep the return value of Named close is not correct, then things become difficult. I don't think a "rtypeUnion"-approach would work well (at least not without revamping the internal type system) as some types may require a lot of space (for example, the tail of a function type is an array of *rtype pointers representing the all argument and return types) and you would have to allocate space for the largest possible type. In this case, it would be better if there was some way of asking the gc to explicitly resize a specific block and update all pointers to it, or may do allocate a lot of space and shrink later, without updating any pointers. This is just guesswork, though. I have no clue how the internal go memory system works.

@ianlancetaylor
Copy link
Contributor

I'm not sure what you mean by keeping a value close. Certainly once a program has a reflect.Type created by reflect.Named it can call reflect.Zero(t).Interface() to get a value of type interface{}, and the program can do anything with the value, including passing it some other package. That other package can then call reflect.TypeOf on the value to retrieve the reflect.Type created by reflect.Name. For example, this would likely happen if the value were passed to encoding/json.

@TheCount
Copy link
Author

I'm not sure what you mean by keeping a value close. Certainly once a program has a reflect.Type created by reflect.Named it can call reflect.Zero(t).Interface() to get a value of type interface{}, and the program can do anything with the value, including passing it some other package. That other package can then call reflect.TypeOf on the value to retrieve the reflect.Type created by reflect.Name. For example, this would likely happen if the value were passed to encoding/json.

The idea is that there is a special contract for callers of reflect.Named. Callers promise they won't, directly or indirectly, pass the returned reflect.Type, or any type derived from that reflect.Type via reflect.PtrTo or one of the reflect.KindOf functions, to any functions or methods defined in the reflect package, except as prescribed in the description of reflect.Named. Only the return value of reflect.Bind is no longer bound by this special contract and may be passed around freely.

Now, this contract is not enforceable at the language level. If someone wants to shoot themselves in the foot, they certainly can pass a reflect.Type returned by reflect.Named to reflect.Zero — and suffer the consequences (for a deterministic outcome, the implementation could raise a panic in such a case). It's more of a C-style contract, "if you don't do as prescribed here, the behaviour is undefined".

Is such a contract acceptable for the Go community? From your and @cosmos72's comments I get the notion the answer is "no". Is that correct? If so, we need an API which avoids these "special" reflect.Type instances altogether. I'll write another proposal later.

@cosmos72
Copy link

cosmos72 commented Jun 14, 2020

The idea is that there is a special contract for callers of reflect.Named. Callers promise they won't, directly or indirectly, pass the returned reflect.Type, or any type derived from that reflect.Type via reflect.PtrTo or one of the reflect._Kind_Of functions, to any functions or methods defined in the reflect package, except as prescribed in the description of reflect.Named. Only the return value of reflect.Bind is no longer bound by this special contract and may be passed around freely.

Also, operator== would not have the usual meaning on the Type returned by Named: it's an "incomplete type" different from anything else, including from the completed version of itself.

Also, the Type returned by Named should not be passed to Go code that is not aware of such special contract - which means all existing Go code that (potentially) deals with reflect.Type

All in all, it seems an API easy to use incorrectly, and such misuses are difficult to detect.

That's why I prefer the initial proposal, where Bind or SetUnderlying simply modify the first argument - i.e. the Type returned by Named - instead of creating a new Type.

TheCount added a commit to TheCount/go that referenced this issue Jun 14, 2020
Until now, a valid identifier check was only required for
isValidFieldName. For golang#39528, a valid identifier check will also be
required for type names.

Updates golang#39528.
TheCount added a commit to TheCount/go that referenced this issue Jun 14, 2020
Introduce two new functions, Named and Bind, to the reflect API to
create possibly recursive, named types.

Fixes golang#39528.
@TheCount
Copy link
Author

Also, operator== would not have the usual meaning on the Type returned by Named: it's an "incomplete type" different from anything else, including from the completed version of itself.

Also, the Type returned by Named should not be passed to Go code that is not aware of such special contract - which means all existing Go code that (potentially) deals with reflect.Type

All in all, it seems an API easy to use incorrectly, and such misuses are difficult to detect.

That's why I prefer the initial proposal, where Bind or SetUnderlying simply modify the first argument - i.e. the Type returned by Named - instead of creating a new Type.

OK, but even with an in-place modification, the reflect.Type returned by reflect.Named would not be allowed to be passed to most functions before reflect.Bind has returned. That's why I wanted to write another proposal to avoid this problem altogether.

FWIW, I implemented the proposal in this issue here: https://github.com/TheCount/go/tree/issue-39528. If you think it's worthwhile pursuing, maybe you can try to change my implementation to an in-place modification (I don't know how to do that, I guess you'd have to somehow interact with the memory management system).

@cosmos72
Copy link

cosmos72 commented Jun 15, 2020

OK, but even with an in-place modification, the reflect.Type returned by reflect.Named would not be allowed to be passed to most functions before reflect.Bind has returned.

True, an incomplete type can only be used in few cases. Essentially only to build other types - calling other function/methods with it will (and should) panic.

That's why I wanted to write another proposal to avoid this problem altogether.

I understand the reason, but it had the side effect of creating a reflect.Type which is and always remains incomplete - not just for a limited time.

FWIW, I implemented the proposal in this issue here: https://github.com/TheCount/go/tree/issue-39528. If you think it's worthwhile pursuing, maybe you can try to change my implementation to an in-place modification (I don't know how to do that, I guess you'd have to somehow interact with the memory management system).

I have a draft implementation using in-place modification (UPDATE: published at https://github.com/cosmos72/go/commit/b010cd5c67dac0856fea156d7e6d6c3b0e3f1d77)
The in-place modification itself is relatively simple... the hard part is tracking all types derived from the newly created one abd completing them as soon as possible (not implemented yet):
after

var t = reflect.Named("foo")
var arrayOfT = reflect.ArrayOf(42, t)
reflect.Bind(t, reflect.TypeOf(int(0)))

both t and arrayOfT must be updated by computing their size, GC information and mark them as complete.
And with mutually recursive types it gets more complicated.

I'd say it's time to wait for feedback from Go team on whether either proposal is acceptable.

@TheCount
Copy link
Author

TheCount commented Jun 15, 2020

That's why I wanted to write another proposal to avoid this problem altogether.

I understand the reason, but it had the side effect of creating a reflect.Type which is and always remains incomplete - not just for a limited time.

Small misunderstanding here. I meant I wanted to write (yet) another proposal which avoids even the problem of the limited incompleteness between reflect.Named and reflect.Bind. Just haven't got around to doing that yet.

@TheCount
Copy link
Author

TheCount commented Jun 15, 2020

I have a draft implementation using in-place modification (UPDATE: published at cosmos72@b010cd5)
The in-place modification itself is relatively simple... the hard part is tracking all types derived from the newly created one abd completing them as soon as possible (not implemented yet):
after

var t = reflect.Named("foo")
var arrayOfT = reflect.ArrayOf(42, t)
reflect.Bind(t, reflect.TypeOf(int(0)))

both t and arrayOfT must be updated by computing their size, GC information and mark them as complete.
And with mutually recursive types it gets more complicated.

In your draft, you define your wrapper type as

type wrapperType struct {
	rtype
	uncommon uncommonType
	under    *rtype
}

How do you go from this to the memory layout of the finished type as the runtime expects it? For example, a function type in memory would look like (pseudocode):

struct {
  rtype
  numIn, numOut uint16
  uncommonType
  [numIn+numOut]*rtype
}

Since you can't know numIn+numOut at the time you allocate the memory for wrapperType, it's difficult to allocate the correct amount for it. (Asking the caller of NewNamed for a hint and panicking in Complete if the hint was wrong might be a practical if slightly awkward solution.)

@cosmos72
Copy link

cosmos72 commented Jun 15, 2020

In your draft, you define your wrapper type as

type wrapperType struct {
	rtype
	uncommon uncommonType
	under    *rtype
}

How do you go from this to the memory layout of the finished type as the runtime expects it? For example, a function type in memory would look like (pseudocode):

struct {
  rtype
  numIn, numOut uint16
  uncommonType
  [numIn+numOut]*rtype
}

Since you can't know numIn+numOut at the time you allocate the memory for wrapperType, it's difficult to allocate the correct amount for it. (Asking the caller of NewNamed for a hint and panicking in Complete if the hint was wrong might be a practical if slightly awkward solution.)

That's not necessary. A patched implementation of reflect.Type.Field would be:

func (t *rtype) Field(i int) StructField {
	if t.Kind() != Struct {
		panic("reflect: Field of non-struct type " + t.String())
	}
	if t.tflag&tflagWrapper != 0 {
		t = (*wrapperType)(unsafe.Pointer(t)).underlying("Field")
	}
	tt := (*structType)(unsafe.Pointer(t))
	return tt.Field(i)
}

The idea is to forward (almost) any call to the underlying type, which has the correct layout.

[UPDATE] slightly more complete implementation published at https://github.com/cosmos72/go/tree/issue-39528

@TheCount
Copy link
Author

TheCount commented Jun 18, 2020

The idea is to forward (almost) any call to the underlying type, which has the correct layout.

OK, I think I understand now. You introduce a completely new "proxy" layout. That means of course checking and updating all parts of the Go source code where a pattern like

var something interface{}
…
explicit := *(**expectedLayout)(unsafe.Pointer(&something))

is used. Possibly also third-party tools. But nothing covered by the Go compatibility promise.

[edit: fixed pattern]

@rsc
Copy link
Contributor

rsc commented Jun 24, 2020

Partially constructed types are definitely a major disadvantage. Then everything that accepts a reflect.Type has to guard against them. Given that you've opened the other issue, let's close this one as a duplicate of #39717.

@rsc rsc closed this as completed Jun 24, 2020
@rsc rsc moved this from Incoming to Declined in Proposals (old) Jun 24, 2020
@golang golang locked and limited conversation to collaborators Jun 24, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

No branches or pull requests

5 participants