Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[codegen/go] Rewrite cyclic types. #8049

Merged
merged 7 commits into from
Sep 28, 2021
Merged

[codegen/go] Rewrite cyclic types. #8049

merged 7 commits into from
Sep 28, 2021

Conversation

pgavlin
Copy link
Member

@pgavlin pgavlin commented Sep 23, 2021

When computing the type name for a field of an object type, we must
ensure that we do not generate invalid recursive struct types. A struct
type T contains invalid recursion if the closure of its fields and its
struct-typed fields' fields includes a field of type T. A few examples:

Directly invalid:

type T struct { Invalid T }

Indirectly invalid:

type T struct { Invalid S }

type S struct { Invalid T }

In order to avoid generating invalid struct types, we replace all
references to types involved in a cyclical definition with *T. The
examples above therefore become:

       type T struct { Valid *T }
       type T struct { Valid *S }

       type S struct { Valid *T }

We do this using a rewriter that turns all fields involved in reference
cycles into optional fields.

These changes also include two enhancements to the SDK codegen test
driver in the interest of making debugging more convenient.

  1. If the -sdk.run flag is passed, the driver will only run the test
    with the given directory name
  2. If the -sdk.no-checks flag is passed, the driver will not run
    post-generation checks

This allows the use of e.g. dlv to debug a particular case without
editing the driver itself, for example:

$ go test -c -gcflags="all=-N -l" .
$ dlv exec ./go.test -- -test.run TestGeneratePackage -sdk.run=cyclic-types -sdk.no-checks

Fixes #7622.

@github-actions
Copy link

Diff for pulumi-random with merge commit e74dc64

@github-actions
Copy link

Diff for pulumi-azuread with merge commit e74dc64

@github-actions
Copy link

Diff for pulumi-random with merge commit a50a4c0

@github-actions
Copy link

Diff for pulumi-azuread with merge commit a50a4c0

@github-actions
Copy link

Diff for pulumi-kubernetes with merge commit e74dc64

@github-actions
Copy link

Diff for pulumi-gcp with merge commit e74dc64

@github-actions
Copy link

Diff for pulumi-kubernetes with merge commit a50a4c0

@pgavlin
Copy link
Member Author

pgavlin commented Sep 23, 2021

We do this using a rewriter that turns all fields involved in reference cycles into optional fields.

A note on this: it is a bit of a heavy hammer, and it's really solving the problem at the wrong layer. Unfortunately, we don't have the layer this should really be solved at in the Go code generator, at least not in any real sense.

Given its needs, the Go code generator should probably look like this:

PNG image-4AF6A82B9C07-1

As it stands, we're missing the "Go type graph" portion of that pipeline. If we had that such a representation, this rewrite would likely be part of the lowering step from *schema.Package.

In any case, the most unfortunate side effect of these changes is that Go SDKs may represent required but cyclic properties as optional.

@github-actions
Copy link

Diff for pulumi-gcp with merge commit a50a4c0

@github-actions
Copy link

Diff for pulumi-aws with merge commit e74dc64

@github-actions
Copy link

Diff for pulumi-aws with merge commit a50a4c0

@github-actions
Copy link

Diff for pulumi-azure with merge commit e74dc64

@github-actions
Copy link

Diff for pulumi-azure with merge commit a50a4c0

@github-actions
Copy link

Diff for pulumi-azure-native with merge commit e74dc64

@github-actions
Copy link

Diff for pulumi-azure-native with merge commit a50a4c0

}

type IndirectCycleT struct {
Foo3 *IndirectCycleS `pulumi:"foo3"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both IndirectCycleS and IndirectCycleT properties get '*' treatment, so this is symmetric/deterministic, good!

@t0yv0
Copy link
Member

t0yv0 commented Sep 24, 2021

This looks well motivated. Coupe of notes.

  1. I'm struggling to find time before Monday to give this the review it deserves, would it be ok to take a rain check on the 24h review SLA or is there urgency to get it out now.

  2. Being conservative about rollout: with the recent track record of codegen changes causing some problems for providers, should we do some manual rebuilds of azure-native and google-native that report having this issue, and perhaps one other provider that does not have this issue to check for any breaking changes? I could help with that. Or do we trust the logic + "downstream codegen checks" here to be sufficient.

  3. Is this fully non-breaking (beside the strictly necessary addition of * that the user notices in the API)? Are all derived forms working?

  4. Thinking out loud: substituting T for *T in a struct field, during recursive serialization *T will serialize to the same form as T so we're likely okay there. I'd be curious to check the generated Input helper forms, as well as cases where *T, as well as interactions of this pass with situations where T field is marked optional or pointer in the schema (so the default projection is already *T), whether the treatment is symmetric/deterministic/stable wrt changes of schema. Preliminary looks good - I'm glancing through tests and finding answers to some of these, more on Monday.

@t0yv0
Copy link
Member

t0yv0 commented Sep 24, 2021

really solving the problem at the wrong layer

This certainly resonates and this is why I feel we need to think about unexpected potential interactions with other features a little harder, to be on the safe side here.

@pgavlin
Copy link
Member Author

pgavlin commented Sep 24, 2021

I'm struggling to find time before Monday to give this the review it deserves, would it be ok to take a rain check on the 24h review SLA or is there urgency to get it out now.

No worries--it's more important that this lands reliably and with robust review than that it lands now.

Being conservative about rollout: with the recent track record of codegen changes causing some problems for providers, should we do some manual rebuilds of azure-native and google-native that report having this issue, and perhaps one other provider that does not have this issue to check for any breaking changes? I could help with that. Or do we trust the logic + "downstream codegen checks" here to be sufficient.

I think that looking at the changed files and trying to build those packages that saw changes should be sufficient.

Is this fully non-breaking (beside the strictly necessary addition of * that the user notices in the API)? Are all derived forms working?

In some aspects this is non-breaking:

  • property access expressions should continue to work, since Go uses the same syntax for accesses that automatically dereference pointers and those that do not
  • existing inputs will be okay, since a TInput is assignable to a TPtrInput

Importantly, though, existing outputs may break if they are passed to ApplyT, since the element type of the output is now *T instead of T. I don't think that there's any way to avoid this beyond changing the dynamic type checker for ApplyT, which is pretty unpalatable.

@t0yv0 t0yv0 self-requested a review September 27, 2021 14:46

// If we've found a cycle, mark each property involved in the cycle as optional.
//
// NOTE: this overestimates the set of properties that must be marked as optional. For example, in case (2) above,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

// only one of T.Invalid or S.Invalid needs to be marked as optional in order to break the cycle. However, choosing
// a minimal set of properties that is also deterministic and resilient to changes in visit order is difficult and
// seems to add little value.
for _, p := range cycle {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to log any time we do this cycle rewrite? E.g. which type and field? I'm not sure how this surfaces to codegen users but that sounds valuable to see what's happening, as a user of codegen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging might be useful, yes, but we don't really have a logging framework for the code generators, and I'd prefer not to just log to stdout/stderr. File this one under "enhancements" :)

@t0yv0 t0yv0 self-requested a review September 27, 2021 16:36
Copy link
Member

@t0yv0 t0yv0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM, I read through the recursion.

Test failure unrelated (merge master should fix).

Reassuring zero diff on pulumi-aws.

Expect to have very low blast radius since few cycles are identified in existing schmas. Logging rewrites might help "see" the blast radius.

I may try this out on pulumi-gcp later on today just to get in the habit of checking codegen PRs against affected providers.

When computing the type name for a field of an object type, we must
ensure that we do not generate invalid recursive struct types. A struct
type T contains invalid recursion if the closure of its fields and its
struct-typed fields' fields includes a field of type T. A few examples:

Directly invalid:

    type T struct { Invalid T }

Indirectly invalid:

    type T struct { Invalid S }

    type S struct { Invalid T }

In order to avoid generating invalid struct types, we replace all
references to types involved in a cyclical definition with *T. The
examples above therefore become:

(1) type T struct { Valid *T }

(2) type T struct { Valid *S }

    type S struct { Valid *T }

We do this using a rewriter that turns all fields involved in reference
cycles into optional fields.

These changes also include two enhancements to the SDK codegen test
driver in the interest of making debugging more convenient.

1. If the -sdk.run flag is passed, the driver will only run the test
   with the given directory name
2. If the -sdk.no-checks flag is passed, the driver will not run
   post-generation checks

This allows the use of e.g. `dlv` to debug a particular case without
editing the driver itself, for example:

    $ go test -c -gcflags="all=-N -l" .
    $ dlv exec ./go.test -- -test.run TestGeneratePackage -sdk.run=cyclic-types -sdk.no-checks
@github-actions
Copy link

Diff for pulumi-azure-native with merge commit b532476

@github-actions
Copy link

Diff for pulumi-azure-native with merge commit 37db7fc

Copy link
Member

@komalali komalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but you'll have to regen the codegen tests (if you've already pulled in changes in master)

@github-actions
Copy link

Diff for pulumi-random with merge commit 5147b78

@github-actions
Copy link

Diff for pulumi-azuread with merge commit 5147b78

@github-actions
Copy link

Diff for pulumi-kubernetes with merge commit 5147b78

@github-actions
Copy link

Diff for pulumi-gcp with merge commit 5147b78

@github-actions
Copy link

Diff for pulumi-aws with merge commit 5147b78

@github-actions
Copy link

Diff for pulumi-azure with merge commit 5147b78

@github-actions
Copy link

Diff for pulumi-azure-native with merge commit 5147b78

@github-actions
Copy link

Diff for pulumi-azuread with merge commit 1a50f98

@github-actions
Copy link

Diff for pulumi-random with merge commit 1a50f98

@github-actions
Copy link

Diff for pulumi-kubernetes with merge commit 1a50f98

@github-actions
Copy link

Diff for pulumi-gcp with merge commit 1a50f98

@github-actions
Copy link

Diff for pulumi-aws with merge commit 1a50f98

@github-actions
Copy link

Diff for pulumi-azure with merge commit 1a50f98

@github-actions
Copy link

Diff for pulumi-azuread with merge commit 8d37d61

@github-actions
Copy link

Diff for pulumi-random with merge commit 8d37d61

@github-actions
Copy link

Diff for pulumi-kubernetes with merge commit 8d37d61

@github-actions
Copy link

Diff for pulumi-gcp with merge commit 8d37d61

@github-actions
Copy link

Diff for pulumi-azure-native with merge commit 1a50f98

@github-actions
Copy link

Diff for pulumi-aws with merge commit 8d37d61

@github-actions
Copy link

Diff for pulumi-azure with merge commit 8d37d61

@github-actions
Copy link

Diff for pulumi-azure-native with merge commit 8d37d61

@pgavlin pgavlin merged commit 0a45f8d into master Sep 28, 2021
@pulumi-bot pulumi-bot deleted the pgavlin/goCyclicTypes branch September 28, 2021 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Recursive types not supported in Go codegen
3 participants