"operator-sdk generate kustomize manifests -q" hangs for api groups with reference #4990

Closed
AndrienkoAleksandr opened this issue Jun 16, 2021 · 10 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
Milestone

Comments

@AndrienkoAleksandr

AndrienkoAleksandr commented Jun 16, 2021

Bug Report

In my operator I have two API groups, and the first API group references a type from the second one. When I try to build the OLM bundle, the "make bundle" command hangs. I took a look: internally, "make bundle" runs the sub-command:

operator-sdk generate kustomize manifests -q

And "make bundle" hangs on this step.

What did you do?

I will try to explain this bug using a sample (https://github.com/AndrienkoAleksandr/memcached-operator-ast-bug). I used operator-sdk 1.8.0 (but I initially found this bug on 1.7.x). I generated a sample project with two API groups, v1 and v1alpha1:

mkdir -p $HOME/projects/memcached-operator-ast-bug
cd $HOME/projects/memcached-operator-ast-bug

operator-sdk init --domain example.com --repo github.com/AndrienkoAleksandr/memcached-operator-ast-bug

operator-sdk create api --group cache --version v1alpha1 --kind Memcached --resource --controller

# echo "n" -> we don't want one more extra controller
echo "n" | operator-sdk create api --group cache --version v1 --kind Memcached --resource

I applied the storage version marker:

AndrienkoAleksandr/memcached-operator-ast-bug@808d81d#diff-5e36a4d8934ca67673d0971f36a1448ee88520428fe8c95154f2958dcbdda632R43
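For context, the marker from that commit sits on the v1 kind. This is only a sketch assuming the default kubebuilder scaffolding layout (the exact file in the sample is api/v1/memcached_types.go; MemcachedSpec and MemcachedStatus are defined elsewhere in the same package):

// api/v1/memcached_types.go (sketch, default scaffolding layout assumed)
package v1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:storageversion

// Memcached is the Schema for the memcacheds API.
// The storageversion marker tells controller-gen that v1 is the version
// persisted in etcd for this CRD.
type Memcached struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   MemcachedSpec   `json:"spec,omitempty"`
	Status MemcachedStatus `json:"status,omitempty"`
}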

I executed the following commands from the project root to update the autogenerated files, CRs, and CRDs:

$ make generate
$ make manifests

I created initial OLM bundle:

make bundle

The bundle was generated successfully. But...

After that, in the v1 API group I added a field "Status" to the "MemcachedStatus" structure. The type of this field is also named "MemcachedStatus", but it comes from the v1alpha1 API group:

AndrienkoAleksandr/memcached-operator-ast-bug@1547b67#diff-5e36a4d8934ca67673d0971f36a1448ee88520428fe8c95154f2958dcbdda632R38
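A sketch of that change (the import path follows the --repo flag used above; the import alias is illustrative):

// api/v1/memcached_types.go (sketch of the change in the linked commit)
package v1

import (
	cachev1alpha1 "github.com/AndrienkoAleksandr/memcached-operator-ast-bug/api/v1alpha1"
)

// MemcachedStatus defines the observed state of Memcached in v1.
type MemcachedStatus struct {
	// Status has the same type name, MemcachedStatus, but it comes from
	// the v1alpha1 package. This cross-group reference is what triggers
	// the hang described below.
	Status cachev1alpha1.MemcachedStatus `json:"status,omitempty"`
}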

Then I wanted to update OLM bundle:

$ make bundle

This command hangs indefinitely. It looks like operator-sdk runs into infinite recursion. On Fedora 31, operator-sdk consumes all of the RAM and the laptop hangs! On macOS only the operator-sdk process hangs.

What did you expect to see?

The bundle should be generated. With operator-sdk v0.17.2 the same scenario works.

What did you see instead? Under which circumstances?

During execution "make bundle" sub command "operator-sdk generate kustomize manifests -q" hangs

Environment

Operator type:

Go

Kubernetes cluster type:

minikube. But it doesn't matter for this issue.

$ operator-sdk version
operator-sdk version: "v1.8.0", commit: "d3bd87c6900f70b7df618340e1d63329c7cd651e", kubernetes version: "1.20.2", go version: "go1.16.4", GOOS: "darwin", GOARCH: "amd64"

$ go version (if language is Go)

go version go1.16.5 darwin/amd64

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:11:29Z", GoVersion:"go1.16.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.7", GitCommit:"132a687512d7fb058d0f5890f07d4121b3f0a2e2", GitTreeState:"clean", BuildDate:"2021-05-12T12:32:49Z", GoVersion:"go1.15.12", Compiler:"gc", Platform:"linux/amd64"}

Possible Solution

A workaround exists, but I'm not sure it will be suitable for all cases.

If I rename the structure in the v1alpha1 API group from MemcachedStatus to MemcachedStatusAlpha, the issue is gone: the bundle updates and the v1 CRD status description contains the expected MemcachedStatusAlpha field definitions. A rough sketch of the rename is below.
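Roughly like this (fields omitted; not copied from the actual repo):

// api/v1alpha1/memcached_types.go (sketch of the workaround)
package v1alpha1

// MemcachedStatusAlpha is the renamed v1alpha1 status type. With the rename,
// v1 and v1alpha1 no longer both declare a type named MemcachedStatus, so the
// ast.go walk no longer loops between the two packages.
type MemcachedStatusAlpha struct {
	// original v1alpha1 status fields go here unchanged
}

The v1 field from the earlier sketch then references cachev1alpha1.MemcachedStatusAlpha instead.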

Additional context

I debugged operator-sdk a bit and I can see that the described issue is a bug in the ast.go parser. I want to share some details.

  1. ast.go (https://github.com/operator-framework/operator-sdk/blob/v1.8.0/internal/generate/clusterserviceversion/bases/definitions/ast.go#L96) analyzes the API group files to find fields with markers.

  2. During the inspection at https://github.com/operator-framework/operator-sdk/blob/v1.8.0/internal/generate/clusterserviceversion/bases/definitions/ast.go#L110, in some iteration ast.go inspects the "MemcachedStatus" structure node in the v1 API group. Let's call this node v1.MemcachedStatus.

  3. Then ast.go inspects the child nodes of v1.MemcachedStatus one by one, and one of them is the "MemcachedStatus" structure, but from the v1alpha1 API package. Let's call it v1alpha1.MemcachedStatus. The parser correctly finds this node in the original v1alpha1 package. But then the AST parser recursively inspects the child elements of this node, and for one of the children (with type *ast.Ident) ast.go creates an ident lookup against the 'rootPkg' API package:

https://github.com/operator-framework/operator-sdk/blob/v1.8.0/internal/generate/clusterserviceversion/bases/definitions/ast.go#L130

'rootPkg' is hardcoded and always points to the v1 package, so for this child node it is the wrong package; the correct package would be v1alpha1. Unfortunately, a structure with the same name "MemcachedStatus" really does exist in the v1 package: v1.MemcachedStatus (see step 2).

So the parser recursively inspects:

v1.MemcachedStatus -> v1alpha1.MemcachedStatus -> v1.MemcachedStatus -> v1alpha1.MemcachedStatus ....

in the loop at https://github.com/operator-framework/operator-sdk/blob/v1.8.0/internal/generate/clusterserviceversion/bases/definitions/ast.go#L107, which never ends. A minimal standalone sketch of this kind of cycle, and of a visited-set guard that would break it, is shown below.
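This is not the operator-sdk code itself, just a self-contained illustration of the failure mode: two struct types referring to each other by name and resolved in a single package, with a visited map as one possible way to stop the walk from looping forever.

// Sketch only: NOT the actual ast.go implementation.
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// walkType follows struct fields whose types are named identifiers.
// The visited map is the guard: without it, two types that reference
// each other (like v1.MemcachedStatus and v1alpha1.MemcachedStatus once
// both are resolved in the same "root" package) recurse forever.
func walkType(name string, decls map[string]*ast.StructType, visited map[string]bool) {
	if visited[name] {
		return // already seen: break the cycle
	}
	visited[name] = true

	st, ok := decls[name]
	if !ok {
		return
	}
	for _, field := range st.Fields.List {
		if ident, ok := field.Type.(*ast.Ident); ok {
			fmt.Printf("%s -> %s\n", name, ident.Name)
			walkType(ident.Name, decls, visited)
		}
	}
}

func main() {
	// Two mutually referencing structs, mimicking the MemcachedStatus cycle.
	src := `package demo
type StatusA struct{ Ref StatusB }
type StatusB struct{ Ref StatusA }`

	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		panic(err)
	}

	// Collect struct type declarations by name.
	decls := map[string]*ast.StructType{}
	ast.Inspect(file, func(n ast.Node) bool {
		if ts, ok := n.(*ast.TypeSpec); ok {
			if st, ok := ts.Type.(*ast.StructType); ok {
				decls[ts.Name.Name] = st
			}
		}
		return true
	})

	walkType("StatusA", decls, map[string]bool{})
}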

P.S. It would be nice to cover ast.go with tests; otherwise, trying to fix this issue could break a lot of operators.

@asmacdo asmacdo added the kind/bug Categorizes issue or PR as related to a bug. label Jun 21, 2021
@asmacdo asmacdo added this to the Backlog milestone Jun 21, 2021
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 19, 2021
@AndrienkoAleksandr
Author

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 20, 2021
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 19, 2021
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 18, 2022
@AndrienkoAleksandr
Author

/remove-lifecycle stale

@AndrienkoAleksandr
Author

/remove-lifecycle rotten

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 19, 2022
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 19, 2022
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 19, 2022
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci

openshift-ci bot commented Jun 18, 2022

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
