
Improve init container to use DeleteCollection to remove policy reports #2477

Merged
2 commits merged into kyverno:main on Oct 6, 2021

Conversation

@treydock (Member) commented Oct 5, 2021

Signed-off-by: Trey Dockendorf <tdockendorf@osc.edu>

Related issue

Relates to #2474, but I'm not sure if it solves it.

Milestone of this PR

What type of PR is this

/kind feature

Proposed Changes

Checklist

  • I have read the contributing guidelines.
  • [ ] I have added tests that prove my fix is effective or that my feature works.
  • [ ] My PR contains new or altered behavior to Kyverno and
    • [ ] I have added or changed the documentation myself in an existing PR and the link is:
    • [ ] I have raised an issue in kyverno/website to track the doc update and the link is:
    • [ ] I have read the PR documentation guide and followed the process including adding proof manifests to this PR.

Further Comments

Based on feedback from #2474, but I'm not sure it solves the issue encountered there, where containers would go into crash loops and dump stack traces.
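
For illustration, here is a minimal sketch of the approach, assuming client-go's dynamic client; the package, function name, and error handling are placeholders rather than the actual Kyverno init-container code. The wgpolicyk8s.io/v1alpha2 policyreports resource matches the API path seen in the test logs later in this thread.

```go
package cleanup

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// policyReportGVR identifies the namespaced PolicyReport resource.
var policyReportGVR = schema.GroupVersionResource{
	Group:    "wgpolicyk8s.io",
	Version:  "v1alpha2",
	Resource: "policyreports",
}

// cleanPolicyReports issues one DeleteCollection call per namespace instead of
// deleting each report by name, so no resource names need to be known up front.
func cleanPolicyReports(ctx context.Context, client dynamic.Interface, namespaces []string) error {
	for _, ns := range namespaces {
		err := client.Resource(policyReportGVR).Namespace(ns).
			DeleteCollection(ctx, metav1.DeleteOptions{}, metav1.ListOptions{})
		if err != nil {
			return err
		}
	}
	return nil
}
```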

Signed-off-by: Trey Dockendorf <tdockendorf@osc.edu>
    go deleteResource(client, "", kind, ns.GetName(), reportName, &wg)
    }
    wg.Wait()
    go func(namespaceName string) {
Member (review comment on the diff above):
Do we still want to spin up goroutines? Not sure if it would improve runtime.

@treydock (Member Author) replied:
I will try and test that with Kind to see if there is any improvement.
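
For reference, a minimal sketch of the goroutine-per-namespace variant being discussed here; policyReportGVR is the variable from the sketch above, and the other names are illustrative rather than the actual diff.

```go
package cleanup

import (
	"context"
	"sync"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/dynamic"
)

// cleanPolicyReportsConcurrently runs each namespace's DeleteCollection in its
// own goroutine and waits for all of them with a WaitGroup, mirroring the
// go/WaitGroup pattern in the diff hunk above.
func cleanPolicyReportsConcurrently(ctx context.Context, client dynamic.Interface, namespaces []string) {
	var wg sync.WaitGroup
	for _, ns := range namespaces {
		wg.Add(1)
		go func(namespaceName string) {
			defer wg.Done()
			// A real implementation would log the error; it is dropped here for brevity.
			_ = client.Resource(policyReportGVR).Namespace(namespaceName).
				DeleteCollection(ctx, metav1.DeleteOptions{}, metav1.ListOptions{})
		}(ns)
	}
	wg.Wait()
}
```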

@treydock (Member Author) commented Oct 5, 2021

My only concern with using DeleteCollection is that I'm not sure it will still serve the purpose of deleting old policy reports, for example after API changes. That was my guess as to why there were previously three prefixed resource names getting deleted: I figured they were previous names, but I'm not sure whether that matters with how DeleteCollection works.

@realshuting (Member) replied:

> My only concern with using DeleteCollection is that I'm not sure it will still serve the purpose of deleting old policy reports, for example after API changes. That was my guess as to why there were previously three prefixed resource names getting deleted: I figured they were previous names, but I'm not sure whether that matters with how DeleteCollection works.

This is because the report name changed. Those three names were added just to ensure all reports get deleted. With DeleteCollection there is no need to specify the resource name. But I'm not sure it solves the slowness issue, since ideally each namespace has only one report.

Signed-off-by: Trey Dockendorf <tdockendorf@osc.edu>
@treydock (Member Author) commented Oct 6, 2021

I am testing with Kind 0.11.1, which runs Kubernetes 1.21.1. I created 500 namespaces:

for i in {1..500} ; do kubectl create namespace user-$i ; done

I restarted the Kyverno deployment to trigger the init container, and this resulted in a bunch of throttling messages like this:

I1006 01:33:07.238073       1 request.go:668] Waited for 1m36.994154751s due to client-side throttling, not priority and fairness, request: DELETE:https://10.96.0.1:443/apis/wgpolicyk8s.io/v1alpha2/namespaces/user-97/policyreports

The end result was that the init container took 1 minute and 39 seconds. First and last log messages:

I1006 01:31:29.020652       1 main.go:143]  "msg"="Using in-cluster configuration"  

I1006 01:33:08.238184       1 request.go:668] Waited for 1m37.994163118s due to client-side throttling, not priority and fairness, request: DELETE:https://10.96.0.1:443/apis/wgpolicyk8s.io/v1alpha2/namespaces/user-62/policyreports

I then removed the goroutine so the function was simply a loop over all namespaces deleting all policy reports, and that took 1 minute 40 seconds.

First and last log messages without the goroutine:

I1006 01:41:00.531283       1 main.go:143]  "msg"="Using in-cluster configuration"  

I1006 01:42:40.148508       1 main.go:266] removePolicyReport "msg"="Removing policy reports"  "namespace"="user-99"

So it seems the goroutine isn't needed. I've removed it so the code just loops over namespaces.
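
As an aside (not something this PR changes), the "client-side throttling" in the logs above comes from client-go's default rate limiter, which allows 5 requests per second with a burst of 10; with roughly 500 namespaces that alone accounts for most of the runtime regardless of goroutines. A hypothetical way to loosen it would be to raise QPS and Burst on the rest.Config before building the client:

```go
package cleanup

import "k8s.io/client-go/rest"

// inClusterConfigWithHigherLimits returns an in-cluster rest.Config with a
// looser client-side rate limiter. The values below are hypothetical and not
// part of this PR; client-go defaults to QPS=5 and Burst=10, which is what
// produces the "client-side throttling" messages above.
func inClusterConfigWithHigherLimits() (*rest.Config, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	cfg.QPS = 50
	cfg.Burst = 100
	return cfg, nil
}
```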

@treydock (Member Author) commented Oct 6, 2021

The e2e failures look unrelated to this change, since they failed on some mutate test or other things unrelated to this PR.

@realshuting (Member) replied:

No, the failure has nothing to do with your changes.

Is there anything you want to add? Otherwise we can merge it.

@treydock (Member Author) commented Oct 6, 2021

> Is there anything you want to add? Otherwise we can merge it.

The only thing I can think of is whether a further change is needed to avoid running the removals multiple times when starting HA pods. My deployment has a replica count of 2, and I noticed the long-running init container ran for both pods. I wonder if the logic needs to be improved so that only the leader runs these processes, or is leader election not done until after the init container runs?

@realshuting (Member) replied:

Yes, @ojhaarjun1 is working on leader election for the init container in #1965.
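
For reference, leader election with client-go typically looks something like the sketch below: each replica competes for a Lease and only the winner runs the cleanup. The lease name, namespace, and function names are illustrative; the actual design for the init container is tracked in #1965.

```go
package cleanup

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// runCleanupAsLeader blocks until this pod acquires the Lease, then runs the
// cleanup; the other replica waits instead of repeating the deletions.
// Lease name, namespace, and identity are illustrative.
func runCleanupAsLeader(ctx context.Context, client kubernetes.Interface, podName string, cleanup func(context.Context)) {
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "kyverno-init-cleanup", Namespace: "kyverno"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: podName},
	}

	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: cleanup,
			OnStoppedLeading: func() {},
		},
	})
}
```

One question such a design has to answer is what the non-leader replica should do once the leader finishes, since RunOrDie blocks until the lease is acquired.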

@treydock (Member Author) commented Oct 6, 2021

Excellent, then I think nothing else is needed on this pull request.

realshuting self-assigned this on Oct 6, 2021
realshuting added the "milestone 1.5.0" label (Issues and PRs for the 1.5.0 release) on Oct 6, 2021
realshuting added this to the Kyverno Release 1.5.0 milestone on Oct 6, 2021
realshuting merged commit b460490 into kyverno:main on Oct 6, 2021