Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set revisionHistoryLimit to 1 to reduce load on the issuer #65

Open
asfaltboy opened this issue May 12, 2022 · 5 comments
Open

Set revisionHistoryLimit to 1 to reduce load on the issuer #65

asfaltboy opened this issue May 12, 2022 · 5 comments

Comments

@asfaltboy
Copy link

asfaltboy commented May 12, 2022

We've been running google-cas-issuer for just under a year, to generate certificates to secure Istio workloads. And recently found that it started to reach high memory usage (in the hundreds of MBs) and would get OOMKilled often.

Just before the crash, the deployment logs contain a large amount of log records about existing CertificateRequest already being approved:

INFO	controller.CertificateRequest	CertificateRequest is Ready, ignoring.	{"certificaterequest": ...

Removing limit and clearing up old requests seemed to have helped, but I'd love to see a more permanent solution as I fear this will reccur in a year (or sooner). Our CRs were for the certs such as istio-system/istio-csr and cert-manager/demo-certificate (the latter is likely someone testing out cert-manager and forgot to clean it up).


Noob warning: below is my ramblings trying to understand if I could contribute to the solution - it's probably grossly incorrect

I undertand a solution could be to use CertificateSpec.revisionHistoryLimit, but I couldn't find how the istio-csr Certificate is being created by google-cas-issuer (is it even?), and would love to get some guidance 🙏 .. e.g would it be around this area?:

https://github.com/jetstack/google-cas-issuer/blob/866fd9067b17a24da4e8133d1dcd677f1efb304f/pkg/cas/cas.go#L69-L77

Or, am I looking in the wrong place? Does this Certificate spec get created in Istio, or is it done somewhere else entirely? I feel like I'm missing some basic understanding of what's going on in my system, is there any documentation I can look at?

Side note: I did notice that Istiod cert for example only keept one CR around, being logged by cert-manager:

cert-manager/controller/certificates-revision-manager "msg"="garbage collecting old certificate request revsion" "key"="istio-system/istiod" "related_resource_kind"="CertificateRequest" "related_resource_name"="istiod-kjt9x" "related_resource_namespace"="istio-system" "resource_kind"="Certificate" "resource_name"="istiod" "resource_namespace"="istio-system" "resource_version"="v1" "revision"=9080 
@asfaltboy
Copy link
Author

One thing to add: after removing the old certificate requests, google-cas-issuer logs now contain a lot of "Reconciler error" spam.

This seems similar to the error described in #28 which was meant to be fixed in 0.2.0 (we are running 0.5.3 via quay.io/jetstack/cert-manager-google-cas-issuer:0.5.3).

CC: @jakexks

@tale-toul
Copy link

I have been affected by the same issue described here. When the number of CRs grow and reaches a certain ammount, the google-cas-issuer pod gets OOM killed.
Setting the revisionHistoryLimit parameter in the certificate definition can mitigate the issue, but eventually enough certificates are created and their corresponding CRs grow to the point break.
Also deleting old CRs either manually or via the revisionHistoryLimit causes the pod logs to show many reconcile errors with certificate request not found. Of course it is not found, it they have been removed, forget about them already.

The version I'm using is 0.6.2

@tale-toul
Copy link

Just tested that by deleting the google-cas-issuer pod and have the deployment recreate it, the reconcile errors don't appear in the new pod.
The new pod has no memory of the old CRs, which is expected. But this is not a feasible way of solving this erros

@inteon
Copy link
Member

inteon commented Oct 22, 2024

Do you still have these issues with the latest version of the issuer?

@asfaltboy
Copy link
Author

I have not tested it, but I no longer work for the company where this ran, so will ask @tale-toul's feedback, or otherwise close the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants