Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exponential backoff on CA failures #725

Closed
howardjohn opened this issue Nov 14, 2023 · 9 comments
Closed

Exponential backoff on CA failures #725

howardjohn opened this issue Nov 14, 2023 · 9 comments
Assignees

Comments

@howardjohn
Copy link
Member

Currently here we retry after 60s. 60s can be a long time or a short time. We should probably use a backoff scheme. There may be transient issues in the CA (e.g. the SA informer cache is a bit stale) that will resolve after 1s.. but if its never going to approve, retrying 1440 times a day may be a bit much.

@keithmattix
Copy link
Contributor

/cc @MorrisLaw

@MorrisLaw MorrisLaw self-assigned this Nov 14, 2023
@MorrisLaw
Copy link
Contributor

Thank you Keith, I'll look into this!

@MorrisLaw
Copy link
Contributor

Is there a retry library that maintainers recommend @howardjohn?

@hzxuzhonghu
Copy link
Member

It is not limited, but looks https://crates.io/crates/backoff is the most popular one

@MorrisLaw
Copy link
Contributor

MorrisLaw commented Feb 8, 2024

PR is ready for review again: #768

@MorrisLaw
Copy link
Contributor

PR has been merged.

@keithmattix
Copy link
Contributor

Had to revert; reopening

@keithmattix keithmattix reopened this Mar 4, 2024
@MorrisLaw
Copy link
Contributor

MorrisLaw commented Mar 4, 2024

Had to revert due to an issue noticed in an automated Istio PR that tried updating to the latest stable ztunnel master. TestIntermediateCertificateRefresh failed due to missing cert. It looks like it wasn't refreshed in time or it could be a possible race condition.

Related areas of the code to look at as it relates to the integration test that were changed before and are relevant to the backoff PR, these may require modification in order to support this change:

Goal is to reproduce the integration test failure with the original PR that has been reverted, then figure out how to update the integration test or the business logic itself so that the integration test no longer fails.

@MorrisLaw
Copy link
Contributor

This has been merged back in. Found the issue in an integration test and fixed it. Also, confirmed that istio integration is happy with this latest version of ztunnel: https://github.com/istio/istio/pull/50079/files. We can close this for real now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants