OCPBUGS-44693: Default-disable ResilientWatchCacheInitialization #61
Conversation
There's a separate pre-existing issue causing storage layer errors and watch cache re-initialization during cluster bootstrap. With ResilientWatchCacheInitialization enabled, clients (reflectors in particular) are turned away with 429 responses while the watch cache is being repopulated and retry repeatedly. Without this feature, requests hang (consuming priority and fairness "seats") until the watch cache is initialized or they time out. We fail tests when the total number of watch requests during a job exceeds a threshold based on recent historical totals. The systemic 429/retry behavior causes this threshold to be breached. We are temporarily disabling it to reduce noise as it's a symptom and not the cause of the underlying storage errors.
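The effect on watch-request totals can be illustrated with a toy model (hypothetical numbers and function, not taken from the job data): a client whose watch is rejected with 429 and retried on a fixed interval issues many more requests during cache initialization than one whose single request hangs until the cache is ready.

```python
def watch_requests(init_seconds, retry_interval=None):
    """Count watch requests one client issues while the watch cache initializes.

    With ResilientWatchCacheInitialization enabled, the apiserver answers 429
    and the client retries every `retry_interval` seconds. With it disabled,
    the single request hangs (occupying a priority-and-fairness seat) and
    counts only once.
    """
    if retry_interval is None:
        return 1  # request hangs until the cache is ready or it times out
    # one initial attempt, then one retry per interval until initialization ends
    return 1 + init_seconds // retry_interval

# Toy numbers: over a 30-second initialization, a hanging client issues a
# single watch request, while a client retrying every second issues 31.
assert watch_requests(30) == 1
assert watch_requests(30, retry_interval=1) == 31
```

With many reflectors retrying in parallel, this multiplicative effect is what pushes the job's total watch-request count past the historically derived threshold.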
@benluddy: This pull request references Jira Issue OCPBUGS-44693, which is invalid.

The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/cc @p0lyn0mial
Merged cd4433e into openshift:openshift-apiserver-4.18-kubernetes-1.31.1
@benluddy: Jira Issue OCPBUGS-44693: Some pull requests linked via external trackers have merged, but the following pull requests linked via external trackers have not merged. These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh. Jira Issue OCPBUGS-44693 has not been moved to the MODIFIED state. In response to this:
this pr needs to be reverted, oas is crashing: