OCPBUGS-78940: Treat groups as existent if they were found but discovery is stale #30923
jacobsee wants to merge 1 commit into openshift:main
The kube-apiserver still declares itself ready even when some of its discovery entries are stale. A background worker eventually refreshes the stale entries, but there is a race window in which clients can hit a newly ready kube-apiserver and receive stale discovery data. For the purpose of answering whether an API group exists, return true as long as the group was found, even if its discovery data is currently stale.
Pipeline controller notification
For optional jobs, comment …
This repository is configured in: automatic mode
@jacobsee: This pull request references Jira Issue OCPBUGS-78940, which is valid. The bug has been moved to the POST state.
3 validation(s) were run on this bug.
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull request has been approved by: jacobsee
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files: …
Approvers can indicate their approval by writing …
Walkthrough: Modified the …
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 golangci-lint (2.11.3): Error: can't load config: unsupported version of the configuration: "". See https://golangci-lint.run/docs/product/migration-guide for migration instructions.
🧹 Nitpick comments (1)
test/extended/util/framework.go (1)
2236-2241: Use a typed variable for clearer error matching

At line 2236, prefer the idiomatic pattern `var staleErr discovery.StaleGroupVersionError; errors.As(err, &staleErr)` over the empty struct literal `&discovery.StaleGroupVersionError{}`. Also rename the loop variable to avoid shadowing the outer `err`:

Suggested refactor

```diff
-	for gv, err := range groupFailed.Groups {
+	for gv, groupErr := range groupFailed.Groups {
 		if gv.Group == group {
-			if errors.As(err, &discovery.StaleGroupVersionError{}) {
+			var staleErr discovery.StaleGroupVersionError
+			if errors.As(groupErr, &staleErr) {
 				// Group is registered but discovery is transiently stale.
 				// This can happen immediately after a restart and should resolve itself.
 				// For now, treat as "exists" since the APIService is known to the aggregator.
 				return true, nil
 			}
-			return false, err
+			return false, groupErr
 		}
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/util/framework.go` around lines 2236-2241, replace the current errors.As call that passes an inline &discovery.StaleGroupVersionError{} with the idiomatic typed variable pattern: declare a variable (e.g. var staleErr discovery.StaleGroupVersionError) and call errors.As(groupErr, &staleErr); also rename the loop variable (e.g. groupErr) so that it does not shadow the outer err in the surrounding function where discovery.StaleGroupVersionError is checked.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dd7ad772-7cb1-4e1c-9be0-fac3e36005f1
📒 Files selected for processing (1)
test/extended/util/framework.go
Scheduling required tests:
@jacobsee: The following test failed, say …
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/payload-aggregate periodic-ci-openshift-release-main-nightly-4.22-e2e-vsphere-ovn-serial-runc 10
@jacobsee: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/6010be20-2746-11f1-9564-e2f809730b1e-0
/test e2e-gcp-ovn
Is this what is happening in the recent samples? Did you see any errors in the kube-apiserver logs showing problems fetching fresh discovery info for this GV? I don't know the answer; I'm just asking to make sure we don't miss a regression that might be causing an increased failure rate for that traffic to openshift-apiserver.
@benluddy, to dump some thoughts on the order of operations on startup: first we're seeing … (^that). But we're seeing it before this: … at which point the resolution issues seem to end. One minute later, we see … and everything is golden. So I do think this is a "bad order of operations in the first minute, caught later by a resync" issue. For what it's worth, it looks like this has just been fixed upstream, but this PR is to test the theory (and because I don't think we need to fail tests on this; it might be a good idea to have this anyway, just to be a little more resilient).