-
Notifications
You must be signed in to change notification settings - Fork 38.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GC TestCustomResourceCascadingDeletion and TestMixedRelationships flakes #50022
Comments
I was able to reproduce this locally (although it's very hard to trigger). Initial thoughts are that it's different from #49966; in this case, the actual sync is instant (millseconds) but the CRD created in the test fails to appear in discovery calls over the course of repeated attempts every 5 seconds for the remainder of the test. Normally, the type shows up in discovery almost immediately after creation. |
Looks like PR #50023 might be hitting a related flaky test. I get 6 failures in 60 runs from the command
I'm super new around here so thought it prudent to add the information I have found before attempting to dig deeper. I followed instructions in If the test failure logs are of any use to anyone just message and I'll post one of them. |
@tcharding Thanks. I've also been able to reproduce locally and am debugging it today. I did have to hack up the integration test setup to get apiserver logs in this context. I've also opened #50035 to explore making the integration test script more configurable with regards to logging. |
I was wrong about the discovery aspect; the CRD is coming back from discovery, but the restMapper fails to map it. I also notice we're using DiscoveryInterface.ServerPreferredResources which is not actually implemented with a cache in memCacheClient. |
@caesarxuchao okay, now I'm getting somewhere. I made a test in a new branch which exercises discovery and REST mapping without the GC in the picture at all, and have been able to reproduce the flake 1 time out of 40 runs so far with |
First problem is pretty clear in hindsight; we reset the restmapper, then get deletable resources using the discovery client directly, then sync based on the resources from discovery. The direct discovery call doesn't use the client's internal cache, and so the restmapper might be out of sync with the subsequent discovery call, causing us to sync with resources not yet represented in the restmapper's cache. That explains the isolated flake in the test I made, but doesn't yet explain why the restmapper would continue to fail on subsequent sync calls (since the nest restmapper reset would usually bring it up to date with the resources returned from the direct discovery call). |
I'm out tomorrow, so a quick update here in case somebody else looks into it... My branch can reproduce the issue where DeferredDiscoveryRESTMapper sometimes becomes unable to map the CRD resource, even after the type is available from discovery. I've also ruled out the memCacheClient by replacing it with a non-caching implementation like: type uncachedDiscoveryClient struct {
discovery.DiscoveryInterface
}
func (_ *uncachedDiscoveryClient) Fresh() bool { return true }
func (_ *uncachedDiscoveryClient) Invalidate() {} And using that in the setup like: discoveryClient := &uncachedDiscoveryClient{clientSet.Discovery()} The issue can still be reproduced that way. Looks like more debugging of DeferredDiscoveryRESTMapper itself is required. |
Finally tracked this down. Custom resource kinds generated in the tests are always given a plural suffix of "s". If the singular name ends in "s", the rest mapper assumes the plural form is an "es" suffix, which will never match the generated kind. This explains the intermittent nature of the failure (the randomly generated names often happen to match the assumptions of the rest mapper). |
#49948 is the root cause |
Good effort @ironcladlou |
Automatic merge from submit-queue Change test to work around restmapper pluralization bug Fixes kubernetes#50022 Works around the pluralization bug to unblock the queue. Once the restmapper bug is fixed in kubernetes#50012, we should add tests specifically for unconventional singular/plural word endings.
Seeing flakes on master:
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-test-go/8177/
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-test-go/8179/
and on unrelated PRs:
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/44278/pull-kubernetes-unit/44314/
The text was updated successfully, but these errors were encountered: