Test "Services should correctly serve identically named services in different namespaces on different external IP addresses" failing on Jenkins #5722
Comments
Oh, this test is just horrible. I manually cleaned up the ELB on the |
Prior to attempting to create new ones, cleanup from previous runs. Timeouts, 500s, etc. are possible here, and if they happen, you don't want to die forever. Along the way: Remove the timeout, it was clearly copied from the previous function and is actually an anti-pattern that needs to be fixed after discovering it doesn't play well with defers. Fixes kubernetes#5722
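The cleanup-before-create idea in this commit message can be sketched as below. This is only an illustration, not the code from the commit: `deleter`, `cleanupStale`, `errNotFound`, and `errBackend` are hypothetical names, and the delete call is a stand-in for whatever client the test actually uses. The point is that every delete is attempted and failures are collected rather than aborting the run.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors used only for this sketch.
var (
	errNotFound = errors.New("not found")
	errBackend  = errors.New("500 backend error")
)

// deleter is a hypothetical stand-in for the client call that removes a named service.
type deleter func(name string) error

// cleanupStale tries to delete every name. It collects failures instead of
// stopping at the first one, so a single bad delete does not block the run.
func cleanupStale(del deleter, names []string) []error {
	var failures []error
	for _, name := range names {
		err := del(name)
		if err == nil || errors.Is(err, errNotFound) {
			continue // deleted, or nothing was left over from a previous run
		}
		failures = append(failures, fmt.Errorf("cleanup of %q: %w", name, err))
	}
	return failures
}

func main() {
	del := func(name string) error {
		switch name {
		case "svc-a":
			return errNotFound
		case "svc-b":
			return errBackend
		}
		return nil
	}
	for _, err := range cleanupStale(del, []string{"svc-a", "svc-b", "svc-c"}) {
		fmt.Println("ignoring:", err)
	}
}
```

Whether a collected failure should fail the test or only be logged is a separate decision; the sketch only logs.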
Amazingly, #5732 wasn't enough to close this.
Bleh. Maybe the namespace approach is right, just to avoid any possible GCE name collision issues. I suspect we may be running into an issue where delete / re-add is just too fast, and I don't want to stick in a sleep. :/
So this is interesting. That exact output above, but the GCE cloud console shows a |
This is actually a bug in our code (rather than the test), and a consequence of the synchronous creation and deletion of GCE ELBs, which is being fixed elsewhere. When we start creating the ELBs asynchronously, these Kubernetes API calls will start succeeding, and anti-entropy mechanisms in our backend will make sure that the ELB creation/deletion eventually succeeds.
Issue #5180 refers above.
See also discussion in PR #5732.
Is there an approach to get this test to pass prior to that getting fixed, but sticking to API primitives?
Yes, just increase the test timeout to 240 seconds or beyond.
It's actually not a timeout issue (we're not seeing timeouts, just complaints about duplicate resources). I just noticed that I wasn't paying close attention to the complaint, which is that the target pool was duplicated. I had cleaned up one resource and not the other on the Jenkins project explicitly. :/ Separately, I'm an idiot and #5732 isn't nearly enough: it's fine for developer flows (the ^C case), but those services don't actually exist in the Jenkins case because the cluster is newly created. The problem is that at some point, we lost all ability to "re-claim" ELBs by name. This actually used to just happen, and was a source of user complaints. If that was still working, a test flake from a 500-backend-error and then a subsequent duplicate wouldn't be an issue.
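To make "re-claiming by name" concrete, here is a minimal sketch against an in-memory fake. Everything in it is an assumption for illustration: `fakeCloud`, `ensurePool`, and `errAlreadyExists` are hypothetical, and "reclaim" is interpreted here as delete-and-recreate, which may not match how the GCE provider code behaved.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists is a hypothetical sentinel for a duplicate-resource error.
var errAlreadyExists = errors.New("already exists")

// fakeCloud is a hypothetical in-memory stand-in for a cloud target pool API.
type fakeCloud struct {
	pools map[string]string // pool name -> region
}

func (c *fakeCloud) create(name, region string) error {
	if _, ok := c.pools[name]; ok {
		return errAlreadyExists
	}
	c.pools[name] = region
	return nil
}

func (c *fakeCloud) remove(name string) {
	delete(c.pools, name)
}

// ensurePool creates the pool. If a pool with the same name is left over from
// an earlier run, it removes the leftover and creates the pool again.
func ensurePool(c *fakeCloud, name, region string) error {
	err := c.create(name, region)
	if !errors.Is(err, errAlreadyExists) {
		return err // nil on success, or an unrelated failure
	}
	c.remove(name)
	return c.create(name, region)
}

func main() {
	c := &fakeCloud{pools: map[string]string{"e2e-pool": "stale-region"}}
	if err := ensurePool(c, "e2e-pool", "us-central1"); err != nil {
		fmt.Println("ensure failed:", err)
		return
	}
	fmt.Println(c.pools["e2e-pool"])
}
```

With behaviour like this, a leftover from a failed run would be replaced rather than reported as a duplicate. The thread notes that silent reclaiming was itself a source of user complaints, so this is a trade-off rather than a clear win.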
I just looked at Jenkins and this test now says that it has failed 2 times in the last 30 runs. If this isn't resolved, it's probably not a P0 any longer.
Ack.
It's not resolved simply because I occasionally have to go clean out ELBs if one of the things fails. However, it's not as critical as it was, true.
It's not resolved, but should be considerably less flaky with the increased timeout that got added this afternoon.
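For readers unfamiliar with how such a timeout is typically applied, a poll-until-ready loop might look like the sketch below. It is not the e2e framework's helper: `pollUntil` and `errTimedOut` are hypothetical names. The 240-second figure is the value suggested earlier in this thread, and the condition is a dummy that succeeds on its third attempt.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimedOut is a hypothetical sentinel returned when the deadline passes.
var errTimedOut = errors.New("timed out waiting for condition")

// pollUntil calls cond every interval until cond reports done, cond returns
// an error, or the next attempt would fall after the timeout.
func pollUntil(interval, timeout time.Duration, cond func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := cond()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if time.Now().Add(interval).After(deadline) {
			return errTimedOut
		}
		time.Sleep(interval)
	}
}

func main() {
	attempts := 0
	err := pollUntil(10*time.Millisecond, 240*time.Second, func() (bool, error) {
		attempts++
		return attempts >= 3, nil // dummy condition standing in for "ELB is ready"
	})
	fmt.Println(attempts, err)
}
```

A longer timeout only hides slow creation. As noted earlier in the thread, it does not help when the failure is a duplicate-resource error.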
I also plan to make each invocation run in a new namespace, which will make [...]
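One way to give each invocation its own namespace is to append a random suffix to a fixed prefix, roughly as sketched here. `uniqueNamespace` and the `e2e-services` prefix are hypothetical and not taken from the test code. The suffix is lowercase hex so the result stays within the lowercase alphanumeric-and-dash character set that namespace names use.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// uniqueNamespace returns prefix followed by "-" and 8 random hex characters.
func uniqueNamespace(prefix string) (string, error) {
	buf := make([]byte, 4)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return prefix + "-" + hex.EncodeToString(buf), nil
}

func main() {
	ns, err := uniqueNamespace("e2e-services")
	if err != nil {
		panic(err)
	}
	fmt.Println(ns)
}
```

A fresh namespace per run avoids name collisions inside the cluster. Cloud-side resources left behind by a failed run would still need cleaning up separately.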
@markturansky has a TODO to clean up persistentvolumeclaims as part of namespace termination.
PR #6125 fixes the problem originally reported in this issue.
This test hasn't succeeded in the last 30 runs on GCE or GKE. Trying to figure out what's going on.