gce-serial tests are panicking due to NPE in autoscaling code #47127
@smarterclayton There are no sig labels on this issue. Please add a sig label by: |
#46833 might be a culprit, but maybe not |
This could be related to how the test initializes or cleans up the framework. |
Another failure in the same run, not just this one test: |
I'm not positive, but the stack trace might be wrong due to inlining optimizations. `rc.framework == nil` was my first guess, but that would fail earlier in that method. |
Framework `AfterEach` runs namespace deletion, which may mean that `.Namespace` is nil (if inlining is indeed present). |
I am looking into this right now. |
It's unrelated. I reverted #46833 from the current master HEAD locally (via |
Note that I have executed one of the initializers e2e tests (not autoscaling), because it is one of the tests reported by https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-serial/1173 as panicked. |
Looking a bit further, I found that commit 772ab8e added the test above in an already-failing state.
I have just realized that the race condition I discovered yesterday in the ResourceConsumer implementation is relevant here, and #46833 might actually make it worse. Let me quickly make a PR out of a fix. |
Let me try to recap what I have found out so far:
- The test that is being referred to at the top of this issue (e2e/extension/initializers.go) fails "out of the box" and probably deserves a separate issue to track it.
- Other failing tests that look like they have a similar failure pattern (NPE) fail because of trying to send a "consume CPU" request to a non-existent pod (unrelated to initializers.go above, as far as I can tell).
- PR #46833 might cause the effect above (and probably does), because it actually deletes the pods that should have been deleted.
- However, the root cause is that there is a race condition inside ResourceConsumer that makes "consume CPU" requests sometimes be sent after "CleanUp" got called (fix prepared in PR #47157).
|
Initializer was fixed in a different PR, depending on what cluster you test on.
|
Yeah, confirmed the initializers test is now passing on GCE and disabled on GKE (which doesn't enable the initializers configuration group yet).
|
Clayton, can you assign this to me? I cannot do this myself. |
Automatic merge from submit-queue (batch tested with PRs 47065, 47157, 47143)

Removed a race condition from ResourceConsumer

**What this PR does / why we need it**: Without this PR there is a race condition in ResourceConsumer that sometimes results in communication to pods that might not exist anymore.

**Which issue this PR fixes**: fixes kubernetes#47127

**Release note**:
```release-note
NONE
```
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-serial/1173