Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sig-node] TestInvalidPodFiltered flakes with "Expected no update in channel" error #93905

Closed
liggitt opened this issue Aug 11, 2020 · 5 comments · Fixed by #93985
Closed

[sig-node] TestInvalidPodFiltered flakes with "Expected no update in channel" error #93905

liggitt opened this issue Aug 11, 2020 · 5 comments · Fixed by #93985
Assignees
Labels
kind/flake Categorizes issue or PR as related to a flaky test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@liggitt
Copy link
Member

liggitt commented Aug 11, 2020

Which test(s) are flaking:

TestInvalidPodFiltered

Reason for failure:

--- FAIL: TestInvalidPodFiltered (0.00s)
    config_test.go:126: Expected no update in channel, Got types.PodUpdate{Pods:[]*v1.Pod{(*v1.Pod)(0xc0000b4000)}, Op:1, Source:"test"}
FAIL

This is flaking rarely enough that it is not caught by our CI jobs, which currently tolerate up to 2 unit test failures per run (!).

With that toleration removed in #93605, this flake has been seen (https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/93605/pull-kubernetes-bazel-test/1293289839906525184).

Reproducible with the following steps:

  1. Build a test binary for the affected package:
go test -race -c ./pkg/kubelet/config
  1. Stress the affected test using the stress tool:
stress ./config.test -test.run TestInvalidPodFiltered
  1. Once failures are seen, look at the logs to see the details about the error:
ls -lat $TMPDIR/go-stress*
cat $TMPDIR/go-stress...<filename>

/sig node
cc @kubernetes/sig-node-test-failures

@liggitt liggitt added the kind/flake Categorizes issue or PR as related to a flaky test. label Aug 11, 2020
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. labels Aug 11, 2020
@knight42
Copy link
Member

/assign

@knight42
Copy link
Member

It is a bit funny that if I change expectNoPodUpdate

func expectNoPodUpdate(t *testing.T, ch <-chan kubetypes.PodUpdate) {
select {
case update := <-ch:
t.Errorf("Expected no update in channel, Got %#v", update)
default:
}
}
to the following(wait for 3 seconds until receive event from channel):

func expectNoPodUpdate(t *testing.T, ch <-chan kubetypes.PodUpdate) {
	select {
	case update := <-ch:
		t.Errorf("Expected no update in channel, Got %#v", update)
	case <-time.After(time.Second * 3):
	}
}

The test TestInvalidPodFiltered would consistently fail. I am not sure what the invalid update means at

// add an invalid update
podUpdate = CreatePodUpdate(kubetypes.UPDATE, TestSource, &v1.Pod{ObjectMeta: metav1.ObjectMeta{Name: "foo"}})
channel <- podUpdate
expectNoPodUpdate(t, ch)
, so I cannot think of an ideal fix. I would like to defer to the members from sig-node.

/unassign

@liggitt liggitt added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Aug 13, 2020
@MHBauer
Copy link
Contributor

MHBauer commented Aug 14, 2020

Isn't this asserting a negative? I don't think it can work, especially not without a wait or some way of forcing the config mux storage merger thingy to run. The multiple levels is kind of mindbending, really.

I come to the same conclusion as @knight42.
Maybe the focus needs to be on the invalid update? I do not understand why it is invalid from the context.

Tracing it backwards in time, v1.8 was the last time this passed. v1.9 changed what validation means, I think I have a solution.
/assign

@knight42
Copy link
Member

@MHBauer Thanks for going deep down the commit history and found out the problem 👍 !

@MHBauer
Copy link
Contributor

MHBauer commented Aug 14, 2020

Made #93985 if either of you wish to review, and confirm my understanding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/flake Categorizes issue or PR as related to a flaky test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/node Categorizes an issue or PR as relevant to SIG Node.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants