[sig-node] TestInvalidPodFiltered flakes with "Expected no update in channel" error #93905

liggitt · 2020-08-11T21:17:48Z

Which test(s) are flaking:

TestInvalidPodFiltered

Reason for failure:

--- FAIL: TestInvalidPodFiltered (0.00s)
    config_test.go:126: Expected no update in channel, Got types.PodUpdate{Pods:[]*v1.Pod{(*v1.Pod)(0xc0000b4000)}, Op:1, Source:"test"}
FAIL

This is flaking rarely enough that it is not caught by our CI jobs, which currently tolerate up to 2 unit test failures per run (!).

With that toleration removed in #93605, this flake has been seen (https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/93605/pull-kubernetes-bazel-test/1293289839906525184).

Reproducible with the following steps:

Build a test binary for the affected package:

go test -race -c ./pkg/kubelet/config

Stress the affected test using the stress tool:

stress ./config.test -test.run TestInvalidPodFiltered

Once failures are seen, look at the logs to see the details about the error:

ls -lat $TMPDIR/go-stress*
cat $TMPDIR/go-stress...<filename>

/sig node
cc @kubernetes/sig-node-test-failures

knight42 · 2020-08-12T03:38:48Z

/assign

knight42 · 2020-08-12T08:41:20Z

Interestingly, if I change expectNoPodUpdate

kubernetes/pkg/kubelet/config/config_test.go

Lines 123 to 129 in fa13dc1

func expectNoPodUpdate(t *testing.T, ch <-chan kubetypes.PodUpdate) {
	select {
	case update := <-ch:
		t.Errorf("Expected no update in channel, Got %#v", update)
	default:
	}
}

to the following (waiting up to 3 seconds for an update to arrive on the channel):

func expectNoPodUpdate(t *testing.T, ch <-chan kubetypes.PodUpdate) {
	select {
	case update := <-ch:
		t.Errorf("Expected no update in channel, Got %#v", update)
	case <-time.After(time.Second * 3):
	}
}

then the test TestInvalidPodFiltered fails consistently. I am not sure what "invalid update" means at

kubernetes/pkg/kubelet/config/config_test.go

Lines 192 to 195 in fa13dc1

	// add an invalid update
	podUpdate = CreatePodUpdate(kubetypes.UPDATE, TestSource, &v1.Pod{ObjectMeta: metav1.ObjectMeta{Name: "foo"}})
	channel <- podUpdate
	expectNoPodUpdate(t, ch)

, so I cannot think of an ideal fix. I would like to defer to the members from sig-node.

/unassign

MHBauer · 2020-08-14T01:01:05Z

Isn't this asserting a negative? I don't think it can work, especially not without a wait or some way of forcing the config mux storage merger thingy to run. The multiple levels are kind of mind-bending, really.

I come to the same conclusion as @knight42.
Maybe the focus needs to be on the invalid update? I do not understand why it is invalid from the context.

Tracing it backwards in time, v1.8 was the last time this passed. v1.9 changed what validation means. I think I have a solution.
/assign

knight42 · 2020-08-14T03:45:02Z

@MHBauer Thanks for digging deep into the commit history and finding the problem 👍!

MHBauer · 2020-08-14T18:34:35Z

I opened #93985, if either of you wish to review it and confirm my understanding.

liggitt added the kind/flake label Aug 11, 2020

k8s-ci-robot added the sig/node label Aug 11, 2020

liggitt mentioned this issue Aug 11, 2020

Stop ignoring unit test flakes #93605

Merged

37 tasks

k8s-ci-robot assigned knight42 Aug 12, 2020

k8s-ci-robot unassigned knight42 Aug 12, 2020

liggitt added the priority/important-soon label Aug 13, 2020

k8s-ci-robot assigned MHBauer Aug 14, 2020

MHBauer mentioned this issue Aug 14, 2020

update test to match validation filter of pods #93985

Merged

k8s-ci-robot closed this as completed in #93985 Sep 2, 2020
