Failing Test: [sig-scheduling] TaintBasedEvictions [Serial] Checks that the node becomes unreachable #70627
Which jobs are failing: ci-kubernetes-e2e-gce-new-master-upgrade-cluster-new
Which test(s) are failing: [sig-scheduling] TaintBasedEvictions [Serial] Checks that the node becomes unreachable
Since when has it been failing: Nov. 3rd
Reason for failure:
However, this test has a whole bunch of disregarded RBAC errors, and I'm wondering if this is just a permissions problem. Filing an issue just so the scheduling folks take a look at it, in case it does represent a real upgrade problem.
Anything else we need to know:
This particular test is unreliable, and is likely to be taken out of blocking.
@jberkus The e2e test "TaintBasedEvictions [Serial] Checks that the node becomes unreachable" simulates a lost connection to the master as follows: it SSHes into a worker node, issues an iptables command to block traffic to the master, then waits for the worker node to reach NotReady status.
It passes on the regular sig-scheduling dashboard. I'm not sure whether there is something particular about the job "ci-kubernetes-e2e-gce-new-master-upgrade-cluster-new". I will take a closer look.
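For anyone following along, here is a rough sketch of the block-then-wait flow described above. It is not the e2e framework's code: the function names are made up for illustration, the exact iptables flags are my assumption of a typical REJECT rule, and 192.0.2.10 is a documentation placeholder address. The sketch only builds the command string that would be run over SSH and polls a caller-supplied status function.

```python
import shlex
import time


def build_block_command(master_ip):
    """Build a shell command that rejects outbound traffic to master_ip.

    Hypothetical helper: the flags are an assumed, typical iptables rule,
    not necessarily what the real test issues.
    """
    rule = ["OUTPUT", "--destination", master_ip, "--jump", "REJECT"]
    return "sudo iptables --insert " + " ".join(shlex.quote(p) for p in rule)


def build_unblock_command(master_ip):
    """Build the matching command that deletes the rule again."""
    rule = ["OUTPUT", "--destination", master_ip, "--jump", "REJECT"]
    return "sudo iptables --delete " + " ".join(shlex.quote(p) for p in rule)


def wait_for_not_ready(get_ready_status, timeout_s, poll_s=1.0):
    """Poll until the node's Ready condition is no longer "True".

    get_ready_status is a caller-supplied callable returning the Ready
    condition status as a string ("True", "False" or "Unknown").
    Returns True if the node left the Ready state before the timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_ready_status() != "True":
            return True
        time.sleep(poll_s)
    return False


if __name__ == "__main__":
    # Placeholder address; in the real test this would be the master's IP.
    print(build_block_command("192.0.2.10"))
```

Note that the rule only matches the one destination address it is given, so the wait can only succeed if the node actually reaches the master through that address.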
One possibility is that the upgrade testing env has only one worker node?
In that case it would enter "fully disrupted" mode, and the node lifecycle manager doesn't update the node to NotReady state, per its design.
Edited: nvm, I noticed that there are 3 worker nodes.
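For context on the "fully disrupted" idea, here is a minimal sketch of how I understand the per-zone classification to work. It is an illustration, not the controller's code: the function name and state labels are hypothetical, and the 0.55 threshold and the "more than 2 not-ready nodes" condition are assumptions based on my recollection of the defaults.

```python
def classify_zone(ready_statuses, unhealthy_threshold=0.55):
    """Classify a zone from its nodes' Ready condition statuses.

    ready_statuses: list of strings ("True", "False" or "Unknown"),
    one per node in the zone. Threshold values are assumed defaults.
    """
    ready = sum(1 for status in ready_statuses if status == "True")
    not_ready = len(ready_statuses) - ready

    # No node reports Ready: the whole zone looks cut off.
    if ready == 0 and not_ready > 0:
        return "FullDisruption"
    # A large enough share of nodes is unhealthy.
    if not_ready > 2 and not_ready / (ready + not_ready) >= unhealthy_threshold:
        return "PartialDisruption"
    return "Normal"


if __name__ == "__main__":
    print(classify_zone(["Unknown"]))                  # single worker, blocked
    print(classify_zone(["True", "True", "Unknown"]))  # three workers, one blocked
```

Under these assumptions a single-node zone with its only node blocked would count as fully disrupted, while three nodes with one blocked stays normal, which matches why the 3-node observation rules this theory out.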
Another symptom I observed: in the regular e2e test env, where this test passes, the log shows:
I1104 21:45:51.260] Nov 4 21:45:51.260: INFO: block network traffic from 18.104.22.168:22 to 22.214.171.124
But in the "master upgrade" env, it shows:
Nov 4 17:54:56.449: INFO: block network traffic from 126.96.36.199:22 to 188.8.131.52
It looks suspicious that the master ip is resolved to be
And the e2e test uses the same logic to retrieve the master IP:
Edited: resolving the master IP works fine. The problem is that during the master upgrade, communication from the worker nodes to the master gets switched to the master's internal IP.
I think the root cause is described in #56787:
Or, is there a way to disable this only in master upgrade env?
@justinsb I see that your PR #70681 merged as a fix for #56787. I see the test passing in the latest run: https://testgrid.k8s.io/sig-release-master-upgrade#gce-new-master-upgrade-cluster-new. Thanks much!
Is there any more work needed @Huang-Wei or can we close this?