New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dind: only wait for Ready non-sdn nodes #8099
Conversation
[testonlyextended][extended:networking] |
f934eb8
to
d7df840
Compare
Filtering for a string (Ready) without explicitly excluding an unwanted token that embeds said string (NotReady) is not the recipe for success one might imagine. |
The 'wait-for-cluster' command of hack/dind-cluster.sh was previously evaluating all nodes when determining whether the cluster's nodes were seen to be 'Ready' and not excluding 'NotReady'. The command now excludes the sdn node, whose state is not relevant for determining cluster readiness, and ensures that NotReady nodes are properly excluded.
d7df840
to
7d93bad
Compare
cc: @openshift/networking |
Evaluated for origin testonlyextended up to 7d93bad |
continuous-integration/openshift-jenkins/testonlyextended SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin_extended/10/) (Extended Tests: networking) |
oc get nodes | grep Ready | wc -l") | ||
node_count=$(echo "${node_count}" | tr -d '\r') | ||
test "${node_count}" -ge "${NODE_COUNT}" | ||
oc get nodes | grep -v ${SDN_NODE_NAME} | grep -v NotReady | grep Ready | wc -l") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe grep -v SchedulingDisabled
instead of grep -v ${SDN_NODE_NAME}
?
(Also, we really shouldn't be calling the master's node "the SDN node"... that suggests it's somehow important to the overall functioning of the SDN, which it isn't.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe grep -v SchedulingDisabled instead of grep -v ${SDN_NODE_NAME}?
I guess actually the math won't work with ${NODE_COUNT} if there was some other unschedulable node. So, ok. LGTM as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you suggest calling the 'sdn node' instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the master" ? or "the node process on the master"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not stuck on 'sdn node', but I think a good name should reflect in some way that the node is required to ensure that the master has connectivity to the pods. I don't think either of those suggestions are sufficiently descriptive in that regard.
I do think you're right about filtering on SchedulingDisabled, though, since that is what the e2e tests check for.
only touches dind, no regression risk. approved [merge] |
LGTM. |
Evaluated for origin merge up to 7d93bad |
[Test]ing while waiting on the merge queue |
eparis: will the failure block the merge? despite what the bot says, the networking job did not fail. |
re-[test] |
Evaluated for origin test up to 7d93bad |
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/5589/) (Image: devenv-rhel7_3968) |
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/2983/) (Extended Tests: networking) |
The 'wait-for-cluster' command of hack/dind-cluster.sh was previously
evaluating all nodes when determining whether the cluster's nodes were
seen to be 'Ready' and not excluding 'NotReady'. The command now
excludes the sdn node, whose state is not relevant for determining
cluster readiness, and ensures that NotReady nodes are properly
excluded.
This should fix test flakes when the first networking test(s) lack for nodes.