dind: only wait for Ready non-sdn nodes #8099

Merged: 1 commit merged into openshift:master on Apr 14, 2016


@marun (Contributor) commented Mar 17, 2016

The 'wait-for-cluster' command of hack/dind-cluster.sh previously
evaluated all nodes when determining whether the cluster's nodes were
'Ready', and did not exclude nodes whose status was 'NotReady'. The
command now excludes the sdn node, whose state is not relevant to
cluster readiness, and ensures that NotReady nodes are properly
excluded.

This should fix test flakes that occur when the first networking test(s) run without enough Ready nodes.
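A minimal sketch of the counting logic before and after this change, run against a hypothetical `oc get nodes` listing. The node names, the sample output, `SDN_NODE_NAME=openshift-master`, `NODE_COUNT=2`, and the `old_count`/`new_count` helpers are assumptions for illustration; only the grep pipelines themselves come from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical `oc get nodes` output; node names and statuses are illustrative only.
NODES_OUTPUT='NAME               STATUS                     AGE
openshift-master   Ready,SchedulingDisabled   5m
openshift-node-1   Ready                      5m
openshift-node-2   NotReady                   5m'

SDN_NODE_NAME=openshift-master  # assumed name of the master's node
NODE_COUNT=2                    # assumed number of worker nodes to wait for

# Old pipeline: every line containing "Ready" is counted, including the
# NotReady worker and the master's node.
old_count() {
  echo "${NODES_OUTPUT}" | grep Ready | wc -l
}

# New pipeline: drop the master's node and NotReady nodes, then count.
new_count() {
  echo "${NODES_OUTPUT}" | grep -v "${SDN_NODE_NAME}" | grep -v NotReady | grep Ready | wc -l
}

# With this sample the old check reports "ready" (3 >= 2) although only one
# worker is Ready; the new check keeps waiting (1 < 2).
if test "$(old_count)" -ge "${NODE_COUNT}"; then echo "old check: ready"; else echo "old check: waiting"; fi
if test "$(new_count)" -ge "${NODE_COUNT}"; then echo "new check: ready"; else echo "new check: waiting"; fi
```

Because the old pipeline over-counts, a wait loop built on it can return before enough workers are Ready, which is consistent with the flakes mentioned above.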

@marun (Contributor, Author) commented Mar 17, 2016

[testonlyextended][extended:networking]

@marun marun changed the title dind: only wait for non-sdn nodes WIP dind: only wait for non-sdn nodes Mar 17, 2016
@marun marun changed the title WIP dind: only wait for non-sdn nodes dind: only wait for Ready non-sdn nodes Apr 13, 2016
@marun (Contributor, Author) commented Apr 13, 2016

Filtering for a string (Ready) without explicitly excluding an unwanted token that embeds said string (NotReady) is not the recipe for success one might imagine.
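To make the pitfall concrete, here is a minimal sketch using three hypothetical STATUS values. A plain `grep Ready` matches all of them, while the `grep -v NotReady | grep Ready` chain adopted in this PR keeps only two. `grep -w Ready` (a whole-word match supported by GNU and BSD grep, but not used in this PR) is shown as an alternative: it skips `NotReady` because the `t` before `Ready` is a word character, yet still matches `Ready,SchedulingDisabled`. The helper function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical STATUS column values, one per line.
STATUSES='Ready
NotReady
Ready,SchedulingDisabled'

# Substring match: "NotReady" contains "Ready", so all three lines match.
substring_count() { echo "${STATUSES}" | grep Ready | wc -l; }

# Exclude NotReady explicitly first (the approach taken in this PR).
excluded_count() { echo "${STATUSES}" | grep -v NotReady | grep Ready | wc -l; }

# Whole-word match: "Ready" must not be adjacent to other word characters.
word_count() { echo "${STATUSES}" | grep -w Ready | wc -l; }

echo "substring: $(substring_count)"
echo "excluded:  $(excluded_count)"
echo "word:      $(word_count)"
```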

@marun (Contributor, Author) commented Apr 13, 2016

cc: @openshift/networking

@openshift-bot (Contributor)

Evaluated for origin testonlyextended up to 7d93bad

@openshift-bot (Contributor)

continuous-integration/openshift-jenkins/testonlyextended SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin_extended/10/) (Extended Tests: networking)

hack/dind-cluster.sh

```diff
-oc get nodes | grep Ready | wc -l")
+oc get nodes | grep -v ${SDN_NODE_NAME} | grep -v NotReady | grep Ready | wc -l")
 node_count=$(echo "${node_count}" | tr -d '\r')
 test "${node_count}" -ge "${NODE_COUNT}"
```
Review comment (Contributor):

Maybe grep -v SchedulingDisabled instead of grep -v ${SDN_NODE_NAME}?

(Also, we really shouldn't be calling the master's node "the SDN node"... that suggests it's somehow important to the overall functioning of the SDN, which it isn't.)
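The suggested variant filters on node status rather than on the node's name, so the check would not need to know what the master's node is called. A minimal sketch against a hypothetical listing in which the master's node is marked unschedulable; the node names and the `schedulable_ready_count` helper are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical `oc get nodes` output: only the master's node is unschedulable.
NODES_OUTPUT='NAME               STATUS                     AGE
openshift-master   Ready,SchedulingDisabled   5m
openshift-node-1   Ready                      5m
openshift-node-2   Ready                      5m'

# Suggested filter: drop unschedulable nodes by status instead of by name,
# then exclude NotReady nodes and count the remaining Ready ones.
schedulable_ready_count() {
  echo "${NODES_OUTPUT}" | grep -v SchedulingDisabled | grep -v NotReady | grep Ready | wc -l
}

echo "schedulable Ready nodes: $(schedulable_ready_count)"
```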

Review reply (Contributor):

> Maybe grep -v SchedulingDisabled instead of grep -v ${SDN_NODE_NAME}?

I guess actually the math won't work with ${NODE_COUNT} if there was some other unschedulable node. So, ok. LGTM as is
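One way to read the concern about `${NODE_COUNT}`: if a worker other than the master's node were also unschedulable (for example, cordoned), the status-based filter would drop it as well and the count would fall short of the expected number, whereas the name-based filter still counts it. A minimal sketch with a hypothetical unschedulable worker; `NODE_COUNT=2`, the node names, and the helper functions are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical listing: openshift-node-2 is Ready but has been made unschedulable.
NODES_OUTPUT='NAME               STATUS                     AGE
openshift-master   Ready,SchedulingDisabled   5m
openshift-node-1   Ready                      5m
openshift-node-2   Ready,SchedulingDisabled   5m'

SDN_NODE_NAME=openshift-master  # assumed name of the master's node
NODE_COUNT=2                    # assumed number of worker nodes

# Name-based filter (as merged): both workers are counted.
by_name_count() {
  echo "${NODES_OUTPUT}" | grep -v "${SDN_NODE_NAME}" | grep -v NotReady | grep Ready | wc -l
}

# Status-based filter (as suggested): the unschedulable worker is dropped too.
by_status_count() {
  echo "${NODES_OUTPUT}" | grep -v SchedulingDisabled | grep -v NotReady | grep Ready | wc -l
}

echo "by name:   $(by_name_count) of ${NODE_COUNT}"
echo "by status: $(by_status_count) of ${NODE_COUNT}"
```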

Review reply (Contributor, Author):

What would you suggest calling the 'sdn node' instead?

Review reply (Contributor):

"the master" ? or "the node process on the master"

Review reply (Contributor, Author):

I'm not stuck on 'sdn node', but I think a good name should reflect in some way that the node is required to ensure that the master has connectivity to the pods. I don't think either of those suggestions is sufficiently descriptive in that regard.

I do think you're right about filtering on SchedulingDisabled, though, since that is what the e2e tests check for.

@eparis (Member) commented Apr 13, 2016

only touches dind, no regression risk. approved [merge]

@knobunc (Contributor) commented Apr 13, 2016

LGTM.

@openshift-bot (Contributor)

Evaluated for origin merge up to 7d93bad

@openshift-bot (Contributor)

[Test]ing while waiting on the merge queue

@marun (Contributor, Author) commented Apr 13, 2016

eparis: Will the failure block the merge? Despite what the bot says, the networking job did not fail.

@marun (Contributor, Author) commented Apr 13, 2016

re-[test]

@marun (Contributor, Author) commented Apr 13, 2016

@danmcp @eparis Is there an option to force a merge even with failing tests? The only job that this PR affects is passing, so having to jump through hoops to get unrelated flakes to pass seems like the very definition of a 'waste of time'.

@openshift-bot (Contributor)

Evaluated for origin test up to 7d93bad

@openshift-bot (Contributor)

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/5589/) (Image: devenv-rhel7_3968)

@openshift-bot (Contributor)

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/2983/) (Extended Tests: networking)

@openshift-bot openshift-bot merged commit b910941 into openshift:master Apr 14, 2016
@marun marun deleted the dind-ignore-sdn-node branch April 15, 2016 23:29
5 participants