API Verification #1159

Merged
merged 4 commits into from Jan 13, 2016

abutcher
Member

Wait until the API becomes available

  • before starting the controllers service with native HA.
  • before node registration.
  • after API service restarts.
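
As a rough illustration of the wait described above, here is a minimal Python sketch of a bounded retry loop. It is not the Ansible task from this PR: `wait_for_api`, `probe`, and `sleep` are hypothetical names, and the defaults of 120 retries with a 1 second delay simply mirror the values visible in the reviewed task.

```python
import time
from typing import Callable


def wait_for_api(
    probe: Callable[[], bool],
    retries: int = 120,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() up to `retries` times, pausing `delay` seconds between tries.

    Returns True as soon as probe() reports the API as available,
    or False if every attempt fails.
    """
    for attempt in range(retries):
        if probe():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < retries - 1:
            sleep(delay)
    return False
```

A real probe would issue an HTTPS request against the master API URL and report whether it answered successfully; it is injected here so the loop can be exercised without a network.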

@detiber
Contributor

detiber commented Jan 11, 2016

@sdodson would this solve your containerized problem if we trigger the test for non-ha installs as well?

It might require a bit of rework to make the ordering correct for both ha and non-ha deployments, but I think it's worth it to notify the user of a failed API server before they get a failure from a later role (usually openshift_examples).

@sdodson
Member

sdodson commented Jan 11, 2016

The containerization problem is unique to having to restart docker during node initialization, which leads to up to a minute of master downtime while systemd starts the master services again after docker has been restarted. We need to fix both. Should I limit my fix to the node restarting docker in #1137?

@detiber
Contributor

detiber commented Jan 11, 2016

@sdodson hmm, considering they are slightly different issues, I think it probably wouldn't hurt to leave the additional check in place for #1137.

@abutcher abutcher force-pushed the wait-for-api branch 4 times, most recently from 48f29af to 08c0105 on January 11, 2016 20:35
@sdodson
Member

sdodson commented Jan 11, 2016

👍 to needing this and the other, LGTM

@abutcher
Member Author

@wshearn is testing this w/ online

@detiber
Contributor

detiber commented Jan 12, 2016

👍

@abutcher abutcher force-pushed the wait-for-api branch 3 times, most recently from 139a9f7 to 867c4de on January 12, 2016 15:51
@wshearn
Contributor

wshearn commented Jan 12, 2016

So the curl part is failing for me; it looks like api_available_output is not getting updated.

TASK: [openshift_master | Wait for API to become available] ******************* 
ok: [XXXXXXXXXX]

TASK: [openshift_master | fail ] ********************************************** 
failed: [XXXXXXXXXX] => {"failed": true}
msg: Unable to contact master API at https://nope.com

FATAL: all hosts have already failed -- aborting

But the ELB is saying InService around the time it fails.

@detiber
Contributor

detiber commented Jan 12, 2016

@wshearn how is the ELB health check configured? If I'm remembering the default settings right, 120s might time out before a previously failed host is re-added.

@wshearn
Contributor

wshearn commented Jan 12, 2016

Ping Target: TCP:443
Timeout: 5 seconds
Interval: 30 seconds
Unhealthy Threshold: 2
Healthy Threshold: 2

And I bumped it up to 180 retries.
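
To put numbers on the timing question, here is a back-of-the-envelope sketch in Python. It assumes a host is marked healthy again only after `Healthy Threshold` consecutive passing checks spaced one `Interval` apart, and that the wait task's budget is roughly retries times delay (a lower bound, since each attempt also takes time to run). The class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class HealthCheck:
    """Health check settings as listed above (hypothetical container type)."""
    interval_s: int
    timeout_s: int
    unhealthy_threshold: int
    healthy_threshold: int

    def min_seconds_to_healthy(self) -> int:
        # Assumption: consecutive passing checks, one per interval.
        return self.interval_s * self.healthy_threshold

    def min_seconds_to_unhealthy(self) -> int:
        # Assumption: consecutive failing checks, one per interval.
        return self.interval_s * self.unhealthy_threshold


def wait_budget_s(retries: int, delay_s: int) -> int:
    """Lower bound on how long a retry loop keeps trying."""
    return retries * delay_s


check = HealthCheck(interval_s=30, timeout_s=5,
                    unhealthy_threshold=2, healthy_threshold=2)

print(check.min_seconds_to_healthy())   # 30 * 2
print(wait_budget_s(120, 1))            # original retry count
print(wait_budget_s(180, 1))            # bumped retry count
```

Under these assumptions the load balancer needs about 60 seconds of passing checks once the API is answering again, on top of however long the master takes to come back, so the retry budget has to cover both.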

@detiber
Contributor

detiber commented Jan 12, 2016

How about increasing the retry interval? I vaguely remember reading about a bug with using 0 or 1 as the retry interval.

@abutcher
Member Author

@wshearn the fail conditional was reversed. Updated and this is working for me locally.
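
For readers following along, a reversed conditional would match the log above, where the wait task reports ok and the fail task fires anyway. The intended guard can be sketched in Python as follows; this is an illustration only, not the change made in this PR, and `check_api_result` and the example URL are hypothetical.

```python
def check_api_result(api_available: bool, api_url: str) -> None:
    """Raise only when the wait loop ended WITHOUT reaching the API.

    The reversed form of this check (raising when api_available is True)
    would fail right after a successful wait.
    """
    if not api_available:
        raise RuntimeError(f"Unable to contact master API at {api_url}")


# A successful wait passes through silently.
check_api_result(True, "https://master.example.com")
```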

@abutcher
Member Author

aos-ci-test

@wshearn
Contributor

wshearn commented Jan 12, 2016

👍 works for me.

@abutcher
Member Author

Addressing the nosetests failure in #1169

@abutcher
Member Author

aos-ci-test

@abutcher abutcher changed the title from "Native HA: wait for API before starting controllers" to "API Verification" Jan 13, 2016
@abutcher
Member Author

Melded @sdodson's changes and mine together; I'm still testing native HA with the combo.

@abutcher
Member Author

aos-ci-test

@abutcher
Member Author

@detiber @wshearn PTAL

retries: 120
delay: 1
changed_when: false
when: openshift_master_ha | bool and openshift.master.cluster_method == 'native'
Contributor

Why not move this down below the set_fact and make it conditional on start_result?

Member Author

Good idea

@abutcher
Member Author

aos-ci-test

@abutcher
Member Author

Closes #1137

@abutcher
Member Author

@detiber Moved the wait below set_fact

@detiber
Contributor

detiber commented Jan 13, 2016

@brenton 👍

brenton added a commit that referenced this pull request Jan 13, 2016
@brenton brenton merged commit 674e812 into openshift:master Jan 13, 2016
@abutcher abutcher deleted the wait-for-api branch January 13, 2016 16:53

5 participants