Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for supervisord as a monit alternative. #12325

Merged
merged 1 commit into from
Aug 6, 2015

Conversation

brendandburns
Copy link
Contributor

work in progress.

@brendandburns brendandburns force-pushed the supervisord branch 5 times, most recently from f99ea62 to a9273d9 Compare August 6, 2015 02:56
@k8s-bot
Copy link

k8s-bot commented Aug 6, 2015

GCE e2e build/test passed for commit abad90856515a80e3eef98d75d4466b4c30ca368.

@k8s-bot
Copy link

k8s-bot commented Aug 6, 2015

GCE e2e build/test passed for commit affb421110a2cfc49192e75ee489d2a336a2f898.

@k8s-bot
Copy link

k8s-bot commented Aug 6, 2015

GCE e2e build/test passed for commit ce65f9b51a35775e293d72a984ef3dc7be44fc68.

@k8s-bot
Copy link

k8s-bot commented Aug 6, 2015

GCE e2e build/test passed for commit 23ecc4fd2ff30e28d2b93b4b380601be29d6b239.

@k8s-bot
Copy link

k8s-bot commented Aug 6, 2015

GCE e2e build/test failed for commit a9273d94e4f5a8d6e5e9ea240c90f0dbbd67e804.

@brendandburns brendandburns force-pushed the supervisord branch 2 times, most recently from 8cbb2f4 to 91e035b Compare August 6, 2015 03:32
@k8s-bot
Copy link

k8s-bot commented Aug 6, 2015

GCE e2e build/test passed for commit 8cbb2f410677bef4cc335744a6b5170d87833f05.

@k8s-bot
Copy link

k8s-bot commented Aug 6, 2015

GCE e2e build/test failed for commit 91e035bcdabd153bb8d0ab7fd3a67051a01c7ea4.

@k8s-bot
Copy link

k8s-bot commented Aug 6, 2015

GCE e2e build/test failed for commit 73c55c38fda527d7c50809db28127eb7a4c1ccb8.

@k8s-bot
Copy link

k8s-bot commented Aug 6, 2015

GCE e2e build/test failed for commit fc172d787de16a9ac2f4da204b0c7a847cc1c5b3.

@k8s-bot
Copy link

k8s-bot commented Aug 6, 2015

GCE e2e build/test passed for commit 7cfc70e008b55ccb08198e786538c7a7d3abc999.

@brendandburns
Copy link
Contributor Author

@zmerlynn @dchen1107 @roberthbailey

This is ready for review. Validated on a working GCE cluster.

@k8s-bot
Copy link

k8s-bot commented Aug 6, 2015

GCE e2e build/test passed for commit 4bdb22c81f7795224a2b12e033000279e388bf00.

@@ -26,7 +26,11 @@ base:
{% endif %}
{% endif %}
- logrotate
{% if grains['cloud'] is defined and grains.cloud == 'gce' %}
- supervisor
{% else %}
- monit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we want to remove monit completely, not just for GCE?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in above supervisor/init.sls, monit package is purged already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had a chance to test other platforms, so I'd prefer to be cautious, I'll send a follow up PR after I test other platforms

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

@dchen1107 dchen1107 self-assigned this Aug 6, 2015
@dchen1107
Copy link
Member

I will take another round of review tommorrow. Thanks!

sleep 60

while true; do
if ! curl -f -s http://127.0.0.1:10249/healthz > /dev/null; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we specify an explicit timeout for the curl command (using --connect-timeout)? The default connect timeout for monit is 5 seconds, but I can't tell from the curl manpage what the default for curl is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. (I used --max-time which appears to be more like what we want)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The monit documentation is pretty unclear on which one of the two options it's timeout corresponds to. I originally read it as the connect timeout, but I agree that max time is closer to what we actually want (rather than just trying to exactly clone the monit configs).

@k8s-bot
Copy link

k8s-bot commented Aug 6, 2015

GCE e2e build/test failed for commit b9210ed640f39f0d75134725be5a2610f5456d8c.

@dchen1107
Copy link
Member

ok to test

@dchen1107
Copy link
Member

You might want to logrotate supervisord' logs too.

@brendandburns
Copy link
Contributor Author

supervisor already does log rotation, by default @ 50Mb.

http://supervisord.org/logging.html

@k8s-bot
Copy link

k8s-bot commented Aug 6, 2015

GCE e2e build/test failed for commit b9210ed640f39f0d75134725be5a2610f5456d8c.

@dchen1107
Copy link
Member

Jenkin e2e test keep failed with this pr. Still looking into them ...

@dchen1107
Copy link
Member

cc/ @andyzheng0831

sleep 60

while true; do
if ! sudo docker ps > /dev/null; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial healthchecking we performed against docker unix socket, which tests remote API, now switching to CLI. Shouldn't you using timeout here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another concern I have is that docker ps is a very heavy operation.

Changing it to docker version might be lightweight, but you need to parse the output and make sure "Server version:" is properly returned. This could be error-prone if docker changes the output. :-(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switched to docker version (and validated that it returns a non-zero exit code if it can't talk to the server)

how do you suggest doing a timeout?

@brendandburns
Copy link
Contributor Author

Comment addressed, please re-check.

@dchen1107
Copy link
Member

LGTM.

@brendandburns Please rebase your pr again.

@brendandburns
Copy link
Contributor Author

rebased. ptal.

thanks
--brendan

@k8s-bot
Copy link

k8s-bot commented Aug 6, 2015

GCE e2e build/test failed for commit dd1f00ee7e43c8820e338bcae7e71339cb7a6c72.

@k8s-bot
Copy link

k8s-bot commented Aug 6, 2015

GCE e2e build/test failed for commit 58982bd4028474814499e7ed46b57f25bc50bc86.

@k8s-bot
Copy link

k8s-bot commented Aug 6, 2015

GCE e2e build/test failed for commit c886160a80580035193569d4410083df1fbea8b4.

@k8s-bot
Copy link

k8s-bot commented Aug 6, 2015

GCE e2e build/test passed for commit 15b9d98.

@dchen1107
Copy link
Member

LGTM and merging it.

dchen1107 added a commit that referenced this pull request Aug 6, 2015
Add support for supervisord as a monit alternative.
@dchen1107 dchen1107 merged commit 2be69e7 into kubernetes:master Aug 6, 2015
brendandburns added a commit that referenced this pull request Aug 7, 2015
…-#12325-release-1.0

Automated cherry pick of #12325
@brendandburns brendandburns mentioned this pull request Aug 7, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants