-
Notifications
You must be signed in to change notification settings - Fork 38.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for supervisord as a monit alternative. #12325
Conversation
f99ea62
to
a9273d9
Compare
GCE e2e build/test passed for commit abad90856515a80e3eef98d75d4466b4c30ca368. |
GCE e2e build/test passed for commit affb421110a2cfc49192e75ee489d2a336a2f898. |
GCE e2e build/test passed for commit ce65f9b51a35775e293d72a984ef3dc7be44fc68. |
GCE e2e build/test passed for commit 23ecc4fd2ff30e28d2b93b4b380601be29d6b239. |
GCE e2e build/test failed for commit a9273d94e4f5a8d6e5e9ea240c90f0dbbd67e804. |
8cbb2f4
to
91e035b
Compare
GCE e2e build/test passed for commit 8cbb2f410677bef4cc335744a6b5170d87833f05. |
91e035b
to
73c55c3
Compare
GCE e2e build/test failed for commit 91e035bcdabd153bb8d0ab7fd3a67051a01c7ea4. |
73c55c3
to
fc172d7
Compare
GCE e2e build/test failed for commit 73c55c38fda527d7c50809db28127eb7a4c1ccb8. |
fc172d7
to
7cfc70e
Compare
GCE e2e build/test failed for commit fc172d787de16a9ac2f4da204b0c7a847cc1c5b3. |
GCE e2e build/test passed for commit 7cfc70e008b55ccb08198e786538c7a7d3abc999. |
7cfc70e
to
4bdb22c
Compare
@zmerlynn @dchen1107 @roberthbailey This is ready for review. Validated on a working GCE cluster. |
GCE e2e build/test passed for commit 4bdb22c81f7795224a2b12e033000279e388bf00. |
@@ -26,7 +26,11 @@ base: | |||
{% endif %} | |||
{% endif %} | |||
- logrotate | |||
{% if grains['cloud'] is defined and grains.cloud == 'gce' %} | |||
- supervisor | |||
{% else %} | |||
- monit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we want to remove monit completely, not just for GCE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also in above supervisor/init.sls, monit package is purged already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't had a chance to test other platforms, so I'd prefer to be cautious, I'll send a follow up PR after I test other platforms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
I will take another round of review tommorrow. Thanks! |
sleep 60 | ||
|
||
while true; do | ||
if ! curl -f -s http://127.0.0.1:10249/healthz > /dev/null; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we specify an explicit timeout for the curl command (using --connect-timeout
)? The default connect timeout for monit is 5 seconds, but I can't tell from the curl manpage what the default for curl is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. (I used --max-time
which appears to be more like what we want)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The monit documentation is pretty unclear on which one of the two options it's timeout corresponds to. I originally read it as the connect timeout, but I agree that max time is closer to what we actually want (rather than just trying to exactly clone the monit configs).
GCE e2e build/test failed for commit b9210ed640f39f0d75134725be5a2610f5456d8c. |
ok to test |
You might want to logrotate supervisord' logs too. |
supervisor already does log rotation, by default @ 50Mb. |
GCE e2e build/test failed for commit b9210ed640f39f0d75134725be5a2610f5456d8c. |
Jenkin e2e test keep failed with this pr. Still looking into them ... |
cc/ @andyzheng0831 |
sleep 60 | ||
|
||
while true; do | ||
if ! sudo docker ps > /dev/null; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initial healthchecking we performed against docker unix socket, which tests remote API, now switching to CLI. Shouldn't you using timeout here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another concern I have is that docker ps is a very heavy operation.
Changing it to docker version might be lightweight, but you need to parse the output and make sure "Server version:" is properly returned. This could be error-prone if docker changes the output. :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switched to docker version (and validated that it returns a non-zero exit code if it can't talk to the server)
how do you suggest doing a timeout?
b9210ed
to
58982bd
Compare
Comment addressed, please re-check. |
58982bd
to
dd1f00e
Compare
LGTM. @brendandburns Please rebase your pr again. |
dd1f00e
to
c886160
Compare
rebased. ptal. thanks |
GCE e2e build/test failed for commit dd1f00ee7e43c8820e338bcae7e71339cb7a6c72. |
GCE e2e build/test failed for commit 58982bd4028474814499e7ed46b57f25bc50bc86. |
GCE e2e build/test failed for commit c886160a80580035193569d4410083df1fbea8b4. |
c886160
to
15b9d98
Compare
GCE e2e build/test passed for commit 15b9d98. |
LGTM and merging it. |
Add support for supervisord as a monit alternative.
…-#12325-release-1.0 Automated cherry pick of #12325
work in progress.