COMMAND Health Check does not resume after failure? #2179
Comments
The health checker expects the command to be finished after |
OK, thanks for the information. I expected that exceeding timeoutSeconds would signal a failing health check.
@jolexa I agree with your assumption. I consider this a Mesos bug and have created a ticket for it: https://issues.apache.org/jira/browse/MESOS-3479
Closing this issue, since it cannot be solved in Marathon. Watch https://issues.apache.org/jira/browse/MESOS-3479 for progress.
I have a command health check like this,
I'm curious why the health check is not restarted after 21 seconds have elapsed. As a result, my app is not healthy, yet nothing restarts it: the health check is no longer running, so the maxConsecutiveFailures threshold is never reached. Before I added the --max-time flag, the command would run forever once I forced a network partition between the app server and its dependency, because the health endpoint just hangs. Any insight or thoughts?
The Mesos stderr log looks like this:
Thu Aug 27 16:19:22 CDT 2015
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:02 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:03 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:04 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:05 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:06 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:07 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:08 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:09 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:11 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:12 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:13 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:14 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:15 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:16 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:17 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:18 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:19 --:--:-- 0
W0827 16:19:42.832844 11540 main.cpp:375] Health check failed Command check failed with reason: status still pending after timeout 20secs
0 0 0 0 0 0 0 0 --:--:-- 0:00:20 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:21 --:--:-- 0
curl: (28) Operation timed out after 21001 milliseconds with 0 bytes received
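In the log above, curl gives up after 21 seconds, while the health checker reports the command as still pending after its own 20-second timeout. One possible workaround, which nobody in this thread has confirmed, is to make the command enforce its own deadline below the health check timeout, so it always exits with a status before the checker stops waiting. Below is a minimal Python sketch of that idea. The helper name `run_probe`, the 15-second deadline, and the URL in the usage note are assumptions made up for illustration; they do not come from the issue.

```python
import subprocess
import sys


def run_probe(cmd, deadline_seconds):
    """Run cmd and return its exit code, or 1 if it exceeds deadline_seconds.

    Hypothetical helper: the deadline should be chosen below the health
    check's own timeout so that a hung probe is reported as a failure.
    """
    try:
        completed = subprocess.run(
            cmd,
            timeout=deadline_seconds,      # child is killed when this expires
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # Treat a hung probe as an ordinary failed check.
        return 1
    return completed.returncode


if __name__ == "__main__" and len(sys.argv) >= 3:
    # First argument: deadline in seconds; remaining arguments: the probe command.
    sys.exit(run_probe(sys.argv[2:], float(sys.argv[1])))
```

Saved as, say, `probe.py` (a hypothetical file name), it could be used as the health check command like `python probe.py 15 curl -f http://localhost:8080/health`. Whether this avoids the stalled check depends on the Mesos behavior tracked in MESOS-3479.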