---
title: Health Checks
---
Health checks may be specified per application to be run against that application's tasks.
- The default health check defers to Mesos' knowledge of the task state: `TASK_RUNNING => healthy`.
- Marathon provides a `health` member of the task resource via the [REST API]({{ site.baseurl }}/docs/rest-api.html).
A health check is considered passing if (1) its HTTP response code is between 200 and 399, inclusive, and (2) its response is received within the `timeoutSeconds` period. If a task fails more than `maxConsecutiveFailures` health checks consecutively, that task is killed.
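The two passing conditions and the consecutive-failure rule can be sketched roughly as follows. This is an illustration, not Marathon's implementation: `is_passing` and `FailureTracker` are hypothetical names. The sketch assumes the task is killed once the failure streak reaches `maxConsecutiveFailures` (following that parameter's description) and that a value of `0` disables killing.

```python
from dataclasses import dataclass


def is_passing(status_code: int, elapsed_seconds: float,
               timeout_seconds: float = 20) -> bool:
    """Return True when an HTTP check result meets both passing conditions."""
    in_range = 200 <= status_code <= 399          # condition (1), bounds inclusive
    in_time = elapsed_seconds <= timeout_seconds  # condition (2)
    return in_range and in_time


@dataclass
class FailureTracker:
    """Hypothetical helper that counts consecutive failures for one task."""
    max_consecutive_failures: int = 3
    consecutive_failures: int = 0

    def record(self, passed: bool) -> bool:
        """Record one check result; return True if the task should be killed."""
        if passed:
            self.consecutive_failures = 0  # any success resets the streak
            return False
        self.consecutive_failures += 1
        # Assumption: 0 disables killing; otherwise kill once the streak
        # reaches the configured limit.
        return (self.max_consecutive_failures > 0
                and self.consecutive_failures >= self.max_consecutive_failures)
```

A passing result in the middle of a run of failures resets the count, so only an unbroken streak triggers a kill.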

    {
      "path": "/api/health",
      "portIndex": 0,
      "protocol": "HTTP",
      "gracePeriodSeconds": 300,
      "intervalSeconds": 60,
      "timeoutSeconds": 20,
      "maxConsecutiveFailures": 3
    }

OR

    {
      "portIndex": 0,
      "protocol": "TCP",
      "gracePeriodSeconds": 300,
      "intervalSeconds": 60,
      "timeoutSeconds": 20,
      "maxConsecutiveFailures": 0
    }

OR

    {
      "protocol": "COMMAND",
      "command": { "value": "curl -f -X GET http://$HOST:$PORT0/health" },
      "gracePeriodSeconds": 300,
      "intervalSeconds": 60,
      "timeoutSeconds": 20,
      "maxConsecutiveFailures": 3
    }

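
To make the three definitions above more concrete, here is a rough Python sketch of how a checker might act on each `protocol` value. It is not Marathon's own code. The function `run_check` and its `host` and `ports` arguments are hypothetical. The sketch assumes a TCP check passes as soon as a connection opens and a COMMAND check passes when the command exits with status 0.

```python
import http.client
import os
import socket
import subprocess


def run_check(check: dict, host: str, ports: list) -> bool:
    """Run one health check definition once and report whether it passed."""
    protocol = check.get("protocol", "HTTP")
    timeout = check.get("timeoutSeconds", 20)

    if protocol == "COMMAND":
        # Expose HOST and PORT0, PORT1, ... so commands like the curl example resolve.
        env = dict(os.environ, HOST=host)
        env.update({f"PORT{i}": str(p) for i, p in enumerate(ports)})
        try:
            done = subprocess.run(check["command"]["value"], shell=True, env=env,
                                  timeout=timeout, stdout=subprocess.DEVNULL,
                                  stderr=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:
            return False
        return done.returncode == 0  # assumption: exit status 0 means healthy

    port = ports[check.get("portIndex", 0)]
    if protocol == "TCP":
        try:
            # Assumption: the check passes as soon as the connection opens.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    if protocol == "HTTP":
        # The timeout here applies per socket operation, which only
        # approximates a deadline on the whole response.
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            conn.request("GET", check.get("path", "/"))
            return 200 <= conn.getresponse().status <= 399
        except (OSError, http.client.HTTPException):
            return False
        finally:
            conn.close()

    raise ValueError(f"unsupported protocol: {protocol!r}")
```

`http.client` is used rather than `urllib.request` because it does not follow redirects, so a 3xx response is seen directly and counted as passing.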
- `gracePeriodSeconds` (Optional. Default: 300): Health check failures are ignored for this many seconds after the task starts, or until the task becomes healthy for the first time.
- `intervalSeconds` (Optional. Default: 60): Number of seconds to wait between health checks.
- `maxConsecutiveFailures` (Optional. Default: 3): Number of consecutive health check failures after which the unhealthy task should be killed. If this value is `0`, then tasks will not be killed due to failing this check.
- `path` (Optional. Default: "/"): Path to the endpoint exposed by the task that will provide health status. Example: "/path/to/health". Note: only used if `protocol == "HTTP"`.
- `portIndex` (Optional. Default: 0): Index in this app's `ports` array to be used for health requests. An index is used so the app can use random ports, such as `[0, 0, 0]`, and tasks can be started with port environment variables like `$PORT1`.
- `protocol` (Optional. Default: "HTTP"): Protocol of the requests to be performed. One of "HTTP", "TCP", or "COMMAND".
- `timeoutSeconds` (Optional. Default: 20): Number of seconds after which a health check is considered a failure regardless of the response.
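
As an illustration of how the defaults listed above combine with a partial definition, the following sketch fills in omitted fields. The `DEFAULTS` table and the `with_defaults` helper are hypothetical names, not part of Marathon, and the sketch simply drops `path` for non-HTTP checks because the parameter list says it is only used for HTTP.

```python
# Default values as listed in the parameter descriptions.
DEFAULTS = {
    "gracePeriodSeconds": 300,
    "intervalSeconds": 60,
    "maxConsecutiveFailures": 3,
    "path": "/",
    "portIndex": 0,
    "protocol": "HTTP",
    "timeoutSeconds": 20,
}


def with_defaults(check: dict) -> dict:
    """Return a copy of `check` with every omitted field set to its default."""
    merged = {**DEFAULTS, **check}  # explicit values override defaults
    if merged["protocol"] != "HTTP":
        merged.pop("path", None)    # `path` is only used for HTTP checks
    return merged


# Example: a TCP check that never kills the task on failure.
tcp_check = with_defaults({"protocol": "TCP", "maxConsecutiveFailures": 0})
```

Under these assumptions, an empty definition `{}` resolves to an HTTP check against `/` on the first port.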
The application health lifecycle is represented by the finite state machine in figure 1 below. In the diagram:
- `i` is the number of requested instances
- `r` is the number of running instances
- `h` is the number of healthy instances