Server Healthchecks

This repository houses various scripts for monitoring and reporting a server's health status. They are geared towards the Healthchecks monitoring software, either self-hosted or the cloud-hosted Healthchecks.io. However, most scripts can be used independently, either as-is or with small modifications.

Contents of this repo

All scripts are documented (run them with -h) and mostly require just Bash and curl. This list is only a general overview.

The main directory contains healthcheck-specific scripts
misc contains useful status-gathering scripts for different platforms
docker contains premade recipes & docs for running this in Docker

`with-healthcheck` - automatically report status of any command

A flexible wrapper script which can report status of any commands you execute, report their execution time, and more.

The official Healthchecks documentation is a good start. However, it assumes that you can and will modify all your scripts to add curl calls. This is sometimes quite hard, or requires an individual wrapper for each script. Instead, following the Unix philosophy, with-healthcheck.sh takes care of the real-world complexity of implementing health check calls.

To use it, instead of calling your script as /root/foo/script.sh, call with-healthcheck http://hc_url/ping/123 /root/foo/script.sh.

Crontab example:

# m h  dom mon dow   command
-0 2 1 * * /sbin/zpool scrub -w tank
+0 2 1 * * /root/scripts/with-healthcheck https://example.com/ping/123 /sbin/zpool scrub -w tank

Features overview:

Reporting success/failure separately (official docs)
Reporting with execution time (official docs)
Auto-reporting RunIDs
Include executed command output if desired
Forward or silence executed command output and status to crontab (i.e. no more 1>&2 /dev/null ;))
Support fault-tolerance/success-only reporting for "flaky" jobs that are meant to succeed at least sometimes
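
The core idea behind such a wrapper can be sketched in a few lines. This is a minimal illustration, not the actual with-healthcheck implementation: the function names hc_ping and run_with_ping are hypothetical, and it only covers the "start" signal and exit-status reporting described in the Healthchecks documentation.

```shell
#!/usr/bin/env bash
# Minimal sketch of a reporting wrapper (assumed names, not the real script).

# hc_ping <url>: send one ping; a ping-server outage must never break the job
hc_ping() {
    curl -fsS -m 10 --retry 3 -o /dev/null "$1" || true
}

# run_with_ping <ping-url> <command> [args...]
run_with_ping() {
    local url=${1%/} rc=0
    shift
    hc_ping "$url/start"    # "start" signal lets Healthchecks measure run time
    "$@" || rc=$?           # run the wrapped command, remember its exit status
    hc_ping "$url/$rc"      # exit status 0 means success, anything else failure
    return "$rc"            # pass the original status on to cron
}

# Example with a hypothetical ping URL:
#   run_with_ping https://example.com/ping/123 /sbin/zpool scrub -w tank
```

Returning the original exit status keeps the wrapper transparent to cron and to any calling script.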

`http-middleware` - poll & report external services status

Normally Healthchecks is a push-based system, i.e. it requires the monitored systems to report to an HTTP(S) endpoint every so often. This is not always possible for appliances and black-box software. However, most software exposes some ping/status/identity endpoint you can query to see whether a device/system is alive. HTTP Middleware combines with-healthcheck and http-ping to implement pull-based checks:

Queries an HTTP(S) endpoint
Checks its HTTP status
Checks response contents against a pattern
Reports to the Healthchecks instance whether it was a success or a failure
Repeats the process at set intervals
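
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual http-middleware code: the function names (fetch, check_once, report, poll) are hypothetical, only HTTP 200 is treated as success, and failures are signalled through the /fail endpoint described in the Healthchecks documentation.

```shell
#!/usr/bin/env bash
# Sketch of one pull-based check cycle; all function names are hypothetical.

# fetch <url>: print the response body, then the HTTP status code on a final line
fetch() {
    curl -sS -m 10 -w '\n%{http_code}' "$1"
}

# check_once <service-url> <extended-regex>: succeed only if the endpoint
# answers with HTTP 200 and the body matches the pattern
check_once() {
    local out code body
    out=$(fetch "$1") || return 1
    code=$(printf '%s\n' "$out" | tail -n 1)
    body=$(printf '%s\n' "$out" | sed '$d')
    [ "$code" = "200" ] || return 1
    printf '%s\n' "$body" | grep -Eq -- "$2"
}

# report <ping-url>: notify Healthchecks; delivery errors are ignored
report() {
    curl -fsS -m 10 -o /dev/null "$1" || true
}

# poll <service-url> <pattern> <ping-url> <interval-seconds>: loop forever
poll() {
    while true; do
        if check_once "$1" "$2"; then
            report "$3"
        else
            report "$3/fail"
        fi
        sleep "$4"
    done
}

# Example with hypothetical URLs:
#   poll http://nas.example:32400/identity 'MediaContainer' https://example.com/ping/123 60
```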

With this you can monitor e.g. a Plex Media Server instance (/identity endpoint) running on a NAS. You can even report the status of your self-hosted Healthchecks installation to Healthchecks.io via the /api/v2/status/ endpoint :) In addition, http-middleware is meant to scale to a large number of checks from one container. However, please read its help message before configuring it as such, to avoid e.g. the thundering herd problem.

The HTTP Middleware is especially useful in containerized environments. It can easily be added as an additional service in Docker Compose to automatically report the status of services to a Healthchecks instance. See the docker/ folder for details.

Automatic fault tolerance

In addition, the HTTP Middleware implements optional fault-tolerance functionality. Normally, when a check is not delivered at all, Healthchecks allows for a grace period. When a check delivers a failure signal, the grace period does not apply and the service is marked as failed right away. In some cases, however, intermittent failures are to be expected. One such example is scheduled periodic equipment reboots.

HTTP Middleware allows failure reports to the ping server to be suppressed until a configured threshold is reached, using the CHECK_FAILURE_THRESHOLD_# option. Setting the value to 1, which is the default, reports failures instantly. With any value above 1, successes are still reported instantly, while failure signals are held back until at least that number of consecutive failures has accumulated. Subsequent failures, after the threshold is reached, are delivered without delay. The counter resets only once at least one success is reported.
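
The threshold behaviour described above amounts to a consecutive-failure counter. The sketch below is one possible way to express it, not the middleware's actual code: the variable failures and the function should_report are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch of the consecutive-failure threshold (assumed names, not the real code).

failures=0   # consecutive failures seen since the last success

# should_report <ok|fail> <threshold>: returns 0 when the result should be
# delivered to the ping server now, 1 when it should be held back
should_report() {
    if [ "$1" = "ok" ]; then
        failures=0                 # any success resets the counter
        return 0                   # successes are always reported instantly
    fi
    failures=$((failures + 1))
    [ "$failures" -ge "$2" ]       # failures are reported once the threshold is met
}
```

With a threshold of 3, the first two consecutive failures are held back, the third and every following failure are reported, and a single success resets the count. With the default of 1, every failure is reported.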

When CHECK_FAILURE_THRESHOLD_# is configured and a failure passing that threshold has occurred, the log will include an additional note with the number of failures that occurred. To ensure that fault tolerance doesn't trigger a non-response notification, the grace period has to be set to at least CHECK_FAILURE_THRESHOLD_# * expected interval.
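
As a worked example of the grace period rule, assume a threshold of 3 and a check interval of 60 seconds (both values are hypothetical):

```shell
threshold=3      # assumed CHECK_FAILURE_THRESHOLD_# value
interval=60      # assumed number of seconds between checks
min_grace=$((threshold * interval))
echo "grace period should be at least ${min_grace}s"
```

Here the grace period configured in Healthchecks should be 180 seconds or more.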

`http-ping` - check external service status

A small script which visits an HTTP(S) URL and reports whether it was reachable. The reachability status is reported via the Unix exit code.

This is an ostensibly simple task, as the script is a wrapper around curl. The complexity starts when you need to check for specific HTTP status codes, as curl has no built-in way to treat an arbitrary set of codes as successful. This script lets you define a list of HTTP codes considered successful. In some instances you may want to consider e.g. HTTP 401 a sign of the endpoint being alive:

# by default only 200 and 204 are considered successful
% http-ping http://httpstat.us/401 ; echo $?
1

# consider 204 and 401 successful (-c) and print output (-p)
% http-ping -p -c 204,401 http://httpstat.us/401 ; echo $?
401 Unauthorized
0
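
The status-code check itself can be sketched on top of curl's %{http_code} write-out variable. This is an illustration only, not the actual http-ping source: the function names code_allowed, http_code and ping_url are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of matching an HTTP status code against an allow-list (assumed names).

# code_allowed <code> <comma-separated-codes>: succeed if the code is listed
code_allowed() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# http_code <url>: print only the HTTP status code (curl prints 000 when the
# request did not complete)
http_code() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "$1"
}

# ping_url <url> [codes]: exit status 0 if the response code is acceptable
ping_url() {
    code_allowed "$(http_code "$1")" "${2:-200,204}"
}

# Example:
#   ping_url http://httpstat.us/401 204,401 ; echo $?
```

Wrapping the list in commas on both sides avoids partial matches, so a code of 20 does not match an entry of 200.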

kiler129/server-healthchecks
