Home
At its core, Healthy API is a powerful, lightweight, and highly flexible health checking service written in Go. Its primary mission is to answer one critical question: "Are my services running correctly right now?" It goes beyond simple ping tests or basic uptime checks. Instead, it allows you to define precisely what "healthy" means for each of your applications.
Healthy API works by periodically sending HTTP requests to your defined service URLs and evaluating the responses against a sophisticated, user-defined set of rules called Conditions. If a service fails to meet its health conditions, Healthy API instantly sends detailed alerts through multiple channels to ensure the right people are notified immediately.
This tool is built for a world of microservices and complex application stacks where a simple 200 OK status code is not always enough to guarantee that a service is functioning properly. With its powerful condition engine, you can validate response bodies, check for specific headers, and combine these rules with logical operators, giving you complete confidence in the status of your services.
- ✨ Core Concepts
- ⚙️ Configuration Deep Dive
- 📖 Understanding Conditions: YAML vs. JSON
- 📋 Configuration Examples
- 🔧 Extending the Application
- ▶️ Running the Application
- Master Example: A Complete, Real-World Setup
At its heart, Healthy API is built on a few simple but powerful concepts.
A Service is the fundamental entity you want to monitor. Think of it as a single health check target. Each service is defined by a unique `name` (for alerts), a `url` to check, and, most importantly, a Condition that determines its health status.
A Notifier is a communication channel used to send alerts when a service's health check fails. The application supports multiple types of notifiers out-of-the-box, allowing you to reach your team wherever they are:
- Email (SMTP): For standard email alerts.
- SMS (IPPanel): For urgent mobile notifications.
- Webhook: For ultimate flexibility, allowing integration with services like Slack, Discord, Microsoft Teams, or your own custom API.
Each configured notifier is given a unique `id`, which services then reference. This design is highly reusable—you can define a single "critical-alerts" Slack channel and have multiple services send notifications to it.
A Condition is a set of rules that define what makes a service "healthy." This is the most powerful feature of Healthy API. Instead of just checking for a simple 200 OK status code, you can build complex, nested logic to validate a response with precision. A service is considered healthy only if its associated condition evaluates to true. This allows you to check for specific text in the response body, verify headers, and combine these checks with logical operators like AND, OR, and NOT.
A Health Check is the complete process of:
- Sending an HTTP request to a service's `url`.
- Evaluating the HTTP response against its specified Condition.
- Triggering notifiers if the condition fails.
You can control the timing of these checks with two key parameters:
- `check_period`: The interval in seconds between each health check when the service is healthy.
- `sleep_on_fail`: The interval in seconds to wait after a failure before checking again. This is crucial for preventing alert spam while your team is fixing an issue.
The application is configured using a single `config.yaml` file, which has three main sections: `services`, `notifiers`, and `conditions`.
This is an array where you define every service to be monitored.
- `name` (string): A human-readable name for the service (e.g., "Production API"). This is used in alert messages, so make it descriptive!
- `url` (string): The full URL of the health check endpoint (e.g., `https://api.example.com/health`).
- `condition_id` (string): This is the link to the health logic. It must match the `id` of a condition defined in the `conditions` block.
- `check_period` (integer): The wait time in seconds between checks for a healthy service.
- `sleep_on_fail` (integer): The wait time in seconds after a failure is detected.
- `targets` (array): A list of notification targets to alert on failure.
  - `notifier_id` (string): Must match the `id` of a configured notifier in the `notifiers` block.
  - `recipients` (array of strings): A list of destinations (email addresses, phone numbers, or webhook URLs) for that notifier.
This block contains the configuration for all your notification channels. You can define multiple notifiers of each type.
Email (SMTP) notifiers accept the following fields:

- `id` (string): A unique identifier (e.g., "admins-email-group").
- `sender` (string): The "From" email address.
- `password` (string): The SMTP server password.
- `server` (string): The SMTP server address (e.g., "smtp.gmail.com").
- `port` (string): The SMTP server port (e.g., "587").
SMS (IPPanel) notifiers accept the following fields:

- `id` (string): A unique identifier (e.g., "on-call-sms-alert").
- `url` (string): The API endpoint for the IPPanel service.
- `user` (string): Your IPPanel username.
- `pass` (string): Your IPPanel password.
Webhooks are the most flexible notifier.
- `id` (string): A unique identifier (e.g., "slack-alerts").
- `method` (string): The HTTP method to use (e.g., `POST`, `PUT`).
- `headers` (map): A key-value map of HTTP headers.
- `json` (map): The JSON payload to send in the request body.
Templating: The `headers` and `json` fields support dynamic values using Go's template engine. This is perfect for creating rich, informative alerts. The following variables are available:
- `{{ .ServiceName }}`: The `name` of the failed service.
- `{{ .TimeStamp }}`: The timestamp of the failure in RFC3339 format (e.g., `2023-10-27T10:00:00Z`).
- `{{ .URL }}`: The recipient URL that is currently being processed.
This is where you define the reusable, granular logic for what makes a service healthy. Each condition has a unique id that one or more services can reference.
A condition node must contain exactly one of the following keys. Let's break down the structure of each one.
The `status_code` condition is the simplest one. It checks if the HTTP response status code is an exact match.

- Structure: It contains a single key, `code`, whose value is the integer you expect.
- YAML Structure:

  ```yaml
  status_code:
    code: <INTEGER>
  ```

- Example: A condition that is only true if the status code is `200`.

  ```yaml
  - id: "is-status-200"
    condition:
      status_code:
        code: 200
  ```
The `regex` condition checks if the body of the HTTP response matches a given regular expression (regex) pattern. This is incredibly powerful for checking for specific text, error messages, or data formats.

- Structure: It contains a single key, `pattern`, whose value is the regex string to match against the response body.
- YAML Structure:

  ```yaml
  regex:
    pattern: "<REGEX_PATTERN>"
  ```

- Example: A condition that is true if the response body contains the string `"status":"UP"`.

  ```yaml
  - id: "body-contains-status-up"
    condition:
      regex:
        pattern: '"status":"UP"'
  ```
The `header` condition checks for the presence and value of one or more HTTP response headers. The condition is only true if all listed headers are found in the response and have the exact matching values.

- Structure: It contains a list of header checks. Each item in the list is an object with two keys: `key` (the header name) and `value` (the expected header value).
- YAML Structure:

  ```yaml
  header:
    - key: "<HEADER_NAME_1>"
      value: "<HEADER_VALUE_1>"
    - key: "<HEADER_NAME_2>"
      value: "<HEADER_VALUE_2>"
  ```

- Example: A condition that is true only if the response has both a `Content-Type` of `application/json` AND an `X-Cache` header of `HIT`.

  ```yaml
  - id: "is-cached-json-response"
    condition:
      header:
        - key: "Content-Type"
          value: "application/json"
        - key: "X-Cache"
          value: "HIT"
  ```
These keys allow you to combine the basic conditions above to create complex and precise health check logic.
- `and` / `or`: These keys take a list of condition nodes.
  - `and`: Is true only if ALL conditions in the list are true.
  - `or`: Is true if AT LEAST ONE condition in the list is true.
- `not`: This key takes a single condition node and inverts its result. It is true if the inner condition is false.
- Example using `and`:

  ```yaml
  - id: "is-healthy-and-cached"
    condition:
      and:
        - status_code:
            code: 200
        - header:
            - key: "X-Cache"
              value: "HIT"
  ```
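To make the semantics concrete, here is a small, self-contained Go sketch of how such nested conditions evaluate. The `Cond` type and `eval` function are invented for this illustration and are not Healthy API's real implementation (the real engine also inspects the response body and headers):

```go
package main

import "fmt"

// Cond is a simplified stand-in for one condition node: exactly one of its
// fields is set, mirroring "a condition node must contain exactly one key".
type Cond struct {
	StatusCode *int   // leaf check: expected status code
	And        []Cond // all children must pass
	Or         []Cond // at least one child must pass
	Not        *Cond  // inverts its single child
}

// eval recursively applies the and/or/not semantics described above.
func eval(c Cond, gotStatus int) bool {
	switch {
	case c.StatusCode != nil:
		return gotStatus == *c.StatusCode
	case c.And != nil:
		for _, child := range c.And {
			if !eval(child, gotStatus) {
				return false
			}
		}
		return true
	case c.Or != nil:
		for _, child := range c.Or {
			if eval(child, gotStatus) {
				return true
			}
		}
		return false
	case c.Not != nil:
		return !eval(*c.Not, gotStatus)
	}
	return false
}

func main() {
	ok, created := 200, 201
	// Equivalent to an `or` of two status_code checks.
	cond := Cond{Or: []Cond{{StatusCode: &ok}, {StatusCode: &created}}}
	fmt.Println(eval(cond, 200), eval(cond, 500)) // true false
}
```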
YAML's indentation-based syntax is great for readability, but it can be tricky when nesting logical conditions. Under the hood, it's just a structured map, very similar to JSON. Understanding this relationship can make writing complex conditions much easier.
Let's look at a complex YAML condition and its JSON equivalent.
Scenario: A service is healthy if (the status is 200 AND a specific header is present) OR the status is 404.
YAML `condition` Block:

```yaml
# This is how you'd write it in your config.yaml
conditions:
  - id: complex-condition
    condition:
      or:            # The top-level check is an OR
        - and:       # The first item in the OR list is an AND
            - status_code:
                code: 200
            - header:
                - key: "X-Cache-Status"
                  value: "HIT"
        - status_code: # The second item in the OR list is a simple status check
            code: 404
```

JSON Equivalent:

This is how the YAML is parsed and understood by the application. Notice the arrays (`[]`) for `or`/`and` and objects (`{}`) for `status_code`:
```json
[
  {
    "id": "complex-condition",
    "condition": {
      "or": [
        {
          "and": [
            {
              "status_code": {
                "code": 200
              }
            },
            {
              "header": [
                {
                  "key": "X-Cache-Status",
                  "value": "HIT"
                }
              ]
            }
          ]
        },
        {
          "status_code": {
            "code": 404
          }
        }
      ]
    }
  }
]
```

Seeing the JSON structure makes the nesting of lists and objects explicit, which can help clarify how to structure your YAML.
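If you want to poke at this mapping yourself, a few lines of Go's standard `encoding/json` make the shapes explicit. This is a standalone illustration, not application code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// complexCondition is the JSON equivalent of the YAML condition shown above.
const complexCondition = `{
  "or": [
    {"and": [
      {"status_code": {"code": 200}},
      {"header": [{"key": "X-Cache-Status", "value": "HIT"}]}
    ]},
    {"status_code": {"code": 404}}
  ]
}`

// parseCondition unmarshals a condition into the generic structure the YAML
// also maps to: or/and become []any (lists), status_code/header become maps.
func parseCondition(raw string) map[string]any {
	var cond map[string]any
	if err := json.Unmarshal([]byte(raw), &cond); err != nil {
		panic(err)
	}
	return cond
}

func main() {
	cond := parseCondition(complexCondition)
	or := cond["or"].([]any) // "or" holds a LIST of condition nodes
	fmt.Println("or branches:", len(or))
	first := or[0].(map[string]any) // each node is an OBJECT with one key
	fmt.Println("and checks:", len(first["and"].([]any)))
}
```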
Theory is great, but examples make things concrete. Here are several practical scenarios, ranging from simple to advanced, to demonstrate how to use the condition-based system effectively.
Scenario: The most common use case. You want to monitor your company's homepage. The page is considered "healthy" if it returns a 200 OK status code. If it fails, you want to send an email to the admin team.
```yaml
services:
  - name: "Homepage"
    url: "https://my-company.com"
    condition_id: "is-status-200" # <-- Reference the simple condition
    check_period: 60
    sleep_on_fail: 300
    targets:
      - notifier_id: "admins-email-group"
        recipients: ["admin@my-company.com", "support@my-company.com"]

notifiers:
  smtp:
    - id: "admins-email-group"
      sender: "monitoring@my-company.com"
      password: "your-smtp-password"
      server: "smtp.my-company.com"
      port: "587"

conditions:
  - id: "is-status-200"
    condition:
      status_code:
        code: 200
```

Scenario: You have a background worker service that exposes a status endpoint. This endpoint always returns a 200 status code, but its health is indicated within the JSON body. The service is only truly healthy if the body contains `"status": "healthy"`. If not, an urgent SMS should be sent.
```yaml
services:
  - name: "Background Worker Status"
    url: "http://localhost:8081/status"
    condition_id: "body-contains-healthy"
    check_period: 30
    sleep_on_fail: 180
    targets:
      - notifier_id: "on-call-sms"
        recipients: ["+15551234567"]

notifiers:
  ippanel:
    - id: "on-call-sms"
      url: <YOUR_IPPANEL_URL>
      user: <YOUR_IPPANEL_USERNAME>
      pass: <YOUR_IPPANEL_PASSWORD>

conditions:
  - id: "body-contains-healthy"
    condition:
      regex:
        # This pattern looks for the exact string "status": "healthy"
        # The ? makes the space after the colon optional
        pattern: '"status": ?"healthy"'
```

Scenario: Your API Gateway is behind a cache. To ensure the cache is working correctly, you need to verify that a successful response includes the `X-Cache-Status: HIT` header. A failure should post a message to the operations Slack channel.
```yaml
services:
  - name: "API Gateway Cache"
    url: "https://api.my-company.com/v1/users/popular"
    condition_id: "is-cache-hit"
    check_period: 120
    sleep_on_fail: 300
    targets:
      - notifier_id: "slack-ops-channel"
        recipients: ["https://hooks.slack.com/services/OPS_CHANNEL_HOOK"]

notifiers:
  webhook:
    - id: "slack-ops-channel"
      method: POST
      json:
        text: "🟡 Cache Alert: The API Gateway cache check failed for `{{ .ServiceName }}`."

conditions:
  - id: "is-cache-hit"
    condition:
      header:
        # You can list multiple headers here; all must be present for the condition to be true.
        - key: "X-Cache-Status"
          value: "HIT"
```

Scenario: A critical service is only healthy if it returns a 200 status code AND its response body contains the text "All systems operational." This ensures you're not getting a "successful" response with an empty or incorrect body.
```yaml
services:
  - name: "Critical Database API"
    url: "https://db-api.my-company.com/healthz"
    condition_id: "status-ok-and-body-ok"
    targets:
      - notifier_id: "on-call-sms"
      - notifier_id: "slack-critical-alerts"
# ... (notifiers for sms and webhook would be defined)

conditions:
  - id: "status-ok-and-body-ok"
    condition:
      and: # <-- Both conditions inside this list must be true
        - status_code:
            code: 200
        - regex:
            pattern: "All systems operational"
```

Scenario: A data submission endpoint is considered healthy if it returns either a 200 (OK) for updates or a 201 (Created) for new entries. Any other code is a failure.
```yaml
services:
  - name: "Data Submission Endpoint"
    url: "https://api.my-company.com/v1/submit"
    condition_id: "status-200-or-201"
    targets:
      - notifier_id: "dev-team-email"
# ... (notifier for email would be defined)

conditions:
  - id: "status-200-or-201"
    condition:
      or: # <-- At least one of the conditions in this list must be true
        - status_code:
            code: 200
        - status_code:
            code: 201
```

Scenario: You want to monitor your public-facing website for any unexpected error messages. The page is considered unhealthy if the word "Exception" or "Fatal error" appears anywhere in the HTML. This is a powerful way to catch bugs that don't result in a 5xx status code.
```yaml
services:
  - name: "Public Website Error Check"
    url: "https://www.my-company.com"
    condition_id: "no-fatal-error-text"
    targets:
      - notifier_id: "admins-email-group"
# ... (notifier for email would be defined)

conditions:
  - id: "no-fatal-error-text"
    condition:
      not: # <-- Inverts the result of the inner condition
        # The service is healthy if the body does NOT match this regex
        regex:
          # The pipe character | acts as an OR inside the regex
          pattern: "Exception|Fatal error"
```

Scenario: This is a common real-world problem. A critical service is healthy if:

- (The status code is 200 AND the body contains `"status":"READY"`) OR
- (The status code is 503 (Service Unavailable) AND the body contains `"status":"MAINTENANCE"`)
```yaml
services:
  - name: "User Authentication Service"
    url: "https://auth.my-company.com/status"
    condition_id: "ready-or-maintenance"
    targets:
      - notifier_id: "slack-auth-alerts"
# ... (webhook notifier for slack would be defined)

conditions:
  - id: "ready-or-maintenance"
    condition:
      or: # <-- The top-level condition is an OR
        # Block 1: The 'Ready' state
        - and:
            - status_code:
                code: 200
            - regex:
                pattern: '"status" ?: ?"READY"'
        # Block 2: The 'Maintenance' state
        - and:
            - status_code:
                code: 503
            - regex:
                pattern: '"status" ?: ?"MAINTENANCE"'
```

The modular design of Healthy API makes it straightforward to add new notification channels (e.g., Telegram, Discord, Microsoft Teams). If you have a service you want to integrate with, you just need to implement a single interface.
Let's walk through the steps to add a hypothetical TelegramNotifier.
1. Implement the Notifier Interface:

   - Create a new file in the `notifier/` directory, for instance, `telegram.go`.
   - Inside this file, define a struct for your new notifier. This struct will hold any configuration it needs, like an API token.

   ```go
   // notifier/telegram.go
   package notifier

   import (
       "healthy-api/model"
       "log"
   )

   type TelegramNotifier struct {
       BotToken string
       ChatID   string
       Logger   *log.Logger
   }
   ```

   - Implement the `notifier.Notifier` interface by adding the `Notify(n model.Notification) error` method to your struct. This method will contain the logic for sending the actual alert via the Telegram Bot API.

   ```go
   // The Notify method for TelegramNotifier
   func (t *TelegramNotifier) Notify(n model.Notification) error {
       // Your logic to format a message and send it to the Telegram API...
       // For example:
       // message := fmt.Sprintf("🚨 Alert: Service '%s' is down!", n.ServiceName)
       // apiURL := fmt.Sprintf("https://api.telegram.org/bot%s/sendMessage?chat_id=%s&text=%s", t.BotToken, t.ChatID, message)
       // _, err := http.Post(apiURL, "", nil)
       // if err != nil {
       //     t.Logger.Printf("Failed to send Telegram notification: %v", err)
       //     return err
       // }
       // t.Logger.Printf("Sent Telegram notification for service: %s", n.ServiceName)
       return nil
   }
   ```
2. Update the Configuration Model:

   - Open `model/config.go`.
   - Add a new struct to define the configuration fields for your notifier that will be read from the YAML file.

   ```go
   // model/config.go
   type Telegram struct {
       ID       string `yaml:"id"`
       BotToken string `yaml:"bot_token"`
       ChatID   string `yaml:"chat_id"`
   }
   ```

   - Add a slice of your new struct to the `Notifiers` struct.

   ```go
   // model/config.go
   type Notifiers struct {
       IPPanels  []IPPanel  `yaml:"ippanel"`
       SMTPs     []SMTP     `yaml:"smtp"`
       Webhook   []Webhook  `yaml:"webhook"`
       Telegrams []Telegram `yaml:"telegram"` // <-- Add this line
   }
   ```
3. Register the New Notifier in `main.go`:

   - In `main.go`, create a new function `loadTelegramNotifiers` that is similar to the existing `load...` functions.
   - This function's job is to read the configuration, create an instance of your `TelegramNotifier`, and register it with the `notifierRegistry`.

   ```go
   // main.go
   func loadTelegramNotifiers(cfg *model.Config, notifierRegistry *notifier.Registry, logger *log.Logger) int {
       count := 0
       for _, tg := range cfg.Notifiers.Telegrams {
           // ... (error handling for duplicate IDs) ...
           notifierInst := &notifier.TelegramNotifier{
               BotToken: tg.BotToken,
               ChatID:   tg.ChatID,
               Logger:   logger,
           }
           notifierRegistry.Register(tg.ID, notifierInst)
           logger.Printf("new notifier registered. type:telegram -> %v\n", notifierInst)
           count++
       }
       return count
   }
   ```

   - Finally, call your new loading function from the `main` function.
And that's it! Users can now configure and use your new Telegram notifier in their config.yaml file.
To run the application, you must provide the path to your configuration file using the `-config` flag.
```shell
# Basic execution with your configuration file
go run main.go -config=config.yaml

# For production, it's better to build a self-contained binary first
go build -o healthy-api

# Run the built binary. Use the -verbose flag for detailed debugging logs.
./healthy-api -config=config.yaml -verbose
```

- `-config` (required): Specifies the path to your YAML configuration file. The application will not start without it.
- `-verbose` (optional): If this flag is present, the application will print detailed logs of every health check (both successful and failed) and every notification attempt. This is highly recommended for setting up and debugging your configuration.
This final example ties everything together, demonstrating a complex monitoring setup with multiple services, a variety of notifiers, and a mix of simple and nested conditions.
```yaml
#===========================================
# Services to Monitor
#===========================================
services:
  # Service 1: Critical API that must be fully operational and fast.
  - name: "Production API"
    url: "https://api.my-company.com/v1/health"
    # Complex AND condition
    condition_id: critical-api-health
    check_period: 30
    sleep_on_fail: 120
    targets:
      - notifier_id: "on-call-sms"
        # Urgent SMS for the on-call engineer
        recipients:
          - "+15551234567"
      - notifier_id: "slack-critical-alerts"
        recipients:
          # Detailed alert for the team
          - "https://hooks.slack.com/services/CRITICAL_CHANNEL"

  # Service 2: A public website that shouldn't show server errors to users.
  - name: "Main Website"
    url: "https://www.my-company.com"
    # A NOT condition
    condition_id: "no-server-error-text"
    check_period: 300
    sleep_on_fail: 600
    targets:
      - notifier_id: "dev-team-email"
        # Non-urgent email to the whole team
        recipients:
          - "lead.dev@my-company.com"
          - "backend.team@my-company.com"

  # Service 3: A service that can be either ready or in maintenance mode.
  - name: "User Authentication Service"
    url: "https://auth.my-company.com/status"
    # A complex OR condition
    condition_id: "ready-or-maintenance"
    check_period: 60
    sleep_on_fail: 300
    targets:
      - notifier_id: "slack-info-alerts"
        recipients:
          # Informational-only alert
          - "https://hooks.slack.com/services/INFO_CHANNEL"

#===========================================
# Notification Channel Configuration
#===========================================
notifiers:
  # ------ Email (SMTP) ------
  smtp:
    - id: "dev-team-email"
      sender: "monitoring@my-company.com"
      password: "your-smtp-password"
      server: "smtp.my-company.com"
      port: "587"

  # ------ SMS (IPPanel) ------
  ippanel:
    - id: "on-call-sms"
      url: <YOUR_IPPANEL_URL>
      user: <YOUR_IPPANEL_USERNAME>
      pass: <YOUR_IPPANEL_PASSWORD>

  # ------ Webhooks ------
  webhook:
    # A detailed, richly-formatted webhook for critical alerts using Slack's Block Kit
    - id: "slack-critical-alerts"
      method: POST
      headers:
        Content-Type: "application/json"
      json:
        # Fallback text for notifications
        text: "🚨 CRITICAL ALERT: Service `{{ .ServiceName }}` is DOWN! 🚨"
        blocks:
          - type: "header"
            text:
              type: "plain_text"
              text: "🔴 Service `{{ .ServiceName }}` is Unhealthy"
          - type: "section"
            fields:
              - type: "mrkdwn"
                text: "*Timestamp:*\n{{ .TimeStamp }}"
              - type: "mrkdwn"
                text: "*Endpoint URL:*\n{{ .URL }}"
          - type: "context"
            elements:
              - type: "plain_text"
                text: "This alert was triggered by Healthy-API Monitoring."

    # A simpler webhook for informational alerts
    - id: "slack-info-alerts"
      method: POST
      headers:
        Content-Type: "application/json"
      json:
        text: "ℹ️ INFO: Service `{{ .ServiceName }}` failed its health check. URL: {{ .URL }}"

#===========================================
# Health Check Conditions
#===========================================
conditions:
  # Condition for Service 1: Must be 200 OK, have the right header, AND contain "UP" in the body.
  - id: "critical-api-health"
    condition:
      and:
        - status_code:
            code: 200
        - header:
            - key: "Content-Type"
              value: "application/health+json"
        - regex:
            # Matches "status": "UP" with or without the space
            pattern: '"status": ?"UP"'

  # Condition for Service 2: Healthy if the body does NOT contain "Server Error" or "Database Connection Failed".
  - id: "no-server-error-text"
    condition:
      not:
        regex:
          pattern: "Server Error|Database Connection Failed"

  # Condition for Service 3: Handles maintenance mode gracefully.
  - id: "ready-or-maintenance"
    condition:
      or:
        # Healthy if ready
        - and:
            - status_code:
                code: 200
            - regex:
                pattern: "READY"
        # Also healthy if in planned maintenance
        - and:
            - status_code:
                code: 503
            - regex:
                pattern: "MAINTENANCE"
```