Home
At its core, Healthy API is a powerful, lightweight, and highly flexible health checking service written in Go. Its primary mission is to answer one critical question: "Are my services running correctly right now?" It goes beyond simple ping tests or basic uptime checks. Instead, it allows you to define precisely what "healthy" means for each of your applications.
Healthy API works by periodically sending HTTP requests to your defined service URLs and evaluating the responses against a sophisticated, user-defined set of rules called Conditions. If a service fails to meet its health conditions, Healthy API instantly sends detailed alerts through multiple channels to ensure the right people are notified immediately.
This tool is built for a world of microservices and complex application stacks where a simple 200 OK status code is not always enough to guarantee that a service is functioning properly. With its powerful condition engine, you can validate response bodies, check for specific headers, and combine these rules with logical operators, giving you complete confidence in the status of your services.
- ✨ Core Concepts
- ⚙️ Configuration Deep Dive
- 📖 Understanding Conditions: YAML vs. JSON
- 📋 Configuration Examples
- 🔧 Extending the Application
- ▶️ Running the Application
- Master Example: A Complete, Real-World Setup
At its heart, Healthy API is built on a few simple but powerful concepts.
A Service is the fundamental entity you want to monitor. Think of it as a single health check target. Each service is defined by a unique `name` (for alerts), a `url` to check, and, most importantly, a Condition that determines its health status.
A Notifier is a communication channel used to send alerts when a service's health check fails. The application supports multiple types of notifiers out-of-the-box, allowing you to reach your team wherever they are:
- Email (SMTP): For standard email alerts.
- SMS (IPPanel): For urgent mobile notifications.
- Webhook: For ultimate flexibility, allowing integration with services like Slack, Discord, Microsoft Teams, or your own custom API.
Each configured notifier is given a unique `id`, which services then reference. This design is highly reusable—you can define a single "critical-alerts" Slack channel and have multiple services send notifications to it.
A Condition is a set of rules that define what makes a service "healthy." This is the most powerful feature of Healthy API. Instead of just checking for a simple 200 OK status code, you can build complex, nested logic to validate a response with precision. A service is considered healthy only if its associated condition evaluates to true. This allows you to check for specific text in the response body, verify headers, and combine these checks with logical operators like AND, OR, and NOT.
A Health Check is the complete process of:
- Sending an HTTP request to a service's `url`.
- Evaluating the HTTP response against its specified Condition.
- Triggering notifiers if the condition fails.
You can control the timing of these checks with two key parameters:
- `check_period`: The interval in seconds between each health check when the service is healthy.
- `sleep_on_fail`: The interval in seconds to wait after a failure before checking again. This is crucial for preventing alert spam while your team is fixing an issue.
The application is configured using a single `config.yaml` file, which has three main sections: `services`, `notifiers`, and `conditions`.
This is an array where you define every service to be monitored.
- `name` (string): A human-readable name for the service (e.g., "Production API"). This is used in alert messages, so make it descriptive!
- `url` (string): The full URL of the health check endpoint (e.g., `https://api.example.com/health`).
- `condition_id` (string): This is the link to the health logic. It must match the `id` of a condition defined in the `conditions` block.
- `check_period` (integer): The wait time in seconds between checks for a healthy service.
- `sleep_on_fail` (integer): The wait time in seconds after a failure is detected.
- `targets` (array): A list of notification targets to alert on failure.
  - `notifier_id` (string): Must match the `id` of a configured notifier in the `notifiers` block.
  - `recipients` (array of strings): A list of destinations (email addresses, phone numbers, or webhook URLs) for that notifier.
This block contains the configuration for all your notification channels. You can define multiple notifiers of each type.
Email (SMTP) notifiers accept the following fields:

- `id` (string): A unique identifier (e.g., "admins-email-group").
- `sender` (string): The "From" email address.
- `password` (string): The SMTP server password.
- `server` (string): The SMTP server address (e.g., "smtp.gmail.com").
- `port` (string): The SMTP server port (e.g., "587").
SMS (IPPanel) notifiers accept the following fields:

- `id` (string): A unique identifier (e.g., "on-call-sms-alert").
- `url` (string): The API endpoint for the IPPanel service.
- `user` (string): Your IPPanel username.
- `pass` (string): Your IPPanel password.
Webhooks are the most flexible notifier.
- `id` (string): A unique identifier (e.g., "slack-alerts").
- `method` (string): The HTTP method to use (e.g., `POST`, `PUT`).
- `headers` (map): A key-value map of HTTP headers.
- `json` (map): The JSON payload to send in the request body.
Templating: The `headers` and `json` fields support dynamic values using Go's template engine. This is perfect for creating rich, informative alerts. The following variables are available:
- `{{ .ServiceName }}`: The `name` of the failed service.
- `{{ .TimeStamp }}`: The timestamp of the failure in RFC3339 format (e.g., `2023-10-27T10:00:00Z`).
- `{{ .URL }}`: The recipient URL that is currently being processed.
This is where you define the reusable, granular logic for what makes a service healthy. Each condition has a unique id that one or more services can reference.
A condition node must contain exactly one of the following keys. Let's break down the structure of each one.
The `status_code` condition is the simplest one. It checks if the HTTP response status code is an exact match.

- Structure: It contains a single key, `code`, whose value is the integer you expect.
- YAML Structure:

  ```yaml
  status_code:
    code: <INTEGER>
  ```

- Example: A condition that is only true if the status code is `200`.

  ```yaml
  - id: "is-status-200"
    condition:
      status_code:
        code: 200
  ```
The `regex` condition checks if the body of the HTTP response matches a given regular expression (regex) pattern. This is incredibly powerful for checking for specific text, error messages, or data formats.

- Structure: It contains a single key, `pattern`, whose value is the regex string to match against the response body.
- YAML Structure:

  ```yaml
  regex:
    pattern: "<REGEX_PATTERN>"
  ```

- Example: A condition that is true if the response body contains the string `"status":"UP"`.

  ```yaml
  - id: "body-contains-status-up"
    condition:
      regex:
        pattern: '"status":"UP"'
  ```
The `header` condition checks for the presence and value of one or more HTTP response headers. The condition is only true if all listed headers are found in the response and have the exact matching values.

- Structure: It contains a list of header checks. Each item in the list is an object with two keys: `key` (the header name) and `value` (the expected header value).
- YAML Structure:

  ```yaml
  header:
    - key: "<HEADER_NAME_1>"
      value: "<HEADER_VALUE_1>"
    - key: "<HEADER_NAME_2>"
      value: "<HEADER_VALUE_2>"
  ```

- Example: A condition that is true only if the response has both a `Content-Type` of `application/json` AND an `X-Cache` header of `HIT`.

  ```yaml
  - id: "is-cached-json-response"
    condition:
      header:
        - key: "Content-Type"
          value: "application/json"
        - key: "X-Cache"
          value: "HIT"
  ```
These keys allow you to combine the basic conditions above to create complex and precise health check logic.
- `and` / `or`: These keys take a list of condition nodes.
  - `and`: Is true only if ALL conditions in the list are true.
  - `or`: Is true if AT LEAST ONE condition in the list is true.
- `not`: This key takes a single condition node and inverts its result. It is true if the inner condition is false.
- Example using `and`:

  ```yaml
  - id: "is-healthy-and-cached"
    condition:
      and:
        - status_code:
            code: 200
        - header:
            - key: "X-Cache"
              value: "HIT"
  ```
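To make the semantics concrete, here is a small, self-contained Go sketch of how such nested conditions evaluate. The `Cond` type and `eval` function are invented for this illustration and are not Healthy API's real implementation (the real engine also inspects the response body and headers):

```go
package main

import "fmt"

// Cond is a simplified stand-in for one condition node: exactly one of its
// fields is set, mirroring "a condition node must contain exactly one key".
type Cond struct {
	StatusCode *int   // leaf check: expected status code
	And        []Cond // all children must pass
	Or         []Cond // at least one child must pass
	Not        *Cond  // inverts its single child
}

// eval recursively applies the and/or/not semantics described above.
func eval(c Cond, gotStatus int) bool {
	switch {
	case c.StatusCode != nil:
		return gotStatus == *c.StatusCode
	case c.And != nil:
		for _, child := range c.And {
			if !eval(child, gotStatus) {
				return false
			}
		}
		return true
	case c.Or != nil:
		for _, child := range c.Or {
			if eval(child, gotStatus) {
				return true
			}
		}
		return false
	case c.Not != nil:
		return !eval(*c.Not, gotStatus)
	}
	return false
}

func main() {
	ok, created := 200, 201
	// Equivalent to an `or` of two status_code checks.
	cond := Cond{Or: []Cond{{StatusCode: &ok}, {StatusCode: &created}}}
	fmt.Println(eval(cond, 200), eval(cond, 500)) // true false
}
```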
YAML's indentation-based syntax is great for readability, but it can be tricky when nesting logical conditions. Under the hood, it's just a structured map, very similar to JSON. Understanding this relationship can make writing complex conditions much easier.
Let's look at a complex YAML condition and its JSON equivalent.
Scenario: A service is healthy if (the status is 200 AND a specific header is present) OR the status is 404.
YAML `condition` Block:

```yaml
# This is how you'd write it in your config.yaml
conditions:
  - id: complex-condition
    condition:
      or:            # The top-level check is an OR
        - and:       # The first item in the OR list is an AND
            - status_code:
                code: 200
            - header:
                - key: "X-Cache-Status"
                  value: "HIT"
        - status_code: # The second item in the OR list is a simple status check
            code: 404
```

JSON Equivalent:

This is how the YAML is parsed and understood by the application. Notice the arrays (`[]`) for `or`/`and` and objects (`{}`) for `status_code`:
```json
[
  {
    "id": "complex-condition",
    "condition": {
      "or": [
        {
          "and": [
            {
              "status_code": {
                "code": 200
              }
            },
            {
              "header": [
                {
                  "key": "X-Cache-Status",
                  "value": "HIT"
                }
              ]
            }
          ]
        },
        {
          "status_code": {
            "code": 404
          }
        }
      ]
    }
  }
]
```

Seeing the JSON structure makes the nesting of lists and objects explicit, which can help clarify how to structure your YAML.
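If you want to poke at this mapping yourself, a few lines of Go's standard `encoding/json` make the shapes explicit. This is a standalone illustration, not application code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// complexCondition is the JSON equivalent of the YAML condition shown above.
const complexCondition = `{
  "or": [
    {"and": [
      {"status_code": {"code": 200}},
      {"header": [{"key": "X-Cache-Status", "value": "HIT"}]}
    ]},
    {"status_code": {"code": 404}}
  ]
}`

// parseCondition unmarshals a condition into the generic structure the YAML
// also maps to: or/and become []any (lists), status_code/header become maps.
func parseCondition(raw string) map[string]any {
	var cond map[string]any
	if err := json.Unmarshal([]byte(raw), &cond); err != nil {
		panic(err)
	}
	return cond
}

func main() {
	cond := parseCondition(complexCondition)
	or := cond["or"].([]any) // "or" holds a LIST of condition nodes
	fmt.Println("or branches:", len(or))
	first := or[0].(map[string]any) // each node is an OBJECT with one key
	fmt.Println("and checks:", len(first["and"].([]any)))
}
```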
Theory is great, but examples make things concrete. Here are several practical scenarios, ranging from simple to advanced, to demonstrate how to use the condition-based system effectively.
Scenario: The most common use case. You want to monitor your company's homepage. The page is considered "healthy" if it returns a 200 OK status code. If it fails, you want to send an email to the admin team.
```yaml
services:
  - name: "Homepage"
    url: "https://my-company.com"
    condition_id: "is-status-200" # <-- Reference the simple condition
    check_period: 60
    sleep_on_fail: 300
    targets:
      - notifier_id: "admins-email-group"
        recipients: ["admin@my-company.com", "support@my-company.com"]

notifiers:
  smtp:
    - id: "admins-email-group"
      sender: "monitoring@my-company.com"
      password: "your-smtp-password"
      server: "smtp.my-company.com"
      port: "587"

conditions:
  - id: "is-status-200"
    condition:
      status_code:
        code: 200
```

Scenario: You have a background worker service that exposes a status endpoint. This endpoint always returns a 200 status code, but its health is indicated within the JSON body. The service is only truly healthy if the body contains `"status": "healthy"`. If not, an urgent SMS should be sent.
```yaml
services:
  - name: "Background Worker Status"
    url: "http://localhost:8081/status"
    condition_id: "body-contains-healthy"
    check_period: 30
    sleep_on_fail: 180
    targets:
      - notifier_id: "on-call-sms"
        recipients: ["+15551234567"]

notifiers:
  ippanel:
    - id: "on-call-sms"
      url: <YOUR_IPPANEL_URL>
      user: <YOUR_IPPANEL_USERNAME>
      pass: <YOUR_IPPANEL_PASSWORD>

conditions:
  - id: "body-contains-healthy"
    condition:
      regex:
        # This pattern looks for the exact string "status": "healthy"
        # The ? makes the space after the colon optional
        pattern: '"status": ?"healthy"'
```

Scenario: Your API Gateway is behind a cache. To ensure the cache is working correctly, you need to verify that a successful response includes the `X-Cache-Status: HIT` header. A failure should post a message to the operations Slack channel.
```yaml
services:
  - name: "API Gateway Cache"
    url: "https://api.my-company.com/v1/users/popular"
    condition_id: "is-cache-hit"
    check_period: 120
    sleep_on_fail: 300
    targets:
      - notifier_id: "slack-ops-channel"
        recipients: ["https://hooks.slack.com/services/OPS_CHANNEL_HOOK"]

notifiers:
  webhook:
    - id: "slack-ops-channel"
      method: POST
      json:
        text: "🟡 Cache Alert: The API Gateway cache check failed for `{{ .ServiceName }}`."

conditions:
  - id: "is-cache-hit"
    condition:
      header:
        # You can list multiple headers here; all must be present for the condition to be true.
        - key: "X-Cache-Status"
          value: "HIT"
```

Scenario: A critical service is only healthy if it returns a 200 status code AND its response body contains the text "All systems operational." This ensures you're not getting a "successful" response with an empty or incorrect body.
```yaml
services:
  - name: "Critical Database API"
    url: "https://db-api.my-company.com/healthz"
    condition_id: "status-ok-and-body-ok"
    targets:
      - notifier_id: "on-call-sms"
      - notifier_id: "slack-critical-alerts"
# ... (notifiers for sms and webhook would be defined)

conditions:
  - id: "status-ok-and-body-ok"
    condition:
      and: # <-- Both conditions inside this list must be true
        - status_code:
            code: 200
        - regex:
            pattern: "All systems operational"
```

Scenario: A data submission endpoint is considered healthy if it returns either a 200 (OK) for updates or a 201 (Created) for new entries. Any other code is a failure.
```yaml
services:
  - name: "Data Submission Endpoint"
    url: "https://api.my-company.com/v1/submit"
    condition_id: "status-200-or-201"
    targets:
      - notifier_id: "dev-team-email"
# ... (notifier for email would be defined)

conditions:
  - id: "status-200-or-201"
    condition:
      or: # <-- At least one of the conditions in this list must be true
        - status_code:
            code: 200
        - status_code:
            code: 201
```

Scenario: You want to monitor your public-facing website for any unexpected error messages. The page is considered unhealthy if the word "Exception" or "Fatal error" appears anywhere in the HTML. This is a powerful way to catch bugs that don't result in a 5xx status code.
```yaml
services:
  - name: "Public Website Error Check"
    url: "https://www.my-company.com"
    condition_id: "no-fatal-error-text"
    targets:
      - notifier_id: "admins-email-group"
# ... (notifier for email would be defined)

conditions:
  - id: "no-fatal-error-text"
    condition:
      not: # <-- Inverts the result of the inner condition
        # The service is healthy if the body does NOT match this regex
        regex:
          # The pipe character | acts as an OR inside the regex
          pattern: "Exception|Fatal error"
```

Scenario: This is a common real-world problem. A critical service is healthy if:

- (The status code is 200 AND the body contains `"status":"READY"`) OR
- (The status code is 503 (Service Unavailable) AND the body contains `"status":"MAINTENANCE"`)
```yaml
services:
  - name: "User Authentication Service"
    url: "https://auth.my-company.com/status"
    condition_id: "ready-or-maintenance"
    targets:
      - notifier_id: "slack-auth-alerts"
# ... (webhook notifier for slack would be defined)

conditions:
  - id: "ready-or-maintenance"
    condition:
      or: # <-- The top-level condition is an OR
        # Block 1: The 'Ready' state
        - and:
            - status_code:
                code: 200
            - regex:
                pattern: '"status" ?: ?"READY"'
        # Block 2: The 'Maintenance' state
        - and:
            - status_code:
                code: 503
            - regex:
                pattern: '"status" ?: ?"MAINTENANCE"'
```

The modular design of Healthy API makes it straightforward to add new notification channels (e.g., Telegram, Discord, Microsoft Teams). If you have a service you want to integrate with, you just need to implement a single interface.
Let's walk through the steps to add a hypothetical TelegramNotifier.
1. Implement the Notifier Interface:

   - Create a new file in the `notifier/` directory, for instance, `telegram.go`.
   - Inside this file, define a struct for your new notifier. This struct will hold any configuration it needs, like an API token.

   ```go
   // notifier/telegram.go
   package notifier

   import (
       "healthy-api/model"
       "log"
   )

   type TelegramNotifier struct {
       BotToken string
       ChatID   string
       Logger   *log.Logger
   }
   ```

   - Implement the `notifier.Notifier` interface by adding the `Notify(n model.Notification) error` method to your struct. This method will contain the logic for sending the actual alert via the Telegram Bot API.

   ```go
   // The Notify method for TelegramNotifier
   func (t *TelegramNotifier) Notify(n model.Notification) error {
       // Your logic to format a message and send it to the Telegram API...
       // For example:
       // message := fmt.Sprintf("🚨 Alert: Service '%s' is down!", n.ServiceName)
       // apiURL := fmt.Sprintf("https://api.telegram.org/bot%s/sendMessage?chat_id=%s&text=%s", t.BotToken, t.ChatID, message)
       // _, err := http.Post(apiURL, "", nil)
       // if err != nil {
       //     t.Logger.Printf("Failed to send Telegram notification: %v", err)
       //     return err
       // }
       // t.Logger.Printf("Sent Telegram notification for service: %s", n.ServiceName)
       return nil
   }
   ```
2. Update the Configuration Model:

   - Open `model/config.go`.
   - Add a new struct to define the configuration fields for your notifier that will be read from the YAML file.

   ```go
   // model/config.go
   type Telegram struct {
       ID       string `yaml:"id"`
       BotToken string `yaml:"bot_token"`
       ChatID   string `yaml:"chat_id"`
   }
   ```

   - Add a slice of your new struct to the `Notifiers` struct.

   ```go
   // model/config.go
   type Notifiers struct {
       IPPanels  []IPPanel  `yaml:"ippanel"`
       SMTPs     []SMTP     `yaml:"smtp"`
       Webhook   []Webhook  `yaml:"webhook"`
       Telegrams []Telegram `yaml:"telegram"` // <-- Add this line
   }
   ```
3. Register the New Notifier in `main.go`:

   - In `main.go`, create a new function `loadTelegramNotifiers` that is similar to the existing `load...` functions.
   - This function's job is to read the configuration, create an instance of your `TelegramNotifier`, and register it with the `notifierRegistry`.

   ```go
   // main.go
   func loadTelegramNotifiers(cfg *model.Config, notifierRegistry *notifier.Registry, logger *log.Logger) int {
       count := 0
       for _, tg := range cfg.Notifiers.Telegrams {
           // ... (error handling for duplicate IDs) ...
           notifierInst := &notifier.TelegramNotifier{
               BotToken: tg.BotToken,
               ChatID:   tg.ChatID,
               Logger:   logger,
           }
           notifierRegistry.Register(tg.ID, notifierInst)
           logger.Printf("new notifier registered. type:telegram -> %v\n", notifierInst)
           count++
       }
       return count
   }
   ```

   - Finally, call your new loading function from the `main` function.
And that's it! Users can now configure and use your new Telegram notifier in their config.yaml file.
To run the application, you must provide the path to your configuration file using the `-config` flag.
```shell
# Basic execution with your configuration file
go run main.go -config=config.yaml

# For production, it's better to build a self-contained binary first
go build -o healthy-api

# Run the built binary. Use the -verbose flag for detailed debugging logs.
./healthy-api -config=config.yaml -verbose
```

- `-config` (required): Specifies the path to your YAML configuration file. The application will not start without it.
- `-verbose` (optional): If this flag is present, the application will print detailed logs of every health check (both successful and failed) and every notification attempt. This is highly recommended for setting up and debugging your configuration.
This final example ties everything together, demonstrating a complex monitoring setup with multiple services, a variety of notifiers, and a mix of simple and nested conditions.
```yaml
#===========================================
# Services to Monitor
#===========================================
services:
  # Service 1: Critical API that must be fully operational and fast.
  - name: "Production API"
    url: "https://api.my-company.com/v1/health"
    # Complex AND condition
    condition_id: critical-api-health
    check_period: 30
    sleep_on_fail: 120
    targets:
      - notifier_id: "on-call-sms"
        # Urgent SMS for the on-call engineer
        recipients:
          - "+15551234567"
      - notifier_id: "slack-critical-alerts"
        recipients:
          # Detailed alert for the team
          - "https://hooks.slack.com/services/CRITICAL_CHANNEL"

  # Service 2: A public website that shouldn't show server errors to users.
  - name: "Main Website"
    url: "https://www.my-company.com"
    # A NOT condition
    condition_id: "no-server-error-text"
    check_period: 300
    sleep_on_fail: 600
    targets:
      - notifier_id: "dev-team-email"
        # Non-urgent email to the whole team
        recipients:
          - "lead.dev@my-company.com"
          - "backend.team@my-company.com"

  # Service 3: A service that can be either ready or in maintenance mode.
  - name: "User Authentication Service"
    url: "https://auth.my-company.com/status"
    # A complex OR condition
    condition_id: "ready-or-maintenance"
    check_period: 60
    sleep_on_fail: 300
    targets:
      - notifier_id: "slack-info-alerts"
        recipients:
          # Informational-only alert
          - "https://hooks.slack.com/services/INFO_CHANNEL"

#===========================================
# Notification Channel Configuration
#===========================================
notifiers:
  # ------ Email (SMTP) ------
  smtp:
    - id: "dev-team-email"
      sender: "monitoring@my-company.com"
      password: "your-smtp-password"
      server: "smtp.my-company.com"
      port: "587"

  # ------ SMS (IPPanel) ------
  ippanel:
    - id: "on-call-sms"
      url: <YOUR_IPPANEL_URL>
      user: <YOUR_IPPANEL_USERNAME>
      pass: <YOUR_IPPANEL_PASSWORD>

  # ------ Webhooks ------
  webhook:
    # A detailed, richly-formatted webhook for critical alerts using Slack's Block Kit
    - id: "slack-critical-alerts"
      method: POST
      headers:
        Content-Type: "application/json"
      json:
        # Fallback text for notifications
        text: "🚨 CRITICAL ALERT: Service `{{ .ServiceName }}` is DOWN! 🚨"
        blocks:
          - type: "header"
            text:
              type: "plain_text"
              text: "🔴 Service `{{ .ServiceName }}` is Unhealthy"
          - type: "section"
            fields:
              - type: "mrkdwn"
                text: "*Timestamp:*\n{{ .TimeStamp }}"
              - type: "mrkdwn"
                text: "*Endpoint URL:*\n{{ .URL }}"
          - type: "context"
            elements:
              - type: "plain_text"
                text: "This alert was triggered by Healthy-API Monitoring."

    # A simpler webhook for informational alerts
    - id: "slack-info-alerts"
      method: POST
      headers:
        Content-Type: "application/json"
      json:
        text: "ℹ️ INFO: Service `{{ .ServiceName }}` failed its health check. URL: {{ .URL }}"

#===========================================
# Health Check Conditions
#===========================================
conditions:
  # Condition for Service 1: Must be 200 OK, have the right header, AND contain "UP" in the body.
  - id: "critical-api-health"
    condition:
      and:
        - status_code:
            code: 200
        - header:
            - key: "Content-Type"
              value: "application/health+json"
        - regex:
            # Matches "status": "UP" with or without the space
            pattern: '"status": ?"UP"'

  # Condition for Service 2: Healthy if the body does NOT contain "Server Error" or "Database Connection Failed".
  - id: "no-server-error-text"
    condition:
      not:
        regex:
          pattern: "Server Error|Database Connection Failed"

  # Condition for Service 3: Handles maintenance mode gracefully.
  - id: "ready-or-maintenance"
    condition:
      or:
        # Healthy if ready
        - and:
            - status_code:
                code: 200
            - regex:
                pattern: "READY"
        # Also healthy if in planned maintenance
        - and:
            - status_code:
                code: 503
            - regex:
                pattern: "MAINTENANCE"
```