Status Incident Service

Internal service for monitoring system status and tracking incidents.

Screenshots

Dashboard

Overview of all systems with real-time status indicators, uptime metrics, and component health.

Administration

System management interface with backup/restore, webhook configuration, and system creation.

Public Status Page

Read-only public page for external stakeholders to view system status.

System Details

Detailed view of a system with status controls, analytics, and dependency management.

SLA Reports

Generate and view SLA compliance reports with target tracking.

Analytics

Overall performance metrics, per-system analytics, and SLA reference table.

Activity Log

Complete history of all status changes with timestamps and source tracking.

API Documentation

Interactive Swagger documentation for the REST API.

Features

System Management - add projects/services with description, URL, and owner
Dependencies - track components of each system (DB, Redis, API, etc.)
Traffic Light Status - green (operational), yellow (degraded), red (outage)
Manual Updates - change status with comments
Heartbeat Monitoring - automatic URL health checks with latency tracking
Latency Graphs - visual latency history and uptime heatmaps
Incident Management - create, track, and resolve incidents with timeline updates
Maintenance Windows - schedule planned downtime excluded from SLA
SLA Reports - generate compliance reports with breach tracking
Webhook Notifications - Slack, Discord, Telegram, Microsoft Teams, generic HTTP
Public Status Page - read-only page for external stakeholders
API Keys - secure API access with scoped permissions
Change History - complete log of all status changes
Analytics - uptime/SLA, incident count, MTTR
Export/Import - backup and restore all data via API
Versioned Migrations - safe database upgrades with automatic backup
Smart Auto-refresh - dashboard updates without interrupting form editing

Tech Stack

Backend: Go + chi router
Database: SQLite (WAL mode)
Frontend: HTML templates + vanilla JS
Architecture: DDD (Domain-Driven Design)

Getting Started

Using Docker (recommended)

# Pull from GitHub Container Registry
docker pull ghcr.io/pomaxa/status-service:v1.0.0
docker run -p 8080:8080 -v status-data:/app/data ghcr.io/pomaxa/status-service:v1.0.0

# Or use latest
docker pull ghcr.io/pomaxa/status-service:latest

Download Binary

Pre-built Linux binary (amd64) available from GitHub Releases.

Local Build

# Build
go build -o status-incident .

# Run (port 8080)
./status-incident

Service will be available at http://localhost:8080

API Documentation

Swagger UI is available at http://localhost:8080/swagger/

Project Structure

├── main.go                     # Entry point
├── internal/
│   ├── domain/                 # Business logic (entities, value objects)
│   ├── application/            # Use cases (services)
│   ├── infrastructure/         # SQLite repositories, HTTP checker
│   └── interfaces/             # HTTP handlers, background workers
├── templates/                  # HTML templates
└── static/                     # CSS styles

Web Interface

Page	URL	Description
Dashboard	`/`	Overview of all systems
System	`/systems/{id}`	System details and dependencies
Public	`/status`	Public status page (read-only)
SLA	`/sla`	SLA reports and breaches
Admin	`/admin`	Manage systems and webhooks
Logs	`/logs`	Change history
Analytics	`/analytics`	Statistics and SLA
Swagger	`/swagger/`	Interactive API documentation
Metrics	`/metrics`	Prometheus metrics endpoint

REST API

Systems

# List systems
GET /api/systems

# Create system
POST /api/systems
{"name": "API", "description": "Main API", "url": "https://api.example.com", "owner": "Backend Team"}

# Get system
GET /api/systems/{id}

# Update system
PUT /api/systems/{id}
{"name": "API", "description": "Updated", "url": "https://api.example.com", "owner": "Backend Team"}

# Delete system
DELETE /api/systems/{id}

# Change status
POST /api/systems/{id}/status
{"status": "yellow", "message": "Degraded performance"}

Dependencies

# List dependencies
GET /api/systems/{id}/dependencies

# Add dependency
POST /api/systems/{id}/dependencies
{"name": "PostgreSQL", "description": "Main database"}

# Update dependency
PUT /api/dependencies/{id}

# Delete dependency
DELETE /api/dependencies/{id}

# Change dependency status
POST /api/dependencies/{id}/status
{"status": "red", "message": "Connection lost"}

# Configure heartbeat
POST /api/dependencies/{id}/heartbeat
{"url": "https://api.example.com/health", "interval": 60}

# Disable heartbeat
DELETE /api/dependencies/{id}/heartbeat

# Force check
POST /api/dependencies/{id}/check

Analytics

# Overall analytics
GET /api/analytics?period=24h

# System analytics
GET /api/systems/{id}/analytics?period=7d

# All logs
GET /api/logs?limit=100

Export / Import

# Export all data (systems, dependencies, logs)
GET /api/export
# Returns JSON file download

# Export only logs
GET /api/export/logs

# Import data from backup
POST /api/import
Content-Type: application/json
# Body: exported JSON data

Export format:

{
  "exported_at": "2024-01-15T10:30:00Z",
  "version": "1.0",
  "systems": [
    {
      "id": 1,
      "name": "API",
      "description": "Main API",
      "url": "https://api.example.com",
      "owner": "Backend Team",
      "status": "green",
      "created_at": "2024-01-01T00:00:00Z",
      "updated_at": "2024-01-15T10:00:00Z"
    }
  ],
  "dependencies": [
    {
      "id": 1,
      "system_id": 1,
      "name": "PostgreSQL",
      "description": "Main database",
      "status": "green",
      "heartbeat_url": "https://api.example.com/health/db",
      "heartbeat_interval": 60,
      "created_at": "2024-01-01T00:00:00Z"
    }
  ],
  "logs": [
    {
      "id": 1,
      "system_id": 1,
      "old_status": "green",
      "new_status": "yellow",
      "message": "High latency detected",
      "source": "manual",
      "created_at": "2024-01-15T09:00:00Z"
    }
  ]
}

Import response:

{
  "systems_imported": 5,
  "dependencies_imported": 12,
  "logs_imported": 150,
  "errors": []
}

Heartbeat Monitoring

How It Works

The service performs HTTP GET requests to configured heartbeat URLs every minute.

Status determination:

GREEN - HTTP 2xx response (200, 201, 204, etc.)
YELLOW - 1-2 consecutive failures (non-2xx or timeout)
RED - 3+ consecutive failures

Health Endpoint Examples

Your service should expose a health endpoint that returns appropriate HTTP status codes.

GREEN Status (Operational)

GET /health HTTP/1.1
Host: your-service.com

HTTP/1.1 200 OK
Content-Type: application/json

{"status": "ok"}

Any 2xx response is considered healthy:

200 OK
201 Created
204 No Content

YELLOW Status (Degraded)

Returned after 1-2 consecutive check failures:

GET /health HTTP/1.1
Host: your-service.com

HTTP/1.1 503 Service Unavailable
Content-Type: application/json

{"status": "degraded", "message": "Database connection slow"}

Non-2xx responses that trigger degraded status:

500 Internal Server Error
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
Connection timeout
DNS resolution failure

RED Status (Outage)

Returned after 3+ consecutive check failures:

GET /health HTTP/1.1
Host: your-service.com

HTTP/1.1 500 Internal Server Error
Content-Type: application/json

{"status": "error", "message": "Database unreachable"}

Recommended Health Endpoint Implementation

// Go example
func healthHandler(w http.ResponseWriter, r *http.Request) {
    // Check your dependencies
    if err := db.Ping(); err != nil {
        w.WriteHeader(http.StatusServiceUnavailable)
        json.NewEncoder(w).Encode(map[string]string{
            "status": "error",
            "message": err.Error(),
        })
        return
    }

    w.WriteHeader(http.StatusOK)
    json.NewEncoder(w).Encode(map[string]string{
        "status": "ok",
    })
}

// Node.js example
app.get('/health', async (req, res) => {
    try {
        await db.query('SELECT 1');
        res.json({ status: 'ok' });
    } catch (err) {
        res.status(503).json({ status: 'error', message: err.message });
    }
});

Request Details

The heartbeat checker sends requests with:

Method: GET
Timeout: 10 seconds
User-Agent: StatusIncident-HealthChecker/1.0
Redirects: Follows up to 10 redirects

Prometheus Metrics

The /metrics endpoint exposes metrics in Prometheus format for monitoring and alerting.

Available Metrics

Metric	Type	Labels	Description
`status_incident_system_status`	gauge	system_id, system_name	System status (0=green, 1=yellow, 2=red)
`status_incident_system_sla_target`	gauge	system_id, system_name	SLA target percentage
`status_incident_uptime_24h`	gauge	system_id, system_name	Uptime percentage over last 24h
`status_incident_dependency_status`	gauge	system_id, system_name, dependency_id, dependency_name	Dependency status
`status_incident_dependency_latency_ms`	gauge	system_id, system_name, dependency_id, dependency_name	Last check latency in ms
`status_incident_dependency_consecutive_failures`	gauge	system_id, system_name, dependency_id, dependency_name	Consecutive check failures
`status_incident_systems_total`	gauge	-	Total number of systems
`status_incident_dependencies_total`	gauge	-	Total number of dependencies
`status_incident_incidents_active`	gauge	-	Number of active incidents
`status_incident_incidents_total`	gauge	-	Total number of incidents
`status_incident_incidents_by_severity`	gauge	severity	Incidents count by severity
`status_incident_incidents_by_status`	gauge	status	Incidents count by status
`status_incident_maintenances_active`	gauge	-	Active maintenance windows
`status_incident_maintenances_scheduled`	gauge	-	Scheduled maintenance windows
`status_incident_sla_breaches_unacknowledged`	gauge	-	Unacknowledged SLA breaches

Prometheus Configuration

scrape_configs:
  - job_name: 'status-incident'
    static_configs:
      - targets: ['localhost:8080']
    scrape_interval: 30s

Example Alerting Rules

groups:
  - name: status-incident
    rules:
      - alert: SystemDown
        expr: status_incident_system_status == 2
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "System {{ $labels.system_name }} is down"

      - alert: HighLatency
        expr: status_incident_dependency_latency_ms > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency on {{ $labels.dependency_name }}"

      - alert: SLABreach
        expr: status_incident_sla_breaches_unacknowledged > 0
        labels:
          severity: critical
        annotations:
          summary: "Unacknowledged SLA breach detected"

Documentation

Health Check Implementation Guide - How to implement health endpoints for your services
Upgrade Guide - Safe upgrade process with automatic backups

Version Info

Check application version:

./status-incident -version
# Status Incident Service
#   Version:    v1.0.0
#   Commit:     abc123
#   Build time: 2024-01-15_10:30:00

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 52 Commits
.github/workflows		.github/workflows
deploy		deploy
docs		docs
internal		internal
screenshots		screenshots
static		static
templates		templates
.dockerignore		.dockerignore
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
DEPLOYMENT.md		DEPLOYMENT.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml
go.mod		go.mod
go.sum		go.sum
main.go		main.go

License

pomaxa/status-service

Folders and files

Latest commit

History

Repository files navigation

Status Incident Service

Screenshots

Dashboard

Administration

Public Status Page

System Details

SLA Reports

Analytics

Activity Log

API Documentation

Features

Tech Stack

Getting Started

Using Docker (recommended)

Download Binary

Local Build

API Documentation

Project Structure

Web Interface

REST API

Systems

Dependencies

Analytics

Export / Import

Heartbeat Monitoring

How It Works

Health Endpoint Examples

GREEN Status (Operational)

YELLOW Status (Degraded)

RED Status (Outage)

Recommended Health Endpoint Implementation

Request Details

Prometheus Metrics

Available Metrics

Prometheus Configuration

Example Alerting Rules

Documentation

Version Info

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Uh oh!

Contributors 3

Uh oh!

Languages

Packages