shivam2003-dev/kubernetes-observability-lab

Observability Lab - Complete Stack

This workspace creates a local Kubernetes cluster with kind and deploys a complete observability stack:

  • Prometheus (metrics collection and storage)
  • Alertmanager (alert routing and management - included in kube-prometheus-stack)
  • Grafana (unified visualization for metrics, logs, and traces)
  • Loki (log aggregation with Promtail for log collection)
  • Tempo (distributed tracing)
  • Jaeger (alternative distributed tracing - can be used alongside Tempo)
  • OpenTelemetry Collector (receives traces/metrics/logs and exports to all backends)
  • Two demo Node.js microservices (service-a and service-b) that produce metrics, logs, and traces

Prerequisites

  • macOS with Docker Desktop installed and running
  • kind
  • kubectl
  • helm 3
  • docker
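Before running the setup script, it can help to confirm that the tools listed above are on your PATH. The following is a minimal sketch; the helper name `missing_tools` is illustrative and is not part of this repository.

```shell
# Print the name of every given tool that is not found on PATH.
# missing_tools is an illustrative helper, not part of this repository.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}
```

For example, `missing_tools kind kubectl helm docker` prints nothing when all four are installed. It does not check that the Docker daemon is running; `docker info` is one way to confirm that.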

Quick setup (runs many steps automatically):

  1. Make the setup script executable:

    chmod +x ./setup.sh

  2. Run it:

    ./setup.sh

What the script does

  • Creates a kind cluster
  • Builds Docker images for demo services and loads them into kind
  • Installs Prometheus, Grafana, Loki, and Tempo using Helm
  • Deploys the OpenTelemetry Collector
  • Deploys demo services and a load generator
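The first three steps above can be sketched roughly as follows. This is not the content of `setup.sh`: the cluster name, image tags, build paths, and chart choices are assumptions made for illustration, and the release name `prometheus` is inferred from the service names used later in this README. The Collector and demo-service deployments are left out because their manifests are not described here. The sketch defaults to a dry run that only prints the commands.

```shell
# Sketch of the bootstrap sequence. Cluster name, image tags, build paths,
# Helm release names, and chart choices are assumptions, not taken from setup.sh.
CLUSTER=${CLUSTER:-observability-lab}
NAMESPACE=${NAMESPACE:-observability}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands; 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

bootstrap() {
  # 1. Create the kind cluster.
  run kind create cluster --name "$CLUSTER"
  # 2. Build each demo image and load it into the cluster's nodes.
  for svc in service-a service-b; do
    run docker build -t "$svc:local" "./$svc"
    run kind load docker-image "$svc:local" --name "$CLUSTER"
  done
  # 3. Install the backends with Helm.
  run helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
  run helm repo add grafana https://grafana.github.io/helm-charts
  run helm repo update
  run helm upgrade --install prometheus prometheus-community/kube-prometheus-stack -n "$NAMESPACE" --create-namespace
  run helm upgrade --install loki grafana/loki-stack -n "$NAMESPACE"
  run helm upgrade --install tempo grafana/tempo -n "$NAMESPACE"
}
```

Calling `bootstrap` prints the planned commands. Before setting `DRY_RUN=0`, consider adding `set -e` or checking each exit status so that a failed step stops the sequence.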

How tracing works between services

  • Services are instrumented with OpenTelemetry (auto-instrumentation and manual spans).
  • Traces are sent to the OpenTelemetry Collector, which forwards them to Tempo.
  • Service A calls Service B; trace context is propagated via HTTP headers (W3C Trace Context).
  • Use Grafana/Tempo to search and view traces, and Grafana dashboards to visualize latency breakdowns.
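To make the propagation step concrete: W3C Trace Context carries the trace in a `traceparent` HTTP header of the form `version-traceid-parentid-flags`. In this lab the OpenTelemetry instrumentation reads and writes that header for you; the sketch below only illustrates the format. The function name `parse_traceparent` is hypothetical, and it validates the field lengths and hex shape rather than every rule in the specification.

```shell
# Parse a W3C traceparent header value: version-traceid-parentid-flags.
# Checks only the field lengths and lowercase-hex shape, not every rule in the spec.
parse_traceparent() {
  header=$1
  if ! printf '%s\n' "$header" |
      grep -Eq '^[0-9a-f]{2}-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$'; then
    return 1
  fi
  trace_id=$(printf '%s\n' "$header" | cut -d- -f2)
  parent_id=$(printf '%s\n' "$header" | cut -d- -f3)
  flags=$(printf '%s\n' "$header" | cut -d- -f4)
  # Bit 0 of the flags byte is the "sampled" flag.
  if [ $(( 0x$flags & 1 )) -eq 1 ]; then
    sampled=yes
  else
    sampled=no
  fi
  printf 'trace_id=%s parent_id=%s sampled=%s\n' "$trace_id" "$parent_id" "$sampled"
}
```

When Service A calls Service B, both spans share the same trace ID, and the parent ID in the header that Service B receives identifies the calling span in Service A. That shared trace ID is what lets Tempo assemble the two services' spans into one trace.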

Try it / verify

Quick Access (Recommended)

Use the smart port-forward script that automatically handles port conflicts:

./port-forward.sh start

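This starts port-forwards for the following services. Each port shown is the preferred one; the script handles the conflict automatically if that port is already in use: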

  • Grafana (preferred: 3000) - WEB UI - Unified visualization
  • Prometheus (preferred: 9090) - WEB UI - Metrics query
  • Alertmanager (preferred: 9094) - WEB UI - Alert management
  • Jaeger UI (preferred: 16686) - WEB UI - Trace visualization with SPM
  • Loki (preferred: 3200) - API ONLY (no UI, use Grafana to view logs)
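One simple way to resolve a port conflict is to fall back to the next free port. The sketch below shows that idea only; it is an assumption about how such a script could work, not the actual logic of `port-forward.sh`, and the name `pick_port` is hypothetical. The busy ports are passed in as arguments so the function stays independent of any particular port-scanning tool.

```shell
# Print the first port at or above the preferred one ($1) that does not
# appear in the remaining arguments (the list of ports already in use).
# pick_port is a hypothetical helper, not taken from port-forward.sh.
pick_port() {
  candidate=$1
  shift
  while :; do
    busy=no
    for p in "$@"; do
      if [ "$p" = "$candidate" ]; then
        busy=yes
      fi
    done
    if [ "$busy" = no ]; then
      break
    fi
    candidate=$((candidate + 1))
  done
  printf '%s\n' "$candidate"
}
```

For example, `pick_port 3000 3000 3001` prints `3002`, which could then be used as the local side of `kubectl port-forward svc/prometheus-grafana 3002:80 -n observability`.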

📖 Open the interactive access guide:

open access-guide.html

Check status:

./port-forward.sh status

Stop all port-forwards:

./port-forward.sh stop

Manual Access (Alternative)

If you prefer to access services individually:

  1. Grafana (unified dashboard):

    kubectl port-forward svc/prometheus-grafana 3000:80 -n observability

    Open http://localhost:3000 (username: admin, password: prom-operator)

  2. Prometheus (metrics):

    kubectl port-forward svc/prometheus-operated 9090:9090 -n observability

    Open http://localhost:9090

  3. Alertmanager (alerts):

    kubectl port-forward svc/prometheus-kube-prometheus-alertmanager 9093:9093 -n observability

    Open http://localhost:9093

  4. Jaeger UI (tracing):

    kubectl port-forward svc/jaeger-query 16686:16686 -n observability

    Open http://localhost:16686
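A port-forward can take a moment to become usable, so a small polling helper is handy when scripting these steps. This is a sketch that assumes `curl` is installed; the name `wait_for_http` is illustrative and not part of this repository.

```shell
# Poll a URL until it answers with a successful HTTP status, or give up.
# Usage: wait_for_http URL [ATTEMPTS]   (default: 30 attempts, about one per second)
# wait_for_http is an illustrative helper, not part of this repository.
wait_for_http() {
  url=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors; -sS hides progress but keeps error text.
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

With the port-forwards above running, `wait_for_http http://localhost:9090/-/ready` checks the Prometheus readiness endpoint and `wait_for_http http://localhost:3000/api/health` checks Grafana's health endpoint.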

View data:

  • In Grafana, check that the Tempo, Loki, and Prometheus datasources are present (they should be auto-configured); add any that are missing
  • Use the provided dashboards to visualize metrics
  • Search traces in Jaeger UI or Grafana/Tempo
  • View logs in Grafana using Loki datasource

Tracing across services

  • Both services are instrumented with OpenTelemetry. Service A calls Service B over HTTP and the W3C Trace Context headers are propagated automatically by the OpenTelemetry instrumentation.
  • Traces are exported to the OpenTelemetry Collector, which forwards them to Tempo. In Grafana, add Tempo as a datasource if it is not already configured, then view traces and link them with metrics and logs.

Generate traffic

  • Apply the Kubernetes job to generate ongoing requests:

    kubectl apply -f k8s/traffic-generator.yaml -n observability

  • After a few seconds you should see traces in Tempo and metrics in Prometheus.
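To check the metrics side from the command line, you can query the Prometheus HTTP API for the built-in `up` metric, which is 1 for each scrape target that Prometheus can reach. The helper below is a rough sketch for a quick check: `count_up` is a hypothetical name, and it relies on the compact single-line JSON that the API returns, so a real JSON parser such as `jq` is a better choice for anything more involved.

```shell
# Count series whose value is "1" in a Prometheus instant-query JSON
# response read from stdin. Relies on compact, single-line JSON output.
# count_up is a hypothetical helper, not part of this repository.
count_up() {
  grep -o '"value":\[[^]]*,"1"]' | wc -l | tr -d ' '
}
```

With the Prometheus port-forward running, `curl -s 'http://localhost:9090/api/v1/query?query=up' | count_up` prints the number of targets currently reported as up.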

Notes and next steps

  • This lab is intentionally simple. To extend it, add more services, add sample dashboards to Grafana, and configure Loki log parsers and Alertmanager alerts.

Further notes

  • This is a minimal lab for learning. For production, secure services, enable persistent storage, and configure resource limits.
