KubeRCA

AI-Powered Kubernetes Incident Analysis & Root Cause Analysis Tool



Overview

KubeRCA is an open-source tool that automatically collects incident context from Kubernetes environments and uses an LLM to produce Root Cause Analysis (RCA) and response guides.

When alerts fire in your cluster, KubeRCA:

  1. Receives alerts via Alertmanager webhook
  2. Creates/updates incidents and sends Slack thread notifications
  3. Analyzes context with AI (Strands Agents: Gemini/OpenAI/Anthropic)
  4. Streams realtime updates to the dashboard via SSE
  5. Supports similar incident search, feedback, and in-app chat workflows
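
Steps 1 and 2 can be pictured with a small sketch. It assumes the standard Alertmanager webhook payload (an `alerts` array whose entries carry `status`, `labels`, and `fingerprint`) and uses a hypothetical `Incident` record keyed by fingerprint; KubeRCA's real grouping rules and schema may differ.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical in-memory incident record, not KubeRCA's actual schema."""
    fingerprint: str
    alertname: str
    status: str
    alert_count: int = 0


def apply_webhook(incidents: dict[str, Incident], payload: dict) -> list[Incident]:
    """Create or update one incident per alert fingerprint; return the touched incidents."""
    touched = []
    for alert in payload.get("alerts", []):
        fp = alert["fingerprint"]
        labels = alert.get("labels", {})
        incident = incidents.get(fp)
        if incident is None:
            # First time this fingerprint is seen: open a new incident.
            incident = Incident(fp, labels.get("alertname", "unknown"), alert["status"])
            incidents[fp] = incident
        incident.status = alert["status"]  # "firing" or "resolved"
        incident.alert_count += 1
        touched.append(incident)
    return touched
```

Keying on the fingerprint means a later `resolved` notification for the same alert updates the existing incident instead of opening a new one.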

Features

  • Automated Context Collection - Gather logs, metrics, and K8s events when alerts fire
  • AI-Powered Analysis - LLM-based root cause analysis with Strands Agents (Gemini/OpenAI/Anthropic)
  • Similar Incident Search - Vector similarity search using pgvector
  • Slack Integration - Real-time notifications with threaded analysis results
  • Realtime Dashboard Sync - Server-Sent Events (/api/v1/events) with polling fallback
  • Operator Feedback Loop - Vote/comment APIs for incidents and alerts
  • In-App AI Chat - Context-aware chat via Backend POST /api/v1/chat and Agent POST /chat
  • Webhook Settings UI - CRUD management for outbound webhook integrations
  • Google OIDC Login - One-click Google authentication with email allowlist
  • Web Dashboard - React-based UI for incident management
  • Helm Deployment - Easy installation via Helm charts
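
To illustrate the idea behind Similar Incident Search, here is a minimal pure-Python sketch of ranking stored incident embeddings by cosine distance. pgvector exposes cosine distance as the `<=>` operator; whether KubeRCA uses that operator or another metric is an assumption, and the incident IDs and vectors below are made up.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def most_similar(query: list[float], stored: dict[str, list[float]], top_k: int = 3) -> list[str]:
    """Return the IDs of the top_k stored embeddings closest to the query."""
    ranked = sorted(stored.items(), key=lambda item: cosine_distance(query, item[1]))
    return [incident_id for incident_id, _ in ranked[:top_k]]


# Hypothetical 2-dimensional embeddings; real embeddings have hundreds of dimensions.
stored = {"oom": [1.0, 0.0], "crashloop": [0.9, 0.1], "dns": [0.0, 1.0]}
closest = most_similar([1.0, 0.05], stored, top_k=2)
```

In a database-backed setup the sort would normally be pushed into SQL so that a vector index can be used instead of scanning every row.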

Architecture

flowchart LR
  %% External
  AM[Alertmanager]
  SL[Slack Bot]
  LLM[LLM API Gemini OpenAI Anthropic]
  PR[Prometheus]
  K8S[Kubernetes API]
  TP[Tempo]
  LO[Loki]
  GK[Grafana]
  AL[Alloy]
  OIDC[Google OIDC]

  %% Internal
  subgraph KubeRCA
    FE[Frontend React TypeScript]
    BE[Backend Go Gin]
    AG[Agent Python FastAPI]
    PG[(PostgreSQL pgvector)]
  end

  AM -->|Webhook| BE
  BE -->|Thread notification| SL
  FE -->|Auth Incident Alert API| BE
  FE -->|SSE stream| BE
  BE -->|Analyze and summarize| AG
  BE -->|Chat request| AG
  AG -->|K8s Context| K8S
  AG -->|Metrics Query| PR
  AG -->|LLM Analysis| LLM
  AG -.->|Trace Query| TP
  BE -->|Embeddings| LLM
  BE -.->|OIDC Token Exchange| OIDC
  FE -.->|OIDC Redirect| OIDC
  BE <-->|Data| PG
  AG -.->|Session optional| PG
  AL -.->|Collector| PR
  AL -.->|Collector| LO
  AL -.->|Collector| TP
  GK -.->|Dashboard| PR
  GK -.->|Dashboard| LO
  GK -.->|Dashboard| TP

Component Flow

| Step | Description |
|------|-------------|
| 1 | Alertmanager sends alerts to Backend via webhook |
| 2 | Backend creates/updates incidents, stores alerts, and posts Slack thread messages |
| 3 | Backend asynchronously calls Agent POST /analyze |
| 4 | Agent collects K8s/Prometheus/Tempo context and calls the LLM provider |
| 5 | Backend stores analysis history (alerts, alert_analyses, artifacts) |
| 6 | Backend emits SSE events and Frontend refreshes data in realtime |
| 7 | Resolving an incident triggers Agent POST /summarize-incident and embedding storage |
| 8 | Frontend searches similar incidents, sends feedback, and uses in-app AI chat |
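
The SSE step in the flow can be made concrete with a minimal parser for the `text/event-stream` wire format. This is a sketch of the generic protocol, not KubeRCA's client code; the event name `incident.updated` in the usage example is hypothetical, since the actual event names emitted on /api/v1/events are not listed here.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class ServerEvent:
    event: str
    data: str


def parse_sse(lines: Iterable[str]) -> Iterator[ServerEvent]:
    """Yield events from an iterable of text/event-stream lines."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has any data.
            if data_lines:
                yield ServerEvent(event_type, "\n".join(data_lines))
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips one leading space
        if name == "event":
            event_type = value
        elif name == "data":
            data_lines.append(value)


stream = [": keep-alive\n", "event: incident.updated\n", 'data: {"id": 1}\n', "\n"]
events = list(parse_sse(stream))
```

A client that cannot keep the stream open can fall back to polling the same REST endpoints on a timer, which matches the polling fallback mentioned under Features.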

Tech Stack

Application

| Component | Technology |
|-----------|------------|
| Backend | Go 1.24 + Gin |
| Agent | Python 3.10+ + FastAPI + Strands Agents |
| Frontend | React 18 + TypeScript + Vite + Tailwind CSS |
| Database | PostgreSQL + pgvector |

Infrastructure & Observability

| Category | Technology |
|----------|------------|
| Deployment | Helm, ArgoCD |
| IaC | Terraform |
| Monitoring | Prometheus, Alertmanager, Grafana |
| Logging | Loki, Grafana Alloy |
| AI/LLM | Strands Agents (Gemini/OpenAI/Anthropic) |

Testing

| Category | Technology |
|----------|------------|
| Chaos Engineering | Chaos Mesh |
| Load Testing | k6 |

Quick Start

Prerequisites

  • Kubernetes cluster (1.25+)
  • Helm 3.x
  • AI provider API key (Gemini / OpenAI / Anthropic)
  • PostgreSQL with pgvector extension (bundled subchart or external)
  • Slack bot token + channel ID (optional)

Installation via Helm (OCI, Public ECR)

# Optional: login to Public ECR (if your environment requires it)
aws ecr-public get-login-password --region us-east-1 | \
  helm registry login --username AWS --password-stdin public.ecr.aws

# Install/upgrade (chart version from charts/kube-rca/Chart.yaml)
helm upgrade --install kube-rca oci://public.ecr.aws/r5b7j2e4/kube-rca-ecr/charts/kube-rca \
  --namespace kube-rca --create-namespace \
  --version <chart-version> \
  -f values.yaml

Installation from Source (Local Chart)

git clone https://github.com/your-org/kube-rca.git
cd kube-rca/helm-charts/main

helm upgrade --install kube-rca charts/kube-rca \
  --namespace kube-rca --create-namespace \
  -f values.yaml

values.yaml Example (Gemini)

backend:
  embedding:
    provider: "gemini"
    apiKey:
      existingSecret: "kube-rca-ai"
      key: "ai-studio-api-key"
  postgresql:
    secret:
      existingSecret: "postgresql"
      key: "password"
  slack:
    enabled: true
    secret:
      existingSecret: "kube-rca-slack"

agent:
  aiProvider: "gemini"
  gemini:
    secret:
      existingSecret: "kube-rca-ai"
      key: "ai-studio-api-key"
  prometheus:
    url: "http://prometheus-server.monitoring:9090"

frontend:
  ingress:
    enabled: true
    hosts:
      - kube-rca.example.com

For OpenAI/Anthropic, set agent.aiProvider to openai or anthropic and point agent.openai.secret / agent.anthropic.secret to the corresponding secret key (openai-api-key / anthropic-api-key).

Configure Alertmanager

Add the KubeRCA webhook receiver to your Alertmanager configuration:

receivers:
  - name: "kube-rca"
    webhook_configs:
      - url: "http://<release>-backend.<namespace>.svc.cluster.local:8080/webhook/alertmanager"
        send_resolved: true

route:
  receiver: "kube-rca"
  # or add as a child route

Example (release: kube-rca, namespace: kube-rca): http://kube-rca-backend.kube-rca.svc.cluster.local:8080/webhook/alertmanager
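
To check the wiring without waiting for a real alert, a synthetic payload can be posted to the webhook. The sketch below builds the in-cluster URL from the release and namespace and prepares a request shaped like a standard Alertmanager webhook notification. The alert name, labels, timestamp, and fingerprint are invented for illustration, and which fields the Backend actually requires is an assumption.

```python
import json
import urllib.request


def webhook_url(release: str, namespace: str, port: int = 8080) -> str:
    """Build the in-cluster webhook URL following the pattern shown above."""
    return f"http://{release}-backend.{namespace}.svc.cluster.local:{port}/webhook/alertmanager"


def build_test_request(url: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST carrying one synthetic firing alert."""
    payload = {
        "version": "4",
        "status": "firing",
        "receiver": "kube-rca",
        "alerts": [
            {
                "status": "firing",
                # Hypothetical labels and annotations for a smoke test.
                "labels": {"alertname": "KubeRCASmokeTest", "severity": "info"},
                "annotations": {"summary": "Synthetic alert for wiring check"},
                "startsAt": "2024-01-01T00:00:00Z",
                "fingerprint": "smoke-test-0001",
            }
        ],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_test_request(webhook_url("kube-rca", "kube-rca"))
```

Sending it with `urllib.request.urlopen(request)` only works from a pod inside the cluster, because the URL uses the cluster-local service DNS name.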


Configuration

Secrets (Default Names)

| Secret | Keys | Notes |
|--------|------|-------|
| postgresql | postgres-password, password | PostgreSQL (Bitnami subchart) |
| kube-rca-ai | ai-studio-api-key / openai-api-key / anthropic-api-key | Keys depend on agent.aiProvider / backend.embedding.provider |
| kube-rca-slack | kube-rca-slack-token, kube-rca-slack-channel-id | Required if Slack enabled |
| kube-rca-auth | admin-username, admin-password, kube-rca-jwt-secret, oidc-client-id, oidc-client-secret | Auth + OIDC (via ExternalSecret or manual) |

For full configuration options, see the Helm chart values at helm-charts/main/charts/kube-rca/README.md.
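
As a pre-install sanity check, the default secret names and keys can be compared against what exists in the namespace. The sketch below is a hypothetical helper, not part of KubeRCA: it assumes the `password` key of `postgresql` is always needed (as in the values.yaml example), that the embedding provider accepts the same key names as the agent provider, and it leaves `kube-rca-auth` out because its required keys depend on the auth setup.

```python
AI_SECRET_KEYS = {
    "gemini": "ai-studio-api-key",
    "openai": "openai-api-key",
    "anthropic": "anthropic-api-key",
}


def required_secret_keys(ai_provider: str, embedding_provider: str,
                         slack_enabled: bool) -> dict[str, set[str]]:
    """Map each default secret name to the keys this configuration needs."""
    required = {
        "postgresql": {"password"},
        "kube-rca-ai": {AI_SECRET_KEYS[ai_provider], AI_SECRET_KEYS[embedding_provider]},
    }
    if slack_enabled:
        required["kube-rca-slack"] = {"kube-rca-slack-token", "kube-rca-slack-channel-id"}
    return required


def missing_keys(existing: dict[str, set[str]],
                 required: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return the required keys that are absent, grouped by secret name."""
    missing = {}
    for name, keys in required.items():
        absent = keys - existing.get(name, set())
        if absent:
            missing[name] = absent
    return missing


# Example: secrets found in the namespace (hypothetical state).
existing = {"postgresql": {"postgres-password", "password"}, "kube-rca-ai": {"ai-studio-api-key"}}
gaps = missing_keys(existing, required_secret_keys("gemini", "gemini", slack_enabled=True))
```

The `existing` mapping would typically be filled from `kubectl get secret` output; how it is gathered is left out of the sketch.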


Local Development

Backend (Go)

cd backend/main
go mod tidy
go run .        # start the backend
go test ./...   # run the test suite

Agent (Python)

cd agent/main
make install   # uv sync
make lint      # ruff check
make test      # pytest
make run       # uvicorn dev server

Frontend (React)

cd frontend/main
npm ci
npm run dev    # development server
npm run build  # production build
npm run lint   # eslint

Contributing

Contributions are welcome! Please read our contributing guidelines before submitting PRs.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'feat: add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.


Made with dedication for the Kubernetes community
