KubeRCA

AI-Powered Kubernetes Incident Analysis & Root Cause Analysis Tool

Overview

KubeRCA is an open-source tool that automatically collects incident context from Kubernetes environments and uses an LLM to provide Root Cause Analysis (RCA) and response guides.

When alerts fire in your cluster, KubeRCA:

Receives alerts via Alertmanager webhook
Creates/updates incidents and sends Slack thread notifications
Analyzes context with AI (Strands Agents: Gemini/OpenAI/Anthropic)
Streams realtime updates to the dashboard via SSE
Supports similar incident search, feedback, and in-app chat workflows

Features

Automated Context Collection - Gather logs, metrics, and K8s events when alerts fire
AI-Powered Analysis - LLM-based root cause analysis with Strands Agents (Gemini/OpenAI/Anthropic)
Similar Incident Search - Vector similarity search using pgvector
Slack Integration - Real-time notifications with threaded analysis results
Realtime Dashboard Sync - Server-Sent Events (/api/v1/events) with polling fallback
Operator Feedback Loop - Vote/comment APIs for incidents and alerts
In-App AI Chat - Context-aware chat via Backend POST /api/v1/chat and Agent POST /chat
Webhook Settings UI - CRUD management for outbound webhook integrations
Google OIDC Login - One-click Google authentication with email allowlist
Web Dashboard - React-based UI for incident management
Helm Deployment - Easy installation via Helm charts
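The realtime sync feature above relies on Server-Sent Events. As a rough illustration of what a client of `/api/v1/events` has to do, here is a minimal sketch of parsing the standard SSE wire format (`event:` and `data:` fields, with a blank line ending each event). The event name `incident.updated` and the JSON payload shape are assumptions made for this example; the events KubeRCA actually emits are not documented here.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE event from an iterable of text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, commonly used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream: the event name and payload fields are assumptions.
sample = [
    ": keep-alive\n",
    "event: incident.updated\n",
    'data: {"incident_id": "inc-1", "status": "analyzing"}\n',
    "\n",
]
events = list(parse_sse(sample))
payload = json.loads(events[0]["data"])
print(events[0]["event"], payload["status"])
```

A client that cannot hold the stream open would fall back to polling, as the feature list notes.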

Architecture

flowchart LR
  %% External
  AM[Alertmanager]
  SL[Slack Bot]
  LLM[LLM API Gemini OpenAI Anthropic]
  PR[Prometheus]
  K8S[Kubernetes API]
  TP[Tempo]
  LO[Loki]
  GK[Grafana]
  AL[Alloy]
  OIDC[Google OIDC]

  %% Internal
  subgraph KubeRCA
    FE[Frontend React TypeScript]
    BE[Backend Go Gin]
    AG[Agent Python FastAPI]
    PG[(PostgreSQL pgvector)]
  end

  AM -->|Webhook| BE
  BE -->|Thread notification| SL
  FE -->|Auth Incident Alert API| BE
  FE -->|SSE stream| BE
  BE -->|Analyze and summarize| AG
  BE -->|Chat request| AG
  AG -->|K8s Context| K8S
  AG -->|Metrics Query| PR
  AG -->|LLM Analysis| LLM
  AG -.->|Trace Query| TP
  BE -->|Embeddings| LLM
  BE -.->|OIDC Token Exchange| OIDC
  FE -.->|OIDC Redirect| OIDC
  BE <-->|Data| PG
  AG -.->|Session optional| PG
  AL -.->|Collector| PR
  AL -.->|Collector| LO
  AL -.->|Collector| TP
  GK -.->|Dashboard| PR
  GK -.->|Dashboard| LO
  GK -.->|Dashboard| TP

Component Flow

Step	Description
1	Alertmanager sends alerts to Backend via webhook
2	Backend creates/updates incidents, stores alerts, and posts Slack thread messages
3	Backend asynchronously sends `POST /analyze` to the Agent
4	Agent collects K8s/Prometheus/Tempo context and calls LLM provider
5	Backend stores analysis history (`alerts`, `alert_analyses`, `artifacts`)
6	Backend emits SSE events and Frontend refreshes data in realtime
7	Resolving an incident triggers Agent `POST /summarize-incident` and embedding storage
8	Frontend searches similar incidents, sends feedback, and uses in-app AI chat
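Steps 7 and 8 depend on comparing stored incident embeddings. The sketch below shows the idea behind vector similarity search using cosine similarity in plain Python. The incident names and three-dimensional vectors are made up for illustration, and the distance metric KubeRCA actually uses is not stated in this README; in pgvector, the equivalent ranking is typically an `ORDER BY embedding <=> query` clause, where `<=>` is the cosine distance operator.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(query, incidents, top_k=2):
    """Return the names of the top_k incidents closest to the query."""
    scored = [(cosine_similarity(query, emb), name) for name, emb in incidents.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical embeddings; real ones have hundreds or thousands of dimensions.
incidents = {
    "oom-kill-payments": [0.9, 0.1, 0.0],
    "crashloop-checkout": [0.7, 0.3, 0.1],
    "dns-timeout-gateway": [0.0, 0.2, 0.9],
}
print(most_similar([1.0, 0.0, 0.0], incidents))
```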

Tech Stack

Application

Component	Technology
Backend	Go 1.24 + Gin
Agent	Python 3.10+ + FastAPI + Strands Agents
Frontend	React 18 + TypeScript + Vite + Tailwind CSS
Database	PostgreSQL + pgvector

Infrastructure & Observability

Category	Technology
Deployment	Helm, ArgoCD
IaC	Terraform
Monitoring	Prometheus, Alertmanager, Grafana
Logging	Loki, Grafana Alloy
AI/LLM	Strands Agents (Gemini/OpenAI/Anthropic)

Testing

Category	Technology
Chaos Engineering	Chaos Mesh
Load Testing	k6

Quick Start

Prerequisites

Kubernetes cluster (1.25+)
Helm 3.x
AI provider API key (Gemini / OpenAI / Anthropic)
PostgreSQL with pgvector extension (bundled subchart or external)
Slack bot token + channel ID (optional)

Installation via Helm (OCI, Public ECR)

# Optional: login to Public ECR (if your environment requires it)
aws ecr-public get-login-password --region us-east-1 | \
  helm registry login --username AWS --password-stdin public.ecr.aws

# Install/upgrade (chart version from charts/kube-rca/Chart.yaml)
helm upgrade --install kube-rca oci://public.ecr.aws/r5b7j2e4/kube-rca-ecr/charts/kube-rca \
  --namespace kube-rca --create-namespace \
  --version <chart-version> \
  -f values.yaml

Installation from Source (Local Chart)

git clone https://github.com/your-org/kube-rca.git
cd kube-rca/helm-charts/main

helm upgrade --install kube-rca charts/kube-rca \
  --namespace kube-rca --create-namespace \
  -f values.yaml

values.yaml Example (Gemini)

backend:
  embedding:
    provider: "gemini"
    apiKey:
      existingSecret: "kube-rca-ai"
      key: "ai-studio-api-key"
  postgresql:
    secret:
      existingSecret: "postgresql"
      key: "password"
  slack:
    enabled: true
    secret:
      existingSecret: "kube-rca-slack"

agent:
  aiProvider: "gemini"
  gemini:
    secret:
      existingSecret: "kube-rca-ai"
      key: "ai-studio-api-key"
  prometheus:
    url: "http://prometheus-server.monitoring:9090"

frontend:
  ingress:
    enabled: true
    hosts:
      - kube-rca.example.com

For OpenAI/Anthropic, set agent.aiProvider to openai or anthropic and point agent.openai.secret / agent.anthropic.secret to the corresponding secret key (openai-api-key / anthropic-api-key).
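As a sketch of the provider switch described above, the helper below generates the `agent` portion of a values override for a chosen provider. The secret key names come from this README; the nested `secret.existingSecret` / `secret.key` layout for `openai` and `anthropic` is assumed to mirror the `agent.gemini.secret` block in the example, so check it against the chart's values before relying on it. The function name `agent_values` is hypothetical.

```python
import json

# Key names are taken from the README; the per-provider nesting is an
# assumption that mirrors the `agent.gemini.secret` example.
SECRET_KEYS = {
    "gemini": "ai-studio-api-key",
    "openai": "openai-api-key",
    "anthropic": "anthropic-api-key",
}


def agent_values(provider: str, secret_name: str = "kube-rca-ai") -> dict:
    """Build the `agent` section of a values override for one provider."""
    if provider not in SECRET_KEYS:
        raise ValueError(f"unsupported provider: {provider}")
    return {
        "agent": {
            "aiProvider": provider,
            provider: {
                "secret": {
                    "existingSecret": secret_name,
                    "key": SECRET_KEYS[provider],
                }
            },
        }
    }


print(json.dumps(agent_values("openai"), indent=2))
```

Because JSON is also valid YAML, the printed output can be saved to a file and passed to Helm with an additional `-f` flag.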

Configure Alertmanager

Add the KubeRCA webhook receiver to your Alertmanager configuration:

receivers:
  - name: "kube-rca"
    webhook_configs:
      - url: "http://<release>-backend.<namespace>.svc.cluster.local:8080/webhook/alertmanager"
        send_resolved: true

route:
  receiver: "kube-rca"
  # or add as a child route

Example (release: kube-rca, namespace: kube-rca): http://kube-rca-backend.kube-rca.svc.cluster.local:8080/webhook/alertmanager
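To check the receiver without waiting for a real alert, a synthetic payload can be posted to the webhook URL. The sketch below builds a request in the Alertmanager webhook format (version 4). The alert name, labels, and placeholder URLs are invented for the test, and which fields the KubeRCA backend actually reads is an assumption, so treat this as a starting point only.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Example URL from this README (release: kube-rca, namespace: kube-rca).
WEBHOOK_URL = "http://kube-rca-backend.kube-rca.svc.cluster.local:8080/webhook/alertmanager"


def build_test_alert(alertname: str, namespace: str, pod: str) -> dict:
    """Build a synthetic payload shaped like an Alertmanager webhook call."""
    labels = {"alertname": alertname, "namespace": namespace, "pod": pod, "severity": "warning"}
    annotations = {"summary": "Synthetic test alert"}
    return {
        "version": "4",
        "groupKey": f'{{}}:{{alertname="{alertname}"}}',
        "status": "firing",
        "receiver": "kube-rca",
        "groupLabels": {"alertname": alertname},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "http://alertmanager.example",  # placeholder
        "alerts": [{
            "status": "firing",
            "labels": labels,
            "annotations": annotations,
            "startsAt": datetime.now(timezone.utc).isoformat(),
            "endsAt": "0001-01-01T00:00:00Z",
            "generatorURL": "http://prometheus.example/graph",  # placeholder
            "fingerprint": "0000000000000000",  # placeholder
        }],
    }


def build_request(payload: dict, url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Wrap the payload in a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )


req = build_request(build_test_alert("KubePodCrashLooping", "default", "demo-pod"))
print(req.get_method(), req.full_url)
# To send it from a pod inside the cluster:
# urllib.request.urlopen(req, timeout=5)
```

The request is only constructed here; the `urlopen` call is left commented out because the service DNS name resolves only inside the cluster.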

Configuration

Secrets (Default Names)

Secret	Keys	Notes
`postgresql`	`postgres-password`, `password`	PostgreSQL (Bitnami subchart)
`kube-rca-ai`	`ai-studio-api-key` / `openai-api-key` / `anthropic-api-key`	Keys depend on `agent.aiProvider` / `backend.embedding.provider`
`kube-rca-slack`	`kube-rca-slack-token`, `kube-rca-slack-channel-id`	Required if Slack enabled
`kube-rca-auth`	`admin-username`, `admin-password`, `kube-rca-jwt-secret`, `oidc-client-id`, `oidc-client-secret`	Auth + OIDC (via ExternalSecret or manual)

For full configuration options, see the Helm chart values at helm-charts/main/charts/kube-rca/README.md.
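When the secrets listed above are created manually rather than through ExternalSecret, each value in a Kubernetes Secret's `data` field must be base64-encoded. The following sketch generates a manifest for the `kube-rca-slack` secret using the default key names from the table; the token and channel ID are placeholders, and the helper name `secret_manifest` is hypothetical.

```python
import base64
import json


def secret_manifest(name: str, namespace: str, values: dict) -> dict:
    """Build an Opaque Secret manifest with base64-encoded values."""
    data = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": data,
    }


manifest = secret_manifest(
    "kube-rca-slack",
    "kube-rca",
    {
        "kube-rca-slack-token": "xoxb-placeholder",       # placeholder value
        "kube-rca-slack-channel-id": "C0000000000",       # placeholder value
    },
)
print(json.dumps(manifest, indent=2))
```

The JSON output can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.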

Local Development

Backend (Go)

cd backend/main
go mod tidy
go run .        # start the backend
go test ./...   # run the test suite

Agent (Python)

cd agent/main
make install   # uv sync
make lint      # ruff check
make test      # pytest
make run       # uvicorn dev server

Frontend (React)

cd frontend/main
npm ci
npm run dev    # development server
npm run build  # production build
npm run lint   # eslint

Documentation

Contributing

Contributions are welcome! Please read our contributing guidelines before submitting PRs.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'feat: add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Made with dedication for the Kubernetes community
