🔧 AutoSRE - A Self-Healing DevOps Orchestrator

Autonomous incident detection, AI-powered analysis, and self-healing code generation.

Zero human intervention from alert to pull request.

Live Dashboard • Demo Video • Architecture • Quick Start


🎯 The Problem

DevOps teams are drowning:

  • 500+ alerts daily - Most are noise, but critical ones hide in the chaos
  • Hours spent diagnosing - Manually correlating logs, metrics, and traces
  • Repetitive fixes - The same issues require the same solutions
  • Human bottleneck - Engineers are the single point of failure at 3 AM

What if infrastructure could heal itself?


💡 The Solution

An autonomous agent that:

  1. 🔍 Detects infrastructure incidents in real-time
  2. 🧠 Analyzes root causes using AI-powered log analysis
  3. 🔧 Generates code fixes automatically via agentic workflows
  4. 📝 Submits pull requests for review
  5. ✅ Validates changes with AI-powered code review

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Incident  │────▶│   Kestra    │────▶│  Cline MCP  │────▶│  GitHub PR  │
│  Detection  │     │ Orchestrator│     │  Auto-Fix   │     │  + Review   │
└─────────────┘     └──────┬──────┘     └─────────────┘     └─────────────┘
                           │
                     ┌─────▼─────┐
                     │    AI     │
                     │ Analysis  │
                     └───────────┘

🏆 Hackathon Prize Tracks

This project targets $15,000 across 5 sponsor tracks:

| Sponsor    | Prize  | Integration                                      |
|------------|--------|--------------------------------------------------|
| Kestra     | $4,000 | Workflow orchestration with AI Agent plugin      |
| Cline      | $5,000 | Custom MCP Server for autonomous code generation |
| Oumi       | $3,000 | Fine-tuned SRE-LLM for log analysis              |
| Vercel     | $2,000 | Real-time mission control dashboard              |
| CodeRabbit | $1,000 | AI-powered PR reviews of generated fixes         |

🏗️ Architecture

                            ┌──────────────────────────────────────────┐
                            │           SELF-HEALING PIPELINE          │
                            └──────────────────────────────────────────┘

    ┌─────────────┐         ┌───────────────────────────────────────────────────┐
    │             │         │                                                   │
    │  Log Source │────────▶│  ┌─────────┐    ┌─────────┐    ┌──────────────┐   │
    │  (Webhook)  │         │  │ Kestra  │───▶│   AI    │───▶│  Severity    │   │
    │             │         │  │ Trigger │    │ Analysis│    │  Router      │   │
    └─────────────┘         │  └─────────┘    └─────────┘    └──────┬───────┘   │
                            │                                       │           │
                            │            ┌──────────────────────────┴───┐       │
                            │            ▼                              ▼       │
                            │  ┌─────────────────┐          ┌────────────────┐  │
                            │  │  HIGH/CRITICAL  │          │   LOW/MEDIUM   │  │
                            │  │  ┌───────────┐  │          │                │  │
                            │  │  │ MCP Server│  │          │  Log & Alert   │  │
                            │  │  │  (Cline)  │  │          │                │  │
                            │  │  └─────┬─────┘  │          └────────────────┘  │
                            │  │        │        │                              │
                            │  │  ┌─────▼─────┐  │                              │
                            │  │  │ Generate  │  │                              │
                            │  │  │   Fix     │  │                              │
                            │  │  └─────┬─────┘  │                              │
                            │  │        │        │                              │
                            │  │  ┌─────▼─────┐  │                              │
                            │  │  │ Create PR │  │                              │
                            │  │  └─────┬─────┘  │                              │
                            │  └────────┼────────┘                              │
                            │           │                                       │
                            └───────────┼───────────────────────────────────────┘
                                        │
                                        ▼
                            ┌───────────────────────┐
                            │      CodeRabbit       │
                            │    AI Code Review     │
                            └───────────────────────┘
                                        │
                                        ▼
                            ┌───────────────────────┐
                            │    ✅ Merge Ready     │
                            └───────────────────────┘


    ┌────────────────────────────────────────────────────────────────────────┐
    │                         REAL-TIME DASHBOARD                            │
    │                         (Vercel - Next.js)                             │
    │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌─────────────┐ │
    │  │System Health │  │  Incidents   │  │ Activity Feed│  │   Metrics   │ │
    │  └──────────────┘  └──────────────┘  └──────────────┘  └─────────────┘ │
    └────────────────────────────────────────────────────────────────────────┘

✨ Key Features

🤖 Autonomous Incident Detection

  • Webhook-based ingestion from any monitoring tool
  • Real-time log analysis and classification
  • Severity-based routing (Critical/High → auto-fix, Low/Medium → log and alert)
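
The routing rule can be sketched in a few lines of TypeScript. This is a minimal illustration, not the project's implementation: the names `Severity`, `Route`, `ROUTES`, and `routeIncident` are hypothetical, the branches follow the HIGH/CRITICAL and LOW/MEDIUM split in the architecture diagram, and the fallback for unknown values is an assumption of this sketch.

```typescript
// Illustrative sketch; all names here are hypothetical, not the project's code.
type Severity = "low" | "medium" | "high" | "critical";
type Route = "auto-fix" | "log-and-alert";

// Branches taken from the architecture diagram:
// HIGH/CRITICAL -> MCP Server auto-fix, LOW/MEDIUM -> log and alert.
const ROUTES: Record<Severity, Route> = {
  critical: "auto-fix",
  high: "auto-fix",
  medium: "log-and-alert",
  low: "log-and-alert",
};

// Normalize a raw severity string and pick a branch.
// Unknown values fall back to "log-and-alert" (an assumption of this sketch,
// chosen so that unrecognized input never triggers an automatic code change).
function routeIncident(rawSeverity: string): Route {
  const key = rawSeverity.trim().toLowerCase();
  const known = Object.prototype.hasOwnProperty.call(ROUTES, key);
  return known ? ROUTES[key as Severity] : "log-and-alert";
}
```

Keeping the mapping in a single lookup table means a new severity level only needs one extra entry.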

🧠 AI-Powered Analysis

  • Custom SRE-LLM trained on infrastructure patterns
  • Root cause identification
  • Suggested remediation with confidence scores

🔧 Automatic Code Generation

  • Cline MCP Server integration for autonomous coding
  • Context-aware fixes based on repository structure
  • Automatic PR creation with detailed descriptions

📊 Real-Time Dashboard

  • Live incident tracking
  • Activity stream with SSE updates
  • System health monitoring
  • Demo controls for testing
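
For readers unfamiliar with Server-Sent Events, the activity stream boils down to writing small text frames to a long-lived HTTP response. The helper below is a sketch of that wire format only; `formatSseEvent` and the example event name are hypothetical and not taken from the project's code.

```typescript
// Illustrative sketch; formatSseEvent is a hypothetical helper name.
// Serializes one event in the text/event-stream wire format:
// an optional "event:" line, one "data:" line per payload line, then a blank line.
function formatSseEvent(
  data: Record<string, unknown> | string,
  eventName?: string
): string {
  const lines: string[] = [];
  if (eventName !== undefined) {
    lines.push(`event: ${eventName}`);
  }
  // JSON.stringify without indentation yields a single line for objects;
  // splitting keeps the frame valid if a caller passes a multi-line string.
  const payload = typeof data === "string" ? data : JSON.stringify(data);
  for (const line of payload.split("\n")) {
    lines.push(`data: ${line}`);
  }
  return lines.join("\n") + "\n\n";
}
```

In an Express handler, a frame like this would typically be written with `res.write(...)` after the response's `Content-Type` has been set to `text/event-stream`.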

✅ AI Code Review

  • CodeRabbit integration for automated PR reviews
  • AI reviewing AI-generated code
  • Quality gates before merge

🚀 Quick Start

Prerequisites

  • Node.js 18+
  • Docker & Docker Compose
  • Git
  • GitHub Account + Personal Access Token
  • OpenAI API Key

Installation

# Clone the repository
git clone https://github.com/me1abu/devOpsOrchestrate.git
cd devOpsOrchestrate

# Copy environment variables
cp .env.example .env
# Edit .env with your API keys

# Start the infrastructure
docker-compose up -d

# Start the dashboard (development)
cd dashboard
npm install
npm run dev

Access Points

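| Service    | URL                   | Credentials                 |
|------------|-----------------------|-----------------------------|
| Dashboard  | http://localhost:3000 | -                           |
| Kestra UI  | http://localhost:8080 | admin@kestra.io / Kestra123 |
| MCP Server | http://localhost:3001 | -                           |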

📦 Project Structure

devOpsOrchestrate/
├── 📂 dashboard/              # Next.js real-time dashboard
│   ├── app/                   # App router pages
│   ├── components/            # React components
│   └── Dockerfile
│
├── 📂 kestra/flows/           # Kestra workflow definitions
│   ├── main-orchestrator.yml  # Main incident processing flow
│   └── auto-fix-workflow.yml  # Autonomous remediation flow
│
├── 📂 mcp-server/             # Cline MCP Server
│   ├── src/
│   │   ├── index.ts           # Express server + SSE
│   │   └── tools.ts           # MCP tool definitions
│   └── Dockerfile
│
├── 📂 oumi/                   # Oumi model training
│   ├── data/                  # Training data (JSONL)
│   ├── train.py               # Training script
│   └── config.yaml            # Model configuration
│
├── 📂 monitoring/             # Demo monitoring setup
├── 📂 scripts/                # Utility scripts
├── 🐳 docker-compose.yml      # Full stack deployment
└── 📄 README.md

🔌 MCP Server Tools

The MCP Server exposes these tools for Cline integration:

| Tool                        | Description                  |
|-----------------------------|------------------------------|
| get_pending_incidents()     | Fetch unresolved incidents   |
| get_incident_details(id)    | Get full incident context    |
| get_repository_context()    | Understand codebase structure |
| apply_fix(incident_id, fix) | Apply generated fix          |
| create_pull_request(...)    | Create GitHub PR             |
| report_fix_status(...)      | Update incident status       |

API Endpoints

GET  /              # API documentation
GET  /health        # Health check
GET  /events        # SSE stream for real-time updates
GET  /incidents     # List all incidents
POST /incidents     # Create new incident
GET  /incidents/:id # Get incident details
PATCH /incidents/:id # Update incident
GET  /stats         # Get statistics

🎬 Demo

Live Deployment

Demo Video

📺 Watch the 3-minute demo (Coming soon)

Trigger a Demo Incident

curl -X POST https://mcp-server-deploy.up.railway.app/incidents \
  -H "Content-Type: application/json" \
  -d '{
    "severity": "critical",
    "category": "database",
    "summary": "Connection pool exhausted",
    "description": "FATAL: max_connections=100 exceeded",
    "source": "postgresql"
  }'
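
The same request can be prepared from TypeScript. The sketch below only builds the URL and options for `POST /incidents`, so it runs without a network connection. `IncidentPayload`, `PreparedRequest`, and `buildIncidentRequest` are hypothetical names; the field names mirror the JSON body of the curl example above, and the base URL is the local MCP Server address from the Access Points table.

```typescript
// Illustrative sketch; the type and function names are hypothetical.
// Field names mirror the JSON body of the curl example above.
interface IncidentPayload {
  severity: string;
  category: string;
  summary: string;
  description: string;
  source: string;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the URL and options for POST /incidents without sending anything.
function buildIncidentRequest(
  baseUrl: string,
  incident: IncidentPayload
): PreparedRequest {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${root}/incidents`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(incident),
    },
  };
}

const demoRequest = buildIncidentRequest("http://localhost:3001", {
  severity: "critical",
  category: "database",
  summary: "Connection pool exhausted",
  description: "FATAL: max_connections=100 exceeded",
  source: "postgresql",
});
// To send it (Node.js 18+ ships a global fetch):
//   const response = await fetch(demoRequest.url, demoRequest.init);
```

Separating request construction from sending makes the payload easy to inspect or log before an incident is actually created.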

🛠️ Sponsor Technology Deep Dive

Kestra - Workflow Orchestration

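# Example: Main orchestrator flow (illustrative excerpt)
id: self-healing-orchestrator
namespace: devops.healing

variables:
  # Local MCP Server address from Access Points; adjust for your deployment
  mcp_server_url: http://localhost:3001

tasks:
  - id: analyze-incident
    type: io.kestra.plugin.scripts.python.Script
    script: |
      # AI-powered log analysis
      # Severity classification
      # Root cause identification

  - id: trigger-autofix
    type: io.kestra.plugin.core.flow.If
    # Assumes analyze-incident emits a `severity` output variable
    condition: "{{ outputs['analyze-incident'].vars.severity == 'critical' }}"
    then:
      - id: call-mcp-server
        type: io.kestra.plugin.core.http.Request
        uri: "{{ vars.mcp_server_url }}/fix"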

Cline - MCP Server Integration

// MCP tools for autonomous code generation
const tools = [
  {
    name: "get_pending_incidents",
    description: "Fetch incidents awaiting fixes",
    handler: async () => await db.getIncidents({ status: "pending" })
  },
  {
    name: "create_pull_request",
    description: "Create a GitHub PR with the fix",
    handler: async ({ title, body, branch }) => {
      return await github.createPR({ title, body, branch });
    }
  }
];

Oumi - Custom SRE Model

# Training data format
{
  "input": "FATAL: Connection pool exhausted - max_connections=100 exceeded",
  "output": {
    "severity": "critical",
    "category": "database",
    "root_cause": "Connection pool limit reached",
    "suggested_fix": "Increase max_connections or implement connection pooling"
  }
}
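
The project tree lists the training data as JSONL, which stores one JSON object per line. The sketch below shows how records shaped like the example above could be written to and read back from that layout. `TrainingRecord`, `toJsonl`, and `fromJsonl` are hypothetical names, and whether the Oumi training script expects exactly this schema is not confirmed by this README.

```typescript
// Illustrative sketch; names are hypothetical and the schema is copied
// from the example record above, not from the training script itself.
interface TrainingRecord {
  input: string;
  output: {
    severity: string;
    category: string;
    root_cause: string;
    suggested_fix: string;
  };
}

// Serialize records as JSONL: one compact JSON object per line.
function toJsonl(records: TrainingRecord[]): string {
  return records.map((record) => JSON.stringify(record) + "\n").join("");
}

// Parse JSONL back into records, skipping blank lines.
function fromJsonl(text: string): TrainingRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as TrainingRecord);
}
```

Because `JSON.stringify` escapes newlines inside strings, a multi-line log message still occupies a single JSONL line.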

Vercel - Dashboard Deployment

  • Next.js 14 with App Router
  • Real-time updates via Server-Sent Events
  • Responsive design with Tailwind CSS

CodeRabbit - AI Code Review

# .coderabbit.yaml
reviews:
  auto_review:
    enabled: true
  path_filters:
    - "!**/*.md"
  tools:
    github-checks:
      enabled: true

🔮 Future Roadmap

  • Multi-cloud support - AWS, GCP, Azure integrations
  • Slack/PagerDuty integration - Alert routing
  • Learning from feedback - Improve fixes based on PR reviews
  • Rollback automation - Auto-revert failed deployments
  • Cost optimization - Infrastructure right-sizing recommendations

🤝 Contributing

Contributions are welcome! Please read our Contributing Guidelines first.

# Fork the repo
# Create your feature branch
git checkout -b feature/amazing-feature

# Commit your changes
git commit -m 'Add amazing feature'

# Push to the branch
git push origin feature/amazing-feature

# Open a Pull Request

📄 License

MIT License - see LICENSE for details.


👤 Author

Abu


⭐ Star this repo if you find it useful!

Built with ❤️ for the AI Hackathon 2025

"The best incident is the one that fixes itself."