Continuous Repair and Incident Self-Patching Runtime
A self-hosted production error monitoring system that automatically generates fixes and opens pull requests using LLMs.
Live Demo: Try the UI in demo mode (no backend required)
Every error flows through a defined lifecycle. Understanding these stages helps you know what CRISPR is doing and when human intervention is needed.
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ DETECTED │───▶│ TRIAGING │───▶│  FIXING  │───▶│ PR_OPEN  │───▶│ VERIFYING│───▶│ VERIFIED │
└──────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘
                     │               │               │               │
                     ▼               ▼               ▼               ▼
                ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
                │UNFIXABLE │    │FIX_FAILED│    │PR_CLOSED │    │ RECURRED │
                └──────────┘    └──────────┘    └──────────┘    └──────────┘
| Stage | What's Happening | Moves Forward When | Falls Back When |
|---|---|---|---|
| detected | Error received and deduplicated | Worker picks it up | - |
| triaging | LLM analyzing if error is fixable | LLM says fixable with confidence | LLM says unfixable → unfixable |
| pending_fix | Waiting in queue for surgical lock | Repo lock acquired | - |
| fixing | LLM generating code fix + tests | Fix generated successfully | LLM fails or times out → fix_failed |
| pr_opening | Creating branch and pull request | PR created on GitHub | GitHub API fails → fix_failed |
| pr_open | PR awaiting human review | PR merged on GitHub | PR closed without merge → pr_closed |
| pr_merged | PR merged, preparing verification | Verification monitoring starts | - |
| verifying | Monitoring for error recurrence | Monitoring period completes with no recurrence | Error reoccurs → detected (retry) |
| verified | Fix confirmed working ✅ | - (terminal state) | - |
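The lifecycle table can be read as a small state machine. The following is a minimal sketch of that reading, not CRISPR's actual implementation: the `FORWARD`/`FALLBACK` dictionaries and the `advance` helper are hypothetical names, and the sketch assumes each stage moves forward to the next row of the table.

```python
# Transitions copied from the lifecycle table. The dict layout and the
# advance() helper are illustrative assumptions, not CRISPR's real code.
FORWARD = {
    "detected": "triaging",
    "triaging": "pending_fix",
    "pending_fix": "fixing",
    "fixing": "pr_opening",
    "pr_opening": "pr_open",
    "pr_open": "pr_merged",
    "pr_merged": "verifying",
    "verifying": "verified",
}
FALLBACK = {
    "triaging": "unfixable",
    "fixing": "fix_failed",
    "pr_opening": "fix_failed",
    "pr_open": "pr_closed",
    "verifying": "detected",  # a recurrence restarts the cycle
}


def advance(stage: str, ok: bool) -> str:
    """Return the next stage, given whether the current step succeeded."""
    table = FORWARD if ok else FALLBACK
    if stage not in table:
        kind = "forward" if ok else "fallback"
        raise ValueError(f"no {kind} transition from {stage!r}")
    return table[stage]
```

For example, `advance("pr_open", ok=False)` yields `pr_closed`, while calling it on `verified` raises because that stage is terminal.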
| State | Meaning | What To Do |
|---|---|---|
| unfixable | LLM determined this can't be auto-fixed | Review manually, click "Retry" if you disagree |
| fix_failed | Fix generation failed (LLM error, timeout) | Click "Retry" to try again |
| pr_closed | PR was closed without merging | Review why, click "Retry" for new attempt |
| max_attempts | Hit retry limit (default: 3) | Needs manual fix |
| cooldown | Too many recent attempts, waiting | Will auto-retry after cooldown period |
| ignored | Manually marked as ignored | Click "Retry" to re-enable |
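One way to picture how `max_attempts` and `cooldown` relate to a retry is sketched below. This is an assumption-laden illustration: the `retry_state` function and its `"eligible"` return value are hypothetical, the defaults mirror the `max_fix_attempts: 3` and `cooldown_hours: 24` values from the sample configuration, and the real cooldown rule may differ.

```python
from datetime import datetime, timedelta


def retry_state(attempts: int, last_attempt: datetime, now: datetime,
                max_fix_attempts: int = 3, cooldown_hours: int = 24) -> str:
    """Classify an incident that is about to be retried (hypothetical helper)."""
    if attempts >= max_fix_attempts:
        return "max_attempts"  # needs a manual fix
    if now - last_attempt < timedelta(hours=cooldown_hours):
        return "cooldown"      # assumed rule: wait out the cooldown window
    return "eligible"          # not a CRISPR state; means a retry may start
```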
After a PR is merged, CRISPR doesn't immediately mark the fix as complete. Instead, it enters a verification period to ensure the fix actually works in production.
How it works:
- Duration is per-incident: The LLM recommends a monitoring period (24 hours to 30 days) based on:
  - Error pattern (transient vs persistent issues)
  - Whether it involves time-based behavior (billing cycles, cron jobs, monthly reports)
  - Impact severity
- Monitoring: CRISPR watches for new occurrences of the same error fingerprint
- Outcomes:
  - No recurrence → `verified` (fix confirmed working)
  - Error reoccurs → `detected` (back to the start for another fix attempt)
Example durations:
- Missing import error: 24 hours (deterministic, either works or doesn't)
- Session handling bug: 48 hours (needs time for sessions to cycle)
- Rate limiting issue: 7 days (needs traffic patterns to exercise the code)
- Billing calculation bug: 30 days (needs to cover a full billing cycle)
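The verification decision described above can be sketched as a pure function. This is a simplified illustration under stated assumptions: `verification_outcome` is a hypothetical name, recurrences are represented as a plain list of timestamps for the same fingerprint, and the monitoring period is given in hours.

```python
from datetime import datetime, timedelta


def verification_outcome(merged_at: datetime, monitor_hours: int,
                         recurrences: list, now: datetime) -> str:
    """Decide the stage of an incident whose PR was merged at `merged_at`."""
    deadline = merged_at + timedelta(hours=monitor_hours)
    # Any occurrence of the same fingerprint inside the window is a recurrence.
    if any(merged_at <= seen <= deadline for seen in recurrences):
        return "detected"
    if now >= deadline:
        return "verified"
    return "verifying"  # window still open, nothing seen yet
```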
To prevent merge conflicts and ensure clean commits, CRISPR uses a surgical lock system:
- Only one incident per repository can be actively fixed at a time
- Other incidents for the same repo wait in a queue (`pending_fix` state)
- Before starting a fix, CRISPR does a fresh `git pull` to get the latest code
- After the fix is complete (PR opened), the lock is released for the next incident
This ensures each fix is based on the current state of the codebase and avoids conflicting changes.
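A minimal in-memory sketch of the one-fix-per-repository rule follows. The `SurgicalLock` class and its method names are hypothetical, and a real deployment would presumably keep this state in the database rather than in process memory.

```python
from collections import deque


class SurgicalLock:
    """Toy model: at most one incident per repo is in the fixing stage."""

    def __init__(self):
        self.active = {}  # repo -> incident id currently being fixed
        self.queues = {}  # repo -> deque of incident ids waiting their turn

    def request(self, repo, incident):
        """Try to start a fix; returns the stage the incident ends up in."""
        if repo not in self.active:
            self.active[repo] = incident
            return "fixing"
        self.queues.setdefault(repo, deque()).append(incident)
        return "pending_fix"

    def release(self, repo):
        """Call once the PR is opened; hands the lock to the next incident."""
        queue = self.queues.get(repo)
        if queue:
            self.active[repo] = queue.popleft()
            return self.active[repo]
        self.active.pop(repo, None)
        return None
```

Incidents for different repositories never block each other in this model; only incidents that target the same repo are serialized.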
- Incident Lifecycle
- Overview
- Features
- Architecture
- Quick Start
- Configuration
- Connectors
- Generated Tests
- Error Pattern Matching
- Internal Medicine
- API Reference
- UI Guide
- Demo: dumpster-fire
- CRISPR.md Context File
- Distributed Workers
- Development
- License
CRISPR monitors your production logs for errors, automatically triages them using LLMs, and generates code fixes that are submitted as pull requests. It's designed to handle the repetitive bug-fixing work that consumes engineering time.
- Ingest: Errors are received via HTTP POST, OTLP/gRPC, or webhooks (Sentry/Datadog)
- Fingerprint: Errors are deduplicated using SHA256 + regex normalization
- Pattern Match: Known error patterns are matched for auto-triage (skips LLM if matched)
- Triage: A "reader" LLM (cheap, fast) determines if the error is fixable
- Context: Relevant code files are fetched from GitHub
- Fix: A "writer" LLM (capable, expensive) generates a code fix + unit tests
- PR: A pull request is created with the fix, tests, and explanation
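The Fingerprint step above can be illustrated with a short sketch. The source only states that SHA256 and regex normalization are used; the specific normalization rules, the placeholder tokens, and the choice to mix the project ID into the hash are all assumptions made for this example.

```python
import hashlib
import re

# Hypothetical normalization rules. Order matters: UUIDs and hex literals
# are replaced before bare numbers so their digits are not split up.
_RULES = [
    (re.compile(r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
                r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"), "<uuid>"),
    (re.compile(r"0x[0-9a-fA-F]+"), "<hex>"),
    (re.compile(r"\d+"), "<n>"),
]


def fingerprint(project_id: str, message: str) -> str:
    """Hash a message after stripping volatile details such as IDs."""
    normalized = message
    for pattern, placeholder in _RULES:
        normalized = pattern.sub(placeholder, normalized)
    payload = f"{project_id}\n{normalized}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

With these rules, `User 42 not found` and `User 1337 not found` collapse to the same fingerprint, so they count as one incident rather than two.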
- Multi-LLM Support: Anthropic Claude, OpenAI GPT, and Ollama (local models)
- Reader/Writer/Internist Split: Use cheap models for triage, expensive models for fixes, high-reasoning models for pattern analysis
- Generated Tests: Automatically generates unit tests alongside fixes to verify correctness
- Error Pattern Matching: 23 built-in patterns for auto-triage without LLM calls (saves costs)
- Usage Tracking: Track token usage and costs per provider and purpose
- GitHub Integration: PAT or OAuth authentication, automatic PR creation
- Slack Notifications: Get notified when PRs are opened or merged
- Web Dashboard: Monitor incidents, configure settings, view costs
- OTLP Support: Receive logs via OpenTelemetry gRPC protocol
- Webhook Ingestion: Sentry and Datadog webhooks with signature verification
- S3 Storage: Scale sample storage with S3, Azure Blob, GCS, or MinIO
- CRISPR.md Context: Maintains a context file in each repo with fix history and service info
- Local Repo Clones: Keeps repos cloned locally for faster access and offline context
- Distributed Workers: Multiple workers can run in parallel with claim-based coordination
- Internal Medicine: Pattern analysis across incidents to identify architectural issues and propose systemic fixes
┌─────────────────────────────────────────────────────────────────┐
│                          CRISPR Server                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐      │
│  │  Ingest  │──▶│ Pipeline │──▶│  GitHub  │──▶│  Slack   │      │
│  │ HTTP/gRPC│   │  Triage  │   │ PR Create│   │  Notify  │      │
│  └──────────┘   │  Fix Gen │   └──────────┘   └──────────┘      │
│                 └──────────┘                                    │
│                      │                                          │
│                      ▼                                          │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐                     │
│  │ Postgres │◀──│   LLM    │──▶│  Object  │                     │
│  │ Database │   │ Manager  │   │  Store   │                     │
│  └──────────┘   └──────────┘   └──────────┘                     │
│                                                                 │
├─────────────────────────────────────────────────────────────────┤
│                    Embedded Web UI (Svelte)                     │
└─────────────────────────────────────────────────────────────────┘
- Rust 1.75+ (for building)
- Node.js 18+ (for UI development)
- Docker (for PostgreSQL)
- GitHub Account (PAT or OAuth app)
- LLM API Key (Anthropic, OpenAI, or local Ollama)
- Clone and enter the directory
  cd crispr
- Start PostgreSQL
  docker compose -f docker-compose.dev.yaml up -d
- Configure environment
  cp .env.example .env
  # Edit .env with your credentials
- Run the server
  cargo run
- Access the UI
  - Open http://localhost:8081
  - Navigate to Settings to configure GitHub, Slack, and LLM providers

- Build the Docker image
  docker build -t crispr:latest .
- Configure environment
  cp .env.example .env
  # Set production values (see Configuration section)
- Start the stack
  docker compose up -d
- Access the UI
| Variable | Required | Description |
|---|---|---|
| `DATABASE_URL` | Yes | PostgreSQL connection string |
| `GITHUB_PAT` | Yes* | GitHub Personal Access Token (if using PAT mode) |
| `GITHUB_CLIENT_ID` | Yes* | GitHub OAuth App Client ID (if using OAuth mode) |
| `GITHUB_CLIENT_SECRET` | Yes* | GitHub OAuth App Client Secret (if using OAuth mode) |
| `ANTHROPIC_API_KEY` | No | Anthropic API key for Claude models |
| `OPENAI_API_KEY` | No | OpenAI API key for GPT models |
| `SLACK_CLIENT_ID` | No | Slack App Client ID |
| `SLACK_CLIENT_SECRET` | No | Slack App Client Secret |
| `CRISPR_ENCRYPTION_KEY` | Prod | Base64-encoded 32-byte key for encrypting secrets |
| `AWS_ACCESS_KEY_ID` | No | AWS credentials for S3 storage |
| `AWS_SECRET_ACCESS_KEY` | No | AWS credentials for S3 storage |
*Either PAT or OAuth credentials required for GitHub
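`CRISPR_ENCRYPTION_KEY` must be a Base64-encoded 32-byte key. One way to produce such a value is sketched below; the `generate_encryption_key` helper is hypothetical, and the use of the standard (not URL-safe) Base64 alphabet is an assumption.

```python
import base64
import secrets


def generate_encryption_key() -> str:
    """Return 32 cryptographically random bytes, Base64-encoded."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


if __name__ == "__main__":
    # Paste the printed line into .env
    print(f"CRISPR_ENCRYPTION_KEY={generate_encryption_key()}")
```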
Configuration is loaded from config/config.dev.yaml (development) or config/config.prod.yaml (production).
environment: development
display_name: "CRISPR (local dev)"
server:
  host: "127.0.0.1"
  http_port: 8081
  grpc_port: 4318
database:
  url: "postgres://crispr:crispr@localhost:5433/crispr_dev"
  max_connections: 5
storage:
  mode: postgres # or "object_store" for S3/Azure/GCS
  # object_store:
  #   provider: s3 # s3, minio, r2, azure, gcp
  #   bucket: crispr-samples
  #   region: us-east-1
github:
  mode: pat # or "oauth"
  personal_access_token: ${GITHUB_PAT}
  # client_id: ${GITHUB_CLIENT_ID}
  # client_secret: ${GITHUB_CLIENT_SECRET}
slack:
  enabled: false
  # client_id: ${SLACK_CLIENT_ID}
  # client_secret: ${SLACK_CLIENT_SECRET}
safety:
  allowed_repos:
    - "yourorg/*"
  dry_run: true
  pr_branch_prefix: "crispr/"
pipeline:
  cooldown_hours: 24
  max_fix_attempts: 3
  auto_approve: false
logging:
  level: debug
  format: pretty # or "json" for production

CRISPR supports two GitHub authentication modes:
- Create a Personal Access Token with `repo` scope
- Set `GITHUB_PAT` environment variable
- Set `github.mode: pat` in config

- Create a GitHub OAuth App
  - Authorization callback URL: `http://localhost:8081/api/v1/auth/github/callback`
- Set environment variables:
  GITHUB_CLIENT_ID=your-client-id
  GITHUB_CLIENT_SECRET=your-client-secret
- Set `github.mode: oauth` in config
- Navigate to Connectors in the UI and click "Connect with GitHub"

- Create a Slack App
- Add OAuth scopes: `chat:write`, `channels:read`
- Set Redirect URL: `http://localhost:8081/api/v1/auth/slack/callback`
- Set environment variables:
  SLACK_CLIENT_ID=your-client-id
  SLACK_CLIENT_SECRET=your-client-secret
- Enable in config: `slack.enabled: true`
- Navigate to Connectors in the UI and click "Connect with Slack"
- Select a notification channel
CRISPR uses a reader/writer/internist split to optimize costs and capabilities:
- Reader (triage): Cheap, fast model for determining if errors are fixable
- Writer (fix generation): Capable model for generating code fixes
- Internist (pattern analysis): High-reasoning model for cross-incident pattern analysis and architectural recommendations
ANTHROPIC_API_KEY=sk-ant-your-key

| Role | Recommended Model |
|---|---|
| Reader | claude-3-haiku-20240307 |
| Writer | claude-sonnet-4-20250514 |
| Internist | claude-sonnet-4-20250514 |

OPENAI_API_KEY=sk-your-key

| Role | Recommended Model |
|---|---|
| Reader | gpt-4o-mini |
| Writer | gpt-4o |
| Internist | gpt-4o |

- Install Ollama
- Pull models: `ollama pull llama3.2`
- Configure base URL in Settings UI
CRISPR can receive errors directly from monitoring platforms via webhooks.
- In Sentry, go to Settings → Integrations → Webhooks
- Add webhook URL: `http://your-crispr-server:8081/webhooks/sentry`
- (Optional) Set a webhook secret for signature verification
- Configure the secret in CRISPR:
  webhooks:
    sentry_secret: "your-webhook-secret"
CRISPR verifies Sentry webhook signatures using HMAC-SHA256 when a secret is configured.
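For readers unfamiliar with HMAC-SHA256 verification, the general technique looks like the sketch below. It assumes the signature arrives as a hex digest of the raw request body; which header carries it and how CRISPR reads it are not stated in this document, so `verify_signature` is a hypothetical helper rather than CRISPR's API.

```python
import hashlib
import hmac


def verify_signature(secret: str, body: bytes, signature_hex: str) -> bool:
    """Check that `signature_hex` is the HMAC-SHA256 of `body` under `secret`."""
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, signature_hex)
```

The body must be the exact bytes received on the wire; re-serializing parsed JSON before hashing would usually change the digest.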
- In Datadog, go to Integrations → Webhooks
- Create a new webhook with URL: `http://your-crispr-server:8081/webhooks/datadog`
- Add a `project:owner/repo` tag to identify the project
- Configure monitors to send to this webhook on error/alert
- In Grafana, go to Alerting → Contact Points
- Create a new contact point with type Webhook
- Set URL: `http://your-crispr-server:8081/webhooks/grafana`
- (Optional) Add authentication header: `Authorization: Bearer your-token`
- Configure in CRISPR:
  webhooks:
    grafana_token: "your-token"

Add labels to your alert rules for project identification:
- `project: owner/repo` - Explicit project mapping
- `namespace` + `service` - Combined as namespace/service
- `service: owner-repo` - Hyphen converted to slash
CRISPR processes alerts with severity error, critical, warning, or high.
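The label rules and severity filter for Grafana alerts could be resolved roughly as follows. This is a sketch under assumptions: the precedence between the three label forms, the choice to convert only the first hyphen, and the `severity` label name are not specified in this document, and both helper names are hypothetical.

```python
from typing import Optional

ACCEPTED_SEVERITIES = {"error", "critical", "warning", "high"}


def resolve_project(labels: dict) -> Optional[str]:
    """Map alert labels to an owner/repo project, or None if unidentified."""
    if "project" in labels:
        return labels["project"]  # explicit mapping wins (assumed precedence)
    if "namespace" in labels and "service" in labels:
        return f"{labels['namespace']}/{labels['service']}"
    service = labels.get("service", "")
    if "-" in service:
        owner, repo = service.split("-", 1)  # assumed: first hyphen only
        return f"{owner}/{repo}"
    return None


def should_process(labels: dict) -> bool:
    """Keep only alerts whose severity label is one CRISPR processes."""
    return labels.get("severity", "").lower() in ACCEPTED_SEVERITIES
```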
CRISPR automatically generates unit tests alongside code fixes to help verify correctness and prevent regressions.
- When a fix is generated, CRISPR also generates test code that:
  - Verifies the fix works correctly
  - Would have caught the original bug
  - Uses the appropriate test framework for the language
- Tests are stored in the database and can be:
  - Viewed in the UI on the incident detail page
  - Included or excluded from the PR via toggle
  - Committed alongside the fix
| Language | Test Framework |
|---|---|
| TypeScript/JavaScript | Jest, Vitest, Mocha |
| Python | pytest, unittest |
| Go | go test |
| Rust | #[test] |
| Java | JUnit |
On the incident detail page, expand the Generated Tests section to:
- View the test code
- Toggle "Include in PR" for each test
- See which tests have been committed
CRISPR includes 23 built-in error patterns that enable auto-triage without LLM calls, reducing costs and latency.
- When an error is received, it's matched against known patterns using regex
- If a pattern with `auto_triage = true` matches:
  - The LLM triage step is skipped
  - The pattern's `auto_fixable` setting determines if fix generation proceeds
- Pattern matches are tracked for analytics
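The matching steps above can be sketched in a few lines. The `Pattern` dataclass reuses the field names from the pattern API (`regex_pattern`, `auto_triage`, `auto_fixable`), but the `auto_triage` function, its first-match-wins rule, and the mapping of a non-fixable match to `unfixable` are assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pattern:
    name: str
    regex_pattern: str
    auto_triage: bool
    auto_fixable: bool


def auto_triage(message: str, patterns: list) -> Optional[str]:
    """Return a stage if a pattern decides triage, else None (ask the LLM)."""
    for pattern in patterns:
        if pattern.auto_triage and re.search(pattern.regex_pattern, message):
            return "pending_fix" if pattern.auto_fixable else "unfixable"
    return None
```

A `None` result means no auto-triage pattern matched, so the error falls through to the reader LLM as usual.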
| Category | Examples |
|---|---|
| Null Reference | Java NPE, Go nil pointer, JS TypeError, Python NoneType |
| Connection | Timeouts, socket errors, pool exhaustion |
| Authentication | HTTP 401/403, invalid token |
| Rate Limiting | HTTP 429, quota exceeded |
| Database | Connection failed, query timeout |
| Memory | OOM, stack overflow |
| Bounds | Array index, slice bounds |
| Type Errors | Type mismatch, cast errors |
| Import | Module not found |
| Parsing | JSON syntax errors |
| File System | File not found |
| Network | HTTP 500/502/503/504, DNS resolution |
Create custom patterns via the API:
curl -X POST http://localhost:8081/api/v1/patterns \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Redis Connection Error",
    "regex_pattern": "Redis.*ECONNREFUSED|Cannot connect to Redis",
    "category": "connection",
    "auto_triage": true,
    "auto_fixable": false,
    "suggested_action": "Check Redis server status"
  }'

The Internal Medicine feature provides high-level pattern analysis across incidents, identifying architectural issues and proposing systemic fixes rather than ad-hoc patches.
- Automatic Triggering: After each incident is verified (fix confirmed working), the Internist LLM analyzes recent incidents for patterns
- Cross-Service Analysis: The Internist can see incidents across all projects, detecting when multiple services fail together due to shared infrastructure
- Recommendations: Patterns are turned into actionable recommendations with root cause analysis and proposed changes
- Interactive Refinement: Chat with the Internist to refine recommendations before approval
- Conversion to Tickets: Approved recommendations become detailed triage tickets for the Writer LLM to fix
| Category | Description | Auto-Fixable |
|---|---|---|
| `api_contract` | Breaking changes to APIs, inconsistent interfaces | Yes |
| `architecture` | Structural problems, tight coupling, missing abstractions | Yes |
| `error_pattern` | Recurring error types that need systematic handling | Yes |
| `performance` | Systemic performance issues | Yes |
| `security` | Security patterns or vulnerabilities | Yes |
| `infrastructure` | Database sizing, resource limits, scaling issues | No (human required) |
Some issues cannot be fixed by code changes alone. When the Internist identifies infrastructure problems (e.g., "increase database connection pool size"), it creates a recommendation marked as requires human intervention. These appear with a special indicator in the UI and create tickets for humans to address.
The Internal Medicine section appears on the Surgery Board dashboard, showing:
- Active recommendations with status (proposed, approved, implementing)
- Conflict warnings for long-lived proposals
- Pattern confidence scores
- Related incident counts
Click a recommendation to:
- View full analysis and proposed changes
- Chat with the Internist to refine the recommendation
- Approve and convert to a fix ticket
- Reject with a reason
Full API documentation: See docs/API.md for comprehensive endpoint documentation with request/response schemas and examples.
Receive error logs via HTTP.
curl -X POST http://localhost:8081/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "my-project",
    "level": "error",
    "message": "TypeError: Cannot read property '\''foo'\'' of undefined",
    "stack_trace": "at handler (/app/src/api.ts:42:15)..."
  }'

Receive logs via OpenTelemetry OTLP protocol.
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/v1/projects` | List all projects |
| POST | `/api/v1/projects` | Create a project |
| GET | `/api/v1/projects/:id` | Get project details |
| PATCH | `/api/v1/projects/:id` | Update project settings |
| DELETE | `/api/v1/projects/:id` | Delete a project |
| GET | `/api/v1/projects/:id/stats` | Get project statistics |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/v1/incidents` | List incidents (with filters) |
| GET | `/api/v1/incidents/stats` | Get incident statistics |
| GET | `/api/v1/incidents/:id` | Get incident details |
| POST | `/api/v1/incidents/:id/retry` | Retry fix generation |
| POST | `/api/v1/incidents/:id/ignore` | Mark incident as ignored |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/v1/incidents/:id/tests` | List generated tests for incident |
| GET | `/api/v1/incidents/:id/tests/:test_id` | Get a specific test |
| PATCH | `/api/v1/incidents/:id/tests/:test_id` | Update test (e.g., include_in_pr) |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/v1/patterns` | List all patterns |
| POST | `/api/v1/patterns` | Create a custom pattern |
| GET | `/api/v1/patterns/:id` | Get pattern details |
| PATCH | `/api/v1/patterns/:id` | Update a pattern |
| DELETE | `/api/v1/patterns/:id` | Delete a custom pattern |
| GET | `/api/v1/patterns/:id/stats` | Get pattern match statistics |
| GET | `/api/v1/patterns/categories` | List pattern categories |
| POST | `/api/v1/patterns/test` | Test a regex against sample text |
| GET | `/api/v1/incidents/:id/pattern-matches` | Get pattern matches for incident |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/v1/recommendations` | List all recommendations |
| GET | `/api/v1/recommendations/:id` | Get recommendation details |
| POST | `/api/v1/recommendations/:id/approve` | Approve recommendation |
| POST | `/api/v1/recommendations/:id/reject` | Reject with reason |
| POST | `/api/v1/recommendations/:id/chat` | Send message to Internist |
| GET | `/api/v1/recommendations/:id/chat` | Get chat history |
| POST | `/api/v1/recommendations/:id/apply-suggestion` | Apply Internist's suggested changes |
| POST | `/api/v1/recommendations/:id/convert-to-incident` | Convert to triage ticket |
| POST | `/api/v1/internist/analyze` | Trigger manual analysis |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/v1/settings/llm` | Get LLM configuration |
| PUT | `/api/v1/settings/llm` | Update LLM configuration |
| POST | `/api/v1/settings/llm/verify` | Verify API key works |
| POST | `/api/v1/settings/llm/models` | List available models |
| GET | `/api/v1/usage/summary` | Get usage summary (includes Internist costs) |
| GET | `/api/v1/usage/details` | Get detailed usage |
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/v1/auth/status` | Get connection status |
| GET | `/api/v1/auth/github` | Start GitHub OAuth flow |
| GET | `/api/v1/auth/github/callback` | GitHub OAuth callback |
| DELETE | `/api/v1/auth/github` | Disconnect GitHub |
| GET | `/api/v1/auth/slack` | Start Slack OAuth flow |
| GET | `/api/v1/auth/slack/callback` | Slack OAuth callback |
| DELETE | `/api/v1/auth/slack` | Disconnect Slack |
| GET | `/api/v1/auth/slack/channels` | List Slack channels |
| POST | `/api/v1/auth/slack/channel` | Set notification channel |
| POST | `/api/v1/auth/slack/test` | Send test message |
The CRISPR UI uses a "Surgery Center" metaphor with a calm, clinical aesthetic inspired by an operating room.
Toggle between Demo and Live modes using the switch in the header. Demo mode shows sample data without requiring backend connectivity, which is useful for exploring the UI.
The main dashboard has two tabs:
Patients Tab: Kanban board showing incidents flowing through stages:
| Column | Description |
|---|---|
| 🔬 Triage | Newly detected errors awaiting analysis |
| 🏥 In Surgery | Incidents being analyzed or fixed by the LLM |
| 🩹 Recovery | PRs created, awaiting merge approval |
| Unfixable errors or max attempts reached |
Each "patient card" shows the error message, confidence level, repository, and time since detection. Click a card to view details.
Services Tab: Grid view of all monitored repositories (supports 30-40+ services) with controls to:
- Toggle log watching on/off
- Toggle auto-fix generation on/off
- View incident count and last activity
Detailed view of PRs awaiting approval. Each incident shows:
| Section | Description |
|---|---|
| 🔴 What Went Wrong | Root cause analysis and error description |
| 🔧 What We Fixed | Explanation of the code changes made |
| ✅ Expected Behavior | How the service should behave after the fix |
Includes confidence percentage, link to GitHub PR, and quick approve/reject actions.
Below the main dashboard, the Internal Medicine section shows architectural recommendations:
| Element | Description |
|---|---|
| Recommendation Cards | Each card shows title, category, priority, and related incident count |
| Status Badges | Proposed, Approved, Implementing, Completed, Rejected |
| Conflict Warning | Orange indicator when a long-lived proposal has merge conflicts |
| Human Required | Special badge for infrastructure changes that can't be auto-fixed |
Click a recommendation to open the detail page with:
- Full root cause analysis
- Proposed code changes
- Interactive chat with the Internist LLM
- Approve/Reject actions
| Page | Description |
|---|---|
| Administration | Cost breakdown by provider and purpose (Reader, Writer, Internist), usage charts over time |
| Settings | LLM provider configuration (Anthropic/OpenAI/Ollama) for all three tiers, GitHub and Slack connection status |
A companion repository with 10 intentional bugs for testing CRISPR end-to-end.
Located at: ../dumpster-fire/
- Start CRISPR (see Quick Start above)
- Add dumpster-fire as a project
  curl -X POST http://localhost:8081/api/v1/projects \
    -H "Content-Type: application/json" \
    -d '{"repo": "youruser/dumpster-fire"}'
- Start dumpster-fire
  cd ../dumpster-fire
  npm install
  npm run dev
- Trigger all bugs
  npm run trigger-errors
- Watch CRISPR work
  - Open http://localhost:8081
  - Watch incidents appear and fixes being generated
- Verify fixes after merging PRs
  git pull
  npm run verify
- Reset for next demo
  npm run reset
| # | Category | Description |
|---|---|---|
| 1 | Null reference | Accessing property on undefined |
| 2 | Type mismatch | String passed where number expected |
| 3 | Off-by-one | Wrong pagination offset |
| 4 | Missing try/catch | Unhandled async error |
| 5 | String operation | indexOf vs includes |
| 6 | Array bounds | Accessing [0] on empty array |
| 7 | Missing await | Returning Promise instead of value |
| 8 | Wrong method name | Calling non-existent function |
| 9 | Logic error | Wrong comparison operator |
| 10 | API contract | Missing response wrapper |
CRISPR maintains a CRISPR.md file in each repository it monitors. This file provides persistent context that improves fix quality over time.
# CRISPR Context
## 1. Special Notes
<!-- Developer-provided context about the repo:
- Unusual build processes
- Coding conventions
- Areas requiring human review -->
## 2. Recent Fixes
<!-- Automatically maintained by CRISPR -->
- [timestamp] Error summary
  - Fix: what was changed
  - PR: link (merged/open)
## 3. Service Description
<!-- What this service does conceptually -->

- During Triage: CRISPR reads the context to understand:
  - Any special considerations from developers
  - Patterns from previous fixes
  - The service's purpose
- After Fix Generation: CRISPR updates section 2 with:
  - The error that was fixed
  - What the fix did
  - Link to the PR
- For Developers: You can edit sections 1 and 3 to give CRISPR better context about your codebase.
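The section 2 update can be pictured as a small text transformation. The sketch below is an assumption about how such an update might work, not CRISPR's implementation: `record_fix` is a hypothetical helper, new entries are placed newest-first, and the PR is recorded as open at the time of writing.

```python
def record_fix(context_md: str, timestamp: str, summary: str,
               fix: str, pr_url: str) -> str:
    """Insert a new entry at the top of the '## 2. Recent Fixes' section."""
    entry = [
        f"- [{timestamp}] {summary}",
        f"  - Fix: {fix}",
        f"  - PR: {pr_url} (open)",
    ]
    lines = context_md.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "## 2. Recent Fixes":
            at = i + 1
            # Keep the single-line "automatically maintained" comment first.
            if at < len(lines) and lines[at].lstrip().startswith("<!--"):
                at += 1
            return "\n".join(lines[:at] + entry + lines[at:]) + "\n"
    raise ValueError("'## 2. Recent Fixes' section not found")
```

Sections 1 and 3 are left untouched, which matches the split between developer-edited and automatically maintained content.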
CRISPR supports running multiple worker instances for horizontal scaling.
- Worker Registration: Each worker registers with a unique ID (hostname + PID)
- Incident Claiming: Before processing, a worker claims the incident using a database lock
- Heartbeats: Workers send periodic heartbeats to indicate they're alive
- Stale Claim Cleanup: If a worker dies, its claims are released after the timeout
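The claim, heartbeat, and stale-cleanup steps can be modeled in memory as shown below. This is a toy sketch: `ClaimTable` is a hypothetical class, a dictionary stands in for the database lock, and the 15-minute default is borrowed from the sample `claim_timeout_minutes` setting.

```python
from datetime import datetime, timedelta


class ClaimTable:
    """In-memory stand-in for claim-based coordination between workers."""

    def __init__(self, claim_timeout_minutes: int = 15):
        self.timeout = timedelta(minutes=claim_timeout_minutes)
        self.claims = {}      # incident id -> worker id
        self.heartbeats = {}  # worker id -> time of last heartbeat

    def heartbeat(self, worker: str, now: datetime) -> None:
        self.heartbeats[worker] = now

    def release_stale(self, now: datetime) -> None:
        """Drop claims held by workers whose heartbeat is older than the timeout."""
        stale = {w for w, seen in self.heartbeats.items()
                 if now - seen > self.timeout}
        self.claims = {i: w for i, w in self.claims.items() if w not in stale}

    def claim(self, incident: int, worker: str, now: datetime) -> bool:
        """Return True if `worker` now owns `incident`."""
        self.release_stale(now)
        if incident in self.claims:
            return False
        self.claims[incident] = worker
        self.heartbeat(worker, now)
        return True
```

In a real multi-process deployment the check-and-set inside `claim` would need to be atomic, which is what the database lock provides.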
worker:
  # Auto-generated if not set: hostname-pid
  id: "worker-1"
  # How long before a claim is considered stale
  claim_timeout_minutes: 15
  # How often to send heartbeats
  heartbeat_interval_seconds: 30

Workers keep cloned repositories in a local directory for faster access:

pipeline:
  repos_dir: "./repos" # Where to store cloned repos

This enables:
- Faster file access (no API calls)
- Offline context from CRISPR.md
- Git operations for committing fixes
crispr/
├── src/
│   ├── main.rs                  # Entry point
│   ├── config.rs                # Configuration loading
│   ├── error.rs                 # Error types
│   ├── crypto.rs                # Encryption utilities
│   ├── worker.rs                # Background job processor
│   ├── api/                     # HTTP handlers
│   │   ├── router.rs            # Route definitions
│   │   ├── auth.rs              # OAuth flows
│   │   ├── ingest.rs            # Error ingestion
│   │   ├── grpc.rs              # OTLP gRPC server
│   │   ├── webhooks.rs          # Sentry/Datadog webhooks
│   │   ├── patterns.rs          # Error pattern API
│   │   ├── projects.rs          # Project CRUD
│   │   ├── incidents.rs         # Incident management
│   │   ├── recommendations.rs   # Internal Medicine API
│   │   └── settings.rs          # LLM & usage
│   ├── integrations/            # External services
│   │   ├── github.rs            # GitHub API client
│   │   ├── slack.rs             # Slack API client
│   │   └── llm/                 # LLM providers
│   ├── patterns/                # Error pattern matching
│   │   └── mod.rs               # Pattern matcher with auto-triage
│   ├── pipeline/                # Fix generation
│   │   ├── fingerprint.rs
│   │   ├── triage.rs
│   │   ├── context.rs
│   │   ├── fix.rs               # Fix + test generation
│   │   ├── pr.rs
│   │   ├── repo.rs              # Local repo management
│   │   ├── crispr_md.rs         # CRISPR.md context file
│   │   ├── internist.rs         # Pattern analysis (Internal Medicine)
│   │   ├── api_contracts.rs     # API breaking change detection
│   │   └── merge.rs             # Intelligent merge for long-lived proposals
│   └── store/                   # Data storage
│       ├── postgres.rs
│       ├── object.rs
│       └── models.rs
├── ui/                          # Svelte frontend
│   └── src/lib/components/      # Reusable UI components
├── tests/                       # Integration tests
│   ├── api_test.rs              # API endpoint tests
│   ├── pipeline_test.rs         # Pipeline flow tests
│   └── common/                  # Test utilities and mocks
├── docs/                        # Documentation
│   └── API.md                   # Full API reference
├── migrations/                  # SQL migrations
├── config/                      # YAML configs
├── Dockerfile
├── docker-compose.yaml
└── docker-compose.dev.yaml
# Run all tests (unit + integration)
cargo test
# Run only integration tests
cargo test --test api_test --test pipeline_test
# Run with output
cargo test -- --nocapture

The test suite includes:
- 62 integration tests covering API endpoints and pipeline flows
- Unit tests inline in source modules
- Mock implementations for LLM, GitHub, and Slack
# Build UI first
cd ui && npm run build && cd ..
# Build Rust binary
cargo build --release

Migrations run automatically on startup. To create a new migration:
# Create migration file
touch migrations/004_my_feature.sql

MIT

CRISPR: Let the robots fix the bugs while you build features.