The Kubernetes-native framework for orchestrating autonomous AI coding agents.
Quick Start · Examples · Reference · YAML Manifests
Point Kelos at a GitHub issue and get a PR back — fully autonomous, running in Kubernetes. Each agent runs in an isolated, ephemeral Pod with a freshly cloned git workspace. Fan out across repositories, chain tasks into pipelines, and react to events automatically.
Supports Claude Code, OpenAI Codex, Google Gemini, OpenCode, and custom agent images.
# Run multiple tasks in parallel across your repo
$ kelos run -p "Fix the bug described in issue #42 and open a PR" --name fix-42
$ kelos run -p "Add unit tests for the auth module" --name add-tests
$ kelos run -p "Update API docs for v2 endpoints" --name update-docs
# Watch all tasks progress simultaneously
$ kelos get tasks
NAME TYPE PHASE BRANCH WORKSPACE AGENT CONFIG DURATION AGE
fix-42 claude-code Running kelos-task-fix-42 my-repo my-config 2m 2m
add-tests claude-code Running kelos-task-add-tests my-repo my-config 1m 1m
update-docs claude-code Running kelos-task-update-docs my-repo my-config 45s 45s
See Autonomous self-development pipeline for a full end-to-end example.
AI coding agents are evolving from interactive CLI tools into autonomous background workers. Kelos provides the infrastructure to manage this transition at scale.
- Orchestration, not just execution: Don't just run an agent; manage its entire lifecycle. Chain tasks with dependsOn and pass results (branch names, PR URLs, token usage) between pipeline stages. Use TaskSpawner to build event-driven workers that react to GitHub issues, PRs, or schedules.
- Host-isolated autonomy: Each task runs in an isolated, ephemeral Pod with a freshly cloned git workspace. Agents have no access to your host machine; use scoped tokens and branch protection to control repository access.
- Standardized interface: Plug in any agent (Claude, Codex, Gemini, OpenCode, or your own) using a simple container interface. Kelos handles credential injection, workspace management, and Kubernetes plumbing.
- Scalable parallelism: Fan out agents across multiple repositories. Kubernetes handles scheduling, resource management, and queueing; scale is limited by your cluster capacity and API provider quotas.
- Observable & CI-native: Every agent run is a first-class Kubernetes resource with deterministic outputs (branch names, PR URLs, commit SHAs, token usage) captured into status. Monitor via kubectl, manage via the kelos CLI or declarative YAML (GitOps-ready), and integrate with ArgoCD or GitHub Actions.
Get running in 5 minutes (most of the time is gathering credentials).
- Kubernetes cluster (1.28+)
Don't have a cluster? Create one locally with kind
- Install kind (requires Docker)
- Create a cluster:
kind create cluster
This creates a single-node cluster and configures your kubeconfig automatically.
curl -fsSL https://raw.githubusercontent.com/kelos-dev/kelos/main/hack/install.sh | bash
Alternative: install from source
go install github.com/kelos-dev/kelos/cmd/kelos@latest
kelos install
This installs the Kelos controller and CRDs into the kelos-system namespace.
Verify the installation:
kubectl get pods -n kelos-system
kubectl get crds | grep kelos.dev
kelos init
Edit ~/.kelos/config.yaml:
oauthToken: <your-oauth-token>
workspace:
  repo: https://github.com/your-org/your-repo.git
  ref: main
  token: <github-token>  # optional, for private repos and pushing changes
How to get your credentials
Claude OAuth token (recommended for Claude Code):
Run claude auth login locally, then copy the token from ~/.claude/credentials.json.
Anthropic API key (alternative for Claude Code):
Create one at console.anthropic.com. Set apiKey instead of oauthToken in your config.
Codex OAuth credentials (for OpenAI Codex):
Run codex auth login locally, then reference the auth file in your config:
oauthToken: "@~/.codex/auth.json"
type: codex
Or set apiKey with an OpenAI API key instead.
GitHub token (for pushing branches and creating PRs):
Create a Personal Access Token with repo scope (and workflow if your repo uses GitHub Actions).
Warning: Without a workspace, the agent runs in an ephemeral pod — any files it creates are lost when the pod terminates. Always set up a workspace to get persistent results.
$ kelos run -p "Add a hello world program in Python"
task/task-r8x2q created
$ kelos logs task-r8x2q -f
The task name (e.g. task-r8x2q) is auto-generated. Use --name to set a custom name, or -w to automatically watch task logs.
The agent clones your repo, makes changes, and can push a branch or open a PR.
Tip: If something goes wrong, check the controller logs with
kubectl logs deployment/kelos-controller-manager -n kelos-system.
Using kubectl and YAML instead of the CLI
Create a Workspace resource to define a git repository:
apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: my-workspace
spec:
  repo: https://github.com/your-org/your-repo.git
  ref: main
Then reference it from a Task:
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: hello-world
spec:
  type: claude-code
  prompt: "Create a hello world program in Python"
  credentials:
    type: oauth
    secretRef:
      name: claude-oauth-token
  workspaceRef:
    name: my-workspace
kubectl apply -f workspace.yaml
kubectl apply -f task.yaml
kubectl get tasks -w
Using an API key instead of OAuth
Set apiKey instead of oauthToken in ~/.kelos/config.yaml:
apiKey: <your-api-key>
Or pass --secret to kelos run with a pre-created secret (api-key is the default credential type), or set spec.credentials.type: api-key in YAML.
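As a rough sketch of the YAML route mentioned above, the Task below mirrors the earlier hello-world manifest but switches spec.credentials.type to api-key. The Task name and the Secret name anthropic-api-key are hypothetical, and the key name the controller expects inside the Secret is not stated here, so check the reference before creating it.

```yaml
# Sketch only: same shape as the OAuth Task above, with an API-key credential.
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: hello-world-api-key        # hypothetical name
spec:
  type: claude-code
  prompt: "Create a hello world program in Python"
  credentials:
    type: api-key                  # instead of oauth
    secretRef:
      name: anthropic-api-key      # hypothetical, pre-created Secret
  workspaceRef:
    name: my-workspace
```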
Kelos orchestrates the flow from external events to autonomous execution:
Triggers (GitHub, Cron) ──┐
                          │
Manual (CLI, YAML) ───────┼──▶ TaskSpawner ──▶ Tasks ──▶ Isolated Pods
                          │         │              │             │
API (CI/CD, Webhooks) ────┘         └─(Lifecycle)──┴─(Execution)─┴─(Success/Fail)
You define what needs to be done, and Kelos handles the "how" — from cloning the right repo and injecting credentials to running the agent and capturing its outputs (branch names, commit SHAs, PR URLs, and token usage).
Kelos is built on four resources:
- Tasks: Ephemeral units of work that wrap an AI agent run.
- Workspaces: Persistent or ephemeral environments (git repos) where agents operate.
- AgentConfigs: Reusable bundles of agent instructions (AGENTS.md, CLAUDE.md), plugins (skills and agents), and MCP servers.
- TaskSpawners: Orchestration engines that react to external triggers (GitHub, Cron) to automatically manage agent lifecycles.
TaskSpawner — Automatic Task Creation from External Sources
TaskSpawner watches external sources (e.g., GitHub Issues) and automatically creates Tasks for each discovered item.
                polls        new issues
TaskSpawner ─────────────▶ GitHub Issues
     │      ◀─────────────
     │
     ├──creates──▶ Task: fix-bugs-1
     └──creates──▶ Task: fix-bugs-2
Add a token to your workspace config:
workspace:
  repo: https://github.com/your-org/repo.git
  ref: main
  token: <your-github-token>
kelos run -p "Fix the bug described in issue #42 and open a PR with the fix"
The gh CLI and GITHUB_TOKEN are available inside the agent container, so the agent can push branches and create PRs autonomously.
Create a TaskSpawner to automatically turn GitHub issues into agent tasks:
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: fix-bugs
spec:
  when:
    githubIssues:
      labels: [bug]
      state: open
  taskTemplate:
    type: claude-code
    workspaceRef:
      name: my-workspace
    credentials:
      type: oauth
      secretRef:
        name: claude-oauth-token
    promptTemplate: "Fix: {{.Title}}\n{{.Body}}"
  pollInterval: 5m
kubectl apply -f taskspawner.yaml
TaskSpawner polls for new issues matching your filters and creates a Task for each one.
Use dependsOn to chain tasks into pipelines. A task in Waiting phase stays paused until all its dependencies succeed:
kelos run -p "Scaffold a new user service" --name scaffold --branch feature/user-service
kelos run -p "Write tests for the user service" --depends-on scaffold --branch feature/user-service
Tasks sharing the same branch are serialized automatically: only one runs at a time.
YAML equivalent
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: scaffold
spec:
  type: claude-code
  prompt: "Scaffold a new user service with CRUD endpoints"
  credentials:
    type: oauth
    secretRef:
      name: claude-oauth-token
  workspaceRef:
    name: my-workspace
  branch: feature/user-service
---
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: write-tests
spec:
  type: claude-code
  prompt: "Write comprehensive tests for the user service"
  credentials:
    type: oauth
    secretRef:
      name: claude-oauth-token
  workspaceRef:
    name: my-workspace
  branch: feature/user-service
  dependsOn: [scaffold]
Downstream tasks can reference upstream results in their prompt using {{.Deps}}:
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: open-pr
spec:
  type: claude-code
  prompt: |
    Open a PR for branch {{index .Deps "write-tests" "Results" "branch"}}.
  credentials:
    type: oauth
    secretRef:
      name: claude-oauth-token
  workspaceRef:
    name: my-workspace
  branch: feature/user-service
  dependsOn: [write-tests]
The .Deps map is keyed by dependency Task name. Each entry has Results (key-value map with branch, commit, pr, etc.) and Outputs (raw output lines). See examples/07-task-pipeline for a full three-stage pipeline.
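To illustrate the other Results keys named above, here is a hedged sketch of a fourth stage that reads pr and commit from the open-pr Task. The Task name summarize-pr and its prompt are hypothetical, and whether an upstream Task actually records a pr or commit value depends on what that Task did.

```yaml
# Sketch only: a follow-up stage that consumes Results from open-pr.
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: summarize-pr               # hypothetical name
spec:
  type: claude-code
  prompt: |
    Review the pull request {{index .Deps "open-pr" "Results" "pr"}}
    at commit {{index .Deps "open-pr" "Results" "commit"}}
    and post a summary comment.
  credentials:
    type: oauth
    secretRef:
      name: claude-oauth-token
  workspaceRef:
    name: my-workspace
  branch: feature/user-service
  dependsOn: [open-pr]             # the key used in .Deps above
```

The lookup key passed to index must match the dependency's Task name exactly, since .Deps is keyed by that name.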
Use AgentConfig to bundle project-wide instructions, plugins, and MCP servers:
apiVersion: kelos.dev/v1alpha1
kind: AgentConfig
metadata:
  name: my-config
spec:
  agentsMD: |
    # Project Rules
    Follow TDD. Always write tests first.
  mcpServers:
    - name: github
      type: http
      url: https://api.githubcopilot.com/mcp/
      headers:
        Authorization: "Bearer <token>"
kelos run -p "Fix the bug" --agent-config my-config
- agentsMD is written to ~/.claude/CLAUDE.md (user-level, additive with the repo's own instructions).
- plugins are mounted as plugin directories and passed via --plugin-dir.
- mcpServers are written to the agent's native MCP configuration. Supports stdio, http, and sse transport types.
See the full AgentConfig spec for plugins, skills, and agents configuration.
This is a real-world TaskSpawner that picks up every open issue, investigates it, opens (or updates) a PR, self-reviews, and ensures CI passes — fully autonomously. When the agent can't make progress, it labels the issue kelos/needs-input and stops. Remove the label to re-queue it.
┌────────────────────────────────────────────────────────────────┐
│                         Feedback Loop                          │
│                                                                │
│  ┌─────────────┐  polls  ┌────────────────┐                    │
│  │ TaskSpawner │───────▶ │ GitHub Issues  │                    │
│  └──────┬──────┘         │ (open, no      │                    │
│         │                │  needs-input)  │                    │
│         │ creates        └────────────────┘                    │
│         ▼                                                      │
│  ┌─────────────┐  runs   ┌─────────────┐  opens PR   ┌───────┐ │
│  │    Task     │───────▶ │    Agent    │────────────▶│ Human │ │
│  └─────────────┘ in Pod  │  (Claude)   │  or labels  │Review │ │
│                          └─────────────┘  needs-input└───┬───┘ │
│                                                          │     │
│                                           removes label ─┘     │
│                                           (re-queues issue)    │
└────────────────────────────────────────────────────────────────┘
See self-development/kelos-workers.yaml for the full manifest and the self-development/ README for setup instructions.
The key pattern is excludeLabels: [kelos/needs-input] — this creates a feedback loop where the agent works autonomously until it needs human input, then pauses. Removing the label re-queues the issue on the next poll.
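A minimal sketch of that pattern, not the actual kelos-workers.yaml manifest: it reuses the fix-bugs TaskSpawner shape from earlier and adds excludeLabels. The spawner name and prompt are hypothetical, and placing excludeLabels next to state under githubIssues is an assumption to confirm against the TaskSpawner reference.

```yaml
# Sketch only: pick up open issues unless they carry kelos/needs-input.
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: issue-worker               # hypothetical name
spec:
  when:
    githubIssues:
      state: open
      excludeLabels: [kelos/needs-input]   # assumed placement beside state
  taskTemplate:
    type: claude-code
    workspaceRef:
      name: my-workspace
    credentials:
      type: oauth
      secretRef:
        name: claude-oauth-token
    promptTemplate: "Investigate and fix: {{.Title}}\n{{.Body}}"
  pollInterval: 5m
```

With a filter like this, an issue labeled kelos/needs-input is skipped on each poll and becomes eligible again once the label is removed.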
Browse all ready-to-apply YAML manifests in the examples/ directory.
- Autonomous Self-Development — Build a feedback loop where agents pick up issues, write code, self-review, and fix CI flakes until the task is complete. See the self-development pipeline.
- Event-Driven Bug Fixing — Automatically spawn agents to investigate and fix bugs as soon as they are labeled in GitHub. See Auto-fix GitHub issues.
- Fleet-Wide Refactoring — Orchestrate a "fan-out" where dozens of agents apply the same refactoring pattern across a fleet of microservices in parallel.
- Hands-Free CI/CD — Embed agents as first-class steps in your deployment pipelines to generate documentation or perform automated migrations.
- AI Worker Pools — Maintain a pool of specialized agents (e.g., "The Security Fixer") that developers can trigger via simple Kubernetes resources.
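As one possible shape for the Fleet-Wide Refactoring case, the sketch below defines one Workspace and one Task per repository, all sharing the same prompt. Every repository URL, resource name, and the prompt itself are hypothetical, and it shows two services only for brevity. Pushing branches would also need a GitHub token configured on each Workspace, which is omitted here.

```yaml
# Sketch only: fan the same refactoring prompt out over two repositories.
apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: orders-service             # hypothetical
spec:
  repo: https://github.com/your-org/orders-service.git
  ref: main
---
apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: billing-service            # hypothetical
spec:
  repo: https://github.com/your-org/billing-service.git
  ref: main
---
# No dependsOn between the Tasks, so nothing forces them to run in sequence.
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: refactor-orders
spec:
  type: claude-code
  prompt: "Replace the deprecated logging helper with the structured logger and open a PR"
  credentials:
    type: oauth
    secretRef:
      name: claude-oauth-token
  workspaceRef:
    name: orders-service
---
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: refactor-billing
spec:
  type: claude-code
  prompt: "Replace the deprecated logging helper with the structured logger and open a PR"
  credentials:
    type: oauth
    secretRef:
      name: claude-oauth-token
  workspaceRef:
    name: billing-service
```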
| Resource | Key Fields | Full Spec |
|---|---|---|
| Task | type, prompt, credentials, workspaceRef, dependsOn, branch | Reference |
| Workspace | repo, ref, secretRef, files | Reference |
| AgentConfig | agentsMD, plugins, mcpServers | Reference |
| TaskSpawner | when, taskTemplate, pollInterval, maxConcurrency | Reference |
CLI Reference
| Command | Description |
|---|---|
| kelos install | Install Kelos CRDs and controller into the cluster |
| kelos uninstall | Uninstall Kelos from the cluster |
| kelos init | Initialize ~/.kelos/config.yaml |
| kelos run | Create and run a new Task |
| kelos get <resource> [name] | List resources or view a specific resource (tasks, taskspawners, workspaces) |
| kelos delete <resource> <name> | Delete a resource |
| kelos logs <task-name> [-f] | View or stream logs from a task |
| kelos suspend taskspawner <name> | Pause a TaskSpawner |
| kelos resume taskspawner <name> | Resume a paused TaskSpawner |
See full CLI reference for all flags and options.
Kelos runs agents in isolated, ephemeral Pods with no access to your host machine, SSH keys, or other processes. The risk surface is limited to what the injected credentials allow.
What agents CAN do: Push branches, create PRs, and call the GitHub API using the injected GITHUB_TOKEN.
What agents CANNOT do: Access your host, read other pods, reach other repositories, or access any credentials beyond what you explicitly inject.
Best practices:
- Scope your GitHub tokens. Use fine-grained Personal Access Tokens restricted to specific repositories instead of broad repo-scoped classic tokens.
- Enable branch protection. Require PR reviews before merging to main. Agents can push branches and open PRs, but protected branches prevent direct pushes to your default branch.
- Use maxConcurrency and maxTotalTasks. Limit how many tasks a TaskSpawner can create to prevent runaway agent activity.
- Use podOverrides.activeDeadlineSeconds. Set a timeout to prevent tasks from running indefinitely.
- Audit via Kubernetes. Every agent run is a first-class Kubernetes resource; use kubectl get tasks and cluster audit logs to track what was created and by whom.
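The limits from the list above could be combined on a single TaskSpawner roughly as follows. This is a sketch built on the fix-bugs example: the specific numbers are illustrative, and nesting podOverrides inside taskTemplate is an assumption, since only spec.podOverrides on a Task is shown elsewhere in this document.

```yaml
# Sketch only: a TaskSpawner with concurrency, total-task, and runtime limits.
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: fix-bugs
spec:
  maxConcurrency: 3                # at most 3 Tasks running at once
  maxTotalTasks: 50                # stop creating Tasks after 50 in total
  when:
    githubIssues:
      labels: [bug]
      state: open
  taskTemplate:
    type: claude-code
    workspaceRef:
      name: my-workspace
    credentials:
      type: oauth
      secretRef:
        name: claude-oauth-token
    promptTemplate: "Fix: {{.Title}}\n{{.Body}}"
    podOverrides:                  # assumed placement; verify in the reference
      activeDeadlineSeconds: 3600  # stop a Task's Pod after 1 hour
  pollInterval: 5m
```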
About --dangerously-skip-permissions: Claude Code uses this flag for non-interactive operation. The flag disables interactive approval prompts, which autonomous execution requires. Despite the name, the risk is contained: agents run inside ephemeral containers with no host access, so exposure is limited to what the injected credentials allow.
Kelos uses standard Kubernetes RBAC — use namespace isolation to separate teams. Each TaskSpawner automatically creates a scoped ServiceAccount and RoleBinding.
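For namespace isolation, a standard Role and RoleBinding can limit a team to Kelos resources in its own namespace. The sketch below assumes the CRDs are namespaced, live in the kelos.dev API group (taken from the apiVersion used throughout), and use the plural names tasks and workspaces. The namespace, role, and group names are hypothetical.

```yaml
# Sketch only: let one team manage Tasks and Workspaces in its own namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kelos-task-editor          # hypothetical
  namespace: team-payments         # hypothetical team namespace
rules:
  - apiGroups: ["kelos.dev"]
    resources: ["tasks", "workspaces"]   # assumed plurals; check kubectl get crds
    verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kelos-task-editor
  namespace: team-payments
subjects:
  - kind: Group
    name: payments-developers      # hypothetical group from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: kelos-task-editor
  apiGroup: rbac.authorization.k8s.io
```

Because a Role is namespace-scoped, members of the group get no access to Tasks in other teams' namespaces unless another binding grants it.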
Running AI agents costs real money. Here's how to stay in control:
Model costs vary significantly. Opus is the most capable but most expensive model. Use spec.model (or model in config) to choose cheaper models like Sonnet for routine tasks and reserve Opus for complex work. Check the API pricing page for current rates.
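A minimal sketch of selecting a model per Task with spec.model. The value is left as a placeholder because the accepted model identifiers are not listed here; the rest of the manifest follows the earlier Task examples.

```yaml
# Sketch only: pin a cheaper model for a routine task.
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: update-docs
spec:
  type: claude-code
  model: <model-name>              # placeholder; use an identifier your provider accepts
  prompt: "Update API docs for v2 endpoints"
  credentials:
    type: oauth
    secretRef:
      name: claude-oauth-token
  workspaceRef:
    name: my-workspace
```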
Use maxConcurrency to cap spend. Without it, a TaskSpawner can create unlimited concurrent tasks. If 100 issues match your filter on first poll, that's 100 simultaneous agent runs. Always set a limit:
spec:
  maxConcurrency: 3   # max 3 tasks running at once
  maxTotalTasks: 50   # stop after 50 total tasks
Use podOverrides.activeDeadlineSeconds to limit runtime. Set a timeout per task to prevent agents from running indefinitely:
spec:
  podOverrides:
    activeDeadlineSeconds: 3600   # kill after 1 hour
Or via the CLI:
kelos run -p "Fix the bug" --timeout 30m
Use suspend for emergencies. If costs are spiraling, pause a spawner immediately:
kelos suspend taskspawner my-spawner
# ... investigate ...
kelos resume taskspawner my-spawner
Rate limits. API providers enforce concurrency and token limits. If a task hits a rate limit mid-execution, it will likely fail. Use maxConcurrency to stay within your provider's limits.
What agents does Kelos support?
Kelos supports Claude Code, OpenAI Codex, Google Gemini, and OpenCode out of the box. You can also bring your own agent image using the container interface.
Can I use Kelos without Kubernetes?
No. Kelos is built on Kubernetes Custom Resources and requires a Kubernetes cluster. For local development, use kind (kind create cluster) to create a single-node cluster on your machine.
Is it safe to give agents repo access?
Agents run in isolated, ephemeral Pods with no host access. Their capabilities are limited to what you inject — typically a scoped GitHub token. Use fine-grained PATs, branch protection, and maxConcurrency to control the blast radius. See Security Considerations.
How much does it cost to run?
Costs depend on the model and task complexity. Check the API pricing page for current rates. Use maxConcurrency, timeouts, and model selection to stay in budget. See Cost and Limits.
kelos uninstall
Build, test, and iterate with make:
make update # generate code, CRDs, fmt, tidy
make verify # generate + vet + tidy-diff check
make test # unit tests
make test-integration # integration tests (envtest)
make test-e2e # e2e tests (requires cluster)
make build # build binary
make image # build docker image
- Fork the repo and create a feature branch.
- Make your changes and run make verify to ensure everything passes.
- Open a pull request with a clear description of the change.
For significant changes, please open an issue first to discuss the approach.
We welcome contributions of all kinds — see good first issues for places to start.