🤖 Kelos Agent @gjkim42
Summary
TaskSpawner's taskTemplate applies the same model, agentConfigRef, resource limits, and prompt to every work item it discovers. There is no way to vary the task configuration based on work item properties. This forces operators to create separate TaskSpawners for each configuration tier — even when the source filter, workspace, credentials, and overall workflow are identical.
This proposal adds a templateOverrides field to TaskSpawnerSpec that conditionally overrides taskTemplate fields based on work item labels, enabling cost-optimized task routing from a single TaskSpawner.
Problem
1. One TaskSpawner = one configuration for all items
Looking at kelos-workers.yaml (self-development), every issue labeled actor/kelos gets model: opus, 512Mi–2Gi memory, and the same agentConfig — regardless of issue complexity. In practice:
- Simple fixes (typo, label rename, one-line bug) waste opus capacity and cost $5–20 each when sonnet at $0.50–2 would suffice
- Documentation fixes (kind/docs) don't need the full 4Gi ephemeral storage or the complex agentConfig skills
- Complex changes (kind/api, kind/feature) may need more resources (memory, deadline) than simple ones
The only way to differentiate today is to create multiple TaskSpawners with different label filters:
```yaml
# kelos-workers-simple.yaml: for simple issues
spec:
  when:
    githubIssues:
      labels: [actor/kelos, kind/docs]
      excludeLabels: [kelos/needs-input]
  taskTemplate:
    model: sonnet  # cheaper
    # ... same workspace, credentials, branch template, agentConfig ...
```

```yaml
# kelos-workers-complex.yaml: for complex issues
spec:
  when:
    githubIssues:
      labels: [actor/kelos, kind/feature]
      excludeLabels: [kelos/needs-input]
  taskTemplate:
    model: opus  # more capable
    # ... same workspace, credentials, branch template, agentConfig ...
```

This creates operational overhead: the prompt, branch template, workspace, credentials, and excludeLabels must be kept in sync across multiple files. With N complexity tiers, you need N TaskSpawners that share 90% of their configuration.
2. Self-development evidence
Kelos's own self-development workflow demonstrates this cost inefficiency. Looking at recent tasks:
- Issue #516, docs template syntax ("Docs: example 07-task-pipeline README shows invalid Go template syntax for .Deps access"), used opus for a documentation fix; sonnet would have been sufficient
- Issue #515, config drift ("Workflow: AgentConfig drifted from CLAUDE.md and kelos-workers missing explicit issue filters"), used opus for a documentation/config review; it could have been sonnet
- Issue #528, pagination bug (fetchComments does not paginate, so it misses trigger/exclude comments on busy issues), genuinely needed opus for a complex code change
With ~30 generated-by-kelos issues in the last week, the majority are documentation, label management, or simple code changes that don't require the most expensive model.
3. maxConcurrency conflict
When multiple TaskSpawners share the same branch template (kelos-task-{{.Number}}), they contend for branch locks. If kelos-workers-simple and kelos-workers-complex both try to process the same issue (because label sets overlap), they create conflicting tasks. A single TaskSpawner with conditional overrides eliminates this by construction.
Proposed API
Add templateOverrides to TaskSpawnerSpec
```go
type TaskSpawnerSpec struct {
	// ... existing fields ...

	// TemplateOverrides conditionally overrides taskTemplate fields based on
	// work item properties. Overrides are evaluated in order; the first
	// matching override is applied. If no override matches, the base
	// taskTemplate is used unchanged.
	// +optional
	TemplateOverrides []TemplateOverride `json:"templateOverrides,omitempty"`
}

type TemplateOverride struct {
	// MatchLabels selects work items that have ALL of these labels.
	// When empty, the override matches all items (useful as a default
	// fallback when placed last in the list).
	// +optional
	MatchLabels []string `json:"matchLabels,omitempty"`

	// Override specifies the taskTemplate fields to replace.
	// Only non-nil/non-zero fields take effect; other fields
	// are inherited from the base taskTemplate.
	// +kubebuilder:validation:Required
	Override TaskTemplateOverride `json:"override"`
}

type TaskTemplateOverride struct {
	// Model overrides the agent model.
	// +optional
	Model string `json:"model,omitempty"`

	// AgentConfigRef overrides the agent configuration.
	// +optional
	AgentConfigRef *AgentConfigReference `json:"agentConfigRef,omitempty"`

	// PodOverrides overrides the pod configuration (resources, deadline, env).
	// +optional
	PodOverrides *PodOverrides `json:"podOverrides,omitempty"`

	// PromptTemplate overrides the prompt template.
	// +optional
	PromptTemplate string `json:"promptTemplate,omitempty"`
}
```

Example: Cost-optimized self-development
```yaml
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: kelos-workers
spec:
  when:
    githubIssues:
      labels: [actor/kelos]
      excludeLabels: [kelos/needs-input]
      priorityLabels:
        - priority/critical-urgent
        - priority/important-soon
        - priority/important-longterm
        - priority/backlog
  maxConcurrency: 3
  taskTemplate:
    # Base configuration: used when no override matches
    model: opus
    type: claude-code
    workspaceRef:
      name: kelos-agent
    credentials:
      type: oauth
      secretRef:
        name: kelos-credentials
    agentConfigRef:
      name: kelos-dev-agent
    branch: "kelos-task-{{.Number}}"
    ttlSecondsAfterFinished: 3600
    promptTemplate: |
      ... (existing kelos-workers prompt)
    podOverrides:
      resources:
        requests:
          cpu: "250m"
          memory: "512Mi"
          ephemeral-storage: "4Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
          ephemeral-storage: "4Gi"
  templateOverrides:
    # Documentation fixes: use sonnet, less storage
    - matchLabels: [kind/docs]
      override:
        model: sonnet
        podOverrides:
          resources:
            requests:
              ephemeral-storage: "2Gi"
            limits:
              ephemeral-storage: "2Gi"
    # Backlog items: use sonnet to save costs
    - matchLabels: [priority/backlog]
      override:
        model: sonnet
```

Example: Multi-tier agent routing for external projects
```yaml
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: issue-resolver
spec:
  when:
    githubIssues:
      labels: [bot-ready]
  taskTemplate:
    model: sonnet  # default: cheaper model
    type: claude-code
    workspaceRef:
      name: my-repo
    credentials:
      type: api-key
      secretRef:
        name: anthropic-key
    branch: "auto-fix-{{.Number}}"
    promptTemplate: |
      Fix issue #{{.Number}}: {{.Title}}
      {{.Body}}
  templateOverrides:
    # Complex features get opus + more resources
    - matchLabels: [kind/feature, complexity/high]
      override:
        model: opus
        podOverrides:
          resources:
            requests:
              memory: "1Gi"
            limits:
              memory: "4Gi"
          activeDeadlineSeconds: 7200  # 2-hour deadline
    # Security issues get specialized agent config
    - matchLabels: [security]
      override:
        model: opus
        agentConfigRef:
          name: security-review-agent
```

Implementation
Spawner changes (cmd/kelos-spawner/main.go)
In the task creation loop (around line 258), resolve the matching override before rendering the prompt, since an override may replace promptTemplate:

```go
// Apply template overrides before prompt rendering, so that an
// overridden promptTemplate is the one that gets rendered.
taskTemplate := ts.Spec.TaskTemplate
if override := findMatchingOverride(ts.Spec.TemplateOverrides, item.Labels); override != nil {
	taskTemplate = applyOverride(taskTemplate, override)
}
```

The findMatchingOverride function iterates over the overrides in order and returns the first one whose matchLabels are all present in the item's labels. The applyOverride function merges the non-zero fields of the override into a copy of the base template.
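A minimal sketch of the two helpers follows. It uses simplified stand-in types: the real TaskTemplate, AgentConfigReference, and PodOverrides carry more fields, and the field sets below are assumptions for illustration. It also assumes that pointer fields such as PodOverrides are replaced wholesale rather than deep-merged.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real API types.
type AgentConfigReference struct{ Name string }

type PodOverrides struct {
	ActiveDeadlineSeconds *int64
}

type TaskTemplate struct {
	Model          string
	AgentConfigRef *AgentConfigReference
	PodOverrides   *PodOverrides
	PromptTemplate string
}

type TaskTemplateOverride struct {
	Model          string
	AgentConfigRef *AgentConfigReference
	PodOverrides   *PodOverrides
	PromptTemplate string
}

type TemplateOverride struct {
	MatchLabels []string
	Override    TaskTemplateOverride
}

// findMatchingOverride returns the first override whose MatchLabels are all
// present in labels. An override with empty MatchLabels matches every item.
func findMatchingOverride(overrides []TemplateOverride, labels []string) *TemplateOverride {
	have := make(map[string]bool, len(labels))
	for _, l := range labels {
		have[l] = true
	}
	for i := range overrides {
		matched := true
		for _, want := range overrides[i].MatchLabels {
			if !have[want] {
				matched = false
				break
			}
		}
		if matched {
			return &overrides[i]
		}
	}
	return nil
}

// applyOverride returns a copy of base with the override's non-zero fields
// applied. Pointer fields are replaced wholesale, not deep-merged. The struct
// copy is shallow; a real implementation would likely use the generated DeepCopy().
func applyOverride(base TaskTemplate, o *TemplateOverride) TaskTemplate {
	out := base
	if o.Override.Model != "" {
		out.Model = o.Override.Model
	}
	if o.Override.AgentConfigRef != nil {
		out.AgentConfigRef = o.Override.AgentConfigRef
	}
	if o.Override.PodOverrides != nil {
		out.PodOverrides = o.Override.PodOverrides
	}
	if o.Override.PromptTemplate != "" {
		out.PromptTemplate = o.Override.PromptTemplate
	}
	return out
}

func main() {
	base := TaskTemplate{Model: "sonnet", PromptTemplate: "Fix issue #{{.Number}}"}
	overrides := []TemplateOverride{
		{MatchLabels: []string{"kind/feature", "complexity/high"}, Override: TaskTemplateOverride{Model: "opus"}},
		{MatchLabels: []string{"security"}, Override: TaskTemplateOverride{
			Model: "opus", AgentConfigRef: &AgentConfigReference{Name: "security-review-agent"}}},
	}
	// The second item lacks complexity/high, so it keeps the base model.
	for _, labels := range [][]string{{"bot-ready", "security"}, {"bot-ready", "kind/feature"}} {
		tmpl := base
		if o := findMatchingOverride(overrides, labels); o != nil {
			tmpl = applyOverride(base, o)
		}
		fmt.Println(labels, tmpl.Model)
	}
}
```

With wholesale replacement, the kind/docs override in the first example would drop the base cpu and memory requests. Keeping them would require applyOverride to merge resources field by field, which the "non-nil/non-zero fields" wording in the API comment leaves open.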
CRD changes (api/v1alpha1/taskspawner_types.go)
Add the new field and the two new types shown above. The TaskTemplateOverride struct intentionally omits fields that should not vary per item:
- type (agent type): changing the agent type per item is too fundamental; use separate spawners
- credentials: credentials are tied to the agent type
- workspaceRef: the workspace is shared across all tasks
- branch: the branch template is part of the dedup/locking logic
- ttlSecondsAfterFinished: TTL is an operational concern, not a per-item one
Validation
Add a CEL validation rule ensuring templateOverrides entries have at least one matchLabels entry OR are the last in the list (catch-all):
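```go
// CEL's all() macro binds list elements, not indices, so the rule counts
// catch-all entries instead: at most one is allowed, and it must be the last.
// +kubebuilder:validation:XValidation:rule="!has(self.templateOverrides) || self.templateOverrides.filter(o, !has(o.matchLabels) || size(o.matchLabels) == 0).size() == 0 || (self.templateOverrides.filter(o, !has(o.matchLabels) || size(o.matchLabels) == 0).size() == 1 && (!has(self.templateOverrides[self.templateOverrides.size() - 1].matchLabels) || size(self.templateOverrides[self.templateOverrides.size() - 1].matchLabels) == 0))",message="only the last templateOverride may have empty matchLabels (catch-all)"
```

Estimated scope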
- ~60 lines: new types in taskspawner_types.go
- ~40 lines: findMatchingOverride + applyOverride in the spawner
- ~100 lines: unit tests for override matching and merging
- ~20 lines: CRD generation (make update)
Design decisions
Why label-based matching (not CEL)?
CEL-based matching (#498) is more powerful but:
- CEL adds a non-trivial dependency and evaluation overhead per item
- Label matching covers the dominant use case (issues are already categorized by labels during triage)
- If #498 lands, matchExpressions could be added to TemplateOverride as an alternative to matchLabels
- Label matching is immediately understandable, with no learning curve
Why first-match (not merge-all)?
Merging all matching overrides creates ambiguity about field precedence when multiple overrides set the same field. First-match is deterministic: the order of the list alone defines precedence, as in first-match rule lists such as iptables chains.
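To make the precedence question concrete, here is a small sketch using a hypothetical, simplified rule type that sets only a model. Under merge-all, the outcome for an item matching two rules depends on an extra policy choice (first-wins or last-wins), while first-match needs no such choice.

```go
package main

import "fmt"

// rule is a hypothetical, simplified override: a label selector plus a model.
type rule struct {
	matchLabels []string
	model       string
}

func matches(r rule, labels map[string]bool) bool {
	for _, l := range r.matchLabels {
		if !labels[l] {
			return false
		}
	}
	return true
}

// firstMatch returns the model of the first matching rule (the proposed semantics).
func firstMatch(rules []rule, labels map[string]bool, base string) string {
	for _, r := range rules {
		if matches(r, labels) {
			return r.model
		}
	}
	return base
}

// mergeAll applies every matching rule; lastWins decides which conflicting
// value survives, a choice first-match never has to make.
func mergeAll(rules []rule, labels map[string]bool, base string, lastWins bool) string {
	model, set := base, false
	for _, r := range rules {
		if !matches(r, labels) {
			continue
		}
		if lastWins || !set {
			model, set = r.model, true
		}
	}
	return model
}

func main() {
	rules := []rule{
		{matchLabels: []string{"kind/feature", "complexity/high"}, model: "opus"},
		{matchLabels: []string{"priority/backlog"}, model: "sonnet"},
	}
	item := map[string]bool{"kind/feature": true, "complexity/high": true, "priority/backlog": true}
	fmt.Println(firstMatch(rules, item, "sonnet"))      // opus
	fmt.Println(mergeAll(rules, item, "sonnet", false)) // opus
	fmt.Println(mergeAll(rules, item, "sonnet", true))  // sonnet
}
```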
Why not just Go templates in model/agentConfigRef?
Go template evaluation ({{if hasLabel .Labels "kind/docs"}}sonnet{{else}}opus{{end}}) would work for model but:
- Templates can't select a different AgentConfigRef (it's a struct, not a string)
- Templates can't set PodOverrides (nested resource specs)
- Template errors at render time are hard to debug and validate
- Structured overrides are lintable, diffable, and visible in kubectl describe
Backward compatibility
- Fully additive: templateOverrides is optional and defaults to empty
- When empty, behavior is identical to today (the base taskTemplate is used for all items)
- No changes to existing CRD fields, spawner behavior, or Task structure
- Works with all source types (GitHub Issues, Jira, Cron), though cron items have no labels, so overrides would only match them through an empty-matchLabels catch-all
Related issues
- #498, API: Add matchExpressions field to TaskSpawner for CEL-based work item filtering (CEL filtering): complementary. CEL decides IF a task is created; overrides decide HOW it's configured. If CEL lands, it could be added as an alternative matcher on TemplateOverride.
- #518, New use case: Monorepo support with workspace scoping, sparse checkout, and per-service agent routing (monorepo routing): complementary. Monorepo routing decides WHICH workspace/subpath; overrides decide which model/config to use within a single workspace.
- #355, API: Add cost/token Prometheus metrics and BudgetPolicy for production cost governance (cost metrics): supporting. Cost metrics let operators measure the savings from using overrides.
- #513, Workflow: Add kelos-retrospective TaskSpawner for continuous PR outcome analysis and prompt improvement (retrospectives): supporting. Retrospectives could recommend which issues should use which model tier based on outcome data.