API: Add TaskTemplate overrides for conditional per-item task configuration #537

@kelos-bot

🤖 Kelos Agent @gjkim42

Summary

TaskSpawner's taskTemplate applies the same model, agentConfigRef, resource limits, and prompt to every work item it discovers. There is no way to vary the task configuration based on work item properties. This forces operators to create separate TaskSpawners for each configuration tier — even when the source filter, workspace, credentials, and overall workflow are identical.

This proposal adds a templateOverrides field to TaskSpawnerSpec that conditionally overrides taskTemplate fields based on work item labels, enabling cost-optimized task routing from a single TaskSpawner.

Problem

1. One TaskSpawner = one configuration for all items

Looking at kelos-workers.yaml (self-development), every issue labeled actor/kelos gets model: opus, 512Mi–2Gi memory, and the same agentConfig — regardless of issue complexity. In practice:

  • Simple fixes (typo, label rename, one-line bug) waste opus capacity and cost $5–20 each when sonnet at $0.50–2 would suffice
  • Documentation fixes (kind/docs) don't need the full 4Gi ephemeral storage or complex agentConfig skills
  • Complex changes (kind/api, kind/feature) may need more resources (memory, deadline) than simple ones

The only way to differentiate today is to create multiple TaskSpawners with different label filters:

# kelos-workers-simple.yaml — for simple issues
spec:
  when:
    githubIssues:
      labels: [actor/kelos, kind/docs]
      excludeLabels: [kelos/needs-input]
  taskTemplate:
    model: sonnet   # cheaper
    # ... same workspace, credentials, branch template, agentConfig ...

# kelos-workers-complex.yaml — for complex issues
spec:
  when:
    githubIssues:
      labels: [actor/kelos, kind/feature]
      excludeLabels: [kelos/needs-input]
  taskTemplate:
    model: opus     # more capable
    # ... same workspace, credentials, branch template, agentConfig ...

This creates operational overhead: the prompt, branch template, workspace, credentials, and excludeLabels must be kept in sync across multiple files. With N complexity tiers, you need N TaskSpawners that share 90% of their configuration.

2. Self-development evidence

Kelos's own self-development workflow demonstrates this cost inefficiency. Of the ~30 generated-by-kelos issues from the last week, the majority are documentation, label management, or simple code changes that don't require the most expensive model.

3. maxConcurrency conflict

When multiple TaskSpawners share the same branch template (kelos-task-{{.Number}}), they contend for branch locks. If kelos-workers-simple and kelos-workers-complex both try to process the same issue (because label sets overlap), they create conflicting tasks. A single TaskSpawner with conditional overrides eliminates this by construction.

Proposed API

Add templateOverrides to TaskSpawnerSpec

type TaskSpawnerSpec struct {
    // ... existing fields ...

    // TemplateOverrides conditionally overrides taskTemplate fields based on
    // work item properties. Overrides are evaluated in order; the first
    // matching override is applied. If no override matches, the base
    // taskTemplate is used unchanged.
    // +optional
    TemplateOverrides []TemplateOverride `json:"templateOverrides,omitempty"`
}

type TemplateOverride struct {
    // MatchLabels selects work items that have ALL of these labels.
    // When empty, the override matches all items (useful as a default
    // fallback when placed last in the list).
    // +optional
    MatchLabels []string `json:"matchLabels,omitempty"`

    // Override specifies the taskTemplate fields to replace.
    // Only non-nil/non-zero fields take effect; other fields
    // are inherited from the base taskTemplate.
    // +kubebuilder:validation:Required
    Override TaskTemplateOverride `json:"override"`
}

type TaskTemplateOverride struct {
    // Model overrides the agent model.
    // +optional
    Model string `json:"model,omitempty"`

    // AgentConfigRef overrides the agent configuration.
    // +optional
    AgentConfigRef *AgentConfigReference `json:"agentConfigRef,omitempty"`

    // PodOverrides overrides the pod configuration (resources, deadline, env).
    // +optional
    PodOverrides *PodOverrides `json:"podOverrides,omitempty"`

    // PromptTemplate overrides the prompt template.
    // +optional
    PromptTemplate string `json:"promptTemplate,omitempty"`
}

Example: Cost-optimized self-development

apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: kelos-workers
spec:
  when:
    githubIssues:
      labels: [actor/kelos]
      excludeLabels: [kelos/needs-input]
      priorityLabels:
        - priority/critical-urgent
        - priority/important-soon
        - priority/important-longterm
        - priority/backlog
  maxConcurrency: 3
  taskTemplate:
    # Base configuration — used when no override matches
    model: opus
    type: claude-code
    workspaceRef:
      name: kelos-agent
    credentials:
      type: oauth
      secretRef:
        name: kelos-credentials
    agentConfigRef:
      name: kelos-dev-agent
    branch: "kelos-task-{{.Number}}"
    ttlSecondsAfterFinished: 3600
    promptTemplate: |
      ... (existing kelos-workers prompt)
    podOverrides:
      resources:
        requests:
          cpu: "250m"
          memory: "512Mi"
          ephemeral-storage: "4Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
          ephemeral-storage: "4Gi"

  templateOverrides:
    # Documentation fixes: use sonnet, less storage
    - matchLabels: [kind/docs]
      override:
        model: sonnet
        podOverrides:
          resources:
            requests:
              ephemeral-storage: "2Gi"
            limits:
              ephemeral-storage: "2Gi"

    # Backlog items: use sonnet to save costs
    - matchLabels: [priority/backlog]
      override:
        model: sonnet
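
The kind/docs override above sets only ephemeral-storage, while the base template also sets cpu and memory. The proposal does not spell out whether a nested podOverrides block replaces the base block wholesale or is merged key by key; the example reads most naturally under a key-by-key merge. A minimal sketch of that assumed behavior follows. ResourceList here is a simplified stand-in for the Kubernetes resource list type, and mergeResources is a hypothetical helper name.

```go
package main

import "fmt"

// ResourceList is a simplified stand-in for a Kubernetes resource list
// (resource name -> quantity string). Assumption: the real type differs.
type ResourceList map[string]string

// mergeResources returns a new list holding base's entries with the
// override's entries layered on top, key by key. Neither input is modified.
func mergeResources(base, override ResourceList) ResourceList {
	out := make(ResourceList, len(base)+len(override))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range override {
		out[k] = v
	}
	return out
}

func main() {
	baseRequests := ResourceList{"cpu": "250m", "memory": "512Mi", "ephemeral-storage": "4Gi"}
	docsRequests := ResourceList{"ephemeral-storage": "2Gi"}

	merged := mergeResources(baseRequests, docsRequests)
	// prints "250m 512Mi 2Gi"
	fmt.Println(merged["cpu"], merged["memory"], merged["ephemeral-storage"])
}
```

If the intended semantics were wholesale replacement instead, the docs override would have to repeat the cpu and memory values from the base template.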

Example: Multi-tier agent routing for external projects

apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: issue-resolver
spec:
  when:
    githubIssues:
      labels: [bot-ready]
  taskTemplate:
    model: sonnet  # default: cheaper model
    type: claude-code
    workspaceRef:
      name: my-repo
    credentials:
      type: api-key
      secretRef:
        name: anthropic-key
    branch: "auto-fix-{{.Number}}"
    promptTemplate: |
      Fix issue #{{.Number}}: {{.Title}}
      {{.Body}}

  templateOverrides:
    # Complex features get opus + more resources
    - matchLabels: [kind/feature, complexity/high]
      override:
        model: opus
        podOverrides:
          resources:
            requests:
              memory: "1Gi"
            limits:
              memory: "4Gi"
          activeDeadlineSeconds: 7200  # 2 hour deadline

    # Security issues get specialized agent config
    - matchLabels: [security]
      override:
        model: opus
        agentConfigRef:
          name: security-review-agent

Implementation

Spawner changes (cmd/kelos-spawner/main.go)

In the task creation loop (around line 258), resolve the effective template before rendering the prompt, so that an overridden promptTemplate is the one that gets rendered:

// Resolve the effective template first; the prompt is then rendered
// from template.PromptTemplate.
template := ts.Spec.TaskTemplate
if override := findMatchingOverride(ts.Spec.TemplateOverrides, item.Labels); override != nil {
    template = applyOverride(template, override)
}

The findMatchingOverride function iterates overrides in order, returning the first one where all matchLabels are present in the item's labels. The applyOverride function merges non-zero fields from the override into a copy of the base template.
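
The two helpers described above could look roughly like the following. This is a sketch under stated assumptions, not the final implementation: the types are trimmed stand-ins for the API types (PodOverrides is left out for brevity), and returning a pointer to the inner TaskTemplateOverride is one possible signature among several.

```go
package main

import "fmt"

// Simplified stand-ins for the API types; the real definitions would live
// in api/v1alpha1 and carry more fields (PodOverrides is omitted here).
type AgentConfigReference struct{ Name string }

type TaskTemplate struct {
	Model          string
	AgentConfigRef *AgentConfigReference
	PromptTemplate string
}

type TaskTemplateOverride struct {
	Model          string
	AgentConfigRef *AgentConfigReference
	PromptTemplate string
}

type TemplateOverride struct {
	MatchLabels []string
	Override    TaskTemplateOverride
}

// findMatchingOverride returns the first override whose matchLabels are all
// present in itemLabels, or nil if none match. Empty matchLabels match any item.
func findMatchingOverride(overrides []TemplateOverride, itemLabels []string) *TaskTemplateOverride {
	have := make(map[string]struct{}, len(itemLabels))
	for _, l := range itemLabels {
		have[l] = struct{}{}
	}
	for i := range overrides {
		matched := true
		for _, want := range overrides[i].MatchLabels {
			if _, ok := have[want]; !ok {
				matched = false
				break
			}
		}
		if matched {
			return &overrides[i].Override
		}
	}
	return nil
}

// applyOverride returns a copy of base with the override's non-zero fields applied.
func applyOverride(base TaskTemplate, o *TaskTemplateOverride) TaskTemplate {
	out := base // value copy; base itself is not modified
	if o.Model != "" {
		out.Model = o.Model
	}
	if o.AgentConfigRef != nil {
		ref := *o.AgentConfigRef // copy so the result does not alias the override
		out.AgentConfigRef = &ref
	}
	if o.PromptTemplate != "" {
		out.PromptTemplate = o.PromptTemplate
	}
	return out
}

func main() {
	base := TaskTemplate{Model: "opus", AgentConfigRef: &AgentConfigReference{Name: "kelos-dev-agent"}}
	overrides := []TemplateOverride{
		{MatchLabels: []string{"kind/docs"}, Override: TaskTemplateOverride{Model: "sonnet"}},
		{MatchLabels: []string{"security"}, Override: TaskTemplateOverride{
			AgentConfigRef: &AgentConfigReference{Name: "security-review-agent"},
		}},
	}

	tmpl := base
	if o := findMatchingOverride(overrides, []string{"actor/kelos", "kind/docs"}); o != nil {
		tmpl = applyOverride(tmpl, o)
	}
	// prints "sonnet kelos-dev-agent"
	fmt.Println(tmpl.Model, tmpl.AgentConfigRef.Name)
}
```

Because matching stops at the first hit, an item carrying both kind/docs and security would get only the kind/docs override in this configuration; list order is the precedence.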

CRD changes (api/v1alpha1/taskspawner_types.go)

Add the templateOverrides field and the two new types (TemplateOverride and TaskTemplateOverride) shown above. The TaskTemplateOverride struct intentionally omits fields that should not vary per item:

  • type (agent type) — changing agent type per item is too fundamental; use separate spawners
  • credentials — credentials are tied to agent type
  • workspaceRef — workspace is shared across all tasks
  • branch — branch template is part of the dedup/locking logic
  • ttlSecondsAfterFinished — TTL is an operational concern, not per-item

Validation

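Add a CEL validation rule ensuring templateOverrides entries have at least one matchLabels entry OR are the last in the list (catch-all). CEL's all() macro binds list elements rather than indices, so the rule maps each entry to a boolean and checks where the first false falls, using indexOf from the Kubernetes CEL list library:

// +kubebuilder:validation:XValidation:rule="!has(self.templateOverrides) || self.templateOverrides.map(o, has(o.matchLabels) && size(o.matchLabels) > 0).indexOf(false) in [-1, size(self.templateOverrides) - 1]",message="only the last templateOverride may have empty matchLabels (catch-all)"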
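
The intent of the rule above can also be expressed in plain Go, which may be useful for unit tests or for a defensive check in the spawner. The sketch below is illustrative only: validateOverrides is a hypothetical helper name and TemplateOverride is trimmed to the one field the check needs.

```go
package main

import "fmt"

// TemplateOverride is trimmed to the field this check needs.
type TemplateOverride struct {
	MatchLabels []string
}

// validateOverrides returns an error if any entry other than the last has
// empty matchLabels. With first-match semantics such an entry would match
// every item and make all later entries unreachable.
func validateOverrides(overrides []TemplateOverride) error {
	for i, o := range overrides {
		if len(o.MatchLabels) == 0 && i != len(overrides)-1 {
			return fmt.Errorf("templateOverrides[%d]: only the last templateOverride may have empty matchLabels (catch-all)", i)
		}
	}
	return nil
}

func main() {
	ok := []TemplateOverride{{MatchLabels: []string{"kind/docs"}}, {}}
	bad := []TemplateOverride{{}, {MatchLabels: []string{"kind/docs"}}}
	// prints "true true"
	fmt.Println(validateOverrides(ok) == nil, validateOverrides(bad) != nil)
}
```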

Estimated scope

  • ~60 lines: new types in taskspawner_types.go
  • ~40 lines: findMatchingOverride + applyOverride in spawner
  • ~100 lines: unit tests for override matching and merging
  • ~20 lines: CRD generation (make update)

Design decisions

Why label-based matching (not CEL)?

CEL-based matching (#498) is more powerful but:

  1. CEL adds a non-trivial dependency and evaluation overhead per item
  2. Label matching covers the dominant use case (issues are already categorized by labels during triage)
  3. If #498 (API: Add matchExpressions field to TaskSpawner for CEL-based work item filtering) lands, matchExpressions could be added as an alternative to matchLabels on TemplateOverride
  4. Label matching is immediately understandable — no learning curve

Why first-match (not merge-all)?

Merging all matching overrides creates ambiguity about field precedence when multiple overrides set the same field. First-match is deterministic and mirrors how Kubernetes audit policy rules are evaluated: rules are checked in order and the first matching rule decides the outcome.

Why not just Go templates in model/agentConfigRef?

Go template evaluation ({{if hasLabel .Labels "kind/docs"}}sonnet{{else}}opus{{end}}) would work for model but:

  1. Templates can't select a different AgentConfigRef (it's a struct, not a string)
  2. Templates can't set PodOverrides (nested resource specs)
  3. Template errors at render time are hard to debug and validate
  4. Structured overrides are lintable, diffable, and visible in kubectl describe
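
For comparison, the template-based alternative can be sketched with Go's text/template. The hasLabel helper is hypothetical (it is not an existing Kelos template function), and the item type is a stand-in for the work item passed to the template. The sketch shows that the approach yields a string only, and that a typo in the template surfaces as an error at parse or render time instead of at CRD validation time.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// item is a stand-in for the work item exposed to templates.
type item struct{ Labels []string }

// hasLabel is a hypothetical template helper reporting whether want is in labels.
func hasLabel(labels []string, want string) bool {
	for _, l := range labels {
		if l == want {
			return true
		}
	}
	return false
}

// renderModel evaluates a templated model string for one work item.
func renderModel(tmplText string, it item) (string, error) {
	t, err := template.New("model").
		Funcs(template.FuncMap{"hasLabel": hasLabel}).
		Parse(tmplText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, it); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	const modelTmpl = `{{if hasLabel .Labels "kind/docs"}}sonnet{{else}}opus{{end}}`
	m, err := renderModel(modelTmpl, item{Labels: []string{"kind/docs"}})
	// prints "sonnet <nil>"
	fmt.Println(m, err)
}
```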

Backward compatibility

  • Fully additive: templateOverrides is optional and defaults to empty
  • When empty, behavior is identical to today (base taskTemplate used for all items)
  • No changes to existing CRD fields, default spawner behavior, or Task structure
  • Works with all source types (GitHub Issues, Jira, Cron — though cron items have no labels, so overrides would only match via empty matchLabels catch-all)
