API: Add TaskTemplate overrides for conditional per-item task configuration #537

🤖 **Kelos Agent** @gjkim42

## Summary

TaskSpawner's `taskTemplate` applies the same model, agentConfigRef, resource limits, and prompt to every work item it discovers. There is no way to vary the task configuration based on work item properties. This forces operators to create separate TaskSpawners for each configuration tier — even when the source filter, workspace, credentials, and overall workflow are identical.

This proposal adds a `templateOverrides` field to `TaskSpawnerSpec` that conditionally overrides `taskTemplate` fields based on work item labels, enabling cost-optimized task routing from a single TaskSpawner.

## Problem

### 1. One TaskSpawner = one configuration for all items

In `kelos-workers.yaml` (self-development), every issue labeled `actor/kelos` gets `model: opus`, 512Mi–2Gi memory, and the same agentConfig, regardless of issue complexity. In practice:

- **Simple fixes** (typo, label rename, one-line bug) waste opus capacity and cost $5–20 each when sonnet at $0.50–2 would suffice
- **Documentation fixes** (`kind/docs`) don't need the full 4Gi ephemeral storage or complex agentConfig skills
- **Complex changes** (`kind/api`, `kind/feature`) may need more resources (memory, deadline) than simple ones

The only way to differentiate today is to create multiple TaskSpawners with different label filters:

```yaml
# kelos-workers-simple.yaml — for simple issues
spec:
  when:
    githubIssues:
      labels: [actor/kelos, kind/docs]
      excludeLabels: [kelos/needs-input]
  taskTemplate:
    model: sonnet   # cheaper
    # ... same workspace, credentials, branch template, agentConfig ...

# kelos-workers-complex.yaml — for complex issues
spec:
  when:
    githubIssues:
      labels: [actor/kelos, kind/feature]
      excludeLabels: [kelos/needs-input]
  taskTemplate:
    model: opus     # more capable
    # ... same workspace, credentials, branch template, agentConfig ...
```

This creates operational overhead: the prompt, branch template, workspace, credentials, and excludeLabels must be kept in sync across multiple files. With N complexity tiers, you need N TaskSpawners that share 90% of their configuration.

### 2. Self-development evidence

Kelos's own self-development workflow demonstrates this cost inefficiency. Looking at recent tasks:

- Issue #516 (docs template syntax) used opus for a documentation fix — sonnet would have been sufficient
- Issue #515 (config drift) used opus for a documentation/config review — could be sonnet
- Issue #528 (pagination bug) genuinely needed opus for a complex code change

With ~30 generated-by-kelos issues in the last week, the majority are documentation, label management, or simple code changes that don't require the most expensive model.

### 3. `maxConcurrency` conflict

When multiple TaskSpawners share the same branch template (`kelos-task-{{.Number}}`), they contend for branch locks. If `kelos-workers-simple` and `kelos-workers-complex` both try to process the same issue (because label sets overlap), they create conflicting tasks. A single TaskSpawner with conditional overrides eliminates this by construction.

## Proposed API

### Add `templateOverrides` to `TaskSpawnerSpec`

```go
type TaskSpawnerSpec struct {
    // ... existing fields ...

    // TemplateOverrides conditionally overrides taskTemplate fields based on
    // work item properties. Overrides are evaluated in order; the first
    // matching override is applied. If no override matches, the base
    // taskTemplate is used unchanged.
    // +optional
    TemplateOverrides []TemplateOverride `json:"templateOverrides,omitempty"`
}

type TemplateOverride struct {
    // MatchLabels selects work items that have ALL of these labels.
    // When empty, the override matches all items (useful as a default
    // fallback when placed last in the list).
    // +optional
    MatchLabels []string `json:"matchLabels,omitempty"`

    // Override specifies the taskTemplate fields to replace.
    // Only non-nil/non-zero fields take effect; other fields
    // are inherited from the base taskTemplate.
    // +kubebuilder:validation:Required
    Override TaskTemplateOverride `json:"override"`
}

type TaskTemplateOverride struct {
    // Model overrides the agent model.
    // +optional
    Model string `json:"model,omitempty"`

    // AgentConfigRef overrides the agent configuration.
    // +optional
    AgentConfigRef *AgentConfigReference `json:"agentConfigRef,omitempty"`

    // PodOverrides overrides the pod configuration (resources, deadline, env).
    // +optional
    PodOverrides *PodOverrides `json:"podOverrides,omitempty"`

    // PromptTemplate overrides the prompt template.
    // +optional
    PromptTemplate string `json:"promptTemplate,omitempty"`
}
```

### Example: Cost-optimized self-development

```yaml
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: kelos-workers
spec:
  when:
    githubIssues:
      labels: [actor/kelos]
      excludeLabels: [kelos/needs-input]
      priorityLabels:
        - priority/critical-urgent
        - priority/important-soon
        - priority/important-longterm
        - priority/backlog
  maxConcurrency: 3
  taskTemplate:
    # Base configuration — used when no override matches
    model: opus
    type: claude-code
    workspaceRef:
      name: kelos-agent
    credentials:
      type: oauth
      secretRef:
        name: kelos-credentials
    agentConfigRef:
      name: kelos-dev-agent
    branch: "kelos-task-{{.Number}}"
    ttlSecondsAfterFinished: 3600
    promptTemplate: |
      ... (existing kelos-workers prompt)
    podOverrides:
      resources:
        requests:
          cpu: "250m"
          memory: "512Mi"
          ephemeral-storage: "4Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
          ephemeral-storage: "4Gi"

  templateOverrides:
    # Documentation fixes: use sonnet, less storage
    - matchLabels: [kind/docs]
      override:
        model: sonnet
        podOverrides:
          resources:
            requests:
              ephemeral-storage: "2Gi"
            limits:
              ephemeral-storage: "2Gi"

    # Backlog items: use sonnet to save costs
    - matchLabels: [priority/backlog]
      override:
        model: sonnet
```
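The `kind/docs` override above sets only `ephemeral-storage`, so the base `cpu` and `memory` values survive only if resource lists are merged key by key. The following is a minimal sketch of that merge, not the project's implementation: `ResourceList` is a simplified `map[string]string` stand-in (the real field is presumably a Kubernetes resource list), and `mergeResourceList` is a hypothetical helper name.

```go
package main

import "fmt"

// ResourceList is a simplified stand-in for a Kubernetes resource list
// (resource name -> quantity string). It is an assumption for illustration.
type ResourceList map[string]string

// mergeResourceList returns a new map holding every base entry, with
// entries from override replacing base entries that share a key.
// Neither input is modified.
func mergeResourceList(base, override ResourceList) ResourceList {
	merged := make(ResourceList, len(base)+len(override))
	for name, qty := range base {
		merged[name] = qty
	}
	for name, qty := range override {
		merged[name] = qty
	}
	return merged
}

func main() {
	// Values taken from the base taskTemplate and the kind/docs override.
	baseRequests := ResourceList{"cpu": "250m", "memory": "512Mi", "ephemeral-storage": "4Gi"}
	docsRequests := ResourceList{"ephemeral-storage": "2Gi"}

	merged := mergeResourceList(baseRequests, docsRequests)
	fmt.Println(merged["cpu"], merged["memory"], merged["ephemeral-storage"]) // 250m 512Mi 2Gi
}
```

With a wholesale replacement of `podOverrides` instead, the `kind/docs` tasks would run without any CPU or memory requests, which is probably not what the example intends.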

### Example: Multi-tier agent routing for external projects

```yaml
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: issue-resolver
spec:
  when:
    githubIssues:
      labels: [bot-ready]
  taskTemplate:
    model: sonnet  # default: cheaper model
    type: claude-code
    workspaceRef:
      name: my-repo
    credentials:
      type: api-key
      secretRef:
        name: anthropic-key
    branch: "auto-fix-{{.Number}}"
    promptTemplate: |
      Fix issue #{{.Number}}: {{.Title}}
      {{.Body}}

  templateOverrides:
    # Complex features get opus + more resources
    - matchLabels: [kind/feature, complexity/high]
      override:
        model: opus
        podOverrides:
          resources:
            requests:
              memory: "1Gi"
            limits:
              memory: "4Gi"
          activeDeadlineSeconds: 7200  # 2 hour deadline

    # Security issues get specialized agent config
    - matchLabels: [security]
      override:
        model: opus
        agentConfigRef:
          name: security-review-agent
```

## Implementation

### Spawner changes (`cmd/kelos-spawner/main.go`)

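In the task creation loop (around line 258), resolve the effective template before rendering the prompt, so that a `promptTemplate` override is the template that gets rendered:

```go
// Resolve the effective template first. Prompt rendering and Task creation
// must read from tmpl, not ts.Spec.TaskTemplate, so that overrides
// (including PromptTemplate) take effect.
tmpl := ts.Spec.TaskTemplate
if override := findMatchingOverride(ts.Spec.TemplateOverrides, item.Labels); override != nil {
    tmpl = applyOverride(tmpl, override)
}
```

The `findMatchingOverride` function iterates overrides in order, returning the first one where all `matchLabels` are present in the item's labels. The `applyOverride` function merges non-zero fields from the override into a copy of the base template. For `podOverrides`, the `kind/docs` example above sets only `ephemeral-storage`, so the merge has to be per field (and per resource name) rather than a wholesale replacement of the struct; otherwise the base CPU and memory values would be dropped.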
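The two helpers described above could look roughly like the sketch below. It is illustrative only: the structs are cut-down stand-ins for the proposed API types (`PodOverrides` is left out to keep it short), and the function bodies are one possible implementation rather than the spawner's actual code.

```go
package main

import "fmt"

// Simplified stand-ins for the proposed API types; the real structs carry
// more fields (PodOverrides is omitted here).
type AgentConfigReference struct{ Name string }

type TaskTemplate struct {
	Model          string
	PromptTemplate string
	AgentConfigRef *AgentConfigReference
}

type TaskTemplateOverride struct {
	Model          string
	PromptTemplate string
	AgentConfigRef *AgentConfigReference
}

type TemplateOverride struct {
	MatchLabels []string
	Override    TaskTemplateOverride
}

// findMatchingOverride returns the first override whose MatchLabels are all
// present in itemLabels, or nil when none match. An override with empty
// MatchLabels matches every item.
func findMatchingOverride(overrides []TemplateOverride, itemLabels []string) *TaskTemplateOverride {
	have := make(map[string]struct{}, len(itemLabels))
	for _, l := range itemLabels {
		have[l] = struct{}{}
	}
	for i := range overrides {
		matched := true
		for _, want := range overrides[i].MatchLabels {
			if _, ok := have[want]; !ok {
				matched = false
				break
			}
		}
		if matched {
			return &overrides[i].Override
		}
	}
	return nil
}

// applyOverride returns a copy of base with the override's non-zero fields
// applied. base is passed by value, so the caller's template is not changed.
func applyOverride(base TaskTemplate, o *TaskTemplateOverride) TaskTemplate {
	if o.Model != "" {
		base.Model = o.Model
	}
	if o.PromptTemplate != "" {
		base.PromptTemplate = o.PromptTemplate
	}
	if o.AgentConfigRef != nil {
		ref := *o.AgentConfigRef // copy so the result does not alias the override
		base.AgentConfigRef = &ref
	}
	return base
}

func main() {
	base := TaskTemplate{Model: "opus", AgentConfigRef: &AgentConfigReference{Name: "kelos-dev-agent"}}
	overrides := []TemplateOverride{
		{MatchLabels: []string{"kind/docs"}, Override: TaskTemplateOverride{Model: "sonnet"}},
		{MatchLabels: []string{"priority/backlog"}, Override: TaskTemplateOverride{Model: "sonnet"}},
	}

	effective := base
	if o := findMatchingOverride(overrides, []string{"actor/kelos", "kind/docs"}); o != nil {
		effective = applyOverride(base, o)
	}
	fmt.Println(effective.Model, effective.AgentConfigRef.Name) // sonnet kelos-dev-agent
}
```

Returning a pointer from `findMatchingOverride` keeps "no match" distinguishable from "matched an override that sets nothing", which matters for a catch-all entry with an empty `override`.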

### CRD changes (`api/v1alpha1/taskspawner_types.go`)

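Add the two new types shown above (`TemplateOverride` and `TaskTemplateOverride`) and the new `templateOverrides` field on `TaskSpawnerSpec`. The `TaskTemplateOverride` struct intentionally omits fields that should not vary per item: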
- `type` (agent type) — changing agent type per item is too fundamental; use separate spawners
- `credentials` — credentials are tied to agent type
- `workspaceRef` — workspace is shared across all tasks
- `branch` — branch template is part of the dedup/locking logic
- `ttlSecondsAfterFinished` — TTL is an operational concern, not per-item

### Validation

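Add a CEL validation rule on `TaskSpawnerSpec` ensuring that every `templateOverrides` entry has at least one `matchLabels` entry unless it is the last in the list (catch-all). CEL's `all` macro binds list elements rather than indices, so the rule counts the entries with empty `matchLabels` and allows at most one, and only when the last entry is the empty one:

```go
// +kubebuilder:validation:XValidation:rule="!has(self.templateOverrides) || self.templateOverrides.size() == 0 || self.templateOverrides.filter(o, !has(o.matchLabels) || o.matchLabels.size() == 0).size() <= (has(self.templateOverrides[self.templateOverrides.size() - 1].matchLabels) && self.templateOverrides[self.templateOverrides.size() - 1].matchLabels.size() > 0 ? 0 : 1)",message="only the last templateOverride may have empty matchLabels (catch-all)"
```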
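The same constraint is easy to state in plain Go, which may be useful for unit tests or as a defensive check in the spawner. This is a sketch under assumptions: `validateTemplateOverrides` and `errCatchAllNotLast` are hypothetical names, and `TemplateOverride` is reduced to the one field the rule inspects.

```go
package main

import (
	"errors"
	"fmt"
)

// TemplateOverride is a simplified stand-in carrying only the field the
// validation rule inspects.
type TemplateOverride struct {
	MatchLabels []string
}

// errCatchAllNotLast is a hypothetical sentinel error for this sketch.
var errCatchAllNotLast = errors.New("only the last templateOverride may have empty matchLabels (catch-all)")

// validateTemplateOverrides rejects lists in which an entry with empty
// MatchLabels appears anywhere but the final position.
func validateTemplateOverrides(overrides []TemplateOverride) error {
	for i, o := range overrides {
		isLast := i == len(overrides)-1
		if len(o.MatchLabels) == 0 && !isLast {
			return fmt.Errorf("templateOverrides[%d]: %w", i, errCatchAllNotLast)
		}
	}
	return nil
}

func main() {
	valid := []TemplateOverride{{MatchLabels: []string{"kind/docs"}}, {}}
	invalid := []TemplateOverride{{}, {MatchLabels: []string{"kind/docs"}}}

	fmt.Println(validateTemplateOverrides(valid) == nil)   // true
	fmt.Println(validateTemplateOverrides(invalid) != nil) // true
}
```

Under first-match semantics a catch-all placed earlier would shadow every entry after it, which is why the rule treats that ordering as an error rather than silently accepting it.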

### Estimated scope

- ~60 lines: new types in `taskspawner_types.go`
- ~40 lines: `findMatchingOverride` + `applyOverride` in spawner
- ~100 lines: unit tests for override matching and merging
- ~20 lines: CRD generation (`make update`)

## Design decisions

### Why label-based matching (not CEL)?

CEL-based matching (#498) is more powerful but:
1. CEL adds a non-trivial dependency and evaluation overhead per item
2. Label matching covers the dominant use case (issues are already categorized by labels during triage)
3. If #498 lands, `matchExpressions` could be added as an alternative to `matchLabels` on `TemplateOverride`
4. Label matching is immediately understandable — no learning curve

### Why first-match (not merge-all)?

Merging all matching overrides creates ambiguity about field precedence when multiple overrides set the same field. First-match is deterministic and follows the familiar top-to-bottom, first-hit-wins evaluation of a Go `switch` statement or an ordered firewall rule chain: the effective configuration is always the base template plus exactly one override.
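A small comparison makes the difference concrete. The sketch below uses hypothetical, flattened types (`Override`, `Config`) and a "later entry wins" merge rule, which is only one of several rules a merge-all design would have to choose between.

```go
package main

import "fmt"

// Simplified, hypothetical types for illustration only.
type Override struct {
	MatchLabels []string
	Model       string
	AgentConfig string
}

type Config struct {
	Model       string
	AgentConfig string
}

// hasAll reports whether every label in want appears in itemLabels.
func hasAll(itemLabels, want []string) bool {
	for _, w := range want {
		found := false
		for _, l := range itemLabels {
			if l == w {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// apply copies the override's non-empty fields onto c.
func apply(c Config, o Override) Config {
	if o.Model != "" {
		c.Model = o.Model
	}
	if o.AgentConfig != "" {
		c.AgentConfig = o.AgentConfig
	}
	return c
}

// firstMatch applies only the first matching override.
func firstMatch(base Config, ovs []Override, labels []string) Config {
	for _, o := range ovs {
		if hasAll(labels, o.MatchLabels) {
			return apply(base, o)
		}
	}
	return base
}

// mergeAll applies every matching override in list order, so later
// entries win on conflicting fields.
func mergeAll(base Config, ovs []Override, labels []string) Config {
	for _, o := range ovs {
		if hasAll(labels, o.MatchLabels) {
			base = apply(base, o)
		}
	}
	return base
}

func main() {
	base := Config{Model: "opus", AgentConfig: "kelos-dev-agent"}
	ovs := []Override{
		{MatchLabels: []string{"security"}, AgentConfig: "security-review-agent"},
		{MatchLabels: []string{"kind/docs"}, Model: "sonnet"},
	}
	labels := []string{"kind/docs", "security"}

	fmt.Println(firstMatch(base, ovs, labels)) // {opus security-review-agent}
	fmt.Println(mergeAll(base, ovs, labels))   // {sonnet security-review-agent}
}
```

With merge-all, the item ends up with `sonnet` paired with the security agent config, a combination that no single override declares. With first-match, the result can always be traced to one entry in the list.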

### Why not just Go templates in model/agentConfigRef?

Go template evaluation (`{{if hasLabel .Labels "kind/docs"}}sonnet{{else}}opus{{end}}`) would work for `model` but:
1. Templates can't select a different `AgentConfigRef` (it's a struct, not a string)
2. Templates can't set `PodOverrides` (nested resource specs)
3. Template errors at render time are hard to debug and validate
4. Structured overrides are lintable, diffable, and visible in `kubectl describe`
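For comparison, the template-based alternative can be sketched with Go's `text/template`. The `hasLabel` helper is named in the snippet above but its implementation here is a guess, and `renderModel` is a hypothetical wrapper; the sketch only shows that this approach yields a string and that mistakes surface when the template is processed, not when the resource is admitted.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// hasLabel reports whether labels contains want. The proposal's template
// snippet assumes a helper with this name; this implementation is a guess.
func hasLabel(labels []string, want string) bool {
	for _, l := range labels {
		if l == want {
			return true
		}
	}
	return false
}

// renderModel evaluates a model template against an item's labels.
func renderModel(tmplText string, labels []string) (string, error) {
	tmpl, err := template.New("model").
		Funcs(template.FuncMap{"hasLabel": hasLabel}).
		Parse(tmplText)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, struct{ Labels []string }{Labels: labels}); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	const modelTmpl = `{{if hasLabel .Labels "kind/docs"}}sonnet{{else}}opus{{end}}`

	model, err := renderModel(modelTmpl, []string{"actor/kelos", "kind/docs"})
	fmt.Println(model, err) // sonnet <nil>

	// A typo in the helper name is only reported once the template is
	// parsed; CRD schema validation sees an ordinary string.
	_, err = renderModel(`{{if hasLabl .Labels "kind/docs"}}sonnet{{end}}`, nil)
	fmt.Println(err != nil) // true
}
```

The output is always a string, which is why this approach can select a `model` but cannot produce a structured `agentConfigRef` or `podOverrides` value.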

## Backward compatibility

- Fully additive: `templateOverrides` is optional and defaults to empty
- When empty, behavior is identical to today (base `taskTemplate` used for all items)
- No changes to existing CRD fields, default spawner behavior, or Task structure; the only schema change is the new optional field on TaskSpawner
- Works with all source types (GitHub Issues, Jira, Cron — though cron items have no labels, so overrides would only match via empty `matchLabels` catch-all)

## Related issues

- #498 (CEL filtering) — complementary: CEL decides IF a task is created, overrides decide HOW it's configured. If CEL lands, it could be added as an alternative matcher on `TemplateOverride`
- #518 (monorepo routing) — complementary: monorepo routing decides WHICH workspace/subpath, overrides decide which model/config within a single workspace
- #355 (cost metrics) — supporting: cost metrics let operators measure the savings from using overrides
- #513 (retrospectives) — supporting: retrospectives could recommend which issues should use which model tier based on outcome data
