Background
The current agent.preset approach only supports one agent per Helm release and couples agent selection to hidden template logic. After deeper analysis (including multi-AZ EKS behaviour), this proposal has evolved to address two problems together: multi-agent support and auth persistence across pod restarts.
Problem 1 — Single-agent limitation (preset)
Current design:
- Only one agent per release
- Adding a new preset requires editing _helpers.tpl
- Cannot run kiro + claude simultaneously on different Discord channels
Problem 2 — PVC + StatefulSet is wrong for auth-only persistence
The obvious fix for stable storage across pod restarts is StatefulSet + volumeClaimTemplates. However, this creates a cross-AZ problem on EKS (and any multi-AZ cluster):
- EBS volumes are AZ-scoped
- If a node fails and the pod reschedules to a different AZ, the EBS volume cannot be attached → pod stuck in Pending
- EFS avoids this but is overkill for a few KB of auth token files
The data that actually needs persistence is tiny and infrequently changing:
| Path | Content | Changes when |
|---|---|---|
| .kiro/settings/ | OAuth login token | Only on re-login |
| .kiro/steering/ | Steering config | User changes |
| ~/.claude/ | Claude OAuth token | Only on re-login |
| ~/.codex/ | Codex auth | Only on re-login |
| .kiro/sessions/ | ACP session cache | Every restart (ephemeral) |
| .semantic_search/ | Search index | Rebuild on restart is fine |
A generic S3/MinIO backup approach handles all CLI types without AZ coupling.
Proposed design
values.yaml structure — agents map (one entry = one Deployment)
agents:
  kiro:
    image:
      repository: ghcr.io/thepagent/agent-broker
      tag: ""
    command: kiro-cli
    args: [acp, --trust-all-tools]
    discord:
      botToken: ""
      allowedChannels:
        - "YOUR_CHANNEL_ID"
    workingDir: /home/agent
    env: {}
    envFrom: []
    pool:
      maxSessions: 10
      sessionTtlHours: 24
    reactions:
      enabled: true
      removeAfterReply: false
    persistence:
      s3:
        enabled: false
        bucket: ""
        prefix: "agents/kiro"
        credentialsSecret: ""   # Secret with AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
        endpoint: ""            # for MinIO: http://minio.default.svc:9000
      paths:
        - .kiro/settings        # paths relative to $HOME to backup/restore
        - .kiro/steering
    resources: {}
    nodeSelector: {}
    tolerations: []
    affinity: {}
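
To illustrate the one-entry-per-Deployment idea, a values file with two agents on different Discord channels might look like the sketch below. The claude command name is taken from the CLI table in this proposal; its empty args list, both channel IDs, and the bucket name are placeholders rather than confirmed chart defaults.

```yaml
# Hypothetical values.yaml with two agents. Each key under `agents` would
# render its own Deployment, ConfigMap, and Secret.
agents:
  kiro:
    command: kiro-cli
    args: [acp, --trust-all-tools]
    discord:
      botToken: ""                 # supply at install time
      allowedChannels:
        - "KIRO_CHANNEL_ID"        # placeholder
    persistence:
      s3:
        enabled: true
        bucket: "my-agent-auth"    # placeholder bucket name
        prefix: "agents/kiro"
      paths:
        - .kiro/settings
        - .kiro/steering
  claude:
    command: claude-agent-acp
    args: []                       # assumption: real args depend on the CLI
    discord:
      botToken: ""
      allowedChannels:
        - "CLAUDE_CHANNEL_ID"      # placeholder
    persistence:
      s3:
        enabled: true
        bucket: "my-agent-auth"
        prefix: "agents/claude"    # separate prefix keeps each agent's auth apart
      paths:
        - .claude
```

Fields omitted from an entry (image, pool, reactions, resources) are assumed to fall back to chart defaults. The proposal does not specify whether defaults are merged per entry.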
Resources created per agent entry:
- Deployment/<fullname>-<name>: replaces StatefulSet (no PVC needed)
- ConfigMap/<fullname>-<name>: config.toml + optional AGENTS.md
- Secret/<fullname>-<name>: Discord bot token
Auth persistence via init container + preStop hook:
Pod startup:
  init container → aws s3 sync s3://<bucket>/<prefix>/ /home/agent/ (restore)
  main container → agent-broker starts with auth already in place
Pod shutdown:
  preStop hook → aws s3 sync /home/agent/ s3://<bucket>/<prefix>/ (backup)
  main container → terminates
Runtime data (sessions, search index) uses emptyDir — ephemeral, no AZ dependency.
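
The restore and backup flow above could render to a pod spec roughly like the following. This is a minimal sketch under stated assumptions, not the chart's actual output: the amazon/aws-cli image, the bucket, and the Secret name are placeholders, and the preStop hook assumes the AWS CLI is present in the main image. It also syncs per path rather than the whole home directory, so that only the entries in persistence.paths are copied.

```yaml
# Sketch of the pod template for one agent entry (all names are placeholders).
spec:
  terminationGracePeriodSeconds: 60      # leave time for the preStop upload
  volumes:
    - name: home
      emptyDir: {}                       # node-local storage, no AZ binding
  initContainers:
    - name: restore-auth
      image: amazon/aws-cli              # assumption: any image with the AWS CLI
      command: ["/bin/sh", "-c"]
      args:
        - |
          # Restore only the configured paths into the shared home volume.
          # For MinIO, add --endpoint-url with the configured endpoint.
          for p in .kiro/settings .kiro/steering; do
            aws s3 sync "s3://my-agent-auth/agents/kiro/$p" "/home/agent/$p"
          done
      envFrom:
        - secretRef:
            name: agent-s3-credentials   # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
      volumeMounts:
        - name: home
          mountPath: /home/agent
  containers:
    - name: agent
      image: ghcr.io/thepagent/agent-broker
      envFrom:
        - secretRef:
            name: agent-s3-credentials
      volumeMounts:
        - name: home
          mountPath: /home/agent
      lifecycle:
        preStop:
          exec:
            command:
              - /bin/sh
              - -c
              - |
                # Back up the same paths before the container terminates.
                for p in .kiro/settings .kiro/steering; do
                  aws s3 sync "/home/agent/$p" "s3://my-agent-auth/agents/kiro/$p"
                done
```

One caveat with this design: preStop hooks run only on graceful termination, so a node failure or OOM kill skips the backup. Because the tokens change only on re-login, that may be acceptable, but a login made shortly before such a failure would not be persisted.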
Why Deployment instead of StatefulSet:
- No volumeClaimTemplates needed → no EBS AZ binding
- Simpler: no headless Service requirement
- emptyDir for runtime data is sufficient
Generic across all CLI types
Since each CLI stores auth in a different home directory path, the persistence.paths field lets users specify exactly what to back up:
| CLI | Paths to back up |
|---|---|
| kiro-cli | .kiro/settings, .kiro/steering |
| claude-agent-acp | .claude/ |
| codex | .codex/ |
| gemini | .config/gemini/ |
No chart changes needed when a new CLI is added.
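
As an illustration of adding a CLI through values alone, a gemini entry might look like this. The command name and path come from this proposal. The empty args list, channel ID, and bucket are placeholders.

```yaml
# Hypothetical entry for a newly added CLI; no template changes assumed.
agents:
  gemini:
    command: gemini
    args: []                      # assumption: real args depend on the CLI
    discord:
      botToken: ""
      allowedChannels:
        - "GEMINI_CHANNEL_ID"     # placeholder
    persistence:
      s3:
        enabled: true
        bucket: "my-agent-auth"   # placeholder bucket name
        prefix: "agents/gemini"
      paths:
        - .config/gemini          # relative to $HOME, as in the table above
```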
Breaking change notice
⚠️ This is a breaking change from agent.preset.
| Before | After |
|---|---|
| agent.preset: kiro | agents.kiro.command: kiro-cli |
| discord.botToken | agents.kiro.discord.botToken |
| discord.allowedChannels | agents.kiro.discord.allowedChannels |
| Single Deployment | One Deployment per agent |
| PVC / StatefulSet | S3/MinIO init container + preStop |
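
For migration notes, the key mapping above can be shown as a before/after values file. This is a sketch derived only from the table. The token and channel values are placeholders, and any keys of the old layout beyond those listed are not covered.

```yaml
# Before: single agent selected via preset
agent:
  preset: kiro
discord:
  botToken: "<token>"
  allowedChannels:
    - "YOUR_CHANNEL_ID"
---
# After: one entry per agent under the agents map
agents:
  kiro:
    command: kiro-cli
    args: [acp, --trust-all-tools]
    discord:
      botToken: "<token>"
      allowedChannels:
        - "YOUR_CHANNEL_ID"
```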
Recommend releasing as 0.4.0 with migration notes.
CI/CD: Helm chart testing
As part of this change, add automated helm chart testing:
- name: Lint chart
  run: helm lint charts/agent-broker
- name: Template test (each agent type)
  run: |
    for cmd in kiro-cli claude-agent-acp codex-acp gemini; do
      helm template test charts/agent-broker \
        --set agents.test.command=$cmd \
        --set agents.test.discord.botToken=test \
        --set "agents.test.discord.allowedChannels={123}" \
        --set agents.test.args={} \
        --set agents.test.pool.maxSessions=5 \
        --set agents.test.pool.sessionTtlHours=24 \
        --set agents.test.reactions.enabled=true \
        --set agents.test.reactions.removeAfterReply=false | kubectl apply --dry-run=client -f -
    done
cc @thepagent