
Helm chart 0.7.0 upgrade pitfalls: PVC data loss, stale image tag, missing values on upgrade #247

@shaun-agent

Description


Summary

Upgrading from openab-0.5.0 to openab-0.7.0 hit three distinct issues that compounded into extended downtime. Documenting all findings here so future PRs can reference them.

Environment

  • Chart: openab-0.5.0 → openab-0.7.0
  • Platform: Kubernetes (OrbStack, local)
  • Release name: openab

Issue 1: PVC data loss on upgrade (silent)

Root cause: The chart restructured from flat persistence.* to agents.<name>.persistence.*. This changes the PVC name from openab to openab-kiro. Helm treats the old PVC as "no longer part of this release" and deletes it.
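
For reference, the restructure looks roughly like this (key names are inferred from the description above, not the chart's exact schema):

```yaml
# Old (chart 0.5.0) — flat layout; PVC named after the release ("openab"):
persistence:
  enabled: true
  size: 10Gi

# New (chart 0.7.0) — per-agent layout; PVC named "<release>-<agent>" ("openab-kiro"):
agents:
  kiro:
    persistence:
      enabled: true
      size: 10Gi
```

Because the rendered PVC name changes, Helm's three-way merge sees a new resource plus a removed one, not a rename.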

Impact: All agent data lost — kiro auth (data.sqlite3), steering files, gh config, session history.

Workaround applied:

# Protect old PVC before upgrade
kubectl annotate pvc openab helm.sh/resource-policy=keep

# After upgrade, mount old PVC via rescue pod and copy data to new PVC
kubectl run pvc-rescue --image=busybox --restart=Never --overrides='...(mount old PVC)...'
kubectl cp pvc-rescue:/old/ /tmp/backup/
kubectl cp /tmp/backup/.kiro <new-pod>:/home/agent/.kiro
kubectl cp /tmp/backup/.local <new-pod>:/home/agent/.local  # ← kiro auth lives here
kubectl cp /tmp/backup/.config <new-pod>:/home/agent/.config
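
For completeness, a hypothetical --overrides payload for the rescue pod (the claim name, mount path, and sleep command are assumptions based on the steps above; adjust to your cluster):

```shell
# Hypothetical overrides JSON: busybox pod that mounts the old PVC at /old
# and sleeps so kubectl cp has time to run.
OVERRIDES='{
  "spec": {
    "containers": [{
      "name": "pvc-rescue",
      "image": "busybox",
      "command": ["sleep", "3600"],
      "volumeMounts": [{"name": "old-data", "mountPath": "/old"}]
    }],
    "volumes": [{"name": "old-data", "persistentVolumeClaim": {"claimName": "openab"}}]
  }
}'

# Sanity-check the JSON locally before handing it to kubectl
echo "$OVERRIDES" | python3 -m json.tool > /dev/null && echo "overrides OK"

# Then:
# kubectl run pvc-rescue --image=busybox --restart=Never --overrides="$OVERRIDES"
```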

Recommended fix (chart-level):

  1. Add a helm.sh/resource-policy: keep annotation to templates/pvc.yaml — prevents accidental PVC deletion on any future rename. The Secret template already has this annotation; the PVC template should too. (See the comment on PR #166, "feat(helm): add persistence.existingClaim support".)
  2. Add persistence.existingClaim support (PR #166, addresses #120) — lets users explicitly point to an old PVC during migration.
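
A minimal sketch of what fix (1) could look like in templates/pvc.yaml (the naming helper and spec fields are assumptions; the chart's real template will differ):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: {{ include "openab.fullname" . }}
  annotations:
    # Tells Helm to leave this PVC behind instead of deleting it
    # when it falls out of the release (e.g. after a rename).
    helm.sh/resource-policy: keep
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: {{ .Values.persistence.size | default "10Gi" }}
```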

Related: #117, #120, PR #166, external writeup


Issue 2: image.tag defaults to stale commit hash, not appVersion

Root cause: values.yaml in chart 0.7.0 hardcodes tag: "94253a5" (an old commit). The chart metadata says appVersion: "0.7.0" and the ghcr.io/openabdev/openab:0.7.0 image exists, but the default tag doesn't point to it.

Impact: After upgrading to chart 0.7.0, the pod still runs the old binary. New features (STT voice transcription) silently don't work — the config is loaded but the code to handle it doesn't exist in the old image.

Workaround applied:

--set image.tag=0.7.0

Recommended fix: Either:

  • Remove the hardcoded tag from values.yaml and let the template fall back to .Chart.AppVersion (the comment already says "tag defaults to .Chart.AppVersion" but the hardcoded value overrides it)
  • Or update the hardcoded tag to match the release
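
For the first option, the usual Helm idiom is a default fallback in the image reference (a sketch; the chart's actual template structure is an assumption):

```yaml
# templates/deployment.yaml (sketch) — an empty tag in values.yaml
# falls back to Chart.yaml's appVersion:
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
```

Note this fallback only takes effect if values.yaml ships tag: "" (or omits the key entirely); any non-empty hardcoded tag wins over .Chart.AppVersion.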

Related: #235


Issue 3: helm upgrade silently drops values not explicitly passed

Root cause: Helm does not merge user-supplied values across revisions. Any value not passed in the upgrade command resets to chart defaults. We had 3 Discord channel IDs configured; the upgrade command only passed 2. The bot silently ignored messages from the third channel.
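
To illustrate (flag names are real Helm flags; release, chart path, and values are hypothetical):

```shell
# Initial install sets three channel IDs:
helm upgrade --install openab ./openab \
  --set "discord.channels={111,222,333}"

# A later upgrade that passes only two silently resets the third
# to the chart default:
helm upgrade openab ./openab \
  --set "discord.channels={111,222}"

# Safer: replay the previous release's values explicitly...
helm get values openab -o yaml > /tmp/prev-values.yaml
helm upgrade openab ./openab -f /tmp/prev-values.yaml --set image.tag=0.7.0

# ...or use --reuse-values, with the caveat that it also skips
# any new defaults introduced by the new chart version.
helm upgrade openab ./openab --reuse-values --set image.tag=0.7.0
```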

Impact: Bot appeared "down" — connected to Discord, no errors in logs, but not responding in one channel. Difficult to diagnose because the logs showed channels=2 with no warning about the change.

Workaround applied:

# Always capture full state before upgrading
helm get values <release>
kubectl get configmap <name> -o yaml
# Then pass ALL values explicitly

Recommended improvement:

  • The chart NOTES.txt could print the configured channel count and IDs after install/upgrade, making it easier to spot missing channels.
  • Consider supporting a values.yaml file approach for upgrades instead of long --set chains.

Suggested pre-upgrade checklist (for docs)

# 1. Record current state
helm get values openab > /tmp/current-values.yaml
kubectl get pvc
kubectl exec deployment/<name> -- cat /etc/openab/config.toml

# 2. Preview changes
helm diff upgrade openab <chart> --version <ver> -f values.yaml

# 3. Protect stateful resources
kubectl annotate pvc <name> helm.sh/resource-policy=keep

# 4. Upgrade
helm upgrade ...

# 5. Verify
kubectl logs deployment/<name> --tail=20
kubectl exec deployment/<name> -- cat /etc/openab/config.toml

Data locations inside the pod (for reference)

| Path | Content | Critical? |
| --- | --- | --- |
| .local/share/kiro-cli/data.sqlite3 | Kiro auth database | ✅ Yes |
| .kiro/steering/ | Agent identity & steering | ✅ Yes |
| .config/gh/ | GitHub CLI auth | ✅ Yes |
| .kiro/settings/cli.json | Kiro settings (often empty {}) | No |
| .kiro/sessions/ | ACP session cache | No (ephemeral) |
| .semantic_search/ | Search index | No (rebuildable) |
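
The critical paths above can be captured in one shot before an upgrade (the label selector and home directory are assumptions; adjust to your deployment):

```shell
# Back up only the critical paths from the running pod.
POD=$(kubectl get pod -l app.kubernetes.io/instance=openab \
  -o jsonpath='{.items[0].metadata.name}')
kubectl exec "$POD" -- tar czf - -C /home/agent \
  .local/share/kiro-cli/data.sqlite3 .kiro/steering .config/gh \
  > /tmp/openab-critical-backup.tgz
```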

Metadata


Labels: bug (Something isn't working), p1 (High — address this sprint)
