
(bug) Fix stale Helm repository cache #1691

Merged
gianlucam76 merged 1 commit into projectsveltos:main from gianlucam76:helm-chart-version-not-found on Apr 4, 2026

gianlucam76 commented Apr 4, 2026

When a ClusterProfile is updated to reference a newer Helm chart version, addon-controller reports that the version does not exist even though it is present in the repository. A controller restart is required to deploy successfully.

Root cause: repoAddOrUpdate skips re-downloading the repository index when the repo is already registered in the in-memory storage variable (which persists for the lifetime of the process). As a result, LocateChart reads a stale on-disk index that does not contain the newly published version.
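The root cause can be illustrated with a toy model. This is only a sketch, not the addon-controller code: the repoCache type, its fields, and the addOrUpdate and hasVersion helpers are hypothetical names introduced for illustration. It assumes the index is downloaded only the first time a repository is registered.

```go
package main

import "fmt"

// repoCache is a hypothetical stand-in for the controller's state:
// an in-memory registry plus an "on-disk" index per repository.
type repoCache struct {
	registered map[string]bool     // lives as long as the process
	index      map[string][]string // chart versions seen at download time
	downloads  int                 // how many times an index was fetched
}

func newRepoCache() *repoCache {
	return &repoCache{registered: map[string]bool{}, index: map[string][]string{}}
}

// addOrUpdate fetches the index only when the repo is not yet registered.
// Once registered, later calls return early and the index goes stale.
func (c *repoCache) addOrUpdate(repo string, remote []string) {
	if c.registered[repo] {
		return // already registered: the index is NOT re-downloaded
	}
	c.registered[repo] = true
	c.index[repo] = append([]string(nil), remote...)
	c.downloads++
}

// hasVersion reports whether the local index lists the given version.
func (c *repoCache) hasVersion(repo, version string) bool {
	for _, v := range c.index[repo] {
		if v == version {
			return true
		}
	}
	return false
}

func main() {
	c := newRepoCache()
	c.addOrUpdate("demo", []string{"1.0.0"})
	// A new version is published remotely, but the repo is already registered.
	c.addOrUpdate("demo", []string{"1.0.0", "1.1.0"})
	fmt.Println("1.1.0 visible locally:", c.hasVersion("demo", "1.1.0"))
	fmt.Println("index downloads:", c.downloads)
}
```

In this model the second addOrUpdate call is a no-op, so the newly published version stays invisible until the process restarts and the registry is rebuilt.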

A reactive recovery function (handleLocateChartError) existed but had two flaws:

  1. It returned the error immediately after refreshing the cache, forcing the caller to wait for the next reconciliation cycle to succeed.
  2. The cache refresh was gated on the error string "no chart version found for", which is only produced by non-OCI repositories. OCI repos emit a different error message and were never healed.

Fix

In both the install (locateLoadAndValidateChart) and upgrade (upgradeRelease) paths, when LocateChart fails, the cache is cleared via removeCachedData and the call is retried once within the same reconciliation. If the retry also fails (e.g., the version genuinely does not exist), the error is returned as before.
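The retry-once pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the merged change: locateFunc and clearFunc are hypothetical stand-ins for LocateChart and removeCachedData (whose real signatures are not shown here), and fakeRepo simulates a local index that lags behind the remote repository.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for LocateChart and removeCachedData.
type locateFunc func(chart, version string) (string, error)
type clearFunc func() error

// locateWithRetry tries locate once. On any error it clears the cache and
// retries exactly once, without inspecting the error message.
func locateWithRetry(locate locateFunc, clearCache clearFunc, chart, version string) (string, error) {
	path, err := locate(chart, version)
	if err == nil {
		return path, nil // happy path: no cache refresh at all
	}
	if clearErr := clearCache(); clearErr != nil {
		return "", fmt.Errorf("clearing cache after %v: %w", err, clearErr)
	}
	// Second and final attempt; its error, if any, is returned as-is.
	return locate(chart, version)
}

// fakeRepo simulates a remote repo and a possibly stale local index.
type fakeRepo struct {
	remote    map[string]bool // versions actually published
	cached    map[string]bool // versions in the local index; nil means cleared
	refreshes int             // how many times the index was re-downloaded
}

func (r *fakeRepo) locate(chart, version string) (string, error) {
	if r.cached == nil { // index was cleared: re-download it
		r.cached = map[string]bool{}
		for v := range r.remote {
			r.cached[v] = true
		}
		r.refreshes++
	}
	if !r.cached[version] {
		return "", errors.New("version not found: " + chart + " " + version)
	}
	return chart + "-" + version + ".tgz", nil
}

func (r *fakeRepo) clear() error {
	r.cached = nil
	return nil
}

func main() {
	repo := &fakeRepo{
		remote: map[string]bool{"1.0.0": true, "1.1.0": true},
		cached: map[string]bool{"1.0.0": true}, // stale: 1.1.0 is missing
	}
	path, err := locateWithRetry(repo.locate, repo.clear, "demo", "1.1.0")
	fmt.Println(path, err, "refreshes:", repo.refreshes)
}
```

Because the retry is not conditioned on the error text, the same path handles any repository type. The cost is one extra index download when the requested version genuinely does not exist.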

This approach:

  • Remains fully reactive: the index is never re-downloaded unless LocateChart actually fails, so deployments to N clusters where the version is already cached are unaffected.
  • Covers both OCI and non-OCI repositories, since the retry does not depend on the error message.
  • Eliminates the wasted reconciliation cycle that required a controller restart to recover.

gianlucam76 merged commit cb914ec into projectsveltos:main on Apr 4, 2026
16 checks passed
gianlucam76 deleted the helm-chart-version-not-found branch on April 4, 2026, 13:02