Rancher Helm corrupts Helm values.yaml leading to broken applications and doesn't handle upstream changes #35717

Open
ajacques opened this issue Dec 2, 2021 · 33 comments

Comments

@ajacques

ajacques commented Dec 2, 2021

Rancher Server Setup

  • Rancher version: 2.6.2
  • Installation option (Docker install/Helm Chart): Docker Install
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc):
  • Proxy/Cert Details:

Information about the Cluster

  • Kubernetes version: v1.20.12
  • Cluster Type (Local/Downstream): Downstream, Custom, RKE1
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider):

Describe the bug
In Rancher 2.6 (as opposed to how 2.5 and before behaved) I now have to explicitly provide all values, even values that I don't change. Thus when an upstream repository changes their values, I don't get the newest changes, and I don't seem to have any way to actually see that the upstream changed values, without seeing a bug or manually investigating. Instead, my explicit values are carried forward through every Helm upgrade without knowing which ones to merge.

There seem to have been several regressions in the UX for Helm applications. This issue is a major trust buster for me, since I can no longer trust Rancher to safely upgrade a Helm application.

To Reproduce

  1. Install a helm application (e.g. Prometheus v14.10.x. Note I didn't use Rancher cluster tools to install this, the issue would apply to any other Helm application)
  2. Change one or two values (e.g. change serverFiles.prometheus.yml.scrape_config from the default)
  3. Install the application
  4. Attempt to upgrade to Prometheus Helm 15.0.1
  5. Make the upgrade. When it finally upgrades, note that all image versions are still at the old version and commits like this one are missed

Additionally, if I try to delete the value from the values.yaml view hoping for Rancher to fallback to the 2.5 behavior, it just proceeds with an empty value, breaking the application.

Result

Expected Result

  1. I want Rancher to be able to show me a diff of the default values from the upstream repo (What did they change?)
  2. If I have overridden values in my Helm application, I want Rancher to show me which of the values I'm overriding have changed upstream, or help me perform a 3-way merge.
  3. I want to be able to store only the values that I've overridden, not have to look at the entire set of all values. (This was the Rancher 2.5 behavior)
  4. If I hit upgrade or install with some values, I want to see the exact same values in the Values YAML tab in the Helm UI

Screenshots

Additional context

Possibly related to some other Helm issues I've had: #35236

@ajacques ajacques changed the title Rancher Helm doesn't merge upstream values changes Rancher Helm doesn't merge upstream values changes leading to incorrect configuration Dec 2, 2021
@ajacques
Author

ajacques commented Dec 2, 2021

Another issue with Rancher's Helm implementation: I made several changes to the Helm values, but they got lost after another edit operation. It seems like the Rancher UI keeps a cache of the values somewhere (possibly in the browser side) and the next time I performed an upgrade, it lost those changes. I'm trying to track down a repro, but these issues are causing me to not want to make any changes to Helm until these issues are resolved.

@ajacques
Author

ajacques commented Dec 3, 2021

There's something really weird going on with Rancher 2.6.2's Helm UI. It may be a client-side state issue, or it may be how the server side handles values that changed in the upstream repo. I've attempted to change values, and when I saved, a completely different area of the values changed and ended up corrupted.

I just did a test where I saved the values that I saw in the Rancher UI, then hit update, then looked at what Rancher shows in the values tab and there were differences:

Here's a snippet of the diff between what I saved vs what Rancher seems to have applied.

[...]
serverFiles:
  prometheus.yml:
    scrape_config:
[...]
452c549,555
<           - target_label: __address__
---
>             source_labels:
>               - __meta_kubernetes_service_annotation_prometheus_io_scrape
>           - action: replace
>             regex: (https?)
>             source_labels:
>               - __meta_kubernetes_service_annotation_prometheus_io_scheme
>             target_label: __address__
454c557
<           - source_labels: [__meta_kubernetes_node_name]
---
>           - action: replace
455a559,560
>             source_labels:
>               - __meta_kubernetes_node_name
457a563,586
>           - action: replace
>             regex: ([^:]+)(?::\d+)?;(\d+)
>             replacement: $1:$2
>             source_labels:
>               - __address__
>               - __meta_kubernetes_service_annotation_prometheus_io_port
>             target_label: __address__
>           - action: labelmap
>             regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
>             replacement: __param_$1
>           - action: labelmap
>             regex: __meta_kubernetes_service_label_(.+)
>           - action: replace
>             source_labels:
>               - __meta_kubernetes_namespace
>             target_label: namespace
>           - action: replace
>             source_labels:
>               - __meta_kubernetes_service_name
>             target_label: service
>           - action: replace
>             source_labels:
>               - __meta_kubernetes_pod_node_name
>             target_label: node
[...]

Rancher is corrupting the values and I can't safely make any changes to a Helm application now.

@ajacques ajacques changed the title Rancher Helm doesn't merge upstream values changes leading to incorrect configuration Rancher Helm corrupts Helm values.yaml leading to broken applications and doesn't handle upstream changes Dec 3, 2021
@ajacques
Author

ajacques commented Dec 19, 2021

After further investigation, I do believe the issue is because Rancher is merging array values which is not safe to do without semantic understanding of the values.

For example, if you have an upstream repo that has something like this:

someArrayValue:
- name: Foo
  value: XYZ
- name: Foobar
  value2: ABC

and you make changes and/or rearrange them (which is an entirely valid change to make as a user)

someArrayValue:
- name: Foobar
  value2: ABC
- name: Foo
  value: XYZ

Then Rancher will incorrectly merge these, and the result will look like:

someArrayValue:
- name: Foobar
  value2: ABC
  value: XYZ
- name: Foo
  value: XYZ
  value2: ABC

Rancher doesn't seem to have any semantic understanding of the values, thus it seems to just merge things based on array index.
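
To make the suspected behavior concrete, here is a minimal sketch of a recursive merge that pairs list items by position. It is an illustration of the hypothesis, not Rancher's actual code. The function name `merge_by_index` is made up, and the data is the example above expressed as Python dicts.

```python
def merge_by_index(defaults, overrides):
    """Recursively merge `overrides` onto `defaults`, pairing list items by position."""
    if isinstance(defaults, dict) and isinstance(overrides, dict):
        merged = dict(defaults)
        for key, value in overrides.items():
            merged[key] = merge_by_index(defaults[key], value) if key in defaults else value
        return merged
    if isinstance(defaults, list) and isinstance(overrides, list):
        merged = []
        for i in range(max(len(defaults), len(overrides))):
            if i < len(defaults) and i < len(overrides):
                # Items at the same index are merged even if they are unrelated.
                merged.append(merge_by_index(defaults[i], overrides[i]))
            elif i < len(overrides):
                merged.append(overrides[i])
            else:
                merged.append(defaults[i])
        return merged
    return overrides  # scalars: the override wins


upstream = {"someArrayValue": [
    {"name": "Foo", "value": "XYZ"},
    {"name": "Foobar", "value2": "ABC"},
]}
user = {"someArrayValue": [
    {"name": "Foobar", "value2": "ABC"},
    {"name": "Foo", "value": "XYZ"},
]}

result = merge_by_index(upstream, user)
for item in result["someArrayValue"]:
    print(item)
```

Each merged item ends up carrying both `value` and `value2`, which matches the corrupted output shown above.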

I think the solution should be something like:

  1. Don't try to merge array values unless the user hasn't modified the array
  2. Diff the upstream values from vPrev to vNext and present this in a UI for the user to review and decide
  3. Visually separate the overridden values from the upstream values so I can understand what I've actually changed
  4. Highlight conflicts between my overridden values and changes in the upstream values
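
Points 1, 2 and 4 can be sketched as a three-way merge that treats arrays as atomic values. This is only a sketch under stated assumptions: the three inputs are plain dicts parsed from YAML, and `three_way_merge`, `MISSING` and the sample keys are hypothetical names, not an existing Rancher or Helm API.

```python
MISSING = object()  # marks a key that is absent on one side


def three_way_merge(prev, nxt, user, path=""):
    """Merge user values onto new upstream defaults.

    prev: upstream defaults of the installed chart version (vPrev)
    nxt:  upstream defaults of the target chart version (vNext)
    user: values currently applied to the release
    Returns (merged, conflicts), where conflicts is a list of dotted paths.
    """
    conflicts = []
    if all(isinstance(v, dict) for v in (prev, nxt, user)):
        merged = {}
        for key in {**prev, **nxt, **user}:
            child = f"{path}.{key}" if path else key
            value, sub = three_way_merge(
                prev.get(key, MISSING), nxt.get(key, MISSING), user.get(key, MISSING), child
            )
            conflicts += sub
            if value is not MISSING:
                merged[key] = value
        return merged, conflicts
    # Scalars and lists are compared as whole values; lists are never merged item by item.
    if user == prev:
        return nxt, conflicts  # user did not touch it: follow upstream
    if nxt == prev or nxt == user:
        return user, conflicts  # upstream did not change it, or both agree
    conflicts.append(path)  # both sides changed it differently: needs review
    return user, conflicts


foo = {"name": "Foo", "value": "XYZ"}
foobar = {"name": "Foobar", "value2": "ABC"}
prev = {"image": {"tag": "v1"}, "someArrayValue": [foo, foobar]}
nxt = {"image": {"tag": "v2"}, "someArrayValue": [foo, foobar]}
user = {"image": {"tag": "v1"}, "someArrayValue": [foobar, foo]}

merged, conflicts = three_way_merge(prev, nxt, user)
print(merged, conflicts)
```

In this run the untouched image tag follows upstream while the reordered array is kept exactly as the user wrote it. If upstream had also changed the array, its path would be reported in `conflicts` instead of being merged silently.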

@ajacques
Author

ajacques commented Jan 14, 2022

This issue is definitely not limited to just this one helm application. I just hit it on yet another Helm template that included arrays.

@jonathon2nd

In Rancher 2.6 (as opposed to how 2.5 and before behaved) I now have to explicitly provide all values, even values that I don't change. Thus when an upstream repository changes their values, I don't get the newest changes, and I don't seem to have any way to actually see that the upstream changed values, without seeing a bug or manually investigating. Instead, my explicit values (even ones I didn't change) are carried forward through every Helm upgrade.

This is wild. I was so confused when we upgraded to 2.6.3. Is the previous behavior with values going to be restored?

@stale

stale bot commented Apr 21, 2022

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Apr 21, 2022
@ajacques
Author

Still happening

@stale stale bot removed the status/stale label Apr 21, 2022
@jonathon2nd

jonathon2nd commented May 11, 2022

This is really terrible.

One idea for a possible work around was to install with helm, and then maybe others can update in UI. NOPE
Screenshot_20220511_154215
It just never loads
Screenshot_20220511_154237

And when you view the installed app in the UI, the values are fully enumerated there too.
Screenshot_20220511_154317

I know that is the complete list of values for the application installed. But having to track, change, and manage the entire values is odious. I only care about the changes I made to the values. Managing the rare case of the chart provider making a change to the structure of the values I modify is much easier than this new setup.

@ajacques
Author

Confirmed that Rancher 2.6.5 still incorrectly handles YAML values.

@jonathon2nd, totally agree on the user experience for these values. I gave some ideas above, but I want to see some clearer diffing to show what I actually changed vs what changed upstream.

@github-actions
Contributor

This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.

@ajacques
Author

Still broken v2.6.6

@ajacques
Author

Trying to comment again because it's still marked as stale. But Rancher's Helm handling is still buggy.

@jonathon2nd

jonathon2nd commented Jul 18, 2022

Yeah, we are switching to Argo. Manual setup with helm cli for infra, Argo for our deployments.

@ajacques
Author

Still broken on v2.6.8

@jonathon2nd

@ajacques We have so far had great success with Argo-cd. We are currently setting up new infra and clusters, and all non cluster charts are managed by Argo-cd. We are managing both infrastructure (metallb, storage CSI, and a few others) as well as applications and their dbs.

May not be what you are looking for, but so far we are happy. The automatic syncing and updating of our applications is a bonus.

We are still in the early stages, could let you know in a couple months how it is going.

@ajacques
Author

@jonathon2nd Thanks for the update. I've been meaning to try out ArgoCD but haven't yet, and I keep this issue open in the futile hope that Rancher will fix this bug. Maybe I'll find some time over the holidays to try out ArgoCD.

@ajacques
Author

bump

@ajacques
Author

bump still broken 2.7.1

@ajacques
Author

Bump. Still broken in v2.7.3

@ajacques
Author

Bump. v2.7.5

@ajacques
Author

ajacques commented Sep 25, 2023

Still broken in v2.7.8

@jonathon2nd

Lmao just realized I never updated this. We have been using Argo-CD for all of our deployments without issue.

@ajacques
Author

Still broken 2.8.1

@ajacques
Author

Not stale bump

@qaz-t

qaz-t commented Mar 1, 2024

This still happens in 2.8.2. I deployed the Helm chart argocd-apps. The parameters in the chart are empty, and I overwrite them in the Rancher UI.
When I upgrade the chart, its default values change as follows.
before:

applications:
  - source:
      helm:
        parameters:
          - name: na
            value: ''
          - name: nc
            value: ''

after:

applications:
  - source:
      helm:
        parameters:
          - name: na
            value: ''
          - name: nb
            value: ''
          - name: nc
            value: ''

Values displayed by Rancher:
before:

applications:
  - source:
      helm:
        parameters:
          - name: na
            value: v1
          - name: nc
            value: v3

after:

applications:
  - source:
      helm:
        parameters:
          - name: na
            value: v1
          - name: nc
            value: v3
          - name: nc
            value: ''
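
The displayed result is consistent with list items being paired by position: the user's second item (`nc`) lands on the chart's new second item (`nb`), and the chart's third item (`nc` with an empty value) is appended unchanged. For comparison, here is a minimal sketch of a merge that matches items on their `name` field instead. It assumes every item has a unique `name`, which a generic tool cannot know about arbitrary charts, and `merge_by_name` is a hypothetical helper, not something Rancher or Helm provides.

```python
def merge_by_name(defaults, overrides, key="name"):
    """Merge two lists of dicts by matching items on `key` instead of position."""
    by_key = {item[key]: item for item in overrides}
    seen = set()
    merged = []
    for item in defaults:
        override = by_key.get(item[key])
        # Fields from the override win; items without an override keep the default.
        merged.append({**item, **override} if override is not None else dict(item))
        seen.add(item[key])
    # Keep items that exist only on the user's side.
    merged.extend(dict(item) for item in overrides if item[key] not in seen)
    return merged


chart_after = [
    {"name": "na", "value": ""},
    {"name": "nb", "value": ""},
    {"name": "nc", "value": ""},
]
user_values = [
    {"name": "na", "value": "v1"},
    {"name": "nc", "value": "v3"},
]

merged = merge_by_name(chart_after, user_values)
for item in merged:
    print(item)
```

With this matching, `nb` appears with its empty default and `nc` keeps `v3`, with no duplicated entry.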
