Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Helm Release shows spurious diff when inputs are unknown #2660

Closed
Tracked by #2823
ffMathy opened this issue Nov 9, 2023 · 25 comments · Fixed by #2822
Closed
Tracked by #2823

Helm Release shows spurious diff when inputs are unknown #2660

ffMathy opened this issue Nov 9, 2023 · 25 comments · Fixed by #2822
Assignees
Labels
area/helm customer/feedback Feedback from customers impact/usability Something that impacts users' ability to use the product easily and intuitively kind/bug Some behavior is incorrect or out of spec resolution/fixed This issue was fixed
Milestone

Comments

@ffMathy
Copy link

ffMathy commented Nov 9, 2023

What happened?

Getting the following for my Helm release:

~ kubernetes:helm.sh/v3:Release chart update [diff: +compat-allowNullValues,apiVersion,createNamespace,dependencyUpdate,description,devel,disableCRDHooks,disableOpenapiValidation,disableWebhooks,forceUpdate,keyring,kind,lint,postrender,recreatePods,renderSubchartNotes,replace,resetValues,resourceNames,reuseValues,skipAwait,skipCrds,verify,version,waitForJobs~values]

It always happens after doing a refresh and then an up. For those exact values.

Example

No example. Under NDA.

Output of pulumi about

CLI
Version 3.92.0
Go Version go1.21.3
Go Compiler gc

Host
OS debian
Version 12.2
Arch aarch64

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

@ffMathy ffMathy added kind/bug Some behavior is incorrect or out of spec needs-triage Needs attention from the triage team labels Nov 9, 2023
@mikhailshilkov
Copy link
Member

Hi @ffMathy

I'm still going to ask you to provide a repro example. Issues without a code snippet are not actionable for us. Maybe you can isolate a small program from your NDA-covered code and remove all proprietary details. Thank you for you understanding.

@mikhailshilkov mikhailshilkov added awaiting-feedback Blocked on input from the author and removed needs-triage Needs attention from the triage team labels Nov 10, 2023
@dirien
Copy link

dirien commented Nov 10, 2023

Hi @ffMathy,

will catch up on you with this and will create a example without the need of snippets of code from you!

@ffMathy
Copy link
Author

ffMathy commented Nov 10, 2023

Wow, that sounds great! Thank you.

@ffMathy
Copy link
Author

ffMathy commented Nov 22, 2023

@mjeffryes which commit was this fixed in? Or what release?

@mjeffryes
Copy link
Member

My apologies @ffMathy, was just grooming tickets that have been awaiting-feedback for more than 2 weeks; missed that the ball is actually in our court for this one!

@mjeffryes mjeffryes reopened this Nov 22, 2023
@mjeffryes mjeffryes assigned mjeffryes and dirien and unassigned mjeffryes Nov 22, 2023
@dirien
Copy link

dirien commented Nov 22, 2023

@mjeffryes we meet tomorrow with @ffMathy to have a deeper look into this!

@EronWright
Copy link
Contributor

EronWright commented Nov 22, 2023

I have a possible explanation, it is that some of the inputs - maybe one of the chart values - contain unknowns. In this case, the provider behaves differently and in a way that would produce the above diff. I would advocate for a fix to the Check logic to improve upon this.

To explain further, here's the Check logic that is evidently being skipped and producing a noisy diff:

if !news.ContainsUnknowns() {
logger.V(9).Infof("Decoding new release.")
new, err := decodeRelease(news, fmt.Sprintf("%s.news", label))
if err != nil {
return nil, err
}
err = r.setComputedInputs(ctx, urn, new)
if err != nil {
// setComputedInputs fails when the chart cannot be rendered, and we report it as a problem with the `chart` input.
failures = append(failures, &pulumirpc.CheckFailure{
Property: "chart",
Reason: fmt.Sprintf("%v; check the chart name and repository configuration.", err),
})
}
logger.V(9).Infof("New: %+v", new)
news = resource.NewPropertyMap(new)
}

Simply put, Check normally uses decodeRelease to transform the program inputs into planned resource state (as represented by the Release struct and in variable new), which Diff then compares to the old resource state. In the special case of news.ContainsUnknowns(), the transformation doesn't take place, and Diff then (wrongly) compares raw program inputs to old resource state. The clue for me was that the diff says +compat which refers to an obsolete resource property that exists in the SDK code (and is always true) but doesn't exist in the Release struct.

@ffMathy
Copy link
Author

ffMathy commented Nov 23, 2023

That sounds very plausible! None of the values that are in the diff are specified by us.

It be great if we could also seal this off with a unit test.

@dirien
Copy link

dirien commented Nov 23, 2023

@EronWright had the session with @ffMathy and this a code you can use, which is very very close to the setup of @ffMathy:

import * as pulumi from "@pulumi/pulumi";
import * as kubernetes from "@pulumi/kubernetes";
import * as command from "@pulumi/command";

const someNamespace = new kubernetes.core.v1.Namespace('some-namespace', {
    metadata: {
        name: 'some-namespace'
    }
});

const x = new command.local.Command('some-command', {
    update: "ls -la",
    create: "ls -la",
});

export const commandResult = x.stdout

new kubernetes.helm.v3.Release('some-chart', {
    chart: 'oci://ghcr.io/dirien/charts/minecraft-exporter',
    version: ' 0.11.1',
    name: 'some-stuff',
    timeout: 60 * 60 * 3,
    namespace: someNamespace.metadata.name,
    atomic: true,
    cleanupOnFail: true,
    description: x.stdout,
    values: {
        replicaCount: 1,
        dsd: 1,
        sds: 12,
    },
}, {
    customTimeouts: {
        create: '30m',
        update: '6h',
    },
});

This results in the pulumi preview to:

➜ pulumi preview
Previewing update (dev)

View in Browser (Ctrl+O): https://app.pulumi.com/dirien/lego-helm/dev/previews/26dabdc2-8f56-4736-b405-fdec73ce526c

     Type                              Name           Plan       Info
     pulumi:pulumi:Stack               lego-helm-dev             
 +   ├─ command:local:Command          some-command   create     
 ~   └─ kubernetes:helm.sh/v3:Release  some-chart     update     [diff: +compat-allowNullValues,apiVersion,checksum,createNamespace,dependencyUpdate,devel,disableCRDHooks,disableOpenapiValidation,disableWebhooks,forceUpdate,keyring,kind,lint,pos

Outputs:
  + commandResult: output<string>

Resources:
    + 1 to create
    ~ 1 to update
    2 changes. 2 unchanged

So using a resource output from a computed field as input in the Release object results in the situation @ffMathy reported!

@EronWright EronWright self-assigned this Nov 23, 2023
@mjeffryes mjeffryes removed the awaiting-feedback Blocked on input from the author label Nov 28, 2023
@mnlumi mnlumi added the customer/feedback Feedback from customers label Nov 29, 2023
@EronWright EronWright changed the title Getting a diff always Helm Release shows spurious diff when inputs are unknown Nov 29, 2023
@EronWright EronWright added the impact/usability Something that impacts users' ability to use the product easily and intuitively label Jan 11, 2024
@ffMathy
Copy link
Author

ffMathy commented Jan 17, 2024

Is there an ETA on this?

Right now, it causes our Pulumi to reprovision the full Helm release in production every time, leading to around 3 - 5 minutes of downtime per deploy.

@ffMathy
Copy link
Author

ffMathy commented Jan 19, 2024

CC @dirien. This is quite critical to us.

@mjeffryes mjeffryes added this to the 0.99 milestone Jan 26, 2024
@IdoOzeri
Copy link

CC @dirien. This is quite critical to us.

I second that. I'm currently encountering the same behavior.

@ffMathy
Copy link
Author

ffMathy commented Jan 30, 2024

Maybe we need to tag @mjeffryes to get an ETA instead. Not sure if this has been forgotten. At some point it seemed to be progressing, but now it seems to have stagnated.

@EronWright EronWright removed this from the 0.99 milestone Feb 2, 2024
@EronWright EronWright added this to the 0.100 milestone Feb 2, 2024
@EronWright
Copy link
Contributor

To provide an update, this is my current task and I expect to deliver a fix next week.

@EronWright
Copy link
Contributor

@ffMathy would you clarify what you think the expected behavior should be? In the repro case, the description property's value seems to be varying, and that would naturally cause a helm upgrade. Of course, there's a bug with the diff output because it implicates unrelated properties, and I intend to fix that. Imagine that was fixed; if the input is varying then you'd still see an upgrade. Thanks!

@ffMathy
Copy link
Author

ffMathy commented Feb 4, 2024

I'd just like to know which of the values are varying. Because right now it shouldn't deploy every time. At first glance, these values shouldn't change.

So fixing the diff could be enough. Then at least I'll understand the cause of it.

Could also be that logging some warnings could help. But that might create more confusion than clarity.

@EronWright
Copy link
Contributor

Brief update, this is still my main work task. Some framework code was first needed to make this issue be practical to solve. The root cause is that the handling of unknown inputs is very coarse-grained and further aggravated by a bug.

@ffMathy
Copy link
Author

ffMathy commented Feb 13, 2024

Interesting. Thanks for the updates. Please keep that coming! 😍

@EronWright
Copy link
Contributor

I posted a PR to solve the issue: #2822

@pulumi-bot pulumi-bot added the resolution/fixed This issue was fixed label Feb 16, 2024
@ffMathy
Copy link
Author

ffMathy commented Feb 16, 2024

Awesome work! How long does it usually take for it to be released after merge?

@EronWright
Copy link
Contributor

We plan to cut a new release on Tuesday.

@EronWright
Copy link
Contributor

The fix is now available in v4.8.0, enjoy!

@ffMathy
Copy link
Author

ffMathy commented Feb 24, 2024

Yay, thanks! Great work.

@IdoOzeri
Copy link

IdoOzeri commented Mar 26, 2024

I'd hate to spoil the party, but for me, this issue persists even in version 4.9.1:

args.autoscaler.values = pulumi.Output.all(
                autoscaler_role.role_arn,
                args.autoscaler.values
            ).apply(
                lambda output_args: populate_values_with_placeholders(
                    values=output_args[1],
                    placeholders={
                        '<role-arn>': output_args[0]
                    }
                )
            )

First time I'm running this, I'm getting the expected plan to install the cluster autoscaler.
Once applied, it's installed successfully.

Then, I change nothing and run pulumi up again, and this is constantly what I'm getting:

     pulumi:pulumi:Stack                  base-infra-dev                    
     └─ xm_k8s_installations:HelmRelease  cluster-installations             
 ~      └─ kubernetes:helm.sh/v3:Release  autoscaler             update     [diff: ~values]

Resources:
    ~ 1 to update
    102 unchanged

Do you want to perform this update? details
  pulumi:pulumi:Stack: (same)
    [urn=urn:pulumi:dev::base-infra::pulumi:pulumi:Stack::base-infra-dev]
            > kubernetes:core/v1:Service: (read)
                [id=ingress-nginx/ingress-nginx-controller]
                [urn=urn:pulumi:dev::base-infra::xm_k8s_installations:HelmRelease$kubernetes:helm.sh/v3:Release$kubernetes:core/v1:Service::ingress-nginx-service]
                [provider=urn:pulumi:dev::base-infra::xm_aws_eks:EksCluster:eks-dev$pulumi:providers:kubernetes::eks-dev_k8s_provider::83fde744-efb1-4eb0-9163-5bcdfc0403d0]
        ~ kubernetes:helm.sh/v3:Release: (update)
            [id=kube-system/autoscaler-2e4f430f]
            [urn=urn:pulumi:dev::base-infra::xm_k8s_installations:HelmRelease$kubernetes:helm.sh/v3:Release::autoscaler]
            [provider=urn:pulumi:dev::base-infra::xm_aws_eks:EksCluster:eks-dev$pulumi:providers:kubernetes::eks-dev_k8s_provider::83fde744-efb1-4eb0-9163-5bcdfc0403d0]
          - values: {
              - autoDiscovery   : {
                  - clusterName: "eks-dev"
                }
              - awsRegion       : "eu-west-1"
              - deployment      : {
                  - annotations: {
                      - cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
                    }
                }
              - extraArgs       : {
                  - balance-similar-node-groups: "true"
                  - skip-nodes-with-system-pods: "false"
                }
              - fullnameOverride: "cluster-autoscaler"
              - rbac            : {
                  - serviceAccount: {
                      - annotations: {
                          - eks.amazonaws.com/role-arn: "arn:aws:I am::<redacted>:role/cluster-node-autoscaler-6fac752"
                        }
                      - name       : "cluster-autoscaler"
                    }
                }
            }
          + values: output<string>

This is going to constantly offer a change, notice the output object reference at the end of the plan.
Thing is, when I apply the suggested changes - nothing is deployed, Pulumi realizes there's nothing to change, but it is still annoying and confusing to the end user.

Please advise,
Thanks

@mjeffryes
Copy link
Member

Hi @IdoOzeri, this issue is a few months old now, so your comment is likely to get lost here; I suggest opening a new issue and linking to this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/helm customer/feedback Feedback from customers impact/usability Something that impacts users' ability to use the product easily and intuitively kind/bug Some behavior is incorrect or out of spec resolution/fixed This issue was fixed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants