Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API server normalization of values is causing a preview to show changes when one is not expected in testing. #644

Closed
metral opened this issue Jul 20, 2019 · 5 comments · Fixed by #649
Assignees
Labels
p1 A bug severe enough to be the next item assigned to an engineer
Milestone

Comments

@metral
Copy link
Contributor

metral commented Jul 20, 2019

This error showed up in CI cron jobs in the last 24 hours of pulumi/eks tests:

A deployment marked with resources 2048Mi is being normalized to 2Gi on the initial update, and then on the empty pulumi preview step it is trying to change it back to 2048Mi - this unexpected change causes the test to fail when it does not expect changes.

This occurs on pulumi preview right after an initial pulumi up of a Deployment using resources with 2048Mi notation.

metral@argon:~/temp-clusters/migrate-nodegroups-6$ pul preview --diff
Previewing update (test6):
  pulumi:pulumi:Stack: (same)
    [urn=urn:pulumi:test6::migrate-nodegroups::pulumi:pulumi:Stack::migrate-nodegroups-test6]
    ~ kubernetes:apps/v1:Deployment: (update)
        [id=apps-ql27r77a/nginx-ing-cntlr-tr0nrsi8]
        [urn=urn:pulumi:test6::migrate-nodegroups::kubernetes:apps/v1:Deployment::nginx-ing-cntlr]
        [provider=urn:pulumi:test6::migrate-nodegroups::eks:index:Cluster$pulumi:providers:kubernetes::migrate-nodegroups-provider::4316e425-4b56-4dd0-9d7e-78555a61b922]
      ~ spec: {
          ~ template: {
              ~ spec: {
                  ~ containers: [
                      ~ [0]: {
                              ~ resources: {
                                  ~ limits  : {
                                      ~ memory: "2Gi" => "2048Mi"
                                    }
                                  ~ requests: {
                                      ~ memory: "1Gi" => "1024Mi"
                                    }
                                }
                            }
                    ]
                }
            }
        }
Resources:
    ~ 1 to update
    84 unchanged

Versions:

  • pulumi/pulumi: 0.17.25
  • pulumi/kubernetes: 0.25.2
  • pulumi/eks: 0.18.10

See related Slack conversation about a similar type of error occurring in the aws-tf-provider a couple weeks back.

@hausdorff
Copy link
Contributor

hausdorff commented Jul 22, 2019

This regression seems very likely to cause a large number of spurious "always trigger rollout" changes until it is addressed, and users who encounter it are unlikely to know why this is happening, or how to mitigate. I'm going to mark it P1.

Talking to @pgavlin offline, our current understanding is that this diff occurs because we are asking the k8s API server for the live version of the object and computing the diff from that. Since the API server normalizes the memory values, it registers as changes to us.

This may not be the only contributing factor to this issue. I would have expected that this normalization would already trigger spurious diffs, since this normalization should be present also in the live version of the object that's handed back to us upon completion of the Create operation. My knowledge of the diff/create infrastructure of Pulumi might have atrophied, though, so I'd love to hear why that intuition is wrong.

Additional context. kubectl diff does not register this as a difference because it executes a complete dry-run, which normalizes the result, and computes the diff based on that result. #352 was filed to track this work, but it was not really relevant until we started computing diffs in this way.

The "permanent" solution is to implement preview and the diffing capabilities using dry-run, but this change will not be trivial and will probably take longer than a couple days to implement.

How do we want to proceed?

@hausdorff
Copy link
Contributor

hausdorff commented Jul 22, 2019

Regrouping with @pgavlin again, it turns out that we're not diffing against the live state from the API server—we've instead changed from diffing just the current and last inputs, to including the last live state (returned from, e.g., a Create) in the diffing operation. This accounts for my question about why we wouldn't have seen this before—Create does return the right thing, but it's not included in the previous way we computed diffs, so it got ignored until this change landed.

This all makes sense and I have no more questions. I think rolling back these changes doesn't "fix" the issue, it simply breaks it in a different way, so my advice is to just collaborate to fix it.

@pgavlin I am here to help, lmk what you need. I'll assign to you for now but come get me and we can pair (or whatever) until we get a fix.

@hausdorff
Copy link
Contributor

Recommend also that we transfer this to pulumi-kubernetes repository. The fix will be local to the kube provider.

@pgavlin
Copy link
Member

pgavlin commented Jul 22, 2019

Yes, I will transfer this issue.

@pgavlin pgavlin transferred this issue from pulumi/pulumi Jul 22, 2019
@pgavlin pgavlin changed the title Diff normalization on values is causing a preview to show changes when one is not expected in testing. API server normalization of values is causing a preview to show changes when one is not expected in testing. Jul 22, 2019
pgavlin added a commit that referenced this issue Jul 23, 2019
These changes update `Diff` to calculate the new state for the patched
resource by requesting that the API server dry-run the patch when
possible. This avoids unexpected diffs due to server-side normalization
of values that are present in both the resource's configuration and its
state.

Fixes #644.
pgavlin added a commit that referenced this issue Jul 24, 2019
These changes update `Diff` to calculate the new state for the patched
resource by requesting that the API server dry-run the patch when
possible. This avoids unexpected diffs due to server-side normalization
of values that are present in both the resource's configuration and its
state.

Fixes #644.
@joeduffy joeduffy added this to the 0.25 milestone Jul 24, 2019
@arnediekmann
Copy link

This is biting us as well for StatefulSets especially - is there a date when we can expect this change as part of a new release of @pulumi/pulumi-kubernetes?

@infin8x infin8x added the p1 A bug severe enough to be the next item assigned to an engineer label Jul 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p1 A bug severe enough to be the next item assigned to an engineer
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants