
Changes to 1st class provider inputs forcing recreation of provider and all dependent resources #2012

Closed
lukehoban opened this issue Oct 2, 2018 · 4 comments

@lukehoban
Member

From @geekflyer on Slack:

The problem I’m suddenly facing is that when anything non-substantial in the cluster changes, pulumi also wants to recreate (replace) the k8s provider and ALL the resources that have been previously created by that provider. Non-substantial changes in my case for example is that one of the node-pools scaled down by 1 instance.

He is creating a new k8s.Provider using code like the below, which gets the kubeconfig off a GKE cluster resource. This is causing even small changes to the GKE cluster to trigger replacement of the K8s provider, and in turn of all the Kubernetes resources.

import * as gcp from '@pulumi/gcp';
import * as pulumi from '@pulumi/pulumi';
import * as k8s from '@pulumi/kubernetes';
import { Cluster } from '@pulumi/gcp/container';
import { defaultZone, defaultProject } from './gcp-constants';
import { memoize } from 'ramda';
import { packageConfig } from './misc';

const gcpConfig = new pulumi.Config('gcp');

// most of the functions in this module are memoized. This is to improve lookup speed in bigger pulumi programs like solvvy-apis and
// to avoid resource id conflicts in pulumi when importing the same cluster / resource multiple times.

export const lookupClusterByName = memoize(
  (clusterName: string): Cluster => {
    return Cluster.get(clusterName, clusterName, {
      name: clusterName,
      zone: gcpConfig.get('zone') || defaultZone,
      project: gcpConfig.get('project') || defaultProject
    });
  }
);

export function kubectlConfigFromGkeCluster(cluster: Cluster) {
  return pulumi.all([cluster.name, cluster.endpoint, cluster.masterAuth]).apply(([name, endpoint, auth]) => {
    const context = `${gcp.config.project}_${gcp.config.zone}_${name}`;
    return `apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: ${auth.clusterCaCertificate}
    server: https://${endpoint}
  name: ${context}
contexts:
- context:
    cluster: ${context}
    user: ${context}
  name: ${context}
current-context: ${context}
kind: Config
preferences: {}
users:
- name: ${context}
  user:
    auth-provider:
      config:
        cmd-args: config config-helper --format=json
        cmd-path: gcloud
        expiry-key: '{.credential.token_expiry}'
        token-key: '{.credential.access_token}'
      name: gcp
`;
  });
}

export const getK8sProviderByClusterName: (clusterName: string) => k8s.Provider = memoize((clusterName: string) => {
  return new k8s.Provider(clusterName, {
    kubeconfig: kubectlConfigFromGkeCluster(lookupClusterByName(clusterName))
  });
});

export const getK8sProviderByCluster = memoize((providerInstanceName: string, cluster: Cluster) => {
  return new k8s.Provider(providerInstanceName, {
    kubeconfig: kubectlConfigFromGkeCluster(cluster)
  });
});

export const getK8sProviderFromInferredCluster = memoize(() => {
  const clusterName = packageConfig.require('cluster');
  return new k8s.Provider(clusterName, {
    kubeconfig: kubectlConfigFromGkeCluster(lookupClusterByName(clusterName))
  });
});
@lukehoban lukehoban self-assigned this Oct 2, 2018
@lukehoban lukehoban added this to the 0.18 milestone Oct 2, 2018
@lukehoban
Member Author

lukehoban commented Oct 2, 2018

I've tried to reproduce this - but cannot trigger the reported replacements.

I used the above - along with this:

const provider = helper.getK8sProviderByClusterName("luke-cluster");

new k8s.core.v1.Pod("nginx", {
    spec: {
        containers: [
            { 
              name: "nginx",
              image: "nginx",  
            }
        ]
    }
}, { provider: provider });

This deploys fine. If I then make a change to my cluster - like decreasing the nodePool size - then I do see that diff as part of the next preview/update:

 ~ nodeCount        : "4" => "3"

But other than that no changes are triggered in my provider or in the Pod I have deployed to the cluster.

@geekflyer If you have a diff view of preview and/or update when you see this getting triggered - that might help identify what is going on here.

@geekflyer

geekflyer commented Oct 2, 2018

I just tried to reproduce the behaviour by only changing the node count, but that indeed doesn't trigger an update.
I have one old diff, but I'm a bit reluctant to paste it here because it has lots of sensitive data, so I'm pasting a small subset below. I changed / anonymized the cluster name and project name in the pasted snippet.
Generally speaking, the whole cluster shows up as changed in that diff, but I saw the behaviour earlier when only a small subset of the cluster (incl. the node_count) had changed too; I just don't have that diff anymore. I think the behaviour I'm sometimes seeing might also be related to https://pulumi-community.slack.com/archives/C84L4E3N1/p1538365508000100 .

  >-gcp:container/cluster:Cluster: (read)
        [id=my-cluster]
        [urn=urn:pulumi:my-cluster::my-cluster::gcp:container/cluster:Cluster::my-cluster]
      * name   : "my-cluster"
      * project: "my-project"
      * zone   : "us-west1-c"

I suspect that overnight GKE sometimes updates some properties of the cluster (e.g. due to automatic maintenance), which causes a recreation of the resource. I will continue to observe this behaviour and post the diff the next time I see it.

@hazsetata

I had a very similar experience today with an Azure AKS cluster. The main difference is that I also create the cluster itself in Pulumi code.

Here are the parts that create the cluster and the provider:

const adApplication = new azure.ad.Application("BZV2");
export const adServicePrincipal = new azure.ad.ServicePrincipal("BZV2-SP", { applicationId: adApplication.applicationId });

const adServicePrincipalPassword = new azure.ad.ServicePrincipalPassword("BZV2-SP-Password", {
    servicePrincipalId: adServicePrincipal.id,
    value: config.servicePrincipalPassword,
    endDate: "2099-01-01T00:00:00Z",
});

export const k8sResourceGroup = new azure.core.ResourceGroup("BZV2-RG", {
    location: config.location,
});

export const k8sCluster = new azure.containerservice.KubernetesCluster("BZV2-K8S", {
    resourceGroupName: k8sResourceGroup.name,
    location: config.location,
    kubernetesVersion: config.kubernetesVersion,
    agentPoolProfile: {
        name: "bzagentpool",
        count: config.nodeCount,
        vmSize: config.nodeSize,
    },
    dnsPrefix: `${pulumi.getStack()}-bzcluster`,
    linuxProfile: {
        adminUsername: "adminuser",
        sshKeys: [{
            keyData: config.sshPublicKey,
        }],
    },
    servicePrincipal: {
        clientId: adApplication.applicationId,
        clientSecret: adServicePrincipalPassword.value,
    },
});

// Expose a K8s provider instance using our custom cluster instance.
export const k8sProvider = new k8s.Provider("BZV2-K8S-Provider", {
    kubeconfig: k8sCluster.kubeConfigRaw,
});

Then I use this provider to deploy all kinds of Kubernetes resources (cert-manager, heptio-contour, argo and our own apps). I have updated many things in the cluster (I'm on update 39) without any trouble.

Now, once I changed the node count (pulumi config set nodeCount 6) and tried to run the update,
I got this kind of output (this is the only change in the code / config):

info: 50 changes previewed:
    ~ 3 resources to update
    +-47 resources to replace
      29 resources unchanged

Do you want to perform this update? details
* pulumi:pulumi:Stack: (same)
    [urn=urn:pulumi:bzdev::beyondzen::pulumi:pulumi:Stack::beyondzen-bzdev]
    ~ azure:containerservice/kubernetesCluster:KubernetesCluster: (update)
        [id=/subscriptions/<SUBSCRIPTION_ID>/resourcegroups/BZV2-rgb1b9bc11/providers/Microsoft.ContainerService/managedClusters/BZV2-k8s9032af03]
        [urn=urn:pulumi:bzdev::beyondzen::azure:containerservice/kubernetesCluster:KubernetesCluster::BZV2-K8S]
      ~ agentPoolProfile : {
          ~ count : 2 => 6
        }
    ++pulumi:providers:kubernetes: (create-replacement)
        [id=a9e97d3f-33a5-4618-9db6-a4e22821f00c]
        [urn=urn:pulumi:bzdev::beyondzen::pulumi:providers:kubernetes::BZV2-K8S-Provider]
      ~ kubeconfig: "apiVersion: v1\nclusters:\n- cluster:\n    certificate-authority-data: LS...\n" => computed<string>
    +-pulumi:providers:kubernetes: (replace)
        [id=a9e97d3f-33a5-4618-9db6-a4e22821f00c]
        [urn=urn:pulumi:bzdev::beyondzen::pulumi:providers:kubernetes::BZV2-K8S-Provider]
      ~ kubeconfig: "apiVersion: v1\nclusters:\n- cluster:\n    certificate-authority-data: LS...\n" => computed<string>
        ++kubernetes:core/v1:Namespace: (create-replacement)
            [id=cert-manager]
            [urn=urn:pulumi:bzdev::beyondzen::vtt:k8s:CertManagerDeployment$kubernetes:core/v1:Namespace::cert-manager-ns]
            [provider: urn:pulumi:bzdev::beyondzen::pulumi:providers:kubernetes::BZV2-K8S-Provider::a9e97d3f-33a5-4618-9db6-a4e22821f00c => urn:pulumi:bzdev::beyondzen::pulumi:providers:kubernetes::BZV2-K8S-Provider::computed<string>]
          * apiVersion: "v1"
          * kind      : "Namespace"
          * metadata  : {
              * name: "cert-manager"
            }
        +-kubernetes:core/v1:Namespace: (replace)
            [id=cert-manager]
            [urn=urn:pulumi:bzdev::beyondzen::vtt:k8s:CertManagerDeployment$kubernetes:core/v1:Namespace::cert-manager-ns]
            [provider: urn:pulumi:bzdev::beyondzen::pulumi:providers:kubernetes::BZV2-K8S-Provider::a9e97d3f-33a5-4618-9db6-a4e22821f00c => urn:pulumi:bzdev::beyondzen::pulumi:providers:kubernetes::BZV2-K8S-Provider::computed<string>]
          * apiVersion: "v1"
          * kind      : "Namespace"
          * metadata  : {
              * name: "cert-manager"
            }

And it continues on. The indicated change for the provider is the kubeconfig property; in all the other resources it is the provider property. The 3 resources it wants to merely update (rather than replace) are the Cluster itself and two DNS records (all Azure components, not Kubernetes ones).

Pulumi CLI version: v0.15.4, @pulumi/pulumi: 0.15.4, @pulumi/azure: 0.15.2, @pulumi/kubernetes: 0.17.0

@lukehoban lukehoban assigned pgavlin and unassigned lukehoban Oct 20, 2018
@lukehoban lukehoban modified the milestones: 0.18, 0.19 Oct 20, 2018
@pgavlin
Member

pgavlin commented Oct 22, 2018

It turns out that this is an unfortunate but intentional decision, and that it should only impact previews.

When designing support for first-class providers, we debated over whether or not changes to the inputs for a provider should require replacing the provider (and thus its resources). We decided that this should be a decision that is left up to the provider itself, as only it knows whether or not a change to one of its inputs makes it impossible for the provider to manage any resources that depend upon it. The existing provider interface does not accommodate this sort of configuration diffing, so until we add that capability it is the responsibility of the engine.
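For illustration only, here is a rough sketch of the kind of provider-side hook described above. This is hypothetical: the names diffConfig and DiffConfigResult are made up for this example and are not part of the existing provider interface.

// Hypothetical sketch only: the existing provider interface has no such hook.
// It illustrates the idea that the provider itself should decide whether a
// configuration change forces replacement of the provider instance.
interface DiffConfigResult {
    changes: boolean;
    // Inputs whose change makes it impossible for the existing provider
    // instance to keep managing its resources, forcing replacement.
    replaceKeys: string[];
}

interface ProviderConfigDiffer {
    diffConfig(
        olds: Record<string, string>,
        news: Record<string, string>,
    ): Promise<DiffConfigResult>;
}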

We decided that the engine's configuration diff logic should require replacement if any property was unknown, in order to indicate that we did not have enough information to decide what might happen and were making a conservative decision. To make matters more confusing, this situation is only possible during a preview, so we will never in fact replace the provider during an update. #2088 has changes to align the engine's behavior in both situations such that we will never indicate that a provider instance requires replacement.
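As a rough sketch (not the actual engine code), the conservative rule described above amounts to something like the following, where isUnknown stands in for however the engine represents values that have not yet been computed during a preview:

// Rough sketch of the conservative diff rule described above; not the
// engine's actual implementation. `isUnknown` is a placeholder for the
// engine's notion of a value that is not yet known (only possible during
// a preview, e.g. a kubeconfig derived from a cluster that may change).
type ConfigDiffResult = "no-change" | "update" | "replace";

function diffProviderConfig(
    olds: Record<string, unknown>,
    news: Record<string, unknown>,
    isUnknown: (v: unknown) => boolean,
): ConfigDiffResult {
    for (const key of Object.keys(news)) {
        if (isUnknown(news[key])) {
            // Not enough information to tell whether the provider can still
            // manage its resources, so make the conservative choice.
            return "replace";
        }
    }
    const keys = new Set([...Object.keys(olds), ...Object.keys(news)]);
    for (const key of keys) {
        if (olds[key] !== news[key]) {
            return "update";
        }
    }
    return "no-change";
}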
