
Changes to 1st class provider inputs forcing recreation of provider and all dependent resources #2012

Closed
lukehoban opened this issue Oct 2, 2018 · 4 comments

@lukehoban
Member

From @geekflyer on Slack:

The problem I’m suddenly facing is that when anything non-substantial in the cluster changes, pulumi also wants to recreate (replace) the k8s provider and ALL the resources that have been previously created by that provider. Non-substantial changes in my case for example is that one of the node-pools scaled down by 1 instance.

He is creating a new k8s.Provider using code like the below, which gets the kubeconfig off a GKE cluster resource. This is causing even small changes to the GKE cluster to trigger replacement of the K8s provider, and in turn of all the Kubernetes resources.

import * as gcp from '@pulumi/gcp';
import * as pulumi from '@pulumi/pulumi';
import * as k8s from '@pulumi/kubernetes';
import { Cluster } from '@pulumi/gcp/container';
import { defaultZone, defaultProject } from './gcp-constants';
import { memoize } from 'ramda';
import { packageConfig } from './misc';

const gcpConfig = new pulumi.Config('gcp');

// most of the functions in this module are memoized. This is to improve lookup speed in bigger pulumi programs like solvvy-apis and
// to avoid resource id conflicts in pulumi when importing the same cluster / resource multiple times.

export const lookupClusterByName = memoize(
  (clusterName: string): Cluster => {
    return Cluster.get(clusterName, clusterName, {
      name: clusterName,
      zone: gcpConfig.get('zone') || defaultZone,
      project: gcpConfig.get('project') || defaultProject
    });
  }
);

export function kubectlConfigFromGkeCluster(cluster: Cluster) {
  return pulumi.all([cluster.name, cluster.endpoint, cluster.masterAuth]).apply(([name, endpoint, auth]) => {
    const context = `${gcp.config.project}_${gcp.config.zone}_${name}`;
    return `apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: ${auth.clusterCaCertificate}
    server: https://${endpoint}
  name: ${context}
contexts:
- context:
    cluster: ${context}
    user: ${context}
  name: ${context}
current-context: ${context}
kind: Config
preferences: {}
users:
- name: ${context}
  user:
    auth-provider:
      config:
        cmd-args: config config-helper --format=json
        cmd-path: gcloud
        expiry-key: '{.credential.token_expiry}'
        token-key: '{.credential.access_token}'
      name: gcp
`;
  });
}

export const getK8sProviderByClusterName: (clusterName: string) => k8s.Provider = memoize((clusterName: string) => {
  return new k8s.Provider(clusterName, {
    kubeconfig: kubectlConfigFromGkeCluster(lookupClusterByName(clusterName))
  });
});

export const getK8sProviderByCluster = memoize((providerInstanceName: string, cluster: Cluster) => {
  return new k8s.Provider(providerInstanceName, {
    kubeconfig: kubectlConfigFromGkeCluster(cluster)
  });
});

export const getK8sProviderFromInferredCluster = memoize(() => {
  const clusterName = packageConfig.require('cluster');
  return new k8s.Provider(clusterName, {
    kubeconfig: kubectlConfigFromGkeCluster(lookupClusterByName(clusterName))
  });
});
@lukehoban lukehoban self-assigned this Oct 2, 2018
@lukehoban lukehoban added this to the 0.18 milestone Oct 2, 2018
@lukehoban
Member Author

lukehoban commented Oct 2, 2018

I've tried to reproduce this - but cannot trigger the reported replacements.

I used the above - along with this:

const provider = helper.getK8sProviderByClusterName("luke-cluster");

new k8s.core.v1.Pod("nginx", {
    spec: {
        containers: [
            { 
              name: "nginx",
              image: "nginx",  
            }
        ]
    }
}, { provider: provider });

This deploys fine. If I then make a change to my cluster - like decreasing the nodePool size - then I do see that diff as part of the next preview/update:

 ~ nodeCount        : "4" => "3"

But other than that no changes are triggered in my provider or in the Pod I have deployed to the cluster.

@geekflyer If you have a diff view of preview and/or update when you see this getting triggered - that might help identify what is going on here.

@geekflyer

geekflyer commented Oct 2, 2018

I just tried to reproduce the behaviour by only changing the node count, but that indeed doesn't trigger an update.
I have one old diff, but I'm a bit reluctant to paste it here because it has lots of sensitive data, so I'm pasting a small subset below. I changed / anonymized the cluster name and project name in the pasted snippet.
Generally speaking, the whole cluster shows up as changed in that diff, but I saw the behaviour earlier when only a small subset of the cluster (incl. the node_count) had changed too; I just don't have that diff anymore. I think the behaviour I'm sometimes seeing might also be related to https://pulumi-community.slack.com/archives/C84L4E3N1/p1538365508000100 .

  >-gcp:container/cluster:Cluster: (read)
        [id=my-cluster]
        [urn=urn:pulumi:my-cluster::my-cluster::gcp:container/cluster:Cluster::my-cluster]
      * name   : "my-cluster"
      * project: "my-project"
      * zone   : "us-west1-c"

I suspect that overnight GKE sometimes updates some properties of the cluster (e.g. due to automatic maintenance), which causes a recreation of the resource. I will continue to observe this behaviour and post the diff the next time I see it.

@hazsetata

I had a very similar experience today with an Azure AKS cluster. The main difference is that I also create the cluster itself in Pulumi code.

Here are the parts that create the cluster and the provider:

const adApplication = new azure.ad.Application("BZV2");
export const adServicePrincipal = new azure.ad.ServicePrincipal("BZV2-SP", { applicationId: adApplication.applicationId });

const adServicePrincipalPassword = new azure.ad.ServicePrincipalPassword("BZV2-SP-Password", {
    servicePrincipalId: adServicePrincipal.id,
    value: config.servicePrincipalPassword,
    endDate: "2099-01-01T00:00:00Z",
});

export const k8sResourceGroup = new azure.core.ResourceGroup("BZV2-RG", {
    location: config.location,
});

export const k8sCluster = new azure.containerservice.KubernetesCluster("BZV2-K8S", {
    resourceGroupName: k8sResourceGroup.name,
    location: config.location,
    kubernetesVersion: config.kubernetesVersion,
    agentPoolProfile: {
        name: "bzagentpool",
        count: config.nodeCount,
        vmSize: config.nodeSize,
    },
    dnsPrefix: `${pulumi.getStack()}-bzcluster`,
    linuxProfile: {
        adminUsername: "adminuser",
        sshKeys: [{
            keyData: config.sshPublicKey,
        }],
    },
    servicePrincipal: {
        clientId: adApplication.applicationId,
        clientSecret: adServicePrincipalPassword.value,
    },
});

// Expose a K8s provider instance using our custom cluster instance.
export const k8sProvider = new k8s.Provider("BZV2-K8S-Provider", {
    kubeconfig: k8sCluster.kubeConfigRaw,
});

Then I use this provider to deploy all kinds of Kubernetes resources (cert-manager, heptio-contour, argo and our own apps). I have updated many things in the cluster (I'm on update 39) without any trouble.

Now, once I changed the node count (pulumi config set nodeCount 6) and tried to run the update,
I got this kind of output (this is the only change in the code / config):

info: 50 changes previewed:
    ~ 3 resources to update
    +-47 resources to replace
      29 resources unchanged

Do you want to perform this update? details
* pulumi:pulumi:Stack: (same)
    [urn=urn:pulumi:bzdev::beyondzen::pulumi:pulumi:Stack::beyondzen-bzdev]
    ~ azure:containerservice/kubernetesCluster:KubernetesCluster: (update)
        [id=/subscriptions/<SUBSCRIPTION_ID>/resourcegroups/BZV2-rgb1b9bc11/providers/Microsoft.ContainerService/managedClusters/BZV2-k8s9032af03]
        [urn=urn:pulumi:bzdev::beyondzen::azure:containerservice/kubernetesCluster:KubernetesCluster::BZV2-K8S]
      ~ agentPoolProfile : {
          ~ count : 2 => 6
        }
    ++pulumi:providers:kubernetes: (create-replacement)
        [id=a9e97d3f-33a5-4618-9db6-a4e22821f00c]
        [urn=urn:pulumi:bzdev::beyondzen::pulumi:providers:kubernetes::BZV2-K8S-Provider]
      ~ kubeconfig: "apiVersion: v1\nclusters:\n- cluster:\n    certificate-authority-data: LS...\n" => computed<string>
    +-pulumi:providers:kubernetes: (replace)
        [id=a9e97d3f-33a5-4618-9db6-a4e22821f00c]
        [urn=urn:pulumi:bzdev::beyondzen::pulumi:providers:kubernetes::BZV2-K8S-Provider]
      ~ kubeconfig: "apiVersion: v1\nclusters:\n- cluster:\n    certificate-authority-data: LS...\n" => computed<string>
        ++kubernetes:core/v1:Namespace: (create-replacement)
            [id=cert-manager]
            [urn=urn:pulumi:bzdev::beyondzen::vtt:k8s:CertManagerDeployment$kubernetes:core/v1:Namespace::cert-manager-ns]
            [provider: urn:pulumi:bzdev::beyondzen::pulumi:providers:kubernetes::BZV2-K8S-Provider::a9e97d3f-33a5-4618-9db6-a4e22821f00c => urn:pulumi:bzdev::beyondzen::pulumi:providers:kubernetes::BZV2-K8S-Provider::computed<string>]
          * apiVersion: "v1"
          * kind      : "Namespace"
          * metadata  : {
              * name: "cert-manager"
            }
        +-kubernetes:core/v1:Namespace: (replace)
            [id=cert-manager]
            [urn=urn:pulumi:bzdev::beyondzen::vtt:k8s:CertManagerDeployment$kubernetes:core/v1:Namespace::cert-manager-ns]
            [provider: urn:pulumi:bzdev::beyondzen::pulumi:providers:kubernetes::BZV2-K8S-Provider::a9e97d3f-33a5-4618-9db6-a4e22821f00c => urn:pulumi:bzdev::beyondzen::pulumi:providers:kubernetes::BZV2-K8S-Provider::computed<string>]
          * apiVersion: "v1"
          * kind      : "Namespace"
          * metadata  : {
              * name: "cert-manager"
            }

And it continues on. The indicated change for the provider is the kubeconfig property; in all the other resources it is the provider property. The 3 resources it wants to merely update (rather than replace) are the Cluster itself and two DNS records (all Azure components, not Kubernetes ones).

Pulumi CLI version: v0.15.4, @pulumi/pulumi: 0.15.4, @pulumi/azure: 0.15.2, @pulumi/kubernetes: 0.17.0

@lukehoban lukehoban assigned pgavlin and unassigned lukehoban Oct 20, 2018
@lukehoban lukehoban modified the milestones: 0.18, 0.19 Oct 20, 2018
@pgavlin
Member

pgavlin commented Oct 22, 2018

It turns out that this is an unfortunate but intentional decision, and that it should only impact previews.

When designing support for first-class providers, we debated over whether or not changes to the inputs for a provider should require replacing the provider (and thus its resources). We decided that this should be a decision that is left up to the provider itself, as only it knows whether or not a change to one of its inputs makes it impossible for the provider to manage any resources that depend upon it. The existing provider interface does not accommodate this sort of configuration diffing, so until we add that capability it is the responsibility of the engine.
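For illustration only, here is a rough sketch of the kind of provider-side hook described above. This is hypothetical: the names diffConfig and DiffConfigResult are made up for this example and are not part of the existing provider interface.

// Hypothetical sketch only: the existing provider interface has no such hook.
// It illustrates the idea that the provider itself should decide whether a
// configuration change forces replacement of the provider instance.
interface DiffConfigResult {
    changes: boolean;
    // Inputs whose change makes it impossible for the existing provider
    // instance to keep managing its resources, forcing replacement.
    replaceKeys: string[];
}

interface ProviderConfigDiffer {
    diffConfig(
        olds: Record<string, string>,
        news: Record<string, string>,
    ): Promise<DiffConfigResult>;
}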

We decided that the engine's configuration diff logic should require replacement if any property was unknown, in order to indicate that we did not have enough information to decide what might happen and were making a conservative decision. To make matters more confusing, this situation is only possible during a preview, so we will never in fact replace the provider during an update. #2088 has changes to align the engine's behavior in both situations such that we will never indicate that a provider instance requires replacement.
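As a rough sketch (not the actual engine code), the conservative rule described above amounts to something like the following, where isUnknown stands in for however the engine represents values that have not yet been computed during a preview:

// Rough sketch of the conservative diff rule described above; not the
// engine's actual implementation. `isUnknown` is a placeholder for the
// engine's notion of a value that is not yet known (only possible during
// a preview, e.g. a kubeconfig derived from a cluster that may change).
type ConfigDiffResult = "no-change" | "update" | "replace";

function diffProviderConfig(
    olds: Record<string, unknown>,
    news: Record<string, unknown>,
    isUnknown: (v: unknown) => boolean,
): ConfigDiffResult {
    for (const key of Object.keys(news)) {
        if (isUnknown(news[key])) {
            // Not enough information to tell whether the provider can still
            // manage its resources, so make the conservative choice.
            return "replace";
        }
    }
    const keys = new Set([...Object.keys(olds), ...Object.keys(news)]);
    for (const key of keys) {
        if (olds[key] !== news[key]) {
            return "update";
        }
    }
    return "no-change";
}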
