
Installing cert-manager with CRDs using helm hangs pulumi #1222

Closed
ninja- opened this issue Jul 27, 2020 · 12 comments · Fixed by #1223


ninja- commented Jul 27, 2020

Problem description

import * as kubernetes from "@pulumi/kubernetes";

// helmRepo, provider, and providers are defined elsewhere in my stack.
const certManagerNamespace = new kubernetes.core.v1.Namespace("certmanager", { metadata: { name: "cert-manager" } }, { provider });
const certmanager = new kubernetes.helm.v2.Chart("certmanager", {
  chart: "cert-manager",
  namespace: "cert-manager",
  fetchOpts: {
    repo: helmRepo.jetstack
  },
  values: {
    installCRDs: true,
  }
}, providers);

I used this setup for a while and it was just fine.
At some point, problems started where it would just hang my pulumi up forever while working on cert-manager.
(Maybe it was "caused" by a new cert-manager release, since I didn't pin a version.)

I tried deleting the CRDs and the namespace, then doing a pulumi refresh and removing the above code from index.ts - and then my deploy worked just fine.
Whenever I tried to apply that cert-manager code cleanly on the same cluster, it would start hanging again.
I noticed that while it hangs, the pulumi-kubernetes process just keeps using 100% CPU forever...
Running strace -ff -p $PID on that process showed just a spam of timer-related syscalls; I saw no network or I/O activity...

I have a feeling that it's related to the CRDs and that installing them separately would solve the problem, but I haven't checked yet.
It might be related to finalizers as well, as patching or removing CRDs may cause finalizers to hang forever.
I tried deleting the CRDs once and it would hang unless I removed the finalizers first.
On a second try on a clean setup, it deleted them just fine.
But I'm not convinced it's even trying to patch these CRDs; I'm just 100% sure it hangs while "working" on one of them.
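
For reference, something like the following is what I have in mind - just a sketch; the CRDs manifest URL/filename and version are assumptions (check the release assets for your version):

import * as k8s from "@pulumi/kubernetes";

// Install the CRDs from the release manifest first (the manifest version
// should match whatever chart version gets installed)...
const certManagerCrds = new k8s.yaml.ConfigFile("cert-manager-crds", {
  file: "https://github.com/jetstack/cert-manager/releases/download/v0.15.2/cert-manager.crds.yaml",
});

// ...then deploy the chart with installCRDs disabled, waiting on the CRDs.
const certmanager = new k8s.helm.v2.Chart("certmanager", {
  chart: "cert-manager",
  namespace: "cert-manager",
  fetchOpts: {
    repo: "https://charts.jetstack.io"
  },
  values: {
    installCRDs: false,
  }
}, { dependsOn: [certManagerCrds] });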

Also, I noticed that during the deploy, DigitalOcean's Kubernetes API server got into a super aggressive throttling mode, where it would start dropping connections even before the handshake.
I'm not sure yet what the deal is with the throttling; I'm waiting on a ticket response.
It may be that pulumi was spamming the server all the time because of a bug, or it may be that their limits are just set incorrectly.
If you think throttling can cause pulumi-kubernetes to go into a 100% CPU loop, maybe that's the bug here?

I tried running with maximum verbosity but found nothing interesting, except maybe some serialization debug output containing the CRDs.

Errors & Logs

Affected product version(s)

Latest Pulumi and Kubernetes plugin.
DigitalOcean Kubernetes.
Latest Helm 3.

Reproducing the issue

Suggestions for a fix


ninja- commented Jul 27, 2020

#1130 seems related - sometimes it would hang on planning, sometimes on applying, depending on messing with pulumi refresh and related stuff.


ninja- commented Jul 27, 2020

#963 and #964 as well, but in this case the problem doesn't seem to be on the Node.js side but in the Go pulumi-kubernetes plugin.


ninja- commented Jul 27, 2020

In #1219 there's a hint that downgrading cert-manager to 0.15.2 may help - I will try that, but something is clearly wrong in the pulumi-kubernetes process, and verbose logging doesn't help much in figuring out what.
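
If it helps anyone, pinning the chart version would look roughly like this (the exact version string is an assumption - it may or may not need the "v" prefix depending on how the repo publishes chart versions):

import * as kubernetes from "@pulumi/kubernetes";

// Same chart as in the issue description, but pinned to a pre-0.16 release
// instead of floating on the latest version.
const certmanager = new kubernetes.helm.v2.Chart("certmanager", {
  chart: "cert-manager",
  namespace: "cert-manager",
  version: "v0.15.2",
  fetchOpts: {
    repo: "https://charts.jetstack.io"
  },
  values: {
    installCRDs: true,
  }
});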


ninja- commented Jul 27, 2020

Hmm, OK, I think I found the issue upstream here: https://cert-manager.io/docs/installation/upgrading/upgrading-0.15-0.16/
They say it fails the same way with helm upgrade, and a Helm version with the fix has not been released yet.

Kubernetes bug: kubernetes/kubernetes#91615


XBeg9 commented Jul 28, 2020

It's not only when using Helm; I got a hang by doing this:

import * as k8s from "@pulumi/kubernetes";

export const certManager = new k8s.yaml.ConfigFile("cert-manager", {
  file:
    "https://github.com/jetstack/cert-manager/releases/download/v0.16.0/cert-manager.yaml",
});

@banerjeeip

I'm encountering the same issue. Here is my code:

from pulumi_kubernetes.yaml import ConfigFile

cert_manager = ConfigFile(
    'cert_manager',
    'https://github.com/jetstack/cert-manager/releases/download/v0.15.1/cert-manager-legacy.yaml')


lblackstone commented Jul 31, 2020

After further investigation, this issue appears to be triggered by the v0.16.0 version of cert-manager. Their release notes indicate that the underlying issue is a bug in a dependency, and the fix is still pending.

Once the fix has merged, I'll update the Pulumi k8s provider's dependency. For now, I'd suggest sticking with a previous version of cert-manager.

Edit: In the interest of fixing this more quickly, I forked the upstream repo and applied the fix in the fork. I'll cut a release with the fix on Monday.
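
For anyone who needs an interim workaround, pinning the manifest to a pre-0.16 release would look roughly like this (the exact version and URL are illustrative, following the release-asset pattern used elsewhere in this thread):

import * as k8s from "@pulumi/kubernetes";

// Pin to a pre-0.16 release of cert-manager until the provider fix ships.
export const certManager = new k8s.yaml.ConfigFile("cert-manager", {
    file: "https://github.com/jetstack/cert-manager/releases/download/v0.15.2/cert-manager.yaml",
});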


brandon-martin-bcg commented Sep 4, 2020

cert-manager v1.0.0 is still having various issues with Pulumi.
Between random hangs and CRD int64/float64 conversion errors, this is still very problematic. Using Node.js here.

@dotansimha

Same here with v1.0.1 of cert-manager.


lblackstone commented Sep 23, 2020

I tested this again this morning with the latest k8s provider release (v2.6.1) and did not encounter the reported hangs. I expected this to be fixed by the changes in #1223, so can you verify that you're using a recent version of the provider?

Here's the code that deployed successfully for me:

import * as k8s from "@pulumi/kubernetes";

export const certManager = new k8s.yaml.ConfigFile("cert-manager", {
    file: "https://github.com/jetstack/cert-manager/releases/download/v1.0.1/cert-manager.yaml",
});

It also worked with a Helm deployment:

import * as k8s from "@pulumi/kubernetes";

const certManagerNamespace = new k8s.core.v1.Namespace("certmanager", { metadata: { name: "cert-manager" } } );
const certmanager = new k8s.helm.v3.Chart("certmanager", {
    chart: "cert-manager",
    namespace: "cert-manager",
    version: "1.0.1",
    fetchOpts: {
        repo: "https://charts.jetstack.io"
    },
    values: {
        installCRDs: true,
    }
});


dotansimha commented Sep 24, 2020

@lblackstone I tried multiple things, using the latest versions of all tools (kubectl and pulumi). My cluster is in Azure.
The only solution I found is to deploy cert-manager separately, without Pulumi, on a more performant cluster.

Changing the cluster size had some effect on installing cert-manager with Pulumi, but I still got it hanging forever (waited ~30 minutes, with just one other service deployment in addition to cert-manager).

I didn't want to give it more tries, since stopping the hung Pulumi process corrupted the state, and I had to remove all cluster resources and then recreate all of them :(


brandon-martin-bcg commented Sep 24, 2020

We actually just abandoned cert-manager and went with Azure Front Door for certificate generation/ingress. cert-manager seems to be more trouble than it's worth, and tearing down clusters always causes Pulumi stack issues when cert-manager resources are present.
Azure Front Door isn't well documented for use with AKS, but using static IPs as backends and setting up services to use those IPs with load balancers works well. The only downside is that Front Door can't handle self-signed backend certs, but it has been much more reliable on the Pulumi front.

Here's an example component resource I'm using for Front Door:

import * as pulumi from "@pulumi/pulumi";
import * as azure from "@pulumi/azure";

// resourceGroup, zone, config, globalTags, subdomain, and kubernetesCluster
// are defined elsewhere in our stack.
export interface frontDoorOpts extends pulumi.CustomResourceOptions {
  name: string,
  port: number,
  https?: boolean,
  additionalRoutingRules?: pulumi.Input<azure.types.input.frontdoor.FrontdoorRoutingRule>[]
}

export class FrontDoor extends pulumi.ComponentResource {
  public readonly publicIp:azure.network.PublicIp;
  public readonly frontDoor:azure.frontdoor.Frontdoor;
  public readonly frontDoorHttps:azure.frontdoor.CustomHttpsConfiguration;
  public readonly frontDoorCustomHttps:azure.frontdoor.CustomHttpsConfiguration;
  public readonly dns:azure.dns.CNameRecord;
  
  constructor(name:string, opts:frontDoorOpts) {
    super("pkg:index:fd", name, {}, opts);
    
    this.publicIp = new azure.network.PublicIp(`${name}-public-ip`, {
      name: `${name}-ip`,
      resourceGroupName: resourceGroup.name,
      location: config.get('location'),
      allocationMethod: 'Static',
      sku: 'Standard',
      tags: globalTags
    }, { parent: this })

    this.dns = new azure.dns.CNameRecord(`${name}-dns`, {
      name: opts.name,
      zoneName: zone.name,
      resourceGroupName: zone.resourceGroupName,
      ttl: 300,
      record: `${name}-ingress.azurefd.net`
    }, {parent: this});
    
    this.frontDoor = new azure.frontdoor.Frontdoor(`${name}-frontdoor`, {
      name: `${name}-ingress`,
      resourceGroupName: resourceGroup.name,
      backendPools: [{
        name,
        loadBalancingName: name,
        healthProbeName: name,
        
        backends: [{
          address: this.publicIp.ipAddress,
          httpPort: opts.port,
          httpsPort: opts.port,
          hostHeader: `${opts.name}.${subdomain}`,
        }]
      }],
      frontendEndpoints: [{
        name,
        hostName: `${name}-ingress.azurefd.net`
      }, {
        name: `${name}custom`,
        hostName: `${opts.name}.${subdomain}`
      }],
      backendPoolHealthProbes: [{
        name,
        protocol: opts.https ? 'Https' : 'Http'
      }],
      backendPoolLoadBalancings: [{
        name,
      }],
      enforceBackendPoolsCertificateNameCheck: false,
      routingRules: [{
        name,
        acceptedProtocols: ['Https'],
        frontendEndpoints: [name, `${name}custom`],
        patternsToMatches: ['/*'],
        forwardingConfiguration: {
          forwardingProtocol: opts.https ? 'HttpsOnly' : 'HttpOnly',
          backendPoolName: name
        }
      }, {
        name: 'redirect',
        acceptedProtocols: ['Http'],
        frontendEndpoints: [name, `${name}custom`],
        patternsToMatches: ['/*'],
        redirectConfiguration: {
          redirectProtocol: 'HttpsOnly',
          redirectType: 'Moved'
        }
      },
      ...opts.additionalRoutingRules ? opts.additionalRoutingRules : []
    ],
    }, {parent: this, dependsOn: [this.dns, kubernetesCluster]});

    this.frontDoorHttps = new azure.frontdoor.CustomHttpsConfiguration(`${name}-https`, {
      frontendEndpointId: this.frontDoor.frontendEndpoints.apply(frontendEndpoints => frontendEndpoints[0].id || '/subscriptions/random-id/resourceGroups/fake-rg/providers/Microsoft.FrontDoor/frontendEndpoints/shi'),
      customHttpsProvisioningEnabled: true,
      resourceGroupName: resourceGroup.name,
      customHttpsConfiguration: {
        certificateSource: 'FrontDoor'
      }
    }, {parent: this, dependsOn: this.frontDoor})
    
    this.frontDoorCustomHttps = new azure.frontdoor.CustomHttpsConfiguration(`${name}-custom-https`, {
      frontendEndpointId: this.frontDoor.frontendEndpoints.apply(frontendEndpoints => frontendEndpoints[1].id || '/subscriptions/random-id/resourceGroups/fake-rg/providers/Microsoft.FrontDoor/frontendEndpoints/shi'),
      customHttpsProvisioningEnabled: true,
      resourceGroupName: resourceGroup.name,
      customHttpsConfiguration: {
        certificateSource: 'FrontDoor'
      }
    }, {parent: this, dependsOn: this.frontDoor});

    this.registerOutputs({
      dns: this.dns,
      frontDoor: this.frontDoor,
      frontDoorHttps: this.frontDoorHttps,
      frontDoorCustomHttps: this.frontDoorCustomHttps,
      publicIp: this.publicIp
    })
  }
}

Then I just pass the public IP created in this Front Door resource into my services:

const frontDoor = new FrontDoor(`${name}-frontdoor`, {
  name: ingressName,
  port
});

// namespace, selector, serviceAnnotations, kubernetesProvider, and
// serviceDependencies are defined elsewhere in our stack.
const service = new k8s.core.v1.Service(`${name}-app-service`, {
    metadata: {
      namespace,
      name,
      labels: {
        app: selector
      },
      annotations: serviceAnnotations
    },
    spec: {
      externalTrafficPolicy: ingressName ? 'Cluster' : 'Local',
      loadBalancerIP: frontDoor.publicIp.ipAddress,
      ports: [{
        port,
        protocol: 'TCP'
      }],
      selector: {
        app: selector
      },
      sessionAffinity: 'None',
      type: 'LoadBalancer'
    }
  }, { provider: kubernetesProvider, deleteBeforeReplace: true, dependsOn: serviceDependencies, customTimeouts: {
    create: '1h'
  } });
