
Headless Service for StatefulSet hangs on create/update "Finding Pods to direct traffic to" #1514

Closed
Kamaradeivanov opened this issue Apr 2, 2021 · 5 comments
Labels
kind/bug (Some behavior is incorrect or out of spec) · resolution/fixed (This issue was fixed)

Comments

@Kamaradeivanov

Hello, I currently cannot create a headless Service that I want to use from my StatefulSet.
Issue #248 looks exactly the same, but it was closed two years ago, so this is probably a different problem.

Expected behavior

The headless Service should not block my StatefulSet deployment.

Current behavior

When I update my Pulumi stack I get the following error:

  kubernetes:core/v1:Service (loki-headless):
    error: 2 errors occurred:
        * the Kubernetes API server reported that "monitoring/loki-headless" failed to fully initialize or become live: Resource operation was cancelled for "loki-headless"
        * Service does not target any Pods. Selected Pods may not be ready, or field '.spec.selector' may not match labels on any Pods

Steps to reproduce

I am trying to upgrade Loki from version 2.1.0 to 2.2.0, where the chart moved from a Deployment to a StatefulSet.

I used Helm to generate a static manifest file:
helm install --dry-run promtail grafana/loki > loki.yaml
Then I cleaned out the Helm-specific parts and used kube2pulumi to produce a loki.ts file. After some modifications for my project I have something like this:

import { apps, core, policy, Provider, rbac } from '@pulumi/kubernetes';

  const name = 'loki';
  const version = '2.2.0';
  const provider = new Provider('gke', {
    kubeconfig: `~/.kube/gke.config`,
  });

  const namespace = new core.v1.Namespace(
    name,
    {
      metadata: {
        name,
      },
    },
    { provider }
  );

  const matchLabels = {
    app: name,
    'app.kubernetes.io/component': name,
    'app.kubernetes.io/name': name,
    'app.kubernetes.io/part-of': 'monitoring',
  };

  const labels = {
    ...matchLabels,
    'app.kubernetes.io/version': version,
  };
  const configMetadata = {
    namespace: namespace.metadata.name,
    labels,
  };
  const metadata = {
    ...configMetadata,
    name,
  };

  const resourceOpts = {
    dependsOn: [namespace],
    provider,
  };

  const serviceAccount = new core.v1.ServiceAccount(
    name,
    {
      metadata,
    },
    resourceOpts
  );

  const podSecurityPolicy = new policy.v1beta1.PodSecurityPolicy(
    name,
    {
      metadata,
      spec: {
        privileged: false,
        allowPrivilegeEscalation: false,
        volumes: [
          'configMap',
          'emptyDir',
          'persistentVolumeClaim',
          'secret',
          'projected',
          'downwardAPI',
        ],
        hostNetwork: false,
        hostIPC: false,
        hostPID: false,
        runAsUser: {
          rule: 'MustRunAsNonRoot',
        },
        seLinux: {
          rule: 'RunAsAny',
        },
        supplementalGroups: {
          rule: 'MustRunAs',
          ranges: [
            {
              min: 1,
              max: 65535,
            },
          ],
        },
        fsGroup: {
          rule: 'MustRunAs',
          ranges: [
            {
              min: 1,
              max: 65535,
            },
          ],
        },
        readOnlyRootFilesystem: true,
        requiredDropCapabilities: ['ALL'],
      },
    },
    resourceOpts
  );

  const role = new rbac.v1.Role(
    name,
    {
      metadata,
      rules: [
        {
          apiGroups: ['extensions'],
          resources: ['podsecuritypolicies'],
          verbs: ['use'],
          resourceNames: [podSecurityPolicy.metadata.name],
        },
      ],
    },
    {
      ...resourceOpts,
      dependsOn: [podSecurityPolicy],
    }
  );

  const roleBinding = new rbac.v1.RoleBinding(
    name,
    {
      metadata,
      roleRef: {
        apiGroup: 'rbac.authorization.k8s.io',
        kind: 'Role',
        name: role.metadata.name,
      },
      subjects: [
        {
          kind: 'ServiceAccount',
          name: serviceAccount.metadata.name,
        },
      ],
    },
    {
      ...resourceOpts,
      dependsOn: [role, serviceAccount],
    }
  );

  const serviceHeadlessName = `${name}-headless`;
  const serviceHeadless = new core.v1.Service(
    serviceHeadlessName,
    {
      metadata: {
        ...configMetadata,
        name: serviceHeadlessName,
      },
      spec: {
        clusterIP: 'None',
        ports: [
          {
            name: 'http-metrics',
            port: 3100,
            protocol: 'TCP',
            targetPort: 'http-metrics',
          },
        ],
        selector: matchLabels,
      },
    },
    resourceOpts
  );

  const statefulSet = new apps.v1.StatefulSet(
    name,
    {
      metadata,
      spec: {
        podManagementPolicy: 'OrderedReady',
        replicas: 1,
        selector: {
          matchLabels,
        },
        serviceName: serviceHeadless.metadata.name,
        updateStrategy: {
          type: 'RollingUpdate',
        },
        template: {
          metadata: {
            labels,
            annotations: {
              'prometheus.io/port': 'http-metrics',
              'prometheus.io/scrape': 'true',
            },
          },
          spec: {
            containers: [
              {
                name,
                image: `grafana/loki:${version}`,
                imagePullPolicy: 'IfNotPresent',
                args: ['-config.file=/etc/loki/loki.yaml'],
                ports: [
                  {
                    containerPort: 3100,
                    name: 'http-metrics',
                    protocol: 'TCP',
                  },
                ],
                livenessProbe: {
                  httpGet: {
                    path: '/ready',
                    port: 'http-metrics',
                  },
                  initialDelaySeconds: 45,
                },
                readinessProbe: {
                  httpGet: {
                    path: '/ready',
                    port: 'http-metrics',
                  },
                  initialDelaySeconds: 45,
                },
                resources: {
                  limits: {
                    cpu: '500m',
                    memory: '500Mi',
                  },
                  requests: {
                    cpu: '100m',
                    memory: '100Mi',
                  },
                },
                securityContext: {
                  readOnlyRootFilesystem: true,
                },
                volumeMounts: [
                  {
                    mountPath: '/etc/loki',
                    name: 'config',
                  },
                ],
              },
            ],
            initContainers: [],
            securityContext: {
              fsGroup: 10001,
              runAsGroup: 10001,
              runAsNonRoot: true,
              runAsUser: 10001,
            },
            serviceAccountName: serviceAccount.metadata.name,
            terminationGracePeriodSeconds: 4800,
            tolerations: [
              {
                key: 'monitoring',
                operator: 'Exists',
                effect: 'NoSchedule',
              },
            ],
            volumes: [],
          },
        },
      },
    },
    resourceOpts
  );

  const podDisruptionBudget = new policy.v1beta1.PodDisruptionBudget(
    name,
    {
      metadata,
      spec: {
        selector: {
          matchLabels,
        },
        minAvailable: 1,
      },
    },
    {
      ...resourceOpts,
      dependsOn: [statefulSet],
    }
  );

  const service = new core.v1.Service(
    name,
    {
      metadata,
      spec: {
        ports: [
          {
            name: 'http-metrics',
            port: 3100,
            protocol: 'TCP',
            targetPort: 'http-metrics',
          },
        ],
        selector: matchLabels,
        type: 'ClusterIP',
      },
    },
    {
      ...resourceOpts,
      dependsOn: [statefulSet],
    }
  );

Context (Environment)

Kubernetes server version: 1.19.8

Pulumi packages:

    "@pulumi/kubernetes": "^2.8.4",
    "@pulumi/pulumi": "^2.23.2",
Kamaradeivanov added the kind/bug label on Apr 2, 2021
@empperi

empperi commented May 19, 2021

I want to add that I'm facing this very same issue, but when setting up PostgreSQL on AKS via a StatefulSet and a Service. For some reason the Service initialization gets stuck forever and the Service is never created in Kubernetes at all. As described above, it gets stuck at "Finding Pods to direct traffic to" and then finally times out with an identical error.

@viveklak
Contributor

Could you try to reproduce this with version 3.5.1+ of the Kubernetes provider? You would have to make sure that a previous version of the provider is not being pulled in (e.g., if the resource was created with an older provider version). Deleting the resources and recreating them with the latest Kubernetes provider would be one way to ensure this is the case. We made several fixes to the await logic in #1647, which went out in v3.5.1.
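
A minimal sketch of one way to make sure the newer provider is the one actually in use (this assumes "@pulumi/kubernetes" has already been bumped to >= 3.5.1 in package.json; the explicit version resource option below is only illustrative, not something prescribed in this thread):

import { Provider } from '@pulumi/kubernetes';

// Sketch only: pin the provider plugin version explicitly so an older cached
// plugin is not silently used for resources created with this provider.
const provider = new Provider('gke', {
  kubeconfig: '~/.kube/gke.config',
}, {
  version: '3.5.1', // illustrative pin; releases from 3.5.1 onward carry the await fixes
});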

@Kamaradeivanov
Author

Indeed, it works fine now with the "@pulumi/kubernetes" package at version 3.6.0.
Thank you.

I'll let you close the issue.

lblackstone added the resolution/fixed label and removed the kind/bug label on Aug 19, 2021
lblackstone added the kind/bug label on Aug 19, 2021
@Iced-Sun
Contributor

Iced-Sun commented Nov 20, 2021

With the latest version 3.10.1, I'm facing the same issue as well. The Kubernetes server version is v1.21.2-eks-06eac09.

import * as k8s from '@pulumi/kubernetes';

/* resources */
const name = 'redis';
const service = new k8s.core.v1.Service(name, {
  spec: {
    clusterIP: 'None',
    selector: { app: name },
    ports: [{ port: 6379 }],
  },
});

@lblackstone
Member

@Iced-Sun I'd guess that you are somehow pulling in an old version of the k8s provider that doesn't contain the fix. As a workaround, I'd suggest adding the pulumi.com/skipAwait: "true" annotation to your Service definition and running the update. Once that update completes successfully, you could test again without the annotation, or leave it if you don't need to wait on that Service.
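
A minimal sketch of that workaround applied to the headless Service above (pulumi.com/skipAwait is the documented Pulumi annotation; the rest of the resource is illustrative):

import * as k8s from '@pulumi/kubernetes';

// Sketch only: a headless Service with await skipped, so pulumi up does not
// block on "Finding Pods to direct traffic to" for this resource.
const name = 'redis';
const service = new k8s.core.v1.Service(name, {
  metadata: {
    annotations: {
      'pulumi.com/skipAwait': 'true', // tell the provider not to wait for endpoints
    },
  },
  spec: {
    clusterIP: 'None',
    selector: { app: name }, // must still match the labels on the backing Pods
    ports: [{ port: 6379 }],
  },
});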
