
can't install prometheus-operator chart #285

Closed
geekflyer opened this issue Nov 19, 2018 · 2 comments

@geekflyer

chart: stable/prometheus-operator
version: 0.1.22
values: {} (default)

Hi, it seems impossible to install the prometheus-operator chart. I'm getting multiple errors that won't go away even after several attempts:

code:

import { core, helm } from '@pulumi/kubernetes';
import { k8sProvider } from '../cluster';

const appName = 'prometheus';

const namespaceName = appName;

const namespace = new core.v1.Namespace(
  namespaceName,
  {
    metadata: { name: namespaceName }
  },
  { provider: k8sProvider }
);

new helm.v2.Chart(
  appName,
  {
    repo: 'stable',
    chart: appName + '-operator',
    namespace: namespaceName,
    version: ' 0.1.22 ',
    values: {}
  },
  { dependsOn: namespace, providers: { kubernetes: k8sProvider } }
);

error:

  kubernetes:core:Service (kube-system/prometheus-prometheus-oper-kube-etcd):
    error: Plan apply failed: 2 errors occurred:

    * Resource operation was cancelled for 'prometheus-prometheus-oper-kube-etcd'
    * Service does not target any Pods. Application Pods may failed to become alive, or field '.spec.selector' may not match labels on any Pods

  kubernetes:core:Service (kube-system/prometheus-prometheus-oper-kube-scheduler):
    error: Plan apply failed: 2 errors occurred:

    * Resource operation was cancelled for 'prometheus-prometheus-oper-kube-scheduler'
    * Service does not target any Pods. Application Pods may failed to become alive, or field '.spec.selector' may not match labels on any Pods

  kubernetes:core:Service (prometheus-prometheus-node-exporter):
    error: Plan apply failed: 2 errors occurred:

    * Resource operation was cancelled for 'prometheus-prometheus-node-exporter'
    * Service does not target any Pods. Application Pods may failed to become alive, or field '.spec.selector' may not match labels on any Pods

  kubernetes:monitoring.coreos.com:Alertmanager (prometheus-prometheus-oper-alertmanager):
    error: Plan apply failed: unable to fetch resource description for monitoring.coreos.com/v1: the server could not find the requested resource

  kubernetes:core:Service (prometheus-prometheus-oper-alertmanager):
    error: Plan apply failed: 2 errors occurred:

    * Resource operation was cancelled for 'prometheus-prometheus-oper-alertmanager'
    * Service does not target any Pods. Application Pods may failed to become alive, or field '.spec.selector' may not match labels on any Pods

  kubernetes:core:Service (kube-system/prometheus-prometheus-oper-kube-controller-manager):
    error: Plan apply failed: 2 errors occurred:

    * Resource operation was cancelled for 'prometheus-prometheus-oper-kube-controller-manager'
    * Service does not target any Pods. Application Pods may failed to become alive, or field '.spec.selector' may not match labels on any Pods

  kubernetes:core:Service (kube-system/prometheus-prometheus-oper-coredns):
    error: Plan apply failed: 2 errors occurred:

    * Resource operation was cancelled for 'prometheus-prometheus-oper-coredns'
    * Service does not target any Pods. Application Pods may failed to become alive, or field '.spec.selector' may not match labels on any Pods

  kubernetes:apps:Deployment (prometheus-prometheus-oper-operator):
    error: Plan apply failed: 3 errors occurred:

    * Resource operation was cancelled for 'prometheus-prometheus-oper-operator'
    * Minimum number of live Pods was not attained
    * 1 Pods failed to run because: [CrashLoopBackOff] Back-off 40s restarting failed container=prometheus-operator pod=prometheus-prometheus-oper-operator-6878755977-zbwqw_default(cf3964d8-ebda-11e8-90b8-42010a8a012d)

It seems that the chart attempts to create multiple Services but no matching Deployments / DaemonSets.
In some cases I also got an error that "a resource does not specify a metadata.name", which I believe is probably related to the missing Deployments.

@lukehoban added this to the 0.19 milestone Nov 19, 2018
@hausdorff (Contributor) commented Nov 24, 2018

Summary: Two of the problems here are very likely Pulumi issues; one is already fixed, and the other will be fixed soon. If possible, it would be great if you could run against PR #294 and see whether that resolves them.

The rest of the issues are either unclear (i.e., I don't have the logs) or likely working as expected. I've provided some code that should fix those, too.

More detailed discussion below.


Pulumi issues

  • ConfigMapList, apparently, is allowed to have no name at all, since the API server knows to flatten it out and instantiate only the ConfigMaps inside (see the sketch after this list). I've started Don't require names for built-in Kubernetes list types #294 to try to fix this, but I'm not yet confident it's the right approach, because those semantics do not appear to be captured in the OpenAPI spec.
  • unable to fetch resource description for monitoring.coreos.com/v1: the server could not find the requested resource. This should have been fixed by Fixes in how the provider handles CRDs and CRs #271. What version of @pulumi/kubernetes is in your package.json?
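
For context, here is a minimal sketch (hypothetical manifest shapes and names, not actual chart output) of the flattening semantics: the ConfigMapList wrapper itself is never persisted as an API object, so it carries no metadata.name, while each ConfigMap under items does:

const configMapList = {
    apiVersion: "v1",
    kind: "ConfigMapList",
    // No metadata.name on the wrapper -- the API server flattens the list
    // and creates only the ConfigMaps listed under `items`.
    items: [
        {
            apiVersion: "v1",
            kind: "ConfigMap",
            metadata: { name: "example-dashboard-a" }, // each item does need a name
            data: { "a.json": "{}" }
        },
        {
            apiVersion: "v1",
            kind: "ConfigMap",
            metadata: { name: "example-dashboard-b" },
            data: { "b.json": "{}" }
        }
    ]
};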

Requires more info

  • kubernetes:apps:Deployment (prometheus-prometheus-oper-operator)

Looks like the Pod is crashing. Can you run kubectl logs on the operator Pod? The operator needs to know about AlertManager, so I'm guessing that's the error you'll find in the logs. If so, that should be resolved by #271 as well.

Likely working as expected

For the errors related to the following:

  • kubernetes:core:Service (kube-system/prometheus-prometheus-oper-kube-etcd)
  • kubernetes:core:Service (kube-system/prometheus-prometheus-oper-kube-scheduler)
  • kubernetes:core:Service (prometheus-prometheus-node-exporter)
  • kubernetes:core:Service (kube-system/prometheus-prometheus-oper-kube-controller-manager)
  • kubernetes:core:Service (kube-system/prometheus-prometheus-oper-coredns)

These Services target Pods which some cloud providers do not actually expose -- GKE, I believe, is among them. Our normal strategy is to upstream fixes to Helm Charts that have bugs, but in this case I believe the Chart's behavior is intended and its defaults are reasonable.

If you are on one of those cloud providers, you should be able to resolve these by changing your Chart definition to something like this (tested on Kubernetes v1.9.7, which is what I had lying around at the time):

import * as k8s from "@pulumi/kubernetes";

new k8s.helm.v2.Chart(
    appName,
    {
        repo: "stable",
        chart: appName + "-operator",
        namespace: namespaceName,
        version: " 0.1.22 ",
        values: {
            kubeEtcd: { enabled: false },
            kubeScheduler: { enabled: false },
            kubeControllerManager: { enabled: false },
            coreDns: { enabled: false },
            // I needed this because GKE started k8s without `PodSecurityPolicy`, somehow?
            global: { rbac: { pspEnabled: false } }
        }
    },
    { dependsOn: namespace }
);
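
Note that this snippet drops the explicit provider from your original code for brevity; if you are targeting the cluster through a non-default provider such as k8sProvider, you would keep providers: { kubernetes: k8sProvider } in the resource options as well.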

@hausdorff (Contributor)

Ok, after talking to @geekflyer, I think this is likely solved -- I'll close for now. If you run into more issues, please feel free to re-open.
