
crd-install hook possible race condition #4925

Closed
vsliouniaev opened this issue Nov 15, 2018 · 3 comments · Fixed by #5112

Comments

@vsliouniaev

vsliouniaev commented Nov 15, 2018

While installing the stable/prometheus-operator chart, which creates 4 CRDs using crd-install hooks, the CRDs can be created and validation still fails with:

Error: validation failed: [unable to recognize "": no matches for kind "Alertmanager" in version
"monitoring.coreos.com/v1", unable to recognize "": no matches for kind "Prometheus" in version 
"monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version 
. . .

After seeing this error, repeatedly running kubectl get crd | grep coreos shows fewer than 4 CRDs at first, but eventually all 4 are created.
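The symptom is consistent with the API server accepting the CRD objects before it has finished registering the new kinds, so a validation step that runs immediately afterwards can miss them. A minimal client-side sketch of a mitigation, under stated assumptions: poll until every expected CRD reports the Established condition as True, or give up after a timeout. The function names and the fake status source are hypothetical and for illustration only; a real client would read status.conditions from each CustomResourceDefinition via the apiextensions.k8s.io API. This thread does not say whether the linked fix (#5112) takes this exact approach.

```python
import time
from typing import Callable, Dict, Iterable, List

# Hypothetical status source: given a CRD name, return its status.conditions
# as dicts such as {"type": "Established", "status": "True"}. A real client
# would read these from the apiextensions.k8s.io API.
ConditionLookup = Callable[[str], List[Dict[str, str]]]


def is_established(conditions: List[Dict[str, str]]) -> bool:
    """True if the conditions contain Established=True."""
    return any(
        c.get("type") == "Established" and c.get("status") == "True"
        for c in conditions
    )


def wait_for_crds(
    names: Iterable[str],
    get_conditions: ConditionLookup,
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll until every named CRD is Established.

    Returns the number of polling rounds needed. Raises TimeoutError naming
    the CRDs still pending once the timeout has elapsed.
    """
    pending = list(names)
    deadline = clock() + timeout
    rounds = 0
    while True:
        rounds += 1
        # Only re-check CRDs that were not yet Established in the last round.
        pending = [n for n in pending if not is_established(get_conditions(n))]
        if not pending:
            return rounds
        if clock() >= deadline:
            raise TimeoutError("CRDs not established: " + ", ".join(pending))
        sleep(interval)


def make_fake_lookup(ready_after: Dict[str, int]) -> ConditionLookup:
    """Hypothetical fake for illustration: a CRD reports Established=True
    only from its ready_after[name]-th lookup onward."""
    calls: Dict[str, int] = {}

    def lookup(name: str) -> List[Dict[str, str]]:
        calls[name] = calls.get(name, 0) + 1
        status = "True" if calls[name] >= ready_after[name] else "False"
        return [{"type": "Established", "status": status}]

    return lookup
```

The sleep and clock parameters are injectable so the loop can be exercised without a cluster or real delays; with the fake lookup, the four monitoring.coreos.com CRDs can be made to become ready one round apart, mimicking the staggered appearance seen with kubectl get crd.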

This behaviour is entirely consistent on a given cluster:

  • if it occurs, it keeps occurring on that cluster
  • if it does not occur, it cannot be reproduced on that cluster

There are multiple reports of this happening with this chart, but in most cases, including the helm/charts CI process, there are no issues.

Issue on helm/charts: helm/charts#9241


Can be broken

Confirmed:

helm:
Client: &version.Version{SemVer:"v2.10.0", GitCommit:"9ad53aac42165a5fadc6c87be0dea6b115f93090", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.10.0", GitCommit:"9ad53aac42165a5fadc6c87be0dea6b115f93090", GitTreeState:"clean"}

kubectl:
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:46:06Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.4", GitCommit:"bf9a868e8ea3d3a8fa53cbb22f566771b3f8068b", GitTreeState:"clean", BuildDate:"2018-10-25T19:06:30Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Reported:
helm: 2.10, 2.11
kubectl: 1.11.2, 1.11.3, 1.11.4

Can work

Confirmed:

helm:
Client: &version.Version{SemVer:"v2.10.0", GitCommit:"9ad53aac42165a5fadc6c87be0dea6b115f93090", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.10.0", GitCommit:"9ad53aac42165a5fadc6c87be0dea6b115f93090", GitTreeState:"clean"}

kubectl:
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:46:06Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:10:24Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

helm:
Client: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}

kubectl:
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:46:06Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:43:26Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Cloud Provider/Platform (AKS, GKE, Minikube etc.):

  • Azure
  • possibly others
@mortent

mortent commented Nov 16, 2018

Do you have the Tiller logs from when this happens? I'm unable to reproduce this on my GKE cluster.

Also, are you able to reproduce it multiple times in the same cluster if you delete all the CRDs belonging to the chart before trying again?

@vsliouniaev
Author

vsliouniaev commented Nov 16, 2018

> are you able to reproduce it multiple times in the same cluster if you delete all the CRDs belonging to the chart before trying again?

Yes, absolutely.

> Do you have the logs from Tiller when this happens? I'm unable to reproduce this on my gke cluster.

I don't have the Tiller logs right now, but I'll try to get them for you.

@rwaffen

rwaffen commented Nov 22, 2018

Tried to install Prometheus with Helm. As mentioned in helm/charts#9241, I'm posting the logs here.

helm install \
    --name prometheus \
    --namespace monitoring \
    -f custom-values.yaml \
    stable/prometheus

custom-values.yaml:

server:
  ingress:
    enabled: true
    hosts:
      - prometheus-stage.k8s.example.com

Tiller logs:

[tiller] 2018/11/22 15:48:12 preparing install for prometheus
[storage] 2018/11/22 15:48:12 getting release history for "prometheus"
[tiller] 2018/11/22 15:48:12 rendering prometheus-operator chart using values
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/alertmanager/psp-clusterrole.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/alertmanager/servicemonitor.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/alertmanager/serviceaccount.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/alertmanager/secret.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/alertmanager/psp-clusterrolebinding.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/exporters/kube-dns/service.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/alertmanager/alertmanager.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/exporters/kube-etcd/endpoints.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/exporters/kube-dns/servicemonitor.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/alertmanager/ingress.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/prometheus/additionalAlertmanagerConfigs.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/prometheus/additionalAlertRelabelConfigs.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/prometheus/additionalScrapeConfigs.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/alertmanager/psp.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/prometheus/ingress.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/prometheus-operator/cleanup-crds.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/prometheus/servicemonitors.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/alertmanager/podDisruptionBudget.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/charts/grafana/templates/dashboards-json-configmap.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/charts/grafana/templates/pvc.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/prometheus/podDisruptionBudget.yaml" is empty. Skipping.
2018/11/22 15:48:12 info: manifest "prometheus-operator/templates/alertmanager/service.yaml" is empty. Skipping.
[tiller] 2018/11/22 15:48:12 performing install for prometheus
[tiller] 2018/11/22 15:48:12 executing 4 crd-install hooks for prometheus
[tiller] 2018/11/22 15:48:12 deleting crd-install hook alertmanagers.monitoring.coreos.com for release prometheus due to "before-hook-creation" policy
[kube] 2018/11/22 15:48:12 Starting delete for "alertmanagers.monitoring.coreos.com" CustomResourceDefinition
[kube] 2018/11/22 15:48:12 customresourcedefinitions.apiextensions.k8s.io "alertmanagers.monitoring.coreos.com" not found
[kube] 2018/11/22 15:48:12 building resources from manifest
[kube] 2018/11/22 15:48:12 creating 1 resource(s)
[tiller] 2018/11/22 15:48:12 deleting crd-install hook prometheuses.monitoring.coreos.com for release prometheus due to "before-hook-creation" policy
[kube] 2018/11/22 15:48:12 Starting delete for "prometheuses.monitoring.coreos.com" CustomResourceDefinition
[kube] 2018/11/22 15:48:12 customresourcedefinitions.apiextensions.k8s.io "prometheuses.monitoring.coreos.com" not found
[kube] 2018/11/22 15:48:12 building resources from manifest
[kube] 2018/11/22 15:48:12 creating 1 resource(s)
[tiller] 2018/11/22 15:48:13 deleting crd-install hook prometheusrules.monitoring.coreos.com for release prometheus due to "before-hook-creation" policy
[kube] 2018/11/22 15:48:13 Starting delete for "prometheusrules.monitoring.coreos.com" CustomResourceDefinition
[kube] 2018/11/22 15:48:13 customresourcedefinitions.apiextensions.k8s.io "prometheusrules.monitoring.coreos.com" not found
[kube] 2018/11/22 15:48:13 building resources from manifest
[kube] 2018/11/22 15:48:13 creating 1 resource(s)
[tiller] 2018/11/22 15:48:13 deleting crd-install hook servicemonitors.monitoring.coreos.com for release prometheus due to "before-hook-creation" policy
[kube] 2018/11/22 15:48:13 Starting delete for "servicemonitors.monitoring.coreos.com" CustomResourceDefinition
[kube] 2018/11/22 15:48:13 customresourcedefinitions.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" not found
[kube] 2018/11/22 15:48:13 building resources from manifest
[kube] 2018/11/22 15:48:13 creating 1 resource(s)
[tiller] 2018/11/22 15:48:13 hooks complete for crd-install prometheus
[tiller] 2018/11/22 15:48:13 failed install perform step: validation failed: [unable to recognize "": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"]

kubectl: v1.12.2
k8s: v1.11.3
helm client: v2.11.0
helm server: v2.11.0
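The validation error in the log above repeats the same missing kinds many times. When comparing it against kubectl get crd output, it can help to reduce the message to its distinct (kind, version) pairs. A minimal sketch, assuming the exact "no matches for kind ... in version ..." wording shown in these logs; missing_kinds is a hypothetical helper, not part of Helm or kubectl.

```python
import re
from typing import List, Tuple

# Assumes the wording in the Tiller log: no matches for kind "X" in version "Y".
# \s+ tolerates the line wrapping seen in the first report.
_NO_MATCH = re.compile(r'no matches for kind\s+"([^"]+)"\s+in\s+version\s+"([^"]*)"')


def missing_kinds(error_message: str) -> List[Tuple[str, str]]:
    """Return the distinct (kind, apiVersion) pairs in first-seen order."""
    # findall yields one (kind, version) tuple per match; dict.fromkeys
    # drops duplicates while preserving insertion order.
    return list(dict.fromkeys(_NO_MATCH.findall(error_message)))
```

Applied to the failed install step above, this reduces the long list to three kinds (Prometheus, PrometheusRule and ServiceMonitor), all in monitoring.coreos.com/v1, even though the crd-install hooks had just created all four CRDs.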
