
Wait for a CRD type to deploy before deploying resources that use the type #1117

Closed
ringerc opened this issue Sep 20, 2021 · 12 comments
Labels: kind/feature, lifecycle/stale, needs-triage

Comments

ringerc commented Sep 20, 2021

Presently, when kubectl is used to apply a large manifest that defines new custom resource definitions (CRDs) as well as resources that use the new kinds, a race condition can cause the deployment to fail. Assuming you're using kubectl apply -f - with an external kustomize, you might see an error like:

unable to recognize "STDIN": no matches for kind "Alertmanager" in version "monitoring.coreos.com/v1"

(the exact resource "kind" and API "version" will vary depending on what you're deploying).

This appears to be a race between the cluster registering the new CRD types and kubectl sending requests that use those types, but there's no indication of that in the command's output. It's confusing for new users, and it's an operational hassle since deployments fail and then succeed when retried. This is something the kubectl tool could help users with.

The --server-side option does not help, as the same race occurs there too. And --wait=true only affects resource removal, not creation.
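
For illustration, assuming a kustomization in somedir, both the client-side and server-side forms can hit the same race:

kustomize build somedir | kubectl apply -f -
kustomize build somedir | kubectl apply --server-side -f -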

This can often be reproduced with a kind cluster, though it varies since it's a race. For example:

kind create cluster
git clone -b v0.8.0 https://github.com/prometheus-operator/kube-prometheus
kubectl apply -k kube-prometheus/

... which will often fail with:

daemonset.apps/node-exporter created
unable to recognize "kube-prometheus/": no matches for kind "Alertmanager" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
unable to recognize "kube-prometheus/": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"

but when the same command is repeated, it will succeed:

daemonset.apps/node-exporter unchanged
alertmanager.monitoring.coreos.com/main created
prometheus.monitoring.coreos.com/k8s created
prometheusrule.monitoring.coreos.com/alertmanager-main-rules created
prometheusrule.monitoring.coreos.com/kube-prometheus-rules created
...

There doesn't seem to be any (obvious) kubectl flag to impose a delay between requests, wait for a new resource type to become visible before continuing, or retry a request that fails with a server-side error indicating something was missing.
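
Until something like that exists, the closest option seems to be wrapping the apply in an external retry loop. A rough sketch (the attempt count and delay are arbitrary choices, not anything kubectl provides):

for attempt in 1 2 3 4 5; do
    # re-run the whole apply; it is idempotent, so already-created resources are unchanged
    kubectl apply -k kube-prometheus/ && break
    echo "apply failed (attempt $attempt), retrying in 5s..." >&2
    sleep 5
done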

The error message is confusing for new users and definitely does not help. A wording change and some context would help a lot. I raised that separately: #1118

ringerc added the kind/feature label Sep 20, 2021
@k8s-ci-robot (Contributor)

@ringerc: This issue is currently awaiting triage.

SIG CLI takes a lead on issue triage for this repo, but any Kubernetes member can accept issues by applying the triage/accepted label.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the needs-triage label Sep 20, 2021
ringerc (Author) commented Sep 20, 2021

See related issue discussing retries: kubernetes/kubernetes#5762 (comment)

The server does not issue HTTP status code 429 here, presumably because it doesn't know there's a pending deployment that will create this resource.

ringerc (Author) commented Sep 20, 2021

See the related issue in kube-prometheus, but note that this is far from specific to kube-prometheus; it can affect anything where a race exists between resource creation and resource use:

prometheus-operator/prometheus-operator#1866

ringerc changed the title from "Wait for a type to deploy before deploying resources that use the type" to "Wait for a CRD type to deploy before deploying resources that use the type" Sep 20, 2021
ringerc (Author) commented Sep 20, 2021

A workaround is to use kfilt to deploy the CRDs first, then wait for them to become visible, then deploy the rest:

kustomize build somedir | kfilt -i kind=CustomResourceDefinition | kubectl apply -f -
kustomize build somedir | kfilt -i kind=CustomResourceDefinition | kubectl wait --for condition=established --timeout=60s -f -
kustomize build somedir | kubectl apply -f - 
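
If re-piping the build output is inconvenient, a blunter variant is to wait on every CRD in the cluster (assuming that's acceptable, since it also waits on CRDs unrelated to this manifest):

kustomize build somedir | kfilt -i kind=CustomResourceDefinition | kubectl apply -f -
kubectl wait --for condition=established --timeout=60s crd --all
kustomize build somedir | kubectl apply -f -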

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Dec 22, 2021
@eddiezane (Member)

Wait for a CRD type to deploy before deploying resources that use the type

This isn't something that we would implement for kubectl but we will investigate which part of apply can provide a better error. We'll handle that in #1118.

/close

@k8s-ci-robot (Contributor)

@eddiezane: Closing this issue.

In response to this:

Wait for a CRD type to deploy before deploying resources that use the type

This isn't something that we would implement for kubectl but we will investigate which part of apply can provide a better error. We'll handle that in #1118.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@eddiezane (Member)

@KnVerey do you know if Kustomize has any hooks authors can use similar to how Helm handles CRD installation separately?

KnVerey (Contributor) commented Jan 20, 2022

@KnVerey do you know if Kustomize has any hooks authors can use similar to how Helm handles CRD installation separately?

It does not. Kustomize is purely client-side, so it has no deploy-related features.

ringerc (Author) commented Jan 21, 2022

That's unfortunate, because it means essentially every user who wants a robust deployment has to implement repetitive logic like:

kfilt -i kind=CustomResourceDefinition myconfig.yaml | kubectl apply -f -
kfilt -i kind=CustomResourceDefinition myconfig.yaml | kubectl wait --for condition=established --timeout=60s -f -
kfilt -i kind=Namespace myconfig.yaml | kubectl apply -f -
kfilt -i kind=Namespace myconfig.yaml | kubectl wait --for condition=established --timeout=60s -f -
kubectl apply -f myconfig.yaml

It's reasonable to expect the client sending data to kubectl to order it correctly (CRDs, then namespaces, then other resources) so that dependencies are sensible.

But it's a pity that it's seemingly not practical for kubectl to ensure the requests apply correctly.

If waiting isn't viable, what about a --retry-delay '1s' --max-retry-count 5 for retrying individual requests?
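
Hypothetically (to be clear, neither flag exists today; this is just the proposal spelled out), that might look like:

kubectl apply -k kube-prometheus/ --retry-delay 1s --max-retry-count 5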

@tlandschoff-scale

Thank you so much for the kfilt example above. I am running into this in another way.

My Vagrant provisioning step fails because k3s is installed but has not yet finished setting up Traefik, so applying my resources sometimes fails with:

default: error: unable to recognize "/vagrant/traefik_certificate.yaml": no matches for kind "TLSStore" in version "traefik.containo.us/v1alpha1"

So I can now fix this with:

$ kubectl wait --for condition=established crd tlsstores.traefik.containo.us
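
In a provisioning script, that can be chained straight onto the apply, e.g. (reusing the CRD name and file from above, with an added timeout):

kubectl wait --for condition=established --timeout=60s crd tlsstores.traefik.containo.us && \
    kubectl apply -f /vagrant/traefik_certificate.yaml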

Thanks! I agree that a more generic solution in kubectl would be great.
