CR creation failing with "Failed calling webhook [...] connection refused" for some time #102013

@Fabian-K

Description

What happened:

I'm deploying tekton-pipeline to a fresh kind cluster. tekton-pipeline defines a Task CRD and registers a
MutatingWebhookConfiguration for it, together with the matching webhook Deployment and Service. After the deployment,
I wait for the webhook pod to become ready and then try to create a Task instance. For some time this fails with

Error from server (InternalError): error when creating "task.yaml": Internal error occurred: failed calling webhook "webhook.pipeline.tekton.dev": Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/defaulting?timeout=10s": dial tcp 10.96.15.129:443: connect: connection refused

until it eventually works and the Task instance is created.

What you expected to happen:

Once the webhook pod signals readiness, creating a new CR instance should work.

How to reproduce it (as minimally and precisely as possible):

I'm using the following script to reproduce it:

echo "Creating cluster at $(date +"%T")"

kind create cluster

echo "Waiting for api-server pod at $(date +"%T")"

kubectl wait --namespace kube-system \
  --for=condition=ready pod \
  --selector=component=kube-apiserver \
  --timeout=120s

echo "Waiting for controller-manager pod at $(date +"%T")"

kubectl wait --namespace kube-system \
  --for=condition=ready pod \
  --selector=component=kube-controller-manager \
  --timeout=120s

echo "Waiting for core-dns pod at $(date +"%T")"

kubectl wait --namespace kube-system \
  --for=condition=ready pod \
  --selector=k8s-app=kube-dns \
  --timeout=120s

echo "Apply tekton pipelines at $(date +"%T")"

kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/previous/v0.23.0/release.yaml

echo "Waiting for webhook pod at $(date +"%T")"

kubectl wait --namespace tekton-pipelines \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/component=webhook \
  --timeout=120s

echo "Webhook ready at $(date +"%T")"

while :
do
  echo "Apply task at $(date +"%T")"
  kubectl apply -f task.yaml
  sleep 0.5s
done
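
The endless loop at the end of the script could also be bounded, so that the run stops on its own and reports how long creation kept failing. The following is only a sketch: `retry_until` is a hypothetical helper name, not part of kubectl or kind, and any timeout passed to it is an arbitrary choice.

```shell
# retry_until TIMEOUT_SECONDS INTERVAL COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it succeeds or TIMEOUT_SECONDS elapse.
retry_until() {
  _ru_timeout=$1
  _ru_interval=$2
  shift 2
  _ru_start=$(date +%s)
  until "$@"; do
    _ru_elapsed=$(( $(date +%s) - _ru_start ))
    if [ "$_ru_elapsed" -ge "$_ru_timeout" ]; then
      echo "Giving up after ${_ru_elapsed}s" >&2
      return 1
    fi
    sleep "$_ru_interval"
  done
  # Report how long the command kept failing before it went through.
  echo "Succeeded after $(( $(date +%s) - _ru_start ))s"
}

# Possible usage in place of the while loop (timeout of 60s is an assumption):
#   retry_until 60 0.5 kubectl apply -f task.yaml
```

The elapsed time has one-second resolution because it is derived from `date +%s`.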

task.yaml used:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: example-task-name
spec:
  params:
    - name: pathToDockerFile
      type: string
      description: The path to the dockerfile to build
      default: /workspace/workspace/Dockerfile
  resources:
    inputs:
      - name: workspace
        type: git
    outputs:
      - name: builtImage
        type: image
  steps:
    - name: ubuntu-example
      image: ubuntu
      args: [ "ubuntu-build-example", "SECRETS-example.md" ]
    - image: gcr.io/example-builders/build-example
      command: [ "echo" ]
      args: [ "$(params.pathToDockerFile)" ]
    - name: dockerfile-pushexample
      image: gcr.io/example-builders/push-example
      args: [ "push", "$(resources.outputs.builtImage.url)" ]
      volumeMounts:
        - name: docker-socket-example
          mountPath: /var/run/docker.sock
  volumes:
    - name: example-volume
      emptyDir: { }

Anything else we need to know?:

Sample Execution Output:

Creating cluster at 11:27:27
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.20.2) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂
Waiting for api-server pod at 11:28:17
pod/kube-apiserver-kind-control-plane condition met
Waiting for controller-manager pod at 11:28:25
pod/kube-controller-manager-kind-control-plane condition met
Waiting for core-dns pod at 11:29:37
pod/coredns-74ff55c5b-bx8bl condition met
pod/coredns-74ff55c5b-x2w2p condition met
Apply tekton pipelines at 11:29:37
namespace/tekton-pipelines created
podsecuritypolicy.policy/tekton-pipelines created
clusterrole.rbac.authorization.k8s.io/tekton-pipelines-controller-cluster-access created
clusterrole.rbac.authorization.k8s.io/tekton-pipelines-controller-tenant-access created
clusterrole.rbac.authorization.k8s.io/tekton-pipelines-webhook-cluster-access created
role.rbac.authorization.k8s.io/tekton-pipelines-controller created
role.rbac.authorization.k8s.io/tekton-pipelines-webhook created
role.rbac.authorization.k8s.io/tekton-pipelines-leader-election created
serviceaccount/tekton-pipelines-controller created
serviceaccount/tekton-pipelines-webhook created
clusterrolebinding.rbac.authorization.k8s.io/tekton-pipelines-controller-cluster-access created
clusterrolebinding.rbac.authorization.k8s.io/tekton-pipelines-controller-tenant-access created
clusterrolebinding.rbac.authorization.k8s.io/tekton-pipelines-webhook-cluster-access created
Warning: rbac.authorization.k8s.io/v1beta1 RoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 RoleBinding
rolebinding.rbac.authorization.k8s.io/tekton-pipelines-controller created
rolebinding.rbac.authorization.k8s.io/tekton-pipelines-webhook created
rolebinding.rbac.authorization.k8s.io/tekton-pipelines-controller-leaderelection created
rolebinding.rbac.authorization.k8s.io/tekton-pipelines-webhook-leaderelection created
customresourcedefinition.apiextensions.k8s.io/clustertasks.tekton.dev created
customresourcedefinition.apiextensions.k8s.io/conditions.tekton.dev created
customresourcedefinition.apiextensions.k8s.io/pipelines.tekton.dev created
customresourcedefinition.apiextensions.k8s.io/pipelineruns.tekton.dev created
customresourcedefinition.apiextensions.k8s.io/pipelineresources.tekton.dev created
customresourcedefinition.apiextensions.k8s.io/runs.tekton.dev created
customresourcedefinition.apiextensions.k8s.io/tasks.tekton.dev created
customresourcedefinition.apiextensions.k8s.io/taskruns.tekton.dev created
secret/webhook-certs created
validatingwebhookconfiguration.admissionregistration.k8s.io/validation.webhook.pipeline.tekton.dev created
mutatingwebhookconfiguration.admissionregistration.k8s.io/webhook.pipeline.tekton.dev created
validatingwebhookconfiguration.admissionregistration.k8s.io/config.webhook.pipeline.tekton.dev created
clusterrole.rbac.authorization.k8s.io/tekton-aggregate-edit created
clusterrole.rbac.authorization.k8s.io/tekton-aggregate-view created
configmap/config-artifact-bucket created
configmap/config-artifact-pvc created
configmap/config-defaults created
configmap/feature-flags created
configmap/config-leader-election created
configmap/config-logging created
configmap/config-observability created
configmap/config-registry-cert created
deployment.apps/tekton-pipelines-controller created
service/tekton-pipelines-controller created
horizontalpodautoscaler.autoscaling/tekton-pipelines-webhook created
deployment.apps/tekton-pipelines-webhook created
service/tekton-pipelines-webhook created
Waiting for webhook pod at 11:29:39
pod/tekton-pipelines-webhook-5bfbbd6475-788g5 condition met
Webhook ready at 11:30:02
Apply task at 11:30:02
Error from server (InternalError): error when creating "task.yaml": Internal error occurred: failed calling webhook "webhook.pipeline.tekton.dev": Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/defaulting?timeout=10s": dial tcp 10.96.15.129:443: connect: connection refused
Apply task at 11:30:03
Error from server (InternalError): error when creating "task.yaml": Internal error occurred: failed calling webhook "webhook.pipeline.tekton.dev": Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/defaulting?timeout=10s": dial tcp 10.96.15.129:443: connect: connection refused
Apply task at 11:30:04
Error from server (InternalError): error when creating "task.yaml": Internal error occurred: failed calling webhook "webhook.pipeline.tekton.dev": Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/defaulting?timeout=10s": dial tcp 10.96.15.129:443: connect: connection refused
Apply task at 11:30:05
Error from server (InternalError): error when creating "task.yaml": Internal error occurred: failed calling webhook "webhook.pipeline.tekton.dev": Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/defaulting?timeout=10s": dial tcp 10.96.15.129:443: connect: connection refused
Apply task at 11:30:06
Error from server (InternalError): error when creating "task.yaml": Internal error occurred: failed calling webhook "webhook.pipeline.tekton.dev": Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/defaulting?timeout=10s": dial tcp 10.96.15.129:443: connect: connection refused
Apply task at 11:30:06
Error from server (InternalError): error when creating "task.yaml": Internal error occurred: failed calling webhook "webhook.pipeline.tekton.dev": Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/defaulting?timeout=10s": dial tcp 10.96.15.129:443: connect: connection refused
Apply task at 11:30:07
Error from server (InternalError): error when creating "task.yaml": Internal error occurred: failed calling webhook "webhook.pipeline.tekton.dev": Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/defaulting?timeout=10s": dial tcp 10.96.15.129:443: connect: connection refused
Apply task at 11:30:08
Error from server (InternalError): error when creating "task.yaml": Internal error occurred: failed calling webhook "webhook.pipeline.tekton.dev": Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/defaulting?timeout=10s": dial tcp 10.96.15.129:443: connect: connection refused
Apply task at 11:30:08
Error from server (InternalError): error when creating "task.yaml": Internal error occurred: failed calling webhook "webhook.pipeline.tekton.dev": Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/defaulting?timeout=10s": dial tcp 10.96.15.129:443: connect: connection refused
Apply task at 11:30:09
Error from server (InternalError): error when creating "task.yaml": Internal error occurred: failed calling webhook "webhook.pipeline.tekton.dev": Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/defaulting?timeout=10s": dial tcp 10.96.15.129:443: connect: connection refused
Apply task at 11:30:10
Error from server (InternalError): error when creating "task.yaml": Internal error occurred: failed calling webhook "webhook.pipeline.tekton.dev": Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/defaulting?timeout=10s": dial tcp 10.96.15.129:443: connect: connection refused
Apply task at 11:30:11
Error from server (InternalError): error when creating "task.yaml": Internal error occurred: failed calling webhook "webhook.pipeline.tekton.dev": Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/defaulting?timeout=10s": dial tcp 10.96.15.129:443: connect: connection refused
Apply task at 11:30:11
Error from server (InternalError): error when creating "task.yaml": Internal error occurred: failed calling webhook "webhook.pipeline.tekton.dev": Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/defaulting?timeout=10s": dial tcp 10.96.15.129:443: connect: connection refused
Apply task at 11:30:12
Error from server (InternalError): error when creating "task.yaml": Internal error occurred: failed calling webhook "webhook.pipeline.tekton.dev": Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/defaulting?timeout=10s": EOF
Apply task at 11:30:13
task.tekton.dev/example-task-name created
Apply task at 11:30:13

Based on the logs of the webhook pod, the webhook setup for this particular execution was complete by 11:29:56. The test only starts creating the task after that, at 11:30:02, and still fails. So it does not look like the webhook pod is signaling readiness too early.

Also, since the CR instance creation eventually succeeds, there does not seem to be a general issue with the setup, the webhook config, etc.

Is there an additional condition I need to wait for before being able to create a CR instance?
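
One candidate for such a condition, stated purely as an assumption and not as a confirmed cause, is that the webhook Service has no ready endpoint yet. The error shows the request going to the Service address (10.96.15.129:443), not to the pod directly. The sketch below polls the Endpoints object of the `tekton-pipelines-webhook` Service named in the error message; `wait_for_endpoints` is a hypothetical helper name.

```shell
# wait_for_endpoints NAMESPACE SERVICE TIMEOUT_SECONDS
# Hypothetical helper: polls until the Service's Endpoints object lists
# at least one ready address, or TIMEOUT_SECONDS elapse.
wait_for_endpoints() {
  _we_ns=$1
  _we_svc=$2
  _we_timeout=$3
  _we_start=$(date +%s)
  while :; do
    # .subsets[*].addresses holds only ready addresses; treat lookup errors as "none yet".
    _we_ips=$(kubectl get endpoints "$_we_svc" --namespace "$_we_ns" \
      --output jsonpath='{.subsets[*].addresses[*].ip}' 2>/dev/null) || _we_ips=""
    if [ -n "$_we_ips" ]; then
      echo "Endpoints for $_we_svc: $_we_ips"
      return 0
    fi
    if [ $(( $(date +%s) - _we_start )) -ge "$_we_timeout" ]; then
      echo "No ready endpoints for $_we_svc after ${_we_timeout}s" >&2
      return 1
    fi
    sleep 1
  done
}

# Possible usage before applying task.yaml (timeout of 120s is an assumption):
#   wait_for_endpoints tekton-pipelines tekton-pipelines-webhook 120
```

Even with a populated Endpoints object, the proxy rules for the Service address on the node may lag behind, so this check may narrow the window without closing it.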

Environment:

  • Kubernetes version (use kubectl version): v1.20.2
  • Cloud provider or hardware configuration: kind v0.10.0 go1.15.7 darwin/amd64

Thank you,
Fabian

Labels

  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
  • sig/api-machinery: Categorizes an issue or PR as relevant to SIG API Machinery.
  • sig/apps: Categorizes an issue or PR as relevant to SIG Apps.
  • sig/network: Categorizes an issue or PR as relevant to SIG Network.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.
