Controller fails to pick up ImageCache resource on fresh install if webhook server is down #70

Chili-Man · 2021-04-14T02:56:21Z

When kube-fledged is installed directly from the Helm chart, the ImageCache resource, the webhook server, and the controller are all deployed at the same time. If the controller comes up before the webhook server, it runs into the following error:

I0414 00:43:56.288736       1 controller.go:122] Setting up event handlers
I0414 00:43:56.289021       1 main.go:75] Starting pre-flight checks
I0414 00:43:56.487536       1 controller.go:158] No dangling or stuck jobs found...
I0414 00:43:56.494524       1 controller.go:208] No dangling or stuck imagecaches found...
I0414 00:43:56.494553       1 main.go:79] Pre-flight checks completed
I0414 00:43:56.494582       1 controller.go:223] Starting fledged controller
I0414 00:43:56.494588       1 controller.go:226] Waiting for informer caches to sync
I0414 00:43:56.681989       1 controller.go:231] Starting image cache worker
I0414 00:43:56.682138       1 controller.go:238] Starting cache refresh worker
I0414 00:43:56.682155       1 controller.go:242] Started workers
I0414 00:43:56.682162       1 image_manager.go:340] Starting image manager
I0414 00:43:56.682170       1 image_manager.go:343] Waiting for informer caches to sync
I0414 00:43:56.682179       1 controller.go:429] Starting to sync image cache kernels-default(create)
E0414 00:43:56.700798       1 controller.go:491] Error updating imagecache status to Processing: Internal error occurred: failed calling webhook "validate-image-cache.kubefledged.k8s.io": Post "https://kubefledged-webhook-server.management.svc:3443/validate-image-cache?timeout=1s": no endpoints available for service "kubefledged-webhook-server"
E0414 00:43:56.701170       1 controller.go:366] error syncing imagecache: Internal error occurred: failed calling webhook "validate-image-cache.kubefledged.k8s.io": Post "https://kubefledged-webhook-server.management.svc:3443/validate-image-cache?timeout=1s": no endpoints available for service "kubefledged-webhook-server"
E0414 00:43:56.701206       1 controller.go:377] error syncing imagecache: Internal error occurred: failed calling webhook "validate-image-cache.kubefledged.k8s.io": Post "https://kubefledged-webhook-server.management.svc:3443/validate-image-cache?timeout=1s": no endpoints available for service "kubefledged-webhook-server"
I0414 00:43:57.082511       1 image_manager.go:348] Started image manager

After that, the controller never picks up the ImageCache resource that was deployed. The current workaround is to manually delete the controller pod; once it restarts, it picks up the ImageCache resource.
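For anyone hitting the same thing, the workaround could be scripted roughly as below. This is only a sketch: the `restart_controller` helper and the label selector in the usage comment are hypothetical names that need checking against the actual install, and the `management` namespace is simply the one visible in the log above.

```shell
# Allow the kubectl binary to be overridden (for example for a dry run).
KUBECTL="${KUBECTL:-kubectl}"

# Delete the controller pod(s) so that the Deployment recreates them.
# $1 = namespace, $2 = label selector matching the controller pod.
# Both values depend on the install; none are taken from the chart.
restart_controller() {
    "$KUBECTL" delete pod --namespace "$1" --selector "$2"
}

# Example call (hypothetical selector, verify with `kubectl get pods --show-labels`):
#   restart_controller management app=kubefledged-controller
```

Run it only after the webhook server pod reports Ready, otherwise the restarted controller may hit the same webhook error.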


senthilrch · 2021-06-07T09:08:18Z

@Chili-Man

Thanks for posting this issue. I will look into it for v0.8.0.

senthilrch · 2021-06-07T09:50:31Z

@Chili-Man

For deploy-using-yaml, I have fixed this by running "kubectl rollout status deployment kubefledged-webhook-server --watch" before applying the controller manifest.
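The ordering described here can be sketched as a small shell function. Only the `rollout status` command comes from the comment above; the `deploy_in_order` name and the manifest paths passed to it are placeholders, not the project's actual Makefile targets or file names.

```shell
# Allow the kubectl binary to be overridden (for example for a dry run).
KUBECTL="${KUBECTL:-kubectl}"

# Apply the webhook server manifest, block until its rollout has finished,
# and only then apply the controller manifest.
# $1 = webhook server manifest, $2 = controller manifest (placeholder paths).
deploy_in_order() {
    "$KUBECTL" apply --filename "$1" || return 1
    "$KUBECTL" rollout status deployment kubefledged-webhook-server --watch || return 1
    "$KUBECTL" apply --filename "$2"
}

# Example call (placeholder file names):
#   deploy_in_order webhook-server.yaml controller.yaml
```

Because each step returns early on failure, the controller manifest is not applied if the webhook server never becomes available.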

For the Helm chart, I'll look for a different solution (an init container, etc.).
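As an illustration of the init container idea, the container could run a simple wait loop like the one below before the controller starts. This is a generic sketch, not what the chart ended up shipping: `wait_until` is a hypothetical helper, and the `nc` probe in the usage comment assumes the init image provides `nc` (the service name and port are the ones from the error log).

```shell
# Retry a probe command until it succeeds or the attempts run out.
# $1 = maximum number of attempts
# $2 = seconds to sleep between attempts
# remaining arguments = probe command to run
wait_until() {
    attempts="$1"
    delay="$2"
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then
            return 0    # probe succeeded, the dependency is reachable
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1            # gave up after the configured number of attempts
}

# Example call (assumes nc exists in the image):
#   wait_until 60 2 nc -z kubefledged-webhook-server 3443
```

An init container that exits non-zero is restarted according to the pod's restart policy, so a bounded loop like this still ends up retrying at the pod level.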

* Move custom resource definitions to crds directory
* modified travis build conditions
* Initial commit for v0.8.0
* Bumped up versions of go, alpine, operator sdk, docker, cri-tool
* upgrade go in travis ci
* update dependencies
* code changes for upgraded dependencies
* fix unit tests
* make release-amd64 to build all 4 amd64 images
* v1alpha1 -> v1alpha2, kubefledged.k8s.io -> kubefledged.io
* ignore build binaries
* add labels to manifests
* updates to imagecache manifest
* update apigroup apiversion in validatingwebhook
* issue senthilrch#70 deploy-using-yaml: wait for webhook-server running
* issue senthilrch#66: upgrade crd api version to v1
* updates to helm chart & operator
* update clusterrole to list and watch
* update signer name in csr
* remove v1alpha1 cr
* issue senthilrch#70 add init container to wait for webhook-server
* helm chart fix webhook service name
* helm chart update
* fix chart apiVersion
* updated readme for helm chart installation
* updated name of helm operator cr
* fix issue senthilrch#75: workaround
* Ensure validating webhook configuration client config service name for the webhook server mataches the correct webhook service name
* Ensure validating webhook configuration client config service name for the webhook server mataches the correct webhook service name
* add init option to webhook server
* update manifests
* updated helm chart
* updated makefile and manifests
* updated helm operator
* makefile wait for operator ready
* get imagecache before updating status
* pre-install hook for validatingwebhookconfiguration
* fix golint errors
* modify refresh/purge annotation key to kubefledged.io/xxx
* cri-client-image name as env instead of cmd flag
* read busybox image from env
* use busybox image from gcr.io to overcome dockerhub ratelimiting
* fix issue senthilrch#89 change hostpath filetype to socket
* delete pre-install hook in "make remove-operator-and-kubefledged"
* add annotations to validatingwebhookconfiguration
* add "helm repo update" to readme
* continue processing job deletion when not found
* check if refresh-cache annotation exists
* add "helm repo update" to readme
* update helm chart to use release namespace
* update design proposal document
* expose helm parameters in operator CR
* document helm parameters
* update release version to v0.8.2
* set status to known when unable to fetch pod
* add check for image pull/delete status unknown
* update log messages
* deploy controller and operator to same namespace
* fix unit test errors
* restore Kubefledged CR during "remove-operator-and-kubefledged"
* Update README.md
* Update README.md
* Update design-proposal.md
* Update README.md

Co-authored-by: Diego Rodriguez <diego@noteable.io>
Co-authored-by: Senthil Raja Chermapandian <senthilrch@gmail.com>

senthilrch added a commit that referenced this issue Jun 7, 2021

issue #70 deploy-using-yaml: wait for webhook-server running

ecb52d9

senthilrch self-assigned this Jun 7, 2021

senthilrch added a commit that referenced this issue Jun 9, 2021

issue #70 add init container to wait for webhook-server

5c86d06

senthilrch mentioned this issue Jun 11, 2021

PR for v0.8.0 #73

Merged

senthilrch closed this as completed Jun 13, 2021

Chili-Man mentioned this issue Jul 10, 2021

Ensure validating webhook configuration client config service name for the webhook server mataches the correct webhook service name noteable-io/kube-fledged#2

Open

AnchorArray mentioned this issue Sep 16, 2021

aberg/DEVOPS 247 noteable-io/kube-fledged#3

Merged
