Reconciliation failing after upgrade to 2.2.2 #4529

Closed
1 task done
Djiit opened this issue Jan 6, 2024 · 6 comments

@Djiit

Djiit commented Jan 6, 2024

Describe the bug

When the kustomize-controller reconciles resources, it logs the following:

  • info: server-side apply completed (with all resources unchanged -- this is expected)
  • error: health check failed after 1.145123312s: failed early due to stalled resources: <resource> status: 'Unknown' (listing all the managed resources)

Steps to reproduce

Well, I'm not sure. It just started happening yesterday, roughly a day after a 2.1.0 => 2.2.0 upgrade. It looks like one of my images was updated through an update policy, then the reconciliation failed, and since then the kustomize-controller has logged the messages above every minute.

Expected behavior

Everything works (and in fact it does work, but the controller is telling me that ALL the resources have the status "Unknown"). I've had to mute my notifications for now.

Screenshots and recordings

No response

OS / Distro

macOS latest version

Flux version

flux: v2.2.2

Flux check

❯ flux check
► checking prerequisites
✔ Kubernetes 1.26.11 >=1.26.0-0
► checking version in cluster
✔ distribution: flux-v2.2.2
✔ bootstrapped: true
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.37.2
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v1.2.3
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v1.2.3
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.37.0
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.31.1
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v1.2.1
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta3
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ gitrepositories.source.toolkit.fluxcd.io/v1
✔ helmcharts.source.toolkit.fluxcd.io/v1beta2
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta2
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta3
✔ receivers.notification.toolkit.fluxcd.io/v1
✔ imagepolicies.image.toolkit.fluxcd.io/v1beta2
✔ imagerepositories.image.toolkit.fluxcd.io/v1beta2
✔ imageupdateautomations.image.toolkit.fluxcd.io/v1beta1
✔ all checks passed

Git provider

GitHub

Container Registry provider

No response

Additional context

Is it possible that a threshold of some sort was misconfigured when upgrading to 2.2.2?

Code of Conduct

  • I agree to follow this project's Code of Conduct
@Djiit
Author

Djiit commented Jan 6, 2024

I can also see some desync between Flux and its HelmRelease (HR)?

❯ flux reconcile hr -n home-asssistant home-assistant
✗ helmreleases.helm.toolkit.fluxcd.io "home-assistant" not found

NAME          	REVISION	SUSPENDED	READY	MESSAGE
home-assistant	13.4.2  	False    	False	Helm upgrade failed for release home-assistant/home-assistant with chart home-assistant@13.4.2: context deadline exceeded

❯ flux suspend hr -n home-assistant home-assistant
► suspending helmrelease home-assistant in home-assistant namespace
✔ helmrelease suspended

❯ flux resume hr -n home-assistant home-assistant
► resuming helmrelease home-assistant in home-assistant namespace
✔ helmrelease resumed
◎ waiting for HelmRelease reconciliation
✔ HelmRelease home-assistant reconciliation completed
✔ applied revision 13.4.2

❯ flux reconcile hr -n home-asssistant home-assistant
✗ helmreleases.helm.toolkit.fluxcd.io "home-assistant" not found

Now that this HR is reconciled (with a suspend/resume), the error is gone...
It was this HR's image that was updated yesterday.

@funkymcb

We got the same error, "failed early due to stalled resources: <resource> status: 'Unknown'", on many different resources.
But those resources are not deployed via Helm, so there is no HelmRelease we could suspend or resume.
The resources themselves are all up and healthy.

We are on Flux 2.2.2 as well.
Is there any other fix or workaround?

@stefanprodan
Member

@funkymcb the list of resources should contain the one that failed; grep through them and filter out the ones with status Unknown.
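
For a long list, the filtering suggested above can be scripted. The following is only a minimal sketch: it assumes each entry in the error message looks like `<Kind>/<namespace>/<name> status: '<Status>'`, which may not match the controller's exact output, and the sample message and the `non_unknown_resources` helper are made up for illustration.

```python
import re

# Assumed entry shape (not confirmed by the thread):
#   <Kind>/<namespace>/<name> status: '<Status>'
ENTRY = re.compile(r"([^\s,\[\]]+) status: '(\w+)'")


def non_unknown_resources(message: str) -> list[tuple[str, str]]:
    """Return (resource, status) pairs whose status is not 'Unknown'."""
    return [
        (resource, status)
        for resource, status in ENTRY.findall(message)
        if status != "Unknown"
    ]


# Hypothetical error message; resource names are invented for this example.
sample = (
    "health check failed after 1.145123312s: "
    "failed early due to stalled resources: "
    "[Deployment/apps/web status: 'Unknown', "
    "Deployment/apps/worker status: 'Failed', "
    "Service/apps/web status: 'Unknown']"
)

for resource, status in non_unknown_resources(sample):
    print(f"{resource}: {status}")
```

Whatever remains after dropping the Unknown entries is the place to start looking; in this made-up sample that is the single `Deployment/apps/worker` entry.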

@funkymcb

funkymcb commented Jan 16, 2024

True. There was one deployment failing. It was hard to spot at first sight among hundreds of resources. Thanks.

@stefanprodan
Member

Hard to spot at first sight among hundreds of resources.

@funkymcb I have created an issue for this. My proposal is to filter the resources and show only the failed ones: fluxcd/pkg#715

@patsevanton

Which one needs more resources: Flux or the application?
