Metallb installation with driftDetection: mode: enabled failed to apply revision #855

Closed
zaggash opened this issue Dec 18, 2023 · 8 comments


zaggash commented Dec 18, 2023

I'm trying to set up MetalLB with this Kustomization:

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps-metallb
  namespace: flux-system
spec:
  path: /apps/metallb-system/metallb/app
  sourceRef:
    kind: GitRepository
    name: apps
  healthChecks:
    - apiVersion: helm.toolkit.fluxcd.io/v2beta2
      kind: HelmRelease
      name: metallb
      namespace: metallb-system
  interval: 30m
  retryInterval: 1m
  timeout: 3m

And this HelmRelease:

---
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: metallb
spec:
  interval: 15m
  driftDetection:
    mode: enabled
  chart:
    spec:
      chart: metallb
      version: "0.13.12"
      sourceRef:
        kind: HelmRepository
        name: metallb-charts
        namespace: flux-system
  maxHistory: 3
  install:
    createNamespace: true
    crds: CreateReplace
    remediation:
      retries: 3
  upgrade:
    cleanupOnFail: true
    crds: CreateReplace
    remediation:
      retries: 3
  uninstall:
    keepHistory: false
  values:
    controller:
      logLevel: warn
    speaker:
      logLevel: warn
    frr:
      enabled: false

flux version

flux: v2.2.0
distribution: flux-v2.2.1
helm-controller: v0.37.1
image-automation-controller: v0.37.0
image-reflector-controller: v0.31.1
kustomize-controller: v1.2.1
notification-controller: v1.2.3
source-controller: v1.2.3

flux get kustomizations shows that the Kustomization is never Ready and is marked as Unknown.

In the logs of the helm-controller I have

k -n flux-system logs helm-controller-<id>

{"level":"debug","ts":"2023-12-18T20:53:52.682Z","logger":"events","msg":"Cluster state of release metallb-system/metallb.v1 has drifted from the desired state:\nCustomResourceDefinition/addresspools.metallb.io changed (0 additions, 1 changes, 0 removals)\nCustomResourceDefinition/bgppeers.metallb.io changed (0 additions, 1 changes, 0 removals)","type":"Warning","object":{"kind":"HelmRelease","namespace":"metallb-system","name":"metallb","uid":"59eb65b4-d800-4f6e-96af-59891565efc6","apiVersion":"helm.toolkit.fluxcd.io/v2beta2","resourceVersion":"181220417"},"reason":"DriftDetected"}
{"level":"debug","ts":"2023-12-18T20:53:52.683Z","msg":"instructed to stop before running drift correction action reconciler correct cluster drift","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"metallb","namespace":"metallb-system"},"namespace":"metallb-system","name":"metallb","reconcileID":"8cc34888-d956-4d75-93d4-87c10f99a24e"}

The application is successfully installed, the pods are Ready, and the HelmRelease is marked as Ready.
However, the Kustomization never finishes reconciling.
It continuously tries to reconcile the HelmRelease for some reason.

I tried to reconcile manually many times, including with --with-source.
I tried removing the healthChecks entry and setting wait: true instead; nothing works.

The only way to make it work is to remove every healthChecks entry or wait: true statement, after which it deploys successfully.

zaggash changed the title from "Metallb installation with driftDetection: mode: enabled failed to apply kustomise revision" to "Metallb installation with driftDetection: mode: enabled failed to apply revision" on Dec 18, 2023
hiddeco (Member) commented Dec 19, 2023

Can you please share the .status and the events for the HelmRelease object? It looks like the controller keeps observing drift for the release, in which case you should use ignore rules to exclude the affected fields.

The precise fields can be found in the controller's logs: they are logged as "resource modified" messages at debug level, with a patch field attached.

zaggash (Author) commented Dec 19, 2023

Please see the .status of the HR

status:
  conditions:
  - lastTransitionTime: "2023-12-19T11:22:43Z"
    message: Helm install succeeded for release metallb-system/metallb.v1 with chart
      metallb@0.13.12
    observedGeneration: 4
    reason: ProgressingWithRetry
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2023-12-18T20:11:17Z"
    message: Helm install succeeded for release metallb-system/metallb.v1 with chart
      metallb@0.13.12
    observedGeneration: 1
    reason: InstallSucceeded
    status: "True"
    type: Ready
  - lastTransitionTime: "2023-12-18T20:11:17Z"
    message: Helm install succeeded for release metallb-system/metallb.v1 with chart
      metallb@0.13.12
    observedGeneration: 1
    reason: InstallSucceeded
    status: "True"
    type: Released
  helmChart: flux-system/metallb-system-metallb
  history:
  - chartName: metallb
    chartVersion: 0.13.12
    configDigest: sha256:cabfeb21c57b8b06565689d2212cdfb278c61ce442822337215254a84a4850d9
    digest: sha256:e524142b85ae05a16d30ba30962e2a175d6381995bc71d463a97794211a15c98
    firstDeployed: "2023-12-18T20:11:04Z"
    lastDeployed: "2023-12-18T20:11:04Z"
    name: metallb
    namespace: metallb-system
    status: deployed
    version: 1
  lastAttemptedConfigDigest: sha256:cabfeb21c57b8b06565689d2212cdfb278c61ce442822337215254a84a4850d9
  lastAttemptedGeneration: 4
  lastAttemptedReleaseAction: install
  lastAttemptedRevision: 0.13.12
  lastHandledReconcileAt: "2023-12-18T21:53:52.159942829+01:00"
  lastHandledResetAt: "2023-12-18T21:53:52.159942829+01:00"
  observedGeneration: -1
  storageNamespace: metallb-system

Trying to set the controller log to debug.
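
For readers following along, one possible way to raise the helm-controller log level is a patch in the bootstrap kustomization. This is a minimal sketch, assuming a standard `flux bootstrap` layout with a `flux-system/kustomization.yaml` next to `gotk-components.yaml`; the file layout is an assumption, not something stated in this thread. It also assumes that appending a second `--log-level` flag overrides the default one, which is worth verifying on your version.

```yaml
# flux-system/kustomization.yaml (assumed bootstrap layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - target:
      kind: Deployment
      name: helm-controller
    patch: |
      # Append a debug log-level flag to the controller's arguments.
      # Assumption: the last occurrence of --log-level takes effect.
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --log-level=debug
```

Once the controller restarts with this argument, the "resource modified" entries mentioned above should appear at debug level.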

zaggash (Author) commented Dec 19, 2023

How can I extract this patch from the controller logs?
I can't see anything when I look at the debug logs.

hiddeco (Member) commented Dec 19, 2023

It should be logged right after "detected changes in cluster state", see:

https://github.com/fluxcd/helm-controller/blob/main/internal/reconcile/atomic_release.go#L377-L387

Without knowing the specific path, you should at least be able to confirm the issue is indeed due to detected drift by excluding the resource in full.
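
As an illustration of excluding the resources in full, the sketch below annotates the two CRDs named in the drift events so that drift detection skips them entirely. It assumes the `helm.toolkit.fluxcd.io/driftDetection: disabled` annotation and the `postRenderers` Kustomize patch mechanism of the v2beta2 API; the name pattern in the target and the placeholder `not-used` name are illustrative choices, and the remaining HelmRelease fields are omitted.

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: metallb
spec:
  driftDetection:
    mode: enabled
  postRenderers:
    - kustomize:
        patches:
          # Select only the two CRDs reported as drifted.
          - target:
              kind: CustomResourceDefinition
              name: (addresspools|bgppeers)\.metallb\.io
            # Strategic merge patch; the name here is a placeholder,
            # the target selector above decides what gets patched.
            patch: |
              apiVersion: apiextensions.k8s.io/v1
              kind: CustomResourceDefinition
              metadata:
                name: not-used
                annotations:
                  helm.toolkit.fluxcd.io/driftDetection: disabled
  # interval, chart, install, upgrade and values as in the original HelmRelease
```

If the reconciliation loop stops with this in place, that would confirm the detected drift on these CRDs as the cause, and the exclusion can then be narrowed to a specific path.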

zaggash (Author) commented Dec 20, 2023

This is what I found in my logs.

2023-12-20T00:05:26.960Z debug  - Cluster state of release metallb-system/metallb.v1 has drifted from the desired state:
CustomResourceDefinition/addresspools.metallb.io changed (0 additions, 1 changes, 0 removals)
CustomResourceDefinition/bgppeers.metallb.io changed (0 additions, 1 changes, 0 removals) 
2023-12-20T00:05:26.961Z debug HelmRelease/metallb.metallb-system - instructed to stop before running drift correction action reconciler correct cluster drift 
2023-12-20T00:08:38.996Z info HelmRelease/metallb.metallb-system - HelmChart/flux-system/metallb-system-metallb with SourceRef 'HelmRepository/flux-system/metallb-charts' is in-sync 
2023-12-20T00:08:39.041Z debug HelmRelease/metallb.metallb-system - determining current state of Helm release 
2023-12-20T00:08:39.280Z debug HelmRelease/metallb.metallb-system - determining next Helm action based on current state 
2023-12-20T00:08:39.280Z info HelmRelease/metallb.metallb-system - detected changes in cluster state: removed: 0, changed: 2, excluded: 0 
2023-12-20T00:08:39.280Z debug HelmRelease/metallb.metallb-system - resource modified 
2023-12-20T00:08:39.280Z debug HelmRelease/metallb.metallb-system - resource modified 
2023-12-20T00:08:39.280Z debug  - Cluster state of release metallb-system/metallb.v1 has drifted from the desired state:
CustomResourceDefinition/addresspools.metallb.io changed (0 additions, 1 changes, 0 removals)
CustomResourceDefinition/bgppeers.metallb.io changed (0 additions, 1 changes, 0 removals) 
2023-12-20T00:08:39.296Z info HelmRelease/metallb.metallb-system - running 'correct cluster drift' action with timeout of 5m0s 
2023-12-20T00:08:39.318Z debug  - Cluster state of release metallb-system/metallb.v1 has been corrected:
CustomResourceDefinition/addresspools.metallb.io configured
CustomResourceDefinition/bgppeers.metallb.io configured 
2023-12-20T00:08:39.319Z debug HelmRelease/metallb.metallb-system - determining current state of Helm release 
2023-12-20T00:08:39.541Z debug HelmRelease/metallb.metallb-system - determining next Helm action based on current state 
2023-12-20T00:08:39.541Z info HelmRelease/metallb.metallb-system - detected changes in cluster state: removed: 0, changed: 2, excluded: 0 
2023-12-20T00:08:39.541Z debug HelmRelease/metallb.metallb-system - resource modified 
2023-12-20T00:08:39.541Z debug HelmRelease/metallb.metallb-system - resource modified 
2023-12-20T00:08:39.541Z debug  - Cluster state of release metallb-system/metallb.v1 has drifted from the desired state:
CustomResourceDefinition/addresspools.metallb.io changed (0 additions, 1 changes, 0 removals)
CustomResourceDefinition/bgppeers.metallb.io changed (0 additions, 1 changes, 0 removals) 
2023-12-20T00:08:39.542Z debug HelmRelease/metallb.metallb-system - instructed to stop before running drift correction action reconciler correct cluster drif

IIUC, I need to ignore both CRDs:
CustomResourceDefinition/addresspools.metallb.io
CustomResourceDefinition/bgppeers.metallb.io

For some reason these are changed after Helm installs them, right?

@jonasbadstuebner

We are experiencing this problem as well.
On top of this, we also see the wrong status for the MetalLB HelmRelease (MetalLB is probably only an example here).

The status reports dependency 'monitoring/xx' is not ready, like so:

status:
  conditions:
  - lastTransitionTime: "2024-01-12T11:10:18Z"
    message: dependency 'monitoring/xx' is not ready
    observedGeneration: 17
    reason: ProgressingWithRetry
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2024-01-11T10:06:53Z"
    message: dependency 'monitoring/xx' is not ready
    observedGeneration: 3
    reason: DependencyNotReady
    status: "False"
    type: Ready
  - lastTransitionTime: "2024-01-09T16:54:05Z"
    message: Helm install succeeded for release metallb/metallb.v1 with chart metallb@0.13.12
    observedGeneration: 1
    reason: InstallSucceeded
    status: "True"

But in the helm-controller logs we see

{"level":"info","ts":"2024-01-12T11:03:47.753Z","msg":"checking 1 dependencies","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"metallb","namespace":"metallb"},"namespace":"metallb","name":"metallb","reconcileID":"d7751623-4737-417b-82cc-d1db35fdbfe7"}
{"level":"info","ts":"2024-01-12T11:03:47.753Z","msg":"all dependencies are ready","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"metallb","namespace":"metallb"},"namespace":"metallb","name":"metallb","reconcileID":"d7751623-4737-417b-82cc-d1db35fdbfe7"}
{"level":"info","ts":"2024-01-12T11:03:48.117Z","msg":"detected changes in cluster state: removed: 0, changed: 2, excluded: 0","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"metallb","namespace":"metallb"},"namespace":"metallb","name":"metallb","reconcileID":"d7751623-4737-417b-82cc-d1db35fdbfe7"}
{"level":"info","ts":"2024-01-12T11:03:48.163Z","msg":"running 'correct cluster drift' action with timeout of 5m0s","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"metallb","namespace":"metallb"},"namespace":"metallb","name":"metallb","reconcileID":"d7751623-4737-417b-82cc-d1db35fdbfe7"}
{"level":"info","ts":"2024-01-12T11:03:48.584Z","msg":"detected changes in cluster state: removed: 0, changed: 2, excluded: 0","controller":"helmrelease","controllerGroup":"helm.toolkit.fluxcd.io","controllerKind":"HelmRelease","HelmRelease":{"name":"metallb","namespace":"metallb"},"namespace":"metallb","name":"metallb","reconcileID":"d7751623-4737-417b-82cc-d1db35fdbfe7"}

--> "msg":"all dependencies are ready"


jonasbadstuebner commented Jan 12, 2024

Setting the log-level to debug showed us the path for the (automatically) changed data.
We added the following to the MetalLB HelmRelease and it fixed the reconciliation.

  driftDetection:
    ignore:
    - paths:
      - /spec/conversion/webhook/clientConfig/caBundle
      target:
        kind: CustomResourceDefinition

I still think the status message needs to be fixed, though. It does not seem to change when the HelmRelease goes into "ProgressingWithRetry"; my guess (without looking at the code) is that it just keeps the previous status message.
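
For context, the ignore rule above would sit alongside `mode: enabled` from the original HelmRelease roughly as follows. This is a sketch that merges the two snippets from this thread, with unrelated fields omitted. The `caBundle` field is presumably populated at runtime for the CRD conversion webhook, which would explain why it always differs from the rendered chart, but that reading is an assumption rather than something confirmed here.

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: metallb
spec:
  interval: 15m
  driftDetection:
    # Keep detecting and correcting drift for everything else.
    mode: enabled
    ignore:
      # Skip only the field that changes after installation.
      - paths:
          - /spec/conversion/webhook/clientConfig/caBundle
        target:
          kind: CustomResourceDefinition
  # chart, install, upgrade and values unchanged from the original HelmRelease
```

Scoping the rule to a single JSON pointer path keeps drift detection active for the rest of each CRD, unlike excluding the resources entirely.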


spietras commented May 8, 2024

See also: metallb/metallb#1681

zaggash closed this as completed May 8, 2024