healthChecks for Jobs seem to fail #373

Closed
gwvandesteeg opened this issue Jun 24, 2021 · 7 comments

Comments

@gwvandesteeg

When a Kustomization has a health check configured on a Job, the health check keeps failing even after the Job has run successfully.

Version Info:

$ flux check
► checking prerequisites
✔ kubectl 1.21.0 >=1.18.0-0
✔ Kubernetes 1.20.4-eks-6b7464 >=1.16.0-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.11.1
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.13.0
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.15.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.15.2
✔ all checks passed

Reconciliation of this Kustomization keeps failing with errors like the following:

Health check timed out for [Job 'default/neo4j-bootstrap-databases', Job 'default/mysql-bootstrap-databases']
The Kustomization in question:

---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: backends-configs
  namespace: flux-system
spec:
  interval: 10m0s
  dependsOn:
    - name: backends
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./backends-configs/dev
  prune: true
  validation: client
  timeout: 5m
  healthChecks:
    # make sure the neo4j bootstrap is ready
    - apiVersion: v1
      kind: Job
      name: neo4j-bootstrap-databases
      namespace: default
    # make sure the mysql bootstrap is ready
    - apiVersion: v1
      kind: Job
      name: mysql-bootstrap-databases
      namespace: default
    # make sure the neo4j-backup release is ready
    - apiVersion: helm.toolkit.fluxcd.io/v1beta1
      kind: HelmRelease
      name: neo4j-backup
      namespace: default

And the status of the Jobs:

$ kubectl get jobs
NAME                          COMPLETIONS   DURATION   AGE
mysql-bootstrap-databases     1/1           3s         14h
neo4j-bootstrap-databases     1/1           17s        15h

Each Job runs a Bash script that executes queries against the MariaDB Galera or Neo4j database to create users and databases and to set permissions.

This should be easy to reproduce with any simple Job that executes a shell script.
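A minimal reproducer along those lines might look like the manifest below, paired with a Kustomization `healthChecks` entry of the same shape as those in the report. The Job name, namespace, and `busybox` image are illustrative assumptions, not the reporter's actual manifests.

```yaml
# Hypothetical reproducer: a Job that runs a short shell script and exits 0.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-shell-job   # hypothetical name
  namespace: default
spec:
  backoffLimit: 1
  template:
    spec:
      # Jobs require restartPolicy Never or OnFailure
      restartPolicy: Never
      containers:
        - name: script
          image: busybox:1.33   # assumed image; any image with a shell works
          command: ["sh", "-c", "echo bootstrapping; sleep 5; echo done"]
```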

@makkes
Member

makkes commented Jun 24, 2021

Thanks @gwvandesteeg for raising this. I will have a look at what's going on there.

@stefanprodan
Member

@makkes the apiVersion is wrong: Kubernetes Jobs are in batch/v1.
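For reference, a corrected `healthChecks` section for the Kustomization in the report might look like this. Only the `apiVersion` of the two Job entries changes; names and namespaces are taken from the report, and the rest of the spec is omitted.

```yaml
spec:
  healthChecks:
    # Jobs live in the batch API group, not the core v1 group
    - apiVersion: batch/v1
      kind: Job
      name: neo4j-bootstrap-databases
      namespace: default
    - apiVersion: batch/v1
      kind: Job
      name: mysql-bootstrap-databases
      namespace: default
    # unchanged: HelmRelease health check
    - apiVersion: helm.toolkit.fluxcd.io/v1beta1
      kind: HelmRelease
      name: neo4j-backup
      namespace: default
```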

@stefanprodan
Member

Also, Kubernetes Jobs are immutable (the pod template cannot be changed after creation), so the first time the Job's container image changes, the apply will fail. To reconcile Kubernetes Jobs, as described in the docs, the Flux Kustomization spec should contain `force: true`.
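A sketch of where `force: true` goes in the reported Kustomization follows. The other fields are copied from the report and abbreviated. As we understand the Flux docs, `force` tells kustomize-controller to recreate a resource when patching fails because an immutable field changed, which is what allows an updated Job to be applied.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: backends-configs
  namespace: flux-system
spec:
  interval: 10m0s
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./backends-configs/dev
  prune: true
  timeout: 5m
  # Recreate objects (such as Jobs) whose immutable fields changed
  force: true
  # dependsOn, validation and healthChecks omitted for brevity
```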

@makkes
Member

makkes commented Jun 24, 2021

@stefanprodan what do you think about improving the messaging for such configuration errors?

@stefanprodan
Member

@makkes the error message comes from kstatus

@makkes
Member

makkes commented Jun 24, 2021

Maybe kustomize-controller could extract more specific errors from the ResourceStatuses. 🤔 Looking into possibilities.

@gwvandesteeg
Author

Both fixes above resolved the issue, thanks!


3 participants