
[kube-prometheus-stack] when adding persistence to grafana, blocked at "Starting DB migrations" #4382

Closed
pierreloicq opened this issue Mar 22, 2024 · 1 comment
Labels
bug Something isn't working

@pierreloicq
Describe the bug

I added this section in my values.yml:

  persistence:
    type: pvc
    enabled: true
    storageClassName: nfs
    accessModes:
      - ReadWriteMany
    size: 10Gi 
    finalizers:
      - kubernetes.io/pvc-protection

With these lines, the grafana pod doesn't start. If I remove them, the pod starts.
Here is the output of kubectl logs prometheus-grafana-8459865c9f-kbgm7 -n prometheus:

logger=settings t=2024-03-22T11:01:24.18653234Z level=info msg="Starting Grafana" version=10.4.0 commit=03f502a94d17f7dc4e6c34acdf8428aedd986e4c branch=HEAD compiled=2024-03-22T11:01:24Z
logger=settings t=2024-03-22T11:01:24.186872016Z level=info msg="Config loaded from" file=/usr/share/grafana/conf/defaults.ini
logger=settings t=2024-03-22T11:01:24.186882649Z level=info msg="Config loaded from" file=/etc/grafana/grafana.ini
logger=settings t=2024-03-22T11:01:24.186904876Z level=info msg="Config overridden from command line" arg="default.paths.data=/var/lib/grafana/"
logger=settings t=2024-03-22T11:01:24.186908832Z level=info msg="Config overridden from command line" arg="default.paths.logs=/var/log/grafana"
logger=settings t=2024-03-22T11:01:24.186917125Z level=info msg="Config overridden from command line" arg="default.paths.plugins=/var/lib/grafana/plugins"
logger=settings t=2024-03-22T11:01:24.186920788Z level=info msg="Config overridden from command line" arg="default.paths.provisioning=/etc/grafana/provisioning"
logger=settings t=2024-03-22T11:01:24.186924583Z level=info msg="Config overridden from command line" arg="default.log.mode=console"
logger=settings t=2024-03-22T11:01:24.186928565Z level=info msg="Config overridden from Environment variable" var="GF_PATHS_DATA=/var/lib/grafana/"
logger=settings t=2024-03-22T11:01:24.186932375Z level=info msg="Config overridden from Environment variable" var="GF_PATHS_LOGS=/var/log/grafana"
logger=settings t=2024-03-22T11:01:24.186935935Z level=info msg="Config overridden from Environment variable" var="GF_PATHS_PLUGINS=/var/lib/grafana/plugins"
logger=settings t=2024-03-22T11:01:24.18693964Z level=info msg="Config overridden from Environment variable" var="GF_PATHS_PROVISIONING=/etc/grafana/provisioning"
logger=settings t=2024-03-22T11:01:24.186943162Z level=info msg="Config overridden from Environment variable" var="GF_SECURITY_ADMIN_USER=admin"
logger=settings t=2024-03-22T11:01:24.186946467Z level=info msg="Config overridden from Environment variable" var="GF_SECURITY_ADMIN_PASSWORD=*********"
logger=settings t=2024-03-22T11:01:24.186950085Z level=info msg=Target target=[all]
logger=settings t=2024-03-22T11:01:24.186958492Z level=info msg="Path Home" path=/usr/share/grafana
logger=settings t=2024-03-22T11:01:24.18696215Z level=info msg="Path Data" path=/var/lib/grafana/
logger=settings t=2024-03-22T11:01:24.186965586Z level=info msg="Path Logs" path=/var/log/grafana
logger=settings t=2024-03-22T11:01:24.186969376Z level=info msg="Path Plugins" path=/var/lib/grafana/plugins
logger=settings t=2024-03-22T11:01:24.186973081Z level=info msg="Path Provisioning" path=/etc/grafana/provisioning
logger=settings t=2024-03-22T11:01:24.186976653Z level=info msg="App mode production"
logger=sqlstore t=2024-03-22T11:01:24.187488555Z level=info msg="Connecting to DB" dbtype=sqlite3
logger=migrator t=2024-03-22T11:01:24.188438747Z level=info msg="Starting DB migrations"

The PVC is created successfully; kubectl get pvc -n prometheus outputs:

NAME                                                                                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
prometheus-grafana                                                                                       Bound    pvc-1fe1bdb8-85be-472c-b8e1-936f414b78a7   10Gi       RWX            nfs            3h19m
prometheus-prometheus-kube-prometheus-prometheus-db-prometheus-prometheus-kube-prometheus-prometheus-0   Bound    pvc-3f2cc611-8d19-4c78-9991-4cc99c3d1868   20Gi       RWO            nfs            182d

And I can see the grafana.db file created on the volume, but it weighs 0 KB. I can also see an empty folder named "plugins".
I don't know what the effective log level is there, since I only see level=info. I tried adding the lines indicated in the Grafana doc, and also just logLevel: DEBUG under the grafana section of my values.yaml, but it changes nothing.
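For reference, the grafana subchart exposes a grafana.ini passthrough, so raising the log level should also be possible like this (a sketch based on the standard grafana chart values, not verified against this exact chart version):

```yaml
grafana:
  grafana.ini:
    log:
      level: debug   # corresponds to [log] level in grafana.ini
```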

Here is the grafana container section from the output of kubectl describe pod prometheus-grafana-8459865c9f-kbgm7 -n prometheus:

  grafana:
    Container ID:    containerd://86d779c2f8e89d3bae7ca797759f120973d9226d78a3b2330005e1243ec3a846
    Image:           docker.io/grafana/grafana:10.4.0
    Image ID:        docker.io/grafana/grafana@sha256:f9811e4e687ffecf1a43adb9b64096c50bc0d7a782f8608530f478b6542de7d5
    Ports:           3000/TCP, 9094/TCP, 9094/UDP
    Host Ports:      0/TCP, 0/TCP, 0/UDP
    SeccompProfile:  RuntimeDefault
    State:           Running
      Started:       Fri, 22 Mar 2024 12:09:21 +0100
    Last State:      Terminated
      Reason:        Error
      Exit Code:     2
      Started:       Fri, 22 Mar 2024 12:06:44 +0100
      Finished:      Fri, 22 Mar 2024 12:09:20 +0100
    Ready:           False
    Restart Count:   8
    Liveness:        http-get http://:3000/api/health delay=60s timeout=30s period=10s #success=1 #failure=10
    Readiness:       http-get http://:3000/api/health delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_IP:                       (v1:status.podIP)
      GF_SECURITY_ADMIN_USER:      *************      Optional: false
      GF_SECURITY_ADMIN_PASSWORD:  *************  Optional: false
      GF_PATHS_DATA:               /var/lib/grafana/
      GF_PATHS_LOGS:               /var/log/grafana
      GF_PATHS_PLUGINS:            /var/lib/grafana/plugins
      GF_PATHS_PROVISIONING:       /etc/grafana/provisioning
    Mounts:
      /etc/grafana/grafana.ini from config (rw,path="grafana.ini")
      /etc/grafana/provisioning/dashboards/sc-dashboardproviders.yaml from sc-dashboard-provider (rw,path="provider.yaml")
      /etc/grafana/provisioning/datasources from sc-datasources-volume (rw)
      /tmp/dashboards from sc-dashboard-volume (rw)
      /var/lib/grafana from storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7gzjr (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  prometheus-grafana
    ReadOnly:   false
  sc-dashboard-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  sc-dashboard-provider:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-grafana-config-dashboards
    Optional:  false
  sc-datasources-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-7gzjr:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  26m                    default-scheduler  Successfully assigned prometheus/prometheus-grafana-8459865c9f-kbgm7 to nodepool-d96fec4f-3b94-4613-a9-node-249073
  Normal   Pulling    26m                    kubelet            Pulling image "docker.io/library/busybox:1.31.1"
  Normal   Pulled     26m                    kubelet            Successfully pulled image "docker.io/library/busybox:1.31.1" in 709.885467ms (709.891773ms including waiting)
  Normal   Created    26m                    kubelet            Created container init-chown-data
  Normal   Started    26m                    kubelet            Started container init-chown-data
  Normal   Pulling    26m                    kubelet            Pulling image "quay.io/kiwigrid/k8s-sidecar:1.26.1"
  Normal   Pulled     26m                    kubelet            Successfully pulled image "quay.io/kiwigrid/k8s-sidecar:1.26.1" in 453.908763ms (453.914344ms including waiting)
  Normal   Created    26m                    kubelet            Created container grafana-sc-dashboard
  Normal   Started    26m                    kubelet            Started container grafana-sc-dashboard
  Normal   Pulling    26m                    kubelet            Pulling image "quay.io/kiwigrid/k8s-sidecar:1.26.1"
  Normal   Pulled     26m                    kubelet            Successfully pulled image "quay.io/kiwigrid/k8s-sidecar:1.26.1" in 423.044801ms (423.050581ms including waiting)
  Normal   Created    26m                    kubelet            Created container grafana-sc-datasources
  Normal   Started    26m                    kubelet            Started container grafana-sc-datasources
  Normal   Pulling    26m                    kubelet            Pulling image "docker.io/grafana/grafana:10.4.0"
  Normal   Pulled     26m                    kubelet            Successfully pulled image "docker.io/grafana/grafana:10.4.0" in 1.009009584s (1.009015537s including waiting)
  Normal   Created    26m                    kubelet            Created container grafana
  Normal   Started    26m                    kubelet            Started container grafana
  Warning  Unhealthy  16m (x37 over 25m)     kubelet            Liveness probe failed: Get "http://10.2.0.133:3000/api/health": dial tcp 10.2.0.133:3000: connect: connection refused
  Warning  BackOff    6m30s (x8 over 7m55s)  kubelet            Back-off restarting failed container grafana in pod prometheus-grafana-8459865c9f-kbgm7_prometheus(7b0774bb-e98c-438a-9972-9b5aa5da3686)
  Warning  Unhealthy  86s (x182 over 26m)    kubelet            Readiness probe failed: Get "http://10.2.0.133:3000/api/health": dial tcp 10.2.0.133:3000: connect: connection refused

What's your helm version?

version.BuildInfo{Version:"v3.13.2", GitCommit:"2a2fb3b98829f1e0be6fb18af2f6599e0f4e8243", GitTreeState:"clean", GoVersion:"go1.20.10"}

What's your kubectl version?

Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.1", GitCommit:"4c9411232e10168d7b050c49a1b59f6df9d7ea4b", GitTreeState:"clean", BuildDate:"2023-04-14T13:21:19Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"windows/amd64"} Kustomize Version: v5.0.1 Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.11", GitCommit:"3cd242c51317aed8858119529ccab22079f523b1", GitTreeState:"clean", BuildDate:"2023-11-15T16:50:12Z", GoVersion:"go1.20.11", Compiler:"gc", Platform:"linux/amd64"}

Which chart?

kube-prometheus-stack

What's the chart version?

57.1.0 (same problem on 54.2.2)

What happened?

No response

What you expected to happen?

I expected the pod to start.

How to reproduce it?

No response

Enter the changed values of values.yaml?

  persistence:
    type: pvc
    enabled: true
    storageClassName: nfs
    accessModes:
      - ReadWriteMany
    size: 10Gi # no idea how much I need
    finalizers:
      - kubernetes.io/pvc-protection

Enter the command that you execute and failing/misfunctioning.

helm upgrade --install prometheus prometheus-community/kube-prometheus-stack --values myvalues.yaml --namespace prometheus --version=57.1.0

Anything else we need to know?

No response

@pierreloicq pierreloicq added the bug Something isn't working label Mar 22, 2024
@zeritti zeritti changed the title [prometheus-kube-stack] when adding persistence to grafana, blocked at "Starting DB migrations" [kube-prometheus-stack] when adding persistence to grafana, blocked at "Starting DB migrations" Mar 22, 2024
@pierreloicq (Author) commented Mar 27, 2024

It seems to be due to the fact that the PVC is on a disk mounted over NFS: SQLite relies on file locking, which is known to be unreliable on NFS, and this SQLite doc may explain it.
On a PVC not using NFS, the pod starts.
Another solution would be to use a different database, but I did not manage to configure that through values.yaml.
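For anyone hitting the same issue: since the grafana subchart accepts grafana.ini overrides, an external database could in principle be configured as sketched below. This is untested; the host, database name, user, and secret name are placeholders, and the [database] keys are standard grafana.ini settings rather than anything specific to this chart:

```yaml
grafana:
  grafana.ini:
    database:
      type: postgres
      host: my-postgres:5432   # placeholder service name
      name: grafana
      user: grafana
  # Supply the password via an env var from a Secret rather than in plain text.
  # "grafana-db-secret" is a hypothetical Secret you would create yourself.
  envValueFrom:
    GF_DATABASE_PASSWORD:
      secretKeyRef:
        name: grafana-db-secret
        key: password
```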
