[BUG] Prometheus volume failed #3260

Closed
albertkohl-monotek opened this issue Dec 8, 2022 · 1 comment
Labels: kind/bug, reproduce/needed, severity/needed


albertkohl-monotek commented Dec 8, 2022

Describe the bug
After some sort of disk corruption issue, I can't get Prometheus monitoring healthy (see #3238). I would like to get it healthy before attempting an upgrade from v1.0.3 to v1.1.1, to avoid the upgrade hanging.

To Reproduce
Hard to reproduce, because I believe it was caused by Longhorn disk corruption issues on an ext4 filesystem.

Expected behavior
Prometheus volumes should be healthy, or there should be a workaround to reset Prometheus.

Support bundle
attached.

Environment

  • Harvester ISO version: v1.0.3
  • Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630): bare metal R620s, 4 nodes

Additional context
Tried to delete the failed replica in Longhorn, but that option is not available. Also tried attaching the volume to other nodes, with no luck.


Volume details in Longhorn:
  • State: Detached
  • Health: Unknown
  • Ready for workload: Ready
  • Conditions: restore, scheduled
  • Frontend: Block Device
  • Attached Node & Endpoint:
  • Size: 50 Gi
  • Actual Size: 64.7 Gi
  • Data Locality: disabled
  • Access Mode: ReadWriteOnce
  • Engine Image: longhornio/longhorn-engine:v1.2.4
  • Created: 3 months ago
  • Encrypted: False
  • Node Tags:
  • Disk Tags:
  • Last Backup:
  • Last Backup At:
  • Replicas Auto Balance: ignored
  • Instance Manager:
  • Namespace: cattle-monitoring-system
  • PVC Name: prometheus-rancher-monitoring-prometheus-db-prometheus-rancher-monitoring-prometheus-0
  • PV Name: pvc-7616922b-a530-4bcc-b281-0a0438955d4d
  • PV Status: Bound
  • Revision Counter Disabled: False
  • Pod Name: prometheus-rancher-monitoring-prometheus-0
  • Pod Status: Pending
  • Workload Name: prometheus-rancher-monitoring-prometheus
  • Workload Type: StatefulSet

Charts and graphs fail to load, I'm guessing because of missing metrics.

Additional thread from Slack with @Vicente-Cheng:
https://rancher-users.slack.com/archives/C01GKHKAG0K/p1670220962452249 (hopefully this link works!)

@albertkohl-monotek added the kind/bug, reproduce/needed, and severity/needed labels on Dec 8, 2022
@rebeccazzzz rebeccazzzz added this to New in Community Issue Review via automation Dec 12, 2022
albertkohl-monotek (author) commented:

I found the pod with kubectl and deleted it. It came up fine afterwards.
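For anyone hitting the same state, the workaround the reporter describes can be sketched as the two `kubectl` calls below, wrapped in a small Python script. This is a hedged sketch, not an official Harvester or Longhorn procedure: it assumes `kubectl` is on `PATH` and already pointed at the Harvester cluster, and it reuses the namespace and pod name from the volume details in the report. The helper names `delete_pod_cmd` and `list_pods_cmd` are hypothetical. Because the pod belongs to a StatefulSet, the controller recreates it under the same name after deletion. This only restarts the pod; it does not repair corrupted Longhorn replicas.

```python
import shutil
import subprocess

# Assumed values, taken from the Longhorn volume details in this issue.
NAMESPACE = "cattle-monitoring-system"
POD = "prometheus-rancher-monitoring-prometheus-0"


def delete_pod_cmd(namespace: str, pod: str) -> list[str]:
    """Hypothetical helper: build the kubectl command that deletes one pod."""
    return ["kubectl", "-n", namespace, "delete", "pod", pod]


def list_pods_cmd(namespace: str) -> list[str]:
    """Hypothetical helper: build the kubectl command that lists pods in a namespace."""
    return ["kubectl", "-n", namespace, "get", "pods"]


def main() -> None:
    if shutil.which("kubectl") is None:
        raise SystemExit("kubectl not found on PATH")
    # Delete the stuck pod; its StatefulSet controller will create a replacement.
    subprocess.run(delete_pod_cmd(NAMESPACE, POD), check=True)
    # List the namespace so the recreated pod's status can be checked by eye.
    subprocess.run(list_pods_cmd(NAMESPACE), check=True)


if __name__ == "__main__":
    main()
```

The script lists every pod in the namespace instead of querying the one pod by name. The replacement may not exist yet when the delete returns, and a lookup by name would then fail with a "not found" error.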

@rebeccazzzz rebeccazzzz moved this from New to Resolved/Scheduled in Community Issue Review Jan 9, 2023