[BUG] Panic during collecting metrics #8098

derekbit · 2024-03-04T02:34:04Z

Describe the bug

https://cloud-native.slack.com/archives/CNVPEL9U3/p1709281294886339

2024-03-01T10:28:17.583228766+01:00 time="2024-03-01T09:28:17Z" level=warning msg="Failed to get engine proxy of pvc-4c4da4f6-6585-4afe-8c2c-23b6624573ed-e-0 for volume pvc-4c4da4f6-6585-4afe-8c2c-23b6624573ed" func="metrics_collector.(*VolumeCollector).Collect" file="volume_collector.go:192" collector=volume error="failed to get binary client for engine pvc-4c4da4f6-6585-4afe-8c2c-23b6624573ed-e-0: cannot get client for engine pvc-4c4da4f6-6585-4afe-8c2c-23b6624573ed-e-0: engine is not running" node=core8
2024-03-01T10:28:17.583267365+01:00 time="2024-03-01T09:28:17Z" level=warning msg="Panic during collecting metrics" func="metrics_collector.(*VolumeCollector).Collect.func1" file="volume_collector.go:164" collector=volume error="runtime error: invalid memory address or nil pointer dereference" node=core8
2024-03-01T10:28:17.590616231+01:00 10.0.0.75 - - [01/Mar/2024:09:28:17 +0000] "GET /metrics HTTP/1.1" 200 33847 "" "Prometheus/2.47.1"
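
The two warnings above are consistent with the collector continuing to use an engine client after the lookup for it had failed. That is an inference from the log ordering, not something the report states. Below is a minimal Go sketch of the usual guard for that pattern. The `engineClient` type and the `getEngineClient` and `volumeMetric` functions are hypothetical stand-ins for illustration, not longhorn-manager code.

```go
package main

import (
	"errors"
	"fmt"
)

// engineClient is a hypothetical stand-in for a client that talks to an engine.
type engineClient struct {
	running bool
}

// errNotRunning mirrors the wording of the error in the log above.
var errNotRunning = errors.New("engine is not running")

// getEngineClient is a hypothetical lookup that returns a nil client
// together with an error when the engine is not running.
func getEngineClient(running bool) (*engineClient, error) {
	if !running {
		return nil, errNotRunning
	}
	return &engineClient{running: true}, nil
}

// volumeMetric skips the volume when the lookup fails, so the nil client
// is never dereferenced. The boolean reports whether a metric was produced.
func volumeMetric(running bool) (string, bool) {
	client, err := getEngineClient(running)
	if err != nil {
		// Return early: client is nil on this path.
		return "", false
	}
	return fmt.Sprintf("running=%t", client.running), true
}

func main() {
	for _, running := range []bool{true, false} {
		metric, ok := volumeMetric(running)
		fmt.Printf("ok=%t metric=%q\n", ok, metric)
	}
}
```

Returning early on the error path keeps one stopped engine from affecting the metrics reported for the other volumes.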

To Reproduce

Expected behavior

Support bundle for troubleshooting

supportbundle_e3a48966-8520-476f-90bc-758701853d85_2024-03-01T09-29-10Z.zip

Environment

Longhorn version: v1.6.0
Impacted volume (PV):
Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
- Number of control plane nodes in the cluster:
- Number of worker nodes in the cluster:
Node config
- OS type and version:
- Kernel version:
- CPU per node:
- Memory per node:
- Disk type (e.g. SSD/NVMe/HDD):
- Network bandwidth between the nodes (Gbps):
Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
Number of Longhorn volumes in the cluster:

Additional context

longhorn-io-github-bot · 2024-03-04T08:06:31Z

Pre Ready-For-Testing Checklist

Where are the reproduce steps/test steps documented?
The reproduce steps/test steps are:

1. Monitor the volume metrics.
2. Trigger [BUG] Volume attach/detach/delete operations stuck in version 1.6.0 #7915.
3. Observe the warning in the longhorn-manager pod.

Does the PR include the explanation for the fix or the feature?
Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
The PR is at

longhorn/longhorn-manager#2665

Which areas/issues this PR might have potential impacts on?
Area: metrics
Issues

derekbit · 2024-03-04T08:11:24Z

The issue is not harmful and won't crash the longhorn-manager pod, because a recover mechanism has already been introduced.
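
A minimal Go sketch of the kind of recover mechanism described here follows. It assumes a deferred `recover()` around the collection body, which matches the "Panic during collecting metrics" warning being logged from a nested function. The `engineProxy` type and `collectVolume` function are hypothetical names for illustration, not the actual longhorn-manager implementation.

```go
package main

import "fmt"

// engineProxy is a hypothetical stand-in for the proxy used to query an engine.
type engineProxy struct {
	actualSize int64
}

// ActualSize reads a field, so calling it on a nil *engineProxy panics
// with a nil pointer dereference.
func (p *engineProxy) ActualSize() int64 {
	return p.actualSize
}

// collectVolume simulates a collector body. The deferred function turns a
// panic into an ordinary error so the caller, and the process, keep running.
func collectVolume(proxy *engineProxy) (size int64, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic during collecting metrics: %v", r)
		}
	}()
	return proxy.ActualSize(), nil
}

func main() {
	// A nil proxy simulates the failed lookup; the panic is recovered.
	if _, err := collectVolume(nil); err != nil {
		fmt.Println("warning:", err)
	}

	// A valid proxy is collected normally.
	size, err := collectVolume(&engineProxy{actualSize: 42})
	fmt.Println(size, err)
}
```

With this pattern the scrape can still complete, which would be consistent with the `GET /metrics` request in the log returning 200 right after the panic warning.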

github-actions bot mentioned this issue Mar 4, 2024

[BACKPORT][v1.6.1][BUG] Panic during collecting metrics #8099

Closed

derekbit mentioned this issue Mar 4, 2024

metrics: refactor volume collector longhorn/longhorn-manager#2665

Merged

derekbit self-assigned this Mar 4, 2024

derekbit added the area/resilience System or volume resilience label Mar 4, 2024

derekbit added this to the v1.7.0 milestone Mar 4, 2024

derekbit added the backport/1.5.5 label Mar 4, 2024

github-actions bot mentioned this issue Mar 4, 2024

[BACKPORT][v1.5.5][BUG] Panic during collecting metrics #8102

Closed

innobead closed this as completed Mar 14, 2024
