generated from kubernetes/kubernetes-template-project
Open
Description
The CSI recover loop logs its recovery behavior, but the log entries use inconsistent severity levels and lack enough context to follow recovery state transitions.
Specifically:
- Recoverable failures (e.g. mount / unmount failures) are logged as errors
- Some logs lack key context such as mount path, source path, or mount count
- Operators cannot easily tell:
  - When recovery starts
  - When recovery is skipped
  - When recovery succeeds
  - Why retries continue to happen
This makes diagnosing CSI recovery behavior difficult in production clusters.
Why this matters
- CSI recovery runs continuously as a DaemonSet
- Logs are the primary debugging signal for operators
- Incorrect log levels add noise and obscure real failures
- Clear observability improves maintainability without changing behavior
Proposed Solution
Improve observability of the CSI recover loop by:
- Standardizing log levels:
  - Info for normal flow and state transitions
  - Warning for recoverable mount / unmount failures
  - Error for unexpected or API-level failures
- Adding structured context to logs (mountPath, source, count, thresholds)
- Logging recovery start, skip, cleanup, and success paths consistently
- Keeping the change log-only (no behavior changes)