Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement volume health monitoring feature to detect abnormal volumes #819

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Praveenrajmani
Copy link
Collaborator

@Praveenrajmani Praveenrajmani commented Jul 24, 2023

This is a CSI feature which monitors the volume health, identifies and reports the abnormal volumes as events to the respective pods and PVCs.

Fixes #455

Steps to test :

NOTE: This feature requites CSIVolumeHealth feature gate to be enabled. For example, in minikube you can enable it like the following

minikube start --feature-gates=CSIVolumeHealth=true --driver=none

However, the the external controller monitoring doesn't requite this feature gate. ie, you will still see "abnormal" events in your PVCs.

STEP 1: Enable CSIVolumeHealth feature gate
STEP 2: Install DirectPV with --enable-volume-health-monitoring flag.

kubectl directpv install --enable-volume-health-montoring

STEP 3: Discover and initialize drives
STEP 4: Deploy any workload
STEP 5: After the pods are up, umount a volume's container path or staging path.
STEP 6: Wait for few mins and describe the pods and PVCs to see the "abnormal" volume events.

@Praveenrajmani Praveenrajmani marked this pull request as draft July 24, 2023 16:18
@Praveenrajmani
Copy link
Collaborator Author

Praveenrajmani commented Jul 24, 2023

For this PR, we need registry.k8s.io/sig-storage/csi-external-health-monitor-controller:v0.10.0 to be pushed to quay.io/minio/

cc @harshavardhana

pkg/consts/consts.go Outdated Show resolved Hide resolved
@Praveenrajmani Praveenrajmani force-pushed the volume-health branch 3 times, most recently from d0697c7 to ce587bc Compare September 22, 2023 05:49
@Praveenrajmani Praveenrajmani marked this pull request as ready for review September 22, 2023 05:52
@@ -47,30 +47,34 @@ const (
// csiResizerImage = csi-resizer:v1.8.0
csiResizerImage = "csi-resizer@sha256:819f68a4daf75acec336302843f303cf360d4941249f9f5019ffbb690c8ac7c0"
openshiftCSIResizerImage = "registry.redhat.io/openshift4/ose-csi-external-resizer-rhel8@sha256:837b32a0c432123e2605ad6d029e7f3b4489d9c52a9d272c7a133d41ad10db87"

// csiHealthMonitorImage = csi-external-health-monitor-controller:v0.10.0
csiHealthMonitorImage = "registry.k8s.io/sig-storage/csi-external-health-monitor-controller:v0.10.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can we keep this image in quay.io/minio?
  2. Use sha256 hash of the image
  3. Use the image from registry.redhat.io for openshift

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will use the quay.io/minio/ image after pushing to the registry

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have any equivalent image for this available in redhat openshift

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have any equivalent image for this available in redhat openshift

Then we are stuck to certify DirectPV for Openshift certification. We would need to build csi-external-health-monitor-controller our own to certify it for openshift

Copy link
Collaborator Author

@Praveenrajmani Praveenrajmani Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have sent a PR https://github.com/minio/directpv/pull/859/files to use a script to download and release the image which can comply with RedHat image certification testcases.

pkg/k8s/k8s.go Outdated Show resolved Hide resolved
pkg/volume/volume-health.go Outdated Show resolved Hide resolved
pkg/volume/volume-health.go Show resolved Hide resolved
pkg/volume/volume-health.go Outdated Show resolved Hide resolved
pkg/k8s/k8s.go Outdated Show resolved Hide resolved
}

// ResetConditionIfMatches resets the condition values to default if the type, reason and message matches.
func ResetConditionIfMatches(conditions []metav1.Condition, ctype string, reason, message, newReason string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not needed. Use UpdateCondition()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is required because we need to reset the condition to default only if all the condition value matches. Unlike UpdateCondition() where only condition type is checked

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't make any difference to use UpdateCondition(). Only difference is it appends the type if not found.

pkg/csi/utils/utils.go Outdated Show resolved Hide resolved
For private registries, please note that the following image is required for enabling volume health monitoring

```
quay.io/minio/csi-external-health-monitor-controller:v0.10.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

@Praveenrajmani Praveenrajmani Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Volume health monitoring is not enabled by default, so, for a fresh installation we don't need this image. Users need this image only when they enable volume health monitoring

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users make common mistakes to miss pulling this image when enabling it. There is no problem to have unused image of csi-external-health-monitor-controller. When health monitoring is enabled, it works seamlessly than failures.

@@ -79,3 +79,21 @@ scrape_configs:
action: replace
target_label: kubernetes_name
```

# Volume health monitoring
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could have the doc like

# Monitoring volume health
DirectPV monitors volume health and reports abnormalities as event to PVCs and Pods. Currently volume mounts are monitored and more checks will be added later. This feature uses [Volume health monitoring CSI feature](https://kubernetes.io/docs/concepts/storage/volume-health-monitoring/).

To enable volume health monitoring, pass `--enable-volume-health-monitoring` flag to installation and `CSIVolumeHealth` feature gate must be enabled.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have also added the following because this image is required only when volume health monitor is enabled.

For private registries, please note that the following image is required for enabling volume health monitoring

quay.io/minio/csi-external-health-monitor-controller:v0.10.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement volume health monitor feature
2 participants