2 changes: 2 additions & 0 deletions _topic_maps/_topic_map.yml
@@ -1529,6 +1529,8 @@ Topics:
File: persistent-storage-hostpath
- Name: Persistent storage using LVM Storage
File: persistent-storage-using-lvms
- Name: Troubleshooting local persistent storage using LVMS
File: troubleshooting-local-persistent-storage-using-lvms
- Name: Using Container Storage Interface (CSI)
Dir: container_storage_interface
Distros: openshift-enterprise,openshift-origin
49 changes: 49 additions & 0 deletions modules/lvms-troubleshooting-investigating-a-pvc-stuck-in-the-pending-state.adoc
@@ -0,0 +1,49 @@
// This module is included in the following assemblies:
//
// storage/persistent_storage/persistent_storage_local/troubleshooting-local-persistent-storage-using-lvms.adoc

:_content-type: PROCEDURE
[id="investigating-a-pvc-stuck-in-the-pending-state_{context}"]
= Investigating a PVC stuck in the Pending state

A persistent volume claim (PVC) can get stuck in a `Pending` state for a number of reasons. For example:

- Insufficient computing resources
- Network problems
- Mismatched storage class or node selector
- No available volumes
- The node with the persistent volume (PV) is in a `Not Ready` state

Identify the cause by using the `oc describe` command to review details about the stuck PVC.

.Procedure

. Retrieve the list of PVCs by running the following command:
+
[source,terminal]
----
$ oc get pvc
----
+
.Example output
[source,terminal]
----
NAME        STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
lvms-test   Pending                                      lvms-vg1       11s
----
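+
If the cluster has many PVCs, you can narrow the output to the claims that are stuck. This is an optional shell filter, not part of the documented procedure:
+
[source,terminal]
----
$ oc get pvc --all-namespaces | grep Pending
----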

. Inspect the events associated with a PVC stuck in the `Pending` state by running the following command:
+
[source,terminal]
----
$ oc describe pvc <pvc_name> <1>
----
<1> Replace `<pvc_name>` with the name of the PVC. For example, `lvms-test`.
+
.Example output
[source,terminal]
----
Type     Reason              Age               From                         Message
----     ------              ----              ----                         -------
Warning  ProvisioningFailed  4s (x2 over 17s)  persistentvolume-controller  storageclass.storage.k8s.io "lvms-vg1" not found
----
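
If the events show that the requested storage class is missing, as in the preceding example, you can list the storage classes that exist in the cluster. This is an optional quick check, not part of the documented procedure:

[source,terminal]
----
$ oc get storageclass
----

If the LVMS storage class is not listed, see "Recovering from missing LVMS or Operator components".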
97 changes: 97 additions & 0 deletions modules/lvms-troubleshooting-performing-a-forced-cleanup.adoc
@@ -0,0 +1,97 @@
// This module is included in the following assemblies:
//
// storage/persistent_storage/persistent_storage_local/troubleshooting-local-persistent-storage-using-lvms.adoc

:_content-type: PROCEDURE
[id="performing-a-forced-cleanup_{context}"]
= Performing a forced cleanup

If disk- or node-related problems persist after you complete the troubleshooting procedures, it might be necessary to perform a forced cleanup. A forced cleanup comprehensively addresses persistent issues and ensures that LVMS functions properly.

.Prerequisites

* All of the persistent volume claims (PVCs) created using the logical volume manager storage (LVMS) driver have been removed.

* The pods using those PVCs have been stopped.

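As an optional quick check before you proceed (not part of the documented prerequisites), you can list the PVCs that remain in the cluster and confirm that none of them use an LVMS storage class:

[source,terminal]
----
$ oc get pvc --all-namespaces
----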

.Procedure

. Switch to the `openshift-storage` namespace by running the following command:
+
[source,terminal]
----
$ oc project openshift-storage
----
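+
If you prefer not to change your current project, you can instead pass the `-n openshift-storage` flag to the commands in this procedure that operate on namespaced resources. For example, an optional alternative check of the `LVMVolumeGroup` CRs without switching projects:
+
[source,terminal]
----
$ oc get lvmvolumegroup -n openshift-storage
----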

. Ensure that there are no `LogicalVolume` custom resources (CRs) remaining by running the following command:
+
[source,terminal]
----
$ oc get logicalvolume
----
+
.Example output
[source,terminal]
----
No resources found
----

.. If there are any `LogicalVolume` CRs remaining, remove their finalizers by running the following command:
+
[source,terminal]
----
$ oc patch logicalvolume <name> -p '{"metadata":{"finalizers":[]}}' --type=merge <1>
----
<1> Replace `<name>` with the name of the CR.

.. After removing their finalizers, delete the CRs by running the following command:
+
[source,terminal]
----
$ oc delete logicalvolume <name> <1>
----
<1> Replace `<name>` with the name of the CR.

. Ensure that there are no `LVMVolumeGroup` CRs remaining by running the following command:
+
[source,terminal]
----
$ oc get lvmvolumegroup
----
+
.Example output
[source,terminal]
----
No resources found
----

.. If there are any `LVMVolumeGroup` CRs left, remove their finalizers by running the following command:
+
[source,terminal]
----
$ oc patch lvmvolumegroup <name> -p '{"metadata":{"finalizers":[]}}' --type=merge <1>
----
<1> Replace `<name>` with the name of the CR.

.. After removing their finalizers, delete the CRs by running the following command:
+
[source,terminal]
----
$ oc delete lvmvolumegroup <name> <1>
----
<1> Replace `<name>` with the name of the CR.

. Remove any `LVMVolumeGroupNodeStatus` CRs by running the following command:
+
[source,terminal]
----
$ oc delete lvmvolumegroupnodestatus --all
----

. Remove the `LVMCluster` CR by running the following command:
+
[source,terminal]
----
$ oc delete lvmcluster --all
----
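
After the `LVMCluster` CR is deleted, you can optionally confirm that no LVMS custom resources remain before you reinstall or reconfigure LVMS. This is a minimal verification sketch, assuming you are still in the `openshift-storage` project:

[source,terminal]
----
$ oc get logicalvolume,lvmvolumegroup,lvmvolumegroupnodestatus,lvmcluster
----

The expected result is that no resources are found.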
33 changes: 33 additions & 0 deletions modules/lvms-troubleshooting-recovering-from-disk-failure.adoc
@@ -0,0 +1,33 @@
// This module is included in the following assemblies:
//
// storage/persistent_storage/persistent_storage_local/troubleshooting-local-persistent-storage-using-lvms.adoc

:_content-type: PROCEDURE
[id="recovering-from-disk-failure_{context}"]
= Recovering from disk failure

If you see a failure message while inspecting the events associated with the persistent volume claim (PVC), there might be a problem with the underlying volume or disk. Disk and volume provisioning issues often result in a generic error first, such as `Failed to provision volume with StorageClass <storage_class_name>`. A second, more specific error message usually follows.

.Procedure

. Inspect the events associated with a PVC by running the following command:
+
[source,terminal]
----
$ oc describe pvc <pvc_name> <1>
----
<1> Replace `<pvc_name>` with the name of the PVC. Here are some examples of disk or volume failure error messages and their causes:
+
- *Failed to check volume existence:* Indicates a problem in verifying whether the volume already exists. Volume verification failure can be caused by network connectivity problems or other failures.
+
- *Failed to bind volume:* Failure to bind a volume can happen if the persistent volume (PV) that is available does not match the requirements of the PVC.
+
- *FailedMount or FailedUnMount:* This error indicates problems when trying to mount the volume to a node or unmount a volume from a node. If the disk has failed, this error might appear when a pod tries to use the PVC.
+
- *Volume is already exclusively attached to one node and can't be attached to another:* This error can appear with storage solutions that do not support `ReadWriteMany` access modes.

. Establish a direct connection to the host where the problem is occurring.

. Resolve the disk issue.
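
One way to establish the direct connection to the host and inspect its disks is to use a debug pod. This is a minimal sketch, not part of the documented procedure; `<node_name>` is a placeholder for the node that hosts the failed disk:

[source,terminal]
----
$ oc debug node/<node_name>
sh-4.4# chroot /host
sh-4.4# lsblk
----

The `lsblk` output lists the block devices on the node so that you can confirm whether the failed disk is visible.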

After you have resolved the issue with the disk, you might need to perform the forced cleanup procedure if failure messages persist or recur.
77 changes: 77 additions & 0 deletions modules/lvms-troubleshooting-recovering-from-missing-lvms-or-operator-components.adoc
@@ -0,0 +1,77 @@
// This module is included in the following assemblies:
//
// storage/persistent_storage/persistent_storage_local/troubleshooting-local-persistent-storage-using-lvms.adoc

:_content-type: PROCEDURE
[id="recovering-from-missing-lvms-or-operator-components_{context}"]
= Recovering from missing LVMS or Operator components

If you encounter a storage class "not found" error, check the `LVMCluster` resource and ensure that all the logical volume manager storage (LVMS) pods are running. You can create an `LVMCluster` resource if it does not exist.

.Procedure

. Verify the presence of the `LVMCluster` resource by running the following command:
+
[source,terminal]
----
$ oc get lvmcluster -n openshift-storage
----
+
.Example output
[source,terminal]
----
NAME AGE
my-lvmcluster 65m
----

. If the cluster does not have an `LVMCluster` resource, create one by running the following command:
+
[source,terminal]
----
$ oc create -n openshift-storage -f <custom_resource> <1>
----
<1> Replace `<custom_resource>` with a custom resource URL or file tailored to your requirements.
+
.Example custom resource
[source,yaml,options="nowrap",role="white-space-pre"]
----
apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: my-lvmcluster
spec:
  storage:
    deviceClasses:
      - name: vg1
        default: true
        thinPoolConfig:
          name: thin-pool-1
          sizePercent: 90
          overprovisionRatio: 10
----
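+
After you create the resource, you can optionally review its status to confirm that LVMS accepted the configuration. The exact status fields depend on your LVMS version, so this is a general check rather than a documented step:
+
[source,terminal]
----
$ oc get lvmcluster -n openshift-storage -o yaml
----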

. Check that all the pods from LVMS are in the `Running` state in the `openshift-storage` namespace by running the following command:
+
[source,terminal]
----
$ oc get pods -n openshift-storage
----
+
.Example output
[source,terminal]
----
NAME                                  READY   STATUS    RESTARTS   AGE
lvms-operator-7b9fb858cb-6nsml        3/3     Running   0          70m
topolvm-controller-5dd9cf78b5-7wwr2   5/5     Running   0          66m
topolvm-node-dr26h                    4/4     Running   0          66m
vg-manager-r6zdv                      1/1     Running   0          66m
----
+
The expected output is one running instance of `lvms-operator` and one running instance of `topolvm-controller`. One instance of `topolvm-node` and one instance of `vg-manager` are expected for each node.
+
If `topolvm-node` is stuck in the `Init` state, LVMS has failed to locate an available disk to use. To retrieve the information necessary to troubleshoot this issue, review the logs of the `vg-manager` pod by running the following command:
+
[source,terminal]
----
$ oc logs -l app.kubernetes.io/component=vg-manager -n openshift-storage
----
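
After all the LVMS pods are running, you can optionally confirm that the LVMS storage class exists before you recreate the PVC. This is a quick check, not part of the documented procedure; the storage class name depends on your device class configuration:

[source,terminal]
----
$ oc get storageclass
----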
34 changes: 34 additions & 0 deletions modules/lvms-troubleshooting-recovering-from-node-failure.adoc
@@ -0,0 +1,34 @@
// This module is included in the following assemblies:
//
// storage/persistent_storage/persistent_storage_local/troubleshooting-local-persistent-storage-using-lvms.adoc

:_content-type: PROCEDURE
[id="recovering-from-node-failure_{context}"]
= Recovering from node failure

Sometimes a persistent volume claim (PVC) is stuck in a `Pending` state because a particular node in the cluster has failed. To identify the failed node, you can examine the restart count of the `topolvm-node` pod. An increased restart count indicates potential problems with the underlying node, which might require further investigation and troubleshooting.

.Procedure

* Examine the restart count of the `topolvm-node` pod instances by running the following command:
+
[source,terminal]
----
$ oc get pods -n openshift-storage
----
+
.Example output
[source,terminal]
----
NAME                                  READY   STATUS    RESTARTS      AGE
lvms-operator-7b9fb858cb-6nsml        3/3     Running   0             70m
topolvm-controller-5dd9cf78b5-7wwr2   5/5     Running   0             66m
topolvm-node-dr26h                    4/4     Running   0             66m
topolvm-node-54as8                    4/4     Running   0             66m
topolvm-node-78fft                    4/4     Running   17 (8s ago)   66m
vg-manager-r6zdv                      1/1     Running   0             66m
vg-manager-990ut                      1/1     Running   0             66m
vg-manager-an118                      1/1     Running   0             66m
----
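+
To identify which node runs a restarting pod, you can add the `-o wide` flag, which includes the node name in the output. This is an optional variation of the preceding command:
+
[source,terminal]
----
$ oc get pods -n openshift-storage -o wide
----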
+
After you resolve any issues with the node, you might need to perform the forced cleanup procedure if the PVC is still stuck in a `Pending` state.
31 changes: 31 additions & 0 deletions storage/persistent_storage/persistent_storage_local/troubleshooting-local-persistent-storage-using-lvms.adoc
@@ -0,0 +1,31 @@
:_content-type: ASSEMBLY
[id="troubleshooting-local-persistent-storage"]
= Troubleshooting local persistent storage using LVMS
include::_attributes/common-attributes.adoc[]
:context: troubleshooting-local-persistent-storage-using-lvms

toc::[]

Because {product-title} does not scope a persistent volume (PV) to a single project, it can be shared across the cluster and claimed by any project using a persistent volume claim (PVC). This can lead to a number of issues that require troubleshooting.

include::modules/lvms-troubleshooting-investigating-a-pvc-stuck-in-the-pending-state.adoc[leveloffset=+1]

include::modules/lvms-troubleshooting-recovering-from-missing-lvms-or-operator-components.adoc[leveloffset=+1]

include::modules/lvms-troubleshooting-recovering-from-node-failure.adoc[leveloffset=+1]

[role="_additional-resources"]
[id="additional-resources-forced-cleanup-1"]
.Additional resources

* xref:troubleshooting-local-persistent-storage-using-lvms.adoc#performing-a-forced-cleanup_troubleshooting-local-persistent-storage-using-lvms[Performing a forced cleanup]

include::modules/lvms-troubleshooting-recovering-from-disk-failure.adoc[leveloffset=+1]

[role="_additional-resources"]
[id="additional-resources-forced-cleanup-2"]
.Additional resources

* xref:troubleshooting-local-persistent-storage-using-lvms.adoc#performing-a-forced-cleanup_troubleshooting-local-persistent-storage-using-lvms[Performing a forced cleanup]

include::modules/lvms-troubleshooting-performing-a-forced-cleanup.adoc[leveloffset=+1]