Pods in terminating loop on creation #20

Closed · Rid opened this issue Aug 31, 2022 · 3 comments

Rid commented Aug 31, 2022

We created a bunch of new PVCs at the same time (while setting the debug option for piraeusdatastore/linstor-csi#172), and now the ha-controller is continuously evicting the pods due to lost quorum.

On the LINSTOR side, it seems that no node has been made primary for the volume:

┊ pvc-b5a0c859-161b-498d-ac4f-00de2311a912 ┊ dedi1-node1.23-106-60-155.lon-01.uk ┊ 7008 ┊ Unused ┊       ┊    Unknown ┊ 2022-08-31 14:39:52 ┊
┊ pvc-b5a0c859-161b-498d-ac4f-00de2311a912 ┊ vm6-cplane1.23-106-61-231.lon-01.uk ┊ 7008 ┊ Unused ┊       ┊    Unknown ┊                     ┊
┊ pvc-b5a0c859-161b-498d-ac4f-00de2311a912 ┊ vm9-node2.23-106-61-193.lon-01.uk   ┊ 7008 ┊ Unused ┊       ┊    Unknown ┊ 2022-08-31 14:41:02 ┊

This is causing the ha-controller to continuously delete the pods when they are re-created:

Events:
  Type     Reason               Age   From                                           Message
  ----     ------               ----  ----                                           -------
  Warning  VolumeWithoutQuorum  70s   linstor.linbit.com/HighAvailabilityController  Pod was evicted because attached volume lost quorum
  Warning  VolumeWithoutQuorum  60s   linstor.linbit.com/HighAvailabilityController  Pod was evicted because attached volume lost quorum
  Warning  FailedScheduling     79s   default-scheduler                              0/4 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 2 node(s) had untolerated taint {drbd.linbit.com/lost-quorum: }, 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
  Normal   Scheduled            74s   default-scheduler                              Successfully assigned team-100/supabase-data-nfs-server-provisioner-0 to dedi1-node1.23-106-60-155.lon-01.uk
  Warning  FailedAttachVolume   70s   attachdetach-controller                        AttachVolume.Attach failed for volume "pvc-b5a0c859-161b-498d-ac4f-00de2311a912" : volume attachment is being deleted

Could it be that the same timeout affecting us in piraeusdatastore/linstor-csi#172 is preventing any of the nodes from becoming primary?
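For reference, this is how we checked which node (if any) DRBD considers primary. A minimal sketch, assuming the `linstor` client and `drbdadm` are available; the resource name is the PVC ID from this issue, substitute your own:

```shell
#!/bin/sh
# Resource name taken from this issue; substitute your own PVC ID.
RES=pvc-b5a0c859-161b-498d-ac4f-00de2311a912

# Ask LINSTOR which nodes hold the resource and in what state
# (the "Unused"/"Unknown" columns above come from this view):
linstor resource list -r "$RES"

# On a node itself, query the kernel-level DRBD state directly;
# a healthy attached volume shows exactly one node in the Primary role:
drbdadm status "$RES"
```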


Rid commented Aug 31, 2022

Here are the relevant logs from the controller with debugging enabled:

time="2022-08-31T14:39:15Z" level=debug msg="method called" func=github.com/piraeusdatastore/linstor-csi/pkg/driver.Driver.Run.func1 file="/src/pkg/driver/driver.go:1214" linstorCSIComponent=driver method=/csi.v1.Controller/ListVolumes nodeID=vm6-cplane1.23-106-61-231.lon-01.uk provisioner=linstor.csi.linbit.com req= resp="entries:<volume:<capacity_bytes:11811160064 volume_id:\"pvc-064664ee-c952-4542-afe1-a1ca3a858a8c\" > status:<published_node_ids:\"dedi1-node1.23-106-60-155.lon-01.uk\" published_node_ids:\"vm6-cplane1.23-106-61-231.lon-01.uk\" published_node_ids:\"vm9-node2.23-106-61-193.lon-01.uk\" volume_condition:<message:\"Volume healthy\" > > > entries:<volume:<capacity_bytes:10737418240 volume_id:\"pvc-35476d46-50db-4182-83a5-d5f33023b82e\" > status:<published_node_ids:\"dedi1-node1.23-106-60-155.lon-01.uk\" volume_condition:<message:\"Volume healthy\" > > > entries:<volume:<capacity_bytes:209715200 volume_id:\"pvc-6a920591-ce6b-4cfe-8f98-244b64fb06ba\" > status:<published_node_ids:\"dedi1-node1.23-106-60-155.lon-01.uk\" volume_condition:<abnormal:true message:\"Resource with issues on node(s): dedi1-node1.23-106-60-155.lon-01.uk\" > > > entries:<volume:<capacity_bytes:209715200 volume_id:\"pvc-888f82a5-374e-4c13-a1fb-284af7c838a5\" > status:<published_node_ids:\"dedi1-node1.23-106-60-155.lon-01.uk\" volume_condition:<abnormal:true message:\"Resource with issues on node(s): dedi1-node1.23-106-60-155.lon-01.uk\" > > > entries:<volume:<capacity_bytes:2147483648 volume_id:\"pvc-9a7a5829-a04f-4836-a0e4-736048d475d6\" > status:<published_node_ids:\"dedi1-node1.23-106-60-155.lon-01.uk\" volume_condition:<abnormal:true message:\"Resource with issues on node(s): dedi1-node1.23-106-60-155.lon-01.uk\" > > > entries:<volume:<capacity_bytes:2147483648 volume_id:\"pvc-9d5d6695-0903-4708-acd4-09cdb328fc7b\" > status:<published_node_ids:\"dedi1-node1.23-106-60-155.lon-01.uk\" published_node_ids:\"vm6-cplane1.23-106-61-231.lon-01.uk\" 
published_node_ids:\"vm9-node2.23-106-61-193.lon-01.uk\" volume_condition:<abnormal:true message:\"Resource with issues on node(s): vm6-cplane1.23-106-61-231.lon-01.uk,vm9-node2.23-106-61-193.lon-01.uk\" > > > entries:<volume:<capacity_bytes:22548578304 volume_id:\"pvc-b5a0c859-161b-498d-ac4f-00de2311a912\" > status:<published_node_ids:\"dedi1-node1.23-106-60-155.lon-01.uk\" volume_condition:<abnormal:true message:\"Resource with issues on node(s): dedi1-node1.23-106-60-155.lon-01.uk\" > > > entries:<volume:<capacity_bytes:2147483648 volume_id:\"pvc-e2f209a6-0fad-45ea-bd64-b22e3c972939\" > status:<published_node_ids:\"dedi1-node1.23-106-60-155.lon-01.uk\" published_node_ids:\"vm6-cplane1.23-106-61-231.lon-01.uk\" published_node_ids:\"vm9-node2.23-106-61-193.lon-01.uk\" volume_condition:<abnormal:true message:\"Resource with issues on node(s): vm6-cplane1.23-106-61-231.lon-01.uk,vm9-node2.23-106-61-193.lon-01.uk\" > > > entries:<volume:<capacity_bytes:11811160064 volume_id:\"pvc-f06656cd-7e8a-47f4-a93b-af2154e86fbf\" > status:<published_node_ids:\"dedi1-node1.23-106-60-155.lon-01.uk\" published_node_ids:\"vm6-cplane1.23-106-61-231.lon-01.uk\" published_node_ids:\"vm9-node2.23-106-61-193.lon-01.uk\" volume_condition:<abnormal:true message:\"Resource with issues on node(s): vm6-cplane1.23-106-61-231.lon-01.uk,vm9-node2.23-106-61-193.lon-01.uk\" > > > " version=v0.20.0-d514e41db7cdcb580769cc69f1c1ef2b8a5def5d
time="2022-08-31T14:39:59Z" level=debug msg="{\"resource\":{\"name\":\"pvc-b5a0c859-161b-498d-ac4f-00de2311a912\",\"node_name\":\"vm9-node2.23-106-61-193.lon-01.uk\",\"layer_object\":{\"drbd\":{\"drbd_resource_definition\":{}},\"luks\":{},\"storage\":{},\"nvme\":{},\"openflex\":{\"openflex_resource_definition\":{}},\"writecache\":{},\"cache\":{},\"bcache\":{}}},\"layer_list\":[\"DRBD\",\"STORAGE\"]}\n"
time="2022-08-31T14:39:59Z" level=debug msg="curl -X 'POST' -d '{\"resource\":{\"name\":\"pvc-b5a0c859-161b-498d-ac4f-00de2311a912\",\"node_name\":\"vm9-node2.23-106-61-193.lon-01.uk\",\"layer_object\":{\"drbd\":{\"drbd_resource_definition\":{}},\"luks\":{},\"storage\":{},\"nvme\":{},\"openflex\":{\"openflex_resource_definition\":{}},\"writecache\":{},\"cache\":{},\"bcache\":{}}},\"layer_list\":[\"DRBD\",\"STORAGE\"]}\n' -H 'Accept: application/json' -H 'Content-Type: application/json' 'https://piraeus-op-cs.default.svc:3371/v1/resource-definitions/pvc-b5a0c859-161b-498d-ac4f-00de2311a912/resources/vm9-node2.23-106-61-193.lon-01.uk'"
time="2022-08-31T14:40:00Z" level=debug msg="reconcile volume placement failed: context canceled" func="github.com/piraeusdatastore/linstor-csi/pkg/client.(*Linstor).Create" file="/src/pkg/client/linstor.go:314" linstorCSIComponent=client volume="&{ID:pvc-b5a0c859-161b-498d-ac4f-00de2311a912 SizeBytes:22548578304 ResourceGroup:team-100-persistent-replicated FsType:xfs Properties:map[Aux/csi-provisioning-completed-by:linstor-csi/v0.20.0-d514e41db7cdcb580769cc69f1c1ef2b8a5def5d] UseQuorum:false}"
time="2022-08-31T14:40:00Z" level=info msg="deleting volume" func="github.com/piraeusdatastore/linstor-csi/pkg/client.(*Linstor).Delete" file="/src/pkg/client/linstor.go:337" linstorCSIComponent=client volume=pvc-b5a0c859-161b-498d-ac4f-00de2311a912
time="2022-08-31T14:40:00Z" level=error msg="failed to clean up volume" func=github.com/piraeusdatastore/linstor-csi/pkg/driver.Driver.failpathDelete file="/src/pkg/driver/driver.go:1384" error="context canceled" linstorCSIComponent=driver nodeID=vm6-cplane1.23-106-61-231.lon-01.uk provisioner=linstor.csi.linbit.com version=v0.20.0-d514e41db7cdcb580769cc69f1c1ef2b8a5def5d volume=pvc-b5a0c859-161b-498d-ac4f-00de2311a912
time="2022-08-31T14:40:00Z" level=error msg="method failed" func=github.com/piraeusdatastore/linstor-csi/pkg/driver.Driver.Run.func1 file="/src/pkg/driver/driver.go:1220" error="rpc error: code = Internal desc = CreateVolume failed for pvc-b5a0c859-161b-498d-ac4f-00de2311a912: context canceled" linstorCSIComponent=driver method=/csi.v1.Controller/CreateVolume nodeID=vm6-cplane1.23-106-61-231.lon-01.uk provisioner=linstor.csi.linbit.com req="name:\"pvc-b5a0c859-161b-498d-ac4f-00de2311a912\" capacity_range:<required_bytes:22548578304 > volume_capabilities:<mount:<fs_type:\"xfs\" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > > parameters:<key:\"csi.storage.k8s.io/pv/name\" value:\"pvc-b5a0c859-161b-498d-ac4f-00de2311a912\" > parameters:<key:\"csi.storage.k8s.io/pvc/name\" value:\"data-supabase-data-nfs-server-provisioner-0\" > parameters:<key:\"csi.storage.k8s.io/pvc/namespace\" value:\"team-100\" > parameters:<key:\"nodeList\" value:\"dedi1-node1.23-106-60-155.lon-01.uk vm9-node2.23-106-61-193.lon-01.uk\" > parameters:<key:\"placementPolicy\" value:\"Manual\" > parameters:<key:\"property.linstor.csi.linbit.com/DrbdOptions/Net/rr-conflict\" value:\"retry-connect\" > parameters:<key:\"property.linstor.csi.linbit.com/DrbdOptions/Resource/on-no-data-accessible\" value:\"suspend-io\" > parameters:<key:\"property.linstor.csi.linbit.com/DrbdOptions/Resource/on-suspended-primary-outdated\" value:\"force-secondary\" > parameters:<key:\"property.linstor.csi.linbit.com/DrbdOptions/auto-quorum\" value:\"suspend-io\" > parameters:<key:\"resourceGroup\" value:\"team-100-persistent-replicated\" > parameters:<key:\"storagePool\" value:\"team-100\" > accessibility_requirements:<requisite:<segments:<key:\"beta.kubernetes.io/arch\" value:\"amd64\" > segments:<key:\"beta.kubernetes.io/os\" value:\"linux\" > segments:<key:\"kubernetes.io/arch\" value:\"amd64\" > segments:<key:\"kubernetes.io/hostname\" value:\"dedi1-node1.23-106-60-155.lon-01.uk\" > segments:<key:\"kubernetes.io/os\" 
value:\"linux\" > segments:<key:\"linbit.com/hostname\" value:\"dedi1-node1.23-106-60-155.lon-01.uk\" > segments:<key:\"linbit.com/sp-DfltDisklessStorPool\" value:\"true\" > segments:<key:\"linbit.com/sp-team-100\" value:\"true\" > segments:<key:\"openebs.io/nodeid\" value:\"dedi1-node1.23-106-60-155.lon-01.uk\" > segments:<key:\"openebs.io/nodename\" value:\"dedi1-node1.23-106-60-155.lon-01.uk\" > segments:<key:\"registered-by\" value:\"piraeus-operator\" > > requisite:<segments:<key:\"beta.kubernetes.io/arch\" value:\"amd64\" > segments:<key:\"beta.kubernetes.io/os\" value:\"linux\" > segments:<key:\"kubernetes.io/arch\" value:\"amd64\" > segments:<key:\"kubernetes.io/hostname\" value:\"vm9-node2.23-106-61-193.lon-01.uk\" > segments:<key:\"kubernetes.io/os\" value:\"linux\" > segments:<key:\"linbit.com/hostname\" value:\"vm9-node2.23-106-61-193.lon-01.uk\" > segments:<key:\"linbit.com/sp-DfltDisklessStorPool\" value:\"true\" > segments:<key:\"linbit.com/sp-team-100\" value:\"true\" > segments:<key:\"openebs.io/nodeid\" value:\"vm9-node2.23-106-61-193.lon-01.uk\" > segments:<key:\"openebs.io/nodename\" value:\"vm9-node2.23-106-61-193.lon-01.uk\" > segments:<key:\"registered-by\" value:\"piraeus-operator\" > > preferred:<segments:<key:\"beta.kubernetes.io/arch\" value:\"amd64\" > segments:<key:\"beta.kubernetes.io/os\" value:\"linux\" > segments:<key:\"kubernetes.io/arch\" value:\"amd64\" > segments:<key:\"kubernetes.io/hostname\" value:\"vm9-node2.23-106-61-193.lon-01.uk\" > segments:<key:\"kubernetes.io/os\" value:\"linux\" > segments:<key:\"linbit.com/hostname\" value:\"vm9-node2.23-106-61-193.lon-01.uk\" > segments:<key:\"linbit.com/sp-DfltDisklessStorPool\" value:\"true\" > segments:<key:\"linbit.com/sp-team-100\" value:\"true\" > segments:<key:\"openebs.io/nodeid\" value:\"vm9-node2.23-106-61-193.lon-01.uk\" > segments:<key:\"openebs.io/nodename\" value:\"vm9-node2.23-106-61-193.lon-01.uk\" > segments:<key:\"registered-by\" value:\"piraeus-operator\" > > 
preferred:<segments:<key:\"beta.kubernetes.io/arch\" value:\"amd64\" > segments:<key:\"beta.kubernetes.io/os\" value:\"linux\" > segments:<key:\"kubernetes.io/arch\" value:\"amd64\" > segments:<key:\"kubernetes.io/hostname\" value:\"dedi1-node1.23-106-60-155.lon-01.uk\" > segments:<key:\"kubernetes.io/os\" value:\"linux\" > segments:<key:\"linbit.com/hostname\" value:\"dedi1-node1.23-106-60-155.lon-01.uk\" > segments:<key:\"linbit.com/sp-DfltDisklessStorPool\" value:\"true\" > segments:<key:\"linbit.com/sp-team-100\" value:\"true\" > segments:<key:\"openebs.io/nodeid\" value:\"dedi1-node1.23-106-60-155.lon-01.uk\" > segments:<key:\"openebs.io/nodename\" value:\"dedi1-node1.23-106-60-155.lon-01.uk\" > segments:<key:\"registered-by\" value:\"piraeus-operator\" > > > " resp="<nil>" version=v0.20.0-d514e41db7cdcb580769cc69f1c1ef2b8a5def5d
time="2022-08-31T14:40:01Z" level=info msg="determined volume id for volume named 'pvc-b5a0c859-161b-498d-ac4f-00de2311a912'" func=github.com/piraeusdatastore/linstor-csi/pkg/driver.Driver.CreateVolume file="/src/pkg/driver/driver.go:512" linstorCSIComponent=driver nodeID=vm6-cplane1.23-106-61-231.lon-01.uk provisioner=linstor.csi.linbit.com version=v0.20.0-d514e41db7cdcb580769cc69f1c1ef2b8a5def5d volume=pvc-b5a0c859-161b-498d-ac4f-00de2311a912
time="2022-08-31T14:40:01Z" level=debug msg="looking up resource by CSI volume id" func="github.com/piraeusdatastore/linstor-csi/pkg/client.(*Linstor).FindByID" file="/src/pkg/client/linstor.go:239" csiVolumeID=pvc-b5a0c859-161b-498d-ac4f-00de2311a912 linstorCSIComponent=client
time="2022-08-31T14:40:01Z" level=debug msg="curl -X 'GET' -H 'Accept: application/json' 'https://piraeus-op-cs.default.svc:3371/v1/resource-definitions/pvc-b5a0c859-161b-498d-ac4f-00de2311a912'"
time="2022-08-31T14:40:01Z" level=debug msg="curl -X 'GET' -H 'Accept: application/json' 'https://piraeus-op-cs.default.svc:3371/v1/resource-definitions/pvc-b5a0c859-161b-498d-ac4f-00de2311a912/volume-definitions'"
time="2022-08-31T14:40:01Z" level=debug msg="creating new volume" func=github.com/piraeusdatastore/linstor-csi/pkg/driver.Driver.createNewVolume file="/src/pkg/driver/driver.go:1250" linstorCSIComponent=driver nodeID=vm6-cplane1.23-106-61-231.lon-01.uk provisioner=linstor.csi.linbit.com size=22548578304 version=v0.20.0-d514e41db7cdcb580769cc69f1c1ef2b8a5def5d volume=pvc-b5a0c859-161b-498d-ac4f-00de2311a912
time="2022-08-31T14:40:01Z" level=debug msg="reconcile resource group from storage class" func="github.com/piraeusdatastore/linstor-csi/pkg/client.(*Linstor).Create" file="/src/pkg/client/linstor.go:287" linstorCSIComponent=client volume="&{ID:pvc-b5a0c859-161b-498d-ac4f-00de2311a912 SizeBytes:22548578304 ResourceGroup:team-100-persistent-replicated FsType:xfs Properties:map[Aux/csi-provisioning-completed-by:linstor-csi/v0.20.0-d514e41db7cdcb580769cc69f1c1ef2b8a5def5d] UseQuorum:false}"
time="2022-08-31T14:40:01Z" level=debug msg="curl -X 'GET' -H 'Accept: application/json' 'https://piraeus-op-cs.default.svc:3371/v1/resource-groups/team-100-persistent-replicated'"
time="2022-08-31T14:40:01Z" level=debug msg="{\"select_filter\":{}}\n"
time="2022-08-31T14:40:01Z" level=debug msg="curl -X 'PUT' -d '{\"select_filter\":{}}\n' -H 'Accept: application/json' -H 'Content-Type: application/json' 'https://piraeus-op-cs.default.svc:3371/v1/resource-groups/team-100-persistent-replicated'"
time="2022-08-31T14:40:01Z" level=debug msg="curl -X 'GET' -H 'Accept: application/json' 'https://piraeus-op-cs.default.svc:3371/v1/resource-groups/team-100-persistent-replicated'"
time="2022-08-31T14:40:01Z" level=debug msg="reconcile resource definition for volume" func="github.com/piraeusdatastore/linstor-csi/pkg/client.(*Linstor).Create" file="/src/pkg/client/linstor.go:294" linstorCSIComponent=client volume="&{ID:pvc-b5a0c859-161b-498d-ac4f-00de2311a912 SizeBytes:22548578304 ResourceGroup:team-100-persistent-replicated FsType:xfs Properties:map[Aux/csi-provisioning-completed-by:linstor-csi/v0.20.0-d514e41db7cdcb580769cc69f1c1ef2b8a5def5d] UseQuorum:false}"
time="2022-08-31T14:40:01Z" level=info msg="reconcile resource definition for volume" func="github.com/piraeusdatastore/linstor-csi/pkg/client.(*Linstor).reconcileResourceDefinition" file="/src/pkg/client/linstor.go:1204" linstorCSIComponent=client volume=pvc-b5a0c859-161b-498d-ac4f-00de2311a912
time="2022-08-31T14:40:01Z" level=debug msg="check if resource definition already exists" func="github.com/piraeusdatastore/linstor-csi/pkg/client.(*Linstor).reconcileResourceDefinition" file="/src/pkg/client/linstor.go:1206" linstorCSIComponent=client volume=pvc-b5a0c859-161b-498d-ac4f-00de2311a912
time="2022-08-31T14:40:01Z" level=debug msg="curl -X 'GET' -H 'Accept: application/json' 'https://piraeus-op-cs.default.svc:3371/v1/resource-definitions/pvc-b5a0c859-161b-498d-ac4f-00de2311a912'"
time="2022-08-31T14:40:01Z" level=debug msg="reconcile volume definition for volume" func="github.com/piraeusdatastore/linstor-csi/pkg/client.(*Linstor).Create" file="/src/pkg/client/linstor.go:302" linstorCSIComponent=client volume="&{ID:pvc-b5a0c859-161b-498d-ac4f-00de2311a912 SizeBytes:22548578304 ResourceGroup:team-100-persistent-replicated FsType:xfs Properties:map[Aux/csi-provisioning-completed-by:linstor-csi/v0.20.0-d514e41db7cdcb580769cc69f1c1ef2b8a5def5d] UseQuorum:false}"
time="2022-08-31T14:40:01Z" level=info msg="reconcile volume definition for volume" func="github.com/piraeusdatastore/linstor-csi/pkg/client.(*Linstor).reconcileVolumeDefinition" file="/src/pkg/client/linstor.go:1253" linstorCSIComponent=client volume=pvc-b5a0c859-161b-498d-ac4f-00de2311a912
time="2022-08-31T14:40:01Z" level=debug msg="check if volume definition already exists" func="github.com/piraeusdatastore/linstor-csi/pkg/client.(*Linstor).reconcileVolumeDefinition" file="/src/pkg/client/linstor.go:1257" linstorCSIComponent=client volume=pvc-b5a0c859-161b-498d-ac4f-00de2311a912
time="2022-08-31T14:40:01Z" level=debug msg="curl -X 'GET' -H 'Accept: application/json' 'https://piraeus-op-cs.default.svc:3371/v1/resource-definitions/pvc-b5a0c859-161b-498d-ac4f-00de2311a912/volume-definitions/0'"
time="2022-08-31T14:40:01Z" level=debug msg="reconcile volume placement" func="github.com/piraeusdatastore/linstor-csi/pkg/client.(*Linstor).Create" file="/src/pkg/client/linstor.go:310" linstorCSIComponent=client volume="&{ID:pvc-b5a0c859-161b-498d-ac4f-00de2311a912 SizeBytes:22548578304 ResourceGroup:team-100-persistent-replicated FsType:xfs Properties:map[Aux/csi-provisioning-completed-by:linstor-csi/v0.20.0-d514e41db7cdcb580769cc69f1c1ef2b8a5def5d] UseQuorum:false}"
time="2022-08-31T14:40:01Z" level=info msg="reconcile resource placement for volume" func="github.com/piraeusdatastore/linstor-csi/pkg/client.(*Linstor).reconcileResourcePlacement" file="/src/pkg/client/linstor.go:1296" linstorCSIComponent=client volume=pvc-b5a0c859-161b-498d-ac4f-00de2311a912
time="2022-08-31T14:40:01Z" level=debug msg="{\"resource\":{\"name\":\"pvc-b5a0c859-161b-498d-ac4f-00de2311a912\",\"node_name\":\"dedi1-node1.23-106-60-155.lon-01.uk\",\"layer_object\":{\"drbd\":{\"drbd_resource_definition\":{}},\"luks\":{},\"storage\":{},\"nvme\":{},\"openflex\":{\"openflex_resource_definition\":{}},\"writecache\":{},\"cache\":{},\"bcache\":{}}},\"layer_list\":[\"DRBD\",\"STORAGE\"]}\n"
time="2022-08-31T14:40:01Z" level=debug msg="curl -X 'POST' -d '{\"resource\":{\"name\":\"pvc-b5a0c859-161b-498d-ac4f-00de2311a912\",\"node_name\":\"dedi1-node1.23-106-60-155.lon-01.uk\",\"layer_object\":{\"drbd\":{\"drbd_resource_definition\":{}},\"luks\":{},\"storage\":{},\"nvme\":{},\"openflex\":{\"openflex_resource_definition\":{}},\"writecache\":{},\"cache\":{},\"bcache\":{}}},\"layer_list\":[\"DRBD\",\"STORAGE\"]}\n' -H 'Accept: application/json' -H 'Content-Type: application/json' 'https://piraeus-op-cs.default.svc:3371/v1/resource-definitions/pvc-b5a0c859-161b-498d-ac4f-00de2311a912/resources/dedi1-node1.23-106-60-155.lon-01.uk'

This one specifically looks similar:

time="2022-08-31T14:40:00Z" level=debug msg="reconcile volume placement failed: context canceled" func="github.com/piraeusdatastore/linstor-csi/pkg/client.(*Linstor).Create" file="/src/pkg/client/linstor.go:314" linstorCSIComponent=client volume="&{ID:pvc-b5a0c859-161b-498d-ac4f-00de2311a912 SizeBytes:22548578304 ResourceGroup:team-100-persistent-replicated FsType:xfs Properties:map[Aux/csi-provisioning-completed-by:linstor-csi/v0.20.0-d514e41db7cdcb580769cc69f1c1ef2b8a5def5d] UseQuorum:false}"


Rid commented Aug 31, 2022

These error reports are coming from the arbitrator node:

ERROR REPORT 630F59B9-8B858-000023

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.19.1
Build ID:                           a758bf07796c374fd2004465b0d8690209b74356
Build time:                         2022-07-27T06:36:54+00:00
Error time:                         2022-08-31 14:40:17
Node:                               vm6-cplane1.23-106-61-231.lon-01.uk

============================================================

Reported error:
===============

Description:
    Failed to adjust DRBD resource pvc-b5a0c859-161b-498d-ac4f-00de2311a912

Category:                           LinStorException
Class name:                         ResourceException
Class canonical name:               com.linbit.linstor.core.devmgr.exceptions.ResourceException
Generated at:                       Method 'adjustDrbd', Source file 'DrbdLayer.java', Line #819

Error message:                      Failed to adjust DRBD resource pvc-b5a0c859-161b-498d-ac4f-00de2311a912

Error context:
    An error occurred while processing resource 'Node: 'vm6-cplane1.23-106-61-231.lon-01.uk', Rsc: 'pvc-b5a0c859-161b-498d-ac4f-00de2311a912''

Call backtrace:

    Method                                   Native Class:Line number
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:819
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:393
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:847
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:359
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:169
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
    run                                      N      java.lang.Thread:829

Caused by:
==========

Description:
    Execution of the external command 'drbdadm' failed.
Cause:
    The external command exited with error code 1.
Correction:
    - Check whether the external program is operating properly.
    - Check whether the command line is correct.
      Contact a system administrator or a developer if the command line is no longer valid
      for the installed version of the external program.
Additional information:
    The full command line executed was:
    drbdadm -vvv adjust pvc-b5a0c859-161b-498d-ac4f-00de2311a912

    The external command sent the following output data:
    drbdsetup new-resource pvc-b5a0c859-161b-498d-ac4f-00de2311a912 2 --on-no-data-accessible=suspend-io --on-no-quorum=suspend-io --on-suspended-primary-outdated=force-secondary --quorum=majority
    drbdsetup new-minor pvc-b5a0c859-161b-498d-ac4f-00de2311a912 1008 0 --diskless


    The external command sent the following error information:
    New resource pvc-b5a0c859-161b-498d-ac4f-00de2311a912
    New minor 1008 (vol:0)
    new-minor pvc-b5a0c859-161b-498d-ac4f-00de2311a912 1008 0: sysfs node '/sys/devices/virtual/block/drbd1008' (already? still?) exists
    pvc-b5a0c859-161b-498d-ac4f-00de2311a912: Failure: (161) Minor or volume exists already (delete it first)
    Command 'drbdsetup new-minor pvc-b5a0c859-161b-498d-ac4f-00de2311a912 1008 0 --diskless' terminated with exit code 10


Category:                           LinStorException
Class name:                         ExtCmdFailedException
Class canonical name:               com.linbit.extproc.ExtCmdFailedException
Generated at:                       Method 'execute', Source file 'DrbdAdm.java', Line #593

Error message:                      The external command 'drbdadm' exited with error code 1


Call backtrace:

    Method                                   Native Class:Line number
    execute                                  N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:593
    adjust                                   N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:90
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:741
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:393
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:847
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:359
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:169
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
    run                                      N      java.lang.Thread:829


END OF ERROR REPORT.


Rid commented Aug 31, 2022

It looks like this was caused by deleting the Piraeus controllers before removing the PVs: stale DRBD devices were left behind on the tiebreaker node, since it had not been rebooted.
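For anyone hitting the same state: the leftover kernel objects can be removed without a reboot. A hedged sketch, using the resource name from this issue (substitute your own); `drbdsetup` must be run as root on the affected tiebreaker node:

```shell
#!/bin/sh
# Resource name from this issue; substitute your own.
RES=pvc-b5a0c859-161b-498d-ac4f-00de2311a912

# Confirm the stale minor still exists in sysfs
# (this is the path drbdsetup complained about in the error report):
ls /sys/devices/virtual/block/ | grep drbd

# Tear down the leftover resource in the kernel so LINSTOR can
# recreate it cleanly; 'drbdsetup down' removes its minors as well:
drbdsetup down "$RES"
```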

Rid closed this as completed Aug 31, 2022