[BUG] Unable to add disk using Longhorn UI node page. #2477

Closed
khushboo-rancher opened this issue Apr 13, 2021 · 6 comments

Labels: kind/bug · priority/1 Highly recommended to fix in this release (managed by PO) · severity/2 Function working but has a major issue w/o workaround (a major incident with significant impact)
Milestone: v1.1.1

khushboo-rancher commented Apr 13, 2021

Describe the bug
Unable to add an additional disk for a node using the Longhorn UI node page.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy Longhorn-rc1 on a k8s cluster of AWS nodes.
  2. Create an EBS volume in AWS and attach it to a node.
  3. Mount the additional disk on a directory.
  4. Add the disk to the node using the Longhorn UI and write some data to it.
  5. Detach the EBS volume and try to add it to another node.
  6. The disk appears as unschedulable, and Longhorn is unable to detect the storage on the node.

Note:
If I add the same disk to a setup with Longhorn v1.1.0, it is added successfully.

(Screenshot: Screen Shot 2021-04-14 at 12 32 10 AM)

Expected behavior
The additional disk should be added successfully.

Log
Response of the nodes API call:

"disk-1": {
				"allowScheduling": true,
				"conditions": {
					"Ready": {
						"lastProbeTime": "",
						"lastTransitionTime": "2021-04-13T20:09:38Z",
						"message": "Disk disk-1(/data/) on node ip- is not ready: disk has same file system ID 251796fd78924e70 as other disks [default-disk-251796fd78924e70 disk-1]",
						"reason": "DiskFilesystemChanged",
						"status": "False",
						"type": "Ready"
					},
					"Schedulable": {
						"lastProbeTime": "",
						"lastTransitionTime": "2021-04-13T20:09:38Z",
						"message": "the disk disk-1(/data/) on the node ip-1is not ready",
						"reason": "DiskNotReady",
						"status": "False",
						"type": "Schedulable"
					}
				},
				"diskUUID": "",
				"evictionRequested": false,
				"path": "/data/",
				"scheduledReplica": {},
				"storageAvailable": 0,
				"storageMaximum": 0,
				"storageReserved": 0,
				"storageScheduled": 0,
				"tags": null
			}
		}
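
For reference, the "disk has same file system ID" message above is keyed off the filesystem ID of the disk's mount path. A minimal Go sketch (an illustration only, not Longhorn's actual code; /data is the hypothetical mount point from the reproduce steps) of how such an ID can be read on a Linux node via statfs:

    package main

    import (
        "fmt"
        "os"
        "syscall"
    )

    func main() {
        path := "/data" // hypothetical mount point from the reproduce steps above
        var st syscall.Statfs_t
        if err := syscall.Statfs(path, &st); err != nil {
            fmt.Fprintf(os.Stderr, "statfs %s: %v\n", path, err)
            os.Exit(1)
        }
        // Fsid is a pair of 32-bit values; print them as one hex string,
        // comparable in spirit to the "251796fd78924e70" IDs in the log above.
        fmt.Printf("filesystem ID of %s: %08x%08x\n",
            path, uint32(st.Fsid.X__val[0]), uint32(st.Fsid.X__val[1]))
    }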

Environment:

  • Longhorn version: Longhorn-rc1
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): kubectl
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: RKE, k8s v1.20.5
khushboo-rancher added the kind/bug, severity/1 Function broken (a critical incident with very high impact (ex: data corruption, failed upgrade)) and kind/regression Regression which has worked before labels Apr 13, 2021
khushboo-rancher added this to the v1.1.1 milestone Apr 13, 2021
yasker added the priority/0 Must be fixed in this release (managed by PO) label Apr 13, 2021
khushboo-rancher commented

Updated the steps to reproduce. This is not a direct disk addition case, but one where a disk is moved from one node to another node.


PhanLe1010 commented Apr 14, 2021

@khushboo-rancher
I couldn't reproduce the problem following your provided reproduce steps. I was able to move an EBS disk from node-1 to node-2 and back to node-1 without any issue. I also verified the data of the replica on the EBS disk after each move.

Regarding the error that you provided:

"message": "Disk disk-1(/data/) on node ip- is not ready: disk has same file system ID 251796fd78924e70 as other disks [default-disk-251796fd78924e70 disk-1]",

I was able to reproduce it by mounting the same physical disk at 2 different mount points, data-1 and data-2, and then creating 2 new disks in the Longhorn UI pointing to those mount points. In this case, the 2 active Longhorn disks are backed by the same physical disk (same filesystem ID), so they conflict with each other. I think this behavior is OK, since users shouldn't add disks like this. Did you do the same, or did you take different steps?
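
To make that "same physical disk, two mount points" case concrete, here is a small Go sketch (illustrative only; /data-1 and /data-2 are hypothetical mount points, and this is not Longhorn's actual check) that flags two mount points reporting the same filesystem ID, which is roughly the condition behind the error above:

    package main

    import (
        "fmt"
        "syscall"
    )

    // fsID returns the statfs filesystem ID of a mount point as a hex string.
    func fsID(path string) (string, error) {
        var st syscall.Statfs_t
        if err := syscall.Statfs(path, &st); err != nil {
            return "", err
        }
        return fmt.Sprintf("%08x%08x", uint32(st.Fsid.X__val[0]), uint32(st.Fsid.X__val[1])), nil
    }

    func main() {
        // Hypothetical mount points of the same physical disk, as in the scenario above.
        a, errA := fsID("/data-1")
        b, errB := fsID("/data-2")
        if errA != nil || errB != nil {
            fmt.Println("statfs failed:", errA, errB)
            return
        }
        if a == b {
            fmt.Println("duplicate filesystem ID", a, "- the second disk would be rejected")
        } else {
            fmt.Println("distinct filesystem IDs:", a, b)
        }
    }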


khushboo-rancher commented Apr 15, 2021

@PhanLe1010 The exact updated test steps are below:

  1. Created a cluster with 3 AWS worker instances.
  2. Deployed Longhorn.
  3. Created some volumes.
  4. Terminated one instance. That instance showed as Down in the Longhorn UI.
  5. Moved the root EBS volume of the downed instance to a new instance and added that instance to the cluster.
  6. Tried adding the moved disk to the new instance in the Longhorn UI.

Note: same behavior with v1.1.0.

khushboo-rancher removed the kind/regression Regression which has worked before label Apr 15, 2021

khushboo-rancher commented Apr 15, 2021

This happens only if a root disk is moved from one node to another node created from the same template. In that case the filesystem IDs are the same. Lowering the severity for now.

khushboo-rancher added the priority/1 Highly recommended to fix in this release (managed by PO) and severity/2 Function working but has a major issue w/o workaround (a major incident with significant impact) labels and removed the priority/0 Must be fixed in this release (managed by PO) and severity/1 Function broken (a critical incident with very high impact (ex: data corruption, failed upgrade)) labels Apr 15, 2021
PhanLe1010 commented

This problem happens when users try to add a new disk that has the same filesystem ID as an existing disk on the node.

VMs created by a cloud provider (AWS, DigitalOcean, Linode) from the same template have the same filesystem ID for the root disk. Therefore, users cannot move a root disk to a different VM with the current implementation of Longhorn.

Since it is uncommon for users to move a root disk to another node as a data disk, we don't see this as an immediate concern for now.

PhanLe1010 commented

Note:
We're using the filesystem ID to detect duplicate mounts of the same filesystem. But filesystem IDs can be duplicated on the same node, so we might need to use some other mechanism, e.g. major/minor device numbers, in the future.
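
A minimal sketch of that alternative (illustrative only, not a committed design; /data is a hypothetical mount point): the backing block device's major/minor numbers can be read via stat, and they are unique per device on a node even when two filesystems report the same filesystem ID.

    package main

    import (
        "fmt"
        "os"

        "golang.org/x/sys/unix"
    )

    func main() {
        path := "/data" // hypothetical mount point
        var st unix.Stat_t
        if err := unix.Stat(path, &st); err != nil {
            fmt.Fprintf(os.Stderr, "stat %s: %v\n", path, err)
            os.Exit(1)
        }
        // st.Dev identifies the backing block device; its major/minor numbers are
        // unique per device on a node, unlike filesystem IDs, which can collide.
        fmt.Printf("%s is backed by device %d:%d\n", path, unix.Major(st.Dev), unix.Minor(st.Dev))
    }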
