Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't return an error on unmap while rebuilding #1008

Merged

Conversation

ejweber
Copy link
Contributor

@ejweber ejweber commented Feb 6, 2024

Longhorn 7103

Which issue(s) this PR fixes:

longhorn/longhorn#7103

What this PR does / why we need it:

Don't return an error to tgt/liblonghorn when UNMAP is called during a rebuild. We are not required to do so, and doing so causes harvester/harvester#4739.

Longhorn 7103

Signed-off-by: Eric Weber <eric.weber@suse.com>
@ejweber ejweber requested a review from a team as a code owner February 6, 2024 18:09
@ejweber
Copy link
Contributor Author

ejweber commented Feb 6, 2024

To test:

Follow steps 1-8 in the reproduce: longhorn/longhorn#7103 (comment). Then:

  1. Once the replica is rebuilding (snapshots have purged and there is noticeable rebuilding progress), execute fstrim from the container again. Supposedly many GiB are trimmed. (This is not really the case.)
root@test-pod:/# fstrim -va
/data: 4.7 GiB (5093191680 bytes) trimmed on /dev/longhorn/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e
  1. Check the kernel ring buffer for IO errors. There are none.
root@test-pod:/# dmesg --ctime | tail -n 200
# Nothing interesting here.
  1. Check the instance-manager logs for associated warnings.
[pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 96301056 bytes at offset 37916672 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 73728 bytes at offset 138424320 while rebuilding is in progress"
[pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 2011136 bytes at offset 140595200 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 4182016 bytes at offset 406859776 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 4182016 bytes at offset 675295232 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 4182016 bytes at offset 943730688 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 4182016 bytes at offset 1212166144 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 100532224 bytes at offset 2181169152 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 2281701376 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 2415919104 while rebuilding is in progress"
[pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 2550136832 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 2684354560 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 2818572288 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 2952790016 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 3087007744 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 3221225472 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 130011136 bytes at offset 3359649792 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 3489660928 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 130011136 bytes at offset 3628085248 while rebuilding is in progress"
[pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 3758096384 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 3892314112 while rebuilding is in progress"
[pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 4026531840 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 4160749568 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 100532224 bytes at offset 4328652800 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 67108864 bytes at offset 4496293888 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 4563402752 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 4697620480 while rebuilding is in progress"
[pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 4831838208 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 4966055936 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 5100273664 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 5234491392 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 5368709120 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 5502926848 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 5637144576 while rebuilding is in progress"
[pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 5771362304 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 5905580032 while rebuilding is in progress"
[pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 6039797760 while rebuilding is in progress"
[pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 6174015488 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 6308233216 while rebuilding is in progress"
[pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 100532224 bytes at offset 6476136448 while rebuilding is in progress"
[pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 130011136 bytes at offset 6580875264 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 6710886400 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 6845104128 while rebuilding is in progress"
[pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 134217728 bytes at offset 6979321856 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 92274688 bytes at offset 7155482624 while rebuilding is in progress"
time="2024-02-06T18:11:16Z" level=warning msg="Failed to unmap" func="controller.(*Controller).UnmapAt" file="control.go:1066" error="cannot unmap 100532224 bytes at offset 8623620096 while rebuilding is in progress"
  1. Notice that the actual volume size has not changed because the trim had no effect.
  2. Attempt the trim again. This time, supposedly no GiB are trimmed.
  3. Delete and recreate the test pod (to remount the volume). Do not delete and recreate the PVC/PV.
  4. Attempt the trim again. Again, supposedly many GiB are trimmed. This time, notice that the actual volume size HAS reduced considerably. The volume had to be remounted before trim could be effective again.
root@test-pod:/# fstrim -va
/data: 4.7 GiB (5093191680 bytes) trimmed on /dev/longhorn/pvc-f087066f-b32d-4a60-a8e7-8b8cb13e1e26

@ejweber
Copy link
Contributor Author

ejweber commented Feb 6, 2024

Copy link
Contributor

@shuo-wu shuo-wu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general LGTM

// This is a legitimate error (which is handled by tgt/liblonghorn at a higher level). Since tgt/liblonghorn
// does not actually print the error, print it here.
err := fmt.Errorf("EOF: unmap of %v bytes at offset %v is beyond volume size %v", length, off, c.size)
logrus.WithError(err).Error("Failed to unmap")
c.RUnlock()
return 0, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: return 0, io.ErrUnexpectedEOF

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm likely fine with this, but can you help clarify the goal here, as I am new to the conventions in this code. If we return io.ErrUnexpectedEOF here, it will end up in:

msg.Type = TypeResponse
if err == io.EOF {
msg.Type = TypeEOF
msg.Data = msg.Data[:count]
msg.Size = uint32(len(msg.Data))
} else if err != nil {
msg.Type = TypeError
msg.Data = []byte(err.Error())
msg.Size = uint32(len(msg.Data))
}
s.responses <- msg

We will just treat it like any other error and push it back to liblonghorn with msg.Type == TypeError instead of msg.Type == TypeEOF. Is that the intention, or do we need to make additional changes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

longhorn-engine/pkg/dataconn is the data transfer service between engine and replicas. The results are handled by engine like this.
While the error in longhorn-engine/pkg/controller/control will be returned to the frontend iscsi layer.

This is a NIT suggestion anyway. I am fine with the existing implementation.

Comment on lines +1065 to +1066
err := fmt.Errorf("cannot unmap %v bytes at offset %v while rebuilding is in progress", length, off)
logrus.WithError(err).Warn("Failed to unmap")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: logrus.Warnf()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do in a followup. I was mostly just going for consistency with the other cases.

@innobead innobead merged commit f1f4ce3 into longhorn:master Feb 7, 2024
5 checks passed
@innobead
Copy link
Member

innobead commented Feb 7, 2024

@mergify backport v1.6.x v1.5.x

@ejweber you can tackle @shuo-wu feedback after if needed.

Copy link

mergify bot commented Feb 7, 2024

backport v1.6.x v1.5.x

✅ Backports have been created

@ejweber
Copy link
Contributor Author

ejweber commented Feb 7, 2024

@mergify backport v1.4.x

Copy link

mergify bot commented Feb 7, 2024

backport v1.4.x

✅ Backports have been created

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants