[BUG] Executing fstrim while rebuilding causes IO errors #7103

Closed
ejweber opened this issue Nov 14, 2023 · 11 comments

Labels: area/stability, area/volume-trim, backport/1.4.5, backport/1.5.4, backport/1.6.1, kind/bug, priority/0, reproduce/always, require/auto-e2e-test, require/backport, require/qa-review-coverage
Milestone: v1.6.0

@ejweber
Contributor

ejweber commented Nov 14, 2023

Describe the bug (🐛 if you encounter this issue)

Executing fstrim from a privileged container that mounts a Longhorn volume results in IO errors in the kernel ring buffer if the volume is rebuilding. This would almost certainly also happen if fstrim were executed outside the container.

To Reproduce

  1. Deploy Longhorn on a three node cluster.
  2. Deploy the below manifest, which runs a privileged Ubuntu 22.04 container that mounts a Longhorn volume to /data.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: default
spec:
  restartPolicy: Always
  containers:
  - name: test-container
    image: ubuntu:22.04
    securityContext:
      privileged: true
    args: 
      - sleep
      - infinity
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
          - ls
          - /data/lost+found
      initialDelaySeconds: 5
      periodSeconds: 5
    volumeMounts:
    - name: data
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc
  3. Write approximately 5GB of data to a file on the volume using dd twice. (Doing it twice may not be necessary, but it leaves a large number of blocks that can later be trimmed.)
eweber@laptop:~/issue-files/SURE-6935> k exec -it test-pod -- bash
root@test-pod:/# dd if=/dev/urandom iflag=count_bytes count=5G bs=1M of=/data/large
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 47.4062 s, 113 MB/s
root@test-pod:/# dd if=/dev/urandom iflag=count_bytes count=5G bs=1M of=/data/large
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 58.9104 s, 91.1 MB/s
  4. Verify that the volume's actual size is approximately 10GB as expected.
eweber@laptop:~/longhorn> kl get volume -oyaml pvc-c244bf53-4319-4f77-82cc-e7cf28b31cc9 | grep actual
  actualSize: 10978639872
  5. Verify that fstrim executed from the container can clean up approximately 5GB as expected.
root@test-pod:/# fstrim -va
/data: 4.3 GiB (4569669632 bytes) trimmed on /dev/longhorn/pvc-c244bf53-4319-4f77-82cc-e7cf28b31cc9
  6. Write 5GB again to get the volume's actual size back to approximately 10GB.
root@test-pod:/# dd if=/dev/urandom iflag=count_bytes count=5G bs=1M of=/data/large
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 57.7038 s, 93.0 MB/s
root@test-pod:/# fstrim -va
/data: 0 B (0 bytes) trimmed on /dev/longhorn/pvc-c244bf53-4319-4f77-82cc-e7cf28b31cc9
  7. Verify that the volume's actual size is approximately 10GB.
eweber@laptop:~/longhorn> kl get volume -oyaml pvc-c244bf53-4319-4f77-82cc-e7cf28b31cc9 | grep actual
  actualSize: 10978639872
  8. Delete any replica.
  9. Once the replica is rebuilding (snapshots have been purged and there is noticeable rebuilding progress), execute fstrim from the container again. No blocks are deleted.
root@test-pod:/# fstrim -va
/data: 0 B (0 bytes) trimmed on /dev/longhorn/pvc-c244bf53-4319-4f77-82cc-e7cf28b31cc9
  10. Check the kernel ring buffer for IO errors. Verify that they occurred during the fstrim operation.
root@test-pod:/# dmesg --ctime | tail -n 200
...
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#83 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#83 Sense Key : Hardware Error [current] 
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#83 Add. Sense: Internal target failure
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#83 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
[Tue Nov 14 21:46:42 2023] blk_update_request: critical target error, dev sdc, sector 270360 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#84 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#84 Sense Key : Hardware Error [current] 
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#84 Add. Sense: Internal target failure
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#84 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
[Tue Nov 14 21:46:42 2023] blk_update_request: critical target error, dev sdc, sector 794648 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#71 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#71 Sense Key : Hardware Error [current] 
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#71 Add. Sense: Internal target failure
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#71 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
[Tue Nov 14 21:46:42 2023] blk_update_request: critical target error, dev sdc, sector 1318936 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#72 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#72 Sense Key : Hardware Error [current] 
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#72 Add. Sense: Internal target failure
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#72 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
[Tue Nov 14 21:46:42 2023] blk_update_request: critical target error, dev sdc, sector 1843224 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#73 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#73 Sense Key : Hardware Error [current] 
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#73 Add. Sense: Internal target failure
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#73 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
[Tue Nov 14 21:46:42 2023] blk_update_request: critical target error, dev sdc, sector 2326528 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#76 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#76 Sense Key : Hardware Error [current] 
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#76 Add. Sense: Internal target failure
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#76 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
[Tue Nov 14 21:46:42 2023] blk_update_request: critical target error, dev sdc, sector 2367512 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#85 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#85 Sense Key : Hardware Error [current] 
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#85 Add. Sense: Internal target failure
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#85 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
[Tue Nov 14 21:46:42 2023] blk_update_request: critical target error, dev sdc, sector 2621440 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#86 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#86 Sense Key : Hardware Error [current] 
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#86 Add. Sense: Internal target failure
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#86 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
[Tue Nov 14 21:46:42 2023] blk_update_request: critical target error, dev sdc, sector 2883584 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#64 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#64 Sense Key : Hardware Error [current] 
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#64 Add. Sense: Internal target failure
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#64 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
[Tue Nov 14 21:46:42 2023] blk_update_request: critical target error, dev sdc, sector 3145728 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#65 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#65 Sense Key : Hardware Error [current] 
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#65 Add. Sense: Internal target failure
[Tue Nov 14 21:46:42 2023] sd 3:0:0:1: [sdc] tag#65 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
[Tue Nov 14 21:46:42 2023] blk_update_request: critical target error, dev sdc, sector 3407872 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
  11. Check the instance-manager logs for associated errors.
eweber@laptop:~/longhorn> kl logs instance-manager-de0a44c527b5c06489d7a009124e4fd3
...
[pvc-c244bf53-4319-4f77-82cc-e7cf28b31cc9-e-0] time="2023-11-14T21:46:42Z" level=info msg="Finished to snapshot: 10.42.86.152:10000 5cea841f-f0b0-44a3-913e-228f38c36c06 UserCreated false Created at 2023-11-14T21:46:42Z, Labels map[]"
time="2023-11-14T21:46:42Z" level=info msg="Adding backend: tcp://10.42.86.152:10000"
time="2023-11-14T21:46:42Z" level=info msg="Start monitoring tcp://10.42.86.152:10000"
[longhorn-instance-manager] time="2023-11-14T21:46:42Z" level=info msg="Using replica tcp://10.42.27.34:10000 as the source for rebuild"
[longhorn-instance-manager] time="2023-11-14T21:46:42Z" level=info msg="Using replica tcp://10.42.86.152:10000 as the target for rebuild"
[pvc-c244bf53-4319-4f77-82cc-e7cf28b31cc9-e-0] time="2023-11-14T21:46:42Z" level=info msg="Set revision counter of 10.42.86.152:10000 to : 0"
[pvc-c244bf53-4319-4f77-82cc-e7cf28b31cc9-e-0] time="2023-11-14T21:46:42Z" level=info msg="Set backend tcp://10.42.86.152:10000 revision counter to 0"
[pvc-c244bf53-4319-4f77-82cc-e7cf28b31cc9-e-0] time="2023-11-14T21:46:42Z" level=info msg="Synchronizing volume-head-002.img.meta to volume-head-001.img.meta:10.42.86.152:10003"
[pvc-c244bf53-4319-4f77-82cc-e7cf28b31cc9-r-418301ed] time="2023-11-14T21:46:42Z" level=info msg="Syncing file volume-head-002.img.meta to 10.42.86.152:10003"
time="2023-11-14T21:46:42Z" level=info msg="Syncing file volume-head-002.img.meta to 10.42.86.152:10003: size 178, directIO false, fastSync false"
[pvc-c244bf53-4319-4f77-82cc-e7cf28b31cc9-r-418301ed] time="2023-11-14T21:46:42Z" level=info msg="Finished sync for the source file volume-head-002.img.meta, size 178, elapsed 0.02s"
[pvc-c244bf53-4319-4f77-82cc-e7cf28b31cc9-r-418301ed] time="2023-11-14T21:46:42Z" level=info msg="Done syncing file volume-head-002.img.meta to 10.42.86.152:10003"
[pvc-c244bf53-4319-4f77-82cc-e7cf28b31cc9-e-0] time="2023-11-14T21:46:42Z" level=info msg="Done synchronizing volume-head-002.img.meta to volume-head-001.img.meta:10.42.86.152:10003"
response_process: Receive error for response 3 of seq 19169
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:138424320 length:57344
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19170
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:406859776 length:4182016
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19171
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:675295232 length:4182016
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19172
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:943730688 length:4182016
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19173
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:1191182336 length:16777216
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19174
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:1212166144 length:130011136
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19175
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:1342177280 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19176
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:1476395008 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19177
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:1610612736 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19178
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:1744830464 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19179
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:1879048192 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19180
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:2013265920 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19181
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:2281701376 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19182
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:2415919104 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19183
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:2550136832 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19184
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:2684354560 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19185
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:2818572288 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19186
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:2952790016 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19187
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:3087007744 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19188
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:3221225472 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19189
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:3359649792 length:130011136
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19190
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:3489660928 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19191
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:3628085248 length:130011136
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19192
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:3758096384 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19193
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:3892314112 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19194
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:4026531840 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19195
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:4160749568 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19196
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:4496293888 length:67108864
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19197
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:4563402752 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19198
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:4697620480 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19199
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:4831838208 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19200
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:4966055936 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19201
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:5100273664 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19202
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:5234491392 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19203
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:5368709120 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19204
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:5502926848 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19205
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:5637144576 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19206
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:5771362304 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19207
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:5905580032 length:134217728
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success
response_process: Receive error for response 3 of seq 19208
tgtd: bs_longhorn_request(183) Failed to punch hole for UNMAP at offset:6039797760 length:75497472
tgtd: bs_longhorn_request(210) io error 0x1ca3d10 42 0 16 0, Success

Expected behavior

I'm not 100% sure. Maybe there is no way Longhorn can respond differently in this situation? However, when this occurs, Harvester VMs running on top of a Longhorn volume pause until the rebuild finishes (any IO error causes the VM to pause, and every time it unpauses, it hits the same IO error again), so we need to investigate whether better behavior is possible.

@longhorn/dev-data-plane, is this behavior expected?

Environment

  • Longhorn version: v1.4.3 and master
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): kubectl
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: v1.25.12+rke2r1
    • Number of management node in the cluster: 1
    • Number of worker node in the cluster: 3
  • Node config
    • OS type and version: Ubuntu 22.04.2 LTS
    • Kernel version: 5.15.0-88-generic
    • Disk type(e.g. SSD/NVMe/HDD): SSD
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): DigitalOcean
  • Number of Longhorn volumes in the cluster: 1
@ejweber ejweber added kind/bug require/qa-review-coverage Require QA to review coverage require/backport Require backport. Only used when the specific versions to backport have not been definied. reproduce/always 100% reproducible labels Nov 14, 2023
@innobead innobead added this to the v1.6.0 milestone Nov 15, 2023
@innobead innobead added priority/0 Must be fixed in this release (managed by PO) area/volume-trim Volume trim related labels Nov 15, 2023
@ejweber
Contributor Author

ejweber commented Nov 17, 2023

After digging deeper with the supported customer who originally ran into this issue, we found that their VM had the root filesystem mounted with the discard option (in /etc/fstab). This caused them to hit the issue virtually every time Longhorn rebuilt a replica. Most users will only hit the issue periodically, as many Linux distributions enable a weekly fstrim. For Ubuntu, it generally runs early on Monday mornings.

@ejweber
Contributor Author

ejweber commented Jan 23, 2024

I recall from a previous discussion with @shuo-wu that we return an I/O error when we receive an UNMAP while rebuilding as a precautionary measure. We probably don't need to (since the UNMAP only affects the volume head, which is RW for all replicas during a rebuild). Testing whether this is the case is probably the best path forward for this issue.

However, from reading some discussions online and the SCSI spec, it occurs to me that we can probably legally ignore UNMAP commands if we want to (instead of returning an error). Doing so may limit the effectiveness of our trim support (e.g. a file system may choose not to send an UNMAP again until reboot for a particular block if it has already done so), but it would be much better than causing VMs to pause.

Page 28 of Information technology - SCSI Block Commands – 4 (SBC-4)
https://standards.incits.org/apps/group_public/download.php/124286/livelink

4.7.3.4.2 Processing unmap requests

Application clients use unmap commands (see 4.2.2) to request that LBAs be unmapped. For each LBA that
is requested to be unmapped, the device server shall:
    
    a) perform an unmap operation (see 4.7.3.4.3) on the LBA; or
    b) make no change to the logical block provisioning state of the LBA.

Page 232 of SCSI Commands Reference Manual
https://www.seagate.com/files/staticfiles/support/docs/manual/Interface%20manuals/100293068j.pdf

3.54.2 UNMAP parameter list

The UNMAP parameter list (see table 205) contains the data sent by an application client along with an UNMAP command.
Included in the data are an UNMAP parameter list header and block descriptors for LBA extents to be processed by the device
server for the UNMAP command. The LBAs specified in the block descriptors may contain overlapping extents, and may be in
any order.

For each specified LBA:

    a) a mapped LBA should be unmapped (see SBC-4), or may remain mapped; and
    b) an unmapped LBA shall remain unmapped.
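
To make this concrete, below is a minimal, hypothetical Go sketch (not the actual longhorn-engine code; the controller and backend names are invented for illustration) of what "legally ignoring" an UNMAP could look like: while a rebuild is in progress, the request is acknowledged as successful without changing any logical block, which is one of the two responses the SBC-4 excerpt above permits.

package main

import (
	"fmt"
	"sync"
)

// Hypothetical types for illustration only; they do not match the real
// longhorn-engine controller.
type unmapper interface {
	UnmapAt(length uint32, off int64) (int, error)
}

type controller struct {
	mu         sync.RWMutex
	rebuilding bool
	replicas   unmapper
}

// UnmapAt acknowledges an UNMAP without acting on it while a rebuild is in
// progress (SBC-4 4.7.3.4.2 allows "make no change to the logical block
// provisioning state of the LBA"); otherwise it forwards the request.
func (c *controller) UnmapAt(length uint32, off int64) (int, error) {
	c.mu.RLock()
	rebuilding := c.rebuilding
	c.mu.RUnlock()

	if rebuilding {
		// Report success without punching any holes instead of failing the
		// command with the I/O error that currently pauses Harvester VMs.
		return int(length), nil
	}
	return c.replicas.UnmapAt(length, off)
}

func main() {
	c := &controller{rebuilding: true}
	n, err := c.UnmapAt(4096, 0)
	fmt.Println(n, err) // 4096 <nil>: acknowledged, but nothing was unmapped
}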

@shuo-wu
Contributor

shuo-wu commented Jan 24, 2024

We probably don't need to (since the UNMAP only affects the volume head, which is RW for all replicas during a rebuild). Testing whether this is the case is probably the best path forward for this issue.

Agreed. Based on the description of the SCSI commands, it seems that Longhorn should do nothing rather than return an error when the unmap command cannot be processed.

@ejweber
Contributor Author

ejweber commented Jan 30, 2024

Testing with not returning an error first.

This minor change in longhorn-engine is promising.
longhorn/longhorn-engine@master...ejweber:7103-return-success-when-cannot-unmap
fstrim while rebuilding does not actually unmap blocks and returns with no error. There is also no error in instance-manager or in the kernel ring buffer. After rebuilding is complete, ext4 remembers which blocks it already trimmed and refuses to try to trim them again, so no space can be reclaimed.

# During rebuilding.
root@test-pod:/# fstrim -va
/data: 4.7 GiB (5093191680 bytes) trimmed on /dev/longhorn/pvc-bc6cc450-72ad-46b2-b2c2-d7fb5d0733ae

# After rebuilding.
root@test-pod:/# fstrim -va
/data: 0 B (0 bytes) trimmed on /dev/longhorn/pvc-bc6cc450-72ad-46b2-b2c2-d7fb5d0733ae

However, after remounting the file system, fstrim issues unmaps again, and space is freed.

eweber@laptop:~/issue-files/7103> kubectl delete pod test-pod
pod "test-pod" deleted

eweber@laptop:~/issue-files/7103> k apply -f test.yaml 
persistentvolumeclaim/test-pvc unchanged
pod/test-pod created

eweber@laptop:~/issue-files/7103> k exec -it test-pod -- bash

root@test-pod:/# fstrim -va
/data: 4.7 GiB (5093191680 bytes) trimmed on /dev/longhorn/pvc-bc6cc450-72ad-46b2-b2c2-d7fb5d0733ae

# Actual size reduces from 9.xx GiB to 5.22 GiB.

@shuo-wu
Contributor

shuo-wu commented Jan 31, 2024

There is also no error in instance-manager or in the kernel ring buffer. After rebuilding is complete, ext4 remembers which blocks it already trimmed and refuses to try to trim them again, so no space can be reclaimed.

IIRC, we encountered this issue before but I am not sure if that's the reason why we return an error when the trim cannot be handled.

@ejweber
Contributor Author

ejweber commented Feb 5, 2024

@shuo-wu, can you help to clarify the following point?

I was working off the assumption that UNMAP only affected the volume head. However, after digging deeper, I now understand that it affects the volume head AND all removed snapshots.

If I understand correctly, this means allowing an UNMAP during a rebuild would be dangerous.

  • First the volume-head is synchronized. All replicas (including the rebuilding one) have the same contents in volume-head, so it is safe to punch holes in volume-head while a rebuild is ongoing.
  • Then, all other differencing disks are synchronized. Some of these differencing disks may be removed snapshots. RW replicas will have holes punched in all removed disks.
    • Maybe holes will be punched at the same time a replica is sending one of these removed disks.
    • Maybe holes will be punched after one of these removed disks is synchronized (in which case I THINK the holes won't be punched in the rebuilding replica's copy?). See the sketch below.
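
To illustrate the risk in the last two bullets, here is a deliberately simplified, hypothetical Go sketch (the function, parameter names, and chain layout are invented and are not the actual replica code) of hole-punching applied across a disk chain. The key detail is the "file does not exist" branch: a rebuilding replica that has not yet received a removed snapshot silently skips the punch, so its copy of that snapshot ends up larger than the copies on the healthy replicas.

package sketch

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// punchHoleInChain punches a hole at [off, off+length) in the volume head and
// in every disk in the chain that is marked removed. Disks that are missing
// locally (e.g. not yet synchronized to a rebuilding replica) are skipped,
// which is how replicas can diverge if UNMAP is allowed during a rebuild.
func punchHoleInChain(diskPaths []string, removed map[string]bool, off, length int64) error {
	for i, path := range diskPaths {
		isHead := i == 0 // assume index 0 is the volume head
		if !isHead && !removed[path] {
			continue // live snapshots are never trimmed
		}
		f, err := os.OpenFile(path, os.O_RDWR, 0)
		if os.IsNotExist(err) {
			continue // a rebuilding replica may not have this removed disk yet
		}
		if err != nil {
			return err
		}
		err = unix.Fallocate(int(f.Fd()),
			unix.FALLOC_FL_PUNCH_HOLE|unix.FALLOC_FL_KEEP_SIZE, off, length)
		f.Close()
		if err != nil {
			return fmt.Errorf("punch hole in %s: %w", path, err)
		}
	}
	return nil
}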

So I am thinking we were correct to disallow UNMAP during rebuild in our initial implementation, and we should not attempt to change that behavior now.

I'm fairly new to working with this aspect of Longhorn. Is there something I'm missing?

@ejweber
Contributor Author

ejweber commented Feb 5, 2024

I tried the change from #7103 (comment) with a longhorn-engine build that allows UNMAP during rebuilding in two scenarios (basic steps in #7103 (comment)):

Scenario 1 - remove-snapshots-during-filesystem-trim == false

Nothing particularly interesting happened. The snapshot just behind volume-head is NOT marked as removed during the rebuild, so UNMAP only affects volume-head. In my case, volume-head was empty (because I was not writing during the rebuild), so no blocks were unmapped.

Scenario 2 - remove-snapshots-during-filesystem-trim == true

When I issued the fstrim command, the snapshot behind volume head was marked as removed. The engine's instance manager logged the following, because the rebuilding replica did not yet have the removed snapshot to trim from:

[instance-manager-082ed51c98f6227cd7ad985a3a912171] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-r-4ccb4214] time="2024-02-05T16:03:43Z" level=info msg="Running ssync server for file volume-snap-13d2d691-14ef-4956-b944-fcb663380592.img at port 10004" func="rpc.(*SyncAgentServer).launchReceiver.func1" file="server.go:407"
[instance-manager-082ed51c98f6227cd7ad985a3a912171] time="2024-02-05T16:03:43Z" level=info msg="Creating Ssync service" func=rest.Server file="server.go:79"
[instance-manager-082ed51c98f6227cd7ad985a3a912171] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-r-4ccb4214] time="2024-02-05T16:03:43Z" level=info msg="file volume-snap-13d2d691-14ef-4956-b944-fcb663380592.img does not exist" func=rest.Server file="server.go:92"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-r-094ccdf7] time="2024-02-05T16:03:43Z" level=info msg="Syncing file volume-snap-13d2d691-14ef-4956-b944-fcb663380592.img to 10.42.1.220:10004" func="rpc.(*SyncAgentServer).FileSend" file="server.go:343"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] time="2024-02-05T16:03:43Z" level=info msg="Syncing file volume-snap-13d2d691-14ef-4956-b944-fcb663380592.img to 10.42.1.220:10004: size 10737418240, directIO true, fastSync true" func=sparse.SyncFile file="client.go:110"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] time="2024-02-05T16:03:43Z" level=warning msg="Failed to get change time and checksum of local file volume-snap-13d2d691-14ef-4956-b944-fcb663380592.img" func=sparse.SyncContent file="client.go:149" error="failed to open checksum file: open volume-snap-13d2d691-14ef-4956-b944-fcb663380592.img.checksum: no such file or directory"
[instance-manager-082ed51c98f6227cd7ad985a3a912171] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-r-4ccb4214] time="2024-02-05T16:03:43Z" level=info msg="Receiving volume-snap-13d2d691-14ef-4956-b944-fcb663380592.img: size 10737418240, directIO true" func="rest.(*SyncServer).doOpen" file="handlers.go:103"
[instance-manager-082ed51c98f6227cd7ad985a3a912171] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-r-4ccb4214] time="2024-02-05T16:03:43Z" level=info msg="Ssync server is opened and ready for receiving file volume-snap-13d2d691-14ef-4956-b944-fcb663380592.img" func="rest.(*SyncServer).open" file="handlers.go:84"
[instance-manager-082ed51c98f6227cd7ad985a3a912171] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-r-4ccb4214] time="2024-02-05T16:04:12Z" level=info msg="Reloading the revision counter before processing the first write, the current revision cache is 0, the latest revision counter in file is 0" func="replica.(*Replica).increaseRevisionCounter" file="revision_counter.go:133"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:12Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 4096" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:12Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 117440512" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:12Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 125829120" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 125829120" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 125829120" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 125829120" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 125829120" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-e-0] time="2024-02-05T16:04:13Z" level=warning msg="One of the MultiUnmapper get different size 0 from others 134217728" func="controller.(*MultiUnmapperAt).UnmapAt.func1" file="multi_unmapper_at.go:60"
[instance-manager-082ed51c98f6227cd7ad985a3a912171] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-r-4ccb4214] time="2024-02-05T16:05:42Z" level=info msg="Closing Ssync server" func="rest.(*SyncServer).close" file="handlers.go:165"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-r-094ccdf7] time="2024-02-05T16:05:42Z" level=info msg="Finished sync for the source file volume-snap-13d2d691-14ef-4956-b944-fcb663380592.img, size 10737418240, elapsed 119.35s" func=sparse.SyncContent file="client.go:171"
[instance-manager-082ed51c98f6227cd7ad985a3a912171] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-r-4ccb4214] time="2024-02-05T16:05:42Z" level=info msg="Done running ssync server for file volume-snap-13d2d691-14ef-4956-b944-fcb663380592.img at port 10004" func="rpc.(*SyncAgentServer).launchReceiver.func1" file="server.go:412"
[instance-manager-082ed51c98f6227cd7ad985a3a912171] time="2024-02-05T16:05:42Z" level=info msg="Running ssync server for file volume-snap-13d2d691-14ef-4956-b944-fcb663380592.img.meta at port 10005" func="rpc.(*SyncAgentServer).launchReceiver.func1" file="server.go:407"
[instance-manager-413dddc5cbb0881de748c6d4eb6b5553] [pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-r-094ccdf7] time="2024-02-05T16:05:42Z" level=info msg="Done syncing file volume-snap-13d2d691-14ef-4956-b944-fcb663380592.img to 10.42.1.220:10004" func="rpc.(*SyncAgentServer).FileSend" file="server.go:347"

Once the rebuild was complete, I checked the replica files on disk for a mismatch.

eweber@laptop:~/liblonghorn> kl get replica -oyaml | grep -ie creationTimestamp -e directory -e " name:"
    creationTimestamp: "2024-02-05T15:42:47Z"
    name: pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-r-094ccdf7
      name: pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e
    dataDirectoryName: pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-6bb570a4
    creationTimestamp: "2024-02-05T15:42:47Z"
    name: pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-r-12f62800
      name: pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e
    dataDirectoryName: pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-ab058c6b
    creationTimestamp: "2024-02-05T16:03:31Z"
    name: pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-r-4ccb4214
      name: pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e
    dataDirectoryName: pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-cfcd557c

eweber@laptop:~/liblonghorn> kl exec -it instance-manager-082ed51c98f6227cd7ad985a3a912171 -- bash -c "du -h /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-*/*"
4.0K    /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-cfcd557c/revision.counter
0       /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-cfcd557c/volume-head-001.img
4.0K    /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-cfcd557c/volume-head-001.img.meta
9.5G    /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-cfcd557c/volume-snap-7ca3f397-904c-4925-a30c-291a324e38a5.img
4.0K    /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-cfcd557c/volume-snap-7ca3f397-904c-4925-a30c-291a324e38a5.img.meta
4.0K    /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-cfcd557c/volume.meta
eweber@laptop:~/liblonghorn> kl exec -it instance-manager-413dddc5cbb0881de748c6d4eb6b5553 -- bash -c "du -h /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-*/*"
4.0K    /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-6bb570a4/revision.counter
0       /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-6bb570a4/volume-head-003.img
4.0K    /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-6bb570a4/volume-head-003.img.meta
5.3G    /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-6bb570a4/volume-snap-7ca3f397-904c-4925-a30c-291a324e38a5.img
4.0K    /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-6bb570a4/volume-snap-7ca3f397-904c-4925-a30c-291a324e38a5.img.meta
4.0K    /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-6bb570a4/volume.meta
eweber@laptop:~/liblonghorn> kl exec -it instance-manager-699da83c0e9d22726e667344227e096b -- bash -c "du -h /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-*/*"
4.0K    /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-ab058c6b/revision.counter
0       /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-ab058c6b/volume-head-003.img
4.0K    /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-ab058c6b/volume-head-003.img.meta
5.3G    /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-ab058c6b/volume-snap-7ca3f397-904c-4925-a30c-291a324e38a5.img
4.0K    /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-ab058c6b/volume-snap-7ca3f397-904c-4925-a30c-291a324e38a5.img.meta
4.0K    /host/var/lib/longhorn/replicas/pvc-bae950ef-b641-4fcc-82c2-243348ff0b2e-ab058c6b/volume.meta

The snapshot file for the newly rebuilt replica uses much more disk space than the others (which were successfully trimmed). As far as I can tell so far, this is not actually problematic from a data integrity perspective, but I still don't think we should allow replicas to become out-of-sync like this.
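
For reference, the du numbers above reflect allocated blocks rather than apparent file size. A small sketch like the one below (a hypothetical helper, not part of Longhorn) shows how the same divergence could be checked programmatically: the trimmed copies and the rebuilt copy report the same apparent size for the snapshot file, but very different allocated sizes.

package main

import (
	"fmt"
	"os"
	"syscall"
)

// allocatedBytes reports how much disk space a (possibly sparse) file actually
// occupies, which is what du shows: st_blocks is counted in 512-byte units
// regardless of the filesystem block size.
func allocatedBytes(path string) (apparent, allocated int64, err error) {
	info, err := os.Stat(path)
	if err != nil {
		return 0, 0, err
	}
	st, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return 0, 0, fmt.Errorf("no raw stat available for %s", path)
	}
	return info.Size(), st.Blocks * 512, nil
}

func main() {
	// Pass one replica's copy of the snapshot file as the first argument.
	apparent, allocated, err := allocatedBytes(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("apparent=%d bytes, allocated=%d bytes\n", apparent, allocated)
}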

@shuo-wu
Contributor

shuo-wu commented Feb 6, 2024

Nice catch! Setting remove-snapshots-during-filesystem-trim does make allowing an UNMAP during a rebuild a dangerous operation. And I agree: let's keep our initial implementation.

@ejweber
Contributor Author

ejweber commented Feb 6, 2024

Given this discussion, I think there are now two goals for this issue:

  • Modify the longhorn-engine code so that we do NOT return an error when an UNMAP is issued during a rebuild, but we DO refuse to actually perform the UNMAP. This is legal according to the SCSI standard and will prevent really annoying issues in Harvester.
  • Update our documentation to make the effects of this change clear. There is already a mention that the user may need to remount the filesystem to make a trim effective in some situations. This is another situation in which that might be the case.

@longhorn-io-github-bot

longhorn-io-github-bot commented Feb 6, 2024

Pre Ready-For-Testing Checklist

@innobead innobead added the area/stability System or volume stability label Feb 21, 2024
@yangchiu
Member

yangchiu commented Mar 1, 2024

Verified passed on master-head (longhorn-engine 5bd31c0) following the test steps.
