
[BUG] Backing Image Data Inconsistency if it's Exported from a Backing Image Backed Volume #6899

Closed
WebberHuang1118 opened this issue Oct 16, 2023 · 45 comments
Assignees
Labels
area/backing-image (Backing image related), backport/1.4.5, backport/1.5.4, kind/bug, priority/0 (Must be fixed in this release, managed by PO), require/backport (Require backport. Only used when the specific versions to backport have not been defined.), require/qa-review-coverage (Require QA to review coverage)
Milestone

Comments

@WebberHuang1118

WebberHuang1118 commented Oct 16, 2023

Describe the bug (🐛 if you encounter this issue)

A volume encounters a data integrity issue if it is created from a backing image, and that backing image was exported from a volume which is itself backed by another backing image.

To Reproduce

  1. Preparing a Harvester v1.2.0-head cluster (for using volume as an OS root disk)
  2. Uploading Ubuntu jammy cloud image (latest version), img1
  3. Creating volume vol1 from img1
  4. Creating a VM with vol1 as the root disk, and writing some data
  5. Issuing sync on the filesystem, e.g. $ sudo sync
  6. Stopping VM, and waiting until vol1 is completely detached
  7. Exporting vol1 as img2
  8. Creating vol2 from img2
  9. Comparing data content between vol1 and vol2; the data is inconsistent
  10. Running e2fsck -v on vol2; fsck reports filesystem errors and requires data repair

Expected behavior

The data of vol1 and vol2 should be consistent.

Support bundle for troubleshooting

supportbundle_066aa8ed-a12b-4ccc-971c-59404ee44cd4_2023-10-16T07-46-44Z.zip

Environment

  • Longhorn version: v1.4.3
  • Harvester version: v1.2.0-head
  • Number of Longhorn volumes in the cluster: 2
  • Impacted Longhorn resources:
    • Volume names: pvc-b8bd9013-e8e9-449d-9c25-6c6c0824d0b3, pvc-de64dd95-b0c1-4ce9-864a-88d38e06d9d3

Additional context

  • If vol1 is not backed by a backing image, there is no data inconsistency between vol1 and vol2
  • A single node Harvester cluster should be enough to reproduce this issue
@WebberHuang1118 added the kind/bug, require/qa-review-coverage, and require/backport labels Oct 16, 2023
@ChanYiLin ChanYiLin self-assigned this Oct 16, 2023
@ChanYiLin ChanYiLin added this to the v1.6.0 milestone Oct 23, 2023
@innobead innobead added the priority/0 Must be fixed in this release (managed by PO) label Oct 26, 2023
@innobead
Member

cc @shuo-wu

@WebberHuang1118
Author

WebberHuang1118 commented Oct 27, 2023

Some observations with the following steps:

  1. Preparing a Harvester v1.2.0-head single node cluster (for using volume as an OS root disk)
  2. Uploading Ubuntu jammy cloud image (latest version) as img1 with replica number 1
  3. Checking data usage of img1 (mapped to bi default-image-s9rfj), which is 640MB data_usage_origin_img
  4. Creating volume os-vol (mapped to volume pvc-c3b1a434-86ee-453b-afc7-a52fe113c059) from img1 with claim size 10GB
  5. Creating a VM with os-vol as the root disk
  6. Stopping VM, and waiting until os-vol is completely detached
  7. Exporting os-vol as os-vol.img (mapped to bi default-image-j8ghr) with replica number 1
  8. Checking replica data usage of os-vol, which is 523MB data_usage_os_vol
  9. Checking data usage of os-vol.img, which is 2507MB data_usage_os_vol_img
    • For the data usage, os-vol.img (2507MB) is much greater than the sum of img1 (640MB) and os-vol (523MB)
    • Shouldn't os-vol.img's data usage be no greater than the sum of img1's and os-vol's?
  10. Creating volume copy-vol from os-vol.img with claim size 10GB
  11. Comparing data content between os-vol and copy-vol with $ cmp -l os-vol copy-vol
  12. From the compare result cmp.log, there are several contiguous byte ranges overwritten with zeros in the synthesized volume copy-vol. Running fsck on copy-vol reports filesystem errors and requires data repair

Thanks

@ChanYiLin
Contributor

ChanYiLin commented Jan 3, 2024

Note: From the Longhorn side, we can't reproduce the issue.

Longhorn-side reproduce steps

  1. Create a BackingImageA
  2. Create a VolumeA with the BackingImageA
  3. Write some data to the VolumeA
  4. Get the checksum of the VolumeA
  5. Export BackingImageB from VolumeA
  6. Create the VolumeB with BackingImageB
  7. Get the checksum of the VolumeB
  8. VolumeA and VolumeB have the same checksum

@innobead
Member

innobead commented Jan 3, 2024

@WebberHuang1118 Please work with @ChanYiLin to reproduce the issue.

@WebberHuang1118
Author

WebberHuang1118 commented Jan 3, 2024

For the above reproduce steps #6899 (comment),

After step 6 in #6899 (comment),
since the VM is stopped and the volume is detached, IMHO this issue can be reduced to: "is the exported backing image content the same as the source volume's?"

As I described in step 9 of #6899 (comment),
the original backing image size is 640MB and the volume is 523MB; however, the synthesized backing image grows to 2507MB (> 640MB + 523MB), which seems abnormal.

If we create a new volume from the synthesized backing image and compare the new volume's content with the source volume, the content is inconsistent. However, if the source volume is not backed by a backing image, there is no such data inconsistency.

Thanks

@WebberHuang1118
Author

WebberHuang1118 commented Jan 8, 2024

I found a method to reproduce a similar case without Harvester-specific operations (no VM operations; only creating a PVC from a backing image and performing I/O)

Detailed steps:

  • Uploading Ubuntu jammy cloud image (latest version), img1
  • Creating PVC os-vol from img1 with claim size 10GB and replica count 1
  • Applying the following YAML to mount os-vol in block mode in the pod
    apiVersion: v1
    kind: Pod
    metadata:
      name: block-volume-test
      namespace: default
    spec:
      containers:
      - name: volume-test
        image: ubuntu
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: true
        command: ["/bin/sleep"]
        args: ["3600"]
        volumeDevices:
        - devicePath: /dev/os-vol
          name: os-vol
      volumes:
        - name: os-vol
          persistentVolumeClaim:
            claimName: os-vol
    
  • Issuing filesystem expand and I/O on os-vol
    • Enter pod block-volume-test
    • Install fdisk
      • $ apt update; apt install -y fdisk
    • Expanding filesystem on os-vol
      • Check the device (/dev/sda for example) mapped to os-vol by its device number
      • $ fdisk /dev/sda
        • fdisk inputs: d, 1, n, 1, default, default, No, w (delete partition 1, recreate it spanning the whole disk, answer No to removing the existing signature, then write)
      • $ resize2fs -fp /dev/sda1
    • Mounting /dev/sda1 and writing I/O into it
      • $ mkdir /data; mount -t ext4 /dev/sda1 /data
      • $ dd if=/dev/zero of=/data/file bs=1M count=2048
    • Flushing data to storage for os-vol
      • $ umount /data
    • Delete pod block-volume-test, and wait until os-vol's Longhorn volume is detached
  • Exporting os-vol as image os-vol.img
  • Creating PVC copy-vol from os-vol.img with claim size 10GB
  • Comparing data content between os-vol and copy-vol with $ cmp -l os-vol copy-vol
    • The data content differs between the two PVCs
  • Finding filesystem errors on copy-vol with $ e2fsck -c /dev/sda1 (assuming PVC copy-vol is mapped to /dev/sda)
    • There is no filesystem error on os-vol

PS:

  • IMHO, it may be related to resize2fs, since it produces more complicated I/O patterns, including discards to image block ranges.

@ChanYiLin
Contributor

ChanYiLin commented Jan 8, 2024

I can reproduce the issue; I will look into how we export the BackingImage.

cc @shuo-wu

@ChanYiLin
Contributor

ChanYiLin commented Jan 9, 2024

BackingImage Export Mechanism

Volume is composed of a series of snapshots and a BackingImage

Volume = BackingImage -> snap1 -> head

When exporting a Volume to another BackingImage (code):

  1. create a snapshot
  2. create an in-memory replica via OpenSnapshot() from that snapshot
  3. preload() the block index map (including the BackingImage)
    • so we know where to find the latest data in each block
  4. use SyncContent() from sparse-tools to sync the replica content to the BackingImage datasource
    • we use GetDataLayout() to see whether each block is a hole or data

Volume Cloning Mechanism

It is essentially the same as exporting a BackingImage, except that it won't load the BackingImage when doing preload().
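
For illustration, here is a minimal Go sketch of how the preload() idea above can produce different sector-to-disk maps for cloning versus backing-image export. The function and variable names are hypothetical and only mimic the described behavior; this is not the actual longhorn-engine code:

```go
package main

import "fmt"

// buildLocation mimics preload(): it builds a per-sector map recording which
// disk holds the latest data for each sector. Index 0 means "no data / hole",
// index 1 is reserved for the backing image, and indexes >= 2 are snapshots.
func buildLocation(sectors int, snapshots [][]bool, includeBackingImage bool) []byte {
	location := make([]byte, sectors)
	if includeBackingImage {
		// BackingImage export: every sector falls back to the backing image.
		for i := range location {
			location[i] = 1
		}
	}
	// Walk snapshots from oldest to newest; newer data overrides older entries.
	for snapIdx, snap := range snapshots {
		for sector, hasData := range snap {
			if hasData {
				location[sector] = byte(snapIdx + 2)
			}
		}
	}
	return location
}

func main() {
	snap1 := []bool{true, false, false, true}
	fmt.Println(buildLocation(4, [][]bool{snap1}, false)) // clone:  [2 0 0 2]
	fmt.Println(buildLocation(4, [][]bool{snap1}, true))  // export: [2 1 1 2]
}
```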

@ChanYiLin
Contributor

ChanYiLin commented Jan 9, 2024

Volume Cloning Test - PASSED

Both Volumes are consistent

  1. Creating BackingImage img1 with Ubuntu jammy cloud image (latest version)
  2. Creating StorageClass
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-test
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "1"
  staleReplicaTimeout: "2880"
  backingImage: "img1"
  3. Creating PVC os-vol from img1 with claim size 5GB and 1 replica
  4. Mount os-vol as block mode in the pod
  5. Issuing filesystem expand and I/O on os-vol
    • fdisk to recreate the filesystem partition
    • resize2fs to resize the filesystem
  6. Cloning os-vol to another volume os-cloned by applying the YAML
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: os-cloned
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  storageClassName: longhorn-test
  resources:
    requests:
      storage: 5Gi
  dataSource:
    name: os-vol
    kind: PersistentVolumeClaim
  7. Attach both os-vol and os-cloned to the host and check that the checksums are the same

@ChanYiLin
Contributor

ChanYiLin commented Jan 9, 2024

BackingImage Export Test (without resize2fs) - FAILED

Following the same steps mentioned here, but without applying resize2fs:

  • Create BackingImageA
  • Create PVC/VolumeA and mount it to a pod
  • Recreate the partition with fdisk, but don't run resize2fs
  • Write some data
  • Export the BackingImageB and create another VolumeB with it
  • VolumeA and VolumeB are not consistent

Still inconsistent

@ChanYiLin
Contributor

BackingImage Export Test (without fdisk&resize2fs) - PASSED

Following the same steps mentioned here, but without applying fdisk and resize2fs:

  • Create BackingImageA
  • Create PVC/VolumeA and mount to pod
  • Don't recreate the partition with fdisk or resize the filesystem with resize2fs
  • Simply mount the device's original Linux filesystem (/dev/sdb1 in the following example) to /data and write some data
Disk /dev/sdb: 5 GiB, 5368709120 bytes, 10485760 sectors
Disk model: VIRTUAL-DISK    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: A7AB4193-D2B4-4239-8244-0F2ABDF8C44F

Device      Start     End Sectors  Size Type
/dev/sdb1  227328 4612062 4384735  2.1G Linux filesystem
/dev/sdb14   2048   10239    8192    4M BIOS boot
/dev/sdb15  10240  227327  217088  106M EFI System
  • Export the BackingImageB and create another VolumeB with it
  • VolumeA and VolumeB are consistent

@ChanYiLin
Contributor

ChanYiLin commented Jan 9, 2024

Considering that Volume Clone (with partition recreation) works but BackingImage export (with partition recreation) doesn't,
I suspect that the partition information is somehow not correctly exported when exporting with a BackingImage.

The only difference between Volume Clone and BackingImage export is how
preload() builds replica.volume.location

replica.volume.location = array of sectorIndex -> diskIndex

Volume Clone won't consider the BackingImage index in this case, so the export won't export BackingImage data
BackingImage Export will consider the BackingImage when the sector has no new data, so it will export BackingImage data

cc @derekbit @shuo-wu

@shuo-wu
Contributor

shuo-wu commented Jan 10, 2024

Volume Clone won't consider the BackingImage index in this case, so the export won't export BackingImage data
BackingImage Export will consider the BackingImage when the sector has no new data, so it will export BackingImage data

Sorry...? Do you mean that VolumeExport won't retrieve the backing image fiemap/sector list, hence the BackingImage data won't be exported? But why does the Volume Cloning Test pass?

@ChanYiLin
Contributor

ChanYiLin commented Jan 10, 2024

Hi @shuo-wu
When doing VolumeExport here

It will create a replica and preload the replica to build the r.volume.location index map
Both Volume Clone and BackingImage Export utilize the same function

In Volume Clone, since it doesn't clone the BackingImage, preload() ignores the BackingImage by initializing the index map with "0".
In BackingImage Export, the BackingImage is considered, so preload() initializes the index map with "1", which is the index of the BackingImage.
ref: https://github.com/longhorn/longhorn-engine/blob/master/pkg/replica/replica.go#L1437

Volume Clone only compresses all the snapshots into one snapshot and syncs it to the target Volume.
BackingImage Export compresses all the snapshots + the BackingImage and syncs them to the target BackingImage.

In the test, we mount the BackingImage-backed volume to the Pod.
We recreate the partition, resize the filesystem, and write some data.

If we just do Volume Clone, then the new target Volume + Original BackingImage will be consistent
If we do BackingImage Export, then the new empty Volume + New BackingImage will be inconsistent

I am guessing something went wrong when doing preload() and SyncContent().

@ChanYiLin
Contributor

We also found that:

  1. mount the BackingImage to the Pod + write some data => Export BackingImage is consistent
  2. mount the BackingImage to the Pod + recreate partition&resizefs + write some data => Export BackingImage is inconsistent and the filesystem is corrupted

So maybe the partition information is corrupted when exporting;
that information should be in the new snapshot image and should overwrite the base BackingImage when exporting.

@shuo-wu
Contributor

shuo-wu commented Jan 10, 2024

Do the volume clone test and the failed backing image test mean that the new partitioning info can be cloned correctly via VolumeClone but cannot be exported via ExportingBackingImage?

@ChanYiLin
Contributor

ChanYiLin commented Jan 10, 2024

Yes, that is what I observed here.

And the only difference between the two is preload().

@ChanYiLin
Contributor

ChanYiLin commented Jan 10, 2024

I am wondering if it is because of the following.

assuming snapshot chain as follow
0. nil
1. BackingImage [1, 1, 0, 0]
2. Snap1 [1, 0, 0, 1]


Case 1 - Volume Clone: Export without BackingImage
first init to all 0, location = [0, 0, 0, 0]
r.volumes.preload(), location:
i = 0, nil                          skip => location = [0, 0, 0, 0]
i = 1, BackingImage [1, 1, 0, 0],   skip => location = [0, 0, 0, 0]
i = 2, Snap1        [1, 0, 0, 1]         => location = [2, 0, 0, 2]

ExportVolume and GetDataLayout=> [2, 0, 0, 2] which is [Data, Hole, Hole, Data]

Case 2 - BackingImage Export: Export with BackingImage
first init to all 1, location = [1, 1, 1, 1]
r.volumes.preload(), location
i = 0, nil                          skip => location = [1, 1, 1, 1]
i = 1, BackingImage [1, 1, 0, 0],   skip => location = [1, 1, 1, 1]
i = 2, Snap1        [1, 0, 0, 1]         => location = [2, 1, 1, 2]

ExportVolume and GetDataLayout => [2, 1, 1, 2] which is [Data, Data, Data, Data]

In SyncContent() and GetDataLayout(), we check whether a sector is Data or a Hole by checking whether its DiskIndex is 0 or non-0.
If it is Data, we read the data of that interval from the replica and send it to the target.
If it is a Hole, we simply ask the target to punch a hole.

So in Case 1, we still punch holes for sectors 2 and 3.
But in Case 2, since we initialize the index map to all "1", every sector is Data and we never punch holes. Then some sectors that should be holes (because they have no data in the BackingImage either) might be filled with 0 instead of being punched as holes?
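
To make that concrete, here is a small illustrative Go sketch of the Data/Hole decision (hypothetical names; this is not the actual sparse-tools GetDataLayout() API):

```go
package main

import "fmt"

// syncPlan mimics the GetDataLayout() idea: a sector whose disk index is 0 is
// treated as a hole (the target punches a hole there); anything non-zero is
// treated as data (the source reads it and sends it to the target).
func syncPlan(location []byte) []string {
	plan := make([]string, len(location))
	for i, diskIdx := range location {
		if diskIdx == 0 {
			plan[i] = "punch-hole"
		} else {
			plan[i] = "send-data"
		}
	}
	return plan
}

func main() {
	fmt.Println(syncPlan([]byte{2, 0, 0, 2})) // clone:  [send-data punch-hole punch-hole send-data]
	fmt.Println(syncPlan([]byte{2, 1, 1, 2})) // export: [send-data send-data send-data send-data]
}
```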

GetDataLayout()

Note: If we don't recreate the partition, the exported BackingImage is consistent... but why? TestCase

cc @shuo-wu @PhanLe1010

@shuo-wu
Contributor

shuo-wu commented Jan 10, 2024

The question is, in case 2, location = [2, 1, 1, 2] means [read data from snap1, read data from BI, read data from BI, read data from snap1]. And reading data from BI should get the same result, which is a bunch of zeros. In other words, the final data set should be identical...

@ChanYiLin
Contributor

Yeah, so the question is still:
why is the partition info not exported, or corrupted, during BackingImage export…

@PhanLe1010
Contributor

PhanLe1010 commented Jan 11, 2024

This example assumes that the logical (virtual) size of the backing image is equal to the logical size of the snapshot:

assuming snapshot chain as follow
0. nil
1. BackingImage [1, 1, 0, 0]
2. Snap1 [1, 0, 0, 1]


Case 1 - Volume Clone: Export without BackingImage
first init to all 0, location = [0, 0, 0, 0]
r.volumes.preload(), location:
i = 0, nil                          skip => location = [0, 0, 0, 0]
i = 1, BackingImage [1, 1, 0, 0],   skip => location = [0, 0, 0, 0]
i = 2, Snap1        [1, 0, 0, 1]         => location = [2, 0, 0, 2]

ExportVolume and GetDataLayout=> [2, 0, 0, 2] which is [Data, Hole, Hole, Data]

Case 2 - BackingImage Export: Export with BackingImage
first init to all 1, location = [1, 1, 1, 1]
r.volumes.preload(), location
i = 0, nil                          skip => location = [1, 1, 1, 1]
i = 1, BackingImage [1, 1, 0, 0],   skip => location = [1, 1, 1, 1]
i = 2, Snap1        [1, 0, 0, 1]         => location = [2, 1, 1, 2]

ExportVolume and GetDataLayout => [2, 1, 1, 2] which is [Data, Data, Data, Data]

In reality, they are not:

  • The ubuntu image has 2.2 GiB logical size:
    ➜  Downloads qemu-img info jammy-server-cloudimg-amd64.img 
    image: jammy-server-cloudimg-amd64.img
    file format: qcow2
    virtual size: 2.2 GiB (2361393152 bytes)
    disk size: 642 MiB
    cluster_size: 65536
    Format specific information:
        compat: 0.10
        compression type: zlib
        refcount bits: 16
    
  • While the Longhorn snapshot has 10GB logical size

Revisiting 2 cases:

  1. mount the BackingImage to the Pod + write some data => Export BackingImage is consistent
    -> In this case, all the writes fall into the first 2.2GB region of the backing image file
  2. mount the BackingImage to the Pod + recreate partition&resizefs + write some data => Export BackingImage is inconsistent and the filesystem is corrupted
    -> In this case, some data might fall outside of the 2.2GB region of the backing image file. So something like:
    assuming snapshot chain as follow
    0. nil
    1. BackingImage [1, 1, 0, 0]
    2. Snap1        [1, 0, 0, 1, 0, 1, 0]
    
    I am wondering if reading data from the qcow2 backing image outside of its virtual size can cause any inconsistency. Could we test with a raw backing image instead to see if we hit the same issue, @ChanYiLin?

@PhanLe1010
Contributor

PhanLe1010 commented Jan 11, 2024

Another testing idea I am thinking about is:

  1. Create an empty qcow2/raw image file of 1MB
  2. Create a volume with that backing image and the volume size is 16MB
  3. Write some data at the 2MB offset to the volume
  4. Exporting the backing image from the volume
  5. Read and compare the data from the newly exported backing image and the old volume.
    1. Which regions of the data are different?
    2. Does the newly exported backing image contain all zeros? -> If yes, it means the backing image layer overwrites the snapshot layer?

@PhanLe1010
Contributor

Thanks for the idea, I will try
But we have confirmed that using qcow2 image,
if we directly mount the filesystem, without recreating the partition, the exported BackingImage will be consistent.
Here is the test case

I think the new test idea is different from the consistent test case because the new test idea writes data outside of the backing image boundary (similar to when you recreate the partition and resize the filesystem to be bigger than the virtual size of the backing image). WDYT?

@ChanYiLin
Contributor

ChanYiLin commented Jan 11, 2024

Hi @PhanLe1010 I think you are right

Backing Image Export Test (using empty raw image with larger volume size) - FAILED

  • Create an empty raw image and upload as a BackingImage empty-raw
    • $ truncate -s 2M empty-raw
  • Create a 16MB Volume test-empty with the raw image
  • Attach to the host
  • dd 1MB of data starting at the 3MB offset
    • $ dd of=/dev/longhorn/test-empty if=/dev/urandom bs=1M count=1 oflag=seek_bytes seek=3145728
  • export the Volume to another BackingImage empty-raw-export
  • Create a 16MB Volume test-empty-export with the BackingImage empty-raw-export
  • Attach to the host

Test Result

test-empty-export is empty,
and the comparison result shows that it differs starting at the 3MB offset

brw-rw----  1 root root 8, 48 Jan 11 05:35 test-empty
brw-rw----  1 root root 8, 64 Jan 11 05:36 test-empty-export
root@kworker1:/dev/longhorn# cat test-empty-export
root@kworker1:/dev/longhorn# cmp test-empty test-empty-export
test-empty test-empty-export differ: byte 3145729, line 1
root@kworker1:/dev/longhorn# 

cc @shuo-wu

@ChanYiLin
Contributor

This actually fits the test results from @WebberHuang1118.
The cmp.log he provided also showed that the differing bytes started after 2.2GB, which is the size of the jammy Ubuntu image.
And the differences were also because those values were now "0".

@PhanLe1010
Contributor

PhanLe1010 commented Jan 11, 2024

Awesome. It means somehow the region outside of the backing image boundary (3MB offset to 4MB offset region) in the backing image layer is overwriting the snapshot layer.

Maybe the next step would be adding some logs to the preload() function to see if we can identify the faulty logic (why the backing image layer is chosen instead of the snapshot layer for the region 3MB-4MB)?

@ChanYiLin
Contributor

ChanYiLin commented Jan 11, 2024

Hi @PhanLe1010
I intentionally printed r.volume.location after the preload and also printed the segments sent during the export.
Here are the results:
Screenshot from 2024-01-11 14-00-46

As you can see, the preload location looks correct to me: there is a series of 2s, which means the data comes from the snapshot.
But during the export, it only sent one segment [D 0 4096] and then finished.

This is misleading: the 4096 here means 4096 blocks, and each block is 4096 bytes.
It is correct; it is sending 16MB in one interval because all the blocks are Data.

Note: I added the log message here; this function is executed by multiple goroutines in parallel when sending segments.
lhim_log.txt

Note:

  • VolumeSectorSize = 4096 which is the size of each sector in r.Volume.Location

@PhanLe1010
Contributor

Sorry @ChanYiLin. I will have to continue tomorrow.

@ChanYiLin
Contributor

sure no problem, thanks for your help!
I think we are pretty close to the answer

@ChanYiLin
Contributor

It seems the preload result and the sending segment intervals are correct;
now the problem might be in reading the data for the given segments?

@ChanYiLin
Contributor

ChanYiLin commented Jan 11, 2024

The sending batch size is 2MB, so for a 16MB volume
it only sends 8 batches: [0, 2MB], [2MB, 4MB], .....
I printed the read data out and found that
even when it was reading the [2MB, 4MB] range from the replica, the data was all 0.
However, I also printed out the disk index being read during ReadAt(), and it was reading from the correct disk.
Need to investigate more.

Here is the log lhim2_new2.txt

@PhanLe1010
Contributor

PhanLe1010 commented Jan 12, 2024

Interesting finding @ChanYiLin !

Even when it was reading the [2MB, 4MB] range from the replica, the data was all 0

What about when cloning? Does ReadAt() also return all 0 in the region [2MB, 4MB]?

By the way, how do you know that ReadAt() returns 0 in the region [2MB, 4MB]? From https://github.com/longhorn/longhorn/files/13905510/lhim2_new2.txt I only see newTarget 2, which indicates the snapshot layer

time="2024-01-11T09:53:54Z" level=info msg="[DEBUG in fullReadAt]: newTarget 2 for sector: 805" func="replica.(*diffDisk).fullReadAt" file="diff_disk.go:195"

@ChanYiLin
Contributor

@PhanLe1010 you can search for [DEBUG in syncDataInterval]: Sending batch: [ 512: 1024](512), dataBuffer: in the file;
that is the data sent from the source volume to the target volume.

Since the sending batch size is 2MB and the sector size is 4KB, [ 512: 1024] means [2MB: 4MB] (512 * 4KB = 2MB, 1024 * 4KB = 4MB).

@ChanYiLin
Contributor

ChanYiLin commented Jan 12, 2024

Hi @PhanLe1010
Data only exists in [3MB: 4MB].
As you can see from the previous test, the whole [2MB: 4MB] range is sent as "0".

But here is the result when using clone.
The preload location result is correct, and when sending the batch it sends Sending batch: [ 768: 1024](256), exactly 1MB, because [2MB: 3MB] is now a Hole; it didn't even try to read from that offset.
And the result is correct.
lhim1_new3.txt

I can't see where it goes wrong...

Note: the log also contains syncing of the .meta file, so just look at the image syncing.

@ChanYiLin
Contributor

ChanYiLin commented Jan 15, 2024

I think I found the bug
When reading [2MB-4MB] from the volume:

[2MB-3MB] is BackingImage
[3MB-4MB] is snap1

https://github.com/longhorn/longhorn-engine/blob/deb8b18a1558ddf4f359146b289d5469bef7cc6d/pkg/replica/diff_disk.go#L202-L241
Here is where we try to read the data:

  • If the target disk is out of bounds, it can be because of volume expansion or the BackingImage.
    • In this case, the preload location indicated that [2MB-3MB] is in the BackingImage and tried to read from it.
  • Then we directly return a count of 0, because we cannot read any data from the BackingImage into the buf in this range:
	// May read the expanded part
	if offset >= size {
		return 0, nil
	}
  • For [3MB-4MB], the target disk is correct, and we read the data from snap1 into the buf; the count would be 1MB.
  • But we truncate the returned buf to count after reading from the file here. So the returned data abandons what we read from snap1 and only keeps the first half of "0" data:
func ReadDataInterval(rw ReaderWriterAt, dataInterval Interval) ([]byte, error) {
	data := AllocateAligned(int(dataInterval.Len()))
	n, err := rw.ReadAt(data, dataInterval.Begin)
	if err != nil {
		if err == io.EOF {
			log.Debugf("Have read at the end of file, total read: %d", n)
		} else {
			log.WithError(err).Errorf("Failed to read interval %+v", dataInterval)
			return nil, errors.Wrapf(err, "failed to read interval %+v", dataInterval)
		}
	}
	return data[:n], nil
}

I wonder, do we really need to truncate it to count?
Or is the alignment here correct?
Or maybe, when reading from the expanded part, we should still return the full buffer length as the count, as if we were reading a bunch of "0"s, instead of returning a count of 0.

cc @shuo-wu @PhanLe1010 @WebberHuang1118
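
The effect of the truncation can be shown with a toy Go example (a hypothetical helper, not the actual longhorn-engine or sparse-tools code): when the first half of the requested interval falls into an out-of-bound layer, the reported count only covers the other half, and slicing the buffer to that count throws away the valid tail.

```go
package main

import "fmt"

// toyReadAt mimics the buggy interaction: the first half of the requested
// interval is "out of bound" (the backing image is smaller than the volume),
// so that part contributes nothing to the count, while the second half is
// actually read from snap1. The total count only covers the second half.
func toyReadAt(buf []byte) (int, error) {
	half := len(buf) / 2
	for i := half; i < len(buf); i++ {
		buf[i] = 0xAB // stand-in for real snapshot data
	}
	return half, nil
}

func main() {
	data := make([]byte, 8)
	n, _ := toyReadAt(data)
	fmt.Printf("full buffer: %x\n", data)     // 00000000abababab: the tail holds the real snap1 data
	fmt.Printf("data[:n]:    %x\n", data[:n]) // 00000000: truncating to n drops the snap1 data
}
```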

@ChanYiLin
Contributor

ChanYiLin commented Jan 16, 2024

TEST (By directly returning data instead of data[:n]) - PASSED

The volumes are now consistent, even when testing with the original test case:

  • cksum are the same
  • filesystem is not corrupted

Summary

I think this is the correct fix.
When sending a batch for an interval, we allocate the buf with the batch size (interval size),
then we send the data along with the interval information to the target volume so it writes the data in that interval.
We should not truncate the data to the count we actually read from the volume, because the count might be 0 in the case of volume expansion or a BackingImage.
And I don't think the count number matters in this case.
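
For reference, a sketch of that change applied to the ReadDataInterval snippet quoted earlier (same assumed imports and types as that snippet; illustrative only, not necessarily the exact upstream patch):

```go
func ReadDataInterval(rw ReaderWriterAt, dataInterval Interval) ([]byte, error) {
	data := AllocateAligned(int(dataInterval.Len()))
	n, err := rw.ReadAt(data, dataInterval.Begin)
	if err != nil {
		if err == io.EOF {
			log.Debugf("Have read at the end of file, total read: %d", n)
		} else {
			log.WithError(err).Errorf("Failed to read interval %+v", dataInterval)
			return nil, errors.Wrapf(err, "failed to read interval %+v", dataInterval)
		}
	}
	// Return the whole interval-sized buffer instead of data[:n]: a short
	// count here can simply mean part of the interval lives in an out-of-bound
	// layer (backing image or smaller snapshot) whose content is zeros, not
	// that the tail of the buffer is invalid.
	return data, nil
}
```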

@shuo-wu
Contributor

shuo-wu commented Jan 16, 2024

I agree with removing the truncation in ReadDataInterval.

On the other hand, I am wondering if we need to improve the diffDisk.fullReadAt result as well.
In the above case, we try to read the data in [2MB, 4MB], and diffDisk tells us the read count is 1MB. Intuitively, users will assume that the read content is at the beginning of the buffer and discard the latter 1MB. But actually NO...
What we can improve here is: when we read a data interval like [out-of-bound part of the backing image/smaller snapshot, data, out-of-bound part of the backing image/smaller snapshot], the returned read length should be len([out-of-bound part of the backing image/smaller snapshot, data]) rather than len([data]).

@ChanYiLin
Contributor

ChanYiLin commented Jan 16, 2024

From my perspective, I think we should still return len([data]) for diffDisk.fullReadAt
The reason is that, from the caller's perspective, it is reading from the "Volume"

func (r *Replica) ReadAt(buf []byte, offset int64) (int, error) {
	r.RLock()
	c, err := r.volume.ReadAt(buf, offset) // which will call diffDisk.fullReadAt()
	r.RUnlock()
	return c, err
}

So it has no knowledge about the snapshot chain underneath.

If it reads outside the boundary of the "Volume", we do return an io.ErrUnexpectedEOF error here.
But if it only reads outside the boundary of a specific snapshot image, and we allow it to do so, then we should treat it as reading a bunch of "0"s, and that is also true: the caller does get a bunch of "0"s.
The caller should not have to handle this situation.
It would be confusing to get no EOF error and yet a data length that doesn't match the request.

@shuo-wu
Contributor

shuo-wu commented Jan 17, 2024

After discussing with @ChanYiLin, we agree that Longhorn should return the length the caller wants to read if there is no EOF error.
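
A tiny, self-contained Go sketch of those semantics (illustrative only; the function name and layout are hypothetical, not the actual diffDisk code): a read that stays within the volume boundary always reports the full requested length, with any part beyond a smaller layer read as zeros.

```go
package main

import (
	"fmt"
	"io"
)

// readAtWithinVolume: as long as the request stays within the volume boundary,
// the returned count equals the requested length, and any part that falls
// beyond a smaller layer (backing image or smaller snapshot) simply reads as
// zeros. Only reads past the volume itself report an error.
func readAtWithinVolume(volumeSize, layerSize int64, buf []byte, offset int64) (int, error) {
	if offset+int64(len(buf)) > volumeSize {
		return 0, io.ErrUnexpectedEOF // outside the volume: real error
	}
	for i := range buf {
		if offset+int64(i) < layerSize {
			buf[i] = 0xCD // stand-in for data actually stored in the layer
		} else {
			buf[i] = 0 // beyond the layer: zeros, but still counted
		}
	}
	return len(buf), nil
}

func main() {
	buf := make([]byte, 4)
	n, err := readAtWithinVolume(16, 2, buf, 0)
	fmt.Println(n, err, buf) // 4 <nil> [205 205 0 0]
}
```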

@longhorn-io-github-bot

longhorn-io-github-bot commented Jan 18, 2024

Pre Ready-For-Testing Checklist

@chriscchien chriscchien self-assigned this Jan 18, 2024
@chriscchien
Contributor

Hi @ChanYiLin ,

Following the test steps, this issue is verified as passing on master-head, but on v1.6.x-head the data of the two backing images was still not consistent.

The master-head engine image's build time is later than v1.6.x-head's; I am not sure if this means that the v1.6.x engine does not contain your fix. Could you help to check? Thank you.

> docker images
REPOSITORY                   TAG           IMAGE ID       CREATED       SIZE
longhornio/longhorn-engine   master-head   16d097e2f0e6   4 hours ago   343MB
longhornio/longhorn-engine   v1.6.x-head   1c7202bbd25e   5 hours ago   343MB

@innobead
Member

@chriscchien The 1.6 PR was just merged, so please wait for https://drone-publish.longhorn.io/longhorn/longhorn-engine/810 to finish.

@chriscchien
Contributor

Verified as passing on longhorn master (longhorn-engine ac4748) and longhorn v1.6.x (longhorn-engine 0b553a) with the test steps.

I could reproduce this issue before the fix; after the fix, the data of the two volumes is identical.
