
Investigate sparse file support for VMs #662

Closed
candlerb opened this issue Mar 25, 2024 · 14 comments · Fixed by #773
Labels: Bug (Confirmed to be a bug), Maybe (Undecided whether in scope for the project)

@candlerb (Contributor)

Required information

Incus 0.6, Ubuntu 22.04; further details as in #658

Issue description

If you do "incus storage volume export", the tarball it creates contains a non-sparse image file, so when you untar it, much more disk space is allocated than necessary.

Steps to reproduce

Start an incus VM and attach a storage volume to it (e.g. as sdb). Here the volume is called "testzfs" and is created in the "default" storage pool, which is of type "dir":

incus launch --vm images:ubuntu/22.04/cloud testvm
incus storage volume create default testzfs size=20GiB --type block
incus config device add testvm "sdb" disk pool=default source=testzfs

Inside the VM, format the storage volume and write some data to it. Here I wrote around 1.6GB in a 20GB image file.

Check disk usage and you can see the image is sparse:

# ls -ls /var/lib/incus/storage-pools/default/custom/default_testzfs/
total 1584392
1584392 -rw------- 1 root root 21474836480 Mar 25 16:09 root.img
# du -sch /var/lib/incus/storage-pools/default/custom/default_testzfs/
1.6G	/var/lib/incus/storage-pools/default/custom/default_testzfs/
1.6G	total

Stop the VM, and export the storage volume (it takes a while; I suspect it's compressing all those zeros):

incus storage volume export default testzfs /var/tmp/default_testzfs.tgz

Examine it and unpack it:

$ cd /var/tmp
$ ls -l default_testzfs.tgz
-rw-rw-r-- 1 nsrc nsrc 1197538517 Mar 25 16:23 default_testzfs.tgz
$ rm -rf backup
$ tar -xvzf default_testzfs.tgz
backup/index.yaml
backup/volume.img

Check the resulting files:

$ ls -ls backup
total 20971548
       4 -rw-r--r-- 1 nsrc nsrc         325 Mar 25 16:20 index.yaml
20971544 -rw------- 1 nsrc nsrc 21474836480 Mar 25 16:20 volume.img
$ du -sch backup
21G	backup
21G	total

However, the tar format (at least GNU tar) does support sparse files. It's possible to repack it efficiently:

$ cd backup
$ dd if=volume.img of=volume.img.new conv=sparse bs=1024k && mv volume.img.new volume.img
20480+0 records in
20480+0 records out
21474836480 bytes (21 GB, 20 GiB) copied, 4.06962 s, 5.3 GB/s
$ ls -ls
total 2286604
      4 -rw-r--r-- 1 nsrc nsrc         325 Mar 25 16:20 index.yaml
2286600 -rw-rw-r-- 1 nsrc nsrc 21474836480 Mar 25 16:26 volume.img
$ cd ..
$ tar --sparse -czf new.tgz backup
$ ls -l default_testzfs.tgz new.tgz
-rw-rw-r-- 1 nsrc nsrc 1197538517 Mar 25 16:23 default_testzfs.tgz
-rw-rw-r-- 1 nsrc nsrc 1178878904 Mar 25 16:30 new.tgz

Test unpacking:

$ mv backup backup.old
$ tar -xvzf new.tgz
backup/
backup/volume.img
backup/index.yaml
$ ls -ls backup
total 2286604
      4 -rw-r--r-- 1 nsrc nsrc         325 Mar 25 16:20 index.yaml
2286600 -rw-rw-r-- 1 nsrc nsrc 21474836480 Mar 25 16:26 volume.img
$ du -sch backup
2.2G	backup
2.2G	total

That was successful, although the result is a bit less sparse than the original (presumably this depends on the chunk size checked for contiguous zeros):

# ls -ls /var/lib/incus/storage-pools/default/custom/default_testzfs/root.img /var/tmp/backup/volume.img
1584392 -rw------- 1 root root 21474836480 Mar 25 16:09 /var/lib/incus/storage-pools/default/custom/default_testzfs/root.img
2286600 -rw-rw-r-- 1 nsrc nsrc 21474836480 Mar 25 16:26 /var/tmp/backup/volume.img
# shasum /var/lib/incus/storage-pools/default/custom/default_testzfs/root.img /var/tmp/backup/volume.img
a52a7432f8f21d4ef6239134e318df1390371697  /var/lib/incus/storage-pools/default/custom/default_testzfs/root.img
a52a7432f8f21d4ef6239134e318df1390371697  /var/tmp/backup/volume.img

Discussion

The size of the export tarfile itself isn't affected much, since gzip compresses very well over long runs of zeros.

For most users, it's probably more important whether "incus volume import" also creates sparse files (which depends on how it does the untarring[^1]). But for my use case, I wanted to take the output of "incus volume export" and process it further, to turn it into a VM image for running elsewhere.

An alternative solution would be for "incus volume export" to create qcow2 files inside the tar, instead of raw files. That would be consistent with image tarballs, but presumably have backwards compatibility issues.
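
For reference, that post-processing can already be done outside incus with qemu-img; a sketch using the filenames from the transcript above (qcow2 only stores allocated clusters, so the long runs of zeros in the raw image take no space):

$ cd /var/tmp
$ tar -xzf default_testzfs.tgz
$ qemu-img convert -f raw -O qcow2 backup/volume.img testzfs.qcow2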


[^1]: Re-importing these two exports (the original and the sparsified one):

$ incus storage volume import default default_testzfs.tgz test1
$ incus storage volume import default new.tgz test2

Result:

# cd /var/lib/incus/storage-pools/default/custom
# ls -ls default_test1/
total 20971544
20971544 -rw-r--r-- 1 root root 21474836480 Mar 25 16:45 root.img
# ls -ls default_test2/
total 20971552
20971552 -rw-r--r-- 1 root root 21474836480 Mar 25 16:46 root.img

Neither of them is sparse, so restoring a volume this way can use much more disk space than the original volume did. It ought to be relatively easy to skip long runs of zeros to recreate a sparse file though (and that works whether or not the tarball itself is sparse).

@stgraber stgraber changed the title "incus storage volume export" creates a non-sparse tarfile Investigate sparse file support for VMs Mar 25, 2024
@stgraber (Member)

Turning this issue into an umbrella issue for sparse file support.

The following should ideally be supported:

  • Export a VM as a sparse raw image inside the tarball
  • Import a VM from a tarball containing a sparse raw image (likely works already)
  • Create a sparse image file on dir and btrfs when receiving an image from the migration API

I'm marking the issue as Maybe and for Later because, as it stands, there are no native functions in Go to easily wrap an io.Writer to generate a sparse file, and the tar writer logic we use to both read and write our tarballs doesn't support sparse files.

@stgraber stgraber added Bug Confirmed to be a bug Maybe Undecided whether in scope for the project labels Mar 25, 2024
@stgraber stgraber added this to the later milestone Mar 25, 2024
@candlerb (Contributor, Author)

I think there is a quick win for snapshots. When I did incus snapshot create assembler preboot for a virtual machine which was using a dir storage pool, I noticed that it spawned the following process:

dd if=/var/lib/incus/storage-pools/default/virtual-machines/nsrc-builder_assembler/root.img of=/var/lib/incus/storage-pools/default/virtual-machines-snapshots/nsrc-builder_assembler/preboot/root.img bs=16M conv=nocreat iflag=direct oflag=direct

and so the snapshot is not sparse:

# ls -ls /var/lib/incus/storage-pools/default/virtual-machines/nsrc-builder_assembler/root.img
2416096 -rw------- 1 root root 64424509440 Mar 29 07:24 /var/lib/incus/storage-pools/default/virtual-machines/nsrc-builder_assembler/root.img
# ls -ls /var/lib/incus/storage-pools/default/virtual-machines-snapshots/nsrc-builder_assembler/preboot/root.img
62914572 -rw------- 1 root root 64424509440 Mar 29 07:29 /var/lib/incus/storage-pools/default/virtual-machines-snapshots/nsrc-builder_assembler/preboot/root.img

However, I think that if you changed conv=nocreat to conv=nocreat,sparse then you should get sparse snapshots with no additional work.
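
For example (untested; the same command as above with only the conv flag changed):

dd if=/var/lib/incus/storage-pools/default/virtual-machines/nsrc-builder_assembler/root.img of=/var/lib/incus/storage-pools/default/virtual-machines-snapshots/nsrc-builder_assembler/preboot/root.img bs=16M conv=nocreat,sparse iflag=direct oflag=direct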

I don't know what happens if you try that on a filesystem which doesn't support sparse files, though. And I'm also not sure why incus doesn't simply spawn cp instead of dd, which I believe would handle an existing sparse file.
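
For comparison, a sketch of the cp equivalent: GNU cp defaults to --sparse=auto, which keeps holes it detects in the source, while --sparse=always also turns runs of zero bytes in the input into holes:

cp --sparse=always /var/lib/incus/storage-pools/default/virtual-machines/nsrc-builder_assembler/root.img /var/lib/incus/storage-pools/default/virtual-machines-snapshots/nsrc-builder_assembler/preboot/root.img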

stgraber added a commit to stgraber/incus that referenced this issue Mar 29, 2024
Part of lxc#662

Signed-off-by: Stéphane Graber <stgraber@stgraber.org>
stgraber added a commit to stgraber/incus that referenced this issue Mar 29, 2024
Part of lxc#662

Signed-off-by: Stéphane Graber <stgraber@stgraber.org>
@ktran1403

Hello, can my group be assigned to this issue? What are some references and starting points?

@stgraber (Member) commented Apr 3, 2024

@ktran1403 I can assign it to you, but note that this issue is labeled as Maybe which means that it's work that may not be doable or may require too much effort/risk for what it brings.

@ktran1403

@stgraber We would like to work on this issue. We understand that it's most likely out of our reach, but we want to try. What are the references/topics we should familiarize ourselves with to best deal with this problem?

@stgraber (Member) commented Apr 3, 2024

The most tractable part of this issue would be to rework the dir storage driver (incus/internal/storage/drivers/driver_dir*) to create a sparse file when creating a new VM image.

That'd cover:

  • Create a VM
  • Copy a VM
  • Create a snapshot of an existing VM

That should be possible to do by having some kind of Go io.Writer that automatically detects holes (byte chunks that are all uninitialized/zero) and makes use of Seek/Truncate (when supported by the underlying filesystem) rather than writing those zero bytes.
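
As a minimal standalone illustration of that primitive (not incus code; this works on any filesystem with hole support):

package main

import (
	"log"
	"os"
)

func main() {
	f, err := os.Create("sparse.img")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Write a little real data at the start of the file.
	if _, err := f.Write([]byte("header")); err != nil {
		log.Fatal(err)
	}

	// Extend the logical size to 1 GiB without writing anything: ls -l
	// reports 1 GiB while du reports a few kilobytes, because the
	// filesystem never allocates blocks for the truncated region.
	if err := f.Truncate(1 << 30); err != nil {
		log.Fatal(err)
	}
}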

@TinkeringWithUS (Contributor)

@stgraber Is our understanding correct? When we create a new VM, we call CreateVolume, because images are the same as volumes with some differences in physical hardware. When CreateVolume gets called, it ensures there's a path to create this volume and then invokes runFiller, which allows for custom operations on a volume during its creation. The idea is similar for CreateVolumeFromCopy.

Can you give us some pointers as to the workflow of creating, copying, and exporting images (which functions get called in sequence)? And what is the Fill type in driver_types.go?

@stgraber (Member)

Okay, so I think we should focus on an easily reproducible case.

Something like:

incus storage create dir dir
incus create images:alpine/edge a1 --vm --storage dir
du -sch /var/lib/incus/storage-pools/dir/virtual-machines/a1/root.img
incus export a1
incus delete a1
incus import a1.tar.gz
du -sch /var/lib/incus/storage-pools/dir/virtual-machines/a1/root.img

Running that here, I see an initial size of 254MB and then it jumps to the full 11GB after going through the export/import dance.

The code path for that is in CreateVolumeFromBackup, which for dir calls genericVFSBackupUnpack; the part of particular interest in there is the "Extract block file to block volume" section, specifically its io.Copy call. The "to" there is the io.Writer that needs to become capable of producing a sparse file.

Currently it's getting a full stream from the tarball reader, most of which is sequential null bytes, and it's just writing those as they are to disk; that's how we end up with a full 11GB when we should only have 254MB. We need to wrap that io.Writer and have the wrapper detect sequences of null bytes and use Truncate and Seek calls to achieve the same result without actually writing zeroes to disk.

@stgraber (Member)

Basically with the instructions above, root.img before and after the export/import should be roughly the same size (unlikely to be 100% identical) and hashing the whole file with sha256sum should result in an identical checksum (ensuring the wrapper didn't accidentally alter the content).

@milaiwi (Contributor) commented Apr 19, 2024

We wrote the following code:

import (
	"io"
	"os"
)

type SparseFileWrapper struct {
	W *os.File
}

// Write copies p to the underlying file, but seeks over runs of zero
// bytes instead of writing them, leaving holes in the file.
// Note: if the stream ends in zeros, the trailing Seek alone does not
// extend the file, so the caller needs a final Truncate to the full length.
func (sfw *SparseFileWrapper) Write(p []byte) (n int, err error) {
	originalLength := len(p)
	start := 0

	for start < len(p) {
		end := start
		if p[start] == 0 {
			// Skip the run of zero bytes by seeking forward.
			for end < len(p) && p[end] == 0 {
				end++
			}

			if _, err := sfw.W.Seek(int64(end-start), io.SeekCurrent); err != nil {
				return start, err // could not seek
			}

			start = end
		} else {
			// Write the run of non-zero bytes as-is.
			for end < len(p) && p[end] != 0 {
				end++
			}
			written, err := sfw.W.Write(p[start:end])
			if err != nil {
				return start + written, err
			}
			start = end
		}
	}
	return originalLength, nil
}

And replaced io.Copy with:

customWrapper := &SparseFileWrapper{W: to}
// ...
_, err = io.Copy(customWrapper, tr)

Running the commands given:

On creation:

ubuntu@ubuntu:~/incus$ sudo du -sch /var/lib/incus/storage-pools/dir/virtual-machines/a1/root.img
254M	/var/lib/incus/storage-pools/dir/virtual-machines/a1/root.img
254M	total
ubuntu@ubuntu:~/incus$ sudo  sha256sum /var/lib/incus/storage-pools/dir/virtual-machines/a1/root.img
3621514d973e5e7f80f7c9302d1348bd2a9789454421ec3f458bac7c0feb9f0c  /var/lib/incus/storage-pools/dir/virtual-machines/a1/root.img

On import:

ubuntu@ubuntu:~/incus$ sudo du -sch /var/lib/incus/storage-pools/dir/virtual-machines/a1/root.img
254M	/var/lib/incus/storage-pools/dir/virtual-machines/a1/root.img
254M	total
ubuntu@ubuntu:~/incus$ sudo  sha256sum /var/lib/incus/storage-pools/dir/virtual-machines/a1/root.img
3621514d973e5e7f80f7c9302d1348bd2a9789454421ec3f458bac7c0feb9f0c  /var/lib/incus/storage-pools/dir/virtual-machines/a1/root.img

We plan on putting the struct inside internal/server/storage/drivers/driver_dir.go and the function inside internal/server/storage/drivers/driver_dir_volumes.go. Does this approach make sense? Since incus is such a large codebase, are there more tests or other ways we can check whether our work is correct? So far we've just compared results using different images from "incus image list images:".
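
As one more self-check, here is a standalone sketch (assuming the SparseFileWrapper above is in scope; Linux-only because of the syscall use) that pushes a mostly-zero buffer through the wrapper and compares the logical size against the allocated blocks:

package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"os"
	"syscall"
)

func main() {
	// A 4 MiB buffer that is zero except for a small header and footer.
	// The non-zero footer matters: the wrapper only seeks over zeros, so a
	// stream that ends in zeros would leave the file short until a final
	// Truncate fixes the length.
	data := make([]byte, 4<<20)
	copy(data, "header")
	copy(data[len(data)-6:], "footer")

	f, err := os.CreateTemp("", "sparse-test-*")
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove(f.Name())
	defer f.Close()

	if _, err := io.Copy(&SparseFileWrapper{W: f}, bytes.NewReader(data)); err != nil {
		log.Fatal(err)
	}

	// Compare the logical size against the blocks actually allocated.
	var st syscall.Stat_t
	if err := syscall.Fstat(int(f.Fd()), &st); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("logical: %d bytes, allocated: %d bytes\n", len(data), st.Blocks*512)
}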

@stgraber (Member)

Great to see!

The SparseFileWrapper struct and associated Write function can go in internal/server/storage/drivers/utils.go as they will likely be used elsewhere in the future.

That should be its own commit, something like "incusd/storage/drivers: Introduce SparseFileWrapper", and then have another commit, "incusd/storage/drivers/vfs: Use SparseFileWrapper on backup import", with the change to genericVFSBackupUnpack.

@stgraber (Member)

So we now have:

  • Create a VM (uses dd in sparse mode)
  • Copy a VM (uses dd in sparse mode)
  • Create a snapshot of an existing VM (uses dd in sparse mode)
  • Import a VM from a backup (new code to create sparse files as we read them)
  • Migrate a VM between pools or remotely (new code to create sparse files as we read them)

@stgraber (Member)

The only thing remaining would be the ability to generate a sparse tar archive, but that's been an ongoing discussion upstream in Go for over 7 years now, as it's apparently quite the mess, with two different tar implementations supporting it (but not identically) and the remaining ones just not understanding the concept.

We'll definitely be keeping an eye on this space, as it's not been idle for 7 years but has just been making slow progress. Given the complexity of this, I don't feel comfortable with us working around the Go archive/tar implementation to add this feature ourselves and would rather wait for upstream to find a safe way to handle it.


@milaiwi (Contributor) commented Apr 20, 2024

Makes sense, thanks!

@stgraber stgraber modified the milestones: later, incus-6.1 Apr 30, 2024