Investigate sparse file support for VMs #662
Turning this issue into an umbrella issue for sparse file support. The following should be ideally supported:
I'm marking the issue as maybe and for later because, as it stands, Go has no native function to easily wrap an IO writer so that it generates a sparse file, and its tar writer logic, which we use to both read and write our tarballs, doesn't support sparse files.
I think there is a quick win for snapshots. When I did
and so the snapshot is not sparse:
However, I think that if you changed I don't know what happens if you tried that on a filesystem which doesn't support sparse files though. And I'm also not sure why incus doesn't simply spawn
Part of lxc#662 Signed-off-by: Stéphane Graber <stgraber@stgraber.org>
Hello, can my group be assigned to this issue? What are some references and starting points?
@ktran1403 I can assign it to you, but note that this issue is labeled as Maybe, which means that it's work that may not be doable or may require too much effort/risk for what it brings.
@stgraber We would like to work on this issue. We understand that it's most likely out of our reach, but we want to try. What are the references/topics we should familiarize ourselves with to best deal with this problem?
The most tractable part of this issue would be to rework the That'd cover:
That should be possible to do by having some kind of Go
@stgraber Is our understanding correct? When we create a new VM, we call Can you give us some pointers as to the workflow of creating, copying, and exporting images (which functions get called in sequence)? And what is the Fill type in driver_types.go?
Okay, so I think we should focus on an easily reproducible case. Something like:
Running that here, I see an initial size of The code path for that is in Currently it's getting a full stream from the tarball reader, most of which is sequential null bytes, and it's just writing those to disk as they are; that's how we end up with a full
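To illustrate the idea of not writing those null bytes as they are, here is a minimal sketch of a writer that seeks over all-zero chunks instead of writing them. It is not the Incus code and not the code posted later in this thread: `sparseWriter`, `chunkSize` and `roundTrip` are hypothetical names, the 4096-byte granularity is an assumption, and it assumes a freshly created file on a filesystem that supports holes.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// chunkSize is an assumed granularity for zero detection.
const chunkSize = 4096

var zeroChunk = make([]byte, chunkSize)

// sparseWriter is a hypothetical helper (not part of Incus). It wraps a
// freshly created *os.File and seeks over all-zero chunks instead of
// writing them, which leaves holes on filesystems that support them.
type sparseWriter struct {
	f *os.File
}

// Write implements io.Writer.
func (w *sparseWriter) Write(p []byte) (int, error) {
	written := 0
	for len(p) > 0 {
		n := len(p)
		if n > chunkSize {
			n = chunkSize
		}
		if bytes.Equal(p[:n], zeroChunk[:n]) {
			// All zeros: advance the offset without writing anything.
			if _, err := w.f.Seek(int64(n), io.SeekCurrent); err != nil {
				return written, err
			}
		} else if _, err := w.f.Write(p[:n]); err != nil {
			return written, err
		}
		written += n
		p = p[n:]
	}
	return written, nil
}

// Close extends the file to the current offset, so a trailing run of
// skipped zeros still counts toward the file size, then closes it.
func (w *sparseWriter) Close() error {
	off, err := w.f.Seek(0, io.SeekCurrent)
	if err != nil {
		return err
	}
	if err := w.f.Truncate(off); err != nil {
		return err
	}
	return w.f.Close()
}

// roundTrip copies mostly-zero data through sparseWriter into a temporary
// file and reports the resulting size and whether the content matches.
func roundTrip() (int64, bool, error) {
	input := make([]byte, 2<<20+4)
	copy(input[1<<20:], "data")
	dir, err := os.MkdirTemp("", "sparse")
	if err != nil {
		return 0, false, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "root.img")
	f, err := os.Create(path)
	if err != nil {
		return 0, false, err
	}
	w := &sparseWriter{f: f}
	if _, err := io.Copy(w, bytes.NewReader(input)); err != nil {
		f.Close()
		return 0, false, err
	}
	if err := w.Close(); err != nil {
		return 0, false, err
	}
	out, err := os.ReadFile(path)
	if err != nil {
		return 0, false, err
	}
	return int64(len(out)), bytes.Equal(input, out), nil
}

func main() {
	size, same, err := roundTrip()
	fmt.Println(size, same, err)
}
```

Because the wrapper only implements `io.Writer`, it could stand in as the destination of an existing `io.Copy` call. The final `Truncate` matters: without it, a file ending in zeros would come out shorter than the source.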
Basically with the instructions above,
We wrote the following code:
And replaced io.Copy with:
Running the commands given: On creation:
On import:
We plan on putting the struct inside
Great to see! The That should be its own commit, something like
So we now have:
The only thing remaining would be the ability to generate a sparse tar archive, but that's been an ongoing discussion in upstream Go for over 7 years now, as it's apparently quite the mess: two different tar implementations support it (but not identically) and the remaining ones just don't understand the concept. We'll definitely be keeping an eye on this space, as it hasn't been idle for those 7 years, just making slow progress. Given the complexity of this, I don't feel comfortable with us working around the Go archive/tar implementation to add this feature ourselves and would rather we wait for upstream to find a safe way to handle this. Some pointers:
Makes sense, thanks!
Required information
Incus 0.6, Ubuntu 22.04, further details as in #658
Issue description
If you do "incus storage volume export", the tarfile it creates contains a non-sparse image file - so when you untar it, much more disk space is allocated than necessary.
Steps to reproduce
Start an incus VM and attach a storage volume to it (e.g. sdb). Here the volume is called "testzfs", and the volume it created is in the "default" storage pool, which is of type "dir".
Inside the VM, format the storage volume and write some data to it. Here I wrote around 1.6GB in a 20GB image file.
Check disk usage and you can see the image is sparse:
Stop the VM and export the storage volume (it takes a while; I suspect it's compressing all those zeros).
Examine it and unpack it:
Check the resulting files:
However, the tar format (at least GNU tar) does support sparse files. It's possible to repack it efficiently:
Test unpacking:
That was successful, although the result is a bit less sparse than the original (which presumably depends on the chunk size tested for contiguous zeros).
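The sparseness checks in the steps above come down to comparing a file's apparent size with the space actually allocated for it. As a rough sketch of the same comparison in Go (the `diskUsage` and `holeDemo` names are hypothetical, and this assumes a Unix-like system where `os.FileInfo.Sys()` returns a `*syscall.Stat_t` and a filesystem that supports holes):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// diskUsage returns the apparent size of path and the bytes actually
// allocated for it. st_blocks is counted in 512-byte units.
func diskUsage(path string) (int64, int64, error) {
	info, err := os.Stat(path)
	if err != nil {
		return 0, 0, err
	}
	st, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return 0, 0, fmt.Errorf("no syscall.Stat_t available for %s", path)
	}
	return info.Size(), int64(st.Blocks) * 512, nil
}

// holeDemo creates an 8 MiB file consisting of a single hole in a
// temporary directory and reports its apparent and allocated sizes.
func holeDemo() (int64, int64, error) {
	dir, err := os.MkdirTemp("", "hole")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "root.img")
	f, err := os.Create(path)
	if err != nil {
		return 0, 0, err
	}
	// Truncate extends the file without writing any data blocks.
	if err := f.Truncate(8 << 20); err != nil {
		f.Close()
		return 0, 0, err
	}
	if err := f.Close(); err != nil {
		return 0, 0, err
	}
	return diskUsage(path)
}

func main() {
	apparent, allocated, err := holeDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("apparent=%d allocated=%d sparse=%v\n",
		apparent, allocated, allocated < apparent)
}
```

A file is sparse when the allocated figure is noticeably smaller than the apparent one. The exact allocated value varies by filesystem, so it is best compared rather than matched against a fixed number.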
Discussion
The size of the export tarfile itself isn't affected much, since gzip compresses very well over long runs of zeros.
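As a quick illustration of why the tarball size is barely affected, the sketch below gzips a buffer of zeros using Go's standard `compress/gzip` and reports the compressed size. The `gzipSize` helper and the 64 MiB figure are only illustrative choices, not anything taken from Incus.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// gzipSize returns the size in bytes of n zero bytes after gzip
// compression at the default level.
func gzipSize(n int) (int, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(make([]byte, n)); err != nil {
		return 0, err
	}
	// Close flushes the remaining compressed data and the gzip trailer.
	if err := zw.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	n := 64 << 20
	size, err := gzipSize(n)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d zero bytes compress to %d bytes\n", n, size)
}
```

The compressed output is a tiny fraction of the input, which is why the cost of a non-sparse export shows up mainly in the time spent compressing and in the disk space used after unpacking.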
For most users, it's probably more important whether "incus volume import" also creates sparse files (which depends on how it does the untarring[^1]). But for my use case, I wanted to take the output of "incus volume export" and process it further, to turn it into a VM image for running elsewhere.
An alternative solution would be for "incus volume export" to create qcow2 files inside the tar, instead of raw files. That would be consistent with image tarballs, but presumably have backwards compatibility issues.
[^1] Re-importing these two exports (the original and the sparsified one):
Result:
Neither of them is sparse, and therefore restoring a volume this way could use much more disk space than the volume used originally. It ought to be relatively easy to skip long runs of zeros to recreate a sparse file though (and this applies whether or not the tarball itself is sparse).