Make Squashfs filesystem creation reproducible
Ever since Mksquashfs was parallelised back in 2006, there has been a certain randomness in how fragments and multi-block files are ordered in the output filesystem, even if the input remains the same. This is because the multiple parallel threads can be scheduled differently between Mksquashfs runs. For example, the thread given fragment 10 to compress may finish before the thread given fragment 9 on one run (writing fragment 10 to the output filesystem before fragment 9), but on the next run it could be vice versa. There are many different scheduling scenarios here, all of which can have a knock-on effect, causing different scheduling and ordering later in the filesystem too.

Mksquashfs doesn't care about the ordering of fragments and multi-block files within the filesystem, as this does not affect the correctness of the filesystem. In fact, not caring about the ordering allows Mksquashfs to run as fast as possible, maximising CPU and I/O performance.

But, in the last couple of years, Squashfs has come to be used in scenarios (cloud etc.) where this randomness is causing problems. Specifically, this appears to be where downloaders, installers etc. try to work out the differences between Squashfs filesystem updates to minimise the amount of data that needs to be transferred to update an image.

Two changes need to be made to Mksquashfs to eliminate this random ordering, and to make it generate reproducible filesystems that are the same on multiple runs:

1. When starting to output a "multi-block file", Mksquashfs needs to ensure no fragments are written interleaved between the file blocks, because the filesystem layout doesn't allow that. There are two solutions to prevent this interleaving by the parallel fragment output threads.

   1.1 The first is to "lock" the fragment threads so that they cannot write fragments while a "multi-block file" is being output. During this time the fragment threads continue compressing, but queue the fragments for later writing.

   1.2 The second is for Mksquashfs, when a "multi-block file" is to be output, to first wait for all current "in-flight" fragments to be compressed and written to disk.

   Initially Mksquashfs used the second solution, but switched over to the first, as it doesn't stall fragment compression. However, the first solution generates output randomness, because how many outstanding fragments have been written before the fragment threads get "locked" to output a "multi-block file" depends entirely on scheduling. The second solution always produces the same ordering (the total number of fragments produced at that point is always the same), but at the potential cost of a pipeline stall (Mksquashfs has to wait). To make the output reproducible, Mksquashfs needs to switch back to the original, second solution.

2. The second change relates to the behaviour of the multiple parallel fragment compressor threads. Which thread compresses a fragment block first, and outputs it to the filesystem, depends entirely on scheduling, and that produces random ordering in the output. The solution here is to add a sequence number and use a "sequenced queue", which outputs fragments in sequence order rather than in the order in which they were queued. This makes the output reproducible.

This commit adds the necessary code changes. Subsequent commits will add various configuration options and Mksquashfs options to control how Mksquashfs behaves.

It should also be clear that the changes needed to make Mksquashfs reproducible are fairly minimal, if done correctly.

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
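The "sequenced queue" of change 2 can be illustrated with a minimal sketch. It is written for this description and is not the squashfs-tools code: `struct seq_queue`, `seq_queue_put()`, `seq_queue_get()`, the fixed `SEQ_SLOTS` window and the simulated `compressor()` threads are all hypothetical names. Each simulated compressor thread deliberately submits its fragments in descending order, yet the writer always receives them in ascending sequence order.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define SEQ_SLOTS 64  /* hypothetical window: fragments buffered ahead of the writer */
#define NFRAGS    16  /* fragments in the demo */
#define NTHREADS  4   /* simulated compressor threads */

/* Hypothetical sequenced queue: items may be put in any order, but
 * seq_queue_get() returns them strictly in ascending sequence order. */
struct seq_queue {
    pthread_mutex_t lock;
    pthread_cond_t  changed;           /* broadcast on every put and get */
    void           *data[SEQ_SLOTS];
    int             present[SEQ_SLOTS];
    long            next;              /* sequence number the writer wants next */
};

static void seq_queue_init(struct seq_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->changed, NULL);
    for (int i = 0; i < SEQ_SLOTS; i++)
        q->present[i] = 0;
    q->next = 0;
}

/* Producer side: each sequence number must be put exactly once. */
static void seq_queue_put(struct seq_queue *q, long seq, void *item)
{
    pthread_mutex_lock(&q->lock);
    assert(seq >= q->next);
    /* Block while seq is SEQ_SLOTS or more ahead of the writer. */
    while (seq >= q->next + SEQ_SLOTS)
        pthread_cond_wait(&q->changed, &q->lock);
    q->data[seq % SEQ_SLOTS] = item;
    q->present[seq % SEQ_SLOTS] = 1;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer side: block until the item numbered q->next has arrived. */
static void *seq_queue_get(struct seq_queue *q)
{
    pthread_mutex_lock(&q->lock);
    long slot = q->next % SEQ_SLOTS;
    while (!q->present[slot])
        pthread_cond_wait(&q->changed, &q->lock);
    void *item = q->data[slot];
    q->present[slot] = 0;
    q->next++;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
    return item;
}

struct worker_arg { struct seq_queue *q; int id; };

/* Simulated compressor thread: owns fragments id, id + NTHREADS, ... and
 * submits them in descending order, so arrival order differs from
 * sequence order. */
static void *compressor(void *p)
{
    struct worker_arg *a = p;
    for (long seq = NFRAGS - NTHREADS + a->id; seq >= 0; seq -= NTHREADS) {
        long *frag = malloc(sizeof *frag);
        if (!frag)
            abort();
        *frag = seq;  /* stand-in for a compressed fragment block */
        seq_queue_put(a->q, seq, frag);
    }
    return NULL;
}

/* Writer: takes fragments from the queue and records the order in which
 * they would be written to the filesystem. */
void write_fragments(long order[NFRAGS])
{
    struct seq_queue q;
    pthread_t tid[NTHREADS];
    struct worker_arg args[NTHREADS];

    seq_queue_init(&q);
    for (int i = 0; i < NTHREADS; i++) {
        args[i] = (struct worker_arg){ &q, i };
        pthread_create(&tid[i], NULL, compressor, &args[i]);
    }
    for (int i = 0; i < NFRAGS; i++) {
        long *frag = seq_queue_get(&q);
        order[i] = *frag;
        free(frag);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    pthread_cond_destroy(&q.changed);
    pthread_mutex_destroy(&q.lock);
}
```

Calling `write_fragments(order)` (compiled with `-pthread`) fills `order` with 0, 1, ..., 15 regardless of how the threads are scheduled, because the writer only takes fragment n once it has taken fragments 0 to n-1. The trade-off in this sketch is that one slow fragment holds up the writer, and compressors block once they get `SEQ_SLOTS` fragments ahead of it.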
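Solution 1.2, waiting for all in-flight fragments before starting a multi-block file, can be sketched with a pair of counters and a condition variable. Again this is a hypothetical illustration rather than Mksquashfs code: `struct ctx`, `fragment_thread()`, `wait_for_inflight_fragments()` and `build_image()` are invented names, and the "output" is a string with one character per block written ('F' for a fragment, 'B' for a block of the multi-block file).

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAX_LOG 64  /* hypothetical cap on blocks recorded in the demo */

/* Shared state; every field after the sync objects is protected by lock. */
struct ctx {
    pthread_mutex_t lock;
    pthread_cond_t  idle;             /* broadcast whenever a fragment is written */
    long            dispatched;       /* fragments handed to fragment threads */
    long            written;          /* fragments written to the output */
    char            log[MAX_LOG + 1]; /* layout: 'F' fragment, 'B' file block */
    int             len;
};

/* Record one block in the output layout; caller must hold c->lock. */
static void append(struct ctx *c, char kind)
{
    c->log[c->len++] = kind;
    c->log[c->len] = '\0';
}

/* Hypothetical fragment thread: stands in for compressing and then
 * writing one fragment. */
static void *fragment_thread(void *p)
{
    struct ctx *c = p;
    pthread_mutex_lock(&c->lock);
    append(c, 'F');
    c->written++;
    pthread_cond_broadcast(&c->idle);
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Solution 1.2: wait until every dispatched fragment has been written. */
static void wait_for_inflight_fragments(struct ctx *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->written < c->dispatched)
        pthread_cond_wait(&c->idle, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Dispatch nfrags fragments, then output a file of nblocks blocks.
 * layout must have room for MAX_LOG + 1 chars. */
void build_image(int nfrags, int nblocks, char *layout)
{
    struct ctx c = { .dispatched = 0, .written = 0, .len = 0 };
    pthread_t tid[MAX_LOG];

    assert(nfrags >= 0 && nblocks >= 0 && nfrags + nblocks <= MAX_LOG);
    pthread_mutex_init(&c.lock, NULL);
    pthread_cond_init(&c.idle, NULL);

    for (int i = 0; i < nfrags; i++) {
        pthread_mutex_lock(&c.lock);
        c.dispatched++;
        pthread_mutex_unlock(&c.lock);
        pthread_create(&tid[i], NULL, fragment_thread, &c);
    }

    /* Possible pipeline stall: the price of a reproducible layout. */
    wait_for_inflight_fragments(&c);

    /* Holding the lock keeps the file's blocks contiguous. */
    pthread_mutex_lock(&c.lock);
    for (int b = 0; b < nblocks; b++)
        append(&c, 'B');
    pthread_mutex_unlock(&c.lock);

    for (int i = 0; i < nfrags; i++)
        pthread_join(tid[i], NULL);
    strcpy(layout, c.log);
    pthread_cond_destroy(&c.idle);
    pthread_mutex_destroy(&c.lock);
}
```

With the wait in place, `build_image(5, 3, layout)` always produces "FFFFFBBB". Holding the lock while appending the 'B' entries is enough on its own to keep the file's blocks contiguous, roughly as in solution 1.1, but without the wait the number of 'F' entries ahead of them would depend on how the threads happened to be scheduled.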