Mksquashfs: fix duplicate check when last file block is sparse
Bruno Wolf III has reported a bug against the latest release
(https://bugzilla.redhat.com/show_bug.cgi?id=1985561).

This is a regression caused by rewriting the duplicate checking code
to be more aggressive when looking for duplicates when using tail-end
packing.  In particular, a file no longer has to be the same size
to be considered a (partial) duplicate.

Unfortunately, the code didn't handle the special case where the last
block in a file is sparse.  This previously wasn't an issue because
files considered to be a duplicate were always the same size, and so the
special case did not occur.

The bug occurs when the last block is sparse because you get
identical block lists even if the file sizes are different.  This is
because the tail-end sparse block represents a length between 1 byte
and block size - 1, but is stored as 0 (to indicate it is sparsely
stored).

To make this concrete, consider two files (128K blocks).

1. File 1 is 128K + 10 bytes in length.

  The last block (10 bytes) is ZERO filled.
  The first block has data, compresses to 32K.

  The block list will hold

  block 0: 32K
  block 1: 0

2. File 2 is 128K + 4096 bytes in length.

  The last block (4096 bytes) is ZERO filled.
  The first block has identical data to file 1, compresses to 32K.

  The block list will hold

  block 0: 32K
  block 1: 0

So the block data is identical, *except* for the fact the tail-end sparse
block represents a different amount of data.
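
The comparison above can be modelled in a few lines of C. This is only a sketch under stated assumptions: `toy_file`, `toy_sparse_tail_file` and `block_lists_match` are hypothetical names invented for illustration and are not the structures or functions used in mksquashfs.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE (128 * 1024)

/* Hypothetical toy model of a file; not the real mksquashfs structures. */
struct toy_file {
    long long file_size;        /* uncompressed length in bytes */
    size_t blocks;              /* number of entries in block_list */
    unsigned int block_list[2]; /* stored size per block, 0 = sparse */
};

/* Build a two-block file whose tail block is entirely zero filled. */
static struct toy_file toy_sparse_tail_file(long long file_size,
                                            unsigned int first_block_stored)
{
    struct toy_file f;

    assert(file_size > BLOCK_SIZE && file_size < 2LL * BLOCK_SIZE);
    f.file_size = file_size;
    f.blocks = 2;
    f.block_list[0] = first_block_stored; /* compressed size of block 0 */
    f.block_list[1] = 0;                  /* sparse: its length is not recorded */
    return f;
}

/* True when both files have the same block count and identical block lists. */
static bool block_lists_match(const struct toy_file *a, const struct toy_file *b)
{
    return a->blocks == b->blocks &&
           memcmp(a->block_list, b->block_list,
                  a->blocks * sizeof(a->block_list[0])) == 0;
}
```

Building one file with `toy_sparse_tail_file(BLOCK_SIZE + 10, 32 * 1024)` and another with `toy_sparse_tail_file(BLOCK_SIZE + 4096, 32 * 1024)` gives two entries for which `block_lists_match()` is true even though their `file_size` fields differ, so a check based on the block list alone cannot tell them apart.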

The duplicate checking code didn't deal with this special case, and in
the case of file 2, it would return file 1, effectively truncating most
of the trailing ZERO filled data.

The fix is to recognise this special case, and to return a new entry
with the correct file size and the duplicate data blocks and block list.
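
The decision the fix makes can be sketched in isolation as follows. This is a simplified illustration, not the code in mksquashfs.c: `toy_entry` and `resolve_match` are hypothetical names, and the sketch assumes the block lists have already matched and that neither file has a tail-end fragment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a duplicate-table entry. */
struct toy_entry {
    long long file_size;            /* uncompressed length in bytes */
    const unsigned int *block_list; /* shared with the duplicate, not copied */
    long long start;                /* where the data blocks begin */
};

/*
 * Same size: the existing entry is an exact duplicate, so return it.
 * Different size: only the tail-end sparse block differs in length, so
 * fill *fresh with the new file size while reusing the duplicate's
 * block list and data blocks, and return that instead.
 */
static const struct toy_entry *resolve_match(long long file_size,
                                             const struct toy_entry *dup,
                                             struct toy_entry *fresh)
{
    assert(dup != NULL && fresh != NULL);

    if (file_size == dup->file_size)
        return dup;

    fresh->file_size = file_size;
    fresh->block_list = dup->block_list;
    fresh->start = dup->start;
    return fresh;
}
```

With file 1 from the example as `dup`, a lookup for file 2 (128K + 4096 bytes) returns the new entry carrying file 2's own length, so none of its trailing zero filled data is lost, while the stored blocks are still shared.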

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
plougher committed Jul 25, 2021
1 parent 2399660 commit 19b161c
Showing 1 changed file with 15 additions and 2 deletions.
--- a/squashfs-tools/mksquashfs.c
+++ b/squashfs-tools/mksquashfs.c
@@ -2190,10 +2190,23 @@ static struct file_info *duplicate(int *dupf, int *block_dup, long long file_siz
 			/* Yes, the block list matches. We can use this, rather
 			 * than writing an identical block list.
 			 * If both it and us doesn't have a tail-end fragment, then we're
-			 * finished. Return the duplicate */
+			 * finished. Return the duplicate.
+			 *
+			 * We have to deal with the special case where the
+			 * last block is a sparse block. This means the
+			 * file will have matched, but, it may be a different
+			 * file length (because a tail-end sparse block may be
+			 * anything from 1 byte to block_size - 1 in size, but
+			 * stored as zero). We can still use the block list in
+			 * this case, but, we must return a new entry with the
+			 * correct file size */
 			if(!frag_bytes && !dupl_ptr->fragment->size) {
 				*dupf = *block_dup = TRUE;
-				return dupl_ptr;
+				if(file_size == dupl_ptr->file_size)
+					return dupl_ptr;
+				else
+					return create_non_dup(file_size, bytes, blocks, sparse, dupl_ptr->block_list,
+						dupl_ptr->start, dupl_ptr->fragment, checksum, 0, checksum_flag, FALSE);
 			}
 
 			/* We've got a tail-end fragment, and this file most likely
