            imagepile: a disk image block-level de-duplication tool
             (C) 2014-2023 by Jody Bruchon <jody@jodybruchon.com>
            -------------------------------------------------------

Any sufficiently sophisticated computer service department knows the value of
disk imaging. The ability to take a snapshot of a working system has been an
invaluable tool for decades. Unfortunately, disk images take up a lot of disk
space--even if they are compressed. Disk images containing lots of identical
data (e.g. two disk images containing the same operating system version) tend
to result in massive amounts of unnecessarily duplicated data. This is okay
if there are only a few images and if those images are relatively small, but
there are many situations where more than "a few" images may be necessary.
Taking snapshots of multiple computers in a small business that are mostly
identical while serving different purposes and using different software is
one such situation; maintaining "bare" and "full" images that contain just an
OS and the OS plus a standard set of installed programs is another.

'imagepile' solves the problem of massively duplicated data between raw disk
images. When you add an image to the "image pile," this program checks it
block-by-block against all of the blocks previously stored there, including
earlier blocks from the same image. If a block is identical to one already
in the pile, no new data is stored for it. This can result in staggering disk
space savings because the only data that can expand the image pile database
is unique data, and many disk images tend to have large amounts of repeated
blocks throughout.
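
The block-by-block check described above can be sketched roughly as follows.
This is only an illustration, not imagepile's actual code: the function name
add_image, the in-memory bytearray and dict standing in for the block
database and hash index, and the choice of SHA-256 are all assumptions made
for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # default block size named in this README


def add_image(data, pile, index):
    """Add raw image bytes to the pile; return the block offsets for the image.

    pile  -- bytearray standing in for the block database (hypothetical)
    index -- dict mapping block digest -> offset into pile (hypothetical)
    """
    offsets = []
    for pos in range(0, len(data), BLOCK_SIZE):
        # Zero-pad a short final block so every stored block is full-sized.
        block = data[pos:pos + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
        # SHA-256 is an assumption; the real tool's hash is not stated here.
        digest = hashlib.sha256(block).digest()
        offset = index.get(digest)
        if offset is None:
            # Unseen block: append it to the pile and remember where it went.
            offset = len(pile)
            pile.extend(block)
            index[digest] = offset
        offsets.append(offset)
    return offsets
```

Adding an image made of blocks A, B, A to an empty pile returns the offsets
[0, 4096, 0] and grows the pile by only 8,192 bytes, because the second copy
of A is recorded as a reference to the first.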

Image data is stored in three kinds of files:

* A database "imagepile.ddb" stores all of the raw data blocks,
* A hash index "imagepile.hdb" stores the hashes for data blocks, and
* Various "*.ipil" files store the DB block offsets that make up the image.

'imagepile' operates on blocks that are 4,096 bytes (4 KiB) in size by
default. This number was chosen because Advanced Format hard drives and
most modern filesystems (NTFS and practically every Linux filesystem) are
all oriented around physical and logical blocks of this size or a multiple
of this size. Versions of Windows prior to Windows Vista partitioned drives
to be compatible with ancient C/H/S drive geometry standards, which
frequently caused a drive's first partition to start at a byte offset that
is not a multiple of 4,096. To handle such images, imagepile supports adding
image data with an offset that artificially pads the input data to align it
correctly.
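
The amount of padding needed is simple arithmetic. The sketch below assumes
512-byte sectors and uses a hypothetical helper name, alignment_padding; it
only illustrates the calculation and says nothing about how imagepile itself
expects the offset to be given.

```python
BLOCK_SIZE = 4096   # imagepile's default block size
SECTOR_SIZE = 512   # assumed sector size for older C/H/S-style layouts


def alignment_padding(start_sector, sector_size=SECTOR_SIZE,
                      block_size=BLOCK_SIZE):
    """Return the bytes of padding that move a partition start onto a
    block boundary (0 if it is already aligned)."""
    start_byte = start_sector * sector_size
    # Distance from start_byte up to the next multiple of block_size.
    return -start_byte % block_size
```

For a first partition starting at sector 63, the partition begins at byte
32,256, which is not a multiple of 4,096. alignment_padding(63) gives 512,
and 32,256 + 512 = 32,768 = 8 x 4,096. A partition starting at sector 2048
is already aligned and needs no padding.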

The hash index is not needed to read image data back out of the image
database, but it is mandatory for adding more data.
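
To see why reading needs no hashes, consider this minimal sketch. It assumes
an image is described by an ordered list of byte offsets into the block
database; the function name restore_image and the use of generic file
objects are made up for the example, and the real on-disk formats are not
described here.

```python
import io

BLOCK_SIZE = 4096  # imagepile's default block size


def restore_image(block_db, offsets, out):
    """Copy one block per offset from block_db to out, in order.

    block_db -- seekable binary file object holding the raw blocks
    offsets  -- iterable of byte offsets into block_db (hypothetical layout)
    out      -- writable binary file object receiving the image
    """
    for offset in offsets:
        block_db.seek(offset)
        out.write(block_db.read(BLOCK_SIZE))


# Tiny in-memory demonstration with two stored blocks and three references.
db = io.BytesIO(b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE)
image = io.BytesIO()
restore_image(db, [0, BLOCK_SIZE, 0], image)
```

Only the offsets and the stored blocks are consulted; hashes matter only
when deciding whether an incoming block is already in the pile.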

To build, first get libjodycode and build it beside the imagepile directory
(if you have ./imagepile/ then build in ./libjodycode/) and then build
imagepile this way:

make USE_NEARBY_JC=1 static_jc