From 9101cf70cd76d12645dad3787352b59e79ae6dae Mon Sep 17 00:00:00 2001 From: chyyran Date: Fri, 27 May 2022 01:33:10 -0400 Subject: [PATCH 1/5] docs(chd): header structure --- docs/source/techspecs/chd.rst | 64 +++++++++++++++++++++++++++++++++ docs/source/techspecs/index.rst | 1 + 2 files changed, 65 insertions(+) create mode 100644 docs/source/techspecs/chd.rst diff --git a/docs/source/techspecs/chd.rst b/docs/source/techspecs/chd.rst new file mode 100644 index 0000000000000..d89488d3b9fd7 --- /dev/null +++ b/docs/source/techspecs/chd.rst @@ -0,0 +1,64 @@ +MAME Compressed Hunks of Data (CHD) +=================================== + +.. contents:: :local: + +Introduction +------------ + + +Header Specification +-------------------- + +Version 1 +~~~~~~~~~ + +Version 2 +~~~~~~~~~ + +Version 3 +~~~~~~~~~ + +Version 4 +~~~~~~~~~ + +Version 5 +~~~~~~~~~ + +Map Specification +----------------- + +Legacy Map (Version 1-4) +~~~~~~~~~~~~~~~~~~~~~~~~ + +Version 5 Map +~~~~~~~~~~~~~ + + + +Compression Codecs +------------------ + +Raw Deflate/Zlib (``zlib``) +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Raw LZMA (``lzma``) +~~~~~~~~~~~~~~~~~~~ + +Raw FLAC (``flac``) +~~~~~~~~~~~~~~~~~~~ + +Raw Huffman (``huff``) +~~~~~~~~~~~~~~~~~~~~~~ + +CD-ROM LZMA (``cdlz``) +~~~~~~~~~~~~~~~~~~~~~~ + +CD-ROM Deflate (``cdzl``) +~~~~~~~~~~~~~~~~~~~~~~~~~ + +CD-ROM FLAC (``cdfl``) +~~~~~~~~~~~~~~~~~~~~~~ + +A/V Huffman (``avhu``) +~~~~~~~~~~~~~~~~~~~~~~ \ No newline at end of file diff --git a/docs/source/techspecs/index.rst b/docs/source/techspecs/index.rst index 40d4048e5227b..c34a9722512d6 100644 --- a/docs/source/techspecs/index.rst +++ b/docs/source/techspecs/index.rst @@ -21,3 +21,4 @@ MAME’s source or working on scripts that run within the MAME framework. 
luareference m6502 poly_manager + chd \ No newline at end of file From 18e3b67183666e38587b4784b45ece719c6abcf9 Mon Sep 17 00:00:00 2001 From: chyyran Date: Fri, 27 May 2022 18:36:39 -0400 Subject: [PATCH 2/5] doc(chd): header spec --- docs/source/techspecs/chd.rst | 547 +++++++++++++++++++++++++++++++++- 1 file changed, 538 insertions(+), 9 deletions(-) diff --git a/docs/source/techspecs/chd.rst b/docs/source/techspecs/chd.rst index d89488d3b9fd7..b20411d8f36ab 100644 --- a/docs/source/techspecs/chd.rst +++ b/docs/source/techspecs/chd.rst @@ -6,35 +6,546 @@ MAME Compressed Hunks of Data (CHD) Introduction ------------ +Compressed Hunks of Data (CHD) is a container format for compressing hard disks, CD-ROMs, +or LaserDiscs originally written by Aaron Giles. CHD divides an input stream into 'hunks' of +equal size, each of which can potentially be compressed by a different codec or encoded as a +duplicate of another hunk in the same, or a 'parent' CHD file. -Header Specification --------------------- +This document describes the CHD format. It is explicitly *descriptive*, and does not prescribe +how to encode a stream into a CHD file. It also describes the format parameters for each compression +codec used to compress individual hunks. + +Definitions +----------- +Some terms used elsewhere in this document are defined here for clarity. + +Hunk +~~~~ +A *hunk* is a logical unit of compressed data in a CHD file. Hunks are described by their +*map entry* by their offset in the stream, compressed size (*block size*) and optionally +a checksum depending on the format and version of the CHD file. Each hunk decompresses +completely into a buffer of consistent length (*hunk size*), which is the same for all +hunks and is global on a CHD file level. + +Block Size +~~~~~~~~~~ +The compressed length of a hunk. Not to be confused with the *hunk size*. + +Block Offset +~~~~~~~~~~~~ +The offset in the CHD file to the compressed data of the hunk. 
The compressed hunk data +begins at the block offset for block size number of bytes. + +Hunk Size +~~~~~~~~~ +The length of an uncompressed hunk. This length is the same for all hunks in a CHD file. +Not to be confused with the *block size*. + +Map Entry +~~~~~~~~~ +Each hunk is defined by a co-indexed map entry in the map. A valid map entry for a hunk +contains at least a *block offset* and *block size* for a hunk, and for hunks compressed +with a codec, a checksum value. + +Parent +~~~~~~ +A separate CHD file that contains hunks referred to in the child CHD file. Successful decoding +of the child CHD file requires the parent CHD. + +Codec +~~~~~ +A compression algorithm used to compress a hunk. + +Header Format +------------- +There have been 5 versions of the CHD file format. All versions but version +5 are considered deprecated and are no longer in common use. Each CHD version +has a different layout, but the first 16 bytes are always the same and are +sufficient to determine the CHD version. All numbers are in **big endian** order. Version 1 ~~~~~~~~~ +The CHD version 1 header is 76 bytes long. The structure of the version 1 header +is as follows. CHD version 1 only supports hard disks. 
+ ++------------------+----------+ +| Magic_Number | 8 bytes | ++------------------+----------+ +| Header_Length | 4 bytes | ++------------------+----------+ +| Header_Version | 4 bytes | ++------------------+----------+ +| Flags | 4 bytes | ++------------------+----------+ +| Compression_Type | 4 bytes | ++------------------+----------+ +| Hunk_Size | 4 bytes | ++------------------+----------+ +| Total_Hunks | 4 bytes | ++------------------+----------+ +| Cylinders | 4 bytes | ++------------------+----------+ +| Heads | 4 bytes | ++------------------+----------+ +| Sectors | 4 bytes | ++------------------+----------+ +| MD5_Hash | 16 bytes | ++------------------+----------+ +| Parent_MD5_Hash | 16 bytes | ++------------------+----------+ + +Magic_Number +'''''''''''' +'MComprHD', 8 bytes + +Header_Length +''''''''''''' +4 byte unsigned integer, big-endian. The length of the header. Value: 76. + +Header_Version +'''''''''''''' +4 byte unsigned integer, big-endian. The version of the header. Value: 1. + +Flags +''''' +4 byte unsigned integer, big-endian. + +Possible values: + +* ``0x00000001`` CHD requires a parent +* ``0x00000002`` CHD allows writes + +Compression_Type +'''''''''''''''' +4 byte unsigned integer, big-endian. The type of compression used for all +compressed hunks in the CHD file. + +Possible values: + +* ``0x00000000`` No compression (``CHDCOMPRESSION_NONE``) +* ``0x00000001`` Deflate/Zlib (``CHDCOMPRESSION_ZLIB``) + +Hunk_Size +''''''''' +4 byte unsigned integer, big-endian. Number of 512-byte sectors per hunk. +**Not** the *hunk size* as used conventionally in this document. To calculate +the *hunk size*, multiply ``Hunk_Size`` by 512. + +Total_Hunks +''''''''''' +4 byte unsigned integer, big-endian. The total number of hunks in the CHD file. + +Cylinders +''''''''' +4 byte unsigned integer, big-endian. The total number of cylinders in the CHD file. + +Heads +''''' +4 byte unsigned integer, big-endian. The total number of heads in the CHD file. 
+ +Sectors +''''''' +4 byte unsigned integer, big-endian. The total number of sectors in the CHD file. + +MD5_Hash +'''''''' +16 byte MD5 hash of the decompressed data in this CHD file. + +Parent_MD5_Hash +''''''''''''''' +16 byte MD5 hash of the compressed parent CHD file. + Version 2 ~~~~~~~~~ +The CHD version 2 header is 80 bytes long. The structure of the version 2 header +is as follows. CHD version 2 only supports hard disks. + ++------------------+----------+ +| Magic_Number | 8 bytes | ++------------------+----------+ +| Header_Length | 4 bytes | ++------------------+----------+ +| Header_Version | 4 bytes | ++------------------+----------+ +| Flags | 4 bytes | ++------------------+----------+ +| Compression_Type | 4 bytes | ++------------------+----------+ +| Hunk_Size | 4 bytes | ++------------------+----------+ +| Total_Hunks | 4 bytes | ++------------------+----------+ +| Cylinders | 4 bytes | ++------------------+----------+ +| Heads | 4 bytes | ++------------------+----------+ +| Sectors | 4 bytes | ++------------------+----------+ +| MD5_Hash | 16 bytes | ++------------------+----------+ +| Parent_MD5_Hash | 16 bytes | ++------------------+----------+ +| Sector_Length | 4 bytes | ++------------------+----------+ + +Magic_Number +'''''''''''' +'MComprHD', 8 bytes + +Header_Length +''''''''''''' +4 byte unsigned integer, big-endian. The length of the header. Value: 80. + +Header_Version +'''''''''''''' +4 byte unsigned integer, big-endian. The version of the header. Value: 2. + +Flags +''''' +4 byte unsigned integer, big-endian. + +Possible values: + +* ``0x00000001`` CHD requires a parent +* ``0x00000002`` CHD allows writes + +Compression_Type +'''''''''''''''' +4 byte unsigned integer, big-endian. The type of compression used for all +compressed hunks in the CHD file. 
+ +Possible values: + +* ``0x00000000`` No compression (``CHDCOMPRESSION_NONE``) +* ``0x00000001`` Deflate/Zlib (``CHDCOMPRESSION_ZLIB``) + +Hunk_Size +''''''''' +4 byte unsigned integer, big-endian. Number of ``Sector_Length``-length sectors per hunk. +**Not** the *hunk size* as used conventionally in this document. To calculate +the *hunk size*, multiply ``Hunk_Size`` by ``Sector_Length``. + +Total_Hunks +''''''''''' +4 byte unsigned integer, big-endian. The total number of hunks in the CHD file. + +Cylinders +''''''''' +4 byte unsigned integer, big-endian. The total number of cylinders in the CHD file. + +Heads +''''' +4 byte unsigned integer, big-endian. The total number of heads in the CHD file. + +Sectors +''''''' +4 byte unsigned integer, big-endian. The total number of sectors in the CHD file. + +MD5_Hash +'''''''' +16 byte MD5 hash of the decompressed data in this CHD file. + +Parent_MD5_Hash +''''''''''''''' +16 byte MD5 hash of the compressed parent CHD file. + +Sector_Length +''''''''''''' +4 byte unsigned integer, big-endian. The number of bytes per sector. Version 3 ~~~~~~~~~ +The CHD version 3 header is 120 bytes long. The structure of the version 3 header is as follows. 
+ ++------------------+----------+ +| Magic_Number | 8 bytes | ++------------------+----------+ +| Header_Length | 4 bytes | ++------------------+----------+ +| Header_Version | 4 bytes | ++------------------+----------+ +| Flags | 4 bytes | ++------------------+----------+ +| Compression_Type | 4 bytes | ++------------------+----------+ +| Total_Hunks | 4 bytes | ++------------------+----------+ +| Logical_Size | 8 bytes | ++------------------+----------+ +| Metadata_Offset | 8 bytes | ++------------------+----------+ +| MD5_Hash | 16 bytes | ++------------------+----------+ +| Parent_MD5_Hash | 16 bytes | ++------------------+----------+ +| Hunk_Size | 4 bytes | ++------------------+----------+ +| SHA1_Hash | 20 bytes | ++------------------+----------+ +| Parent_SHA1_Hash | 20 bytes | ++------------------+----------+ + +Magic_Number +'''''''''''' +'MComprHD', 8 bytes + +Header_Length +''''''''''''' +4 byte unsigned integer, big-endian. The length of the header. Value: 120. + +Header_Version +'''''''''''''' +4 byte unsigned integer, big-endian. The version of the header. Value: 3. + +Flags +''''' +4 byte unsigned integer, big-endian. + +Possible values: + +* ``0x00000001`` CHD requires a parent +* ``0x00000002`` CHD allows writes + +Compression_Type +'''''''''''''''' +4 byte unsigned integer, big-endian. The type of compression used for all +compressed hunks in the CHD file. + +Possible values: + +* ``0x00000000`` No compression (``CHDCOMPRESSION_NONE``) +* ``0x00000001`` Deflate/Zlib (``CHDCOMPRESSION_ZLIB``) +* ``0x00000002`` Deflate/Zlib+ (``CHDCOMPRESSION_ZLIB_PLUS``) + +Total_Hunks +''''''''''' +4 byte unsigned integer, big-endian. The total number of hunks in the CHD file. + +Logical_Size +'''''''''''' +8 byte unsigned integer, big-endian. The logical length in bytes of the decompressed data. + +Metadata_Offset +''''''''''''''' +8 byte unsigned integer, big-endian. The offset in the CHD file to the first metadata entry. 
+ +MD5_Hash +'''''''' +16 byte MD5 hash of the decompressed data in this CHD file. + +Parent_MD5_Hash +''''''''''''''' +16 byte MD5 hash of the compressed parent CHD file. + +Hunk_Size +''''''''' +4 byte unsigned integer, big-endian. The *hunk size*; the decompressed length of each hunk in the file. + +SHA1_Hash +''''''''' +20 byte SHA1 hash of the decompressed data in this CHD file. + +Parent_SHA1_Hash +'''''''''''''''' +20 byte SHA1 hash of the compressed parent CHD file. Version 4 ~~~~~~~~~ +The CHD version 4 header is 108 bytes long. The structure of the version 4 header is as follows. + ++------------------+----------+ +| Magic_Number | 8 bytes | ++------------------+----------+ +| Header_Length | 4 bytes | ++------------------+----------+ +| Header_Version | 4 bytes | ++------------------+----------+ +| Flags | 4 bytes | ++------------------+----------+ +| Compression_Type | 4 bytes | ++------------------+----------+ +| Total_Hunks | 4 bytes | ++------------------+----------+ +| Logical_Size | 8 bytes | ++------------------+----------+ +| Metadata_Offset | 8 bytes | ++------------------+----------+ +| Hunk_Size | 4 bytes | ++------------------+----------+ +| SHA1_Hash | 20 bytes | ++------------------+----------+ +| Parent_SHA1_Hash | 20 bytes | ++------------------+----------+ +| Raw_SHA1_Hash | 20 bytes | ++------------------+----------+ + +Magic_Number +'''''''''''' +'MComprHD', 8 bytes + +Header_Length +''''''''''''' +4 byte unsigned integer, big-endian. The length of the header. Value: 108. + +Header_Version +'''''''''''''' +4 byte unsigned integer, big-endian. The version of the header. Value: 4. + +Flags +''''' +4 byte unsigned integer, big-endian. + +Possible values: + +* ``0x00000001`` CHD requires a parent +* ``0x00000002`` CHD allows writes + +Compression_Type +'''''''''''''''' +4 byte unsigned integer, big-endian. The type of compression used for all +compressed hunks in the CHD file. 
+ +Possible values: + +* ``0x00000000`` No compression (``CHDCOMPRESSION_NONE``) +* ``0x00000001`` Deflate/Zlib (``CHDCOMPRESSION_ZLIB``) +* ``0x00000002`` Deflate/Zlib+ (``CHDCOMPRESSION_ZLIB_PLUS``) +* ``0x00000003`` AV Huffman (``CHDCOMPRESSION_AV``) + +Total_Hunks +''''''''''' +4 byte unsigned integer, big-endian. The total number of hunks in the CHD file. + +Logical_Size +'''''''''''' +8 byte unsigned integer, big-endian. The logical length in bytes of the decompressed data. + +Metadata_Offset +''''''''''''''' +8 byte unsigned integer, big-endian. The offset in the CHD file to the first metadata entry. + +Hunk_Size +''''''''' +4 byte unsigned integer, big-endian. The *hunk size*; the decompressed length of each hunk in the file. + +SHA1_Hash +''''''''' +20 byte SHA1 hash of the CHD file including compressed data and metadata. + +Parent_SHA1_Hash +'''''''''''''''' +20 byte SHA1 hash of the parent CHD file including compressed data and metadata. + +Raw_SHA1_Hash +''''''''''''' +20 byte SHA1 hash of the decompressed data in this CHD file. Version 5 ~~~~~~~~~ +The CHD version 5 header is 124 bytes long. The structure of the version 5 header is as follows. 
-Map Specification ------------------ ++---------------------+----------+ +| Magic_Number | 8 bytes | ++---------------------+----------+ +| Header_Length | 4 bytes | ++---------------------+----------+ +| Header_Version | 4 bytes | ++---------------------+----------+ +| Compression_Type[4] | 16 bytes | ++---------------------+----------+ +| Logical_Size | 8 bytes | ++---------------------+----------+ +| Map_Offset | 8 bytes | ++---------------------+----------+ +| Metadata_Offset | 8 bytes | ++---------------------+----------+ +| Hunk_Size | 4 bytes | ++---------------------+----------+ +| Unit_Size | 4 bytes | ++---------------------+----------+ +| Raw_SHA1_Hash | 20 bytes | ++---------------------+----------+ +| SHA1_Hash | 20 bytes | ++---------------------+----------+ +| Parent_SHA1_Hash | 20 bytes | ++---------------------+----------+ -Legacy Map (Version 1-4) -~~~~~~~~~~~~~~~~~~~~~~~~ +Magic_Number +'''''''''''' +'MComprHD', 8 bytes + +Header_Length +''''''''''''' +4 byte unsigned integer, big-endian. The length of the header. Value: 124. + +Header_Version +'''''''''''''' +4 byte unsigned integer, big-endian. The version of the header. Value: 5. + +Compression_Type +'''''''''''''''' +Array of four 4 byte unsigned integers, big-endian. The types of compression used +when compressing hunks in this CHD file. Each hunk can be compressed with any one +of the four compression types. Version 5 compression codes are all FourCC codes except +for ``CHD_CODEC_NONE``, which uses the value ``0``. 
+ +Possible values: + +* ``0x00000000`` No compression (``CHD_CODEC_NONE``) +* ``zlib`` Raw Deflate/zlib (``CHD_CODEC_ZLIB``) +* ``lzma`` Raw LZMA (``CHD_CODEC_LZMA``) +* ``flac`` Raw FLAC (``CHD_CODEC_FLAC``) +* ``huff`` Raw Huffman (``CHD_CODEC_HUFF``) +* ``cdzl`` CD-ROM Deflate/zlib (``CHD_CODEC_CDZL``) +* ``cdlz`` CD-ROM LZMA (``CHD_CODEC_CDLZ``) +* ``cdfl`` CD-ROM FLAC (``CHD_CODEC_CDFL``) +* ``avhu`` A/V Huffman (``CHD_CODEC_AVHUFF``) + +Logical_Size +'''''''''''' +8 byte unsigned integer, big-endian. The logical length in bytes of the decompressed data. + +Map_Offset +'''''''''' +8 byte unsigned integer, big-endian. The offset in the CHD file to the beginning of the hunk map. + +Metadata_Offset +''''''''''''''' +8 byte unsigned integer, big-endian. The offset in the CHD file to the first metadata entry. + +Hunk_Size +''''''''' +4 byte unsigned integer, big-endian. The *hunk size*; the decompressed length of each hunk in the file. + +Unit_Size +''''''''' +4 byte unsigned integer, big-endian. The length of each unit within each hunk. + +Raw_SHA1_Hash +''''''''''''' +20 byte SHA1 hash of the decompressed data in this CHD file. + +SHA1_Hash +''''''''' +20 byte SHA1 hash of the CHD file including compressed data and metadata. + +Parent_SHA1_Hash +'''''''''''''''' +20 byte SHA1 hash of the parent CHD file including compressed data and metadata. 
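The Header Format section states that the first 16 bytes have the same layout in every version and are enough to determine the CHD version. A minimal sketch of that check follows, assuming the header lengths given at the start of each version section (76, 80, 120, 108 and 124 bytes). The names ``read_chd_version``, ``CHD_MAGIC`` and ``HEADER_SIZES`` are illustrative only and are not taken from MAME.

```python
import struct

CHD_MAGIC = b"MComprHD"

# Header lengths as stated in the version sections of this document.
HEADER_SIZES = {1: 76, 2: 80, 3: 120, 4: 108, 5: 124}


def read_chd_version(prefix: bytes) -> int:
    """Return the CHD version found in the first 16 bytes of a file.

    Hypothetical helper: it reads the magic number, header length and
    header version as big-endian fields and cross-checks them.
    """
    if len(prefix) < 16:
        raise ValueError("at least 16 bytes are required")
    # 8-byte magic, then two 4-byte big-endian unsigned integers.
    magic, header_size, version = struct.unpack(">8sII", prefix[:16])
    if magic != CHD_MAGIC:
        raise ValueError("magic number is not 'MComprHD'")
    if HEADER_SIZES.get(version) != header_size:
        raise ValueError("header length does not match header version")
    return version
```

A caller would typically read 16 bytes, determine the version, and only then read the rest of the version-specific header.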
-Version 5 Map -~~~~~~~~~~~~~ +Hunk Map Format +--------------- +Version 1-2 Map +~~~~~~~~~~~~~~~ + +Version 3-4 Map +~~~~~~~~~~~~~~~ + +Version 5 Map +~~~~~~~~~~~~~ Compression Codecs ------------------ @@ -61,4 +572,22 @@ CD-ROM FLAC (``cdfl``) ~~~~~~~~~~~~~~~~~~~~~~ A/V Huffman (``avhu``) -~~~~~~~~~~~~~~~~~~~~~~ \ No newline at end of file +~~~~~~~~~~~~~~~~~~~~~~ + + +Metadata +-------- + + +Static Huffman Coding +--------------------- + +Importing from a RLE-encoded Huffman Tree +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + +Importing from a Small Huffman-encoded Huffman Tree +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Delta-RLE Huffman +~~~~~~~~~~~~~~~~~ From 7c3eadeffde85a976d82657b56359483c726418f Mon Sep 17 00:00:00 2001 From: chyyran Date: Mon, 15 Aug 2022 00:47:48 -0400 Subject: [PATCH 3/5] doc(chd): document some codecs --- docs/source/techspecs/chd.rst | 83 +++++++++++++++++++++++++++++++++++ 1 file changed, 83 insertions(+) diff --git a/docs/source/techspecs/chd.rst b/docs/source/techspecs/chd.rst index b20411d8f36ab..e724900ecc09f 100644 --- a/docs/source/techspecs/chd.rst +++ b/docs/source/techspecs/chd.rst @@ -41,6 +41,10 @@ Hunk Size The length of an uncompressed hunk. This length is the same for all hunks in a CHD file. Not to be confused with the *block size*. +Hunk Count +~~~~~~~~~~ +The total number of hunks in a CHD file. + Map Entry ~~~~~~~~~ Each hunk is defined by a co-indexed map entry in the map. A valid map entry for a hunk @@ -537,9 +541,17 @@ Parent_SHA1_Hash Hunk Map Format --------------- +CHD version 1 and 2 share a map format, CHD version 3 and 4 extends the V1-2 map format differently, +and CHD version 5 uses a completely different map format. For CHD version 1-4, the map begins directly after +the header, and in CHD v5, the map occurs at ``Map_Offset``. The map has a total length of the size of +a map entry multiplied by the *hunk count*, and each map entry is laid out sequentially. 
 Version 1-2 Map ~~~~~~~~~~~~~~~ +The size of each map entry in the V1-2 map format is 8 bytes. The total size of the map in CHD version 1-2 can +be calculated by multiplying the *hunk count* by 8. Each map entry has the following structure. + + Version 3-4 Map ~~~~~~~~~~~~~~~ @@ -549,18 +561,77 @@ Version 5 Map Compression Codecs ------------------ +CHD hunks can be compressed with a variety of codecs. Some of these codecs are implemented via vendored libraries whereas +some are implemented within MAME. For well-known algorithms, this document only describes the compression parameters +needed to decompress a hunk with a well-behaved implementation of the codec. Lesser-known algorithms will have their implementation +details and data layout described in more detail. + +CHD compression works at a hunk granularity. A compressed hunk always decompresses to a buffer of exactly *hunk size* bytes, regardless +of codec used. Hunks may also be "compressed" with ``CHD_CODEC_NONE`` (``0x0``), which indicates uncompressed data, or refer to another +hunk in the same or parent CHD, but this section only describes the codecs and parameters used to compress hunks. See :ref:`Hunk Decoding` for +more details on how a hunk is decompressed. Raw Deflate/Zlib (``zlib``) ~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This codec is used in CHD versions 1-5. + +The ``zlib`` codec compresses hunks using the `Deflate `_ algorithm. The zlib header +is not used, and each hunk consists of raw Deflate-compressed bytes. + +In CHD versions 1-4, this codec is known as ``CHDCOMPRESSION_ZLIB``. CHD versions 3 and 4 supported ``CHDCOMPRESSION_ZLIB_PLUS``, which is decoded in +the same manner as ``CHDCOMPRESSION_ZLIB``. Raw LZMA (``lzma``) ~~~~~~~~~~~~~~~~~~~ +This codec is only used in CHD version 5. + +The ``lzma`` codec compresses hunks with the `LZMA `_ algorithm. 
+Hunks are compressed with LZMA1 **without any stream headers**, with compression level **9** and the default ``lclppb`` compression parameters for LZMA 19.0. These settings are: + +* Literal Context Bits (``lc``): 3 +* Literal Position Bits (``lp``): 0 +* Position Bits (``pb``): 2 + +While an unlimited dictionary size can be used, calculation of an appropriate dictionary size can be done with the following algorithm, lifted from +`LzmaEnc::LzmaEncProps_Normalize `_, where +``level`` is the compression level (``9``), and ``hunk_size`` is the hunk size of the CHD. If relevant, all integers should be truncated to 32 bits long. + +.. code-block:: python + + def get_lzma_dict_size(hunk_size, level=9): + if level <= 5: + dict_size = 1 << (level * 2 + 14) + elif level <= 7: + dict_size = 1 << 25 + else: + dict_size = 1 << 26 + + if dict_size > hunk_size: + for i in range(11, 31): # Inclusive range [11, 30] + if hunk_size <= (2 << i): + dict_size = 2 << i + break + if hunk_size <= (3 << i): + dict_size = 3 << i + break + return dict_size + Raw FLAC (``flac``) ~~~~~~~~~~~~~~~~~~~ +This codec is only used in CHD version 5. + +The ``flac`` codec compresses hunks with the `FLAC `_ audio compression codec. + +At the start of each compressed hunk, there is a one byte header of either ``L`` (``0x4C``) to indicate little-endian output, or ``B`` (``0x42``) to indicate big-endian output. The FLAC-compressed bytes begin +after this one byte header. The FLAC decompressor implementation must be correctly configured according to the header byte. + +FLAC data is compressed as raw FLAC frames, **without a ```STREAM`` section `_, and thus no ``STREAMINFO``** or any other metadata. There are 2 channels per block, +each channel encoded as 16-bit signed integer PCM. The samples are interleaved with the left channel first, then the right channel. + Raw Huffman (``huff``) ~~~~~~~~~~~~~~~~~~~~~~ +This codec is only used in CHD version 5. 
CD-ROM LZMA (``cdlz``) ~~~~~~~~~~~~~~~~~~~~~~ @@ -570,10 +641,13 @@ CD-ROM Deflate (``cdzl``) CD-ROM FLAC (``cdfl``) ~~~~~~~~~~~~~~~~~~~~~~ +This codec is used in CHD versions 3-5. A/V Huffman (``avhu``) ~~~~~~~~~~~~~~~~~~~~~~ +This codec is used in CHD versions 3-5. +In CHD versions 3 and 4, this codec is known as ``CHDCOMPRESSION_AV``. Metadata -------- @@ -591,3 +665,12 @@ Importing from a Small Huffman-encoded Huffman Tree Delta-RLE Huffman ~~~~~~~~~~~~~~~~~ + +Hunk Decoding +------------- + +Decoding Legacy Hunks +~~~~~~~~~~~~~~~~~~~~~ + +Decoding V5 Hunks +~~~~~~~~~~~~~~~~~ From 94037f381b7639bb409855c3bfa2178b3acb403e Mon Sep 17 00:00:00 2001 From: chyyran Date: Sun, 11 Sep 2022 02:12:57 -0400 Subject: [PATCH 4/5] doc(chd): document map layout --- docs/source/techspecs/chd.rst | 209 +++++++++++++++++++++++++++++----- 1 file changed, 182 insertions(+), 27 deletions(-) diff --git a/docs/source/techspecs/chd.rst b/docs/source/techspecs/chd.rst index e724900ecc09f..dacb5e595396c 100644 --- a/docs/source/techspecs/chd.rst +++ b/docs/source/techspecs/chd.rst @@ -15,6 +15,7 @@ This document describes the CHD format. It is explicitly *descriptive*, and does how to encode a stream into a CHD file. It also describes the format parameters for each compression codec used to compress individual hunks. + Definitions ----------- Some terms used elsewhere in this document are defined here for clarity. @@ -64,7 +65,7 @@ Header Format ------------- There have been 5 versions of the CHD file format. All versions but version 5 are considered deprecated and are no longer in common use. Each CHD version -has a different layout, but the first 16 bytes are always the same and are +has a different layout, but the layout of the first 16 bytes are always the same and are sufficient to determine the CHD version. All numbers are in **big endian** order. Version 1 @@ -75,7 +76,7 @@ is as follows. CHD version 1 only supports hard disks. 
+------------------+----------+ | Magic_Number | 8 bytes | +------------------+----------+ -| Header_Length | 4 bytes | +| Header_Size | 4 bytes | +------------------+----------+ | Header_Version | 4 bytes | +------------------+----------+ @@ -85,7 +86,7 @@ is as follows. CHD version 1 only supports hard disks. +------------------+----------+ | Hunk_Size | 4 bytes | +------------------+----------+ -| Total_Hunks | 4 bytes | +| Hunk_Count | 4 bytes | +------------------+----------+ | Cylinders | 4 bytes | +------------------+----------+ @@ -102,7 +103,7 @@ Magic_Number '''''''''''' 'MComprHD', 8 bytes -Header_Length +Header_Size ''''''''''''' 4 byte unsigned integer, big-endian. The length of the header. Value: 76. @@ -135,7 +136,7 @@ Hunk_Size **Not** the *hunk size* as used conventionally in this document. To calculate the *hunk size*, multiply ``Hunk_Size`` by 512. -Total_Hunks +Hunk_Count ''''''''''' 4 byte unsigned integer, big-endian. The total number of hunks in the CHD file. @@ -168,7 +169,7 @@ is as follows. CHD version 2 only supports hard disks. +------------------+----------+ | Magic_Number | 8 bytes | +------------------+----------+ -| Header_Length | 4 bytes | +| Header_Size | 4 bytes | +------------------+----------+ | Header_Version | 4 bytes | +------------------+----------+ @@ -178,7 +179,7 @@ is as follows. CHD version 2 only supports hard disks. +------------------+----------+ | Hunk_Size | 4 bytes | +------------------+----------+ -| Total_Hunks | 4 bytes | +| Hunk_Count | 4 bytes | +------------------+----------+ | Cylinders | 4 bytes | +------------------+----------+ @@ -190,14 +191,14 @@ is as follows. CHD version 2 only supports hard disks. 
 +------------------+----------+ | Parent_MD5_Hash | 16 bytes | +------------------+----------+ -| Sector_Length | 4 bytes | +| Sector_Size | 4 bytes | +------------------+----------+ Magic_Number '''''''''''' 'MComprHD', 8 bytes -Header_Length +Header_Size ''''''''''''' 4 byte unsigned integer, big-endian. The length of the header. Value: 80. @@ -226,11 +227,11 @@ Possible values: Hunk_Size ''''''''' -4 byte unsigned integer, big-endian. Number of ``Sector_Length``-length sectors per hunk. +4 byte unsigned integer, big-endian. Number of ``Sector_Size``-length sectors per hunk. **Not** the *hunk size* as used conventionally in this document. To calculate -the *hunk size*, multiply ``Hunk_Size`` by ``Sector_Length``. +the *hunk size*, multiply ``Hunk_Size`` by ``Sector_Size``. -Total_Hunks +Hunk_Count ''''''''''' 4 byte unsigned integer, big-endian. The total number of hunks in the CHD file. @@ -254,7 +255,7 @@ Parent_MD5_Hash ''''''''''''''' 16 byte MD5 hash of the compressed parent CHD file. -Sector_Length +Sector_Size ''''''''''''' 4 byte unsigned integer, big-endian. The number of bytes per sector. @@ -265,7 +266,7 @@ The CHD version 3 header is 120 bytes long. The structure of the version 3 heade +------------------+----------+ | Magic_Number | 8 bytes | +------------------+----------+ -| Header_Length | 4 bytes | +| Header_Size | 4 bytes | +------------------+----------+ | Header_Version | 4 bytes | +------------------+----------+ @@ -273,7 +274,7 @@ The CHD version 3 header is 120 bytes long. The structure of the version 3 heade +------------------+----------+ | Compression_Type | 4 bytes | +------------------+----------+ -| Total_Hunks | 4 bytes | +| Hunk_Count | 4 bytes | +------------------+----------+ | Logical_Size | 8 bytes | +------------------+----------+ @@ -294,7 +295,7 @@ Magic_Number '''''''''''' 'MComprHD', 8 bytes -Header_Length +Header_Size ''''''''''''' 4 byte unsigned integer, big-endian. The length of the header. Value: 120. 
@@ -322,7 +323,7 @@ Possible values: * ``0x00000001`` Deflate/Zlib (``CHDCOMPRESSION_ZLIB``) * ``0x00000002`` Deflate/Zlib+ (``CHDCOMPRESSION_ZLIB_PLUS``) -Total_Hunks +Hunk_Count ''''''''''' 4 byte unsigned integer, big-endian. The total number of hunks in the CHD file. @@ -336,7 +337,7 @@ Metadata_Offset MD5_Hash '''''''' -16 byte MD5 hash of the decompressed data in this CHD file. +16 byte MD5 hash of the decompressed data in the CHD file. Parent_MD5_Hash ''''''''''''''' @@ -361,7 +362,7 @@ The CHD version 4 header is 108 bytes long. The structure of the version 4 heade +------------------+----------+ | Magic_Number | 8 bytes | +------------------+----------+ -| Header_Length | 4 bytes | +| Header_Size | 4 bytes | +------------------+----------+ | Header_Version | 4 bytes | +------------------+----------+ @@ -369,7 +370,7 @@ The CHD version 4 header is 108 bytes long. The structure of the version 4 heade +------------------+----------+ | Compression_Type | 4 bytes | +------------------+----------+ -| Total_Hunks | 4 bytes | +| Hunk_Count | 4 bytes | +------------------+----------+ | Logical_Size | 8 bytes | +------------------+----------+ @@ -388,7 +389,7 @@ Magic_Number '''''''''''' 'MComprHD', 8 bytes -Header_Length +Header_Size ''''''''''''' 4 byte unsigned integer, big-endian. The length of the header. Value: 108. @@ -417,7 +418,7 @@ Possible values: * ``0x00000002`` Deflate/Zlib+ (``CHDCOMPRESSION_ZLIB_PLUS``) * ``0x00000003`` AV Huffman (``CHDCOMPRESSION_AV``) -Total_Hunks +Hunk_Count ''''''''''' 4 byte unsigned integer, big-endian. The total number of hunks in the CHD file. @@ -452,7 +453,7 @@ The CHD version 5 header is 124 bytes long. 
 The structure of the version 5 heade +---------------------+----------+ | Magic_Number | 8 bytes | +---------------------+----------+ -| Header_Length | 4 bytes | +| Header_Size | 4 bytes | +---------------------+----------+ | Header_Version | 4 bytes | +---------------------+----------+ @@ -479,7 +480,7 @@ Magic_Number '''''''''''' 'MComprHD', 8 bytes -Header_Length +Header_Size ''''''''''''' 4 byte unsigned integer, big-endian. The length of the header. Value: 124. @@ -544,20 +545,174 @@ Hunk Map Format CHD version 1 and 2 share a map format, CHD version 3 and 4 extends the V1-2 map format differently, and CHD version 5 uses a completely different map format. For CHD version 1-4, the map begins directly after the header, and in CHD v5, the map occurs at ``Map_Offset``. The map has a total length of the size of -a map entry multiplied by the *hunk count*, and each map entry is laid out sequentially. +a map entry multiplied by the *hunk count*, and each map entry is laid out sequentially for all versions. Version 1-2 Map ~~~~~~~~~~~~~~~ -The size of each map entry in the V1-2 map format is 8 bytes. The total size of the map in CHD version 1-2 can -be calculated by multiplying the *hunk count* by 8. Each map entry has the following structure. +Each map entry in the V1-2 map format is stored as an 8-byte big-endian integer. The total size of the map in CHD version 1-2 can be calculated by +multiplying the *hunk count* by 8. The structure of a V1-2 map entry is as follows, after the entry has been converted from big-endian to a native 64-bit integer. ++------------+---------+ +| Block_Size | 20 bits | ++------------+---------+ +| Offset | 44 bits | ++------------+---------+ +Block_Size +'''''''''' +High 20 bits of the converted integer, interpreted as a 4 byte unsigned integer. The *block size* of the hunk this map entry refers to. + +Offset +'''''' +Low 44 bits of the converted integer, interpreted as an 8 byte unsigned integer. The offset in the CHD file to the beginning of the hunk this map entry refers to. 
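The V1-2 map entry layout described above (block size in the high 20 bits, offset in the low 44 bits of one 8-byte big-endian integer) can be sketched as follows. This is an illustration only, not MAME's implementation, and the function name ``decode_v1_map_entry`` is hypothetical.

```python
import struct

OFFSET_BITS = 44
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def decode_v1_map_entry(raw: bytes) -> tuple:
    """Split one 8-byte V1-2 map entry into (block_size, offset).

    Hypothetical helper following the field layout given in this
    document: high 20 bits are Block_Size, low 44 bits are Offset.
    """
    if len(raw) != 8:
        raise ValueError("a V1-2 map entry is exactly 8 bytes")
    # The entry is stored as a single big-endian 64-bit integer.
    (entry,) = struct.unpack(">Q", raw)
    block_size = entry >> OFFSET_BITS
    offset = entry & OFFSET_MASK
    return block_size, offset
```

Decoding a whole map then amounts to slicing the *hunk count* times 8 bytes that follow the header into 8-byte pieces and passing each one to this function.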
Version 3-4 Map ~~~~~~~~~~~~~~~ +Each map entry in the V3-4 map format is 16 bytes long. The total size of the map in CHD version 3-4 can be calculated by +multiplying the *hunk count* by 16. The structure of a V3-4 map entry is as follows. + +------------------+---------+ +| Offset | 8 bytes | +------------------+---------+ +| CRC | 4 bytes | +------------------+---------+ +| Block_Size | 3 bytes | +------------------+---------+ +| Compression_Type | 1 byte | +------------------+---------+ + +Offset +'''''' +8 byte unsigned integer, big-endian. The offset in the CHD file to the beginning of the hunk this map entry refers to. + +CRC +''' +4 byte unsigned integer, big-endian. The `CRC32 ISO/HDLC `_ checksum for the +decompressed hunk data. + +Block_Size +'''''''''' +3 byte unsigned integer, big-endian. The *block size* of the hunk this map entry refers to. + +Compression_Type +'''''''''''''''' +1 byte unsigned integer, the upper four bits are undefined. The type of compression used in the hunk this map entry refers to. + +Possible compression types are + +* ``0x0`` Invalid +* ``0x1`` Compressed with the CHD file Codec +* ``0x2`` Uncompressed +* ``0x3`` Mini (Raw Data stored in Offset) +* ``0x4`` Decompress from Self +* ``0x5`` Decompress from Parent +* ``0x6`` Secondary Compression Version 5 Map ~~~~~~~~~~~~~ +CHD V5 maps can be either compressed or uncompressed. If the first codec in the header `Compression_Type` is `0x0`, then the entire CHD file, including the map, is uncompressed. +If the first codec in the header `Compression_Type` is anything other than `0x0`, then a form of compression is used, and the map will be compressed. + +Uncompressed Map +'''''''''''''''' +Since the uncompressed map only occurs when the entire CHD file is uncompressed, each uncompressed map entry contains only the 4-byte offset to the beginning of the hunk data. +The total size of an uncompressed map in CHD version 5 can be calculated by multiplying the *hunk count* by 4.
The structure of a V5 uncompressed map entry is as follows. + ++------------------+---------+ +| Offset | 4 bytes | ++------------------+---------+ + +Offset +>>>>>> +4 byte unsigned integer, big-endian. The offset in the CHD file to the beginning of the hunk this map entry refers to. + +Compressed Map +'''''''''''''' +The structure of the compressed map header is as follows. + ++-----------------+---------+ +| Map_Size | 4 bytes | ++-----------------+---------+ +| Map_Offset | 6 bytes | ++-----------------+---------+ +| Map_CRC | 2 bytes | ++-----------------+---------+ +| Size_Bits | 1 byte | ++-----------------+---------+ +| Self_Bits | 1 byte | ++-----------------+---------+ +| Parent_Bits | 1 byte | ++-----------------+---------+ +| Reserved | 1 byte | ++-----------------+---------+ + +Map_Size +>>>>>>>> +4 byte unsigned integer, big-endian. The compressed length of the map. + +Map_Offset +>>>>>>>>>> +6 byte unsigned integer, big-endian. The offset in the CHD file to the first compressed map entry. + +Map_CRC +>>>>>>> +2 byte unsigned integer, big-endian. The `CRC16 IBM/3740 `_ checksum for the +decompressed map data. + +Size_Bits +>>>>>>>>> +1 byte unsigned integer. The number of bits used to store the *block size* of a hunk in a map entry, for hunks compressed with one of four codecs used in the CHD file. + +Self_Bits +>>>>>>>>> +1 byte unsigned integer. The number of bits used to store the offset to a hunk that is a reference to another hunk in the same CHD file, for hunks that are decompressed from Self. + +Parent_Bits +>>>>>>>>>>> +1 byte unsigned integer. The number of bits used to store the offset to a hunk that is a reference to another hunk in the parent file, for hunks that are decompressed from Parent. + +Reserved +>>>>>>>> +Reserved for future use. + +The structure of each map entry in the compressed map, once decompressed, is as follows. 
+ ++------------------+---------+ +| Compression_Type | 1 byte | ++------------------+---------+ +| Block_Size | 3 bytes | ++------------------+---------+ +| Offset | 6 bytes | ++------------------+---------+ +| CRC | 2 bytes | ++------------------+---------+ + +Compression_Type +>>>>>>>>>>>>>>>> +1 byte unsigned integer, the upper four bits are undefined. The type of compression used in the hunk this map entry refers to. + +Possible compression types are + +* ``0x0`` Compression Type 0 (`Compression_Type[0]`) +* ``0x1`` Compression Type 1 (`Compression_Type[1]`) +* ``0x2`` Compression Type 2 (`Compression_Type[2]`) +* ``0x3`` Compression Type 3 (`Compression_Type[3]`) +* ``0x4`` Uncompressed +* ``0x5`` Decompress from Self +* ``0x6`` Decompress from Parent + +Block_Size +>>>>>>>>>> +3 byte unsigned integer, big-endian. The *block size* of the hunk this map entry refers to. + +Offset +>>>>>> +6 byte unsigned integer, big-endian. The offset in the CHD file to the beginning of the hunk this map entry refers to. + +CRC +>>> +2 byte unsigned integer, big-endian. The `CRC16 IBM/3740 `_ checksum for the +decompressed hunk data. Compression Codecs ------------------ From 42354aae22e2618466207c886d2f9387af3de2f3 Mon Sep 17 00:00:00 2001 From: chyyran Date: Sun, 11 Sep 2022 02:36:54 -0400 Subject: [PATCH 5/5] doc(chd): clarify definitions --- docs/source/techspecs/chd.rst | 36 +++++++++++++++++------------------ 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/docs/source/techspecs/chd.rst b/docs/source/techspecs/chd.rst index dacb5e595396c..143ed561260b9 100644 --- a/docs/source/techspecs/chd.rst +++ b/docs/source/techspecs/chd.rst @@ -630,29 +630,29 @@ Compressed Map '''''''''''''' The structure of the compressed map header is as follows. 
-+-----------------+---------+ -| Map_Size | 4 bytes | -+-----------------+---------+ -| Map_Offset | 6 bytes | -+-----------------+---------+ -| Map_CRC | 2 bytes | -+-----------------+---------+ -| Size_Bits | 1 byte | -+-----------------+---------+ -| Self_Bits | 1 byte | -+-----------------+---------+ -| Parent_Bits | 1 byte | -+-----------------+---------+ -| Reserved | 1 byte | -+-----------------+---------+ ++--------------+---------+ +| Map_Size | 4 bytes | ++--------------+---------+ +| First_Offset | 6 bytes | ++--------------+---------+ +| Map_CRC | 2 bytes | ++--------------+---------+ +| Size_Bits | 1 byte | ++--------------+---------+ +| Self_Bits | 1 byte | ++--------------+---------+ +| Parent_Bits | 1 byte | ++--------------+---------+ +| Reserved | 1 byte | ++--------------+---------+ Map_Size >>>>>>>> 4 byte unsigned integer, big-endian. The compressed length of the map. -Map_Offset +First_Offset >>>>>>>>>> -6 byte unsigned integer, big-endian. The offset in the CHD file to the first compressed map entry. +6 byte unsigned integer, big-endian. The offset in the CHD file to the beginning of the hunk the first map entry refers to. Map_CRC >>>>>>> @@ -780,7 +780,7 @@ The ``flac`` codec compresses hunks with `FLAC `_, and thus no ``STREAMINFO``** or any other metadata. There are 2 channels per block, +FLAC data is compressed as raw FLAC frames, without `metadata blocks or a fLaC header `_. There are 2 channels per block, each channel encoded as 16-bit signed integer PCM. The samples are interleaved with the left channel first, then the right channel.
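Returning to the 16-byte compressed map header whose ``Map_Offset`` field this patch renames to ``First_Offset``: the field table can be read with a few slices. The following is a rough sketch under the layout given in that table only. The class and function names are hypothetical, and it does not decompress the map or verify ``Map_CRC``.

```python
from typing import NamedTuple

MAP_HEADER_LEN = 16  # 4 + 6 + 2 + 1 + 1 + 1 + 1 bytes, per the field table


class CompressedMapHeader(NamedTuple):
    map_size: int      # compressed length of the map
    first_offset: int  # file offset of the hunk the first map entry refers to
    map_crc: int       # CRC16 of the decompressed map data
    size_bits: int     # bits used to store a block size
    self_bits: int     # bits used to store a self-reference offset
    parent_bits: int   # bits used to store a parent-reference offset


def parse_compressed_map_header(raw: bytes) -> CompressedMapHeader:
    """Decode the fixed-size header that precedes a V5 compressed map."""
    if len(raw) < MAP_HEADER_LEN:
        raise ValueError("compressed map header needs 16 bytes")
    return CompressedMapHeader(
        map_size=int.from_bytes(raw[0:4], "big"),
        first_offset=int.from_bytes(raw[4:10], "big"),  # 6-byte big-endian
        map_crc=int.from_bytes(raw[10:12], "big"),
        size_bits=raw[12],
        self_bits=raw[13],
        parent_bits=raw[14],
        # raw[15] is the reserved byte and is ignored here
    )
```

A caller would read these 16 bytes at the header's ``Map_Offset`` and use ``map_size`` to know how many compressed bytes follow.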