util/bitstream: Fix cases where bits would be dropped when reading and writing #12057

987123879113 · 2024-02-23T13:36:57Z

Additional testing from someone else before merging would be appreciated.

In this case, the parent chunk types were using 26 bits per offset values which behaved unexpectedly resulting in some bits being dropped when reading and writing the data. I think the issue only happens when dealing with values larger than 24 bits due to the way buffering was implemented. So I've changed both bitstream_in and bitstream_out to more greedily fill its buffers, including partial reads for bitstream_in.

Old (can see an additional CRC error message I added to help debug, should be reproducible):

> ./chdman copy -i gdj_aaa_2007071100.chd -op gdj_aaa_2007100800.chd -o newchild2.chd 
chdman - MAME Compressed Hunks of Data (CHD) manager 0.262 (mame0252-3687-gb0c8802da65-dirty)
Output CHD:   newchild2.chd
Parent CHD:   gdj_aaa_2007100800.chd
Input CHD:    gdj_aaa_2007071100.chd
Compression:  lzma (LZMA), zlib (Deflate), huff (Huffman), flac (FLAC)
Hunk size:    4,096
Logical size: 40,000,000,000
Compression complete ... final ratio = 7.8%

> ./chdman info -i newchild2.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.262 (mame0252-3687-gb0c8802da65-dirty)
CRC did not match! expected 0000f003, found 000055da
Error opening CHD file (newchild2.chd): Decompression error
Fatal error occurred: 1

New:

> ./chdman copy -i gdj_aaa_2007071100.chd -op gdj_aaa_2007100800.chd -o newchild3.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.262 (mame0252-3658-gbaf313500d7)
Output CHD:   newchild3.chd
Parent CHD:   gdj_aaa_2007100800.chd
Input CHD:    gdj_aaa_2007071100.chd
Compression:  lzma (LZMA), zlib (Deflate), huff (Huffman), flac (FLAC)
Hunk size:    4,096
Logical size: 40,000,000,000
Compression complete ... final ratio = 7.8%

> ./chdman info -i newchild3.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.262 (mame0252-3658-gbaf313500d7)
Input file:   newchild3.chd
File Version: 5
Logical size: 40,000,000,000 bytes
Hunk Size:    4,096 bytes
Total Hunks:  9,765,625
Unit Size:    512 bytes
Total Units:  78,125,000
Compression:  lzma (LZMA), zlib (Deflate), huff (Huffman), flac (FLAC)
CHD size:     3,137,000,409 bytes
Ratio:        7.8%
SHA1:         af76f981a86d117df0799170aa1a6babeb92fd00
Data SHA1:    f86c948f8ab9124ed04d0462c7b9b31aabd0afa4
Parent SHA1:  b32f4f09e155f2fd0fc16b0bb77331756b3928d0
Metadata:     Tag='GDDD'  Index=0  Length=37 bytes
              CYLS:156250,HEADS:10,SECS:50,BPS:512.

Parent CHDs just to make it easy to verify the metadata of the new child CHD looks correct:

> ./chdman info -i gdj_aaa_2007071100.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.262 (mame0252-3658-gbaf313500d7)
Input file:   gdj_aaa_2007071100.chd
File Version: 5
Logical size: 40,000,000,000 bytes
Hunk Size:    4,096 bytes
Total Hunks:  9,765,625
Unit Size:    512 bytes
Total Units:  78,125,000
Compression:  lzma (LZMA), zlib (Deflate), huff (Huffman), flac (FLAC)
CHD size:     10,268,950,478 bytes
Ratio:        25.7%
SHA1:         af76f981a86d117df0799170aa1a6babeb92fd00
Data SHA1:    f86c948f8ab9124ed04d0462c7b9b31aabd0afa4
Metadata:     Tag='GDDD'  Index=0  Length=37 bytes
              CYLS:156250,HEADS:10,SECS:50,BPS:512.
              
> ./chdman info -i gdj_aaa_2007100800.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.262 (mame0252-3658-gbaf313500d7)
Input file:   gdj_aaa_2007100800.chd
File Version: 5
Logical size: 40,000,000,000 bytes
Hunk Size:    4,096 bytes
Total Hunks:  9,765,625
Unit Size:    512 bytes
Total Units:  78,125,000
Compression:  lzma (LZMA), zlib (Deflate), huff (Huffman), flac (FLAC)
CHD size:     7,161,091,862 bytes
Ratio:        17.9%
SHA1:         b32f4f09e155f2fd0fc16b0bb77331756b3928d0
Data SHA1:    2001011370d1201129d7f43f5f8cf37752985645
Metadata:     Tag='GDDD'  Index=0  Length=37 bytes
              CYLS:156250,HEADS:10,SECS:50,BPS:512.

…f there wasn't enough space in the buffer - util/bitstream: Allow for reading partial bits into the buffer

987123879113 · 2024-02-23T13:48:11Z

src/lib/util/bitstream.h

+		// offload more bits if it'll still overflow the buffer
+		if (m_bits + numbits >= 32)


This is the exact case this is fixing. The first write of 0x0f successfully writes 0x0f, but the second write of 0x0f only writes 0x0e and drops the last bit because it ended up where m_bits = 7 after writing the 3 full 8-bit bytes from m_buffer to m_write, and the number of bits we wanted to write into the buffer was 26 which ends up being 33 bits total, so the last bit was silently dropped.

m_buffer[42047d00] m_bits[29] newbits[0000000f] numbits[26] requiredbits[55] (before offloading to m_write buffer) m_buffer[00000000] m_bits[ 5] numbits[26] requiredbits[31] (after offloading to m_write buffer) newbits[000003c0] (after shifting newbits by 32 - numbits) m_buffer[0000001e] m_bits[31] val[0000000f] (state at end of write func) m_buffer[0000001e] m_bits[31] newbits[0000000f] numbits[26] requiredbits[57] (before offloading to m_write buffer) m_buffer[1e000000] m_bits[ 7] numbits[26] requiredbits[33] (after offloading to m_write buffer) newbits[000003c0] (after shifting newbits by 32 - numbits) m_buffer[1e000007] m_bits[33] val[0000000e] (state at end of write func)

cuavas · 2024-02-23T15:31:34Z

We’re awfully close to a release, so there’s less than 24 hours to make a decision on this. It’s a pretty bad bug as it causes bad CHDs to be produced for large hard disk sizes. I’d appreciate if someone else can eye it over.

src/lib/util/bitstream.h

987123879113 · 2024-02-23T15:40:52Z

I gave it a run through 170 CHDs I had on my HDD and I encountered no immediate issues so I'm marking it ready for review now.

For my testing, I ran chdman info on all of them and no issues reading the metadata for all of them. Then I ran chdman extractraw on all of them and no issues. Then a test of chdman extractcd/extracthd -> chdman createcd/createhd -> chdman info on the newly recreated CHDs and no errors. And lastly a test of chdman copy to create deltas for various random CHDs I had and they all worked with chdman info.

cuavas · 2024-02-23T15:45:20Z

I’m merging this. I’d still appreciate other people looking at it.

…hen reading and writing. (#12057)" This reverts commit 69c3cd7. This causes CHD SHA1 digests to change. Either it's buggy, or CHD SHA1 digests depend on the representation rather than the data itself.

987123879113 · 2024-02-23T17:51:47Z

Commenting this here for visibility. I requested a revert after noticing SHA1s were different when extracting and recreating a lot of the CHDs. I think it's unrelated to this PR after all. The changes seem to happen even with upstream chdman before this PR and the changing SHA1s seem to be a result of other changes from the past.

229d19d#commitcomment-138992742

john-iv · 2024-02-24T02:54:39Z

I created new delta CHDs using Windyfairy's .exes against my problemed kpython2 child CHDs. Just finished and they all pass MAME's audit and verify/info as expected. These were the sets in question, attached:

https://www.emulab.it/forum/index.php?topic=9336.msg26277#msg26277

…ing and writing. (#12057) * In some cases, bits would be dropped when writing if there wasn't enough space in the buffer. * Fixes bad hunk maps being written to CHD files and incorrect hunk map data being read.

…ing and writing. (mamedev#12057) * In some cases, bits would be dropped when writing if there wasn't enough space in the buffer. * Fixes bad hunk maps being written to CHD files and incorrect hunk map data being read.

…hen reading and writing. (mamedev#12057)" This reverts commit 69c3cd7. This causes CHD SHA1 digests to change. Either it's buggy, or CHD SHA1 digests depend on the representation rather than the data itself.

…ing and writing. (mamedev#12057) * In some cases, bits would be dropped when writing if there wasn't enough space in the buffer. * Fixes bad hunk maps being written to CHD files and incorrect hunk map data being read.

…hen reading and writing. (mamedev#12057)" This reverts commit 69c3cd7. This causes CHD SHA1 digests to change. Either it's buggy, or CHD SHA1 digests depend on the representation rather than the data itself.

…ing and writing. (mamedev#12057) * In some cases, bits would be dropped when writing if there wasn't enough space in the buffer. * Fixes bad hunk maps being written to CHD files and incorrect hunk map data being read.

- util/bitstream: Fix case where bits would be dropped when writing i…

6df94f1

…f there wasn't enough space in the buffer - util/bitstream: Allow for reading partial bits into the buffer

987123879113 commented Feb 23, 2024

View reviewed changes

cuavas reviewed Feb 23, 2024

View reviewed changes

src/lib/util/bitstream.h Outdated Show resolved Hide resolved

src/lib/util/bitstream.h Outdated Show resolved Hide resolved

Fix variable types

cd48f60

987123879113 marked this pull request as ready for review February 23, 2024 15:40

cuavas merged commit 69c3cd7 into mamedev:master Feb 23, 2024
5 checks passed

987123879113 deleted the fix_bitstream branch March 7, 2024 05:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

util/bitstream: Fix cases where bits would be dropped when reading and writing #12057

util/bitstream: Fix cases where bits would be dropped when reading and writing #12057

987123879113 commented Feb 23, 2024

987123879113 Feb 23, 2024 •

edited

Loading

cuavas commented Feb 23, 2024

987123879113 commented Feb 23, 2024

cuavas commented Feb 23, 2024

987123879113 commented Feb 23, 2024

john-iv commented Feb 24, 2024 •

edited

Loading

		// offload more bits if it'll still overflow the buffer
		if (m_bits + numbits >= 32)

util/bitstream: Fix cases where bits would be dropped when reading and writing #12057

util/bitstream: Fix cases where bits would be dropped when reading and writing #12057

Conversation

987123879113 commented Feb 23, 2024

987123879113 Feb 23, 2024 • edited Loading

Choose a reason for hiding this comment

cuavas commented Feb 23, 2024

987123879113 commented Feb 23, 2024

cuavas commented Feb 23, 2024

987123879113 commented Feb 23, 2024

john-iv commented Feb 24, 2024 • edited Loading

987123879113 Feb 23, 2024 •

edited

Loading

john-iv commented Feb 24, 2024 •

edited

Loading