Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util/bitstream: Fix cases where bits would be dropped when reading and writing #12057

Merged
merged 2 commits into from Feb 23, 2024

Conversation

987123879113
Copy link
Contributor

Additional testing from someone else before merging would be appreciated.

In this case, the parent chunk types were using 26 bits per offset values which behaved unexpectedly resulting in some bits being dropped when reading and writing the data. I think the issue only happens when dealing with values larger than 24 bits due to the way buffering was implemented. So I've changed both bitstream_in and bitstream_out to more greedily fill its buffers, including partial reads for bitstream_in.

Old (can see an additional CRC error message I added to help debug, should be reproducible):

> ./chdman copy -i gdj_aaa_2007071100.chd -op gdj_aaa_2007100800.chd -o newchild2.chd 
chdman - MAME Compressed Hunks of Data (CHD) manager 0.262 (mame0252-3687-gb0c8802da65-dirty)
Output CHD:   newchild2.chd
Parent CHD:   gdj_aaa_2007100800.chd
Input CHD:    gdj_aaa_2007071100.chd
Compression:  lzma (LZMA), zlib (Deflate), huff (Huffman), flac (FLAC)
Hunk size:    4,096
Logical size: 40,000,000,000
Compression complete ... final ratio = 7.8%

> ./chdman info -i newchild2.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.262 (mame0252-3687-gb0c8802da65-dirty)
CRC did not match! expected 0000f003, found 000055da
Error opening CHD file (newchild2.chd): Decompression error
Fatal error occurred: 1

New:

> ./chdman copy -i gdj_aaa_2007071100.chd -op gdj_aaa_2007100800.chd -o newchild3.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.262 (mame0252-3658-gbaf313500d7)
Output CHD:   newchild3.chd
Parent CHD:   gdj_aaa_2007100800.chd
Input CHD:    gdj_aaa_2007071100.chd
Compression:  lzma (LZMA), zlib (Deflate), huff (Huffman), flac (FLAC)
Hunk size:    4,096
Logical size: 40,000,000,000
Compression complete ... final ratio = 7.8%

> ./chdman info -i newchild3.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.262 (mame0252-3658-gbaf313500d7)
Input file:   newchild3.chd
File Version: 5
Logical size: 40,000,000,000 bytes
Hunk Size:    4,096 bytes
Total Hunks:  9,765,625
Unit Size:    512 bytes
Total Units:  78,125,000
Compression:  lzma (LZMA), zlib (Deflate), huff (Huffman), flac (FLAC)
CHD size:     3,137,000,409 bytes
Ratio:        7.8%
SHA1:         af76f981a86d117df0799170aa1a6babeb92fd00
Data SHA1:    f86c948f8ab9124ed04d0462c7b9b31aabd0afa4
Parent SHA1:  b32f4f09e155f2fd0fc16b0bb77331756b3928d0
Metadata:     Tag='GDDD'  Index=0  Length=37 bytes
              CYLS:156250,HEADS:10,SECS:50,BPS:512.

Parent CHDs just to make it easy to verify the metadata of the new child CHD looks correct:

> ./chdman info -i gdj_aaa_2007071100.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.262 (mame0252-3658-gbaf313500d7)
Input file:   gdj_aaa_2007071100.chd
File Version: 5
Logical size: 40,000,000,000 bytes
Hunk Size:    4,096 bytes
Total Hunks:  9,765,625
Unit Size:    512 bytes
Total Units:  78,125,000
Compression:  lzma (LZMA), zlib (Deflate), huff (Huffman), flac (FLAC)
CHD size:     10,268,950,478 bytes
Ratio:        25.7%
SHA1:         af76f981a86d117df0799170aa1a6babeb92fd00
Data SHA1:    f86c948f8ab9124ed04d0462c7b9b31aabd0afa4
Metadata:     Tag='GDDD'  Index=0  Length=37 bytes
              CYLS:156250,HEADS:10,SECS:50,BPS:512.
              
> ./chdman info -i gdj_aaa_2007100800.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.262 (mame0252-3658-gbaf313500d7)
Input file:   gdj_aaa_2007100800.chd
File Version: 5
Logical size: 40,000,000,000 bytes
Hunk Size:    4,096 bytes
Total Hunks:  9,765,625
Unit Size:    512 bytes
Total Units:  78,125,000
Compression:  lzma (LZMA), zlib (Deflate), huff (Huffman), flac (FLAC)
CHD size:     7,161,091,862 bytes
Ratio:        17.9%
SHA1:         b32f4f09e155f2fd0fc16b0bb77331756b3928d0
Data SHA1:    2001011370d1201129d7f43f5f8cf37752985645
Metadata:     Tag='GDDD'  Index=0  Length=37 bytes
              CYLS:156250,HEADS:10,SECS:50,BPS:512.

…f there wasn't enough space in the buffer

- util/bitstream: Allow for reading partial bits into the buffer
Comment on lines +234 to +235
// offload more bits if it'll still overflow the buffer
if (m_bits + numbits >= 32)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the exact case this is fixing. The first write of 0x0f successfully writes 0x0f, but the second write of 0x0f only writes 0x0e and drops the last bit because it ended up where m_bits = 7 after writing the 3 full 8-bit bytes from m_buffer to m_write, and the number of bits we wanted to write into the buffer was 26 which ends up being 33 bits total, so the last bit was silently dropped.

m_buffer[42047d00] m_bits[29] newbits[0000000f] numbits[26] requiredbits[55] (before offloading to m_write buffer)
m_buffer[00000000] m_bits[ 5] numbits[26] requiredbits[31] (after offloading to m_write buffer)
newbits[000003c0] (after shifting newbits by 32 - numbits)
m_buffer[0000001e] m_bits[31] val[0000000f] (state at end of write func)

m_buffer[0000001e] m_bits[31] newbits[0000000f] numbits[26] requiredbits[57] (before offloading to m_write buffer)
m_buffer[1e000000] m_bits[ 7] numbits[26] requiredbits[33] (after offloading to m_write buffer)
newbits[000003c0] (after shifting newbits by 32 - numbits)
m_buffer[1e000007] m_bits[33] val[0000000e] (state at end of write func)

@cuavas
Copy link
Member

cuavas commented Feb 23, 2024

We’re awfully close to a release, so there’s less than 24 hours to make a decision on this. It’s a pretty bad bug as it causes bad CHDs to be produced for large hard disk sizes. I’d appreciate if someone else can eye it over.

src/lib/util/bitstream.h Outdated Show resolved Hide resolved
src/lib/util/bitstream.h Outdated Show resolved Hide resolved
@987123879113
Copy link
Contributor Author

I gave it a run through 170 CHDs I had on my HDD and I encountered no immediate issues so I'm marking it ready for review now.

For my testing, I ran chdman info on all of them and no issues reading the metadata for all of them. Then I ran chdman extractraw on all of them and no issues. Then a test of chdman extractcd/extracthd -> chdman createcd/createhd -> chdman info on the newly recreated CHDs and no errors. And lastly a test of chdman copy to create deltas for various random CHDs I had and they all worked with chdman info.

@987123879113 987123879113 marked this pull request as ready for review February 23, 2024 15:40
@cuavas cuavas merged commit 69c3cd7 into mamedev:master Feb 23, 2024
5 checks passed
@cuavas
Copy link
Member

cuavas commented Feb 23, 2024

I’m merging this. I’d still appreciate other people looking at it.

cuavas added a commit that referenced this pull request Feb 23, 2024
…hen reading and writing. (#12057)"

This reverts commit 69c3cd7.

This causes CHD SHA1 digests to change.  Either it's buggy, or CHD SHA1
digests depend on the representation rather than the data itself.
@987123879113
Copy link
Contributor Author

Commenting this here for visibility. I requested a revert after noticing SHA1s were different when extracting and recreating a lot of the CHDs. I think it's unrelated to this PR after all. The changes seem to happen even with upstream chdman before this PR and the changing SHA1s seem to be a result of other changes from the past.

229d19d#commitcomment-138992742

@john-iv
Copy link

john-iv commented Feb 24, 2024

I created new delta CHDs using Windyfairy's .exes against my problemed kpython2 child CHDs. Just finished and they all pass MAME's audit and verify/info as expected. These were the sets in question, attached:

Delta

https://www.emulab.it/forum/index.php?topic=9336.msg26277#msg26277

cuavas pushed a commit that referenced this pull request Feb 24, 2024
…ing and writing. (#12057)

* In some cases, bits would be dropped when writing if there wasn't enough space in the buffer.
* Fixes bad hunk maps being written to CHD files and incorrect hunk map data being read.
Mokona pushed a commit to Mokona/mame that referenced this pull request Feb 28, 2024
…ing and writing. (mamedev#12057)

* In some cases, bits would be dropped when writing if there wasn't enough space in the buffer.
* Fixes bad hunk maps being written to CHD files and incorrect hunk map data being read.
Mokona pushed a commit to Mokona/mame that referenced this pull request Feb 28, 2024
…hen reading and writing. (mamedev#12057)"

This reverts commit 69c3cd7.

This causes CHD SHA1 digests to change.  Either it's buggy, or CHD SHA1
digests depend on the representation rather than the data itself.
Mokona pushed a commit to Mokona/mame that referenced this pull request Feb 28, 2024
…ing and writing. (mamedev#12057)

* In some cases, bits would be dropped when writing if there wasn't enough space in the buffer.
* Fixes bad hunk maps being written to CHD files and incorrect hunk map data being read.
@987123879113 987123879113 deleted the fix_bitstream branch March 7, 2024 05:51
stonedDiscord pushed a commit to stonedDiscord/mame that referenced this pull request Apr 8, 2024
…ing and writing. (mamedev#12057)

* In some cases, bits would be dropped when writing if there wasn't enough space in the buffer.
* Fixes bad hunk maps being written to CHD files and incorrect hunk map data being read.
stonedDiscord pushed a commit to stonedDiscord/mame that referenced this pull request Apr 8, 2024
…hen reading and writing. (mamedev#12057)"

This reverts commit 69c3cd7.

This causes CHD SHA1 digests to change.  Either it's buggy, or CHD SHA1
digests depend on the representation rather than the data itself.
stonedDiscord pushed a commit to stonedDiscord/mame that referenced this pull request Apr 8, 2024
…ing and writing. (mamedev#12057)

* In some cases, bits would be dropped when writing if there wasn't enough space in the buffer.
* Fixes bad hunk maps being written to CHD files and incorrect hunk map data being read.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants