Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chdman createdvd writess corrupt CHD files on CIFS storage #12605

Open
denilsonsa opened this issue Jul 21, 2024 · 9 comments
Open

chdman createdvd writess corrupt CHD files on CIFS storage #12605

denilsonsa opened this issue Jul 21, 2024 · 9 comments

Comments

@denilsonsa
Copy link

denilsonsa commented Jul 21, 2024

MAME version

0.267 (mame0267-dirty)

System information

Manjaro Linux, x86_64

Intel i7-6700 CPU with 8 cores/threads

Storage on a NAS device mounted over CIFS.

INI configuration details

No response

Emulated system/software

No response

Incorrect behaviour

  1. Obtain a Playstation 2 ISO file from a game that was released as DVD.

  2. Convert the DVD ISO file to a CHD file:

     chdman createdvd -i GameNameHere.iso -o GameNameHere.chd
    
  3. Try to verify it:

     chdman verify -i GameNameHere.iso
    

What happens:

  • Sometimes it verifies correctly.
  • Sometimes it shows: Decompression error
  • Sometimes it shows: Compression error
  • Sometimes it segfaults.

But the behavior is always consistent on the same .CHD file.

Expected behaviour

It should be able to verify any CHD created via createdvd. It shouldn't crash either.

Steps to reproduce

No response

Additional details

If I were to guess, I would think the createdvd isn't deterministic. There might be some uninitialized data that sometimes gets corrupted and sometimes gets written wrong on the CHD file. Or maybe the file seek/write/sync/flush isn't done correctly. Or perhaps it's related to running the compression in parallel on multiple CPU cores.

Or, I would guess verify code is buggy, as the code shouldn't ever segfault.

Given the random nature of this, it may as well happen that my hardware is defective somehow. Although that's weird, given it correctly verified many CHD files, but only had trouble with a few of the ones I had created using createdvd. The CHD files generated from CD bin/cue files worked just fine.

I know debugging something random like this is a nightmare. I'm sorry about it.


As a guess after some more thought, it could be related to racing conditions on the createdvd method when running in parallel. Since the storage is on a spinning disk in a network device (Synology NAS with limited CPU power), the bottleneck is probably the I/O. Some of the CPU threads might not run in the expected timing due to delays in the I/O. If this is the case, moving the ISO to /tmp (a RAM drive) and creating the CHD over there shouldn't have I/O issues. Well, still a mystery for me, and testing this takes quite some time.

@denilsonsa
Copy link
Author

denilsonsa commented Jul 23, 2024

To verify if the issue was related to multi-threading, I tried this creating a CHD file three times with the default number of processes and three times more with a single thread:

for a in 1 2 3 ; do chdman createdvd -i AnyLargePS2DVD.iso -o multi-$a.chd ; done
for a in 1 2 3 ; do chdman createdvd -i AnyLargePS2DVD.iso -np 1 -o single-$a.chd ; done

Both the source and the destination were on the same folder, a CIFS-mounted shared folder from my NAS. The result wasn't good:

$ md5sum multi-* single-*
ea3bc4e3ad10ca387e7ed734501d476a  multi-1.chd
5ab5db891572d1301aa41b3b2624684e  multi-2.chd
ea3bc4e3ad10ca387e7ed734501d476a  multi-3.chd
c610c5723decf413b295b132d51e5ae3  single-1.chd
b2b54cdcd115663df0dbd27ac9afd3c0  single-2.chd
945f851408e0c87df6ad9fe816cd87f9  single-3.chd

Then I decided the same approach, but with the source .ISO file on the CIFS mount and the destination .CHD file on /tmp RAM drive. The result was promising:

$ md5sum /tmp/multi-* /tmp/single-*
ea3bc4e3ad10ca387e7ed734501d476a  /tmp/multi-1.chd
ea3bc4e3ad10ca387e7ed734501d476a  /tmp/multi-2.chd
ea3bc4e3ad10ca387e7ed734501d476a  /tmp/multi-3.chd
ea3bc4e3ad10ca387e7ed734501d476a  /tmp/single-1.chd
ea3bc4e3ad10ca387e7ed734501d476a  /tmp/single-2.chd
ea3bc4e3ad10ca387e7ed734501d476a  /tmp/single-3.chd

I also tried two other combinations of CIFS and RAM as either sources or destinations. Here's the results:

CIFS destination RAM drive destination
CIFS source inconsistent perfect
RAM drive source inconsistent perfect

This makes me believe the process of writing the CHD file has some assumption that isn't working here. Maybe it tries to write and read and seek too quickly, or without flushing the changes. I don't know, I haven't looked at the code.

Given the recent findings, I'm updating this issue title from "chdman can't verify CHD files created by createdvd" to "chdman createdvd writess corrupt CHD files on CIFS storage".


One more note: I know I have created many .CHD files from CUE/BIN CD-based images, and they all verified correctly. This makes me believe the issue is only with createdvd, and not with createcd.

@denilsonsa denilsonsa changed the title chdman can't verify CHD files created by createdvd chdman createdvd writess corrupt CHD files on CIFS storage Jul 23, 2024
@mamehaze
Copy link
Contributor

mamehaze commented Jul 24, 2024

have you ensured this isn't a bad RAM / thermal issue? I can believe there could be issues, but typically on machines where I've seen CHD outright create files that don't validate it's been because of bad RAM, or a CPU overheating creating corrupt compressed data. The process is intensive, and will swamp your CPU, pushing all the cores to their limit which for larger files like DVDs could be too much in some cases.

@denilsonsa
Copy link
Author

It is unlikely to be a thermal or RAM issue, because:

  • I don't notice any abnormal crashes on this machine. (I've had faulty RAM on a Linux system many years ago, so I kind of know what to expect.)
  • All CDs converted to CHD verified just fine.
  • I run a little one-liner script to chdman verify all CHD files (over CIFS), and they were all verified correctly. (I know, this is less CPU-intensive than creating those files.)
  • The CHD files were created correctly when writing to a RAM drive. In fact, it cannot be more CPU/RAM-intensive than reading from and writing to a RAM drive; and yet it worked just right.
  • I've re-created all broken CHD files I had by re-creating them on RAM drive, and then moving over to the CIFS mount, and then verifying them over CIFS. They all verified correctly.

I cannot 100% discard the possibility of faulty hardware on my side (after all, hardware fails), but in this case I find it very unlikely.

If you have any specific scenario you want me to try (or variations of command-line parameters), please tell me, and I can try running them later.

@MooglyGuy
Copy link
Contributor

If you have any specific scenario you want me to try (or variations of command-line parameters), please tell me

Sure. Reproduce the issue on a different machine, preferably AMD or an earlier generation of Intel.

@denilsonsa
Copy link
Author

denilsonsa commented Jul 30, 2024

On the same machine, I tried converting a 600MB Sega CD CUE+BIN to CHD. I used this command:

for a in 1 2 3 4 5 6 7 8 9 ; do chdman createcd -i Foobar.cue -o multi-$a.chd ; done
for a in 1 2 3 4 5 6 7 8 9 ; do chdman createcd -i Foobar.cue -np 1 -o single-$a.chd ; done

Essentially the same command as before, but with createcd instead of createdvd. Also reading from CIFS and writing to CIFS. The result was interesting:

$ md5sum *.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-1.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-2.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-3.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-4.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-5.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-6.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-7.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-8.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-9.chd
7f50d49392430d60a08805e61de64ef5  single-1.chd
cb279c1d5ad9d870d1d7afe23e370063  single-2.chd
4e37f3a44ceb55a58705b27a78c21eba  single-3.chd
cb279c1d5ad9d870d1d7afe23e370063  single-4.chd
2dcc5d096dff53aba89c2768546afafb  single-5.chd
8afd362061362566be291a49479f10cf  single-6.chd
cb279c1d5ad9d870d1d7afe23e370063  single-7.chd
cb279c1d5ad9d870d1d7afe23e370063  single-8.chd
57ca0894cddde1102a238da5aafa8f38  single-9.chd

To make sure overheating wasn't the issue, I repeated the test while setting the CPU power profile to "Power Save" instead of "Balanced", which limits the clock speed to 900MHz (instead of over 2GHz~3GHz). The CPU was at around 45°C this time (instead of over 80°C). This took significantly more time.

$ md5sum *.chd
1c2107afcbdf56bca8a63db6c10bbd21  multi-1.chd
7e3c12a8c919556d6fc164e430b47862  multi-2.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-3.chd
8477cca6863ceba46f59e7d6cc711c45  multi-4.chd
5cf98a8b3cc23cafdca317bf7b6fe27b  multi-5.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-6.chd
7c5a1639db4302dcf2375ed7b177df38  multi-7.chd
c27de612808b53c4ca438d275925cf71  multi-8.chd
cb279c1d5ad9d870d1d7afe23e370063  single-1.chd
cb279c1d5ad9d870d1d7afe23e370063  single-2.chd
f9cca43a3518d718abf5e859bb2e4fc2  single-3.chd
34ddf2875fed771079ad8d1f3b77bbc9  single-4.chd
9e1d8bf395261cba01258623f541149d  single-5.chd
fe481d7eda3cace16f8e2aa2884eaa63  single-6.chd
e756a67d96853a3f5d4ab713edad5f04  single-7.chd

Then I tried the test again, but this time reading from CIFS and writing to /tmp/ (i.e. to a RAM drive).

$ md5sum /tmp/*.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-1.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-2.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-3.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-4.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-5.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-6.chd
cb279c1d5ad9d870d1d7afe23e370063  single-1.chd
cb279c1d5ad9d870d1d7afe23e370063  single-2.chd
cb279c1d5ad9d870d1d7afe23e370063  single-3.chd
cb279c1d5ad9d870d1d7afe23e370063  single-4.chd
cb279c1d5ad9d870d1d7afe23e370063  single-5.chd
cb279c1d5ad9d870d1d7afe23e370063  single-6.chd

Then I repeated the same test on the Steam Deck. Same LAN, both over wired Ethernet (EDIT: the Steam Deck was over Wi-Fi and also over Ethernet), same CIFS/Samba server, same folder.

$ md5sum *.chd
cb279c1d5ad9d870d1d7afe23e370063  deckmulti-1.chd
cb279c1d5ad9d870d1d7afe23e370063  deckmulti-2.chd
cb279c1d5ad9d870d1d7afe23e370063  deckmulti-3.chd
cb279c1d5ad9d870d1d7afe23e370063  deckmulti-4.chd
cb279c1d5ad9d870d1d7afe23e370063  deckmulti-5.chd
cb279c1d5ad9d870d1d7afe23e370063  deckmulti-6.chd
cb279c1d5ad9d870d1d7afe23e370063  deckmulti-7.chd
cb279c1d5ad9d870d1d7afe23e370063  deckmulti-8.chd
cb279c1d5ad9d870d1d7afe23e370063  decksingle-1.chd
cb279c1d5ad9d870d1d7afe23e370063  decksingle-2.chd
cb279c1d5ad9d870d1d7afe23e370063  decksingle-3.chd
cb279c1d5ad9d870d1d7afe23e370063  decksingle-4.chd
cb279c1d5ad9d870d1d7afe23e370063  decksingle-5.chd
cb279c1d5ad9d870d1d7afe23e370063  decksingle-6.chd
cb279c1d5ad9d870d1d7afe23e370063  decksingle-7.chd
cb279c1d5ad9d870d1d7afe23e370063  decksingle-8.chd

Both mounted the CIFS folder using this entry on the /etc/fstab:

//192.168.12.34/Images  /mnt/Images  cifs  credentials=/etc/fstab-cifs-credentials,rw,uid=1000,gid=1000,nobrl  0  0

Slightly different versions:

  • chdman - MAME Compressed Hunks of Data (CHD) manager 0.267 (mame0267-dirty) on the main PC where this bug is happening. Installed from mame-tools package.
  • chdman - MAME Compressed Hunks of Data (CHD) manager 0.267 (unknown) on the Steam Deck. Installed as a flatpak from flathub. (Executed with --command after using FlatSeal to grand filesystem access to MAME.)

I'll do even more tests and post the results as I collect them.

@denilsonsa
Copy link
Author

This time I repeated the tests on the same intel-based machine, but using the same flatpak version of chdman as I used in the Steam deck. I kept the power profile as "Power Save". Here's the results of createcd from Sega CD:

$ md5sum *.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-1.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-2.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-3.chd
3c41d4b17560b17a3bcfb2b2a8653390  multi-4.chd
1b0ab0f3c07b401b0e12afc2d093ca3f  multi-5.chd
0d8e393115cd09c46b0e8d94495b09ff  multi-6.chd
a80ef3705fdcd2169a3cbc4639c4f6e7  multi-7.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-8.chd
cb279c1d5ad9d870d1d7afe23e370063  single-1.chd
48507f59b9dd73fd1463e66af7983b4e  single-2.chd
79ee00e37c43cf6879530e54d4a477cc  single-3.chd
cb279c1d5ad9d870d1d7afe23e370063  single-4.chd
cb279c1d5ad9d870d1d7afe23e370063  single-5.chd
d9e540c1223c287b120df046f8b28e02  single-6.chd
7edb1555d284ae7f56478e43b78b6aef  single-7.chd
dcec42f4edaf30d3c2eeda723b5b8157  single-8.chd

I also used the Steam Deck to createdvd from a Playstation 2 ISO file. Here's the results:

$ md5sum *.chd
ea3bc4e3ad10ca387e7ed734501d476a  deckmulti-1.chd
ea3bc4e3ad10ca387e7ed734501d476a  deckmulti-2.chd
ea3bc4e3ad10ca387e7ed734501d476a  deckmulti-3.chd
ea3bc4e3ad10ca387e7ed734501d476a  deckmulti-4.chd
ea3bc4e3ad10ca387e7ed734501d476a  deckmulti-5.chd
ea3bc4e3ad10ca387e7ed734501d476a  deckmulti-6.chd
ea3bc4e3ad10ca387e7ed734501d476a  decksingle-1.chd
ea3bc4e3ad10ca387e7ed734501d476a  decksingle-2.chd
ea3bc4e3ad10ca387e7ed734501d476a  decksingle-3.chd
ea3bc4e3ad10ca387e7ed734501d476a  decksingle-4.chd
ea3bc4e3ad10ca387e7ed734501d476a  decksingle-5.chd
ea3bc4e3ad10ca387e7ed734501d476a  decksingle-6.chd

Could it be that writing to the CIFS folder is buggy? I then tried copying one single CHD file from /tmp/ to the CIFS folder multiple times, and every time the file copied perfectly. So, sequential (i.e. bulk) writes to the CIFS folder still works fine. (Which makes sense, as I've never noticed any issue anywhere else.) But the way chdman is writing data leads to a buggy behavior.


Summary so far:

  • I've reproduced it on the Manjaro Linux (which is based on Arch Linux) machine with Intel i7-6700 CPU.
    • I could not reproduce it yet on the Steam Deck (AMD CPU and SteamOS, which is based on Arch Linux).
  • Using chdman to write directly to the CIFS storage will randomly cause corruption on that machine.
    • Using "Power Save" profile doesn't make any difference, which leads to the conclusion there is no overheating problem.
    • Using the mame-tools package from the distro, or using MAME from flatpak doesn't make any difference, the results are the same.
    • Writing to RAM leads to correct files.
    • Copying from RAM to CIFS (using cp) leads to correct files.

@mamehaze
Copy link
Contributor

mamehaze commented Jul 30, 2024

have you tried other (non-CHDMAN cases) with the CIFS and large files, under load, with lots of appending of additional data etc.? maybe the OS handling of the CIFS itself is what is buggy?

CHDMAN itself isn't doing anything special in those cases, and it seems to be the common factor.

@wilbertpol
Copy link
Contributor

I am not an expert on CIFS mounts, but other things to try may be not using nolbr and/or cache=none on the CIFS mount.

It might be that the combination of how chdman persists data and the nolbr option are causing the problems that you are experiencing.

@denilsonsa
Copy link
Author

denilsonsa commented Jul 30, 2024

I tried remounting with cache=none and re-doing the test. Here's the results:

$ md5sum *.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-1.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-2.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-3.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-4.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-5.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-6.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-7.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-8.chd
cb279c1d5ad9d870d1d7afe23e370063  multi-9.chd

So, indeed caching sounds like it could be the cause.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants