Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util/cdrom: Refactor parse_cue and parse_gdicue #12087

Merged
merged 8 commits into from Mar 6, 2024

Conversation

987123879113
Copy link
Contributor

Closes #12081. Follow up to #11913.

After fixing the problem mentioned in the issue (the first commit) I realized there isn't that much unique code between parse_gdicue and parse_cue so I decided to merge them into one function so that there's no need to maintain two separate .cue parsers anymore.

Additionally I did some other small cleanups such as clearing out some of the line buffers and adding some additional error checks in. Sufficient size checks were already in place already so it wasn't likely to crash anything but I figured it didn't hurt while I'm at changing things around. Mainly was concerned that it looks like you could end up copying a bunch of garbage into token if linebuffer had garbage in it when cdrom_file::tokenize was called.

For testing I extracted some v5 CHDs I had on hand and then recreated them to make sure the hashes matched. Also tried a few Redump Dreamcast dumps to make sure the extracted .bin matched the original Redump data (compared with all individual tracks combined into a single file so I could just do a SHA1 check between the CHD dump and original data).

Comment on lines +2334 to +2341
if (wavlen != 0)
{
outtoc.tracks[trknum].frames = wavlen/2352;
outinfo.track[trknum].offset = wavoffs;
wavoffs = wavlen = 0;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This originally set outtoc.tracks[trknum].trktype = CD_TRACK_AUDIO; but directly below this convert_type_string_to_track_info gets called which will set outtoc.tracks[trknum].trktype based on the TRACK's typestring so I removed it.

Maybe it would be a better idea to move the call to convert_type_string_to_track_info before this so the track type can be forced when a WAVE file type is selected as input?

@987123879113 987123879113 force-pushed the remove_gdipattern branch 4 times, most recently from 9443cb3 to 5250a57 Compare March 3, 2024 08:25
@987123879113
Copy link
Contributor Author

987123879113 commented Mar 3, 2024

I ended up reverting #11913 and rewrote how Dreamcast cue/bins are parsed. This PR breaks both the SHA1 and sometimes the Data SHA1 of any Dreamcast CHDs previously made using Redump cue/bins but it feels necessary. CHDs created using GDIs (both TOSEC and Redump GDIs) are unaffected.

I didn't test with a ton of games but I did test with all of the type varieties previously implemented. Most games are either "type I" or "type III split". Here's the breakdown using all of the Dreamcast cues from redump's database:

type1 796
type2 3
type3 4
type3_split 590
multisession 80
unknown 0
total 1473

For easy testing/reference, here are the type II and type III (non-split) games:

    "type2": [
        "UnderCover AD2025 Kei (Japan).cue",
        "Memories Off 2nd - Making Disc (Japan).cue",
        "Roommate Asami - Okusama wa Joshikousei - Director's Edition (Japan).cue"
    ],
    "type3": [
        "Shenmue II (Japan) (Disc 3).cue",
        "Shenmue II (Japan) (Disc 2).cue",
        "Shenmue II (Japan) (Disc 4).cue",
        "Virtua Fighter History & VF4 (Japan).cue"
    ],

Disclaimer: The TOSEC .raw files all seem to have the audio data start at a random offset (dump method issue?) whereas Redump's audio tracks were all uniform from my testing, so for testing purposes I manually modified the TOSEC .raw files to have the data start at offset 0 and padded the files an appropriate length to keep the same filesize.

  • Create CHD from TOSEC GDI, extract to GDI
    • Working, generates same output bins+GDI as input TOSEC GDI.
  • Create CHD from Redump cue/bin, extract to Redump cue/bin
    • Working but not perfect, generates invalid cue (no REM single/double commands, files not split as required) but same output bin as input Redump bins. Data is intact so it can be restored by splitting the .bin into separate tracks and using the Redump .cue file.
  • Create CHD from Redump cue/bin, extract to GDI
    • Because the data internally gets shifted to match TOSEC GDI, the output GDI becomes a TOSEC GDI but has the pregap data normally not present in the TOSEC rips included at the end of the track data. For example, for Star Wars - Episode I - Racer v1.001 (US) the TOSEC GDI says track 2 starts at 753 (so 753 * 2352 = 0x1b0630 bytes of data between start of track 1 and start of track 2, but the actual track01.bin is (753 - 150) * 2352 = 0x15a410 bytes). Personally I don't think this should cause problems because that data appears to have been just left blank in the TOSEC GDI dumps, and now it's filled in with real data that'd go in between those gaps. The actual data outside of those gaps is the same and if you remove the extra data at the end of the tracks to match the TOSEC GDI dump filesizes then the extracted track data has the same SHA-1 hashes as the original TOSEC GDI dump track data.
  • Create CHD from Redump GDI, extract to GDI
    • Working, generates same output bins+GDI as input Redump GDI. Format of GDI doesn't match exactly due to text formatting but contents are the same. I would not recommend creating CHDs using the Redump GDIs simply for the fact that it'll be in a different format internally compared to the TOSEC GDI and Redump cue/bin CHDs (pregap is associated to different track compared to TOSEC layout and there's no easy way to reorder it without some hacky way to detect a Redump GDI vs TOSEC GDI).
  • Create CHD from Redump GDI, extract to Redump cue/bin
    • Working but not perfect, generates invalid cue (no REM single/double commands, files not split as required) but same output bin as input Redump bins. Data is intact so it can be restored by splitting the .bin into separate tracks and using the Redump .cue file.

@987123879113 987123879113 marked this pull request as ready for review March 3, 2024 09:24
@987123879113 987123879113 marked this pull request as draft March 3, 2024 10:16
@987123879113 987123879113 marked this pull request as ready for review March 3, 2024 10:59
Comment on lines +2631 to +2635
if (cdrom->is_gdrom() && (mode == MODE_CUEBIN))
{
util::stream_format(std::cout, "Warning: extracting GD-ROM CHDs as bin/cue is not fully supported and will result in an unusable CD-ROM cue file.\n");
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This never worked even before the recent changes, so finally add a warning about it. Someone needs to go in and add proper support for GD-ROM cues.

Copy link
Member

@cuavas cuavas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn’t see any major issues with a quick read, but I’m pretty tired and I’m not overly familiar with this part of the code. It really needs testing, and probably some more pairs of eyes.

I’m not very happy about the fallout from the last change to the GD-ROM handling, or the other issues in chdman that keep being found. Issues in CHD handling don’t just affect MAME.

Comment on lines -1504 to +1506
while ((i < linebuffersize) && (j < tokensize))
while ((i < linebuffersize) && (j < tokensize) && (linebuffer[i] != '\0'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does linebuffer[i] != '\0' happen? Is linebuffersize the size of the buffer rather than the size of its content? The code seemed to be treating it as the size of the content.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linebuffersize is just sizeof(linebuffer) so it's the entire buffer size rather than its contents. So without the null byte check, the entire buffer is processed every time instead of just ending when reasonable.

mame/src/lib/util/cdrom.cpp

Lines 1411 to 1422 in 1565fb9

* @def TOKENIZE();
*
* @brief A macro that defines tokenize.
*
* @param linebuffer The linebuffer.
* @param i Zero-based index of the.
* @param sizeof(linebuffer) The sizeof(linebuffer)
* @param token The token.
* @param sizeof(token) The sizeof(token)
*/
#define TOKENIZE i = tokenize( linebuffer, i, sizeof(linebuffer), token, sizeof(token) );

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eugh, if linebuffer is a pointer, bad things will happen. Will it work if you change it to tokenize( linebuffer, i, std::size(linebuffer), token, std::size(token) ); so it will give an error if linebuffer and token aren’t array-like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it works from some quick testing.
a8388ef

src/lib/util/cdrom.cpp Outdated Show resolved Hide resolved
src/lib/util/cdrom.cpp Outdated Show resolved Hide resolved
@cuavas cuavas merged commit 853db18 into mamedev:master Mar 6, 2024
5 checks passed
@cuavas
Copy link
Member

cuavas commented Mar 6, 2024

I merged this. It should put us in a better position than where we’ve been for the last few months, anyway. I’d still appreciate if more people could give it a spin.

stonedDiscord pushed a commit to stonedDiscord/mame that referenced this pull request Apr 8, 2024
…Hub mamedev#12081). (mamedev#12087)

This should greatly improve data integrity when creating and extracting GD-ROM images.

* util/cdrom.cpp: Refactored parse_cue to handle GD-ROMs.
* util/cdrom.cpp: Don't discard any data from GD-ROM cue/bin input including pre-gap data.
* tools/chdman.cpp: Fixed splitframes handling.
* tools/chdman.cpp: Added warning when extracting GD-ROM CHDs to cue/bin format.
@TheRealGusBus
Copy link

@cuavas Tested against a nearly complete Redump set (a few seem MIA) using oxyromon as the ROM manager. Re-validated the CHD files using the original CUE files with no obvious issues.

@cuavas
Copy link
Member

cuavas commented Apr 14, 2024

@cuavas Tested against a nearly complete Redump set (a few seem MIA) using oxyromon as the ROM manager. Re-validated the CHD files using the original CUE files with no obvious issues.

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

chdman 0.263 error does not compress dreamcast games
3 participants