util/cdrom: Refactor parse_cue and parse_gdicue #12087

987123879113 · 2024-03-02T15:18:38Z

Closes #12081. Follow up to #11913.

After fixing the problem mentioned in the issue (the first commit), I realized there isn't much unique code between parse_gdicue and parse_cue, so I merged them into one function. That way there's no need to maintain two separate .cue parsers anymore.

Additionally, I did some other small cleanups, such as clearing out some of the line buffers and adding some extra error checks. Sufficient size checks were already in place, so it wasn't likely to crash anything, but I figured it didn't hurt while I'm changing things around. Mainly I was concerned that you could end up copying a bunch of garbage into token if linebuffer had garbage in it when cdrom_file::tokenize was called.

For testing I extracted some v5 CHDs I had on hand and then recreated them to make sure the hashes matched. Also tried a few Redump Dreamcast dumps to make sure the extracted .bin matched the original Redump data (compared with all individual tracks combined into a single file so I could just do a SHA1 check between the CHD dump and original data).

987123879113 · 2024-03-02T15:22:50Z

src/lib/util/cdrom.cpp

+				if (wavlen != 0)
+				{
+					outtoc.tracks[trknum].frames = wavlen/2352;
+					outinfo.track[trknum].offset = wavoffs;
+					wavoffs = wavlen = 0;
+				}


This originally set outtoc.tracks[trknum].trktype = CD_TRACK_AUDIO; here, but convert_type_string_to_track_info gets called directly below this, and it sets outtoc.tracks[trknum].trktype based on the TRACK's typestring, so I removed the assignment.

Maybe it would be a better idea to move the call to convert_type_string_to_track_info before this so the track type can be forced when a WAVE file type is selected as input?
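For reference, the arithmetic in the hunk above can be sketched in isolation. This is a minimal sketch and not the MAME code: `track_entry` and `apply_wave_length` are hypothetical names made up for illustration, and the only value taken from the diff is the 2352-byte raw frame size.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-track fields touched in the hunk above.
struct track_entry
{
    std::uint32_t frames = 0; // track length in 2352-byte frames
    std::uint64_t offset = 0; // byte offset of the audio data in the input file
};

constexpr std::uint32_t RAW_FRAME_BYTES = 2352; // raw frame size used in the diff

// Applies a pending WAVE length/offset to a track, then clears the pending
// values (mirroring "wavoffs = wavlen = 0"). Returns false if nothing was pending.
inline bool apply_wave_length(track_entry &trk, std::uint64_t &wavlen, std::uint64_t &wavoffs)
{
    if (wavlen == 0)
        return false;

    // Integer division: a trailing partial frame is dropped.
    trk.frames = static_cast<std::uint32_t>(wavlen / RAW_FRAME_BYTES);
    trk.offset = wavoffs;
    wavoffs = wavlen = 0;
    return true;
}
```

Note that the sketch says nothing about the track type, which is the open question in the comment above.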


987123879113 · 2024-03-03T09:24:07Z

I ended up reverting #11913 and rewrote how Dreamcast cue/bins are parsed. This PR breaks both the SHA1 and sometimes the Data SHA1 of any Dreamcast CHDs previously made using Redump cue/bins but it feels necessary. CHDs created using GDIs (both TOSEC and Redump GDIs) are unaffected.

I didn't test with a ton of games but I did test with all of the type varieties previously implemented. Most games are either "type I" or "type III split". Here's the breakdown using all of the Dreamcast cues from redump's database:

type1 796
type2 3
type3 4
type3_split 590
multisession 80
unknown 0
total 1473

For easy testing/reference, here are the type II and type III (non-split) games:

    "type2": [
        "UnderCover AD2025 Kei (Japan).cue",
        "Memories Off 2nd - Making Disc (Japan).cue",
        "Roommate Asami - Okusama wa Joshikousei - Director's Edition (Japan).cue"
    ],
    "type3": [
        "Shenmue II (Japan) (Disc 3).cue",
        "Shenmue II (Japan) (Disc 2).cue",
        "Shenmue II (Japan) (Disc 4).cue",
        "Virtua Fighter History & VF4 (Japan).cue"
    ],

Disclaimer: The TOSEC .raw files all seem to have the audio data start at a random offset (dump method issue?) whereas Redump's audio tracks were all uniform from my testing, so for testing purposes I manually modified the TOSEC .raw files to have the data start at offset 0 and padded the files an appropriate length to keep the same filesize.

Create CHD from TOSEC GDI, extract to GDI
- Working, generates same output bins+GDI as input TOSEC GDI.
Create CHD from Redump cue/bin, extract to Redump cue/bin
- Working but not perfect, generates invalid cue (no REM single/double commands, files not split as required) but same output bin as input Redump bins. Data is intact so it can be restored by splitting the .bin into separate tracks and using the Redump .cue file.
Create CHD from Redump cue/bin, extract to GDI
- Because the data internally gets shifted to match TOSEC GDI, the output GDI becomes a TOSEC GDI but has the pregap data normally not present in the TOSEC rips included at the end of the track data. For example, for Star Wars - Episode I - Racer v1.001 (US) the TOSEC GDI says track 2 starts at 753 (so 753 * 2352 = 0x1b0630 bytes of data between start of track 1 and start of track 2, but the actual track01.bin is (753 - 150) * 2352 = 0x15a410 bytes). Personally I don't think this should cause problems because that data appears to have been just left blank in the TOSEC GDI dumps, and now it's filled in with real data that'd go in between those gaps. The actual data outside of those gaps is the same and if you remove the extra data at the end of the tracks to match the TOSEC GDI dump filesizes then the extracted track data has the same SHA-1 hashes as the original TOSEC GDI dump track data.
Create CHD from Redump GDI, extract to GDI
- Working, generates same output bins+GDI as input Redump GDI. Format of GDI doesn't match exactly due to text formatting but contents are the same. I would not recommend creating CHDs using the Redump GDIs simply for the fact that it'll be in a different format internally compared to the TOSEC GDI and Redump cue/bin CHDs (pregap is associated to different track compared to TOSEC layout and there's no easy way to reorder it without some hacky way to detect a Redump GDI vs TOSEC GDI).
Create CHD from Redump GDI, extract to Redump cue/bin
- Working but not perfect, generates invalid cue (no REM single/double commands, files not split as required) but same output bin as input Redump bins. Data is intact so it can be restored by splitting the .bin into separate tracks and using the Redump .cue file.
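The byte counts quoted for the Star Wars - Episode I - Racer example can be checked with a few compile-time assertions. This is only a worked check of the numbers in the list above; the helper names are hypothetical, and 150 frames is the pregap length implied by the "753 - 150" expression.

```cpp
#include <cstdint>

constexpr std::uint32_t RAW_FRAME_BYTES = 2352; // raw frame size
constexpr std::uint32_t PREGAP_FRAMES = 150;    // from "753 - 150" (2 seconds at 75 frames/s)

// Bytes between the start of track 1 and the start of track 2 according to the GDI.
constexpr std::uint64_t bytes_to_track_start(std::uint32_t start_frame)
{
    return std::uint64_t(start_frame) * RAW_FRAME_BYTES;
}

// Size of the track file when the 150 pregap frames are not stored in it.
constexpr std::uint64_t bytes_without_pregap(std::uint32_t start_frame)
{
    return std::uint64_t(start_frame - PREGAP_FRAMES) * RAW_FRAME_BYTES;
}

static_assert(bytes_to_track_start(753) == 0x1b0630);
static_assert(bytes_without_pregap(753) == 0x15a410);
// The difference is exactly the 150 frames that the extracted GDI now fills in.
static_assert(bytes_to_track_start(753) - bytes_without_pregap(753) == 150 * 2352);
```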


987123879113 · 2024-03-03T11:15:15Z

src/tools/chdman.cpp

+		if (cdrom->is_gdrom() && (mode == MODE_CUEBIN))
+		{
+			util::stream_format(std::cout, "Warning: extracting GD-ROM CHDs as bin/cue is not fully supported and will result in an unusable CD-ROM cue file.\n");
+		}
+


This never worked even before the recent changes, so finally add a warning about it. Someone needs to go in and add proper support for GD-ROM cues.

cuavas

I didn’t see any major issues with a quick read, but I’m pretty tired and I’m not overly familiar with this part of the code. It really needs testing, and probably some more pairs of eyes.

I’m not very happy about the fallout from the last change to the GD-ROM handling, or the other issues in chdman that keep being found. Issues in CHD handling don’t just affect MAME.

cuavas · 2024-03-03T20:41:33Z

src/lib/util/cdrom.cpp

-	while ((i < linebuffersize) && (j < tokensize))
+	while ((i < linebuffersize) && (j < tokensize) && (linebuffer[i] != '\0'))


How does linebuffer[i] != '\0' happen? Is linebuffersize the size of the buffer rather than the size of its content? The code seemed to be treating it as the size of the content.

linebuffersize is just sizeof(linebuffer) so it's the entire buffer size rather than its contents. So without the null byte check, the entire buffer is processed every time instead of just ending when reasonable.

mame/src/lib/util/cdrom.cpp

Lines 1411 to 1422 in 1565fb9

     * @def TOKENIZE();
     *
     * @brief A macro that defines tokenize.
     *
     * @param linebuffer The linebuffer.
     * @param i Zero-based index of the.
     * @param sizeof(linebuffer) The sizeof(linebuffer)
     * @param token The token.
     * @param sizeof(token) The sizeof(token)
     */

    #define TOKENIZE i = tokenize( linebuffer, i, sizeof(linebuffer), token, sizeof(token) );

Eugh, if linebuffer is a pointer, bad things will happen. Will it work if you change it to tokenize( linebuffer, i, std::size(linebuffer), token, std::size(token) ); so it will give an error if linebuffer and token aren’t array-like?

Yeah, it works from some quick testing.
a8388ef
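The difference between the two spellings can be shown with a small standalone sketch. The names `buffer_capacity` and `capacities_demo` and the 512-element buffer are hypothetical and chosen only for illustration; `std::size` from `<iterator>` (C++17) is the real standard library function.

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical helper: the capacity that would be passed as linebuffersize.
template <typename Buffer>
constexpr std::size_t buffer_capacity(const Buffer &buf)
{
    return std::size(buf); // accepts arrays and containers, rejects raw pointers
}

inline std::size_t capacities_demo()
{
    char linebuffer[512] = {};
    const char *as_pointer = linebuffer;
    (void)as_pointer;

    // sizeof compiles for both, but on a pointer it silently yields the pointer size.
    static_assert(sizeof(linebuffer) == 512);
    static_assert(sizeof(as_pointer) == sizeof(char *));

    // std::size(as_pointer) does not compile, which is the safety net asked for above.
    // std::size_t bad = std::size(as_pointer);

    return buffer_capacity(linebuffer);
}
```

For char arrays the two give the same number; std::size additionally reports an element count rather than a byte count, which matters if the element type ever changes.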


cuavas · 2024-03-06T15:15:59Z

I merged this. It should put us in a better position than where we’ve been for the last few months, anyway. I’d still appreciate if more people could give it a spin.

…Hub mamedev#12081). (mamedev#12087)

This should greatly improve data integrity when creating and extracting GD-ROM images.

* util/cdrom.cpp: Refactored parse_cue to handle GD-ROMs.
* util/cdrom.cpp: Don't discard any data from GD-ROM cue/bin input including pre-gap data.
* tools/chdman.cpp: Fixed splitframes handling.
* tools/chdman.cpp: Added warning when extracting GD-ROM CHDs to cue/bin format.

TheRealGusBus · 2024-04-14T06:34:31Z

@cuavas Tested against a nearly complete Redump set (a few seem MIA) using oxyromon as the ROM manager. Re-validated the CHD files using the original CUE files with no obvious issues.

cuavas · 2024-04-14T15:34:15Z

> @cuavas Tested against a nearly complete Redump set (a few seem MIA) using oxyromon as the ROM manager. Re-validated the CHD files using the original CUE files with no obvious issues.

Thanks.

johnsanc314 · 2024-05-24T21:06:20Z

OK then... specifically, this disc: Redump ID 49774


987123879113 added 2 commits March 3, 2024 07:47

Revert "util/cdrom.cpp: Don't strip pregaps from Redump GD-ROM files (m…

df787c2

…amedev#11913)" This reverts commit b75b8d8.

util/cdrom: Refactor parse_cue to handle GDROM

ce11a2c

987123879113 force-pushed the remove_gdipattern branch 4 times, most recently from 9443cb3 to 5250a57 Compare March 3, 2024 08:25

987123879113 marked this pull request as ready for review March 3, 2024 09:24

987123879113 marked this pull request as draft March 3, 2024 10:16

987123879113 added 3 commits March 3, 2024 19:58

util/cdrom: Revert change to favor Redump format for Dreamcast + vari…

b248fd0

…ous compatibility fixes (fixes everything except 150 lost pregap frames from type III split formats)

tools/chdman: Fix splitframes handling

64a2cca

util/cdrom: Don't throw out any data from input Redump Dreamcast cue/…

d3169c7

…bins

987123879113 force-pushed the remove_gdipattern branch from 5250a57 to d3169c7 Compare March 3, 2024 10:58

987123879113 marked this pull request as ready for review March 3, 2024 10:59

tools/chdman: Add warning when extracting GD-ROM CHD to cue/bin

8a7e13a


987123879113 mentioned this pull request Mar 3, 2024

chdman 0.263 error does not compress dreamcast games #12081

Closed

cuavas reviewed Mar 3, 2024


util/cdrom: Remove static usages

0ed1996

987123879113 force-pushed the remove_gdipattern branch from f63c605 to 0ed1996 Compare March 4, 2024 02:02

alucryd mentioned this pull request Mar 4, 2024

[Issue] Redump Dreamcast ROMs fail to verify after conversion to CHD format. alucryd/oxyromon#110

Closed

maxexcloo mentioned this pull request Mar 5, 2024

Handle Disk Image Formats (CHD, CSO, RVZ, etc.) emmercm/igir#937

Open

cuavas reviewed Mar 5, 2024


util/cdrom: Replace usages of sizeof with std::size when referencing …

a8388ef

…element count for tokenizer

cuavas merged commit 853db18 into mamedev:master Mar 6, 2024
5 checks passed

987123879113 deleted the remove_gdipattern branch March 7, 2024 05:51

987123879113 mentioned this pull request Mar 18, 2024

[chdman] some Dreamcast CHDs are missing some data after going back to CUE/BIN #11903

Closed

987123879113 mentioned this pull request Mar 30, 2024

chdman: Add support for exporting Dreamcast cues and outputting one bin per track #12191

Merged
