Redump Sega - Mega-CD - Sega CD missing serials #839

Open
ghost opened this issue Apr 20, 2019 · 10 comments

@ghost

ghost commented Apr 20, 2019

It seems that the way index.js currently works at https://github.com/robloach/libretro-dats is creating issues for some games with alternate versions being misidentified/ignored.

In the case of “Sonic CD (USA)” for Sega CD, the official Redump DAT has three entries (tracklist abridged for readability) :

<game name="Sonic CD (USA)">
		<category>Games</category>
		<description>Sonic CD (USA)</description>
		<rom name="Sonic CD (USA).cue" size="3820" crc="33b99240" md5="638f1e41830319de1914772076096626" sha1="5b085f78a6904d275715dcfeb454ecc262e704e5"/>
		<rom name="Sonic CD (USA) (Track 01).bin" size="139381872" crc="a6184e05" md5="ed96420538f989e94480f810d5f90685" sha1="3c316af52cc9ee8c30976a1cdb66339545dda3bd"/>
		[...]

<game name="Sonic CD (USA) (RE125)">
		<category>Games</category>
		<description>Sonic CD (USA) (RE125)</description>
		<rom name="Sonic CD (USA) (RE125).cue" size="4100" crc="e8a15187" md5="92d0c52b29c25b21e8d0bef3e3086115" sha1="ff275b2a8bf5c301fed311addca55aee8a1a13ab"/>
		<rom name="Sonic CD (USA) (RE125) (Track 01).bin" size="139381872" crc="a6184e05" md5="ed96420538f989e94480f810d5f90685" sha1="3c316af52cc9ee8c30976a1cdb66339545dda3bd"/>
		[...]
		
<game name="Sonic CD (USA) (RE125) (Alt)">
		<category>Games</category>
		<description>Sonic CD (USA) (RE125) (Alt)</description>
		<rom name="Sonic CD (USA) (RE125) (Alt).cue" size="4310" crc="53948c76" md5="ac75d9b8331ba59da608c49e5cc03cb7" sha1="4d190b6a7cc7e2284f7c16f6afaf94ff625dfb8c"/>
		<rom name="Sonic CD (USA) (RE125) (Alt) (Track 01).bin" size="139381872" crc="a6184e05" md5="ed96420538f989e94480f810d5f90685" sha1="3c316af52cc9ee8c30976a1cdb66339545dda3bd"/>
		[...]

All three versions share the same “Track 01.bin” checksums. Once parsed with the libretro-dat JavaScript routine, only one entry remains in the libretro Redump metadat, under “Sonic CD (USA) (RE125) (Alt)”:

game (
	name "Sonic CD (USA) (RE125) (Alt)"
	description "Sonic CD (USA) (RE125) (Alt)"
	rom ( name "Sonic CD (USA) (RE125) (Alt) (Track 01).bin" size 139381872 crc a6184e05 md5 ed96420538f989e94480f810d5f90685 sha1 3c316af52cc9ee8c30976a1cdb66339545dda3bd )
)

The difference in this case only appears in Track 13:

<rom name="Sonic CD (USA) (Track 13).bin" size="14013216" crc="c41a848e" md5="f01a263ce6475c1f43871483e5ab68f2" sha1="21dcc020dae2a0d650eed76540b54897d34ee889"/>
<rom name="Sonic CD (USA) (RE125) (Track 13).bin" size="14013216" crc="282d4f23" md5="88107ee77b536248c0b9149ac1c71ea4" sha1="5797d380e8df06f7d83a37ac55b3072f8a3dff57"/>
<rom name="Sonic CD (USA) (RE125) (Alt) (Track 13).bin" size="14013216" crc="8e9aaeaf" md5="c02b110395c9cb8a6a1066eb5c36835c" sha1="6cfbd4139d60eb7762874aa0308a01f40e0e17fc"/>
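
A minimal sketch of the collapse described above. This is not the actual index.js code: the `buildIndex` helper and the record layout are assumptions made for illustration, and only the names and CRCs come from the DAT excerpts. Keying entries by the Track 01 CRC lets each later entry overwrite the previous one, while keying by the .cue CRC keeps all three.

```javascript
// Simplified view of the three Redump entries (names and CRCs from the DAT excerpts).
const games = [
  { name: 'Sonic CD (USA)', cueCrc: '33b99240', track01Crc: 'a6184e05' },
  { name: 'Sonic CD (USA) (RE125)', cueCrc: 'e8a15187', track01Crc: 'a6184e05' },
  { name: 'Sonic CD (USA) (RE125) (Alt)', cueCrc: '53948c76', track01Crc: 'a6184e05' }
];

// Hypothetical index builder: maps a CRC to a game name.
// A later entry with the same key silently replaces the earlier one.
function buildIndex(entries, keyField) {
  const index = new Map();
  for (const entry of entries) {
    index.set(entry[keyField], entry.name);
  }
  return index;
}

const byTrack01 = buildIndex(games, 'track01Crc');
const byCue = buildIndex(games, 'cueCrc');

console.log(byTrack01.size, byTrack01.get('a6184e05')); // 1 Sonic CD (USA) (RE125) (Alt)
console.log(byCue.size); // 3
```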

The same is true of “Layer Section (Japan) (3M)” and “Layer Section (Japan) (6M)” in the Sega Saturn set, except that the differences start at Track 02.bin:

        <rom name="Layer Section (Japan) (6M).cue" size="2183" crc="4a838a68" md5="2f21a45ccdf5b439a35d0ad0a4b9f99a" sha1="75932d563fb498e77e0a107efca33c034a75b04c"/>
		<rom name="Layer Section (Japan) (6M) (Track 01).bin" size="16779168" crc="07e9d8da" md5="4cb2552702a5767be5b791efad43762d" sha1="d90ef75e7a8d0ff60fceb58d4e534308b85391ea"/>
		<rom name="Layer Section (Japan) (6M) (Track 02).bin" size="5644800" crc="4385448d" md5="f52bb23aaf351581cea09442d74e0981" sha1="76ee44b608700841f89a3b0c325e7ccf691562c5"/>
		

		<rom name="Layer Section (Japan) (3M).cue" size="2183" crc="6f8ecb12" md5="e6ec1fb1245f5a5ff095bfa29f9d62e1" sha1="7905e5451fbcd372905dae9c4ef2339dff271357"/>
		<rom name="Layer Section (Japan) (3M) (Track 01).bin" size="16779168" crc="07e9d8da" md5="4cb2552702a5767be5b791efad43762d" sha1="d90ef75e7a8d0ff60fceb58d4e534308b85391ea"/>
		<rom name="Layer Section (Japan) (3M) (Track 02).bin" size="5644800" crc="ccffab2f" md5="c7edd32ea27e9596546b6bf8d7148784" sha1="cce1afd9f917210cdf061bdf23982f1597e6639a"/>

Sega - Saturn Libretro MetaDAT

game (
	name "Layer Section (Japan) (6M)"
	description "Layer Section (Japan) (6M)"
	rom ( name "Layer Section (Japan) (6M) (Track 01).bin" size 16779168 crc 07e9d8da md5 4cb2552702a5767be5b791efad43762d sha1 d90ef75e7a8d0ff60fceb58d4e534308b85391ea )
)

Could the script entries be based on the cue checksums instead to make sure all versions are accounted for?

@RobLoach
Member

Hey! Thanks for jumping into the craziness of checksum checking. Certainly could update it to use the .cue files instead, and that might help it be more consistent.

It finds the entry from the game over in this code here:
https://github.com/RobLoach/libretro-dats/blob/master/index.js#L262-L285

We could replace a lot of what's there to just force use of the CUE file if it's there. I think it was updated recently to select the first one from the .cue file, but 🤷‍♀️

I think just having it pick the .cue file makes more sense.
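
A rough sketch of what "pick the .cue file if it's there" could look like. The `pickIdentifyingRom` helper and the `{ name, crc }` record shape are hypothetical and not taken from index.js; the sample values come from the Layer Section entries quoted earlier.

```javascript
// Hypothetical helper: choose which rom entry identifies a game.
// Prefers the .cue file when one is present, otherwise falls back to the first entry.
function pickIdentifyingRom(roms) {
  const cue = roms.find((rom) => rom.name.toLowerCase().endsWith('.cue'));
  return cue || roms[0];
}

const layerSection6M = [
  { name: 'Layer Section (Japan) (6M).cue', crc: '4a838a68' },
  { name: 'Layer Section (Japan) (6M) (Track 01).bin', crc: '07e9d8da' },
  { name: 'Layer Section (Japan) (6M) (Track 02).bin', crc: '4385448d' }
];

console.log(pickIdentifyingRom(layerSection6M).crc); // 4a838a68
```

This only changes which checksum ends up in the generated DAT; whether the scanner computes the same checksum at scan time is a separate question.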

@ghost
Author

ghost commented Apr 20, 2019

Hi Rob!
Thanks for the reply (and all your work in general!)

After a bit of butchering in index.js (I don't know JS - or any language for that matter), I managed to output libretro DAT files with checksums for the cue files (SCD/SS). This solves the issue of games not being accounted for; however, no playlist gets created by RetroArch's internal scan (after conversion to RDB with c_converter). I guess it can't be that easy!

@RobLoach
Member

Great work debugging! There was some other serial debugging that went on over here too: libretro/RetroArch#7404

Could you put up a Pull Request with your .cue changes? I'd love to check it out.

@ghost
Author

ghost commented Apr 20, 2019

Hi again Rob!
I am new to GitHub so I'm not sure how to "pull" that off. In the meantime, I have forked the repo and uploaded my changes here: https://github.com/KamilleBadesalz/libretro-database/blob/redump-debug-hacks/index.js

I don't think you are going to like it, I really just hacked in there with very little notion of what I was actually doing...

On the PSX front, I have had issues with a couple of games not being written to the playlist despite being parsed in the metadat: "Gunners Heaven (Japan)" and "Raiden Project (Japan)" (latest RDB from the RetroArch Online Updater). But when I compiled my own DATs/RDB from the latest Redump DAT, it was the other way around: no games were written to the playlist except for those two!

@RobLoach
Member

RobLoach commented Apr 25, 2019

Disc media uses the serial number. Looking at the following URL:
http://redump.org/discs/system/mcd/

I can see Sonic CD is listed as 4407, but we don't have that in our DB. It would be good to build a scraper for the serials from Redump.
http://redump.org/disc/29188/

When you load the game, does the log say it found a serial number?

RobLoach changed the title from "libretro Redump metadat issue? (alt. version of SCD/SS games absent from db)" to "Redump Sega - Mega-CD - Sega CD missing serials" on Apr 25, 2019

@ghost
Author

ghost commented Apr 25, 2019

Hi Rob! Thanks for the update!

This is what RetroArch logs (RDB compiled from my own DAT file with cue checksums).

[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 01).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 02).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 03).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 04).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 05).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 06).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 07).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 08).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 09).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 10).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 11).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 12).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 13).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 14).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 15).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 16).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 17).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 18).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 19).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 20).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 21).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 22).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 23).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 24).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 25).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 26).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 27).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 28).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 29).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 30).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 31).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 32).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 33).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 34).bin
[INFO] Pruning file referenced by cue: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 35).bin
[INFO] Parsing CUE file 'Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA).cue'...
[INFO] Reading first data track...
[INFO] Comparing with known magic numbers...
[INFO] Parsing CUE file 'Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA).cue'...
[INFO] CUE 'Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA).cue' primary track: Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA) (Track 01).bin
 (0, 139381872)
[INFO] Reading first data track...
[INFO] CUE 'Z:\EmuBox\Sega - Mega-CD - Sega CD\Sonic CD (USA)\Sonic CD (USA).cue' crc: a6184e05
[INFO] Written to playlist file: D:\_Emulators\RetroArch\playlists\Sega - Mega-CD - Sega CD.lpl

However, it gets written/identified as follows in the playlist:

      "path": "Z:\\EmuBox\\Sega - Mega-CD - Sega CD\\Sonic CD (USA)\\Sonic CD (USA).cue",
      "label": "Sonic CD (USA) (RE125) (Alt)",
      "core_path": "DETECT",
      "core_name": "DETECT",
      "crc32": "A6184E05|crc",
      "db_name": "Sega - Mega-CD - Sega CD.lpl"
    },
    {

The crc32 is that of the cue (great!) but the label is incorrect. It should be "Sonic CD (USA)".

Remark 1: the online Redump database shows all versions as "Sonic CD", but the Redump DAT names them differently.

Based on the Track 13 checksums (cf. my first post)
"Sonic CD (USA)" is this version http://redump.org/disc/29188/ (the one I have)
"Sonic CD (USA) (RE125)" is this one http://redump.org/disc/24557/
"Sonic CD (USA) (RE125) (Alt)" is this one http://redump.org/disc/29653/

Remark 2: Both http://redump.org/disc/24557/ and http://redump.org/disc/29188/ share the same serial...

Same issue with the two versions of "Shining Force CD (USA)"
http://redump.org/disc/2702/ is "Shining Force CD (USA)" in the DAT
http://redump.org/disc/3875/ is "Shining Force CD (USA) (Alt)" in the DAT
But both have the same serial 4656

So we either:

  1. don't trust serials for db identification of Sega CD games (some games don't even have serials in the Redump list?)
  2. update the script to get rid of (RE125), (Alt) and probably countless other obscure suffixes...

I would be of the opinion that RetroArch favors the cue file checksum identification rather than serials but it seems to have been opposed in the past? (#423)

EDIT: same issue with Sega Saturn. In the case of "Layer Section"
http://redump.org/disc/3562/ "Layer Section (3M)"
http://redump.org/disc/17001/ "Layer Section (6M)"
Same serial for both.
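
To make the serial problem concrete, here is a small sketch that groups entries by serial and reports the serials shared by more than one DAT name. The `findSerialCollisions` helper and the record shape are hypothetical; the serials are the ones mentioned in this thread (4407 for the two Sonic CD discs, 4656 for Shining Force CD).

```javascript
// DAT names paired with the serial listed on redump.org, as reported in this thread.
const discs = [
  { name: 'Sonic CD (USA)', serial: '4407' },
  { name: 'Sonic CD (USA) (RE125)', serial: '4407' },
  { name: 'Shining Force CD (USA)', serial: '4656' },
  { name: 'Shining Force CD (USA) (Alt)', serial: '4656' }
];

// Hypothetical helper: return [serial, names] pairs for serials used by several entries.
function findSerialCollisions(records) {
  const bySerial = new Map();
  for (const { name, serial } of records) {
    if (!bySerial.has(serial)) bySerial.set(serial, []);
    bySerial.get(serial).push(name);
  }
  return [...bySerial].filter(([, names]) => names.length > 1);
}

for (const [serial, names] of findSerialCollisions(discs)) {
  console.log(serial, names.join(' | '));
}
```

Any serial reported here cannot distinguish the versions on its own, so a serial-only lookup has to settle on one name for all of them.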

@RobLoach
Member

RobLoach commented Apr 25, 2019

Was able to replicate...

[INFO] Comparing with known magic numbers...
[INFO] Parsing CUE file 'Sonic CD (USA).cue'...
[INFO] CUE 'Sonic CD (USA).cue' primary track: Sonic CD (USA) (Track 01).bin
 (0, 139381872)
[INFO] Reading first data track...
[INFO] CUE 'Sonic CD (USA).cue' crc: a6184e05
[INFO] Written to playlist file: /home/rob/.config/retroarch/playlists/Sega - Mega-CD - Sega CD.lpl

➜  libretro-database git:(master) crc32 "Sonic CD (USA).cue"
33b99240
➜  libretro-database git:(master) crc32 "Sonic CD/Sonic CD (USA) (Track 01).bin"
a6184e05

So when RetroArch finds a cue file, it uses the first track found, and grabs its CRC. The optimal solution would be to update the scanning logic, but I think you're right about just removing the (Alt) entries.
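
The behaviour described above can be approximated in a few lines. This is only an illustration in JavaScript, not RetroArch's scanner code (which is written in C): `firstTrackFile` and `crc32` are hypothetical helpers, and the cue text is an abridged, made-up example.

```javascript
// Table-based CRC-32 (reflected polynomial 0xEDB88320, the variant used by zip).
const CRC_TABLE = (() => {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = (c & 1) ? (0xedb88320 ^ (c >>> 1)) : (c >>> 1);
    }
    table[n] = c >>> 0;
  }
  return table;
})();

// Return the CRC-32 of a Buffer as an 8-digit lowercase hex string.
function crc32(buffer) {
  let crc = 0xffffffff;
  for (const byte of buffer) {
    crc = CRC_TABLE[(crc ^ byte) & 0xff] ^ (crc >>> 8);
  }
  return ((crc ^ 0xffffffff) >>> 0).toString(16).padStart(8, '0');
}

// Return the file name from the first FILE line of a cue sheet, or null if there is none.
function firstTrackFile(cueText) {
  const match = /^\s*FILE\s+"([^"]+)"/m.exec(cueText);
  return match ? match[1] : null;
}

// Abridged, illustrative cue sheet.
const cue = [
  'FILE "Sonic CD (USA) (Track 01).bin" BINARY',
  '  TRACK 01 MODE1/2352',
  '    INDEX 01 00:00:00',
  'FILE "Sonic CD (USA) (Track 02).bin" BINARY',
  '  TRACK 02 AUDIO'
].join('\n');

console.log(firstTrackFile(cue)); // Sonic CD (USA) (Track 01).bin
console.log(crc32(Buffer.from('123456789'))); // cbf43926 (standard CRC-32 check value)
```

On a real dump, `crc32(fs.readFileSync(trackPath))` would hash the file named by `firstTrackFile`, which is why the value logged for the cue matches Track 01 rather than the .cue file itself.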

Another question. Where is Sonic CD (USA) in the database?

game (
	name "Sonic CD (Europe)"
	description "Sonic CD (Europe)"
	rom ( name "Sonic CD (Europe) (Track 01).bin" size 129590496 crc 0060ac74 md5 1e0e1d19bbc308e9524ff853c2320e88 sha1 1ab058c8228af31a50def1b01f1f33961f2f23d7 )
)

game (
	name "Sonic CD (Japan) (Demo)"
	description "Sonic CD (Japan) (Demo)"
	rom ( name "Sonic CD (Japan) (Demo) (Track 01).bin" size 129592848 crc 128ac97f md5 8166ba70699bee43fb0d04fe34789fe6 sha1 02ebd329959c894ab95d63b4aec5940eeb333c28 )
)

game (
	name "Sonic CD (USA) (RE125) (Alt)"
	description "Sonic CD (USA) (RE125) (Alt)"
	rom ( name "Sonic CD (USA) (RE125) (Alt) (Track 01).bin" size 139381872 crc a6184e05 md5 ed96420538f989e94480f810d5f90685 sha1 3c316af52cc9ee8c30976a1cdb66339545dda3bd )
)

Does the title get overridden during the build of the .dat file? I think when a duplicate is found, it overrides the entry. You're right that there are three entries from Redump with a6184e05 as the first track:

  • Sonic CD (USA) (RE125)
  • Sonic CD (USA) (RE125) (Alt)
  • Sonic CD (USA)

Perhaps when there are duplicates found, it should pick the one with the least amount of characters in its title?
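
A sketch of that heuristic, assuming entries are plain `{ name, crc }` records (the `dedupeByShortestName` helper is hypothetical and not part of index.js):

```javascript
// Hypothetical dedupe step: among entries sharing a first-track CRC, keep the shortest name.
// On a tie, the entry seen first is kept.
function dedupeByShortestName(entries) {
  const best = new Map();
  for (const entry of entries) {
    const current = best.get(entry.crc);
    if (!current || entry.name.length < current.name.length) {
      best.set(entry.crc, entry);
    }
  }
  return [...best.values()];
}

const sonic = [
  { name: 'Sonic CD (USA) (RE125)', crc: 'a6184e05' },
  { name: 'Sonic CD (USA) (RE125) (Alt)', crc: 'a6184e05' },
  { name: 'Sonic CD (USA)', crc: 'a6184e05' }
];

console.log(dedupeByShortestName(sonic).map((e) => e.name)); // [ 'Sonic CD (USA)' ]
```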

@ghost
Author

ghost commented Apr 26, 2019

Hello again Rob,

Yes, with the current index.js routine, it looks like titles with identical Track 01.bin checksums are overridden/merged under one single title. In the case of Sonic, it is "Sonic CD (USA) (RE125) (Alt)". This was the main focus of my initial post above. I originally thought that having index.js pick up the CRC of the CUE would solve the problem (and it does in a way, since all 3 versions are accounted for in the libretro DAT if you change the script to retain the CUE checksum instead of Track 01), but I now understand that RetroArch has another way to identify CD-based games (serial/magic number).

> Perhaps when there are duplicates found, it should pick the one with the least amount of characters in its title?

This unfortunately won't work for other games/sets. Case in point: Layer Section (in the Saturn set). See the first post: "Layer Section (Japan) (6M)" and "Layer Section (Japan) (3M)" have titles of the same length, so the rule cannot choose between them.

Couldn't there be a switch in RetroArch to let the user pick the scan method (serial or checksums)?
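
A quick way to see where a shortest-title rule breaks down is to list the CRC groups in which several names tie for the shortest length. The `findAmbiguousGroups` helper is hypothetical; the two entries are the Layer Section records from the first post.

```javascript
// Hypothetical check: list CRC groups where a shortest-name rule cannot pick a single winner.
function findAmbiguousGroups(entries) {
  const groups = new Map();
  for (const entry of entries) {
    if (!groups.has(entry.crc)) groups.set(entry.crc, []);
    groups.get(entry.crc).push(entry.name);
  }
  const ambiguous = [];
  for (const [crc, names] of groups) {
    const shortest = Math.min(...names.map((n) => n.length));
    const candidates = names.filter((n) => n.length === shortest);
    if (candidates.length > 1) ambiguous.push({ crc, candidates });
  }
  return ambiguous;
}

const saturn = [
  { name: 'Layer Section (Japan) (6M)', crc: '07e9d8da' },
  { name: 'Layer Section (Japan) (3M)', crc: '07e9d8da' }
];

console.log(findAmbiguousGroups(saturn));
```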

For reference, here are the libretro-DAT files I created with CUE checksums:
Sega CD
Saturn

EDIT: readability, links to DAT files

@i30817
Contributor

i30817 commented May 30, 2019

Cue files are a bad way to get crcs, mostly because the actual game is on the track where the executable is, and so are any modifications or translations. By using a cue, you're introducing false positives for any hack that targets that game.

There are other SNAFUs, like one of the sets for the Dreamcast using the same gdis because someone in the dumping group had the brilliant idea to name track files the same, and nearly all gdis are the same there because there is no salt on the files.

Of course, track files are a bad way to get to the actual index file needed to play the game shrug. These dumping formats are not preoccupied with indexability or thinking ahead. MAME and chd can't eat them all fast enough.

I'm thinking of two things to 'solve' this without CHD support everywhere (which wouldn't solve everything anyway), though I'm no C coder:

libretro/RetroArch#8873 <- this attaches checksums to the files themselves in any POSIX filesystem (not Windows, unfortunately). This means that any filesystem on Linux, macOS, etc. would behave like a zipped file to the scanner, reading the crc32 directly instead of calculating it (well, after a single scan by the tool), and those values move with the files. In fact I just finished coding the feature in the tool I linked there and using it on my collection, though it remains to be seen whether RetroArch takes advantage of it, since it's a bit obscure. It also solves the problem of 'softpatching' false positives / false negatives by attaching the checksum of the result of the softpatch, though you have to remember to rescan the game with rhdndat when you update the softpatch, so maybe 'solves' is a bit too strong.

libretro/RetroArch#8672 <- this is my attempt to sketch out a 'configurable' scanner, where people could control even the types of files the scanner would consider as it travelled down a filesystem. This would be invisible to normal users, but power users could make the scanner use custom launcher files (for specialized cores), though I'd prefer a grammar that could support crcs too, besides the name, to cleanly separate the launcher file from the scanned file; so you could say 'if you find a matching crc in this dir, I want the cue file in the same dir as the launcher file'.

example:

top_game_dir
--------PSX_game_dir
----------------Sony - Playstation.detect with contents *.cue
----------------Game_dir_with_cue
----------------Game_dir2_with_cue
--------Game_dir_with_toc
----------------Sony - Playstation.detect with contents *.toc
--------Dosbox_game_dir
----------------Dosbox.detect with contents dosbox.conf => game/*.exe:CRC32

For any dosbox.conf, the scanner would use it as the index file and get the metadata key by looking up the CRC32 of the executables found under the 'game' subdir in the dosbox database (the database being named by the filename of the detect file). Something similar (*.cue => *.bin:CRC32) could be done to show the relationship between the 'index' file and the 'metadata key' for PS1 games, if you wanted images there for instance.

You could even get it to scan only the 'actual game' track if you know your dump set has a name standard and divides music tracks out of the data track: *.cue => *(Track 1).bin:CRC32
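
A sketch of how such rules might be parsed. The grammar is only proposed in this comment, so everything here is an assumption: the `parseDetectRule` helper, the field names, and the choice to treat a rule without `=>` as "index file and key file are the same".

```javascript
// Hypothetical parser for the proposed rule syntax: "<index glob> => <key glob>:<HASH>".
// Assumption: a rule without "=>" means the index file is also used for the metadata lookup.
function parseDetectRule(rule) {
  const [indexPart, keyPart] = rule.split('=>').map((part) => part.trim());
  if (!keyPart) {
    return { index: indexPart, key: indexPart, hash: null };
  }
  const colon = keyPart.lastIndexOf(':');
  if (colon === -1) {
    return { index: indexPart, key: keyPart, hash: null };
  }
  return {
    index: indexPart,
    key: keyPart.slice(0, colon).trim(),
    hash: keyPart.slice(colon + 1).trim()
  };
}

console.log(parseDetectRule('*.cue'));
console.log(parseDetectRule('dosbox.conf => game/*.exe:CRC32'));
console.log(parseDetectRule('*.cue => *(Track 1).bin:CRC32'));
```

The parsed rule keeps the launcher ("index") pattern separate from the pattern of the file whose checksum is used as the database key, which is the separation the comment asks for.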

@pkos
Contributor

pkos commented Mar 31, 2020

> Could the script entries be based on the cue checksums instead to make sure all versions are accounted for?

This PR updates the scanner code to check Redump serials; when it can't identify a disc, a CRC is generated. The logs and playlists now contain these serials. If you wish, please clone the PR and test this issue again. Also, Rob has updated the databases to contain serials.
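
As a rough picture of a serial-first lookup with a CRC fallback, here is a sketch in JavaScript. It does not reproduce the PR's code: the `identify` helper, the `db` layout, and the exact fallback order are assumptions, and the sample values are taken from earlier in this thread.

```javascript
// Hypothetical database with two lookup tables.
const db = {
  bySerial: new Map([['4407', 'Sonic CD (USA)']]),
  byCrc: new Map([['07e9d8da', 'Layer Section (Japan) (6M)']])
};

// Hypothetical lookup order: try the serial first, then fall back to a CRC match.
function identify(disc, database) {
  if (disc.serial && database.bySerial.has(disc.serial)) {
    return { method: 'serial', name: database.bySerial.get(disc.serial) };
  }
  if (disc.crc && database.byCrc.has(disc.crc)) {
    return { method: 'crc', name: database.byCrc.get(disc.crc) };
  }
  return null;
}

console.log(identify({ serial: '4407', crc: 'a6184e05' }, db));
console.log(identify({ serial: null, crc: '07e9d8da' }, db));
console.log(identify({ serial: null, crc: 'ffffffff' }, db)); // null
```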
