Saturn, SegaCD and Dreamcast using CRC instead of Serial #613

Closed
i30817 opened this issue Apr 11, 2018 · 8 comments

@i30817
Contributor

i30817 commented Apr 11, 2018

Is there any reason for this? The serial scan (mostly) picks up hacks, albeit with the wrong name.

Is there any reason why the Sega consoles' serials are hard to scrape? Redump seems to have them, and if they're hard to calculate, you can scrape the whole header and create a 'serial' that is a checksum of the header bytes.
From looking at the examples in the serial column for segacd, there are some tricks to parsing it:
http://redump.org/discs/system/mcd/

Like this game, which apparently has a different serial on the disc face than in the header; the serial in parentheses is the one on the disc:
http://redump.org/disc/20094/
K78Z5170(T-75014)

There is also the small problem that you have to treat iso files differently from MODE1/2352 images (what redump uses), with slightly different offsets for the start of the header. This is important because translations sometimes mandate iso conversion, though I suppose this is an RA fix.
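A minimal sketch of the offset difference described above, assuming Mode 1 data tracks only: a raw 2352-byte sector starts with a 12-byte sync pattern and a 4-byte header before the 2048 bytes of user data, while a plain iso stores only the user data. The function names are hypothetical and this is not how RetroArch's scanner is written.

```python
import io

RAW_SECTOR = 2352      # MODE1/2352: sync + header + user data + EDC/ECC
COOKED_SECTOR = 2048   # plain .iso: user data only
RAW_DATA_OFFSET = 16   # 12-byte sync pattern + 4-byte sector header
SYNC = b"\x00" + b"\xff" * 10 + b"\x00"


def detect_sector_size(image):
    """Guess the sector layout from the first 12 bytes of the image."""
    image.seek(0)
    return RAW_SECTOR if image.read(12) == SYNC else COOKED_SECTOR


def read_user_data(image, sector, length, sector_size=None):
    """Read `length` bytes from the start of a sector's user data."""
    if sector_size is None:
        sector_size = detect_sector_size(image)
    skip = RAW_DATA_OFFSET if sector_size == RAW_SECTOR else 0
    image.seek(sector * sector_size + skip)
    return image.read(length)


# Tiny synthetic images for demonstration (not real disc data).
fake_raw = io.BytesIO(SYNC + b"\x00\x02\x00\x01" + b"SEGA" + bytes(RAW_SECTOR - 20))
fake_iso = io.BytesIO(b"SEGA" + bytes(COOKED_SECTOR - 4))
```

With a helper like this, the same header-parsing code can run on both kinds of image, since it only ever sees user data.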

rant about RA database 'keys' below:

In my 'ideal' design, all the CD consoles would use the serial, and hacks would have an additional 'crc32' in the database. When a serial was looked up, if any of the resulting 'games' had a crc32 to check (i.e. a hack), the crc32 check would be forced, to disambiguate.

The calculated crc32 (if one was actually calculated; not the one from the database entry) would be stored in the playlists along with the serial, and if a feature of RA requires a crc32, like netplay, it would force the calculation or fetch it from the playlist if it exists.
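The lookup proposed above could be sketched roughly like this. It is only an illustration of the idea, not RetroArch code: the `Entry` type, the `lookup` function and the serial `T-1234` are all made up for the example. The expensive CRC is computed only when some entry sharing the serial carries a crc32.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Entry:
    name: str
    serial: str
    crc32: Optional[int] = None  # only set for hacks in this sketch


def lookup(entries, serial, compute_crc: Callable[[], int]):
    """Return (entry, crc_or_None); the CRC is computed only when needed."""
    candidates = [e for e in entries if e.serial == serial]
    if not any(e.crc32 is not None for e in candidates):
        # No hack shares this serial: the serial alone is enough.
        return (candidates[0] if candidates else None), None
    crc = compute_crc()  # expensive: hashes the whole image
    for e in candidates:
        if e.crc32 == crc:
            return e, crc
    # No hack matched; fall back to the unhacked entry, if any.
    base = next((e for e in candidates if e.crc32 is None), None)
    return base, crc


# Hypothetical database contents.
db = [
    Entry("Game (USA)", "T-1234"),
    Entry("Game (USA) [Hack]", "T-1234", 0xDEADBEEF),
    Entry("Other (USA)", "T-9999"),
]
```

The returned CRC, when there is one, is the value that would be written to the playlist next to the serial so that features like netplay do not have to recompute it.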

@i30817
Contributor Author

i30817 commented Apr 11, 2018

I also have some other questions.

game (
	name "Mega Man - The Wily Wars (Europe) Move-Shoot Hack"
	description "Hacks for 60FPS, SRAM support and NES-like movement/shooting"
	rom ( name "Mega Man - The Wily Wars (move-shoot hack).bin" size 2097152 crc 1d9b9c91 md5 c13a4784a8bce87bb514d6f24a05d2f1 sha1 99cb17c2b141c21b5661049d3811ebf90fde40b2 )
)
  1. In the 'hacks' dats, does the rom name prevent a game from scanning if it's not exactly the same file name ("Mega Man - The Wily Wars (move-shoot hack).bin" in this case) on the user side, or does the crc32 take care of it?
  2. If we are trying to document a CD-based hack, should we enter the rom name as the track that gets patched or as the cue? What if more than one file is patched? I noticed that the names here are 'invented' and quite likely not the same as the users' file names.
  3. Should the name of that track be based on the version of the game that gets patched? I.e. if I patched the USA redump dump, that leads to a different crc32 than patching the PAL (or darkwater etc.) dump, so should the name of the track be used to differentiate?
  4. Can we use clrmamepro comments to document stuff like 'Redump (USA) source', 'requires conversion of track 1 to iso and edit of cue file to set track1 to MODE1/2048', 'requires ECD correction using tool x' etc?
  5. Are translations welcome here or should I not add them to (new) hacks .dats?

RobLoach changed the title from "Noticed that saturn, segacd and dreamcast are still using crc32 not serial scan." to "Saturn, SegaCD and Dreamcast using CRC instead of Serial" on Apr 11, 2018
@RobLoach
Member

libretro-build-database sets the indexes as rom.crc instead of serial. Haven't tested the detection. We can switch it over to rom.serial on two conditions:

  1. RetroArch must be able to pick up the serial from the file
  2. libretro-database has the rom.serial information available

In the 'hacks' dats, does the rom name prevent a game from scanning

No, the rom name is irrelevant. What's important is the rom.crc or rom.serial. It indexes all the values, and then grabs the title from their entries.
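To illustrate the answer above, here is a rough sketch of an index keyed on crc, using the DAT entry quoted earlier in the thread. The regex-based parsing and the function names are assumptions made for this example; it is not the parser that libretro-build-database actually uses.

```python
import re
import zlib

DAT = '''
game (
	name "Mega Man - The Wily Wars (Europe) Move-Shoot Hack"
	description "Hacks for 60FPS, SRAM support and NES-like movement/shooting"
	rom ( name "Mega Man - The Wily Wars (move-shoot hack).bin" size 2097152 crc 1d9b9c91 md5 c13a4784a8bce87bb514d6f24a05d2f1 sha1 99cb17c2b141c21b5661049d3811ebf90fde40b2 )
)
'''


def build_crc_index(dat_text):
    """Map crc32 -> game title; the rom file name is never used as a key."""
    index = {}
    for block in dat_text.split("game (")[1:]:
        title = re.search(r'^\s*name "([^"]*)"', block, re.M)
        crc = re.search(r'\bcrc ([0-9a-fA-F]{8})\b', block)
        if title and crc:
            index[int(crc.group(1), 16)] = title.group(1)
    return index


def scan(file_bytes, index):
    """Identify a file purely by the crc32 of its contents."""
    return index.get(zlib.crc32(file_bytes))
```

Because `scan` never looks at the file name, a user can call the file anything and still get the title from the matching entry.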

we are trying to document a cd based hack, we should enter the rom name as the track that gets patched or the cue?

Doesn't really matter, as with CD-based media, RetroArch will attempt to find the serial info no matter what.

Can we use clrmamepro comments to document stuff

I don't think that's a problem. We'll find out when rebuilding the database if it parses it correctly.

Are translations welcome here or should I not add them to (new) hacks .dats?

Most certainly! Just move 'em into Hacks directory and make sure to have clear titles and descriptions.

@i30817
Contributor Author

i30817 commented Apr 11, 2018

What about if RA finds a serial and then gives up on parsing to find the more specific hacked game crc?

Because that makes me think it's actually worthless to do a hack dat for systems which are using serial currently if the first thing they find is the serial and then give up on calculating the CRC and thus miss the hacks. Am I wrong about this?

It would be nice if RA was smart about this and searched the hacks rdb first, using the serial for a fast but fallible match and then the crc for an exact match if the serial is found there, and only fell back to the main rdb after that fails. That way hacks would always get found first and the normal games could all use serials.

@RobLoach
Member

What about if RA finds a serial and then gives up on parsing to find the more specific hacked game crc?

It does not currently fall back to CRC scanning when serial fails. There are a few issues in the RA queue about that... libretro/RetroArch#2033 is one, there are others.

@RobLoach
Member

RobLoach commented Aug 29, 2018

Need two things to make serial scanning work....

  1. Serial data
  2. Have RetroArch able to read the serial information correctly

We don't have either of those in the platforms mentioned above.

@i30817
Contributor Author

i30817 commented Aug 29, 2018

If there was a reliable way to get the serial data I could add it for the few CD hacks I have on those systems. However, that info is not enough by itself, because hacks would have the same serial as the base game.

My dumpid utility script at https://gist.github.com/96674a3de8d9e4cb890e92cec3f36990

works (on linux only) for ps1 and ps2 hacks (barring a single misprint on the original disc, which retroarch also gets wrong and which I worked around). The ps1 and ps2 serials are easy to find because they're the executable filename (early ps1 has them on the cd label).

( this section is the hack:


            #Urban Chaos has the wrong serial name in the executable (the one from threads of fate)
            #their label is different though
            if serial == 'SLUS-01019' and serial == iso.pvd.volume_identifier.strip():
                serial = 'SLUS-01091' #replace by serial in redump

)

This way I could at least add a serial to the hacks' 'info', but it would have to be treated as 'special' by Retroarch, because the hacks would introduce duplicate serials into the database. This could be used to recognize when a further crc check is needed though (the idea is duplicate serials == ambiguity). For this not to sink performance too much, the database would have to be pre-sorted by serial (so duplicates are fast to find), if that even matters.
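As a rough sketch of the "duplicate serials == ambiguity" check on a database pre-sorted by serial, two binary searches are enough to count how many entries share a serial. The function name and the sample list are hypothetical, and a real rdb would hold full entries rather than bare strings.

```python
import bisect


def needs_crc_check(sorted_serials, serial):
    """True when more than one entry in the sorted list shares `serial`."""
    lo = bisect.bisect_left(sorted_serials, serial)
    hi = bisect.bisect_right(sorted_serials, serial)
    return hi - lo > 1


# Hypothetical sorted serial column; the repeated serial stands for a
# base game plus one hack of it.
serials = ["SLUS-00001", "SLUS-01091", "SLUS-01091", "SLUS-02000"]
```

Only serials for which this returns true would need the slow crc32 pass; everything else can be identified by serial alone.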

The disadvantage is that games that have CD hacks would always need to go down to crc32 checking in the scanner because of the ambiguity involved, which would slow those down immensely in the case where cue or gdi files give bad results (I still can't believe TOSEC's incompetence in having non-unique gdi files).

Even if this works though, the serials for the Sega consoles are especially poor quality, because the header format appears to have many different versions, to the point that even redump has a lot of trash 'serials' in its database that are obviously broken. I don't know if these were entered by hand or by script, but reproducing them by script looks impossible.

What you can do, and what I did in the script, is to crc the whole header (which is on redump too).

If we could scrape the header from the redump pages, we could use the fact that the whole header is a fixed size at a reliable position. That gives a unique id that is just as good as a serial but much more reliable to automate (except that it makes it harder to compare against foreign databases).
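A minimal sketch of the header-checksum id, assuming the header sits at the very start of the data track's user data and has a fixed length. The 256-byte `HEADER_SIZE` is a placeholder assumption, not a documented value for any of these systems, and `header_id` is a hypothetical name.

```python
import zlib

HEADER_SIZE = 256  # assumption: the real header length depends on the system


def header_id(user_data: bytes, size: int = HEADER_SIZE) -> str:
    """Return the crc32 of the first `size` bytes as 8 hex digits."""
    header = user_data[:size]
    if len(header) < size:
        raise ValueError("image too short to contain a full header")
    return format(zlib.crc32(header), "08x")
```

Anything after the header does not affect the id, which is what makes it cheap to compute, and also why a hack that leaves the header untouched would still collide with its base game.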

This idea might have a problem for the SegaCD though, since SEGA decided at the time to make these games have both a 'segacd' header and a following 'genesis' header (I don't know the reason), and redump only saves this second 'genesis' header, if I remember correctly (I might not). Since no one here has a complete set to generate new ids from, the new 'id' would have to be derived from the redump data, and that may cause duplicates between the segacd and genesis versions of the same game. But then again, for genesis the scanner (and everyone else apparently) just crc32s the whole rom file, not the header, so it's irrelevant if the header is the same...

@i30817
Contributor Author

i30817 commented Aug 29, 2018

An interesting alternative to crc32ing whole gigabytes of files is to salt the cue file hash or game serial with the 'canonical' filename.

This would make users who rename files mad. But if you use cue files for hashcodes, you're already depending on canonical filenames, because cue files reference the 'real' files by filename, and changing those names would change the crc32 and ruin the redump match. So it shouldn't be a big deal to make a cue file match the redump filename it's supposed to have, and to make a cue file for a hack match another descriptive filename from a dat.

With a 'romhack' dat and a utility to rename the mismatched files, retroarch could use serials+filename for everything on CDs. It's a bit of a silly idea to force a standard, but it doesn't depend on Retroarch calculating checksums (the romhacks dat file would have to rename only cue, gdi and iso files though, i.e. 'index files' only, so as not to interfere with the names inside the cue and gdi files).

edit: I explained this idea further in the issue you linked. filename+cue crc is also an alternative but probably a worse one (because many iso dumps don't have a small index cue file to crc unless you make one automatically... which is.... possible I guess)
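The serial+filename idea could look something like the sketch below: hash the serial together with the index file's base name, so a hack with a distinct canonical filename gets a distinct id without reading the disc image. The function name, the NUL separator and the sample names are assumptions for illustration only.

```python
import zlib
from pathlib import PurePath


def salted_id(serial: str, path: str) -> str:
    """crc32 over serial + index file name (directory part ignored)."""
    name = PurePath(path).name
    key = (serial + "\0" + name).encode("utf-8")
    return format(zlib.crc32(key), "08x")
```

For example, `salted_id("T-75014", "roms/Game (USA).cue")` and `salted_id("T-75014", "roms/Game (USA) [Hack].cue")` hash different keys and so give separate ids, while moving either file to another directory leaves its id unchanged.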

@pkos
Contributor

pkos commented Mar 31, 2020

This PR updates the scanner to detect Redump serials from the discs. Rob has updated the databases with serials. If you wish, please clone this PR and test your issue again.
