Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Database) Serial scanning for Gamecube/MegaCD/SegaCD/Saturn/PSX/PSP/Dreamcast/Wii #10291

Closed
wants to merge 3 commits into from

Conversation

pkos
Copy link
Contributor

@pkos pkos commented Mar 15, 2020

Guidelines

  1. Rebase before opening a pull request
  2. If you are sending several unrelated fixes or features, use a branch and a separate pull request for each
  3. If possible try squashing everything in a single commit. This is particularly beneficial in the case of feature merges since it allows easy bisecting when a problem arises

Description

These updates improve support for "Directory Scan". While previously the scan was made against raw disc serials, this code converts and parses those serials into redump.org curated serials.

Logs and playlists now include these Redump serials.

Multi-disc support.

CRC checks are still enabled and disabling them may increase performance.

Bad discs are scrubbed and logged.

Related Issues

@RobLoach has updated the libretro-databases to include serials (3/17/20).
libretro/libretro-database#1026

@RobLoach has updated some string parsing to redump codes. This added support for Playstation - Portable as confirmed no changes to the existing psp detect serial function was needed.
RobLoach/libretro-dats#37

@RobLoach has updated muti-disc appendices (3/28/20). This update added support for multi-disc scanning.
libretro/libretro-database#1032

This PR code will close the following issues:
#10282
libretro/libretro-database#990
libretro/libretro-database#839
libretro/libretro-database#813
libretro/libretro-database#613
libretro/libretro-database#423

Currently, scanner will not detect .gcz, .cso files, but it will crc them.
#7428

This PR code replaces changes already merged here:
Reworked magic numbers.
#6544
#6555
#5239
Serials are in playlists.
#6548

Related Pull Requests

The following libchdr issue will crash the scanner.
#10368
Issue resolved (3/31/20)
#10376

Reviewers

@realnc
@celerizer
@Alcaro
@panasonic
@phcoder

They have added support in #programming, fixing bugs and providing C code.

tasks/task_database_cue.c Outdated Show resolved Hide resolved
@RobLoach
Copy link
Member

Excited to see scanning improve here. Nicely done! Is there something in redump dats that needs to change?

@pkos
Copy link
Contributor Author

pkos commented Mar 15, 2020

Excited to see scanning improve here. Nicely done! Is there something in redump dats that needs to change?

Thank you, I just updated this PR with truly real c code... yay.

@pkos
Copy link
Contributor Author

pkos commented Mar 15, 2020

Excited to see scanning improve here. Nicely done! Is there something in redump dats that needs to change?

Thank you, I just updated this PR with truly real c code... yay.

The redump dats are fine. The game_id variable (serial) is set in this code to match the redump dats serials. If anything, maybe an update to the gamecube rdb would be nice.

@pkos
Copy link
Contributor Author

pkos commented Mar 15, 2020

Excited to see scanning improve here. Nicely done! Is there something in redump dats that needs to change?

Thank you, I just updated this PR with truly real c code... yay.

The redump dats are fine. The game_id variable (serial) is set in this code to match the redump dats serials. If anything, maybe an update to the gamecube rdb would be nice.

@realnc has helped me to properly use C code formatting.. i think its ready

@realnc
Copy link
Contributor

realnc commented Mar 15, 2020

You still have errors. I assumed you actually compiled the code prior to pushing it:

return rue;

You should compile and test your code before submitting.

tasks/task_database_cue.c Outdated Show resolved Hide resolved
tasks/task_database_cue.c Outdated Show resolved Hide resolved
tasks/task_database_cue.c Outdated Show resolved Hide resolved
@pkos
Copy link
Contributor Author

pkos commented Mar 16, 2020

You still have errors. I assumed you actually compiled the code prior to pushing it:

return rue;

You should compile and test your code before submitting.

fixed and just recently compiled on windows

retroarch.cfg Outdated Show resolved Hide resolved
@RobLoach RobLoach changed the title (database task) converts raw gamecube serial to redump serial (Database) Converts raw GameCube serial to Redump serial Mar 16, 2020
@pkos pkos changed the title (Database) Converts raw GameCube serial to Redump serial (database task) converts raw serial to redump.org serial, support for Nintendo - Gamecube and Sega - Mega-CD - Sega CD Mar 18, 2020
@pkos
Copy link
Contributor Author

pkos commented Mar 18, 2020

You still have errors. I assumed you actually compiled the code prior to pushing it:

return rue;

You should compile and test your code before submitting.

Definately, the code compiles with make C89_BUILD=1.. more than that its actually works with the latest rdbs from Rob.

@pkos pkos changed the title (database task) converts raw serial to redump.org serial, support for Nintendo - Gamecube and Sega - Mega-CD - Sega CD (database task) converts raw serial to redump.org serial, support for Nintendo - Gamecube and Sega - Mega-CD - Sega CD. Improved detect system. Mar 18, 2020
@RobLoach RobLoach changed the title (database task) converts raw serial to redump.org serial, support for Nintendo - Gamecube and Sega - Mega-CD - Sega CD. Improved detect system. (Database) Scan Redump serials for Gamecube/MegaCD/SegaCD Mar 19, 2020
@pkos pkos changed the title (Database) Scan Redump serials for Gamecube/MegaCD/SegaCD (Database) Scan Redump serials for Gamecube/MegaCD/SegaCD/Saturn Mar 19, 2020
@lgtm-com
Copy link

lgtm-com bot commented Mar 20, 2020

This pull request introduces 1 alert when merging 47444b7 into 532fd88 - view on LGTM.com

new alerts:

  • 1 for Missing return statement

@pkos pkos changed the title (Database) Scan Redump serials for Gamecube/MegaCD/SegaCD/Saturn (Database) Scan Redump serials for Gamecube/MegaCD/SegaCD/Saturn/Playstation Portable Mar 20, 2020
@pkos pkos changed the title (Database) Scan Redump serials for Gamecube/MegaCD/SegaCD/Saturn/Playstation Portable (Database) Scan Redump serials for Gamecube/MegaCD/SegaCD/Saturn/PlayStation Portable/Dreamcast Mar 21, 2020
@lgtm-com
Copy link

lgtm-com bot commented Mar 21, 2020

This pull request introduces 3 alerts when merging 028b2e5 into f48c668 - view on LGTM.com

new alerts:

  • 3 for Comparison result is always the same

Makefile.psp1 Outdated
@@ -25,7 +25,7 @@ INCDIR = deps deps/stb deps/7zip deps/pthreads deps/pthreads/platform/psp deps/p
CFLAGS = $(OPTIMIZE_LV) -G0 -std=gnu99 -ffast-math -fsingle-precision-constant
ASFLAGS = $(CFLAGS)

RARCH_DEFINES = -DPSP -D_MIPS_ARCH_ALLEGREX -DHAVE_LANGEXTRA -DHAVE_ZLIB -DHAVE_RPNG -DHAVE_RJPEG -DHAVE_GRIFFIN=1 -DRARCH_INTERNAL -DRARCH_CONSOLE -DHAVE_MENU -DHAVE_CONFIGFILE -DHAVE_RGUI -DHAVE_FILTERS_BUILTIN -DHAVE_7ZIP -DHAVE_CC_RESAMPLER
RARCH_DEFINES = -DPSP -D_MIPS_ARCH_ALLEGREX -DHAVE_LANGEXTRA -DHAVE_ZLIB -DHAVE_AUDIOMIXER -DHAVE_RPNG -DHAVE_RJPEG -DHAVE_GRIFFIN=1 -DRARCH_INTERNAL -DRARCH_CONSOLE -DHAVE_MENU -DHAVE_CONFIGFILE -DHAVE_RGUI -DHAVE_FILTERS_BUILTIN -DHAVE_7ZIP -DHAVE_CC_RESAMPLER
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like some unrelated changes. Did the rebase mess up again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but i restored the original files

@pkos pkos changed the title (Database) Scan Redump serials for Gamecube/MegaCD/SegaCD/Saturn/PlayStation Portable/Dreamcast [WIP] (Database) Scan Redump serials for Gamecube/MegaCD/SegaCD/Saturn/PlayStation Portable/Dreamcast Mar 22, 2020
@lgtm-com
Copy link

lgtm-com bot commented Mar 22, 2020

This pull request introduces 3 alerts when merging f58fcb2 into f48c668 - view on LGTM.com

new alerts:

  • 3 for Comparison result is always the same

@pkos
Copy link
Contributor Author

pkos commented Mar 22, 2020

This pull request introduces 3 alerts when merging f58fcb2 into f48c668 - view on LGTM.com

new alerts:

  • 3 for Comparison result is always the same

its not always the same, based on raw serial!!!!!!

@pkos pkos changed the title [WIP] (Database) Scan Redump serials for Gamecube/MegaCD/SegaCD/Saturn/PlayStation Portable/Dreamcast [WIP] (Database) Scan Redump serials for Gamecube/MegaCD/SegaCD/Saturn/PlayStation Portable/Dreamcast. Logs and playlists contain Redump serials. Mar 23, 2020
@pkos pkos changed the title [WIP] (Database) Scan Redump serials for Gamecube/MegaCD/SegaCD/Saturn/PlayStation Portable/Dreamcast. Logs and playlists contain Redump serials. (Database) Scan Redump serials for Gamecube/MegaCD/SegaCD/Saturn/PlayStation Portable/Dreamcast. Logs and playlists contain Redump serials. Mar 23, 2020
@lgtm-com
Copy link

lgtm-com bot commented Mar 23, 2020

This pull request introduces 3 alerts when merging f02da6b into 8643cb2 - view on LGTM.com

new alerts:

  • 3 for Comparison result is always the same

@pkos pkos changed the title (Database) Scan Redump serials for Gamecube/MegaCD/SegaCD/Saturn/PlayStation Portable/Dreamcast. Logs and playlists contain Redump serials. (Database) Scan Redump serials for Gamecube/MegaCD/SegaCD/Saturn/PlayStation Portable/Dreamcast/Wii. Logs and playlists contain Redump serials. Mar 23, 2020
@RobLoach
Copy link
Member

@bkoropoff Would you mind giving a review, if possible? I've seen you in the scanning code too.

@i30817
Copy link
Contributor

i30817 commented Mar 31, 2020

Aren't the segacd serials kind of unreliable? I feel like they're probably in JIS enconding or something from how crazy is looking a what appears in redump webpages and what appears if you look at the bytes in a stream of a good dump. If a game appears on both the segacd and the megadrive there are vast parts of the header that are unmodified, so i'm kind of nervous about that too iirc.

For that matter, is the scanner even using encodings when parsing out the serials? Last i checked it scanned the bytestream for a magic 'platform id' then took a number of bytes to search for a 'serial id, so if every redump dat lists utf-8, that is kind of problematic.

And needless to say this PR makes for more platforms where hacks scanning will no longer work due to serial scanning.

@pkos
Copy link
Contributor Author

pkos commented Mar 31, 2020

The serials are pretty reliable. 11 chars (no JIS) at offset 0x0193 or 0x0194.

Bad dumps will be scrubbed by the scanner although that is not logged.

@pkos
Copy link
Contributor Author

pkos commented Mar 31, 2020

Hacks with the same serial as the original will be scanned as original. Hackers should take note.

tasks/task_database_cue.c Outdated Show resolved Hide resolved
tasks/task_database_cue.c Outdated Show resolved Hide resolved
tasks/task_database_cue.c Outdated Show resolved Hide resolved
tasks/task_database_cue.c Outdated Show resolved Hide resolved
tasks/task_database_cue.c Outdated Show resolved Hide resolved
tasks/task_database_cue.c Outdated Show resolved Hide resolved
tasks/task_database_cue.c Outdated Show resolved Hide resolved
tasks/task_database_cue.c Show resolved Hide resolved
tasks/task_database_cue.c Show resolved Hide resolved
tasks/task_database_cue.c Show resolved Hide resolved
@RobLoach
Copy link
Member

@Jamiras Thanks for the review!

Copy link
Contributor Author

@pkos pkos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All changes expect the trim left right function were approved. The code was tested again with the same results as last time.

tasks/task_database_cue.c Outdated Show resolved Hide resolved
tasks/task_database_cue.c Outdated Show resolved Hide resolved
@i30817
Copy link
Contributor

i30817 commented Apr 1, 2020

I have a question about this, does this update touch the thumbnail behavior? The usecase i'm interested in is the thumbnails when there is no serial or crc available.

I think it currently uses the filename, which is the reason the manual scanner even works for thumbnails sometimes as long as the actual name is the same as the database name.

Well i made a m3u script to create them : https://gist.github.com/i30817/ba37fbb2b3c6e34ff926ad833f465055

And my script makes multidisc games look like this first and not the second

Xenogears (USA).m3u

Xenogears (USA) (Disc 1).m3u

this obviously misses the game thumbnail, though it's nicer. So i'm asking if there is any attempt to parse the serial of the first disc or something like that in m3u files thumbnails download if the name can't be used? Just to know if i'm going to have to have a compatibility switch on the tool to use that second form on m3u filenames, even if it's ugly.

@pkos
Copy link
Contributor Author

pkos commented Apr 1, 2020

I have a question about this, does this update touch the thumbnail behavior? The usecase i'm interested in is the thumbnails when there is no serial or crc available.

When serials are available they will be matched to the database entry, this will effect the thumbnails as they are matched to the database entries. No serial will generate a crc, if the crc is in the database it will effect thumbnails. No crc, well I don't know if a case where no crc is generated.

@Sanaki
Copy link
Contributor

Sanaki commented Apr 1, 2020

Thumbnails have nothing to do with this, nor should they. If a thumbnail exists for a standard m3u name, it's used. Some already are in the thumbnail repo, such as Chrono Cross (USA). The correct solution would be to add more thumbnails to match standardized names like that.

@i30817
Copy link
Contributor

i30817 commented Apr 1, 2020

No crc, well I don't know if a case where no crc is generated.

The manual scanner - it matches the filename to a rom-name on the database for the playlist system, and in that case the label is the filename minus extension - though i think @sanaka is correct, this would be cleaner fixed in the database, even if only as a symlink to the cover of 'disc 1' or something. The names without the 'disc' could be autogenerated from the redump cues collection, so maybe it's a good idea to add a issue on libretro-database.

@pkos
Copy link
Contributor Author

pkos commented Apr 1, 2020

The serials are pretty reliable. 11 chars (no JIS) at offset 0x0193 or 0x0194.

Bad dumps will be scrubbed by the scanner although that is not logged.

Scrubbed discs are now logged.

@pkos
Copy link
Contributor Author

pkos commented Apr 1, 2020

No crc, well I don't know if a case where no crc is generated.

The manual scanner - it matches the label to a rom-name on the database for the playlist system, and in that case the label is the filename minus extension - though i think @sanaka is correct, this would be cleaner fixed in the database, even if only as a symlink to the cover of 'disc 1' or something. The names without the 'disc' could be autogenerated from the redump cues collection, so maybe it's a good idea to add a issue on libretro-database.

I agree, Sega/Mega CD can be cleaned up a little in the database.. the Sega/Mega CD dat file has a lot of duplicate serials for games from different regions. Rob and I should be able to handle different Mega/Sega CD regions in the code and database.

These changes will not affect the manual scanner, only "Scan Directory" and "Scan File" as with manual a database check is not made.

…Station/PlayStation Portable/Dreamcast/Wii. Multi-disc support. Logs and playlists contain Redump serials.
@RobLoach
Copy link
Member

RobLoach commented Jun 8, 2020

@pkos Looks like a git conflict. Fixed it up over at https://github.com/pkos/RetroArch/pull/4

@inactive123
Copy link
Contributor

Hmm, seems there is a merge conflict that has to be resolved first.

After the merge conflict is resolved, I can ask jdgleaver to review this.

@RobLoach
Copy link
Member

RobLoach commented Jul 2, 2020

Might have missed a few things in the conflicts... Would appreciate a review. https://github.com/pkos/RetroArch/pull/5

@inactive123
Copy link
Contributor

Hi guys, I am still very interested in this PR, but it hasn't seem to have gotten any traction in the past couple of months. Any initiative to get the remaining issues sorted?

@pkos
Copy link
Contributor Author

pkos commented Dec 19, 2020

Hi guys, I am still very interested in this PR, but it hasn't seem to have gotten any traction in the past couple of months. Any initiative to get the remaining issues sorted?

Currently, the serial scanner is complete for the above systems, due to limitations in some digital media not all games will have a serial. I'd say the serial scanner is a valid way to quickly scan games, however, it can be further improved in two ways.

First, the addition of "fuzzy" matching using the Levenshtein Distance could improve identification of games with the same serial but are actually considered different by Redump (ie. RE1, Beta, etc.). In addition, after updating the RA code, there would need to be a simultaneous update to the RA databases to include these different games.

Second, a vast and complete identification of derivatives of the original Redump images is underway and slowly growing with support currently for SNK Neo Geo CD, NEC - PC - FX, and Panasonic 3DO. The datafiles for these CRC32, internal SHA1, and MD5 are available here (https://github.com/pkos/CRoSG). With these files added to the RA databases, it is possible to scan by crc32 the CHDs, and with updated RA code it would be improved to internal SHA1, the fastest and more versatile way to identify CHDs.

My final thought is that the serial scanner alone should be working as complete as it is possible to make it and it will suffer some limitations to the number of game it will identify alone.

@pkos
Copy link
Contributor Author

pkos commented Dec 19, 2020

Also, I've forked RA with the latest code updates here (https://github.com/pkos/RetroArch).

The appropriate pull request is here (#11719).

I've fixed all merge issues today.

@i30817
Copy link
Contributor

i30817 commented Dec 19, 2020

Levenshtein Distance is not a good idea to id gamecube games. Changing a letter in the first position is much more meaningful that changing many letters in the last and levenshtein is geared to pick up mistakes which are independent of position or to identify languages (with a weird 'average levensthtein distance' thing that works most of the time surprisingly).

Honestly it should be hardcoded, and if you require a 'group' for the variants for reasons and it's not predictable from the Nintendo standard, just group them on a fileformat somewhere.

I'd also mention that 'games hacks being id'ed as the original game' is a very common problem on other platforms that use this kind of flawed id, but it's admittedly not much of a problem on gamecube/wii because there are few hacks (some translations but it doesn't affect the settings needing to be different most times).

edit: rereading the post it appears you don't want to use the algorithm on the serial? Why would you want fuzziness then?

@pkos
Copy link
Contributor Author

pkos commented Dec 19, 2020

Levenshtein Distance is not a good idea to id gamecube games. Changing a letter in the first position is much more meaningful that changing many letters in the last and levenshtein is geared to pick up mistakes which are independent of position or to identify languages (with a weird 'average levensthtein distance' thing that works most of the time surprisingly).

Honestly it should be hardcoded, and if you require a 'group' for the variants for reasons and it's not predictable from the Nintendo standard, just group them on a fileformat somewhere.

I'd also mention that 'games hacks being id'ed as the original game' is a very common problem on other platforms that use this kind of flawed id, but it's admittedly not much of a problem on gamecube/wii because there are few hacks (some translations but it doesn't affect the settings needing to be different most times).

edit: rereading the post it appears you don't want to use the algorithm on the serial? Why would you want fuzziness then?

I think you've confused the scope of the LD with actually matching by name, this is not the case, it will only be done by serial or CRC32. Same serials however would benefit from LD (limited scope).

@RobLoach
Copy link
Member

The latest changes are over in this Pull Request: #11719

@RobLoach RobLoach closed this Jun 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants