New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(Database) Serial scanning for Gamecube/MegaCD/SegaCD/Saturn/PSX/PSP/Dreamcast/Wii #10291
Conversation
Excited to see scanning improve here. Nicely done! Is there something in redump dats that needs to change? |
Thank you, I just updated this PR with truly real c code... yay. |
The redump dats are fine. The game_id variable (serial) is set in this code to match the redump dats serials. If anything, maybe an update to the gamecube rdb would be nice. |
@realnc has helped me to properly use C code formatting.. i think its ready |
You still have errors. I assumed you actually compiled the code prior to pushing it:
You should compile and test your code before submitting. |
fixed and just recently compiled on windows |
Definately, the code compiles with make C89_BUILD=1.. more than that its actually works with the latest rdbs from Rob. |
This pull request introduces 1 alert when merging 47444b7 into 532fd88 - view on LGTM.com new alerts:
|
This pull request introduces 3 alerts when merging 028b2e5 into f48c668 - view on LGTM.com new alerts:
|
Makefile.psp1
Outdated
@@ -25,7 +25,7 @@ INCDIR = deps deps/stb deps/7zip deps/pthreads deps/pthreads/platform/psp deps/p | |||
CFLAGS = $(OPTIMIZE_LV) -G0 -std=gnu99 -ffast-math -fsingle-precision-constant | |||
ASFLAGS = $(CFLAGS) | |||
|
|||
RARCH_DEFINES = -DPSP -D_MIPS_ARCH_ALLEGREX -DHAVE_LANGEXTRA -DHAVE_ZLIB -DHAVE_RPNG -DHAVE_RJPEG -DHAVE_GRIFFIN=1 -DRARCH_INTERNAL -DRARCH_CONSOLE -DHAVE_MENU -DHAVE_CONFIGFILE -DHAVE_RGUI -DHAVE_FILTERS_BUILTIN -DHAVE_7ZIP -DHAVE_CC_RESAMPLER | |||
RARCH_DEFINES = -DPSP -D_MIPS_ARCH_ALLEGREX -DHAVE_LANGEXTRA -DHAVE_ZLIB -DHAVE_AUDIOMIXER -DHAVE_RPNG -DHAVE_RJPEG -DHAVE_GRIFFIN=1 -DRARCH_INTERNAL -DRARCH_CONSOLE -DHAVE_MENU -DHAVE_CONFIGFILE -DHAVE_RGUI -DHAVE_FILTERS_BUILTIN -DHAVE_7ZIP -DHAVE_CC_RESAMPLER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like some unrelated changes. Did the rebase mess up again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but i restored the original files
This pull request introduces 3 alerts when merging f58fcb2 into f48c668 - view on LGTM.com new alerts:
|
its not always the same, based on raw serial!!!!!! |
This pull request introduces 3 alerts when merging f02da6b into 8643cb2 - view on LGTM.com new alerts:
|
@bkoropoff Would you mind giving a review, if possible? I've seen you in the scanning code too. |
Aren't the segacd serials kind of unreliable? I feel like they're probably in JIS enconding or something from how crazy is looking a what appears in redump webpages and what appears if you look at the bytes in a stream of a good dump. If a game appears on both the segacd and the megadrive there are vast parts of the header that are unmodified, so i'm kind of nervous about that too iirc. For that matter, is the scanner even using encodings when parsing out the serials? Last i checked it scanned the bytestream for a magic 'platform id' then took a number of bytes to search for a 'serial id, so if every redump dat lists utf-8, that is kind of problematic. And needless to say this PR makes for more platforms where hacks scanning will no longer work due to serial scanning. |
The serials are pretty reliable. 11 chars (no JIS) at offset 0x0193 or 0x0194. Bad dumps will be scrubbed by the scanner although that is not logged. |
Hacks with the same serial as the original will be scanned as original. Hackers should take note. |
@Jamiras Thanks for the review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All changes expect the trim left right function were approved. The code was tested again with the same results as last time.
I have a question about this, does this update touch the thumbnail behavior? The usecase i'm interested in is the thumbnails when there is no serial or crc available. I think it currently uses the filename, which is the reason the manual scanner even works for thumbnails sometimes as long as the actual name is the same as the database name. Well i made a m3u script to create them : https://gist.github.com/i30817/ba37fbb2b3c6e34ff926ad833f465055 And my script makes multidisc games look like this first and not the second
this obviously misses the game thumbnail, though it's nicer. So i'm asking if there is any attempt to parse the serial of the first disc or something like that in m3u files thumbnails download if the name can't be used? Just to know if i'm going to have to have a compatibility switch on the tool to use that second form on m3u filenames, even if it's ugly. |
When serials are available they will be matched to the database entry, this will effect the thumbnails as they are matched to the database entries. No serial will generate a crc, if the crc is in the database it will effect thumbnails. No crc, well I don't know if a case where no crc is generated. |
Thumbnails have nothing to do with this, nor should they. If a thumbnail exists for a standard m3u name, it's used. Some already are in the thumbnail repo, such as |
The manual scanner - it matches the filename to a rom-name on the database for the playlist system, and in that case the label is the filename minus extension - though i think @sanaka is correct, this would be cleaner fixed in the database, even if only as a symlink to the cover of 'disc 1' or something. The names without the 'disc' could be autogenerated from the redump cues collection, so maybe it's a good idea to add a issue on libretro-database. |
Scrubbed discs are now logged. |
I agree, Sega/Mega CD can be cleaned up a little in the database.. the Sega/Mega CD dat file has a lot of duplicate serials for games from different regions. Rob and I should be able to handle different Mega/Sega CD regions in the code and database. These changes will not affect the manual scanner, only "Scan Directory" and "Scan File" as with manual a database check is not made. |
…Station/PlayStation Portable/Dreamcast/Wii. Multi-disc support. Logs and playlists contain Redump serials.
@pkos Looks like a git conflict. Fixed it up over at https://github.com/pkos/RetroArch/pull/4 |
Hmm, seems there is a merge conflict that has to be resolved first. After the merge conflict is resolved, I can ask jdgleaver to review this. |
Might have missed a few things in the conflicts... Would appreciate a review. https://github.com/pkos/RetroArch/pull/5 |
Hi guys, I am still very interested in this PR, but it hasn't seem to have gotten any traction in the past couple of months. Any initiative to get the remaining issues sorted? |
Currently, the serial scanner is complete for the above systems, due to limitations in some digital media not all games will have a serial. I'd say the serial scanner is a valid way to quickly scan games, however, it can be further improved in two ways. First, the addition of "fuzzy" matching using the Levenshtein Distance could improve identification of games with the same serial but are actually considered different by Redump (ie. RE1, Beta, etc.). In addition, after updating the RA code, there would need to be a simultaneous update to the RA databases to include these different games. Second, a vast and complete identification of derivatives of the original Redump images is underway and slowly growing with support currently for SNK Neo Geo CD, NEC - PC - FX, and Panasonic 3DO. The datafiles for these CRC32, internal SHA1, and MD5 are available here (https://github.com/pkos/CRoSG). With these files added to the RA databases, it is possible to scan by crc32 the CHDs, and with updated RA code it would be improved to internal SHA1, the fastest and more versatile way to identify CHDs. My final thought is that the serial scanner alone should be working as complete as it is possible to make it and it will suffer some limitations to the number of game it will identify alone. |
Also, I've forked RA with the latest code updates here (https://github.com/pkos/RetroArch). The appropriate pull request is here (#11719). I've fixed all merge issues today. |
Levenshtein Distance is not a good idea to id gamecube games. Changing a letter in the first position is much more meaningful that changing many letters in the last and levenshtein is geared to pick up mistakes which are independent of position or to identify languages (with a weird 'average levensthtein distance' thing that works most of the time surprisingly). Honestly it should be hardcoded, and if you require a 'group' for the variants for reasons and it's not predictable from the Nintendo standard, just group them on a fileformat somewhere. I'd also mention that 'games hacks being id'ed as the original game' is a very common problem on other platforms that use this kind of flawed id, but it's admittedly not much of a problem on gamecube/wii because there are few hacks (some translations but it doesn't affect the settings needing to be different most times). edit: rereading the post it appears you don't want to use the algorithm on the serial? Why would you want fuzziness then? |
I think you've confused the scope of the LD with actually matching by name, this is not the case, it will only be done by serial or CRC32. Same serials however would benefit from LD (limited scope). |
The latest changes are over in this Pull Request: #11719 |
Guidelines
Description
These updates improve support for "Directory Scan". While previously the scan was made against raw disc serials, this code converts and parses those serials into redump.org curated serials.
Logs and playlists now include these Redump serials.
Multi-disc support.
CRC checks are still enabled and disabling them may increase performance.
Bad discs are scrubbed and logged.
Related Issues
@RobLoach has updated the libretro-databases to include serials (3/17/20).
libretro/libretro-database#1026
@RobLoach has updated some string parsing to redump codes. This added support for Playstation - Portable as confirmed no changes to the existing psp detect serial function was needed.
RobLoach/libretro-dats#37
@RobLoach has updated muti-disc appendices (3/28/20). This update added support for multi-disc scanning.
libretro/libretro-database#1032
This PR code will close the following issues:
#10282
libretro/libretro-database#990
libretro/libretro-database#839
libretro/libretro-database#813
libretro/libretro-database#613
libretro/libretro-database#423
Currently, scanner will not detect .gcz, .cso files, but it will crc them.
#7428
This PR code replaces changes already merged here:
Reworked magic numbers.
#6544
#6555
#5239
Serials are in playlists.
#6548
Related Pull Requests
The following libchdr issue will crash the scanner.
#10368
Issue resolved (3/31/20)
#10376
Reviewers
@realnc
@celerizer
@Alcaro
@panasonic
@phcoder
They have added support in #programming, fixing bugs and providing C code.