RFE: Reliable data driven alternative to scanner heuristics #8672
If this feature gets implemented in its simplest form (filenames with simple globbing expressions), it could be extended to dissociate the 'launcher' file from the actual metadata key in the database, so that images and other metadata keep working. Examples:

Dosbox.detect: for any dosbox.conf file in and below the .detect file's directory, use it as the launcher file, get the metadata key by looking up the CRC32 of the executables found under the 'game' subdirectory in the dosbox database, and place the entry on the 'Dosbox' playlist (the playlist name comes from the name of the .detect file).

Sony - Playstation.detect: for any .cue file in and below the .detect file's directory, use the cue file as the launcher file and get the metadata key from the CRC32 of the '.bin' files in the same directory as the cue.

Sony - Playstation.detect, alternative: as above, but you 'know' that you only have Redump dumps, so you can afford to checksum only the game's executable track to identify the game.

NEC - Turbografx-16.detect: TurboGrafx CDs need the second track for identification, because that is where the game actually is.

You probably want to mandate a platform-independent directory separator in these files so they stay portable, though I don't know whether playlists already do that. Anyway, as you can see, a fully implemented version of this would allow power users to
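The per-track identification in these examples (the '.bin' files next to a PS1 cue, the second track of a TurboGrafx-16 CD) needs the scanner to find which file a cue sheet points at. A minimal C sketch, assuming Redump-style cue sheets with one quoted `FILE` entry per track; `cue_nth_file` is a hypothetical helper, not an existing RetroArch function:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the file name of the n-th (1-based) FILE entry
 * of a cue sheet (already loaded as text) into out.
 * Assumes quoted names, e.g.  FILE "Game (Track 2).bin" BINARY
 * Returns 1 on success, 0 if the entry is missing or out is too small. */
int cue_nth_file(const char *cue, int n, char *out, size_t out_size)
{
    int seen = 0;
    const char *p = cue;
    while ((p = strstr(p, "FILE \"")) != NULL) {
        const char *start, *end;
        /* only accept FILE at the start of a line or after indentation */
        if (p != cue && p[-1] != '\n' && p[-1] != ' ' && p[-1] != '\t') {
            p++;
            continue;
        }
        start = p + 6;                 /* skip: FILE, space, opening quote */
        end = strchr(start, '"');      /* closing quote */
        if (end == NULL)
            return 0;                  /* malformed entry */
        if (++seen == n) {
            size_t len = (size_t)(end - start);
            if (len + 1 > out_size)
                return 0;              /* output buffer too small */
            memcpy(out, start, len);
            out[len] = '\0';
            return 1;
        }
        p = end + 1;
    }
    return 0;
}
```

With n = 2, a TurboGrafx-16 rule could checksum the second track. A real parser would also need to handle unquoted names and paths relative to the cue.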
I specified
Though I understand why RA devs might be hesitant to use this idea if regex/path globbing libraries are not really portable to consoles.
Of the examples you mentioned, it seems to me that those would make sense to ship as the default behavior anyway. That is, I think it would make sense to put those 'detect' files alongside the databases themselves, rather than in the content directories. Then we could track and ship sensible defaults, and users would have a central location where they could fine-tune the behavior.
I (low-key) agree that the default behavior could be implemented much more consistently with a setup like this, but I suspect it would be slower (because several types of scan would run if a .detect file has more than one line, and because of the inherent cost of globbing).

Anyway, the idea is to make it usable in the GUI too, as an easy way to apply some default rules ('scan for redump ps1', for instance), or to let the user configure a certain directory and its descendants if they have a particularly weird case (like the dosbox conf launcher idea). So it needs to be programmed with care to accommodate both 'ignore the filesystem and only use this rule' and 'allow rule overrides in the filesystem'. For instance, I have translations for some consoles that no longer follow the Redump standard (they need to be converted to ISO, so the default rule wouldn't work), but I also don't want to pay the performance cost or deal with the 'hierarchy problems' of having two or more applicable entry points for a game. Thus the top level would have the 'redump' rule, and that particular ISO translation would have an 'iso:CRC32' rule.

I hadn't thought as much about replacing the normal scan with this framework as about 'configuring' scanning options, but now that I think about it, that would probably be for the best, because the effort of decoupling the 'scan methods' (CRC32, serial, etc.) from the scanner itself, so they can be used here, would probably churn the 'defaults' anyway.
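One way to picture the decoupled 'scan methods' is a rule such as 'iso:CRC32' that names a launcher extension and the method used to compute the metadata key. The sketch below assumes that `ext:METHOD` syntax and only two methods; `scan_rule`, `scan_method` and `scan_rule_parse` are hypothetical names, not part of RetroArch:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scan methods, decoupled from the scanner itself. */
enum scan_method { SCAN_METHOD_INVALID = 0, SCAN_METHOD_CRC32, SCAN_METHOD_SERIAL };

struct scan_rule {
    char ext[16];             /* launcher extension, e.g. "iso" */
    enum scan_method method;  /* how the metadata key is computed */
};

/* Parse a hypothetical "ext:METHOD" rule such as "iso:CRC32".
 * Returns 1 on success, 0 on a malformed rule or an unknown method. */
int scan_rule_parse(const char *text, struct scan_rule *rule)
{
    const char *colon = strchr(text, ':');
    size_t ext_len;
    if (colon == NULL)
        return 0;
    ext_len = (size_t)(colon - text);
    if (ext_len == 0 || ext_len >= sizeof rule->ext)
        return 0;
    memcpy(rule->ext, text, ext_len);
    rule->ext[ext_len] = '\0';
    if (strcmp(colon + 1, "CRC32") == 0)
        rule->method = SCAN_METHOD_CRC32;
    else if (strcmp(colon + 1, "SERIAL") == 0)
        rule->method = SCAN_METHOD_SERIAL;
    else
        return 0;
    return 1;
}
```

Keeping the method as a plain enum means each method can live in its own function that the scanner merely dispatches to, which is the decoupling described above.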
I'm also not sure whether there is a portable globbing library in C that could be adapted to work with RA on all the platforms it supports.
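If no suitable library turns up, the subset needed for rules like '*.cue' is small enough to write in portable C, with no dependency on POSIX `fnmatch()` or `glob()`. A sketch supporting only `*` and `?` (no character classes, no escaping, case-sensitive); `glob_match` is a hypothetical name:

```c
#include <stddef.h>

/* Minimal wildcard matcher: '*' matches any run of characters (including
 * none), '?' matches exactly one character. Plain C only. */
int glob_match(const char *pattern, const char *text)
{
    const char *star = NULL;   /* last '*' seen in pattern */
    const char *retry = NULL;  /* text position that '*' currently ends at */

    while (*text != '\0') {
        if (*pattern == '*') {
            star = pattern++;   /* first try letting '*' match nothing */
            retry = text;
        } else if (*pattern == '?' || *pattern == *text) {
            pattern++;
            text++;
        } else if (star != NULL) {
            pattern = star + 1; /* backtrack: let '*' absorb one more char */
            text = ++retry;
        } else {
            return 0;           /* mismatch and no '*' to fall back on */
        }
    }
    while (*pattern == '*')     /* trailing stars match the empty rest */
        pattern++;
    return *pattern == '\0';
}
```

Backtracking only to the most recent `*` is sufficient for these two wildcards and bounds the worst case at roughly pattern length times name length.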
A problem this decoupling idea still has (only the 'second part' of this issue; the first part, without the 'entry point' files, would 'just' create normal entries associated with executables, as RA normally does for DOSBox): if RetroArch's libretro-database has two or more entries for a single game, for example a DOS game with an installer and a game executable (you can see that DOS.dat does this here),
This can of course be 'prevented' by
There are also the edited-out, too-complicated ideas; see below for a better idea. As an aside, it should probably also be possible to skip the 'method' entirely and just pick an entry from the database directly by key. For instance
I had a better idea for the above, but it's different from a normal glob, so we simply require that if there is a * on both sides, the expansion of the * has to be the same on both sides. If not, the tuple is discarded and won't be scanned. It will still work for sets by dumping groups, because the expansion of * in the launcher file (*.cue) either appears partially in the signature file name (Redump, TOSEC, etc.), or the signature file name is fixed and this rule doesn't apply (track1.bin, etc.). It might need adjusting to ignore the last space in some presets, e.g.:
The first is intended, and the second is not that bad because it will still be the game's main metadata¹. But as soon as both sides are present, only a conf file that shares part of its name with the executable will be considered, which means the launcher file name will reliably 'associate to the right metadata'. I think. It's a simple idea for users to digest: 'rename the conf file after the executable/signature file it launches to use it as the playlist entry for the game', with more flexibility if you know the format.

¹ If it is positioned last in the database for the set that is returned, because apparently that libretro dat places installers first and the game last. In the DOS.dat case this doesn't appear to matter, because the database name is the same for all signature files of the same game, but it could matter if that weren't the case. Maybe this convention doesn't hold for all the affected .dats, and simply taking the first is easier.
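The 'same expansion on both sides' rule can be checked by extracting what the `*` matched in each name and comparing the two. A minimal sketch, assuming each pattern contains at most one `*`; `pair_matches` and `star_expansion` are hypothetical helpers:

```c
#include <stddef.h>
#include <string.h>

/* If name matches a pattern containing exactly one '*', report the part of
 * name that the '*' expanded to. Returns 1 on a match, 0 otherwise. */
static int star_expansion(const char *pattern, const char *name,
                          const char **exp, size_t *exp_len)
{
    const char *star = strchr(pattern, '*');
    size_t pre = (size_t)(star - pattern);   /* text before the '*' */
    size_t suf = strlen(star + 1);           /* text after the '*' */
    size_t len = strlen(name);
    if (len < pre + suf)
        return 0;
    if (strncmp(name, pattern, pre) != 0 || strcmp(name + len - suf, star + 1) != 0)
        return 0;
    *exp = name + pre;
    *exp_len = len - pre - suf;
    return 1;
}

/* Accept a (launcher, signature) pair only if both names match their
 * patterns and, when BOTH patterns contain '*', both stars expanded to the
 * same text. A pattern without '*' is a fixed name (e.g. "track1.bin"),
 * and then the same-expansion rule does not apply. */
int pair_matches(const char *launcher_pat, const char *launcher,
                 const char *sig_pat, const char *sig)
{
    const char *l_exp = NULL, *s_exp = NULL;
    size_t l_len = 0, s_len = 0;
    int l_star = strchr(launcher_pat, '*') != NULL;
    int s_star = strchr(sig_pat, '*') != NULL;

    if (l_star ? !star_expansion(launcher_pat, launcher, &l_exp, &l_len)
               : strcmp(launcher_pat, launcher) != 0)
        return 0;
    if (s_star ? !star_expansion(sig_pat, sig, &s_exp, &s_len)
               : strcmp(sig_pat, sig) != 0)
        return 0;
    if (l_star && s_star)
        return l_len == s_len && memcmp(l_exp, s_exp, l_len) == 0;
    return 1;
}
```

With '*.conf' / '*.EXE', 'DOOM.conf' pairs with 'DOOM.EXE' but 'setup.conf' does not, which is the 'rename the conf file after the executable' convention; a pattern like '* (Track 1).bin' covers the Redump case where the expansion is only part of the signature file name.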
I was thinking about the 'decoupling' again and found an obvious problem (again) that I'm unsure how to solve without special treatment, and I'd like an opinion on whether it should be done. The idea of the 'decoupling' is, of course, to specify the 'core launcher' file that appears on a playlist separately from the 'metadata signature file' that actually identifies the game. So far so good.

But that's not the only indirection RA needs. There is another indirection that 'could', and maybe should, appear on playlists: m3u files. An m3u is a collection of 'core launcher' files (the usual example being cue files). So I'm in a bit of a pickle. I 'feel' an m3u can appear on a playlist with metadata anyway, without further complicating this 'mini-scanner-language', by having the scanner add them unconditionally by default. Do you think this is the right approach, or should I try something else?
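If m3u files are added unconditionally, the scanner still needs some file inside them to look up metadata. One possible approach, assuming the first listed disc is the one used for identification, is to read the first entry. `m3u_first_entry` is a hypothetical helper that works on m3u text already loaded in memory:

```c
#include <stddef.h>
#include <string.h>

/* Copy the first entry of an m3u playlist into out. Empty lines and lines
 * starting with '#' (comments/directives such as #EXTM3U) are skipped.
 * Returns 1 if an entry was found, 0 otherwise. */
int m3u_first_entry(const char *m3u, char *out, size_t out_size)
{
    const char *line = m3u;
    while (*line != '\0') {
        size_t len = strcspn(line, "\r\n");      /* length of this line */
        if (len > 0 && line[0] != '#') {
            if (len + 1 > out_size)
                return 0;                        /* output buffer too small */
            memcpy(out, line, len);
            out[len] = '\0';
            return 1;
        }
        line += len;
        while (*line == '\r' || *line == '\n')   /* skip the line break(s) */
            line++;
    }
    return 0;
}
```

The returned entry (typically a cue file) could then go through the same launcher/signature rules as any other launcher file, while the m3u itself stays the playlist entry.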
I think I'll open a new issue consolidating all of this info and these ideas into a legible format, if no one minds.
Closed in favor of #9656.
Description
Users have very little control over how the scanner classifies their games, on the assumption that the scanner and cores in combination will figure out which 'entry point' formats should be given to the core.
I'd like to challenge this assumption and offer an alternative, optional solution that doesn't require GUI work in RA, just some branches in the scanner, and that is more reliable and useful for the user, at the cost of more work for the user (but not that much).
The idea is that the scanner recursion function gains a stack list argument.
The user could use this to finely control which files are allowed onto which playlist, and to 'forbid' some files simply by not whitelisting them. For instance, if I want to play DOSBox games but don't want .bat files starting them, only *.conf files (DOSBox configs, which may have an autoexec section), I'd place a DOS.detect containing '.conf' in the topmost folder of the DOS collection.
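The 'stack list' argument could hold one entry per .detect file found on the way down: the recursive scan pushes a rule when it enters a directory containing a .detect file and pops it when it leaves that directory. A minimal sketch of that stack, assuming each .detect file reduces to a playlist name plus one allowed suffix; all names are hypothetical, and inner rules are searched first with outer rules as a fallback (stopping at the innermost .detect file would be the stricter alternative):

```c
#include <stddef.h>
#include <string.h>

#define DETECT_STACK_MAX 32

/* Hypothetical: one parsed .detect file, e.g. { "DOS", ".conf" }. */
struct detect_rule {
    const char *playlist;  /* from the .detect file name */
    const char *suffix;    /* whitelisted launcher suffix */
};

/* The stack carried through the recursive directory scan. */
struct detect_stack {
    struct detect_rule rules[DETECT_STACK_MAX];
    size_t depth;
};

/* Called when entering a directory that holds a .detect file. */
int detect_push(struct detect_stack *s, struct detect_rule r)
{
    if (s->depth == DETECT_STACK_MAX)
        return 0;
    s->rules[s->depth++] = r;
    return 1;
}

/* Called when leaving that directory again. */
void detect_pop(struct detect_stack *s)
{
    if (s->depth > 0)
        s->depth--;
}

static int ends_with(const char *name, const char *suffix)
{
    size_t n = strlen(name), k = strlen(suffix);
    return n >= k && strcmp(name + n - k, suffix) == 0;
}

/* Playlist for a file name, searching from the innermost rule outwards.
 * NULL means no rule whitelists the file, so it is simply not added. */
const char *detect_lookup(const struct detect_stack *s, const char *name)
{
    size_t i = s->depth;
    while (i > 0) {
        i--;
        if (ends_with(name, s->rules[i].suffix))
            return s->rules[i].playlist;
    }
    return NULL;
}
```

With the DOS rule pushed, 'DOOM.conf' lands on the 'DOS' playlist while 'DOOM.bat' is ignored, because nothing whitelists it.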
Similarly, if a core adds support for a new kind of entry point file, the user can just edit or add a '.detect' file in a game directory with that new file type and get a playlist with it. Then, when choosing a game to run, RA would try to match that playlist to a list of cores and further filter the cores by whether the core's .info file supports the extension.
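The core filtering step could reuse the `supported_extensions` field of libretro core .info files, which lists extensions separated by '|'. A sketch; `core_info_min`, `core_supports_ext` and `filter_cores` are hypothetical stand-ins for RetroArch's real core info structures:

```c
#include <stddef.h>
#include <string.h>

/* Does a pipe-separated supported_extensions value (e.g. "cue|chd|pbp")
 * contain ext? Case-sensitive here for simplicity. */
int core_supports_ext(const char *supported, const char *ext)
{
    size_t ext_len = strlen(ext);
    const char *p = supported;
    while (*p != '\0') {
        size_t len = strcspn(p, "|");   /* length of the current item */
        if (len == ext_len && strncmp(p, ext, len) == 0)
            return 1;
        p += len;
        if (*p == '|')
            p++;
    }
    return 0;
}

struct core_info_min {            /* hypothetical, trimmed-down core info */
    const char *name;
    const char *supported_extensions;
};

/* Keep only the cores that accept ext, compacted to the front of the
 * array; returns how many were kept. */
size_t filter_cores(struct core_info_min *cores, size_t count, const char *ext)
{
    size_t kept = 0, i;
    for (i = 0; i < count; i++)
        if (core_supports_ext(cores[i].supported_extensions, ext))
            cores[kept++] = cores[i];
    return kept;
}
```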
Actual behavior
The scanner is supposed to be a fire-and-forget operation. You choose a directory, and it iterates over the tree finding sets of N 'entry points'; then some heuristics are applied that are supposed to assign each of them to one playlist by figuring out which 'platform' the entry point is for.
This idea is flawed in at least two ways:
The scanner heuristic might work with a game in one file format but not with the same game in a slightly different format that the core also accepts. This is understandable, because RA would need a multi-format CD image mounter just to standardize the byte reading, so it relies on raw byte reading and byte array matching instead. But file formats come in many forms, and some consoles accept 'generic' formats like ISO without necessarily having the right 'ID' at the start (homebrew, unlicensed games, etc.).
The game might have two (or even more!) valid entry points with slightly different behavior. Consider a file set consisting of a cue plus an ISO with separate music track files. The scanner can show either the cue or the ISO as the entry point, and the ISO ends up without digital audio but is still a 'valid game'. This leads to complicated 'hierarchy' filtering code. To be fair, this kind of handling would still be needed in a CRC scanner, where the 'entry point' and the actually checksummed file differ (i.e. the entry point would be the cue and track1.bin the checksummed file). But there are other examples.
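The raw byte matching mentioned in the first point boils down to comparing a known byte string at a fixed offset of the image. The sketch below uses a made-up signature ('CONSOLE-ID' at offset 8) purely for illustration; `has_magic` is hypothetical. An image of a homebrew or unlicensed game that lacks the expected ID fails the check even though a core could run it, which is the false negative described above:

```c
#include <stddef.h>
#include <string.h>

/* Does buf contain magic at the given offset? Bounds-checked so that a
 * short buffer or a large offset never reads past the end. */
int has_magic(const unsigned char *buf, size_t buf_len,
              size_t offset, const char *magic)
{
    size_t n = strlen(magic);
    return offset <= buf_len && n <= buf_len - offset
        && memcmp(buf + offset, magic, n) == 0;
}
```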
Before you mention the cases where two or more files with the correct extension exist in the same directory and only one is supposed to be chosen, or two or more are supposed to be given in order: the idea above has a solution for each. If a single file is the 'real' entry point, the algorithm accepts suffixes to scan, not just extensions, so you only need to place a .detect file in the directory with the complete name of the entry point file and omit the others. If two or more files are supposed to be given in order (same extension or not), RetroArch tends to support '.cmd' files in the cores that need this, which would make .cmd the right extension to detect in these cases.
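Treating a rule as a suffix rather than an extension lets the same mechanism select every file of a type or exactly one file. A sketch over an in-memory directory listing; `has_suffix` and `select_entry_points` are hypothetical names:

```c
#include <stddef.h>
#include <string.h>

/* ".cue" selects every cue sheet; a complete file name is also a suffix
 * of itself, so it selects just that file. */
static int has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name), k = strlen(suffix);
    return n >= k && strcmp(name + n - k, suffix) == 0;
}

/* Copy the names matching the rule into out[] (which must have room for
 * count entries); returns how many were selected. */
size_t select_entry_points(const char *const *names, size_t count,
                           const char *rule, const char **out)
{
    size_t selected = 0, i;
    for (i = 0; i < count; i++)
        if (has_suffix(names[i], rule))
            out[selected++] = names[i];
    return selected;
}
```

One caveat of plain suffix matching: a rule 'Game.cue' would also select 'Other Game.cue', so a real implementation might compare whole file names when the rule is more than an extension.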
I also think this idea could be valuable not just for the (proposed) filename parser but also for simplifying the serial and CRC scanners, by providing an alternative to the heuristics. The serial scanner would still need to parse, and the CRC scanner might need to parse a cue (for instance) to get to the 'correct' file/track to checksum, so they would still need to minimally understand the accepted file formats, but the entry points would be fixed by the user and the fallible magic heuristics would not be used.
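For the CRC path, once the 'correct' file or track is known, the scanner needs its CRC32 to look it up in the database. A small bitwise CRC-32 sketch (the standard reflected IEEE polynomial, the same CRC32 found in .dat files); `crc32_update` and `crc32_file` are hypothetical names, and a table-driven version would normally be faster:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). Start with crc = 0;
 * the result of one call can be passed to the next to checksum data in
 * chunks. */
uint32_t crc32_update(uint32_t crc, const unsigned char *data, size_t len)
{
    size_t i;
    int bit;
    crc = ~crc;
    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++)   /* xor the polynomial if low bit set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Checksum a whole file, e.g. the track a cue sheet points at.
 * Returns 1 on success and stores the CRC in *out, 0 on I/O errors. */
int crc32_file(const char *path, uint32_t *out)
{
    unsigned char buf[4096];
    size_t n;
    uint32_t crc = 0;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        crc = crc32_update(crc, buf, n);
    if (ferror(f)) {
        fclose(f);
        return 0;
    }
    fclose(f);
    *out = crc;
    return 1;
}
```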
Anyway, this feature would be hidden, and the current 'fire-and-forget' scanner would continue to work; only people who read the manual would discover that they could make it produce fewer false positives and false negatives with just a few files (usually) in the topmost directories of a console set.