[PLAYLISTS] Add an option to scan unknown roms #8624

Closed
andiandi13 opened this issue Apr 22, 2019 · 56 comments

Comments

@andiandi13

[ISSUE UPDATED] The other one was blocked

Many, many users complain that RetroArch doesn't add files to their playlists because the CRC doesn't match the original dump.

It can be because of a translation patch, an SRAM patch, and so on...

So I suggest a very simple solution :

Just add an option in playlist menu that could be named 'Ignore CRC while scanning roms' or 'Add unknown roms to playlists', whatever.

The thing is that it'll replace that line

"crc32": "XXXXXXXX|crc"

By

DETECT

That way all roms are displayed in playlists with a label taken from the rom filename: roms with a matching label would display thumbnails, and roms with another label (like T-eng 100% etc...) would not display a cover (it's not a big deal).

RetroArch playlists are painful to use because of those unique CRC32, and it would be wonderful to simplify it in that way !

For now, I'm creating my Playlists with RetroArch Playlists Manager on Windows, to add my SRAM patched GBA roms, as well as all my non-detected translated roms.

Thanks for reading, I really hope to see that option.

Version

  • RetroArch: 1.7.6

Environment information

  • OS: Windows, Android, Switch
@i30817
Contributor

i30817 commented Apr 22, 2019

I should apologize. I misread the previous issue and thought it was about 'disabling CRCs' when i already knew the CRCs weren't used anymore and was pissed about it. To be constructive i suggest using 'Add unknown roms to playlists' for this reason.

However, i'm not sure it's that easy to make a 'match' of a rom to a image/data with a filename. They're not unique not only because of hacks but because dumping groups (from where the name comes from) can have the same name for the same release of the same game on a different console. This is kind of unintuitive, but i'd expect this to happen a bunch on redump for instance.

So it'll have to be a tuple (scanned cd/rom console type, filename). I'm also unsure that this makes it any better than the serial¹, which if it's failing, it's likely to be failing in classifying the cd or extracting the serial because of non-standard dumping formats of the users, which would also apply to the first element of that tuple. Eg: the feature may get implemented and the games that are missing still not appear because the bug was elsewhere.

edit: ¹ actually it sounds slightly better in that versions will have different names on redump and truerip standard namings (but not re-editions i think) so at least no more serial duplicates in those sets (except re-editions, which if you squint, don't count), if they actually happened with serials (i only tested on the genesis, where it does happen).

@andiandi13
Author

They're not unique not only because of hacks but because dumping groups (from where the name comes from) can have the same name for the same release of the same game on a different console. This is kind of unintuitive, but i'd expect this to happen a bunch on redump for instance.

What about the header ?
When I check roms header I see the name and other infos.

e.g. if the header is FINAL FANTASY TACTICS and the extension is .gba, then it must recognize it as FF Tactics for Game Boy Advance and match the thumbnail and the title in the db.

It's funny how easy it is to create manual playlists on a PC, but seems to be very hard to do it directly on RetroArch :/

@i30817
Contributor

i30817 commented Apr 22, 2019

What about the header ?

tl;dr: the header is (mostly) irrelevant for platform id of roms and the 'header' for the cds across platforms can mean many different things, where all of them have the same problems of cd images heuristic platform identification being necessary to get anywhere because the extension is not indicative of the platform.

For roms, platforms can easily be recognized by the extension so parsing is 'not even necessary' most times - i know one case though, a .bin from a cue and a .bin from a genesis rom dump need to be distinguished.

For cd consoles however, cd images have a proliferation of formats and meta-formats, and the user, if he has the game, will just dump it in an arbitrary format.

You generally can trust the dumping groups to use the same format (but not hacks, for technical reasons). However, that format usually doesn't have a platform extension. A cd image is a cdimage, unless it's a ngc file, in which case it has a platform extension (the only reason redump did this accidental good deed was that ngc 'cd' really aren't normal cds so they can't be read on normal cd mounters or burnt to iso).

The RA code needs to actually parse them, but it doesn't try to read the bytes as they would appear if the cd image was mounted, because a multiplatform multiformat cd image mounter that runs on toasters is hard work. Instead it does some heuristics: it tries to find the track that has the 'magic bytes' identifying the platform (sometimes that isn't track one, or you even have to chase down the real file because the tracks are divided into files) and then attempts to read those bytes from the file directly to match.

This sometimes fails if the cd image is in a format the magic-bytes check didn't expect (that user dump case). For example, most psx cues point to MODE2/2352 (a bin). However they might as easily point to a MODE2/2048 (an iso). These files not only have different block sizes but also different 'start headers' (so the magic is wrong). Or the code didn't expect the user to put, say, a nero image in the scanner, so the 'nero header' + 'SONY CORPORATION bla blah' string is not even there.

So you basically have a situation where the images describe the same game but the representation in bytes is different enough that the RA platform identifier routine gets confused, often (for personal dumps).
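To illustrate why the same disc can look different byte for byte, here is a minimal sketch (not RetroArch's actual scanner code). It looks for the standard ISO 9660 identifier `CD001`, which sits one byte into the primary volume descriptor at sector 16, under three common sector layouts. The function name and the layout table are hypothetical, and a real scanner would look for platform-specific strings as well.

```python
from typing import Optional

ISO9660_MAGIC = b"CD001"  # standard identifier of an ISO 9660 volume descriptor
PVD_SECTOR = 16           # the primary volume descriptor lives in sector 16

# (label, bytes per sector, offset of the user data inside each sector)
SECTOR_LAYOUTS = [
    ("2048 (plain iso)", 2048, 0),
    ("MODE1/2352 (raw)", 2352, 16),      # 12 sync bytes + 4 header bytes
    ("MODE2/2352 (raw, XA)", 2352, 24),  # 12 sync + 4 header + 8 subheader bytes
]


def guess_sector_layout(image: bytes) -> Optional[str]:
    """Return the first layout whose expected offset holds the magic, else None."""
    for label, sector_size, data_offset in SECTOR_LAYOUTS:
        # Byte 0 of the descriptor is its type; the identifier starts at byte 1.
        pos = PVD_SECTOR * sector_size + data_offset + 1
        if image[pos:pos + len(ISO9660_MAGIC)] == ISO9660_MAGIC:
            return label
    return None
```

A check that only knows the first offset misses the other two layouts, which is the kind of false negative described above for personal dumps.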

Hacks are mislabeled for roms and cds if using serials, as i already explained, so i much prefer checksums, if they aren't flawed themselves (a bunch of open and closed bugs kinda shows that the checks for CRC can easily screw up because the dumping groups made it hard - and slow - to be correct).

The only way this will get tamed is if RA starts directing people to use the MAME chd set imo (that has a sane internal checksum that doesn't need runtime calculation).

@andiandi13
Author

andiandi13 commented Apr 23, 2019

If you want a list like that may as well just browse to the folder.

lol

More seriously. Here is an example of a playlist that I want RetroArch to make :

1. C:\retroarch\roms\Zelda (USA).gba
2. Zelda (USA)
3. DETECT
4. DETECT
5. DETECT
6. Nintendo - Nintendo Gameboy Advance.lpl

Let's take the problem line by line :

  1. Can RetroArch list all files in a folder to a playlist, whether based on a known extension or not ? YES
  2. Can RetroArch copy the filename to the title ? YES
  3. Can RetroArch write DETECT ? YES
  4. Same
  5. Same
  6. Can RetroArch match a console with a file to determine what files go in what playlist and name those playlists ? I don't know and I think that's the main issue.
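Points 1 to 5 can be sketched in a few lines of Python. This is only an illustration of the six-line entry shown above, not RetroArch code: the function names are made up, and the playlist name (point 6) is simply passed in because deciding it is the open question.

```python
from pathlib import Path
from typing import List


def playlist_entry(rom: Path, playlist_name: str) -> List[str]:
    """Build the six lines of one entry, mirroring the example above."""
    return [
        str(rom),       # 1. full path to the file
        rom.stem,       # 2. label taken from the filename, extension dropped
        "DETECT",       # 3. placeholder, as in the example
        "DETECT",       # 4. placeholder, as in the example
        "DETECT",       # 5. placeholder, as in the example
        playlist_name,  # 6. playlist the entry belongs to
    ]


def build_playlist(folder: Path, playlist_name: str) -> str:
    """List every file directly inside `folder`, known extension or not."""
    lines: List[str] = []
    for rom in sorted(p for p in folder.iterdir() if p.is_file()):
        lines.extend(playlist_entry(rom, playlist_name))
    return "\n".join(lines) + "\n"
```

Usage would be something like `build_playlist(Path(r"C:\retroarch\roms"), "Nintendo - Nintendo Gameboy Advance.lpl")`, with the result written to the playlists folder.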

Here are some ideas :

  • All the content scanned from a well-named folder (RetroArch's folder names, like those inside the thumbnails and playlists folders) located on a certain path will go to the playlist named after that same folder. Example : if I scan a folder named "Nintendo - Nintendo 64", and this folder is located in retroarch/games (or another determined folder), all the files inside that folder would be listed in a playlist named 'Nintendo - Nintendo 64.lpl', even if there are .gba, .bin, .md or .iso files inside that folder.
    Users should manage their own folders well.

Alternatively, a detection file could be placed into a folder, whatever its name, to help RetroArch determine which playlist the roms inside would go to, and to avoid forcing users to rename their folders to a specific name.
e.g. in the folder /roms/MySNESroms, we just copy a file named "Nintendo - Super Nintendo Entertainment System.whatever" inside, and RetroArch would detect that "whatever" extension and put all the files into that folder in a Super Nintendo playlist.

  • That new option will let users choose which extensions to manage, and RetroArch will guess which extension goes in which playlist (gba in gba, md in megadrive/genesis, gb in gameboy etc..)
    Extensions that are shared between different systems would be ignored (bin, iso...), or users would manually associate an extension with a console (e.g. bin = Genesis, ISO = PSP...).

  • Same as above (console detected by extension), but for shared extensions, RetroArch would check the filename and go through the database to match the rom name with a console. e.g. Final Fantasy Crisis Core.ISO or Crisis Core.ISO or Final Fantasy Crisis.ISO would match that game in the PSP rdb, and that ISO would go inside a PSP playlist instead of nowhere or in a Dreamcast or Saturn playlist by mistake.
    For badly named roms, it'll rely on the extension.

I think the first option seems pretty easy and feasible...
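A rough sketch of how the first two ideas could combine: the folder name wins, then a user override for a shared extension, then a plain extension guess. The tables and the function name are illustrative assumptions, not taken from RetroArch.

```python
from pathlib import Path
from typing import Dict, Optional

# Illustrative tables only; a real list would come from RetroArch's own system names.
KNOWN_SYSTEMS = {"Nintendo - Nintendo 64", "Nintendo - Game Boy", "Sony - PlayStation"}
EXTENSION_TO_SYSTEM = {
    ".gba": "Nintendo - Game Boy Advance",
    ".gb": "Nintendo - Game Boy",
    ".gg": "Sega - Game Gear",
}
SHARED_EXTENSIONS = {".bin", ".iso"}  # used by several systems, so never guessed


def playlist_for(rom: Path, overrides: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the playlist (system) name for a file, or None if it should be ignored."""
    # Idea 1: a well-named parent folder decides, whatever the extension is.
    if rom.parent.name in KNOWN_SYSTEMS:
        return rom.parent.name
    ext = rom.suffix.lower()
    # Idea 2: the user may pin a shared extension to a console (e.g. .iso = PSP).
    if overrides and ext in overrides:
        return overrides[ext]
    if ext in SHARED_EXTENSIONS:
        return None
    return EXTENSION_TO_SYSTEM.get(ext)
```

With this ordering a .gba file inside "Nintendo - Nintendo 64" still lands in the N64 playlist, which matches the "users should manage their own folders" rule.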

@i30817
Contributor

i30817 commented Apr 23, 2019

I honestly agree with that idea that the user should be able to start a 'dumb' scan on a folder and the resulting files should either be assumed (per directory name) or chosen to go a console playlist manually (per GUI). Users that are organized already place their games per platform on the filesystem and it would be good to have a scan type that doesn't assume that hacks are the same game as the original (though i'd prefer if the libretro-database CRCs were actually used ofc, for correct metadata instead of native filenames entries with no metadata because the games weren't found in the database, or that the CRC found were actually correct instead of being a secondary property of a query using a non-unique key that may or may not give the right result).

But i think the RA team would rather find and fix the reason why some (original) games aren't scanning correctly on the automatic scanner first, shrug. It just so happens that my idea that the scanner is failing at platform ids is what I think it's most likely to be happening but that really needs to be tested/debugged with concrete files that are failing. It may also be that the platform is correctly found but the cd image is sufficiently funky that the cd serial parser can't handle extracting it with just a byte search.

@andiandi13
Author

Yeah as soon as we have the choice of scan type, it's OK.

Let's wait what the devs will think about it :)

@hizzlekizzle
Contributor

Simple directory listing seems to be what most people want out of the playlists, but they also want boxart/thumbnails, and that's where the problem comes in. That is, hooking things up when we have no way of knowing what's what. We can do fuzzy matching, but then we have to add a bunch of stuff (menus, etc.) to correct false positives.

@andiandi13
Author

@hizzlekizzle As I said, you just have to rename your rom the same name as the thumbnail

@i30817
Contributor

i30817 commented Apr 23, 2019

edit: accidentally deleted a post, but tl;dr: if using the tuple (console-type, rom filename) the false negatives could be kept to a minimum if the users are smart enough to name things the same as the dump sets; and i don't really see why that would require new GUI. You can make users select the console-type manually ofc, but you already need those heuristics to even start scanning serials because each console has a different type of header structure.

As a fuzzy search this is likely to be slightly superior to the serial, because a dump set filename will not repeat on the same set (cause files would get overwritten on a complete set), while a serial absolutely could, if the console manufacturer is crazy (SEGA) or the game is misidentified (hacks). If people naming the hacks and translations on libretro-database are careful enough to name translations the same as the original game, they get the same image as the original, but if they name hacks differently, they get no image but a different name (my PRs there were organized like that). In all cases, the CRC fetched from this query (as well as the 'serials' query) can't be trusted, so it shouldn't be used for retroachievements or netplay (or any other feature that needs it).

Unless you put in entries to two or more dumping groups with the exact same kind of filename rules on the database... truerip and redump maybe? If this happens this idea is not worthless, because it's a fuzzy search anyway and 'non-uniqueness' would just result in the same entry twice (possibly with a different CRC but not certainly). If 2 dumping groups share the exact same name for the same game on the same console, it's 99.999% certain it's the same version of the game (though there could be 'exceptions' when a set 'corrects' a mistake), even if the CRC might be different (from different dumping formats or strategies).

To share images you could leave the game images associated to the serial like today and get the info in stages.

  1. (console-type, rom filename) -> [data] <-- fetch all the matches from the database
  2. iterate over [data] and find (if possible) one with a serial property (this is the 'original game')
  3. if found this database original->serial, use it to find the image with the same mechanism as today (this will find images for both the original games and translations today if translations obey the rule to be named the same as the original game). Hacks get no serial 'fallback' (because the original game name is different and libretro-database entries for both hacks and translations don't specify the serial) so they get no image.
  4. fill the GUI with the found filename name, found image if any and other data found. I'm moderately sure you can't preserve the exact metadata for translations in this fuzzy search (because it depends on confusing a translation for a original game), so you might be 'forced' by this logic to display the data of the original game for the translations.

It's not ideal because it's misleading (at least translations are not marked as such). It's possible that hacks would be marked as hacks with this, but i feel that would be inconsistent and confusing because of that translation issue. But this is the curse of not using unique keys and the price you pay for the 'convenience' of not using checksums as key.
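The four stages can be sketched as follows. The rows, serials and field names are made up for illustration and do not reflect the real libretro-database schema.

```python
from typing import Dict, List, Optional

# Made-up rows: an original with a serial, a translation named the same
# (no serial), and a same-named game on another platform.
DATABASE: List[Dict[str, Optional[str]]] = [
    {"platform": "Sony - PlayStation", "name": "Some Game (USA)", "serial": "XXXX-00001"},
    {"platform": "Sony - PlayStation", "name": "Some Game (USA)", "serial": None},
    {"platform": "Sega - Saturn", "name": "Some Game (USA)", "serial": "YYYY-00002"},
]


def fuzzy_lookup(platform: str, filename_stem: str) -> Dict[str, object]:
    # 1. (console-type, rom filename) -> [data]: fetch all matches.
    matches = [
        row for row in DATABASE
        if row["platform"] == platform and row["name"] == filename_stem
    ]
    # 2. Find, if possible, a match with a serial (the 'original game').
    original = next((row for row in matches if row["serial"]), None)
    # 3. Use the original's serial as the key for the existing image lookup.
    image_key = original["serial"] if original else None
    # 4. Data for the GUI; hacks with a different name get no image key.
    return {"label": filename_stem, "image_key": image_key, "match_count": len(matches)}
```

Because the platform is part of the key, the Saturn row never leaks into a PlayStation lookup even though the filename is identical.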

edit: more edits

@Ferk

Ferk commented Apr 25, 2019

Just add an option in playlist menu that could be named 'Ignore CRC while scanning roms' or 'Add unknown roms to playlists', whatever.

I don't understand why you would need DETECT in the playlist for this.

If such an option (adding unknown roms) was added I would expect that the scanning process would generate the CRC from the unknown file and save that CRC into the playlist (without checking if it matches any known CRC from the database), it would not need to store DETECT. Maybe the only thing you would gain if you did that is some speed in the scanning if you free it from generating any CRC, but at the cost of adding more processing later when you actually want to know what is the CRC for other things (I imagine it would make scrolling your list of games much slower, which is much more painful than leaving the scanning process running for an extra hour, it would also be troublesome when searching what content from your library matches the CRC for a Netplay room). And the problem you are trying to address isn't scanning speed anyway, as far as I understand.

In my opinion, what would help in the scanning of content that isn't in the database would be to make it so when the scanner finds a file with a particular extension (say... ".retroentry" for example) it read the file and added its content as a playlist entry (without needing to check if it's in the database or not and without doing any CRC calculation, as long as you already provide the CRC inside the file). The content of the file itself could also contain the name of the playlist where it's meant to be added.

Then you could place a "mycustomgame.retroentry" file next to your custom game, fill it up with the relative path to the file, label, crc and maybe any future metadata allowed by the new playlist json format (thumbnail url/path?) and store it always next to the game file so whenever you scan the folder with your games the entry will be added to the playlist automatically without having to set up your game collection every time for every retroarch device you load your collection from.

Kodi already does something similar with the ".nfo" files, it will read the local metadata stored on disk and add the movies and shows using it.
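A minimal sketch of the ".retroentry" idea, assuming the sidecar file is JSON with hypothetical keys `path`, `label`, `crc32` and `playlist` (no such format exists in RetroArch; the names are placeholders):

```python
import json
from pathlib import Path
from typing import Dict, List


def load_retroentry(entry_file: Path) -> Dict[str, str]:
    """Turn one sidecar file into a playlist entry without touching the game file."""
    data = json.loads(entry_file.read_text(encoding="utf-8"))
    return {
        # The path is stored relative to the sidecar, so the folder stays portable.
        "path": str(entry_file.parent / data["path"]),
        "label": data["label"],
        # The CRC is taken as provided; nothing is computed during the scan.
        "crc32": data.get("crc32", "DETECT"),
        "playlist": data["playlist"],
    }


def collect_entries(root: Path) -> List[Dict[str, str]]:
    """Walk the content folder and gather every sidecar entry."""
    return [load_retroentry(p) for p in sorted(root.rglob("*.retroentry"))]
```

Since nothing here reads the game file itself, the same folder could be copied to another device and rescanned cheaply.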

@andiandi13
Author

It's kinda like what I suggested in my third post (first idea).

If RetroArch sees a file named "Sega - Game Gear.retroentry" (to keep your nomenclature) in a folder, then it will add all the content of that folder to a playlist named Sega - Game Gear.lpl, whatever roms are in the folder.

Or.... Just based on the extension.

It's really easy

@Ferk

Ferk commented Apr 25, 2019

Making it based on extensions would complicate things.
It's not a good thing to let the scanner try to be too smart, because then you risk it doing dumb things. Not every ISO/EXE/BIN file is game content, and not every game content is just one file.

I think it's better to keep things simple but flexible. Scan for playlists within the content folders and include their entries (accounting for relative paths) when they are scanned, that would be more than enough to make me happy.

This would also decouple the scanning process. You could scan your content folder with some fancy third party tool from your computer if you want to and then copy the resulting file along with your content into your Switch or whatever device you want to run it from. Scan it once, reuse it everywhere.

Content that has been already added by including those playlists can be skipped from the actual scan.
This would also help weaker devices that are not very well equipped for doing heavy calculations or heavy IO.

@andiandi13
Author

Yes of course, there is an obvious issue with many extensions, the idea of a small "detection" file is good imo

@i30817
Contributor

i30817 commented Apr 25, 2019

Making it based on extensions would complicate things.
It's not a good thing to let the scanner try to be too smart, because then you risk it doing dumb things. Not every ISO/EXE/BIN file is game content, and not every game content is just one file.

The scanner is 'already' too smart.

The first part of what you're complaining about here already occurs in cd images and must occur to scan for serials (which is different per platform so the platform must be identified so the parsers don't extract a bunch of nonsense), which are the currently chosen - non-unique in some situations - key for the available game metadata (retroachievements, cheats, images, publisher data etc).

The alternative, scanning checksums, was removed/changed to this because people complained it didn't catch enough games (on their personal dumps with arbitrary cd formats and procedures, which results in different bytes) and it was too slow (while scanning bare files, not zip files that have the CRC32 pre-calculated as a zip header field). CHDs could also be pre-calculated with their internal checksum (better even, because that checksum doesn't care if the cd image is divided into multiple files, which is a common source of errors on the older CRC scanner).
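The difference between the two cases can be shown with Python's standard library: a bare file has to be read in full to get its CRC32, while a zip archive already stores the CRC32 of each member in its directory. This is only an illustration of the cost difference, not how the RetroArch scanner is written, and the function names are made up.

```python
import zipfile
import zlib
from pathlib import Path
from typing import Dict


def crc32_of_bare_file(path: Path, chunk_size: int = 1 << 20) -> int:
    """Slow path: every byte of the file has to be read."""
    crc = 0
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def crc32_from_zip(path: Path) -> Dict[str, int]:
    """Fast path: read the stored CRC32 of each member, no decompression."""
    with zipfile.ZipFile(path) as archive:
        return {
            info.filename: info.CRC
            for info in archive.infolist()
            if not info.is_dir()
        }
```

For a multi-gigabyte image the second function only touches the archive's directory, which is why the zip case was never the slow one.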

I don't really blame them for the 'slowness' complaint, because scanning bare giga/terabytes files is brutal (hours in 10 years old hardware), I imagine it blows the patience of the kids on phones. But supporting a fuzzy scan without also supporting a checksum scan has several disadvantages for hacks and reliability in certain consoles and titles (misprints and versions which changed the game but didn't change the serial).

As an aside, the second part of what you're complaining about here (many false positives) is something that is likely to be avoided by technical users because they'll already organize their file folder structure by console. In fact the opposite problem (false negatives) is more likely to happen because the format was 'unexpected' by the retroarch scanner for that console (for instance the parser can't identify a saturn game dumped or converted to a cue/iso), or the game is a homebrew and has no serial or whatever string RA uses to try to distinguish it as a game for 'x' console.

@Ferk

Ferk commented Apr 26, 2019

That's unfortunate, but if the scanner is already imprecise for CD images (I guess this does not apply to file formats that do not contain serials) then that's more of a reason to allow the use of additional methods that allow users to override the behavior of scanning, like distributing playlist files within the content folders.

That way, regardless of what the scanner is normally doing, the user has a way to define what they want to get added, and at the same time it would be an approach that would be ok for 10 years old hardware too, since you could skip the scanning of folders that have pregenerated playlists to include.

The drawback is that most people probably won't know about this feature, especially at the beginning. But an option could be added later to, for example, export the currently scanned playlists to their respective content folders; that way the feature would get some exposure.

@i30817
Contributor

i30817 commented Apr 26, 2019

A new file in the game folder is a terrible solution to the problem of pointing RA to the game version. I'd seriously stop using RA if that was the only method, because it's already fucking terrible when it happens for scummvm because the RA scanner is sufficiently simple it can't emulate the scummvm parsing algorithm. And i'm not the only one that would agree. Can you imagine the 'normal user' being asked to write hundreds of files with different content or copying a file out of thousands, hundreds of times, into different folders?¹

No, any 'solution' that is not automatic is not going to fly. The scanner needs to be 'more' complicated not less, though the filename strategy asked for here is an interesting idea for a fuzzy mode, even if it is not truly simpler because, as i showed, it still needs to parse out the platform from the -platform agnostic- cd image files, and won't work for data that is not uniquely named in any way (like, say, Sierra sci files).

¹ which already happens today if you have a complete scummvm collection and one of the reasons it's not worth it to use the RA port of that unless you really really don't have a original port on the platform. I'd prefer if the playlist generator of the scanner just parsed the scummvm.ini file after we used the native scummvm scanner.

@andiandi13
Author

andiandi13 commented Apr 26, 2019

Why do you talk about hundreds of files ?

I'm talking about one file per console/per folder.

In your Game Boy folder, named Nintendo - Game Boy, you put a Nintendo - Game Boy.detect file.

In your Sony - PlayStation folder, you put a Sony - PlayStation.detect file, and so on.

All the files in each folder will go to a playlist named after the .detect file, whether they are .GB, .ISO, .TXT, .JPG etc...

RetroArch could come with a pre-created "games" folder, containing many folders inside it, named after each console, with all the .detect files inside. Then you'll just have to put your roms in their place.

Is that painful ? Is that complicated ?

@i30817
Contributor

i30817 commented Apr 26, 2019

Ok that's actually a nice idea, sorry for misunderstanding. I was immediately reminded of the RA scummvm scanning strategy, which is basically horrible and was thinking you were proposing to add a id per game.

'Just' overriding extra heuristic data once per platform sounds doable and a nice standardization to override the fallible platform parser. I support that idea.

@andiandi13
Author

Yes, it would have been terrible to create a file per game. That method seems pretty feasible for the devs though

@i30817
Contributor

i30817 commented Apr 26, 2019

Most people already organize their games in 'platform subtrees'. In fact there is confusing code in the scanner that is supposed to take advantage of this by either 'remembering' which database file the last game was found or based on the name of the scanned subtree being the same as the name of the playlist (can't recall what convention the code had) to scan quicker.

The idea proposed here would have some major code modifications there probably, and to pass the found platform into the scanner itself as a 'override' to bypass the platform scanner part. Sounds like a PR someone new to the project could do.

@andiandi13
Author

Hmm I see...
I wish I could do a PR myself. For now I'm going to wait for a response from a member

@i30817
Contributor

i30817 commented Apr 26, 2019

Why's that? Sounds like a good way to avoid the 'we have to create a GUI' and the 'the scan is letting false positives / false negatives pass' problems to the people that are organized enough.

There would still be false positives and negatives because of the scanner using serials ofc, but that won't change until a version of the checksum scanner mode is available again, and this could avoid some silly false negatives because of different file formats (though it's an open question whether the serial scanner wouldn't just quit on an 'unusual' cd image format if the filename scanner doesn't become a thing).

@andiandi13
Author

andiandi13 commented Apr 26, 2019

@fr500 What then ?

I never said that this new feature would replace the current scan at all !

It's adding an extra option, an advanced feature, for advanced users who know what they are doing.

It's just making retroarch scan specific folders and put the content in playlists.

And if you don't like the small files idea, I actually came with two ideas, the first one is also good imo.

Summary

First option

So if I have that folder :

RetroArch/games/Sony - PlayStation

RetroArch would detect the folder (thanks to its good name AND its path), and write all the .ISO files into a Sony - PlayStation.lpl file.

Second option

The second idea is to create little empty files and put them wherever we want to help retroarch determine what folder is what console.

Example, with two folders :

/snes

/roms/gbaroms

Here, the names don't help retroarch. So, in the first folder, we will manually put a file named Nintendo - Super Nintendo Entertainment System.detect, and in the second folder, a file named Nintendo - Gameboy Advance.detect

Then, when RetroArch scans the entire device, it will put all the files of /snes in a new Super Nintendo playlist, and all the files of /roms/gbaroms in a Gameboy Advance playlist.
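The second option could look roughly like this: walk the tree, and wherever a `.detect` marker is found, send every other file in that folder to the playlist named after the marker. The function name is hypothetical and the sketch ignores subfolders of a marked folder.

```python
from pathlib import Path
from typing import Dict, List


def scan_with_markers(root: Path) -> Dict[str, List[Path]]:
    """Map playlist file names to the files of every folder holding a .detect marker."""
    playlists: Dict[str, List[Path]] = {}
    for marker in sorted(root.rglob("*.detect")):
        system = marker.stem  # marker name without the .detect extension
        content = sorted(
            p for p in marker.parent.iterdir()
            if p.is_file() and p.suffix != ".detect"
        )
        # Every file goes in, whatever its extension; the user owns the folder.
        playlists.setdefault(f"{system}.lpl", []).extend(content)
    return playlists
```

Folders without a marker are skipped entirely, so the existing scanner could still handle them as it does today.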

What about thumbnails

The titles of roms on the playlist would be taken from filename, so that well named roms would display thumbnails, and roms named with another name would not.

For CRC, playlists would show DETECT.

I know I repeat the same things, but it seems so obvious and simple...

@andiandi13
Author

@bparker06 @twinaphex What do you think of that idea ?

andiandi13 changed the title from "[PLAYLISTS] Add an option that use DETECT instead of CRC32" to "[PLAYLISTS] Add an option to scan unknown roms" Apr 29, 2019

@i30817
Contributor

i30817 commented Apr 29, 2019

I'm going to mention (again) i dislike 'false CRCs' in playlists - even if they already occur today with serial scanning.

'False CRC' equals 'useless, misleading, cause of bugs CRC', especially since so many RA advanced features require byte for byte equal games (netplay, retroachievements, cheats, sharing savestates, etc).

'False CRCs' are one of the reasons that the retro-achievements system in retroarch has to redo the calculation when using it, which is simply bizarre (the other being that they strip out nes headers to id roms, in order to 'catch the most', which may or may not be a mistake, depending on the influence of the header on runtime behavior of the code).
And it's also unfeasible if retro-achievements wants to spread to cd consoles, but they'd have further trouble with that from the 'not a single file means not a single CRC' problem too.

@andiandi13
Author

@i30817 You're right. I just checked my manually created Playlists with DETECT, and RetroArch did manage to associate true CRC with rom information of the database, and nothing for patched roms.

So it's better to set the CRC line on DETECT (I edited my post above).

@hizzlekizzle
Contributor

Something fr500 and I discussed on discord the other day is: keeping the existing CRC calculation, try to match thumbnails on name alone, and adding a flag (maybe a child node in the JSON playlists and an icon/character beside the name) to show that a file is "unverified" if it doesn't match the databases.

Would that be acceptable?
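As a sketch of what such an entry could look like: the CRC is still calculated, the label (used for the thumbnail match) comes from the filename, and a flag records whether the CRC was found in the database. The `unverified` key is a placeholder name for the proposed child node, not an existing playlist field, and `known_crcs` stands in for a database lookup.

```python
import zlib
from pathlib import PurePath
from typing import Dict, Set


def make_entry(path: str, data: bytes, known_crcs: Set[str]) -> Dict[str, object]:
    """Build a JSON-style playlist entry that keeps the real CRC and flags unknown files."""
    crc = f"{zlib.crc32(data):08X}"
    return {
        "path": path,
        # Thumbnails would be matched on this label alone.
        "label": PurePath(path).stem,
        "crc32": f"{crc}|crc",
        # Hypothetical flag: True when the CRC is not in the database.
        "unverified": crc not in known_crcs,
    }
```

The menu could then draw an icon next to entries where the flag is set, while features that need exact CRCs still get a real value.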

@i30817
Contributor

i30817 commented Apr 29, 2019

Something fr500 and I discussed on discord the other day is: keeping the existing CRC calculation, try to match thumbnails on name alone, and adding a flag (maybe a child node in the JSON playlists and an icon/character beside the name) to show that a file is "unverified" if it doesn't match the databases.

fine by me, but at the risk of being pedantic again, (file)name match has the problem this bug discussed of same named games on different platforms and needs a further subkey (platform) to get a 'right' cover image. I like the 'detect' file or folder idea myself (because it makes RA not require parsing of exotic cd image formats, just passing them to the emulator) but fr500 already mentioned he doesn't.

And, hacks/translations need exact methods, and even serials don't really work (though you can 'cheat' and name them the same, or using serial 'inherit' the images/data of them, which is inappropriate because all the translation info is lost and the hacks will be completely different and you'll end up with 'duplicate' games if you have the original).

To summarize, if i was a C whiz, i'd try to organize the scanner into

  1. 'serial' (which requires a good platform id function that works on multiple types of cd image and parsing the serial out after id'ing the platform and type of image), what exists today, with all of its speed, false negatives and problems for features that require exact CRCs ; AND
  2. 'filename-extension+the idea here to have a id file or folder' (since this mode doesn't actually require parsing like this it's very 'reliable' to different fileformats if the convention is followed (ie: users follow instructions), but bad for hacks and features that require exact CRCs) AND
  3. A 'hard CRC' mode that takes care of only supporting amortized CRCs for cd image files in zip files and chds. Because of that, it'd be 'fast enough' to be used. However, i'd very much prefer if this would only be shown once many cd emulators have support for chd without uncompressing the whole file to tmp, which is a disk killer. I'd eventually also would like to extend that to support a custom xattr extension (something like user.crc32) for support in compressed OS filesystem of large 'bare' files if the user is smart enough to do that (this removes the need for the user to care about the emulators supporting compression/chd while at the same time preserving the amortization - i'd have to use a filesystem that supports both compression and xattr and script the compression myself, so it's a niche idea).

If the playlist had 'DETECT' in place of the actual checksum, RA could disable features that need a 'correct' checksum when one wasn't even attempted, and maybe offer an option to force a calculation and save it to the playlist to enable them. It's a bit lame because you may be forcing a checksum calculation that you have no way to use later (i.e. if the game's netplay room is empty, or if the game doesn't have RetroAchievements).
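
A rough Python sketch of that placeholder logic, under stated assumptions: the entry is a plain dict, `read_bytes` is a hypothetical callable that returns the content's bytes, and the `XXXXXXXX|crc` value shape follows the line quoted at the top of this issue. None of this is taken from RetroArch's source.

```python
import zlib

DETECT = "DETECT"  # placeholder meaning "no checksum was attempted"


def has_checksum(entry):
    """True if the playlist entry carries a real checksum, not the placeholder."""
    return entry.get("crc32", DETECT) != DETECT


def ensure_checksum(entry, read_bytes):
    """Compute and store the CRC-32 only when a feature actually asks for it.

    `read_bytes` is a hypothetical callable: path -> bytes of the content.
    """
    if not has_checksum(entry):
        data = read_bytes(entry["path"])
        entry["crc32"] = f"{zlib.crc32(data):08X}|crc"
    return entry["crc32"]


entry = {"path": "roms/game-translated.gba", "label": "Game (T-Eng)", "crc32": DETECT}

# A checksum-dependent feature (netplay, achievements) stays off for now.
netplay_allowed = has_checksum(entry)

# The user forces the calculation; the result is saved back into the entry.
ensure_checksum(entry, lambda path: b"fake rom bytes")
```

The second call is the 'lame' part mentioned above: the work is done up front even if no netplay room or achievement set ever makes use of it.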

@andiandi13
Author

@RobLoach Thanks, I didn't see it. However, I suggest a specific solution to solve the issue.

Also, the issue goes back to 2015 and there is no new scanning method despite the $15 bounty.

Is there a specific bounty value to be sure that the issue will be solved?

@Ferk

Ferk commented Apr 30, 2019

keeping the existing CRC calculation, try to match thumbnails on name alone, and adding a flag (maybe a child node in the JSON playlists and an icon/character beside the name) to show that a file is "unverified" if it doesn't match the databases.

What determines if a file is added as "unverified" or excluded entirely from the playlist?
I imagine now you would have people complain on false positives and extra entries in their playlists.
I wouldn't want to run a blind scanner across my folders of custom wads for the prboom core, for example. Not every wad is a game, much like not every CD image is, or every bat/exe file in Dosbox.

I think trying to get the perfect scanner that works automagically even for unlisted content and across different types of cores is a lost battle.
Just let those of us who don't mind managing our own collection manually (or with our own third party software or scripts) to have at least a reusable way to maintain it so we can distribute the metadata along with the content.

@i30817
Contributor

i30817 commented Apr 30, 2019

keeping the existing CRC calculation, try to match thumbnails on name alone, and adding a flag (maybe a child node in the JSON playlists and an icon/character beside the name) to show that a file is "unverified" if it doesn't match the databases.

What determines if a file is added or excluded?
I imagine now you would have people complain on false positives and extra entries in their playlists.
I wouldn't want to run a blind scanner across my folders of custom wads for the prboom core, for example. Not every wad is a game, much like not every CD image is, or every bat/exe file in Dosbox.

This can be easily and reliably done with just a bit of convention. Organized people already sort their games by platform, so a standard dir name or an 'id file' that makes the scanner treat all subsequent files that can be for that 'platform' as targets is more than enough. It's even better than the normal scanner because it won't depend on very fallible parsing and could accept 'weird' CD image files which the cores accept but RA has no idea how to parse. For instance, there are some games on the PS2 (a hypothetical example, of course) that are ISOs instead of DVDs. Some games/translations on the Dreamcast were converted to 'normal' ISO instead of whatever weird thing the Dreamcast uses, etc.

You could even use a scheme where the 'detect' files inside have the extensions to detect, so the user can choose 'i want cues but not bins' or 'i want dosbox.conf files but not bat files'.

You'd need to educate the users, so this should be optional.

I think trying to get the perfect scanner that works automagically even for unlisted content and across different types of cores is a lost battle.

Just so. That's why this idea doesn't even try to use the part that screws up: the parsing for platform attribution and consequent parsing of serial.

I myself would rather also have a hard checksum method as a option to get correct metadata on certain cases, but you can read that on my last post.

@Ferk

Ferk commented Apr 30, 2019

a standard dir name or a 'id file' that makes the scanner treat all subsequent files that can be for that 'platform' as targets

How do you exclude subsequent files within that folder that are not meant to be targets?
To illustrate it there's the example from libretro/libretro-prboom#72, there are 3 files:

  • original.wad
  • original.deh
  • orig15.wad

The actual game is original.wad, while orig15.wad is an optional file that provides some extra stuff but it's not possible to load orig15.wad by itself (although you can set your configuration after loading original.wad to load orig15.wad).

A blind scanner could easily assume that orig15.wad is a game, since it's the wad extension and it cannot really know since they both just look like custom wads.

@i30817
Contributor

i30817 commented Apr 30, 2019

To that i ask: is that situation any better right now?

Single points of entry are usually maintained, and when they aren't, RA tends to support the 'cmd file' or 'm3u' hacks. In that case, I'd expect RA to simply support a 'cmd' file for the doom engine and have users put 'only accepts .cmd' in their detect files and create the .cmd files (just like it does when you want to load multiple floppies in x68k at startup).

I bet if you want to make it easy for intelligent users you can make the path iteration be able to override a previous 'detect' so the user doesn't have to create .cmd files when not needed. Like this:

doom games dir
DOOM.detect with '.wad' content
--------doomgame dir with 1 wad
--------doomgame dir with 2 wads required
-------------DOOM.detect with '.cmd'
-------------doomgame_hd.cmd with the right order for the 2 wads.
-------------doomgame.cmd with only the 'original' game wad.

You'd need to make the scanner iterate in such a way that it never 'forgets' what it's supposed to be searching for when it enters a branch where the 'detect' file changes and there are more branches to search, but that is a simple tree traversal algorithm (I don't have the brainpower to think of the most efficient way, but you're all good programmers and will figure out something good, probably an auxiliary stack that remembers which detect variant is active and pops it on returning from the branch).
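
The traversal could be sketched in Python like this. The conventions are assumptions made up for the example: a marker file named `<playlist>.detect` whose content is a whitespace-separated list of extensions, with a marker deeper in the tree overriding the inherited one. Recursion plays the role of the auxiliary stack, since the parent's rule is restored automatically when a branch returns.

```python
import tempfile
from pathlib import Path


def scan(directory, active=None, found=None):
    """Depth-first scan where a '*.detect' file sets the rule for its subtree.

    `active` is (playlist_name, set_of_extensions) inherited from the parent.
    A detect file in `directory` overrides it for this directory and below;
    the call stack restores the parent's rule on the way back up.
    """
    found = [] if found is None else found
    detect_files = sorted(directory.glob("*.detect"))
    if detect_files:
        marker = detect_files[0]
        active = (marker.stem, set(marker.read_text().split()))
    for child in sorted(directory.iterdir()):
        if child.is_dir():
            scan(child, active, found)
        elif active and child.suffix.lower() in active[1]:
            found.append((active[0], child.name))
    return found


# Usage: rebuild the layout from the comment above in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "DOOM.detect").write_text(".wad")
    (root / "one").mkdir()
    (root / "one" / "game.wad").touch()
    (root / "two").mkdir()
    (root / "two" / "DOOM.detect").write_text(".cmd")
    for name in ("original.wad", "orig15.wad", "doomgame.cmd", "doomgame_hd.cmd"):
        (root / "two" / name).touch()
    entries = scan(root)
```

In the `two` branch only the .cmd files are picked up, so `orig15.wad` is never mistaken for a game, while the sibling branch with a single wad still inherits the '.wad' rule from the root.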

@Ferk

Ferk commented Apr 30, 2019

So far the PrBoom core does not use 'cmd' files. Wouldn't that bring the same problem as the '.scummvm' files that you criticised?

Right now I tackled the problem by allowing people to distribute a .cfg file along with the content so it's used as the default PrBoom settings when the wad is loaded (PR is still open, but people seem ok with the solution). Within the configuration you can set additional wads to load, so you still open the wad itself (not the cfg or any cmd) and the cfg file next to the wad will indicate which other wads to load.

@i30817
Contributor

i30817 commented Apr 30, 2019

So far the PrBoom core does not use 'cmd' files. Wouldn't that bring the same problem as the '.scummvm' files that you criticised?

Far less. It's a question of degree: you made the point that there is a 'exception', and the proposed scheme is supposed to be 'general'. So i proposed a way to deal with the exception. Cores that habitually require multiple files that are not already in zip are either:

cd console cores with multiple cd games - like the ps1 - which require the user creating a m3u file - of which i already have a solution for myself and proposed incorporating the same solution into RA (it just depends on dumping group naming convention - though for that reason i don't really expect RA to adopt it).

Cores which habitually require .cmd files or equivalent are basically x68k and scummvm and there the situation is deplorable because it's every single game and just scummvm has hundreds.

Right now I tackled the problem by allowing people to distribute a .cfg file along with the content so it's used as the default PrBoom settings when the wad is loaded (PR is still open, but people seem ok with the solution). Within the configuration you can set additional wads to load, so you still open the wad itself (not the cfg or any cmd) and the cfg file next to the wad will indicate which other wads to load.

And hey, i'm not saying 'don't do better solutions' if you want this, by all means, the users can use your cfg as a better entry point with more options (the idea is to be flexible on what the user is allowed to specify, don't depend on fallible parsing and still put out a acceptable metadata entry, if without the certainty of CRCs). I myself require the dosbox core to allow loading/scanning dosbox.conf files before i will use RA dosbox (though i probably won't because i further require more patches for larger hd files to use windows 95 games in dosbox). There is far far too much config that RA is ignoring by not loading those for a DOS collection to be usable.

@i30817
Contributor

i30817 commented May 1, 2019

Thinking about it, this idea is orthogonal to the scan type.

Idea provides: A way to whitelist formats for directory branches and say which console (playlist) they belong to.

serial scan -> needs to figure out the playlist of the game and 'understand' fileformat the game is in to parse the serial. With this only needs to 'understand' a fileformat to parse the serial.

CRC scan -> needs to figure out the playlist of the game and 'understand' the fileformat enough to know which file to checksum (in the case of divided files)

filename scan -> needs to figure out the playlist of the game and to have less false positives.

I think i'll open a request to have this as a optional 'hidden' alternative to depending on fileformat heuristics to figure out which playlist the game goes to.

@andiandi13
Author

andiandi13 commented May 1, 2019

Why bother with such advanced solutions?

A path, a folder with a good name, and that's all.

I mean.. retroarch knows where to look to find thumbnails according to paths (/retroarch/thumbnails/Nintendo - Nintendo 64...), so why would it be more complicated to identify consoles according to paths.

@i30817
Contributor

i30817 commented May 1, 2019

Files give a opportunity to give the user control over whitelisting at any directory level, which folders do not. I agree simpler is good, but this was too good of a opportunity to pass by. If you have further suggestions or criticism on this method, i opened a issue for the idea (since it's orthogonal to filename scanning).

@RobLoach
Member

RobLoach commented May 2, 2019

@andiandi13 Also, the issue goes back to 2015 and there is no new scanning method despite the $15 bounty.

There is the Qt interface which you can use to build custom playlists, but it would be great to have it directly in the RetroArch menu. Here's a video demonstrating the Qt interface https://www.youtube.com/watch?v=hfuioGjCItw

Is there a specific bounty value to be sure that the issue will be solved?

It varies depending on motivation and skill for people implementing. There have been bounties that got to ~$150 and were done, and there were bounties that were $0 and were done.

@andiandi13
Author

andiandi13 commented May 2, 2019

There is the Qt interface which you can use to build custom playlists, but it would be great to have it directly in the RetroArch menu. Here's a video demonstrating the Qt interface https://www.youtube.com/watch?v=hfuioGjCItw

Thanks, but actually I tested it once, and found that RetroArch Playlist Manager is faster for creating quick playlists with just a drag and drop.

It varies depending on motivation and skill for people implementing. There have been bounties that got to ~$150 and were done, and there were bounties that were $0 and were done.

I see... But honestly with what I suggest, despite not being a developer, it seems so easy to implement.

Scanning folders, recognizing paths and names, and creating .lpl files according to the content of each folder linked to a console.
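
For what it's worth, the folder-based scan described here could be sketched in Python roughly as follows. The field names mirror the JSON playlist layout as I understand it (`path`, `label`, `core_path`, `core_name`, `crc32`, `db_name`); treat them, and the helper name `build_playlist`, as assumptions rather than a spec.

```python
import json
import tempfile
from pathlib import Path


def build_playlist(system_dir):
    """Build a playlist dict from a folder named after the playlist.

    No database lookup: one entry per file, label taken from the filename.
    """
    system_dir = Path(system_dir)
    db_name = system_dir.name + ".lpl"
    items = [
        {
            "path": str(rom),
            "label": rom.stem,
            "core_path": "DETECT",
            "core_name": "DETECT",
            "crc32": "DETECT",
            "db_name": db_name,
        }
        for rom in sorted(system_dir.iterdir())
        if rom.is_file()
    ]
    return {"version": "1.0", "items": items}


# Usage: a folder named like the playlist, holding one translated ROM.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "Sega - Game Gear"
    folder.mkdir()
    (folder / "Sonic (T-Eng).gg").touch()
    playlist = build_playlist(folder)
    playlist_json = json.dumps(playlist, indent=2)
```

The part this sketch skips is the hard one raised elsewhere in the thread: deciding which files in the folder are actually launchable content.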

@ghost

ghost commented May 22, 2019

While it may be possible with an "ignore CRC" or such option to add cartridge-based games into the right playlist (based on file extension<->core info/database mappings), this is a lot more difficult for CD systems because we don't have detection methods for all kinds of images and systems.
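
To illustrate the difference, here is a small Python sketch of an extension-to-playlist lookup. The table is hand-written for the example and is not taken from RetroArch's core info files; the point is only that a cartridge extension usually has one candidate while a disc image extension has several.

```python
from pathlib import PurePath

# Illustrative, hand-written table (an assumption, not RetroArch data):
# which playlists could plausibly claim a given extension.
EXTENSION_CANDIDATES = {
    ".gg": ["Sega - Game Gear"],
    ".gba": ["Nintendo - Game Boy Advance"],
    ".iso": ["Sony - PlayStation Portable", "Nintendo - GameCube", "Sony - PlayStation 2"],
    ".cue": ["Sony - PlayStation", "Sega - Saturn"],
}


def guess_playlist(filename):
    """Return the playlist name only when the extension is unambiguous."""
    suffix = PurePath(filename).suffix.lower()
    candidates = EXTENSION_CANDIDATES.get(suffix, [])
    return candidates[0] if len(candidates) == 1 else None


cartridge = guess_playlist("Sonic (T-Eng).gg")   # one candidate
disc = guess_playlist("Some Game.iso")           # several candidates
```

Here `cartridge` resolves to a playlist, while `disc` comes back as `None` because the extension alone cannot pick between the candidate systems.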

@andiandi13
Author

@bparker06 : Again, it doesn't have to be that complicated.

  • Retroarch would keep its CRC-based scan

  • A new option would be added into playlist menu

  • That option is just a folder-based scan

So, for CDs (PSX, PSP, GC...), You'd just have to create the appropriate folder, or Retroarch would create them as soon as the option is activated, and you put your .ISO files in the appropriate folders.

I don't know what easier solution I could suggest.

@ghost

ghost commented May 22, 2019

Not everyone wants to make folders.

Where is the playlist name going to come from?

How do you know what database to associate each entry with?

Thumbnails aren't going to work.

Playlist asset icons won't work.

Database metadata won't work.

What's the point now? Might as well just use Load Content.

I promise it's not actually easy to make even the majority of users happy.

@andiandi13
Author

andiandi13 commented May 22, 2019

You obviously didn't read my previous posts.

Not everyone wants to make folders.

No one is forced to use that new option + Retroarch can create them for us

Where is the playlist name going to come from?

From the fact that files are inside folders named like playlists. E.g. Game Gear ROMs have to be put in a folder named "Sega - Game Gear" (which sits under a certain path that could be changed in the Directories options, e.g. "retroarch/roms" or just "/", the same way you set the playlists or thumbnails folder in the options).

How do you know what database to associate each entry with?

I never said that CRC32 would not work! ROMs would still be scanned: known ROMs would get their information from the database, and unknown ROMs would not. You know, I create my playlists manually on Windows, and thanks to CRC32, my ROMs have database info.

Thumbnails aren't going to work.

Yes they are: as long as your ROMs are well named (like the thumbnails) and put in the good folder, thumbnails can be associated. How? Just by taking the filename and putting it as the title in the playlist, like "smart" software can do on Windows.

Playlist asset icons won't work.

They would work

Database metadata won't work.

CRC32
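
The filename-to-thumbnail matching described above could be sketched in Python as below. The `Named_Boxarts` folder name and the set of characters replaced by `_` are how I recall the libretro thumbnail convention, so treat both as assumptions to check against the docs; `thumbnail_path` is a made-up helper name.

```python
from pathlib import PurePosixPath

# Characters assumed to be replaced with "_" in thumbnail filenames.
UNSAFE = "&*/:`<>?\\|"


def thumbnail_path(thumbnails_root, playlist_name, label):
    """Derive the expected boxart path from the playlist name and entry label."""
    safe = "".join("_" if ch in UNSAFE else ch for ch in label)
    return PurePosixPath(thumbnails_root) / playlist_name / "Named_Boxarts" / (safe + ".png")


boxart = thumbnail_path("thumbnails", "Sega - Game Gear", "Sonic & Tails (Japan)")
```

A label such as "Game (T-Eng 100%)" would simply point at a file that does not exist, which is the 'no cover, not a big deal' case from the original post.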

@ghost

ghost commented May 22, 2019

I meant that people aren't going to want to have specifically named files and folders in order for all the features/effects of the scanner and databases to work, whether or not the CRC/serial/etc. is checked. And that creates problems.

@andiandi13
Author

andiandi13 commented May 22, 2019

If people don't want that, they just have to download clean ROMs with clean CRC and they're good.

What I suggest here is an extra option, an advanced option, why not a hidden option.

Other CRC problems can be dealt with in other issues.

But that feature would satisfy everybody making their Playlist manually on Windows/Mac/Linux, and I know they are numerous.

It's a hundred times easier for them, for us, than going through another device, managing Playlist, copying them to the target device. It's so painful, especially when you take new ROMs quite often.

If naming folders is so painful, Retroarch could let us choose custom paths for each console, as it does with core by default.

@i30817
Contributor

i30817 commented May 22, 2019

I meant that people aren't going to want to have specifically named files and folders in order for all the features/effects of the scanner and databases to work, whether or not the CRC/serial/etc. is checked. And that creates problems.

This idea (both his and mine) is supplementary to the agnostic scanner that exists today and that allows zero customization. Frankly the main problem i have with the scanner is that it doesn't allow any adjustment in a sane manner, and the second is the false negatives that this would help workaround.

Don't get me wrong, I'm the first one to criticize things like 'scummvm files' for the scummvm core, and I'm the first one to wish the scanner were more specialized in that case and able to pick up the game folders without that 'help', which causes much, much more work for the users. However, even in this example, imagine that tomorrow the scanner gained the ability to recognize scummvm-compatible game folders without the extra files (from upstream). Would it be enabled? No, precisely because it would slow down the general scanner, and the scanner doesn't know when to use that strategy except through the files strategy that already exists, so there is 'no reason' for the feature to be added. (Except there is: with a control strategy, the scanner could use the scummvm strategy when it encounters a single file in the root of the 'scummvm games folder', instead of one file per game with different contents for hundreds of games.)

So yes, i support a 'hidden' way to configure the scanner that won't bother people that don't care, and i support a more complex version than just renaming directories (the file algorithm i opened in #8672 ) because it allows more and better control and would provide better chances for specialization even in the scanner (and whitelisting, and 'two games on a single dir' scanning, or even 'just use files of this type to launch the game and forget about game CRC/name checking' etc).

For instance, there are some cores that already have a specialized 'launcher file' in upstream, with several config features, that simply can't be used as-is in the scanner for some reason or other (mainly being useless for id, because they're user made, not provided by the developer with the original game or by the dumper). Dosbox.conf files are an example, but not the only one. Just in this issue there is a dev writing about DOOM launcher files.

@diegopau

diegopau commented Aug 6, 2019

I am very new to RetroArch and have little knowledge about the internal works of it. I am using Retroarch in a Playstation Classic (Retroboot). I think I can be considered the "regular user" so I wanted to share my experience with this issue:

When I first tried Retroarch I moved my collection of Mega Drive roms to a "megadrive" folder and then I used the Scan Folder functionality inside Retroarch. Honestly I expected this to just work, since all the roms are either .gen, .md, .smd or a zip file with the rom in any of those formats, and all I wanted was to have them on a playlist that can be quickly accessed. After scanning them I found out that around half of the games were there and the other half were not.
My reaction to this was to take a look to the folder structure and try to match the folder names of thumbnails, playlists, roms folder. This time i was really convinced I understood what the problem was, I even renamed a few files to match the thumbnail names. I thought: "if all the folders have the same name, Retroarch will already know that these are Mega Drive files because I am matching the name "Sega - Mega Drive - Genesis" everywhere, and the thumbnails have same name as the files... this has to work!"

After another scan the problem was still there. I could see some thumbnails now, but big part of the roms were not there. Those roms come from different sources across time, some are hacks and translations, all or at least most worked always with different emulators.
I then found out that Retroarch tries to match every single file to a database, and if it doesn't match, the game is not added. It was a disappointment. I tried to look at the database file but it doesn't seem to be in plain text. I gave up. My father was also struggling with the same thing when adding his ZX Spectrum games, some of which he programmed himself decades ago.
Yes, you can always manually load a game and add it to favorites, but it is not the same. What we love about Retroarch is precisely that feeling of having your full collection there, tidy, with playlists and thumbnails, and right now that is only possible with lots of manual editing to build playlists by hand and basically bypass Retroarch's way of doing things.

Another problem I see with not having an option to build playlists that bypasses the database matching is that scanning takes a huge amount of time in some cases. If you have a big collection and you just added one new game, it seems to try to match everything again, and sometimes that means leaving the Playstation Classic working for hours to add a single game. Some people like me just wish to add the games to a playlist on a basis of 1 entry per file or 1 entry per subfolder, and completely skip any match against a database. Then the thumbnails just have to match the filename, which in my opinion is much easier for the average user to understand than figuring out what specific ROM you have to get (if you can) so that it is accepted by the scanner, or manually editing playlists (I actually tried and I wasn't successful with that).

I just wanted to share my point of view as a new user. Maybe many of the things I comment here or suggest comes from too much ignorance about how it should work and they are just not possible. Thank you for reading. And big thanks for Retroarch!
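
The 'add one new game without rescanning everything' case from this comment might look like the following Python sketch. It assumes a playlist held as a dict with an `items` list keyed by `path`, and `add_new_entries` is a hypothetical helper, not an existing RetroArch function.

```python
from pathlib import PurePath


def add_new_entries(playlist, file_paths):
    """Append only files not already listed; existing entries are left alone.

    No database matching is done: the label is just the filename stem.
    """
    known = {item["path"] for item in playlist["items"]}
    added = []
    for path in file_paths:
        if path not in known:
            playlist["items"].append(
                {"path": path, "label": PurePath(path).stem, "crc32": "DETECT"}
            )
            known.add(path)
            added.append(path)
    return added


playlist = {"items": [{"path": "roms/md/Sonic.md", "label": "Sonic", "crc32": "DETECT"}]}
added = add_new_entries(playlist, ["roms/md/Sonic.md", "roms/md/Homebrew Demo.md"])
```

Only the unseen file is appended, so the cost grows with the number of new files instead of the size of the whole collection.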

@Mte90

Mte90 commented Oct 18, 2019

I am just a user who discovers a new homebrew game every month, loads it in RetroArch and does a scan, but it is not found.
After a month without playing with RetroArch I don't remember the game's name and I have to browse the folders in the UI to launch it.

It is very disappointing not to have an option to add a game that is not in the database to a playlist by folder.
I could also accept a prompt asking for confirmation, but it is very annoying to have to use another computer for this task...

@wafflesinmybauble

I don't mean to rez bump this, but just as a heads up to anyone, make sure that your scan path doesn't include any escape characters at the beginning of a folder name, such as an underscore. For example "_NoFiller"

@i30817
Contributor

i30817 commented Dec 20, 2019

How the hell is underscore considered a 'escape' character (do you mean it's ignored by the scanner?)

@RobLoach
Member

RobLoach commented Jan 26, 2020

This discussion has diverged a long way from its original intent. The Manual Scan is in. I suggest you use that for odd cases like what are discussed.

@libretro libretro deleted a comment from andres-asm Feb 23, 2020
@libretro libretro deleted a comment from andres-asm Feb 23, 2020
@borrelnoten

I don't know why this topic is closed. I can't put my Atari ST ROMs in a playlist. I tried to scan manually but still no luck. They are not original games but hacked menu disks from the crack scene (multiple compressed games on one disc, decompressed when loading a game).
I want to have a list of my Atari discs so that I can play them without having to select them manually (loading the entire directory of ROMs every time I select one).

Same goes for the Amiga discs (HDR extension), which are not picked up for some reason. The manual scan does not work. I have also tried to make the LPL file manually, and that works! But not for long, because after selecting and playing a disc from the list, the list is updated and is empty after returning from the emulated game to the RetroArch menu / playlist.

I really would love to see Atari ST and Amiga discs in the playlist because they have truly historic value for me.
I do not understand why that is so difficult, because it is simply a file list without extension. I do not care about auto detection at all. Also, why not just accept the manually crafted playlist? Why destroy it after selecting an item in it? I love RetroArch but the playlist thing is really not logical to me.

@RobLoach
Member

I do not understand why that is so difficult because it is simply a simple file list without extension. I do not care about auto detection at all.

Unfortunately it's not that simple. How do we know what system it's for? There are lots of cores, all with different use cases, and platforms. A file without an extension could be associated with virtually any of them. There has been a lot of clean up in the scanner to add support for these platforms.

Also, why not just accept the manually crafted playlist? Why destroy it after selecting an item in it?

No one is saying that manually crafted playlists are a bad thing. If a manual scan is destroying the existing playlist, this sounds like a new issue. Feel free to create one detailing reproduction steps and what you expect to happen.
