Flexible thumbnail matching #16040

Merged 1 commit on Dec 27, 2023

zoltanvb
Contributor

Description

Add logic to handle 3 possible thumbnail names, tried in the following order:

  • most exact name derived from content file (same name, with .png extension)
  • usual name derived from playlist (usually coming from database)
  • shortened name up to first bracket, chopping off region/publisher etc. info

Example:
007 - The Living Daylights (1987)(Domark).png
007 - The Living Daylights (Domark).png
007 - The Living Daylights.png
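
The three names in the example can be derived mechanically. The following is a minimal Python sketch for illustration, not the code from this PR (RetroArch itself is written in C). The function name `thumbnail_candidates`, the `/roms/...tzx` path, and the choice to cut at the first `(` only are assumptions.

```python
from pathlib import PurePosixPath
from typing import List


def thumbnail_candidates(content_path: str, playlist_label: str) -> List[str]:
    """Return the three thumbnail file names in the order they are tried."""
    # 1. Most exact: content file name with its extension swapped for .png.
    exact = PurePosixPath(content_path).stem + ".png"
    # 2. Usual: the playlist label (typically the database title).
    usual = playlist_label + ".png"
    # 3. Short: label up to the first opening bracket, trailing spaces trimmed.
    #    Assumption: only "(" counts as a bracket in this sketch.
    short = playlist_label.split("(", 1)[0].rstrip() + ".png"
    return [exact, usual, short]


# Hypothetical content path; the label is the database-style name.
names = thumbnail_candidates(
    "/roms/007 - The Living Daylights (1987)(Domark).tzx",
    "007 - The Living Daylights (Domark)",
)
```

With these inputs, `names` holds the three file names listed in the example, most specific first.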

On the local file system, all three names are always checked. In theory, this could have a small performance impact.
For thumbnail downloads, the names are tried in turn, one each time the item comes up in the playlist, so it may take going back and forth up to 3 times for a thumbnail to appear. On the positive side, failed thumbnail downloads are no longer repeated for the same playlist, which was not the case earlier.
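
The download behaviour described above can be sketched as a small per-entry state machine. This is an illustrative Python sketch under stated assumptions, not RetroArch's implementation: `DownloadTracker`, `on_entry_shown`, and the `fetch` callback are hypothetical names, and one tracker instance is assumed to live for as long as one playlist is open.

```python
from typing import Callable, Dict, List, Optional


class DownloadTracker:
    """Remembers, per playlist entry, which candidate name to try next."""

    def __init__(self) -> None:
        # entry id -> index of the next candidate name to try
        self._next: Dict[str, int] = {}

    def on_entry_shown(self, entry_id: str, candidates: List[str],
                       fetch: Callable[[str], bool]) -> Optional[str]:
        """Try at most one name per visit; return the name that was fetched."""
        index = self._next.get(entry_id, 0)
        if index >= len(candidates):
            return None  # already found, or every name failed: do not retry
        name = candidates[index]
        if fetch(name):
            self._next[entry_id] = len(candidates)  # done with this entry
            return name
        self._next[entry_id] = index + 1  # next visit tries the next name
        return None


# Hypothetical server that only holds the short name.
remote = {"007 - The Living Daylights.png"}
requested: List[str] = []


def fake_fetch(name: str) -> bool:
    requested.append(name)
    return name in remote


tracker = DownloadTracker()
candidates = [
    "007 - The Living Daylights (1987)(Domark).png",
    "007 - The Living Daylights (Domark).png",
    "007 - The Living Daylights.png",
]
results = [tracker.on_entry_shown("entry-1", candidates, fake_fetch) for _ in range(4)]
```

In this run the thumbnail only arrives on the third visit, and the fourth visit issues no further request.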

Related Issues

I intend this as an enabler for improving the thumbnail situation: enhance match ratio / reduce thumbnail repo size for databases (ZX Spectrum as an example) that contain several alternatives for the same game, typically gathered from Tosec. Using a shorter name, one thumbnail would be enough for e.g. box art, while the option is still there to show different title / snapshot screens. There are already several bit-by-bit duplicates in many repos under different names.

Related Pull Requests

This PR makes #16022 ineffective. However, I did not remove it, as it could later serve as a setting for customizing the order above, if it turns out that one size does not fit all.

Planned future PRs: harmonizing database titles to follow No-Intro naming even if generated from Tosec, improving playlist display with subtitles and/or customizable title scheme, updating thumbnail repos.

@i30817
Contributor

i30817 commented Dec 26, 2023

Do you mix thumbnails or not on repeated partial matches?

The two situations I have in mind:

  1. There is already at least one thumbnail out of boxart, screenshot and title.
  2. First time (no thumbs), but there are disjoint sets of those under the different names.

Anyway, having written a fuzzy-match thumbnail downloader myself, I think this is an effective way to fuzzify across several sets, especially since many of the thumbnail server directories already contain this sort of duplicate, so I agree it's a win.

There is a problem with this, though: sometimes a set will have duplicate games under different names where the only difference is the publisher metadata. This happened more on older consoles, where the publisher sometimes went under and it was the same game after all. But sometimes it's not the same game (the Amiga set has 4 games from different devs called Barbarian and Barbarian 2, which is a little annoying when using the 'chop it off' strategy).

The way to fix this is, of course, to add the longest matching thumbnail for the set, but sometimes the set is not the approved one (which is why you want fuzzy matching in the first place) and you're SOL.

To go further would require (much) more aggressive normalization on both sides of the transfer. Here is my method for it to horrify you.

My method tries to do fuzzy matching after normalization, but sometimes I'm forced to use the chop-it-off method anyway, because the sets' metadata formats are so different that they can't cross the threshold without it. That's what the two ifs at the start of the normalization method are for.

Speaking of that, you might not want to delete the [] delimiters along with the () delimiters, because, quite simply, hacks are often not the original game (although people often change the main name in hacks that are so different they're total conversions, so it's probably not a big deal).

@zoltanvb
Contributor Author

The solution in the PR applies the naming scheme individually to each of the 3 thumbnails (boxart, title, snap) and eventually arrives at downloading whatever is available. So in that sense mix and match works, with one exception: if a boxart, a title and a snap each already exist locally, there is no online check for a more specific match.

But it does not do any fuzzy prefix matching, only the 3 alternatives: file name, database name, short name. Selecting the "proper" image could get tricky, but I believe this solution will improve things without adding a huge amount of complexity. Very specific hacks etc. can still be matched using the full file name.
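
To illustrate the per-type behaviour, here is a minimal Python sketch of the local lookup. It is not the PR's code. `resolve_thumbnails` and the "Example Game" file names are hypothetical, and the three directory names are assumed to follow the layout of the libretro thumbnail repositories.

```python
from typing import Dict, List, Optional, Set

# Assumed thumbnail sub-directories, one per thumbnail type.
THUMBNAIL_DIRS = ("Named_Boxarts", "Named_Titles", "Named_Snaps")


def resolve_thumbnails(existing: Set[str],
                       candidates: List[str]) -> Dict[str, Optional[str]]:
    """For each thumbnail type, pick the first candidate name that exists."""
    resolved: Dict[str, Optional[str]] = {}
    for directory in THUMBNAIL_DIRS:
        # Each type is resolved on its own, so the results can mix names.
        resolved[directory] = next(
            (name for name in candidates if f"{directory}/{name}" in existing),
            None,
        )
    return resolved


# Candidates in priority order: file name, database name, short name.
candidates = [
    "Example Game (1990)(Acme)[a].png",
    "Example Game (Acme).png",
    "Example Game.png",
]
existing = {
    "Named_Boxarts/Example Game.png",
    "Named_Snaps/Example Game (Acme).png",
    "Named_Snaps/Example Game.png",
}
resolved = resolve_thumbnails(existing, candidates)
```

Here the boxart falls back to the short name, the snap uses the more specific database name, and the title is left unresolved, which is the mix-and-match case discussed above.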

Barbarian / Barbarian II is a good example of how things can be improved :) There are 21 and 59 entries in the database for Barbarian and Barbarian II respectively, and there are 2 / 5 boxart thumbnails with exactly the same content. After this modification, boxart could be just renamed to e.g. "Barbarian.png" and can still be shown whatever version one has dumped - with the exception of the cases where the title is also changed, such as "Barbarian - The Ultimate Warrior".

@i30817
Contributor

i30817 commented Dec 26, 2023

No, the problem with Barbarian is that the games are actually different, from different devs, not just publishers. There are 2 Barbarian games made by Palace and 2 made by Psygnosis. One series was probably a C64 or Spectrum port.

Commodore screwed up. These cases get more uncommon once lawyers and trademarks get heavily involved in the industry.

If you're trying to make this to reduce the number of thumbnails, I'm not quite sure it will work as you expect.

I personally already reduced the number of tags when I did the PS1 thumbs (removing the disc markers for m3u files) and the ScummVM ones too (I added the most specific ScummVM names, taken directly from scummvm.ini, plus the tagless version). But making a version with all the markers removed obligatory will cause a lot of churn in some sets, I think, for no better results, because the regional tag is often very important for covers. On the PS1 and SNES in particular, it's often the difference between art (Japanese covers) and cringe (US\EU covers).

I agree with this PR though, as long as thumbnails aren't removed thoughtlessly from the server. I consider mixing thumbnails a disadvantage, but it's not so bad in most cases, where the set of 3 thumbnails is complete; it will only happen where the thumbnail databases are incomplete (and I can just keep using my utility personally). Avoiding it is complicated and isn't how the RetroArch thumbnail downloader works right now (it always replaces and downloads everything regardless of what exists), so it's not a big deal.

@i30817
Contributor

i30817 commented Dec 26, 2023

If you're interested, there is one way you could reduce the number of thumbnails a lot.

But it's super tricky. And RetroArch already supports something like that if you're not casual. To the point I just did it myself externally.

In short: for every game with multiple disks, diskettes or discs on the playlist, the entries become a single one, as if the user had scanned an m3u (but without being forced to create the m3u).

What I did was write a utility that creates m3u files: iterate over the dirs, collect all relevant image formats, filter out the images referenced by cue\toc files, sort them, strip the 'disc constructs' into a tuple together with the original name, then iterate over the real names and add them to an m3u, starting a new m3u whenever the stripped name changes. The tricky part is the sorting, so that it supports things like '(Boot)', '(Boot disk)', '(Saves)' or '(Save disk)'. As a result, all my games start with a single image set and playlist entry.

I remove only the disc constructs, because ignoring the rest of the tags, especially region, when sorting or creating m3u files is a bad idea when you have multiple region sets, or a compilation of two games with 'different' titles, one on each disc, distinguished by tags (like this and this).
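
The grouping idea can be sketched roughly as follows. This is a simplified Python illustration, not the utility mentioned above. `DISC_TAG`, `group_into_m3u` and the file names are hypothetical. The sketch only strips tags such as "(Disc 1)" or "(Disk 2 of 3)" and leaves region and other tags in the grouping key.

```python
import os
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed shape of a 'disc construct': "(Disc 1)", "(Disk B)", "(Diskette 2 of 3)".
DISC_TAG = re.compile(
    r"\s*\((?:diskette|disc|disk)\s*\w+(?:\s+of\s+\w+)?\)", re.IGNORECASE
)


def group_into_m3u(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Map each would-be m3u file name to the disc images it should list."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in sorted(filenames):
        stem, _ext = os.path.splitext(name)
        # Only the disc tag is removed; region tags stay in the key,
        # so different region sets end up in different m3u files.
        groups[DISC_TAG.sub("", stem)].append(name)
    # Single-image games do not need an m3u.
    return {key + ".m3u": files for key, files in groups.items() if len(files) > 1}


playlists = group_into_m3u([
    "Example Game (USA) (Disc 2).chd",
    "Example Game (USA) (Disc 1).chd",
    "Example Game (Japan) (Disc 1).chd",
    "Example Game (Japan) (Disc 2).chd",
    "Other Game (USA).chd",
])
```

The plain `sorted` call is a deliberate simplification: it would misorder "Disc 10" before "Disc 2" and does nothing special for '(Boot)' or '(Saves)' disks, which is the tricky sorting part described above.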

It would be a lot of work for something that may be too fiddly for RetroArch. But if it was 'standard' in the scans it would make a lot of thumbnails useless (namely, disc\diskette >1). Not to mention making playlists of fullsets smaller.

Instead of being done by name manipulation\sorting, it could also be done through database metadata (at the cost of only working with the automatic scanner and not with the manual scanner or random "unapproved" sets). I'm not enthusiastic about that option, but it would be good for casuals anyway, and much more robust and easier to code, at the cost of annoying manual-scanner users who don't create their own m3u if you do delete thumbnails from the server.

@zoltanvb
Contributor Author

Just to be clear: I do not want to immediately optimize away all the duplicates, maybe not even in the long run for some popular databases where a lot of work was put into crafting the existing set. Even if the PR is accepted and makes it into the next release, it will probably be a few years before most users have access to this. But a lot of new/improved DBs and snaps are being added nowadays, and those could benefit from this improvement without degrading the existing experience.

"making one where all the markers are removed obligatory will cause a lot of churn in some sets"

Indeed, but this change does not make anything mandatory. It is not a perfect solution; as you point out, it can get quite complex. Some of it is best left to custom solutions or even separate frontends. After all, 3 still images are not cutting-edge technology; one could have e.g. speedrun GIFs, title music audio loops or whatever. However, the rules in this PR are simple and can be understood by anyone wishing to fine-tune thumbnails, whether locally for their own use or in the common repo.

@LibretroAdmin merged commit 3ce56c5 into libretro:master on Dec 27, 2023
23 checks passed
@LibretroAdmin
Contributor

@i30817 If you have any further thoughts on this subject, open a new issue and tag @zoltanvb in it. That way, you guys can perhaps converse further on this.
