Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add many nes hacks and translations #619

Closed
wants to merge 1 commit into from
Closed

Conversation

i30817
Copy link
Contributor

@i30817 i30817 commented Apr 15, 2018

Same as the closed PR.

I verified that RA checksums don't skip the header like redump dats do so the output rom() entry checksums are of the whole patched file.

This was made by
having a cli version of flips on the same dir as the script (built with make CLI=1)
having a no-intro (xml format) dat for nes to pass (to use the no-intro names when possible instead of the names on the romhacking page)
and using this script:
edit, script updated. edit: twice:
https://gist.github.com/i30817/c5332248f46113fcb4ca03081f7673f2

with the following arguments:
./makedat.py NES/ nes -d no-intro-nes.dat

This verifies the games against the no-intro dat and if it finds it, use that name, but if it doesn't, warn use the romhacking.net page name. In effect this only 'warned' on unlicensed games translations (since i had no hacks for unlicensed games).

Hacks that aren't translations use the name on the redump page always but its extended description might use the dat name if it's found because those are more likely to match the local rom name than the romhacking entry. But if not found still use romhacking 'secondary' title. I haven't a example of this because i've not seen a hack that isn't a translation of unlicensed games on my collection.

A warning: although this calculates the right checksum values on the where you combined two hacks into a single softpatch file, you'll end up with 'only' the first hack link on the readme and the name of that hack.
On these files, the only one that had that problem was Castlevania 3: Improved Controls + Localization / Title Screen Redone [Hack by NaOH and ShadowOne333] which i edited manually.

I also didn't allow to add romhacks that flips said failed the checksum when applying the rom so some hacks won't be here (besides the ones that i never downloaded anyway). One example of such is Castlevania: Chorus of Mysteries, which is supposed to apply to some janky rom that doesn't seem to be on the net anymore.

@i30817
Copy link
Contributor Author

i30817 commented Apr 16, 2018

please don't merge this yet, i have a better version of the script about to work that handles multiple patches on the descriptions. The script is also handy for finding updates to patches you have...

@i30817
Copy link
Contributor Author

i30817 commented Apr 16, 2018

Ok, can be review i guess. The rest is a question of quote this, separate that. Right now the script can support multiple links in a 'version' file and adds info about all of the patches. There is a complicated algorithm on the script but it basically has this idea:

    #main name priority:
    # first hack with zero translations > rom_title in dat > last translation > first hack
    # secondary name be:
    # [(last)T-lang by author] for 1+ T and 0 hacks (translations are only multiple if addendums)
    # [(last)T-lang by author + # Hacks] for 1+ T and 1+ hacks + 
    # [(first) Hack by author] for 0 T and 1 hack
    # [(first) Hack by author + # hacks] for 0 T and 1+ hack
    # (# is always equal to (number of translations + number of hacks -1) when it appears

And the typical output is:

game(
    name "Lost Vikings II - always stereo [Hack by sluffy + 1 hacks]"
    description "Lost Vikings II - always stereo hack by sluffy version (1.0) + Lost Vikings II - password saving hack by sluffy version (1.0)"
    rom ( name "Lost Vikings 2 (USA).sfc" size 1048576 crc 683eeea8 md5 fd2b2810f31299f3ebf070e7062614b2 sha1 296a418b12958e5432ea6926376ce810450de69f )
    comment "http://www.romhacking.net/hacks/3782/"
    comment "http://www.romhacking.net/hacks/3783/"
)

I'll open a snes one since it's so easy to parse if you have the version files.

@RobLoach
Copy link
Member

There are a large amount of entries here... Could we just bring in the top 100 or so? The larger we make the database, the slower scanning will be.

@i30817
Copy link
Contributor Author

i30817 commented Apr 16, 2018

There is a large amount of translated roms over the decades. I'm against cutting it. And i wouldn't know how to prioritize even if i wanted.

Notice that these are only the roms i have (i didn't find a secret romhacking dat), so there are even more floating around. Conservatively, a lot more.

Besides, these are nes and snes roms. What actually destroys the scanner is cd platforms without serial detection (saturn etc).

@ghost
Copy link

ghost commented Apr 16, 2018

to be honest, my previous comment(the other PR) was sarcastic... there are too many hacks/translations and even adding them all wont make it a guarantee that everyone will be able to have the same crc since nes headers can have extra bits (even on the needed 1st 7 bytes) that affects crc. for example mirroring can be horizontal or vertical for mapper-controlled games.

retroarch should support skipping headers for nes/fds when scanning/calculating crc.

@RobLoach
Copy link
Member

I'd also REALLY like libretro/RetroArch#2033 as an alternative solution around this. Just allow people to add the hacks without even needing them in the database.

@i30817
Copy link
Contributor Author

i30817 commented Apr 16, 2018

the whole point of having the database is having the metadata. Cutting off a large amount of data the users use because 'no this is too much' is already what RA does badly. Adding without the scanner is nice to speculate on but it hasn't happened for years and even if it did, it wouldn't give the user experience of this.

It's not like this data is particularly badly behaved. As long as you have the hacks, the script and one metadata file with the version and the url on romhacking, you can regenerate these.

I don't have anything against skipping headers, but you should first do that work on the scanner on Ra if you want it to happen. It's per platform too.

@RobLoach
Copy link
Member

Cutting off a large amount of data the users use because 'no this is too much' is already what RA does badly.

The issue is scanning performance... The larger we make the database, the slower scanning gets.

@i30817
Copy link
Contributor Author

i30817 commented Apr 16, 2018

Doesn't sound like it's a big problem to make it a different db in that case that you need to download from the downloader to opt in.

But, i'm still not convinced about the performance argument simply because what really hurts there is scanning discs and cds that are unzipped and thus need to calculate crc32. For small files like these the crc would have to be calculated anyway if the files were on the database or not (except if they're zipped). Might as well make the work useful if you're going to take the performance hit from that.

@i30817
Copy link
Contributor Author

i30817 commented Apr 17, 2018

Let me make my point a little clearer: people will have these files on the roms folders already and when they request 'scan here' to RA, it will have no chance but to scan. If they aren't found it just makes the scan (infinitesimally) longer because they'd have to calculate CRC and search the whole database just to fail.

In fact that's one of the ways you could help speed up the Scanner. Order the database by filesize when built and create a binary search to the 'Rom to be scanned size'. Only calculate the CRC of the file if there is at least 1 match to the exact byte size and search upwards and downwards of the first byte size match until there is a CRC match or the size changes.

i dunno how effective this could be at filtering unnecessary CRC calculation but my guess it's at least 'ok' for non-database files of nonstandard sizes (ie: your ps1 etc redumps, if they weren't on the database) but not of 'standard' size (your Nintendo 'pseudo random' filled Wii discs, all 4.5 gb files have to calculate crc and search that section, which is bad obviously, so you use serials). For roms, not much benefits because they all have standard sizes but it could work to limit the search to their own section of the database even having a single database.

Though extension filtering is much more effective at avoiding random scan delays. it's unfortunate that these files have the same extension as what the scanner is searching for then. Adding them might actually speed it up (very slightly because most of the cost is CRC calculation as i said).

@RobLoach
Copy link
Member

I appreciate these additions but I think there are two issues....

  1. Adding things many entries will add bloat and slow down the scanner
  2. Like @retro-wertz mentioned, when someone applies the patch to their NES rom, it is not guarenteed to result in the same CRC, since the headers are not skipped when scanning in RetroArch.

The other PRs are looking great though. I'll go through and do some reviews.

Copy link
Member

@RobLoach RobLoach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind reducing the amount of entries?

@i30817
Copy link
Contributor Author

i30817 commented Apr 24, 2018

  1. i already explained why i think that the speed argument is bogus. Simply said, (again) the actual speed hit will happen if the user has rom files on the directory they scan not because they're on the database. Prior to matching they have to calculate the CRC and that happens regardless of if the roms are on the database or not, because to even find 'legit' files they have to scan unknown rom files. A few hundreds (or thousands) of entries on a database don't hold a candle to crc32 calculation.
    Which means any speed lost would be better regained on other places. Namely making crc32 calculation faster would be better, or adopting a two stage strategy for cds of 'only calculate crc32 if you need it instead of serials' (eg: there is a hack with the exact size of the rom for a simple strategy, binary search of that if you're really worried about database iteration).
    Quite simply i'm not convinced until i see non-synthetic profiling that the database iteration and not number of roms on a directory is the problem.

  2. i used the sane defaults of not using headers, which is completely possible because of ipsbehead. You can simply say to the users 'use nsrt to remove the header', To do otherwise would imply multiplying the entries by two (or more as seen on the n64 hacks that are thankfully few).

@i30817 i30817 force-pushed the nes-dat branch 2 times, most recently from 8d965ca to bb6d3e5 Compare April 28, 2018 07:46
@i30817 i30817 closed this May 6, 2018
@i30817 i30817 deleted the nes-dat branch May 6, 2018 15:44
@i30817 i30817 restored the nes-dat branch May 6, 2018 15:44
@i30817 i30817 deleted the nes-dat branch July 4, 2018 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants