Skip to content

fix(screenscraper): improve API handling, KO scrape data, and metadata sanitization#3384

Merged
gantoine merged 1 commit into
rommapp:masterfrom
Spinnich:pr/screenscraper-api-and-redaction
May 20, 2026
Merged

fix(screenscraper): improve API handling, KO scrape data, and metadata sanitization#3384
gantoine merged 1 commit into
rommapp:masterfrom
Spinnich:pr/screenscraper-api-and-redaction

Conversation

@Spinnich
Copy link
Copy Markdown
Contributor

@Spinnich Spinnich commented May 17, 2026

Description
Explain the changes or enhancements you are proposing with this pull request.

Several related fixes to RomM's ScreenScraper integration. Grouped here because they all touch the same request/response path and the same test file, but kept as one focused PR (no migration / no schema changes).

1. KO scrapes now submit useful data

Previously, hash-based lookups against jeuInfos.php only sent crc/md5/sha1 (and systemeid / romtaille). When ScreenScraper couldn't match the hash, the resulting KO scrape entry on their side had no filename or ROM type recorded — so it was effectively useless for SS to build their database from.

ScreenScraperService.get_game_info already accepted rom_name and rom_type parameters, they were just never being passed. This PR wires them up at the call site:

  • romnom is now sent as the actual filename of the largest ROM file in the rom_files set.
  • romtype is computed by a new _get_rom_type() helper that maps file extension to ScreenScraper's expected values:
    • dossier — non-top-level files (inside a folder)
    • iso.iso, .cue, .chd, .gdi, .cdi, .bin
    • rom — everything else

Net effect: when a ROM doesn't match SS's database, the KO entry SS records now contains enough information to be actionable for them.

2. SS-specific HTTP error code handling

ScreenScraper uses non-standard HTTP status codes and returns 401 in situations that aren't actually auth failures. The error handler is updated to:

  • 426 → raise 403 "ScreenScraper has blacklisted this application version. Please update RomM."
  • 430 → raise 429 "ScreenScraper daily scrape quota exhausted. Try again tomorrow."
  • 431 → raise 429 "ScreenScraper daily unrecognized-ROM quota exhausted. Try again tomorrow."
  • 423 → raise 503 "ScreenScraper API is currently offline."
  • 401 → SS quirk: returned when their server CPU is >60%, not when credentials are bad. Now logs a warning and returns an empty dict instead of treating it as an auth error.
  • 429 → log warning and retry after 2s (was previously silent).

The same error mapping is applied in both the generic _request path and the get_game_info-specific exception handler.

3. ZZZ(NOTGAME) filtering

ScreenScraper marks non-game entries (BIOS files, demos, etc.) either via a notgame: "true" field or by prefixing the name with ZZZ(NOTGAME). A new _is_notgame() helper checks both, and these entries are now filtered out from:

  • Search results in _search_rom()
  • Hash-lookup results in lookup_rom()
  • Name-based matching in get_matched_roms_by_name()

Without this, a hash collision with a BIOS file or similar would surface a non-game match as if it were the user's ROM.

4. HTML entity sanitization in metadata text

SS returns &, &, ',  , ", © literally in names and summaries. A _decode_html_entities() helper now decodes the common ones before metadata is stored, so titles like "Donkey Kong & Diddy's Kong Quest" render as "Donkey Kong & Diddy's Kong Quest".

5. Strip ScreenScraper credentials from log output

Adds a regex pass in backend/logger/formatter.py that strips ssid / sspassword / devid / devpassword query parameters from any URL that appears in log output.

The existing storage-layer stripping (strip_sensitive_query_params in extract_media_from_ss_game()) handles the database side — this complements it at the logging side so URLs surfaced by httpx, exception traces, or future debug logging are also scrubbed.

6. Misc. small fix

  • get_rom() now early-returns if the cleaned-up search term is empty (avoids spurious SS calls for filenames that are entirely tags).

Test coverage

  • tests/adapters/services/test_screenscraper.py — new tests for each of the SS-specific HTTP error codes (KO/426/430/431/423/401-CPU-throttle), plus the 429 retry path.
  • tests/handler/metadata/test_ss_handler.py — new file. Covers _is_notgame, _get_rom_type, _decode_html_entities, the notgame filtering in search/hash/name paths, the empty-search early-return, and the existing credential-stripping behaviour for stored media URLs.

Note

This PR is one half of an earlier combined PR that was split per maintainer feedback. The companion PR (CHD raw hashing) touches a disjoint set of files and can be reviewed in parallel.

Note

AI assistance disclosure: This PR was developed with assistance from Claude (Anthropic). Claude contributed to authoring portions of the original code, and was used to split the original combined branch into smaller per-feature PRs, draft this description, and verify tests / lint / migrations locally. All code was reviewed and is endorsed by me before submission.

Checklist
Please check all that apply.

  • I've tested the changes locally
  • I've updated relevant comments
  • I've assigned reviewers for this PR
  • I've added unit tests that cover the changes

@Spinnich Spinnich force-pushed the pr/screenscraper-api-and-redaction branch from a69d22c to 374eded Compare May 19, 2026 15:52
@Spinnich
Copy link
Copy Markdown
Contributor Author

image

Comment thread backend/adapters/services/screenscraper.py
Comment thread backend/handler/metadata/ss_handler.py Outdated
Comment thread backend/handler/metadata/ss_handler.py Outdated
Comment thread backend/logger/formatter.py Outdated
@gantoine gantoine self-assigned this May 19, 2026
Fix several issues in ScreenScraper API request/response handling:

- Correctly handle SS-specific HTTP error codes (KO responses, 429, 431,
  and the SS-quirk of returning 401 when server CPU >60%).
- Construct requests with proper parameter encoding so jeuInfos lookups
  and search queries return the expected results.
- Store media URLs returned by SS as-is, preserving the dev credential
  query parameters required for media playback. Removing them broke
  downstream media fetches.

To keep dev credentials out of log output, add a redacting formatter in
the logger pipeline that scrubs ssid/sspassword/devid/devpassword query
parameters from any URL it sees.

Test coverage added for the new HTTP error paths and the as-is URL
storage behaviour.
@Spinnich Spinnich force-pushed the pr/screenscraper-api-and-redaction branch from 374eded to 12c424f Compare May 19, 2026 23:39
@gantoine gantoine merged commit 052245f into rommapp:master May 20, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants