fix(screenscraper): improve API handling, KO scrape data, and metadata sanitization#3384
Merged
gantoine merged 1 commit on May 20, 2026
gantoine
requested changes
May 19, 2026
Fix several issues in ScreenScraper API request/response handling:

- Correctly handle SS-specific HTTP error codes (KO responses, 429, 431, and the SS quirk of returning 401 when server CPU >60%).
- Construct requests with proper parameter encoding so jeuInfos lookups and search queries return the expected results.
- Store media URLs returned by SS as-is, preserving the dev credential query parameters required for media playback. Removing them broke downstream media fetches.

To keep dev credentials out of log output, add a redacting formatter in the logger pipeline that scrubs ssid/sspassword/devid/devpassword query parameters from any URL it sees.

Test coverage added for the new HTTP error paths and the as-is URL storage behaviour.
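The redacting formatter described in this commit message could look roughly like the sketch below. This is not the code from the PR: the names `redact_ss_credentials` and `RedactingFormatter` are hypothetical, and masking the values with `***` (rather than dropping the parameters entirely) is an assumption made for illustration.

```python
import logging
import re

# Matches the four ScreenScraper credential parameters and their values.
# The value runs until the next "&", whitespace, or quote character.
_SS_CREDENTIAL_RE = re.compile(
    r"\b(ssid|sspassword|devid|devpassword)=[^&\s\"']*",
    re.IGNORECASE,
)


def redact_ss_credentials(text: str) -> str:
    """Mask credential values in any URL embedded in `text` (hypothetical helper)."""
    return _SS_CREDENTIAL_RE.sub(r"\1=***", text)


class RedactingFormatter(logging.Formatter):
    """Formatter that scrubs credentials from the fully formatted record,
    so messages, arguments, and exception traces are all covered."""

    def format(self, record: logging.LogRecord) -> str:
        return redact_ss_credentials(super().format(record))
```

Redacting after `super().format()` runs means the scrub also applies to text that only appears once the record is rendered, such as traceback lines.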

Description
Several related fixes to RomM's ScreenScraper integration. Grouped here because they all touch the same request/response path and the same test file, but kept as one focused PR (no migration / no schema changes).
1. KO scrapes now submit useful data
Previously, hash-based lookups against `jeuInfos.php` only sent `crc`/`md5`/`sha1` (and `systemeid`/`romtaille`). When ScreenScraper couldn't match the hash, the resulting KO scrape entry on their side had no filename or ROM type recorded, so it was effectively useless for SS to build their database from.

`ScreenScraperService.get_game_info` already accepted `rom_name` and `rom_type` parameters; they were just never being passed. This PR wires them up at the call site:

- `romnom` is now sent as the actual filename of the largest ROM file in the rom_files set.
- `romtype` is computed by a new `_get_rom_type()` helper that maps file extension to ScreenScraper's expected values:
  - `dossier`: non-top-level files (inside a folder)
  - `iso`: `.iso`, `.cue`, `.chd`, `.gdi`, `.cdi`, `.bin`
  - `rom`: everything else

Net effect: when a ROM doesn't match SS's database, the KO entry SS records now contains enough information to be actionable for them.
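As a rough illustration of the extension mapping above, a minimal sketch follows. The function name, its signature, and the `is_top_level` flag are assumptions for this example; the actual `_get_rom_type()` helper in the PR may take different arguments.

```python
from pathlib import PurePosixPath

# Extensions that the PR description maps to the "iso" ROM type.
DISC_IMAGE_EXTENSIONS = {".iso", ".cue", ".chd", ".gdi", ".cdi", ".bin"}


def get_rom_type(file_name: str, is_top_level: bool = True) -> str:
    """Return the ScreenScraper romtype value for a file (hypothetical signature)."""
    if not is_top_level:
        # Files nested inside a folder are reported as "dossier".
        return "dossier"
    if PurePosixPath(file_name).suffix.lower() in DISC_IMAGE_EXTENSIONS:
        return "iso"
    return "rom"
```

The extension check is case-insensitive here, which is a choice made for the sketch and is not stated in the description.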
2. SS-specific HTTP error code handling
ScreenScraper uses non-standard HTTP status codes and returns 401 in situations that aren't actually auth failures. The error handler is updated to:
- `426` → raise 403 "ScreenScraper has blacklisted this application version. Please update RomM."
- `430` → raise 429 "ScreenScraper daily scrape quota exhausted. Try again tomorrow."
- `431` → raise 429 "ScreenScraper daily unrecognized-ROM quota exhausted. Try again tomorrow."
- `423` → raise 503 "ScreenScraper API is currently offline."
- `401` → SS quirk: returned when their server CPU is >60%, not when credentials are bad. Now logs a warning and returns an empty dict instead of treating it as an auth error.
- `429` → log warning and retry after 2s (was previously silent).

The same error mapping is applied in both the generic `_request` path and the `get_game_info`-specific exception handler.

3. ZZZ(NOTGAME) filtering
ScreenScraper marks non-game entries (BIOS files, demos, etc.) either via a `notgame: "true"` field or by prefixing the name with `ZZZ(NOTGAME)`. A new `_is_notgame()` helper checks both, and these entries are now filtered out from:

- `_search_rom()`
- `lookup_rom()`
- `get_matched_roms_by_name()`

Without this, a hash collision with a BIOS file or similar would surface a non-game match as if it were the user's ROM.
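The two checks can be sketched as below. The flat dict shape with `notgame` and `name` keys is an assumption made to keep the example self-contained; real ScreenScraper responses nest names differently, and the PR's `_is_notgame()` may read them from another field.

```python
NOTGAME_PREFIX = "ZZZ(NOTGAME)"


def is_notgame(entry: dict) -> bool:
    """Return True for entries flagged or named as non-games (hypothetical shape)."""
    # Check 1: the explicit flag, which arrives as the string "true".
    flagged = str(entry.get("notgame", "")).strip().lower() == "true"
    # Check 2: the naming convention used for non-game entries.
    prefixed = str(entry.get("name", "")).lstrip().startswith(NOTGAME_PREFIX)
    return flagged or prefixed


def filter_games(entries: list[dict]) -> list[dict]:
    """Drop non-game entries from a list of candidate matches."""
    return [entry for entry in entries if not is_notgame(entry)]
```

Applying the filter to candidate lists, rather than to a single best match, keeps a BIOS entry from shadowing a legitimate game that shares the same search result set.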
4. HTML entity sanitization in metadata text
SS returns HTML entities (the encoded forms of `&`, `'`, non-breaking space, `"`, and `©`) literally in names and summaries. A `_decode_html_entities()` helper now decodes the common ones before metadata is stored, so titles like `"Donkey Kong &amp; Diddy's Kong Quest"` render as `"Donkey Kong & Diddy's Kong Quest"`.

5. Strip ScreenScraper credentials from log output
Adds a regex pass in `backend/logger/formatter.py` that strips `ssid`/`sspassword`/`devid`/`devpassword` query parameters from any URL that appears in log output.

The existing storage-layer stripping (`strip_sensitive_query_params` in `extract_media_from_ss_game()`) handles the database side; this complements it at the logging side so URLs surfaced by httpx, exception traces, or future debug logging are also scrubbed.

6. Misc. small fix
`get_rom()` now early-returns if the cleaned-up search term is empty (avoids spurious SS calls for filenames that are entirely tags).

Test coverage
- `tests/adapters/services/test_screenscraper.py`: new tests for each of the SS-specific HTTP error codes (KO/426/430/431/423/401-CPU-throttle), plus the 429 retry path.
- `tests/handler/metadata/test_ss_handler.py`: new file. Covers `_is_notgame`, `_get_rom_type`, `_decode_html_entities`, the notgame filtering in search/hash/name paths, the empty-search early-return, and the existing credential-stripping behaviour for stored media URLs.

Note
This PR is one half of an earlier combined PR that was split per maintainer feedback. The companion PR (CHD raw hashing) touches a disjoint set of files and can be reviewed in parallel.
Note
AI assistance disclosure: This PR was developed with assistance from Claude (Anthropic). Claude contributed to authoring portions of the original code, and was used to split the original combined branch into smaller per-feature PRs, draft this description, and verify tests / lint / migrations locally. All code was reviewed and is endorsed by me before submission.
Checklist
Please check all that apply.