Skip to content

fix(discogs): accept Cat# identifiers (closes #101)#102

Merged
fabiodalez-dev merged 7 commits intomainfrom
feature/discogs-catno-support
Apr 22, 2026
Merged

fix(discogs): accept Cat# identifiers (closes #101)#102
fabiodalez-dev merged 7 commits intomainfrom
feature/discogs-catno-support

Conversation

@fabiodalez-dev
Copy link
Copy Markdown
Owner

@fabiodalez-dev fabiodalez-dev commented Apr 19, 2026

Summary

Fixes Hans' report on issue #101: Discogs plugin rejected CDP 7912682 (Bonnie Raitt — Nick of Time, Capitol 1989) because the old validateBarcode() only accepted 12/13-digit numeric barcodes. Any release older than barcode adoption, or any promo/limited pressing without a printed EAN, couldn't be scraped.

Discogs identifies releases through three distinct keys: EAN/UPC (barcode on sleeve), Cat# a.k.a. catalog number (printed on disc/spine/label — alphanumeric), and free-text (title + artist). MusicBrainz exposes the same split via its Lucene barcode: vs catno: filters.

Changes

  • validateBarcode() now accepts Cat# patterns via new isCatalogNumber() helper.
  • New public static identifierKind() classifies inputs as barcode / catno / unknown.
  • fetchFromDiscogs() routes catno inputs to ?catno= (not ?barcode=).
  • searchMusicBrainz() routes catno inputs to catno: Lucene field (not barcode:).
  • BMG-style pure-digit Cat# with internal separators (e.g. 74321 66847 2) are accepted via hasDigit && (hasLetter || hasSeparator) heuristic; bare 12/13-digit strings still classify as barcode (checked first, so no regression).

Test plan

  • php tests/discogs-catno.unit.php — 22/22 assertions green (ISBN, EAN-13, UPC-A, Cat# variants, rejection edge cases, classification)
  • PHPStan level 5 clean on modified plugin + new test
  • Manual: try scraping CDP 7912682 in a real install with Discogs plugin active (user verification)
  • Manual: verify existing EAN-13 scraping flows still work (5099902988023 etc.)

Notes

  • The unit test is pure PHP (no DB, no HTTP, no Playwright) — runs in <1s.
  • tests/*.unit.php whitelisted in .gitignore so future plugin unit tests can live alongside *.spec.js E2E tests.
  • phpstan.neon got a storage/plugins entry in scanDirectories locally (file is gitignored), which lets PHPStan resolve plugin classes from test files.

@coderabbitai review

Summary by CodeRabbit

  • Miglioramenti

    • Migliorata la gestione degli identificatori Discogs: riconoscimento e normalizzazione di codici a barre e Cat#, ricerche mirate per Discogs/MusicBrainz e fallback evitati per input non riconosciuti.
    • Scraping ISBN: ora viene usato l’identificatore originale quando la forma canonica non è un ISBN valido.
  • Test

    • Aggiunti test unitari e E2E e nuovi seed per coprire validazione, classificazione, normalizzazione e verifica admin.
  • Chore

    • .gitignore aggiornato per tracciare file di test e seed (json/sql/md).

…101)

Hans reported v0.5.8 still rejecting 'CDP 7912682' (Bonnie Raitt "Nick of
Time", Capitol 1989) — a valid Discogs Catalog Number. The old
validateBarcode only accepted 12/13-digit numeric barcodes, so any release
older than barcode adoption or any promo/limited pressing without a
printed EAN couldn't be scraped.

Discogs identifies releases via three keys: EAN/UPC (barcode on sleeve),
Cat# a.k.a. catalog number (printed on disc/spine/label — alphanumeric),
or free-text. MusicBrainz exposes the same split via its Lucene `barcode:`
vs `catno:` filters.

Changes:
- validateBarcode now accepts Cat# patterns via new isCatalogNumber()
- identifierKind() classifies inputs as 'barcode' / 'catno' / 'unknown'
- fetchFromDiscogs routes catno inputs to ?catno= (not ?barcode=)
- searchMusicBrainz routes catno inputs to catno: (not barcode:)
- BMG-style pure-digit Cat# with internal separators ("74321 66847 2")
  are accepted via hasDigit && (hasLetter || hasSeparator) heuristic;
  bare 12/13-digit strings still classify as barcode (checked first).

Test coverage: tests/discogs-catno.unit.php (22 assertions, pure PHP).
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Classifica identificatori in barcode/catno/unknown; aggiunge normalizzazione e builder di query MusicBrainz; aggiorna ricerche Discogs/MusicBrainz e persistenza per non trattare i catno come EAN; modifica ScrapeController per usare l’identificatore di ricerca originale quando non è ISBN canonico; aggiunti test, seed e aggiornamento .gitignore. (≤50 parole)

Changes

Cohort / File(s) Summary
Configurazione ignore
\.gitignore
Aggiunte eccezioni per tracciare test eseguibili e seed: !tests/*.unit.php, !tests/seeds/, !tests/seeds/* e reinclusione di tipi di file seed (*.json, *.sql, *.md).
Plugin Discogs
storage/plugins/discogs/DiscogsPlugin.php
Introdotte API pubbliche identifierKind(string): string, canonicalSearchIdentifier(string,string): string, buildMusicBrainzQuery(string): string. validateBarcode(...) estesa per riconoscere barcode e formati catno. Fetch ora sceglie barcode= o catno=, passa fallbackBarcode solo per barcode e evita chiamate MusicBrainz quando la query è vuota.
Controller scraping
app/Controllers/ScrapeController.php
byIsbn conserva $rawIdentifier, calcola $searchIdentifier (usa il raw se il clean non è ISBN valido) e lo passa a hook, fetch, fallback e log; error payload/404 aggiornati per riportare $searchIdentifier.
Test unitari PHP
tests/discogs-catno.unit.php, tests/discogs-catno-seeds.unit.php
Aggiunti runner standalone che verificano validazione, classificazione, canonicalizzazione e builder MusicBrainz contro fixture; escono con codice 1 su fallimenti.
Test E2E Playwright
tests/discogs-catno-documents.spec.js
Nuovo spec (skip se env mancanti) che semina documenti in DB, verifica campi (tipo_media, formato, ean, note_varie) e controlla pagine admin per titolo/identificatore/track/publisher.
Seed dati per test
tests/seeds/discogs-catno-documents.json, tests/seeds/discogs-catno-identifiers.json
Aggiunti due file JSON: 20 documenti per E2E e 20 casi identificatori (barcode, catno, invalidi) per test di parsing/normalizzazione.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Plugin as DiscogsPlugin
  participant DiscogsAPI as Discogs (API)
  participant MusicBrainz as MusicBrainz (API)
  participant DB as Persistence

  Client->>Plugin: invia identificatore
  Plugin->>Plugin: identifierKind(input) -> "barcode"/"catno"/"unknown"
  alt kind == barcode
    Plugin->>DiscogsAPI: GET /database/search?barcode=<digits>
  else kind == catno
    Plugin->>DiscogsAPI: GET /database/search?catno=<trimmed value>
  else
    Plugin->>DiscogsAPI: GET /database/search? (senza barcode/catno)
  end
  DiscogsAPI-->>Plugin: risultati
  alt nessun risultato rilevante
    Plugin->>Plugin: query = buildMusicBrainzQuery(input)
    alt query != ""
      Plugin->>MusicBrainz: GET /ws/2/...?query=<fragment>
      MusicBrainz-->>Plugin: risultati fallback
    end
  end
  alt risultato contiene barcode
    Plugin->>DB: persisti EAN/barcode
  else risultato solo catno
    Plugin->>DB: non persistere come EAN (salva solo catno)
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minuti

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 56.25% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed Il titolo "fix(discogs): accept Cat# identifiers (closes #101)" descrive accuratamente il cambiamento principale: l'aggiunta del supporto per gli identificatori di catalogo Discogs oltre ai codici a barre.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/discogs-catno-support

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@storage/plugins/discogs/DiscogsPlugin.php`:
- Around line 344-347: The code is propagating catalog numbers into the
barcode/ean field because when identifierKind($isbn) returns 'catno' the same
$isbn is still passed as a fallback barcode; update the logic in the Discogs
plugin so the fallbackBarcode (or fallback barcode parameter) is only supplied
when the identifier kind is an actual barcode (i.e., kind === 'barcode' or
$param === 'barcode'), not when kind === 'catno'; locate uses of identifierKind,
$param, $isbn and any $fallbackBarcode assignments in the methods around the
current diff and the other occurrences referenced (the blocks using $param = ...
and code that passes $fallbackBarcode) and change them to pass null/omit the
fallback when kind is 'catno'.
- Around line 1112-1115: The MusicBrainz query must quote multi-word catalog
numbers to avoid Lucene splitting: when identifierKind($barcode) returns 'catno'
set the query value to a quoted string (e.g. "\"$barcode\"") before URL-encoding
so $url contains catno:"CDP 7912682" rather than catno:CDP 7912682; update the
$url construction to apply the quotes only for $field === 'catno' (use urlencode
or rawurlencode on the quoted value) while leaving barcode searches unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 256b2cff-6072-4421-ba3a-a3ecb80a4d1f

📥 Commits

Reviewing files that changed from the base of the PR and between 4f03790 and d5b4764.

📒 Files selected for processing (3)
  • .gitignore
  • storage/plugins/discogs/DiscogsPlugin.php
  • tests/discogs-catno.unit.php

Comment thread storage/plugins/discogs/DiscogsPlugin.php Outdated
…ne quoting

Two Major findings from CodeRabbit on #102:

1. Cat# propagation into EAN field:
   fetchFromDiscogs() was passing the original input ($isbn) as
   fallbackBarcode to searchMusicBrainz() and fetchDiscogsReleaseFromSearchResult().
   Both eventually reach mapReleaseToPinakes() / mapMusicBrainzToPinakes()
   which do `$releaseBarcode = $fallbackBarcode ?? extractBarcodeFromRelease(...)`,
   meaning a Cat# like "CDP 7912682" could end up persisted in the book's
   EAN column. Now compute $fallbackBarcode = $kind === 'barcode' ? $isbn : null
   and propagate that instead — Cat# inputs rely on extractBarcodeFromRelease()
   pulling a real EAN from the upstream payload.

2. MusicBrainz Lucene query split on spaces:
   `catno:CDP 7912682` is parsed by Lucene as `catno:CDP AND 7912682`,
   returning false-positive matches on unrelated releases. Per the
   MusicBrainz Indexed Search Syntax docs, multi-word phrases must be
   double-quoted. Extracted the field/value construction into a new
   static buildMusicBrainzQuery() that quotes catno values and strips
   non-digits from barcode values. rawurlencode() is applied to the
   full field:value fragment (not just the value) so the `:` stays
   URL-safe and the quotes survive transit.

Test coverage: tests/discogs-catno.unit.php grew from 22 to 28 assertions
covering the new buildMusicBrainzQuery() contract. CodeRabbit's curl
verification confirmed the unquoted query returns wrong results.
@fabiodalez-dev
Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@storage/plugins/discogs/DiscogsPlugin.php`:
- Around line 370-379: La variabile $fallbackBarcode viene assegnata
direttamente dall'input quando identifierKind($isbn) restituisce 'barcode',
permettendo che valori formattati tipo "5099902-988023" vengano preferiti e
salvati in ean; invece normalizza il valore prima di assegnarlo: quando
identifierKind(...) === 'barcode' passa $isbn attraverso la routine di
canonicalizzazione (es. rimuovere caratteri non numerici / conversione a EAN
canonico, o riusare extractBarcodeFromRelease se adatta) e assegna il risultato
solo se la normalizzazione produce un barcode valido; aggiorna la assegnazione
di $fallbackBarcode e assicurati che mapReleaseToPinakes riceva il valore
normalizzato o null se non valido.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e6b14be6-4d04-46df-98e0-a343194f8a85

📥 Commits

Reviewing files that changed from the base of the PR and between d5b4764 and 84afc52.

📒 Files selected for processing (2)
  • storage/plugins/discogs/DiscogsPlugin.php
  • tests/discogs-catno.unit.php

Comment thread storage/plugins/discogs/DiscogsPlugin.php Outdated
…t round 3)

CodeRabbit round 3 flagged a residual issue in the round-2 fix: when
identifierKind() returns 'barcode', the raw input was assigned to
$fallbackBarcode. For inputs like '5099902-988023' (hyphenated EAN
accepted via isNumericBarcode's digit-stripping check), the hyphenated
form would then flow through mapReleaseToPinakes() into the `ean`
column — persisting a non-canonical value.

Extracted canonicalSearchIdentifier(string, string) as a shared public
static helper that:
- For kind='barcode': strips non-digits ('5099902-988023' → '5099902988023')
- For kind='catno' / 'unknown': trims whitespace only (Cat# spaces are
  semantically part of the identifier, e.g. 'CDP 7912682')

fetchFromDiscogs() now uses canonicalSearchIdentifier() for BOTH the
Discogs search URL and the fallbackBarcode value, ensuring the form
passed to mapReleaseToPinakes()'s `$fallbackBarcode ?? extract...`
chain is always canonical digits.

Test coverage grew to 33 assertions — 5 new cases exercise
canonicalSearchIdentifier directly.
@fabiodalez-dev
Copy link
Copy Markdown
Owner Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 20, 2026

✅ Actions performed

Full review triggered.

@fabiodalez-dev
Copy link
Copy Markdown
Owner Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 20, 2026

✅ Actions performed

Full review triggered.

fabiodalez-dev added a commit that referenced this pull request Apr 20, 2026
28 regression assertions on the DDL emitted by ArchivesPlugin (the
strings that will hit the production database when an admin activates
the plugin) + reflection-based checks on the DI contract.

The .gitignore whitelist mirrors the one added on feature/discogs-catno-support
(PR #102) so future plugin unit tests can live alongside the Playwright
*.spec.js files without requiring case-by-case exceptions.

Run:
  php tests/archives-plugin.unit.php
  → Passed: 28   Failed: 0
@fabiodalez-dev
Copy link
Copy Markdown
Owner Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 22, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@storage/plugins/discogs/DiscogsPlugin.php`:
- Around line 172-183: buildMusicBrainzQuery currently treats an identifierKind
of 'unknown' as a barcode which can produce malformed MusicBrainz queries;
change buildMusicBrainzQuery to detect identifierKind($input) === 'unknown' and
immediately return an empty string (or alternatively throw an
InvalidArgumentException) so callers won't generate queries like "barcode:abc",
and update the function docblock to state that it returns an empty string for
unknown identifiers and callers must filter that case; reference the
buildMusicBrainzQuery and identifierKind symbols when applying the fix.
- Around line 81-92: The isNumericBarcode function currently treats any string
composed of digits, spaces and hyphens as a barcode; tighten the validation in
isNumericBarcode by replacing the permissive /^[\s0-9\-]+$/ check with a pattern
that allows only digits with optional internal hyphen-separated groups and
optional surrounding spaces (e.g. /^\s*\d+(?:-\d+)*\s*$/) while keeping the
existing digit-count check (12 or 13 after stripping non-digits); update the
inline comment in isNumericBarcode to document this tradeoff and the allowed
hyphen format (so pure hyphenated catnos like 5099902-988023 remain valid but
arbitrary hyphen/space-only inputs do not get misclassified).

In `@tests/discogs-catno.unit.php`:
- Around line 22-24: The file loads Hooks and HookManager with require_once but
the comment says Hooks isn't autoloaded and activate() is never invoked, so
remove the redundant require_once lines for Hooks.php and HookManager.php (leave
the require for storage/plugins/discogs/DiscogsPlugin.php). If you prefer to
keep them as a precaution, instead update the comment above lines referencing
Hooks/HookManager to state they are intentionally required for parsing/type
usage (mention Hooks and HookManager by name) so the code and comment are
consistent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b23b9ba1-2d1a-493c-a4c2-a35335e7baca

📥 Commits

Reviewing files that changed from the base of the PR and between 4f03790 and e85b459.

📒 Files selected for processing (3)
  • .gitignore
  • storage/plugins/discogs/DiscogsPlugin.php
  • tests/discogs-catno.unit.php

Comment thread storage/plugins/discogs/DiscogsPlugin.php
Comment thread storage/plugins/discogs/DiscogsPlugin.php
Comment thread tests/discogs-catno.unit.php Outdated
…empty query on unknown

- isNumericBarcode: regex ristretta a /^\s*\d+(?:-\d+)*\s*$/ per escludere
  input con whitespace interno o pathologici fatti di soli separatori.
  5099902-988023 (EAN hyphenated) resta valido.
- buildMusicBrainzQuery: ritorna '' per identifierKind()==='unknown' invece
  di emettere barcode:<gibberish>. Rimosso else per soddisfare PHPMD.
- searchMusicBrainz: guard early-return se query è vuota (no HTTP round-trip
  con query= malformato).
- tests/discogs-catno.unit.php: rimossi require_once inutili per Hooks.php
  e HookManager.php (commento già indicava che activate() non è invocato).
  Aggiunti 2 test di regressione per il nuovo contratto 'unknown' → ''.
@fabiodalez-dev
Copy link
Copy Markdown
Owner Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 22, 2026

✅ Actions performed

Full review triggered.

1 similar comment
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 22, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@storage/plugins/discogs/DiscogsPlugin.php`:
- Around line 81-97: The isNumericBarcode() check currently rejects barcodes
with internal whitespace (e.g. "0777 7836 4627") even after verifying 12/13
digits, so update the regex in isNumericBarcode to allow spaces as valid
separators between digit-groups while still rejecting letters and pathological
separators; specifically change the pattern used in preg_match from
'/^\s*\d+(?:-\d+)*\s*$/' to accept either a single space or hyphen between
groups (for example '/^\s*\d+(?:[ -]\d+)*\s*$/'), keeping the leading/trailing
whitespace handling and relying on the existing digit-count check.
- Around line 184-196: La funzione buildMusicBrainzQuery usa l'input grezzo per
i catalog numbers (catno) causando query con spazi non trim-mati; invece di
concatenare $input direttamente nell'if ($kind === 'catno') usa
canonicalSearchIdentifier($input) per ottenere la versione
normalizzata/trim-data, poi applica addcslashes su quel valore normalizzato e
costruisci la stringa catno:"..."; mantieni il flusso esistente per
identifierKind/ barcode (cioè lascia il branch barcode così com'è) e sostituisci
solo l'uso di $input nel ramo catno con la chiamata a canonicalSearchIdentifier.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 19dfdd63-bf09-4e5a-9acf-910851e5c8a6

📥 Commits

Reviewing files that changed from the base of the PR and between e85b459 and be6abac.

📒 Files selected for processing (2)
  • storage/plugins/discogs/DiscogsPlugin.php
  • tests/discogs-catno.unit.php

Comment thread storage/plugins/discogs/DiscogsPlugin.php Outdated
Comment thread storage/plugins/discogs/DiscogsPlugin.php
Add reusable Cat#/barcode seed fixtures and Playwright coverage for persistent music documents.

Keep raw plugin-accepted identifiers through scrape hooks while preserving canonical ISBN handling.
@fabiodalez-dev
Copy link
Copy Markdown
Owner Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 22, 2026

✅ Actions performed

Full review triggered.

…in MB query

- isNumericBarcode: regex ora /^\s*\d+(?:(?:-|\s+)\d+)*\s*$/ per ammettere
  UPC-A con spazi di raggruppamento (es. "0777 7836 4627") oltre a quelli
  hyphenated. Prima, dopo il round 4, finivano in isCatalogNumber().
- buildMusicBrainzQuery: ramo catno ora passa per canonicalSearchIdentifier()
  per trim di whitespace padding. Allineato alla canonicalizzazione già in
  uso sul ramo barcode ("  SRX-6272  " → catno:"SRX-6272").
- tests: aggiunti regression test per UPC-A spaziato e catno con padding.
@fabiodalez-dev
Copy link
Copy Markdown
Owner Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 22, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/discogs-catno-documents.spec.js`:
- Around line 125-136: The spec currently asserts database values written by the
seeder (via SELECT from libri) so the expectation "kind==='catno' → ean===''"
only verifies seedDocument behavior, not DiscogsPlugin::fetchFromDiscogs or
ScrapeController::byIsbn; update the test to either (a) call the real pipeline
(DiscogsPlugin::fetchFromDiscogs → ScrapeController::byIsbn) and assert the
resulting libri row, or (b) keep it as a persistent fixture/smoke UI spec but
add a clear top-of-file comment and test description stating this is a fixture
persistence check (not a regression test for DiscogsPlugin), referencing the
variables/functions used here (dbQuery, seedDocument, doc, and the libri table)
so future readers understand the intent.
- Around line 49-83: The UPDATE in seedDocument currently uses WHERE titolo =
... which can affect non-seed rows; change the seed logic to target a
deterministic seed key: when running the UPDATE/SELECT/INSERT use the E2E key
embedded in note_varie (e.g. add WHERE note_varie LIKE '%E2E seed key:
${doc.key}%' to the UPDATE and subsequent SELECT) or add/consume a dedicated
seed_key column and filter by it; update the dbExec UPDATE, the dbQuery SELECT
id lookup, and the INSERT to set that same seed identifier so the operation
becomes idempotent and only touches seeded rows.

In `@tests/seeds/discogs-catno-documents.json`:
- Line 11: The fixture entry E2E_CATNO_010 is misclassified as kind:"catno"
because its identifier "00946 3 21438 2 9" now matches the updated
isNumericBarcode() and will be treated as a barcode; either change E2E_CATNO_010
to kind:"barcode" and set canonical:"0094632143829" (and ensure ean/related
fields reflect barcode usage), or replace the identifier with a non-numeric Cat#
(e.g., "EMI CDP 5099902") and keep kind:"catno"; make the same corresponding
update in the paired identifiers fixture (the discogs-catno-identifiers entry
for E2E_CATNO_010) so both seeds remain consistent with production behavior.

In `@tests/seeds/discogs-catno-identifiers.json`:
- Around line 74-81: The seed "EMI spaced numeric Cat# with 13 digits" is being
matched by isNumericBarcode (/^\s*\d+(?:(?:-|\s+)\d+)*\s*$/) and thus
identifierKind, canonicalSearchIdentifier and buildMusicBrainzQuery treat it as
a barcode; either (A) update this seed to be a barcode-consistent test by
changing its kind to "barcode", setting canonical to the digits-only form
"0094632143829" and musicBrainzQuery to "barcode:0094632143829" (so
identifierKind -> "barcode" matches expectations), or (B) preserve catno
semantics by changing the input string to avoid a pure 13-digit spaced sequence
(e.g. prefix with "EMI " -> "EMI 00946 3 21438 2 9") so isNumericBarcode no
longer matches and identifierKind returns "catno" and
canonicalSearchIdentifier/buildMusicBrainzQuery remain as the catno values;
apply one of these fixes and run the unit tests referencing isNumericBarcode,
identifierKind, canonicalSearchIdentifier, and buildMusicBrainzQuery to verify.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b3d76485-bb1a-4ef2-93c5-0929cae71406

📥 Commits

Reviewing files that changed from the base of the PR and between be6abac and d0f79bf.

📒 Files selected for processing (8)
  • .gitignore
  • app/Controllers/ScrapeController.php
  • storage/plugins/discogs/DiscogsPlugin.php
  • tests/discogs-catno-documents.spec.js
  • tests/discogs-catno-seeds.unit.php
  • tests/discogs-catno.unit.php
  • tests/seeds/discogs-catno-documents.json
  • tests/seeds/discogs-catno-identifiers.json

Comment thread tests/discogs-catno-documents.spec.js
Comment thread tests/discogs-catno-documents.spec.js
Comment thread tests/seeds/discogs-catno-documents.json Outdated
Comment thread tests/seeds/discogs-catno-identifiers.json
…code reclass

Risolve 3 findings CodeRabbit su PR #102:

1. tests/discogs-catno-documents.spec.js (Minor):
   UPDATE/SELECT di seedDocument ora filtra per il marker
   "E2E seed key: <doc.key>" embedded in note_varie invece che per titolo.
   Prima, un titolo non-seed identico a quello del fixture (o un
   soft-deleted con stesso titolo) veniva mutato dalla UPDATE, producendo
   seed non idempotenti su DB pre-esistenti. Ora la cycle
   UPDATE→SELECT→INSERT tocca esclusivamente righe seed-owned.

2. tests/seeds/discogs-catno-documents.json (Major):
   E2E_CATNO_010 "00946 3 21438 2 9" era classificato kind="catno" ma
   con la regex aggiornata di isNumericBarcode (che ammette spazi come
   separatori fra digit-group per UPC-A spaziati) identifierKind() ora
   lo restituisce come "barcode". Riclassificato kind="barcode",
   canonical="0094632143829" per allinearsi al comportamento reale del
   plugin — altrimenti il seeder scriveva ean='' per un input che in
   produzione finirebbe regolarmente in ean=0094632143829.

3. tests/seeds/discogs-catno-identifiers.json (Critical):
   Stessa inconsistenza sul gemello "EMI spaced numeric ... 13 digits".
   Riclassificato come barcode: canonical digit-stripped + MB query
   barcode:0094632143829. Senza questo fix discogs-catno-seeds.unit.php
   falliva 3 assertion (identifierKind, canonical, buildMusicBrainzQuery)
   sul case #10.

Verifiche: seeds unit 21/21 OK, main unit 38/38 OK, PHPStan clean.
@fabiodalez-dev
Copy link
Copy Markdown
Owner Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 22, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@storage/plugins/discogs/DiscogsPlugin.php`:
- Around line 106-119: La funzione isCatalogNumber() attualmente classifica
stringhe come "080442957X" come catno anche se sono ISBN-10 validi; update
isCatalogNumber(string $input) per escludere ISBN-10 validi prima di restituire
true: implementa o riusa una funzione helper isValidIsbn10(string $s) che
normalizzi il valore e verifichi checksum (supportando 'X' come check digit) e,
se isValidIsbn10($trimmed) è true, far tornare false da isCatalogNumber;
aggiungi un unit test che verifica che identifierKind('080442957X') non ritorni
'catno' (usa identifierKind or the codepath that maps inputs to 'catno') per
prevenire ricerche Discogs sbagliate.

In `@tests/discogs-catno-documents.spec.js`:
- Around line 7-8: Aggiorna l'intestazione del test in
tests/discogs-catno-documents.spec.js che documenta il comando Playwright:
sostituisci il riferimento a /tmp/run-e2e.sh e aggiungi l'opzione --workers=1
così che il comando esegua il wrapper per credenziali DB/admin e forzi
l'esecuzione seriale; assicurati di modificare la riga contenente il
suggerimento "npx playwright test tests/discogs-catno-documents.spec.js
--config=tests/playwright.config.js" per includere entrambi i vincoli richiesti.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6038b00f-842e-4ab1-b56c-a9e04af57554

📥 Commits

Reviewing files that changed from the base of the PR and between 4f03790 and cfdb854.

📒 Files selected for processing (8)
  • .gitignore
  • app/Controllers/ScrapeController.php
  • storage/plugins/discogs/DiscogsPlugin.php
  • tests/discogs-catno-documents.spec.js
  • tests/discogs-catno-seeds.unit.php
  • tests/discogs-catno.unit.php
  • tests/seeds/discogs-catno-documents.json
  • tests/seeds/discogs-catno-identifiers.json

Comment on lines +106 to +119
private static function isCatalogNumber(string $input): bool
{
$trimmed = trim($input);
$len = strlen($trimmed);
if ($len < 4 || $len > 30) {
return false;
}
if (preg_match('/^[A-Za-z0-9][A-Za-z0-9 .\-_\/]*[A-Za-z0-9]$/', $trimmed) !== 1) {
return false;
}
$hasLetter = preg_match('/[A-Za-z]/', $trimmed) === 1;
$hasDigit = preg_match('/[0-9]/', $trimmed) === 1;
$hasSeparator = preg_match('/[ .\-_\/]/', $trimmed) === 1;
return $hasDigit && ($hasLetter || $hasSeparator);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Escludi gli ISBN-10 validi dal rilevamento Cat#.

isCatalogNumber() accetta valori come 080442957X come catno; è un ISBN-10 valido e contiene cifre + X. Nel flusso controller questo può far partire una ricerca Discogs catno=080442957X e fondere metadati musicali in una scheda libro.

🐛 Fix proposto
 private static function isCatalogNumber(string $input): bool
 {
     $trimmed = trim($input);
     $len = strlen($trimmed);
     if ($len < 4 || $len > 30) {
         return false;
     }
+    if (self::isIsbn10($trimmed)) {
+        return false;
+    }
     if (preg_match('/^[A-Za-z0-9][A-Za-z0-9 .\-_\/]*[A-Za-z0-9]$/', $trimmed) !== 1) {
         return false;
     }
@@
     return $hasDigit && ($hasLetter || $hasSeparator);
 }
+
+private static function isIsbn10(string $input): bool
+{
+    $value = strtoupper(trim($input));
+    if (preg_match('/^\d{9}[\dX]$/', $value) !== 1) {
+        return false;
+    }
+
+    $sum = 0;
+    for ($i = 0; $i < 9; $i++) {
+        $sum += (int) $value[$i] * (10 - $i);
+    }
+    $checkDigit = (11 - ($sum % 11)) % 11;
+    $expected = $checkDigit === 10 ? 'X' : (string) $checkDigit;
+
+    return $value[9] === $expected;
+}

Aggiungi anche un test per 080442957X affinché identifierKind() non ritorni catno.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@storage/plugins/discogs/DiscogsPlugin.php` around lines 106 - 119, La
funzione isCatalogNumber() attualmente classifica stringhe come "080442957X"
come catno anche se sono ISBN-10 validi; update isCatalogNumber(string $input)
per escludere ISBN-10 validi prima di restituire true: implementa o riusa una
funzione helper isValidIsbn10(string $s) che normalizzi il valore e verifichi
checksum (supportando 'X' come check digit) e, se isValidIsbn10($trimmed) è
true, far tornare false da isCatalogNumber; aggiungi un unit test che verifica
che identifierKind('080442957X') non ritorni 'catno' (usa identifierKind or the
codepath that maps inputs to 'catno') per prevenire ricerche Discogs sbagliate.

Comment on lines +7 to +8
* Playwright. Run with:
* npx playwright test tests/discogs-catno-documents.spec.js --config=tests/playwright.config.js
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Aggiorna il comando E2E con wrapper e worker seriale.

Questa spec usa credenziali DB/admin e fixture persistenti: il comando documentato deve passare da /tmp/run-e2e.sh e includere --workers=1.

🧪 Fix proposto
- *   npx playwright test tests/discogs-catno-documents.spec.js --config=tests/playwright.config.js
+ *   /tmp/run-e2e.sh npx playwright test tests/discogs-catno-documents.spec.js --config=tests/playwright.config.js --workers=1

As per coding guidelines, tests/**: "I test E2E richiedono /tmp/run-e2e.sh per credenziali DB/admin" e "--workers=1 obbligatorio per esecuzione seriale."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
* Playwright. Run with:
* npx playwright test tests/discogs-catno-documents.spec.js --config=tests/playwright.config.js
* Playwright. Run with:
* /tmp/run-e2e.sh npx playwright test tests/discogs-catno-documents.spec.js --config=tests/playwright.config.js --workers=1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/discogs-catno-documents.spec.js` around lines 7 - 8, Aggiorna
l'intestazione del test in tests/discogs-catno-documents.spec.js che documenta
il comando Playwright: sostituisci il riferimento a /tmp/run-e2e.sh e aggiungi
l'opzione --workers=1 così che il comando esegua il wrapper per credenziali
DB/admin e forzi l'esecuzione seriale; assicurati di modificare la riga
contenente il suggerimento "npx playwright test
tests/discogs-catno-documents.spec.js --config=tests/playwright.config.js" per
includere entrambi i vincoli richiesti.

@fabiodalez-dev fabiodalez-dev merged commit cee70e5 into main Apr 22, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant