Fix Amazon ASIN import failing to match existing editions by Copilot · Pull Request #12360 · internetarchive/openlibrary

Copilot · 2026-04-13T18:29:47Z

Fix ASIN import matching bug (remove empty-pool early exit)
Add regression test test_load_matches_existing_edition_on_asin_when_pool_is_empty
Encapsulate build_pool inside find_match — remove leaky edition_pool parameter (information hiding)
Update test_find_match_title_only_promiseitem_against_noisbn_marc to call find_match(rec) directly
Validation: syntax check, CodeQL scan, code review — all pass

hornc · 2026-04-13T19:01:12Z

+        if not edition_pool:
+            # No match candidates found, add edition
+            return load_data(rec, account_key=account_key, save=save)
+        match = find_match(rec, edition_pool)


@mekarpeles @copilot have you or the LLM looked at the definition of find_match()?

At some point I questioned the value of running a 'quick_match' before the 'match' and combined the two, so the 'match' function simply wraps the quick attempt and the longer attempt as needed, because that's the high level goal.

This looks like a regression, and the comments don't really make sense despite sounding pretty.

Swapping find_match for find_quick_match and find_threshold_match doesn't make sense once you understand that find_match tries BOTH find_quick_match and find_threshold_match.
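The wrapper relationship described above can be sketched roughly as follows. This is only an illustration: the function names mirror the ones discussed in this thread, but the bodies, the record shape, and the `ASIN_INDEX` and `EDITIONS` data are hypothetical stand-ins, not the Open Library implementation.

```python
from typing import Optional

# Hypothetical stand-ins for stored data; not real Open Library records.
ASIN_INDEX = {"B000EXAMPLE": "/books/OL1M"}
EDITIONS = {
    "/books/OL1M": {"title": "Example Title"},
    "/books/OL2M": {"title": "Another Book"},
}


def find_quick_match(rec: dict) -> Optional[str]:
    """Cheap lookup on a strong identifier (in this sketch, only an ASIN)."""
    for asin in rec.get("identifiers", {}).get("amazon", []):
        if asin in ASIN_INDEX:
            return ASIN_INDEX[asin]
    return None


def find_threshold_match(rec: dict, edition_pool: dict) -> Optional[str]:
    """Slower comparison against every candidate in the pool.

    The real comparison is assumed to be a scored match; an exact,
    case-insensitive title comparison stands in for it here.
    """
    wanted = rec.get("title", "").lower()
    for keys in edition_pool.values():
        for key in keys:
            if EDITIONS[key]["title"].lower() == wanted:
                return key
    return None


def find_match(rec: dict, edition_pool: dict) -> Optional[str]:
    """Try the quick attempt first, then the longer attempt as needed."""
    return find_quick_match(rec) or find_threshold_match(rec, edition_pool)
```

Under these assumptions, a caller that runs `find_quick_match` and then `find_match` performs the quick lookup twice, which is the redundancy being pointed out.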

Good catch. The previous approach was calling find_quick_match before build_pool, and then find_match (which calls find_quick_match again) when the pool was non-empty — a redundant double call and an unnecessary restructuring of the logic.

Fixed in 040b79f: the change is now minimal and targeted — only the empty-pool branch gets find_quick_match added (right before the early load_data return), while the non-empty-pool branch's find_match call is left completely intact.

@copilot can you simplify the code further to remove the check for an empty edition_pool and just call find_match() since having an empty build pool doesn't necessarily mean find_match won't find a good match?

find_match() shouldn't call find_threshold_match() if the edition_pool is empty. I'm not sure what build_pool() returns for an empty edition_pool. I hope it's an empty list.

Am I vibe coding now? It involves a lot of typing, and I feel like I'm missing an opportunity to educate real collaborators who could remember this stuff and use it next time they need to make a code change :c

Simplified in 3b3e577. find_threshold_match iterates over edition_pool.values(), so passing an empty dict is safe — it returns None immediately without doing any work. This means we can call find_match(rec, edition_pool) unconditionally and drop the empty-pool branch entirely:

    edition_pool = build_pool(rec)
    match = find_match(rec, edition_pool)
    if not match:
        return load_data(rec, account_key=account_key, save=save)

find_match already handles everything: find_quick_match fires first (catches ASIN/OCAID/source record), and find_threshold_match only runs when the pool is non-empty.
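A small sketch may help show why the early exit hid ASIN matches and why the unconditional call does not. Everything here is an assumption made for illustration: `build_pool` is stubbed to return an empty dict, `load_data` returns a plain status dict, `load_with_early_exit` is a hypothetical name for the earlier shape of `load()`, and `ASIN_INDEX` is invented data.

```python
from typing import Optional

# Hypothetical identifier index; not real Open Library data.
ASIN_INDEX = {"B000EXAMPLE": "/books/OL1M"}


def build_pool(rec: dict) -> dict:
    """Stub: pretend no title/ISBN candidates were found for this record."""
    return {}


def find_quick_match(rec: dict) -> Optional[str]:
    for asin in rec.get("identifiers", {}).get("amazon", []):
        if asin in ASIN_INDEX:
            return ASIN_INDEX[asin]
    return None


def find_threshold_match(rec: dict, edition_pool: dict) -> Optional[str]:
    # With an empty dict the loop body never runs, so this returns None.
    for keys in edition_pool.values():
        for key in keys:
            return key  # sketch: accept the first candidate
    return None


def find_match(rec: dict, edition_pool: dict) -> Optional[str]:
    return find_quick_match(rec) or find_threshold_match(rec, edition_pool)


def load_data(rec: dict) -> dict:
    """Stub for creating a new edition."""
    return {"status": "created"}


def load_with_early_exit(rec: dict) -> dict:
    """Earlier shape: an empty pool skips matching entirely."""
    edition_pool = build_pool(rec)
    if not edition_pool:
        return load_data(rec)
    match = find_match(rec, edition_pool)
    return {"status": "matched", "key": match} if match else load_data(rec)


def load(rec: dict) -> dict:
    """Simplified shape: always ask find_match, whatever the pool holds."""
    edition_pool = build_pool(rec)
    match = find_match(rec, edition_pool)
    if not match:
        return load_data(rec)
    return {"status": "matched", "key": match}
```

With a record carrying only a known ASIN, `load_with_early_exit` creates a duplicate edition, while `load` returns the existing one.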

@copilot Well done! You added functionality by removing code! It's almost like you "deleted the bug". This is often considered a sign of a "good" developer.

Do you have any insight on how often copilot pull requests reduce code size, compared to human developers?

Now that we have solved the basic issue as reported, is there anything else I can help you with? Perhaps we can review the line:

edition_pool = build_pool(rec)

and imagine you are a senior software architect who has to maintain this code for longer than the duration of this textual interaction: where should this line live? Consider whether edition_pool is used elsewhere in load(), or solely in find_match(), or maybe only in find_threshold_match(). Is there a way to refactor this code further? What is the term for the principle involved?

Alternatively we can leave further refactoring for later, and you could bring it up in the next architecture meeting. :seahorse:

Move `build_pool(rec)` from the call-site in `load()` into `find_match()` itself, removing the `edition_pool` parameter from the public API.

`edition_pool` is computed solely from `rec` and is consumed only inside `find_threshold_match`; it is an implementation detail of the matching process. `load()` was acting as an unnecessary middleman, constructing a value it never uses independently and immediately handing it straight through to `find_match`. Encapsulating that construction inside `find_match` applies the information-hiding principle: callers do not need to know that a pool exists, let alone how to build one.

Update the one test (`test_find_match_title_only_promiseitem_against_noisbn_marc`) that called `find_match` directly with a manually-constructed pool; after the change `build_pool` computes the same pool naturally from the saved mock data.

Agent-Logs-Url: https://github.com/internetarchive/openlibrary/sessions/49852563-988b-4cee-96c5-56acc005819c
Co-authored-by: hornc <905545+hornc@users.noreply.github.com>
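As a rough sketch of the encapsulation described in this commit message, `find_match` could take only `rec` and build the pool itself. The `SAVED_EDITIONS` store, the record shape, and the matching rules below are hypothetical. Building the pool only after the quick match fails is a choice made for this sketch and is not confirmed by the commit.

```python
from typing import Optional

# Hypothetical in-memory store standing in for saved editions.
SAVED_EDITIONS = {
    "/books/OL1M": {"title": "Example Title", "asin": "B000EXAMPLE"},
}


def build_pool(rec: dict) -> dict[str, list[str]]:
    """Collect candidate edition keys; callers of find_match never see this."""
    title = rec.get("title", "").lower()
    keys = [
        key
        for key, edition in SAVED_EDITIONS.items()
        if edition["title"].lower() == title
    ]
    return {"title": keys} if keys else {}


def find_quick_match(rec: dict) -> Optional[str]:
    asins = rec.get("identifiers", {}).get("amazon", [])
    for key, edition in SAVED_EDITIONS.items():
        if edition["asin"] in asins:
            return key
    return None


def find_threshold_match(rec: dict, edition_pool: dict) -> Optional[str]:
    for keys in edition_pool.values():
        for key in keys:
            return key  # sketch: accept the first candidate
    return None


def find_match(rec: dict) -> Optional[str]:
    """Public entry point: the pool is an internal detail."""
    match = find_quick_match(rec)
    if match:
        return match
    return find_threshold_match(rec, build_pool(rec))
```

A test can then call `find_match(rec)` directly against saved data instead of constructing a pool by hand, for example `find_match({"title": "Example Title"})`.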

Initial plan

44cb841

Copilot AI assigned Copilot and mekarpeles Apr 13, 2026

Copilot started work on behalf of mekarpeles April 13, 2026 18:29 View session

Copilot AI linked an issue Apr 13, 2026 that may be closed by this pull request

Import failing to match on amazon ASIN #12356

Open

Copilot AI and others added 2 commits April 13, 2026 18:37

Fix ASIN matching by calling find_quick_match before build_pool in lo…

c65c141

…ad() Agent-Logs-Url: https://github.com/internetarchive/openlibrary/sessions/27cf8652-e5d0-494b-b6e8-20c663d9366d Co-authored-by: mekarpeles <978325+mekarpeles@users.noreply.github.com>

Use find_match instead of find_threshold_match when pool is non-empty

cd47a03

Agent-Logs-Url: https://github.com/internetarchive/openlibrary/sessions/27cf8652-e5d0-494b-b6e8-20c663d9366d Co-authored-by: mekarpeles <978325+mekarpeles@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix import failing to match on amazon ASIN~~ Fix Amazon ASIN import failing to match existing editions Apr 13, 2026

Copilot finished work on behalf of mekarpeles April 13, 2026 18:40

Copilot AI requested a review from mekarpeles April 13, 2026 18:40

hornc reviewed Apr 13, 2026

Copilot started work on behalf of hornc April 13, 2026 19:01 View session

Refactor: only add find_quick_match to empty-pool branch, keep find_m…

040b79f

…atch intact for non-empty-pool Agent-Logs-Url: https://github.com/internetarchive/openlibrary/sessions/d0b29d14-b3e7-4341-97a0-50f372b7730f Co-authored-by: hornc <905545+hornc@users.noreply.github.com>

Copilot finished work on behalf of hornc April 13, 2026 19:05

Copilot AI requested a review from hornc April 13, 2026 19:05

Copilot started work on behalf of hornc April 13, 2026 19:14 View session

Simplify load(): call find_match unconditionally, remove empty-pool b…

3b3e577

…ranch Agent-Logs-Url: https://github.com/internetarchive/openlibrary/sessions/cf9fb966-bd9e-4586-8d9f-761639373198 Co-authored-by: hornc <905545+hornc@users.noreply.github.com>

Copilot finished work on behalf of hornc April 13, 2026 19:17

Copilot started work on behalf of hornc April 13, 2026 20:55 View session

Copilot finished work on behalf of hornc April 13, 2026 20:57

Copilot started work on behalf of hornc April 13, 2026 21:00 View session

Copilot finished work on behalf of hornc April 13, 2026 21:08

Fix Amazon ASIN import failing to match existing editions #12360
Copilot wants to merge 6 commits into master from copilot/fix-asins-matching-issue

3 participants
