This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Identify exact matches pre-dedupe #1568

Merged (1 commit) on Jan 6, 2022


@TaiWilkin TaiWilkin commented Jan 5, 2022

Overview

When matching facilities, look for exact matches prior to entering
the dedupe process. Exact matches (for name, address, and country)
can be automatically matched, and don't need to enter the dedupe
process.

Adds new fields to FacilityListItems to store the cleaned name and cleaned address, allowing efficient filtering for exact matches.
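
As a rough illustration of why storing cleaned values helps, the sketch below compares pre-cleaned name and address strings directly. The `clean` function, the dictionary keys, and `find_exact_matches` are hypothetical stand-ins written for this example; the cleaning rules actually used in this PR may differ.

```python
import re
import string


def clean(text: str) -> str:
    """Hypothetical normalizer: lowercase, drop ASCII punctuation, collapse whitespace."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", no_punct).strip()


def find_exact_matches(item: dict, existing_items: list) -> list:
    """Return existing items whose stored cleaned name, cleaned address,
    and country all equal those of the incoming item."""
    key = (clean(item["name"]), clean(item["address"]), item["country"])
    return [
        e for e in existing_items
        if (e["clean_name"], e["clean_address"], e["country"]) == key
    ]
```

Because the cleaned values are computed once and stored, the comparison reduces to plain equality on three fields, which a database can filter on without re-cleaning every row.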

If multiple exact matches exist, we prefer active list items first, then items submitted by the same contributor, and finally the most recent items.
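
The preference order can be sketched as a sort key. This is a minimal illustration only: the `Candidate` class and its field names (`is_active`, `contributor_id`, `updated_at`) are assumptions made for the example and are not taken from the code in this PR.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Candidate:
    # Hypothetical fields describing an existing exact-match list item.
    facility_id: str
    is_active: bool
    contributor_id: int
    updated_at: datetime


def order_exact_matches(candidates, contributor_id):
    """Sort candidates best-first: active items, then items from the same
    contributor, then the most recently updated."""
    def rank(c):
        # Tuples compare element by element, so earlier entries dominate.
        return (c.is_active, c.contributor_id == contributor_id, c.updated_at)

    return sorted(candidates, key=rank, reverse=True)
```

The first element of the returned list is the preferred match; the remaining candidates keep their relative priority.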

Connects #1549

Demo

[Image: Screen Shot 2022-01-05 at 9 27 44 AM]

Notes

PPE handling was not implemented for exact-match items.

Testing Instructions

  • Submit the following facility via the Swagger API:
{
  "name": "Alif Embroidery Village Ltd.",
  "address": "Bangobandhu Road, Tangabari, Ashulia, Savar, Dhaka 1341",
  "country": "Bangladesh"
}
  • Confirm that the item has matched and note the facility ID. In an incognito browser, log in as c1@example.com and navigate to the new FacilityListItem in the Django Admin. In the processing results, you should see exact_match: true.
  • As c1@example.com, navigate to adjust facility matches. Transfer the match you just created from its current facility to any other facility.
  • Resubmit the same facility via the Swagger API. It should now match to the facility you transferred the FacilityListItem to.
  • Deactivate both newly created matches. Submit the same facility details via CSV.
    duplicates_test.csv
  • Process the list:
./scripts/manage batch_process --list-id 16 --action parse
./scripts/manage batch_process --list-id 16 --action geocode
./scripts/manage batch_process --list-id 16 --action match
  • Confirm that the newly uploaded item matched to the original facility via exact match.

Checklist

  • fixup! commits have been squashed
  • CI passes after rebase
  • CHANGELOG.md updated with summary of features or fixes, following Keep a Changelog guidelines

@jwalgran jwalgran left a comment

The implementation looks good, and I was able to run through the test successfully. We will want to pay attention to how this performs on staging, where we have two orders of magnitude more list items.

Comment on lines +533 to +534
    matches = [make_pending_match(item_id, m.get('facility_id'))
               for m in exact_matches]
After running through the testing instructions, the last list item ended up with 3 match rows. I originally thought this was a problem, but I confirmed that this is consistent with the normal dedupe process, where we would create matches for both a 60% and a 90% confidence match but the high-quality match would be the AUTOMATIC winner. 👍

openapparelregistry=# select id, results, status, facility_id, facility_list_item_id from api_facilitymatch where facility_list_item_id = 935;
 id  |                 results                  |  status   |   facility_id   | facility_list_item_id
-----+------------------------------------------+-----------+-----------------+-----------------------
 937 | {"match_type": "multiple_exact_matches"} | AUTOMATIC | BD202200698XDZC |                   935
 938 | {}                                       | PENDING   | US202200661FFPF |                   935
 939 | {}                                       | PENDING   | US202200661FFPF |                   935
(3 rows)

@jwalgran jwalgran assigned TaiWilkin and unassigned jwalgran Jan 6, 2022
@TaiWilkin
Thank you for reviewing!
