This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Try to make exact string matches on country/name/address before dedupe #1549

Closed
jwalgran opened this issue Dec 8, 2021 · 0 comments

jwalgran commented Dec 8, 2021

Overview

Try to do an exact string match (using "clean" strings that remove punctuation, etc.) before using dedupe.

There are two cases in which this would be helpful:

  • Avoiding the situation where submitting an updated list results in POTENTIAL_MATCHes because there are multiple matches with about 80% quality
  • The situation where submitting the same facility data to the API twice ends up generating two new facilities rather than one facility and one match. This should not happen, but it has appeared in testing and may be caused by each web worker having its own dedupe model.

Describe the solution you'd like

  • Add clean_name and clean_address on the FacilityListItem
    • Index these new columns together with country_code
    • Add a RunPython block to the migration to backfill clean_name and clean_address
      • Add an empty reverse function
  • Populate clean_name and clean_address during parsing, using the existing clean function used by the existing matching code
  • Before matching a list or a single item with dedupe, look for exact item matches
    • Exclude items that are rejected or pending. We are looking for items with a good match to a facility
    • There could be multiple items that match. Choose the "best" one
      • Active
      • Prefer the same contributor, but do not require it
      • Newest
    • Try to make exact matching produce similar output to dedupe so that we can minimize the changes to save_match_details.
      • We do want to record that we skipped dedupe in the processing_results
    • Exclude items that are successfully "exact matched" from the follow up dedupe match
  • Single-item matching has a different code path. Use shared helper functions so that both paths apply the same exact-match logic.
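
A minimal sketch of the exact-match step described above, in plain Python rather than Django ORM code. Everything here is an assumption made for illustration: `clean` is a hypothetical stand-in for the project's existing clean function, `Item` is a simplified stand-in for FacilityListItem, the `has_good_match` and `is_active` flags are invented names, and the priority order (active, then same contributor, then newest) is one reading of the "best" criteria listed above.

```python
import string
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional

_PUNCTUATION = str.maketrans("", "", string.punctuation)


def clean(text: str) -> str:
    """Hypothetical cleaner: drop punctuation, lowercase, collapse whitespace."""
    return " ".join(text.translate(_PUNCTUATION).lower().split())


@dataclass(frozen=True)
class Item:
    """Simplified stand-in for FacilityListItem (field names are assumptions)."""
    id: int
    country_code: str
    clean_name: str
    clean_address: str
    has_good_match: bool  # False for rejected or pending items
    is_active: bool
    contributor_id: int
    created_at: datetime


def find_exact_match(
    country_code: str,
    name: str,
    address: str,
    contributor_id: int,
    candidates: Iterable[Item],
) -> Optional[Item]:
    """Return the "best" exactly matching item, or None to fall back to dedupe."""
    key = (country_code, clean(name), clean(address))
    matches = [
        c for c in candidates
        if c.has_good_match
        and (c.country_code, c.clean_name, c.clean_address) == key
    ]
    if not matches:
        return None
    # Tuples compare element by element: active first, then same
    # contributor (preferred, not required), then newest.
    return max(
        matches,
        key=lambda c: (c.is_active, c.contributor_id == contributor_id, c.created_at),
    )
```

In this sketch a `None` result means the item would continue on to the normal dedupe match, while a returned item would be excluded from dedupe and recorded in processing_results as an exact match. In the real code the candidate lookup would presumably be a database query against the indexed country_code, clean_name and clean_address columns rather than an in-memory filter.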

jwalgran changed the title from "DRAFT: Try to make exact string matches on country/name/address before" to "Try to make exact string matches on country/name/address before" on Dec 22, 2021
TaiWilkin self-assigned this on Dec 27, 2021
jwalgran self-assigned this on Jan 6, 2022
jwalgran changed the title from "Try to make exact string matches on country/name/address before" to "Try to make exact string matches on country/name/address before dedupe" on Jan 20, 2022