Rule-based automatic entity matching #139

Closed
pudo opened this issue Oct 1, 2021 · 2 comments
Labels: pipeline (Data pipeline improvements)

Comments

@pudo
Member

pudo commented Oct 1, 2021

The dedupe process is super dull because there are quite a few "obvious" matches. Maybe we can have rules that automatically lead to a positive judgment:

  • Person: Same name, same nationality, same birthday -> yes?
  • Person: Similar name, same nationality, same birthday, same ID number -> yes
  • Organisation: Same name, same country -> yes
  • Organisation: Same identifier, similar name, same country -> yes
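The four rules above could be expressed as a small predicate. This is a minimal sketch, not OpenSanctions code: it assumes a hypothetical `Entity` record with one name, a country (nationality for people), a birth date and a set of identifiers, and it approximates "similar name" with `difflib.SequenceMatcher` and an assumed 0.9 cut-off that the issue does not specify. Pairs the rules do not accept are simply left for manual review.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional, Set


@dataclass
class Entity:
    # Hypothetical record; field names are illustrative assumptions.
    schema: str                       # "Person" or "Organisation"
    name: str
    country: Optional[str] = None     # nationality for persons
    birth_date: Optional[str] = None  # ISO date string, persons only
    identifiers: Set[str] = field(default_factory=set)


def normalize(name: str) -> str:
    # Lowercase and collapse whitespace so trivial variants compare equal.
    return " ".join(name.lower().split())


def same_name(a: Entity, b: Entity) -> bool:
    return normalize(a.name) == normalize(b.name)


def similar_name(a: Entity, b: Entity, threshold: float = 0.9) -> bool:
    # Assumed similarity measure and cut-off, not taken from the issue.
    ratio = SequenceMatcher(None, normalize(a.name), normalize(b.name)).ratio()
    return ratio >= threshold


def same(x: Optional[str], y: Optional[str]) -> bool:
    # Missing values never count as agreement.
    return x is not None and x == y


def shared_identifier(a: Entity, b: Entity) -> bool:
    return bool(a.identifiers & b.identifiers)


def auto_match(a: Entity, b: Entity) -> bool:
    """True if a rule yields a positive judgment; False means 'ask a human'."""
    if a.schema != b.schema:
        return False
    if a.schema == "Person":
        # Both person rules require same nationality and same birthday.
        if not (same(a.country, b.country) and same(a.birth_date, b.birth_date)):
            return False
        if same_name(a, b):  # rule 1 (marked tentative with "?" in the issue)
            return True
        return similar_name(a, b) and shared_identifier(a, b)  # rule 2
    if a.schema == "Organisation":
        if not same(a.country, b.country):
            return False
        if same_name(a, b):  # rule 3
            return True
        return similar_name(a, b) and shared_identifier(a, b)  # rule 4
    return False
```

Requiring every field a rule mentions to be present on both sides keeps the rules conservative: a missing birthday or country never counts as a match.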
pudo added the pipeline (Data pipeline improvements) label Oct 1, 2021
@pudo
Member Author

pudo commented Apr 12, 2022

This now exists, via regression (not rules), in nomenklatura and can be adapted into OpenSanctions. I just need to figure out the world's most conservative threshold for auto-matching. 0.98 or so, by the looks of it.
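A regression approach replaces hand-written rules with a learned score and a cut-off. The sketch below is not nomenklatura's implementation: it assumes a logistic model over a few hypothetical pairwise features with made-up weights, and only illustrates how a conservative threshold such as 0.98 splits candidate pairs into automatic matches and a manual review queue.

```python
import math
from typing import Dict, List, Tuple

AUTO_MATCH_THRESHOLD = 0.98  # the conservative cut-off suggested in the comment

# Hypothetical features and weights; a real model would be fit on past
# human match/no-match judgments rather than written by hand.
WEIGHTS: Dict[str, float] = {
    "name_similarity": 6.0,
    "same_birth_date": 3.0,
    "same_country": 1.5,
    "shared_identifier": 4.0,
}
BIAS = -9.0


def match_score(features: Dict[str, float]) -> float:
    # Logistic regression: sigmoid of a weighted sum of features.
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def triage(
    candidates: List[Tuple[str, Dict[str, float]]],
    threshold: float = AUTO_MATCH_THRESHOLD,
) -> Tuple[List[str], List[str]]:
    """Split candidate pair IDs into (auto-matched, needs human review)."""
    auto, review = [], []
    for pair_id, features in candidates:
        if match_score(features) >= threshold:
            auto.append(pair_id)
        else:
            review.append(pair_id)
    return auto, review
```

Because 0.98 / 0.02 = 49, a pair clears the threshold only when its linear score reaches ln(49) ≈ 3.9; with these illustrative weights, identical names plus a matching birthday and country (score 1.5) would still go to a human.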

@pudo
Member Author

pudo commented Aug 8, 2022

No concrete action on this, seems to be working OK.

pudo closed this as completed Aug 8, 2022