frenkowski/ML_aided_RecordLinkage

Machine Learning aided Record Linkage

Group Components

Abstract

Record Linkage is the process of finding records, within one dataset or across several data sources, that refer to the same entity. Traditionally, it is done by applying comparison rules to pairs of attributes taken from each dataset. In this project we investigate some possible Machine Learning applications to Record Linkage (and data deduplication) in order to assess their viability.
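To make the traditional, rule-based approach concrete, here is a minimal sketch using only the Python standard library. It is not taken from the project notebook: the field names (`first_name`, `surname`, `birth_year`), the two rules, and the `0.85` threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def is_match(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Apply hand-written comparison rules to one pair of records.

    The rules and the threshold are hypothetical, chosen only for illustration.
    """
    # Rule 1: the birth year must agree exactly.
    if rec_a["birth_year"] != rec_b["birth_year"]:
        return False
    # Rule 2: first name and surname must be similar enough on average.
    name_score = (
        similarity(rec_a["first_name"], rec_b["first_name"])
        + similarity(rec_a["surname"], rec_b["surname"])
    ) / 2
    return name_score >= threshold


def link(dataset_a: list, dataset_b: list, threshold: float = 0.85) -> list:
    """Compare every pair of records and return the index pairs judged to match."""
    return [
        (i, j)
        for i, rec_a in enumerate(dataset_a)
        for j, rec_b in enumerate(dataset_b)
        if is_match(rec_a, rec_b, threshold)
    ]


# Toy data, invented for this example.
dataset_a = [
    {"first_name": "Jonathan", "surname": "Smith", "birth_year": 1980},
    {"first_name": "Maria", "surname": "Rossi", "birth_year": 1975},
]
dataset_b = [
    {"first_name": "Jonathon", "surname": "Smith", "birth_year": 1980},
    {"first_name": "Mario", "surname": "Bianchi", "birth_year": 1975},
]

links = link(dataset_a, dataset_b)
print(links)  # [(0, 0)]
```

Rules like these have to be written and tuned by hand for each dataset. A Machine Learning approach would instead learn the match decision from the per-attribute similarity scores.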

Project structure

We provide:

  • A Jupyter Notebook containing our project (code plus step-by-step comments and explanations);
  • A PDF report generated from the notebook (we recommend using the notebook itself, since it may be easier to read);
  • The slides to be shown during the project presentation.

The datasets used are integrated into the library and are therefore not provided; the notebook gives an in-depth description of each one.
