pinformatics/rlErrorGeneratoR

rlErrorGeneratoR

A package to introduce errors into a dataset and create a dirty version of it, enabling us to benchmark record linkage frameworks at different error rates.

This open-source code helps us infuse the kinds of data heterogeneity most often found in record linkage projects (duplicates, twins, suffixes, day-month swaps, first-last name swaps, nicknames, last name changes due to marriage, and typos in names and dates) into any given dataset. The user controls the overall rate of heterogeneity in the data, which makes it easy to run systematic, controlled experiments.
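To make the idea of a controlled error rate concrete, here is a minimal sketch in Python. It is not this package's API: the function names (`swap_day_month`, `swap_names`, `typo`, `make_dirty`), the record fields (`first_name`, `last_name`, `dob`), and the choice of only three error types are all assumptions made for illustration.

```python
import random


def swap_day_month(date_str):
    """Swap day and month in a 'YYYY-MM-DD' string (hypothetical helper).

    This is a plain string swap, so a day above 12 yields an invalid month.
    """
    year, month, day = date_str.split("-")
    return f"{year}-{day}-{month}"


def swap_names(record):
    """Return a copy of the record with first and last name exchanged."""
    out = dict(record)
    out["first_name"], out["last_name"] = record["last_name"], record["first_name"]
    return out


def typo(text, rng):
    """Transpose two adjacent characters at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def make_dirty(records, error_rate, seed=0):
    """Apply one random error to a fixed fraction of the records.

    Returns the dirty copy and the set of indices that were modified,
    so a linkage result can later be scored against the ground truth.
    """
    rng = random.Random(seed)
    n_errors = round(error_rate * len(records))
    chosen = set(rng.sample(range(len(records)), n_errors))
    dirty = []
    for idx, rec in enumerate(records):
        rec = dict(rec)  # never mutate the clean input
        if idx in chosen:
            kind = rng.choice(["day_month", "name_swap", "typo"])
            if kind == "day_month":
                rec["dob"] = swap_day_month(rec["dob"])
            elif kind == "name_swap":
                rec = swap_names(rec)
            else:
                rec["last_name"] = typo(rec["last_name"], rng)
        dirty.append(rec)
    return dirty, chosen
```

Fixing the seed and returning the modified indices keeps each experiment reproducible, so the same clean data can be re-dirtied at several error rates and compared.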

For more details, see: Ilangovan, Gurudev (2019). Benchmarking the Effectiveness and Efficiency of Machine Learning Algorithms for Record Linkage. Master's thesis, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/186390.

https://oaktrust.library.tamu.edu/handle/1969.1/186390
