historical-record-linking/matching-codes

Files

ABE
EM

Historical Record Linking

About

This repository provides the code and documentation needed to implement historical record linking methods. The methods are described in Abramitzky, Boustan, Eriksson, Feigenbaum, and Pérez (2020), "Automated Linking of Historical Data".

Please cite this paper if you use code from this folder.

For more information on these methods and our self-replication exercises, check our website.

Notes on methods in repository

The repository currently provides code for two of these methods:

  1. The ABE fully automated approach: This is a fully automated method for linking historical datasets (e.g. complete-count censuses) by first name, last name, and age. The approach was first developed by Ferrie (1996) and was adapted and scaled for the computer by Abramitzky, Boustan, and Eriksson (2012, 2014, 2017). Because names are often misspelled or mistranscribed, we suggest testing robustness to alternative name-matching schemes (raw names, NYSIIS standardization, and Jaro-Winkler distance). To reduce the chance of false positives, we also suggest testing robustness by requiring names to be unique within a five-year window and/or requiring an exact match on age.

  2. A fully automated probabilistic approach (EM): This approach (Abramitzky, Mill, and Pérez 2019) is a fully automated probabilistic method for linking historical datasets. We combine the distances in reported names and ages between each pair of candidate records into a single score that roughly corresponds to the probability that both records belong to the same individual. We estimate these probabilities using the Expectation-Maximization (EM) algorithm, a standard technique in the statistical literature. We suggest a number of decision rules that use the estimated probabilities to determine which records to use in the analysis.
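To make the idea behind the first approach concrete, here is a minimal Python sketch of deterministic linking by name and age. It is not the code in the ABE folder and may differ from the published procedure in its details. Everything in it is an assumption made for illustration: the record fields (`id`, `first`, `last`, `birth_year`), the helper names (`clean`, `index_by_name`, `link`), the use of an implied birth year in place of reported age, and the simple letter-only name cleaning, which stands in for NYSIIS or Jaro-Winkler comparison.

```python
from collections import Counter, defaultdict


def clean(name):
    """Lowercase and keep letters only (a stand-in for NYSIIS or Jaro-Winkler)."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def index_by_name(records):
    """Group records by their cleaned (first name, last name) pair."""
    index = defaultdict(list)
    for rec in records:
        index[(clean(rec["first"]), clean(rec["last"]))].append(rec)
    return index


def link(source, target, max_gap=2):
    """Return {source id: target id}, keeping only unambiguous links.

    For each source record, look for target records with the same cleaned
    name, first at the same birth year, then 1 year away, and so on up to
    max_gap. Stop at the first gap with any candidates; link only if there
    is exactly one candidate at that gap.
    """
    src_index = index_by_name(source)
    tgt_index = index_by_name(target)
    links = {}
    for key, group in src_index.items():
        candidates = tgt_index.get(key, [])
        for rec in group:
            year = rec["birth_year"]
            # Skip source records that are not unique by name and birth year.
            if sum(r["birth_year"] == year for r in group) > 1:
                continue
            for gap in range(max_gap + 1):
                hits = [t for t in candidates if abs(t["birth_year"] - year) == gap]
                if len(hits) == 1:
                    links[rec["id"]] = hits[0]["id"]
                if hits:
                    break
    # Drop target records claimed by more than one source record.
    claimed = Counter(links.values())
    return {s: t for s, t in links.items() if claimed[t] == 1}


# Made-up example records, for illustration only.
source = [
    {"id": "a1", "first": "John", "last": "Smith", "birth_year": 1880},
    {"id": "a2", "first": "Mary", "last": "O'Neil", "birth_year": 1875},
    {"id": "a3", "first": "Carl", "last": "Berg", "birth_year": 1890},
]
target = [
    {"id": "b1", "first": "JOHN", "last": "SMITH", "birth_year": 1881},
    {"id": "b2", "first": "Mary", "last": "ONeil", "birth_year": 1875},
    {"id": "b3", "first": "Carl", "last": "Berg", "birth_year": 1889},
    {"id": "b4", "first": "Carl", "last": "Berg", "birth_year": 1891},
]

links = link(source, target)               # {"a1": "b1", "a2": "b2"}
exact = link(source, target, max_gap=0)    # {"a2": "b2"}
```

In this toy data, Carl Berg is left unlinked because two target records are equally close in birth year. Setting `max_gap=0` mimics the stricter robustness check that requires an exact match on age.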

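The probabilistic approach can be illustrated in the same spirit. The sketch below fits a two-class mixture (match and non-match) to candidate pairs by EM and returns, for each pair, an estimated probability that it is a match. It is not the code in the EM folder. It assumes that name and age distances have already been binned into integer levels (0 = closest) and that the features are independent within each class; the function names, the starting values, the smoothing constant, and the 0.9 threshold in the usage lines are all hypothetical.

```python
def posteriors(pairs, m, u, p_match):
    """E-step: probability that each pair is a match under current parameters."""
    out = []
    for pair in pairs:
        like_m, like_u = p_match, 1.0 - p_match
        for k, level in enumerate(pair):
            like_m *= m[k][level]
            like_u *= u[k][level]
        out.append(like_m / (like_m + like_u))
    return out


def em_match_probabilities(pairs, n_levels, n_iter=200, p_match=0.1, eps=1e-6):
    """Fit the mixture by EM and return (posterior per pair, share of matches).

    pairs[i][k] is the distance level of feature k for candidate pair i;
    n_levels[k] is the number of levels of feature k.
    """
    # Starting values: matches favor small distances, non-matches are uniform.
    m = [[2.0 ** -lvl for lvl in range(n)] for n in n_levels]
    m = [[v / sum(row) for v in row] for row in m]
    u = [[1.0 / n] * n for n in n_levels]
    for _ in range(n_iter):
        post = posteriors(pairs, m, u, p_match)
        # M-step: re-estimate the match share and the level frequencies,
        # weighting each pair by its posterior (eps avoids exact zeros).
        total_m = sum(post)
        total_u = len(pairs) - total_m
        p_match = total_m / len(pairs)
        for k, n in enumerate(n_levels):
            for lvl in range(n):
                w_m = sum(w for w, pair in zip(post, pairs) if pair[k] == lvl)
                w_u = sum(1.0 - w for w, pair in zip(post, pairs) if pair[k] == lvl)
                m[k][lvl] = (w_m + eps) / (total_m + eps * n)
                u[k][lvl] = (w_u + eps) / (total_u + eps * n)
    return posteriors(pairs, m, u, p_match), p_match


# Made-up candidate pairs: (name distance level, age distance level).
pairs = (
    [(0, 0)] * 20 + [(0, 1)] * 5
    + [(1, 1)] * 12 + [(1, 2)] * 28 + [(2, 1)] * 18 + [(2, 2)] * 42
)
post, share = em_match_probabilities(pairs, n_levels=[3, 3])

# One possible decision rule (hypothetical threshold): keep high-probability pairs.
accepted = [pair for pair, p in zip(pairs, post) if p > 0.9]
```

A simple threshold is only one way to turn the estimated probabilities into links; stricter rules can also require that the best candidate for a record is well ahead of the second best.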
Contact

If you run into problems or have feedback on this code, you can open an issue or contact us directly: ranabr@stanford.edu (Ran Abramitzky), lboustan@princeton.edu (Leah Boustan), kaeriksson@ucdavis.edu (Katherine Eriksson).

Last Updated: April 23, 2020
