skvrnami/rimr

rimr

🚧

The rimr package helps with deterministic record linkage.

Installation

You can install rimr from GitHub with:

# install.packages("devtools")
devtools::install_github("skvrnami/rimr")

Example

See the vignette for an example.

TO DO:

Matching

based on:

  • exact equality
  • equality with tolerance (x - t <= y <= x + t)
  • equality with tolerance in a specified direction (e.g. y >= x & y <= x + t, or y <= x & y >= x - t)
  • equality with tolerance based on string distance
  • higher/lower than
  • higher than or equal to/lower than or equal to
  • contains string
  • contains string separated by word boundaries (for matching changes in women's names after marriage and after divorce)
  • removing duplicates by finding the most similar person (exact match on specified columns)
  • allow methods other than strict equality for finding the most similar person when there is more than one candidate (e.g. when a person is described by occupation, use Jaccard distance to compare the words in the occupation; see tests for an example)
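
The comparison rules listed above can be sketched in base R as follows. This is only an illustration of the ideas, not rimr's API: all function names are hypothetical, `x` is the value in the source record, `y` the value in the candidate record, and `t` the tolerance. String distance is assumed to be Levenshtein distance as computed by `utils::adist`.

```r
# Hypothetical helpers for illustration; these names are not part of rimr.

# Equality with tolerance: TRUE when x - t <= y <= x + t.
within_tolerance <- function(x, y, t) {
  abs(y - x) <= t
}

# Tolerance in one direction only: TRUE when x <= y <= x + t.
within_tolerance_up <- function(x, y, t) {
  y >= x & y <= x + t
}

# Tolerance based on string distance (Levenshtein distance via utils::adist),
# compared element by element.
similar_string <- function(x, y, max_dist = 1) {
  dist <- mapply(function(a, b) utils::adist(a, b)[1, 1], x, y, USE.NAMES = FALSE)
  dist <= max_dist
}

# Contains a string delimited by word boundaries.
# Assumes `word` contains no regular-expression metacharacters.
contains_word <- function(text, word) {
  grepl(paste0("\\b", word, "\\b"), text, perl = TRUE)
}

# Jaccard distance between the sets of words in two descriptions.
jaccard_distance <- function(a, b) {
  words_a <- unique(strsplit(tolower(a), "\\s+")[[1]])
  words_b <- unique(strsplit(tolower(b), "\\s+")[[1]])
  1 - length(intersect(words_a, words_b)) / length(union(words_a, words_b))
}

within_tolerance(1980, 1981, t = 1)                       # TRUE
within_tolerance_up(1980, 1979, t = 1)                    # FALSE
similar_string("Novak", "Nowak")                          # TRUE
contains_word("Jana Novakova Svobodova", "Novakova")      # TRUE
jaccard_distance("high school teacher", "school teacher") # 1/3
```

The word-boundary check is what lets a maiden name still match after a second surname has been added, while a mere substring of a longer surname does not match.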

Workflow

  • Find all similar records between two datasets
  • Find records not contained in the source dataset
  • Append missing records to the output
  • Find all similar records across a sequence of datasets
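
A minimal sketch of the first three workflow steps, using only base R rather than rimr's own functions. The data frames, column names, and the one-year tolerance on `birth_year` are made up for the example.

```r
# Hypothetical example data; not taken from the package.
source_data <- data.frame(
  first_name = c("Jana", "Petr", "Eva"),
  last_name  = c("Novakova", "Svoboda", "Dvorakova"),
  birth_year = c(1980, 1975, 1990),
  stringsAsFactors = FALSE
)
target_data <- data.frame(
  first_name = c("Jana", "Petr", "Karel"),
  last_name  = c("Novakova", "Svoboda", "Cerny"),
  birth_year = c(1981, 1975, 1960),
  stringsAsFactors = FALSE
)

# Step 1: find similar records. Exact match on names,
# then keep pairs whose birth years differ by at most one.
pairs <- merge(source_data, target_data,
               by = c("first_name", "last_name"),
               suffixes = c("_source", "_target"))
pairs <- pairs[abs(pairs$birth_year_source - pairs$birth_year_target) <= 1, ]

# Step 2: find target records that have no counterpart in the source.
key <- function(df) paste(df$first_name, df$last_name)
unmatched <- target_data[!(key(target_data) %in% key(pairs)), ]

# Step 3: append the missing records to the output.
output <- rbind(source_data, unmatched)
rownames(output) <- NULL
```

Linking a sequence of datasets could then repeat these steps, using the `output` of one round as the source for the next.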

Output

  • Create panel data
  • Joint dataset (source dataset enhanced by additional variables from the target dataset)
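
The two output shapes might look like this in base R. The sketch assumes that linkage has already assigned a shared `person_id` to matched records; that column and all data below are hypothetical.

```r
# Hypothetical linked data: the same person_id marks the same person.
wave_2010 <- data.frame(person_id = c(1, 2), party = c("A", "B"),
                        stringsAsFactors = FALSE)
wave_2014 <- data.frame(person_id = c(1, 3), party = c("A", "C"),
                        stringsAsFactors = FALSE)

# Panel data: stack the datasets in long format, one row per person and year.
wave_2010$year <- 2010
wave_2014$year <- 2014
panel <- rbind(wave_2010, wave_2014)
panel <- panel[order(panel$person_id, panel$year), ]
rownames(panel) <- NULL

# Joint dataset: keep every source record and add variables from the target.
source_data <- data.frame(person_id = c(1, 2, 3),
                          name = c("Jana", "Petr", "Eva"),
                          stringsAsFactors = FALSE)
target_data <- data.frame(person_id = c(1, 3),
                          occupation = c("teacher", "lawyer"),
                          stringsAsFactors = FALSE)
joint <- merge(source_data, target_data, by = "person_id", all.x = TRUE)
```

With `all.x = TRUE`, source records without a match are kept and their added variables are `NA`.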

Other

  • Check dbplyr for storing datasets
  • Check algorithmic complexity (could the source be grouped?)

Conceptual

  • specify predicate before running find_all_similar function
  • connecting filters with OR?
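
One possible shape for these two ideas, sketched in base R: predicates are built as plain functions before any matching runs, and a small combinator joins them with OR. The names `either`, `same_value` and `within_one` are hypothetical and not part of rimr. The sketch does not call `find_all_similar`, because its signature is not described here.

```r
# Hypothetical combinator: returns a predicate that is TRUE
# when at least one of the supplied predicates is TRUE.
either <- function(...) {
  predicates <- list(...)
  function(x, y) {
    Reduce(`|`, lapply(predicates, function(p) p(x, y)))
  }
}

# Predicates are defined up front, independently of the matching step.
same_value <- function(x, y) x == y
within_one <- function(x, y) abs(x - y) <= 1

match_fn <- either(same_value, within_one)
match_fn(c(1980, 1990), c(1981, 1995))  # TRUE FALSE
```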

Diss-related
