This R package contains functions that I use to clean and transform "dirty" data.
Please install package remotes. After successful installation, type remotes::install_github("joheli/kungfu"). Alternatively, download the most recent tagged compressed package from the repository's tags page and install it from the R command line by typing install.packages("vX.X.X.tar.gz", repos = NULL, type = "source") (where vX.X.X is to be replaced by the most recent version).
Presently, the package contains the following functions (in alphabetic order):
cleaner: removes duplicates in a data.frame.
df_pattern_subset: subsets a data.frame given two regex patterns marking the upper left and lower right corners of the returned data.frame.
dfilter: filters a vector of type numeric, integer, Date, or POSIXt in a fashion that removes entries exceeding a user-specified distance from other values; e.g. "dfiltering" c(1,2,10) would remove 10 if argument max.dist is 5 (as 10 - 2 > 5); similarly, an argument min.dist can be specified to enforce a minimal distance between entries.
dlabel: wraps dfilter to label occurrences satisfying specified distance criteria (see function dfilter) in a data frame; function bc_contamination is merely a customized call of dlabel, designed to scan blood culture results for possible contamination.
pattern_join: joins two tables based on regex patterns; it is similar to function regex_join in package fuzzyjoin, which I discovered only after writing pattern_join.
postgresql_uploader: uploads a data.frame into an existing PostgreSQL table.
rbinder: function for importing and joining of multiple csv-like files with identical headers.
seamless: converts a table of intervals into a "seamless" succession of intervals.
similarity_join: joins two tables based on string similarity to a reference (e.g. dictionary of words).
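
To make the distance idea behind dfilter more concrete, here is a minimal base-R sketch. It is not the package's implementation and does not call dfilter. The helper names nearest_gap and drop_isolated are hypothetical, and the rule used (keep a value only if its nearest neighbour lies within max.dist) is an assumption that merely reproduces the c(1,2,10) example above. The min.dist case is not covered.

```r
# Hypothetical helpers for illustration only; NOT part of kungfu.

# For each element of x, the distance to the closest other element.
# Assumes length(x) >= 2; Date values are compared in days via as.numeric().
nearest_gap <- function(x) {
  vapply(
    seq_along(x),
    function(i) min(abs(as.numeric(x[i]) - as.numeric(x[-i]))),
    numeric(1)
  )
}

# Keep only entries whose nearest neighbour is at most max.dist away
# (assumed rule, chosen to match the example in the dfilter description).
drop_isolated <- function(x, max.dist) {
  if (length(x) < 2) return(x)  # nothing to compare against
  x[nearest_gap(x) <= max.dist]
}

# 10 is dropped because its nearest neighbour, 2, is 8 away (8 > 5)
drop_isolated(c(1, 2, 10), max.dist = 5)
```

Because subsetting preserves the class of x, the same sketch also works on Date vectors, with max.dist then read as a number of days. For real work, use dfilter itself and consult its help page for the actual arguments and behaviour.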
Please check out packages fuzzyjoin and janitor.
After installation, please use help(*function*) or ?*function* (e.g. ?cleaner) to access the help pages of the above functions.