Explicit Regex Matching Implemented As Model-like Objects

kwm

kwm provides simple wrapper functions for building KeyWord Models: classifiers that make predictions from explicit lists of regular expression patterns to include and exclude. Because these models come with a generic predict() method, it is easy to compare the performance of plain regex matching against more complex text classification models within the same pipeline.
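To make the matching rule concrete, here is a minimal base-R sketch of include/exclude classification. It assumes a value is classified TRUE when it matches at least one include pattern and no exclude pattern. The helper kw_match() and its ignore_case argument are hypothetical and not part of kwm's API; this is an illustration, not the package's implementation.

```r
# Hypothetical sketch of keyword matching (not kwm's actual code).
kw_match <- function(x, include, exclude = character(0), ignore_case = FALSE) {
  # TRUE wherever x matches at least one of the given regex patterns;
  # an empty pattern list matches nothing
  matches_any <- function(patterns) {
    Reduce(`|`,
           lapply(patterns, function(p) grepl(p, x, ignore.case = ignore_case)),
           rep(FALSE, length(x)))
  }
  # Keep an inclusion hit only if no exclusion pattern also fires
  result <- matches_any(include) & !matches_any(exclude)
  names(result) <- x
  result
}

kw_match(month.name, include = c("a", "e"), exclude = "r$")
```

With these patterns the sketch returns the same TRUE/FALSE values as the month example below, and setting ignore_case = TRUE reproduces the case-insensitive result.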

Installation

You can install kwm from GitHub with:

# install.packages("devtools")
devtools::install_github("mdlincoln/kwm")

Example

library(kwm)

month_df <- data.frame(month = month.name, stringsAsFactors = FALSE)

# Match month names that INCLUDE "a" or "e" but EXCLUDE any that end in "r"
# (matching is case-sensitive by default)
month_model <- kwm(include = c("a", "e"), exclude = "r$", varname = "month")

predict(month_model, newdata = month_df, return_names = TRUE)
#>   January  February     March     April       May      June      July 
#>      TRUE      TRUE      TRUE     FALSE      TRUE      TRUE     FALSE 
#>    August September   October  November  December 
#>     FALSE     FALSE     FALSE     FALSE     FALSE

# Options for the underlying search function can be passed via search_opts,
# e.g. to ignore case
caseless_month_model <- kwm(include = c("a", "e"), exclude = "r$", 
                            varname = "month", 
                            search_opts = list(ignore_case = TRUE))

predict(caseless_month_model, newdata = month_df, return_names = TRUE)
#>   January  February     March     April       May      June      July 
#>      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE     FALSE 
#>    August September   October  November  December 
#>      TRUE     FALSE     FALSE     FALSE     FALSE
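Comparing the two outputs, only April and August change. A quick base-R check shows why: neither name contains a lowercase "a" or "e", only a capital "A", which case-sensitive matching misses. This sketch assumes the include patterns can be collapsed into the single alternation "a|e" and uses grepl() directly rather than kwm.

```r
# Base-R check (not kwm itself): which months match only when case is ignored?
classify <- function(ignore_case) {
  included <- grepl("a|e", month.name, ignore.case = ignore_case)  # "a" or "e" anywhere
  excluded <- grepl("r$", month.name, ignore.case = ignore_case)   # ends in "r"
  included & !excluded
}

month.name[classify(TRUE) & !classify(FALSE)]
#> [1] "April"  "August"
```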