prepare CRAN release, minor improvements

* If `model.term` does not exist in models, `spread_coef()` now prints the name of non-existing coefficients. * `find_var()` gets a `fuzzy`-argument to enable fuzzy-matching for search pattern. * Argument `part.dist.match` in `str_pos()` caused an error when being larger than 0.
strengejacke · Apr 6, 2017 · 52da8c4 · 52da8c4
1 parent 9172719
commit 52da8c4
Show file tree

Hide file tree

Showing 35 changed files with 1,540 additions and 1,465 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -2,8 +2,8 @@ Package: sjmisc
 Type: Package
 Encoding: UTF-8
 Title: Data Transformation and Labelled Data Utility Functions
-Version: 2.3.1.9000
-Date: 2017-03-08
+Version: 2.4.0
+Date: 2017-04-06
 Author: Daniel Lüdecke <d.luedecke@uke.de>
 Maintainer: Daniel Lüdecke <d.luedecke@uke.de>
 Description: Collection of miscellaneous utility functions (especially intended
@@ -15,7 +15,8 @@ Description: Collection of miscellaneous utility functions (especially intended
     vectors into factors (and vice versa), or to deal with multiple declared
     missing values etc. 2) Data transformation tasks like recoding,
     dichotomizing or grouping variables, setting and replacing missing values.
-    The data transformation functions also support labelled data.
+    The data transformation functions also support labelled data, and all integrate
+    seamlessly into a 'tidyverse'-workflow.
 License: GPL-3
 Depends:
     R (>= 3.2),

diff --git a/NEWS b/NEWS
@@ -1,4 +1,4 @@
-Version 2.3.1.9000
+Version 2.4.0
 ------------------------------------------------------------------------------
 General:
 * Argument `value` in `set_na()` is deprecated. Please use `na` instead.
@@ -13,13 +13,16 @@ New functions:
 * `%nin%` as complement to `%in%`.
 
 Changes to functions:
-* Functions `rec()`, `dicho()`, `center()`, `std()`, `recode_to()` and `group_var()` get a `append`-argument, to optionally return the original data including the transformed variables as new columns.
-* `var_labels()` and `var_rename()` now give a warning if specified variables to rename or relabel do not exist in the data frame. Non-matching variables are removed.
+* Functions `rec()`, `dicho()`, `center()`, `std()`, `recode_to()` and `group_var()` get an `append`-argument, to optionally return the original data including the transformed variables as new columns.
+* `var_labels()` and `var_rename()` now give a warning if specified variables to rename or relabel do not exist in the data frame. Non-matching variables are ignored.
+* If `model.term` does not exist in models, `spread_coef()` now prints the name of non-existing coefficients.
+* `find_var()` gets a `fuzzy`-argument to enable fuzzy-matching for search pattern.
 
 Bug fixes:
 * `remove_empty_cols()` returned an empty data frame, when input data frame had no empty columns.
 * `remove_empty_rows()` returned an empty data frame, when input data frame had no empty rows.
-* `add_columns()` and `repair_columns()` in some cases coerced data frames of class `data.frame` with only one column into a vector, which gave an error when binding columns.
+* `add_columns()` and `replace_columns()` in some cases coerced data frames of class `data.frame` with only one column into a vector, which gave an error when binding columns.
+* Argument `part.dist.match` in `str_pos()` caused an error when being larger than 0.
 
 Version 2.3.1
 ------------------------------------------------------------------------------

diff --git a/NEWS.md b/NEWS.md
diff --git a/R/find_var.R b/R/find_var.R
@@ -1,157 +1,181 @@
-#' @title Find variable by name or label
-#' @name find_var
-#'
-#' @description This functions finds variables in a data frame, which variable
-#'                names or variable (and value) label attribute match a specific
-#'                pattern. Regular expression for the pattern is supported.
-#'
-#' @param data A data frame.
-#' @param pattern Character string to be matched in \code{data}. May also be a
-#'          character vector of length > 1 (see 'Examples'). \code{pattern} is
-#'          searched for in column names and variable label attributes of
-#'          \code{data} (see \code{\link{get_label}}). \code{pattern} might also
-#'          be a regular-expression object, as returned by \code{\link[stringr]{regex}},
-#'          or any of \pkg{stringr}'s supported \code{\link[stringr]{modifiers}}.
-#' @param ignore.case Logical, whether matching should be case sensitive or not.
-#' @param search Character string, indicating where \code{pattern} is sought.
-#'          Use one of following options:
-#'          \describe{
-#'            \item{\code{"name_label"}}{The default, searches for \code{pattern} in
-#'                  variable names and variable labels.}
-#'            \item{\code{"name_value"}}{Searches for \code{pattern} in
-#'                  variable names and value labels.}
-#'            \item{\code{"label_value"}}{Searches for \code{pattern} in
-#'                  variable and value labels.}
-#'            \item{\code{"name"}}{Searches for \code{pattern} in
-#'                  variable names.}
-#'            \item{\code{"label"}}{Searches for \code{pattern} in
-#'                  variable labels}
-#'            \item{\code{"value"}}{Searches for \code{pattern} in
-#'                  value labels.}
-#'            \item{\code{"all"}}{Searches for \code{pattern} in
-#'                  variable names, variable and value labels.}
-#'          }
-#' @param as.df Logical, if \code{TRUE}, a data frame with matching variables
-#'                is returned (instead of their column indices).
-#' @param as.varlab Logical, if \code{TRUE}, not only column indices, but also
-#'                variables labels of matching variables are returned (as
-#'                data frame).
-#'
-#' @return A named vector with column indices of found variables (variable names
-#'           are used as names-attribute); if \code{as.df = TRUE}, a tibble
-#'           with found variables; or, if \code{as.varlab = TRUE}, a tibble
-#'           with three columns: column number, variable name and variable label.
-#'
-#' @details This function searches for \code{pattern} in \code{data}'s column names
-#'            and - for labelled data - in all variable and value labels of \code{data}'s
-#'            variables (see \code{\link{get_label}} for details on variable labels and
-#'            labelled data). Search is performed using the
-#'            \code{\link[stringr]{str_detect}} functions; hence, regular
-#'            expressions are supported as well, by simply using
-#'            \code{pattern = stringr::regex(...)}.
-#'
-#' @examples
-#' data(efc)
-#'
-#' # find variables with "cop" in variable name
-#' find_var(efc, "cop")
-#'
-#' # return tibble with matching variables
-#' find_var(efc, "cop", as.df = TRUE)
-#'
-#' # or return "summary"-tibble with matching variables
-#' # and their variable labels
-#' find_var(efc, "cop", as.varlab = TRUE)
-#'
-#' # find variables with "dependency" in names and variable labels
-#' find_var(efc, "dependency")
-#' get_label(efc$e42dep)
-#'
-#' # find variables with "level" in names and value labels
-#' res <- find_var(efc, "level", search = "name_value", as.df = TRUE)
-#' res
-#' get_labels(res, attr.only = FALSE)
-#'
-#' # use sjPlot::view_df() to view results
-#' \dontrun{
-#' library(sjPlot)
-#' view_df(res)}
-#'
-#' @importFrom stringr regex coll
-#' @importFrom tibble as_tibble
-#' @export
-find_var <- function(data,
-                     pattern,
-                     ignore.case = TRUE,
-                     search = c("name_label", "name_value", "label_value", "name", "label", "value", "all"),
-                     as.df = FALSE,
-                     as.varlab = FALSE) {
-  # check valid args
-  if (!is.data.frame(data)) {
-    stop("`data` must be a data frame.", call. = F)
-  }
-
-  # match args
-  search <- match.arg(search)
-
-  pos1 <- pos2 <- pos3 <- c()
-
-  # search for pattern in variable names
-  if (search %in% c("name", "name_label", "name_value", "all")) {
-    # check variable names
-    if (inherits(pattern, "regex"))
-      pos1 <- which(stringr::str_detect(colnames(data), pattern))
-    else
-      pos1 <- which(stringr::str_detect(colnames(data), stringr::coll(pattern, ignore_case = ignore.case)))
-  }
-
-
-  # search for pattern in variable labels
-  if (search %in% c("label", "name_label", "label_value", "all")) {
-    # get labels and variable names
-    labels <- get_label(data)
-
-    # check labels
-    if (inherits(pattern, "regex"))
-      pos2 <- which(stringr::str_detect(labels, pattern))
-    else
-      pos2 <- which(stringr::str_detect(labels, stringr::coll(pattern, ignore_case = ignore.case)))
-  }
-
-  # search for pattern in value labels
-  if (search %in% c("value", "name_value", "label_value", "all")) {
-    labels <- get_labels(data, attr.only = F)
-
-    # check value labels with regex
-    if (inherits(pattern, "regex")) {
-      pos3 <- which(unlist(lapply(labels, function(x) {
-        any(stringr::str_detect(x, pattern))
-      })))
-    } else {
-      pos3 <- which(unlist(lapply(labels, function(x) {
-        any(stringr::str_detect(x, stringr::coll(pattern, ignore_case = ignore.case)))
-      })))
-    }
-  }
-
-  # get unqiue variable indices
-  pos <- unique(c(pos1, pos2, pos3))
-
-  # return data frame?
-  if (as.df) {
-    return(tibble::as_tibble(data[, pos]))
-  }
-
-  # return variable labels?
-  if (as.varlab) {
-    return(tibble::tibble(
-      col.nr = pos,
-      var.name = colnames(data)[pos],
-      var.label = get_label(data[, pos], def.value = colnames(data)[pos])
-    ))
-  }
-
-  # use column names
-  names(pos) <- colnames(data)[pos]
-  pos
-}
+#' @title Find variable by name or label
+#' @name find_var
+#'
+#' @description This functions finds variables in a data frame, which variable
+#'                names or variable (and value) label attribute match a specific
+#'                pattern. Regular expression for the pattern is supported.
+#'
+#' @param data A data frame.
+#' @param pattern Character string to be matched in \code{data}. May also be a
+#'          character vector of length > 1 (see 'Examples'). \code{pattern} is
+#'          searched for in column names and variable label attributes of
+#'          \code{data} (see \code{\link{get_label}}). \code{pattern} might also
+#'          be a regular-expression object, as returned by \code{\link[stringr]{regex}},
+#'          or any of \pkg{stringr}'s supported \code{\link[stringr]{modifiers}}.
+#' @param ignore.case Logical, whether matching should be case sensitive or not.
+#' @param search Character string, indicating where \code{pattern} is sought.
+#'          Use one of following options:
+#'          \describe{
+#'            \item{\code{"name_label"}}{The default, searches for \code{pattern} in
+#'                  variable names and variable labels.}
+#'            \item{\code{"name_value"}}{Searches for \code{pattern} in
+#'                  variable names and value labels.}
+#'            \item{\code{"label_value"}}{Searches for \code{pattern} in
+#'                  variable and value labels.}
+#'            \item{\code{"name"}}{Searches for \code{pattern} in
+#'                  variable names.}
+#'            \item{\code{"label"}}{Searches for \code{pattern} in
+#'                  variable labels}
+#'            \item{\code{"value"}}{Searches for \code{pattern} in
+#'                  value labels.}
+#'            \item{\code{"all"}}{Searches for \code{pattern} in
+#'                  variable names, variable and value labels.}
+#'          }
+#' @param as.df Logical, if \code{TRUE}, a data frame with matching variables
+#'                is returned (instead of their column indices).
+#' @param as.varlab Logical, if \code{TRUE}, not only column indices, but also
+#'                variables labels of matching variables are returned (as
+#'                data frame).
+#' @param fuzzy Logical, if \code{TRUE}, "fuzzy matching" (partial and
+#'          close distance matching) will be used to find \code{pattern}
+#'          in \code{data} if no exact match was found. \code{\link{str_pos}}
+#'          is used for fuzzy matching.
+#'
+#' @return A named vector with column indices of found variables (variable names
+#'           are used as names-attribute); if \code{as.df = TRUE}, a tibble
+#'           with found variables; or, if \code{as.varlab = TRUE}, a tibble
+#'           with three columns: column number, variable name and variable label.
+#'
+#' @details This function searches for \code{pattern} in \code{data}'s column names
+#'            and - for labelled data - in all variable and value labels of \code{data}'s
+#'            variables (see \code{\link{get_label}} for details on variable labels and
+#'            labelled data). Search is performed using the
+#'            \code{\link[stringr]{str_detect}} functions; hence, regular
+#'            expressions are supported as well, by simply using
+#'            \code{pattern = stringr::regex(...)}.
+#'
+#' @examples
+#' data(efc)
+#'
+#' # find variables with "cop" in variable name
+#' find_var(efc, "cop")
+#'
+#' # return tibble with matching variables
+#' find_var(efc, "cop", as.df = TRUE)
+#'
+#' # or return "summary"-tibble with matching variables
+#' # and their variable labels
+#' find_var(efc, "cop", as.varlab = TRUE)
+#'
+#' # find variables with "dependency" in names and variable labels
+#' find_var(efc, "dependency")
+#' get_label(efc$e42dep)
+#'
+#' # find variables with "level" in names and value labels
+#' res <- find_var(efc, "level", search = "name_value", as.df = TRUE)
+#' res
+#' get_labels(res, attr.only = FALSE)
+#'
+#' # use sjPlot::view_df() to view results
+#' \dontrun{
+#' library(sjPlot)
+#' view_df(res)}
+#'
+#' @importFrom stringr regex coll
+#' @importFrom tibble as_tibble
+#' @export
+find_var <- function(data,
+                     pattern,
+                     ignore.case = TRUE,
+                     search = c("name_label", "name_value", "label_value", "name", "label", "value", "all"),
+                     as.df = FALSE,
+                     as.varlab = FALSE,
+                     fuzzy = FALSE) {
+  # check valid args
+  if (!is.data.frame(data)) {
+    stop("`data` must be a data frame.", call. = F)
+  }
+
+  # match args
+  search <- match.arg(search)
+
+  pos1 <- pos2 <- pos3 <- c()
+
+  # search for pattern in variable names
+  if (search %in% c("name", "name_label", "name_value", "all")) {
+    # check variable names
+    if (inherits(pattern, "regex"))
+      pos1 <- which(stringr::str_detect(colnames(data), pattern))
+    else
+      pos1 <- which(stringr::str_detect(colnames(data), stringr::coll(pattern, ignore_case = ignore.case)))
+
+    # if nothing found, find in near distance
+    if (sjmisc::is_empty(pos1) && fuzzy && !inherits(pattern, "regex")) {
+      pos1 <- str_pos(search.string = colnames(data), find.term = pattern, part.dist.match = 1)
+    }
+  }
+
+
+  # search for pattern in variable labels
+  if (search %in% c("label", "name_label", "label_value", "all")) {
+    # get labels and variable names
+    labels <- get_label(data)
+
+    # check labels
+    if (inherits(pattern, "regex"))
+      pos2 <- which(stringr::str_detect(labels, pattern))
+    else
+      pos2 <- which(stringr::str_detect(labels, stringr::coll(pattern, ignore_case = ignore.case)))
+
+    # if nothing found, find in near distance
+    if (sjmisc::is_empty(pos2) && fuzzy && !inherits(pattern, "regex")) {
+      pos2 <- str_pos(search.string = labels, find.term = pattern, part.dist.match = 1)
+    }
+  }
+
+  # search for pattern in value labels
+  if (search %in% c("value", "name_value", "label_value", "all")) {
+    labels <- get_labels(data, attr.only = F)
+
+    # check value labels with regex
+    if (inherits(pattern, "regex")) {
+      pos3 <- which(unlist(lapply(labels, function(x) {
+        any(stringr::str_detect(x, pattern))
+      })))
+    } else {
+      pos3 <- which(unlist(lapply(labels, function(x) {
+        any(stringr::str_detect(x, stringr::coll(pattern, ignore_case = ignore.case)))
+      })))
+    }
+
+    # if nothing found, find in near distance
+    if (sjmisc::is_empty(pos3) && fuzzy && !inherits(pattern, "regex")) {
+      pos3 <- which(unlist(lapply(labels, function(x) {
+        str_pos(search.string = x, find.term = pattern, part.dist.match = 1)
+      })))
+    }
+  }
+
+  # get unique variable indices
+  pos <- unique(c(pos1, pos2, pos3))
+  # remove -1
+  pos <- pos[which(pos != -1)]
+
+  # return data frame?
+  if (as.df) {
+    return(tibble::as_tibble(data[, pos]))
+  }
+
+  # return variable labels?
+  if (as.varlab) {
+    return(tibble::tibble(
+      col.nr = pos,
+      var.name = colnames(data)[pos],
+      var.label = get_label(data[, pos], def.value = colnames(data)[pos])
+    ))
+  }
+
+  # use column names
+  names(pos) <- colnames(data)[pos]
+  pos
+}