From b2700cfd651fd1486c3f71375bf58d01bea4d3c5 Mon Sep 17 00:00:00 2001
From: olivroy <52606734+olivroy@users.noreply.github.com>
Date: Sat, 12 Aug 2023 20:50:42 -0400
Subject: [PATCH] Documention with markdown and remove mentions to master branch (#551)

* Remove the docs/ directory (already deployed via GitHub pages) last updated 3 years ago.
* `usethis::use_roxygen_md()`
* `roxygen2md::roxygen2md()` and `devtools::document()`
* usethis::use_tidy_description()
* master -> main
* Removal of `@title`, and cosmetic changes
* `devtools::document()`
* Revert
* Redocument
* Fix Warnings (no need for \%. % is fine.
* styler::style_pkg()
* Add `@keywords internal`, so that they don't show up in the function index.
* However, the .Rd file is still created, so that people using `?convert_to_NA` will still see the documentation file.
* Address comments and redocument.
* Revert moving to `@details`
* WS
* Update DESCRIPTION Missed it when solving conflicts
* Address comments
* Minor additional cleanup
* WS
* `use_tidy_description()`
---
 .github/CONTRIBUTING.md      |  2 +-
 DESCRIPTION                  | 52 +++++++++++++++++----------------
 R/adorn_ns.R                 | 14 ++++-----
 R/adorn_pct_formatting.R     | 33 ++++++++++++++-------
 R/adorn_percentages.R        |  9 +++---
 R/adorn_rounding.R           | 12 ++++----
 R/adorn_title.R              | 12 ++++----
 R/adorn_totals.R             | 12 ++++----
 R/as_and_untabyl.R           | 52 +++++++++++++++++++++------------
 R/clean_names.R              | 34 +++++++++++-----------
 R/compare_df_cols.R          | 34 +++++++++++-----------
 R/excel_dates.R              | 27 +++++++++--------
 R/get_dupes.R                | 17 +++++++----
 R/get_one_to_one.R           |  4 +--
 R/janitor.R                  |  6 ++--
 R/janitor_deprecated.R       | 56 +++++++++++++++++++-----------------
 R/make_clean_names.R         | 52 ++++++++++++++++-----------------
 R/paste_skip_na.R            |  4 +--
 R/remove_empties.R           | 30 +++++++++----------
 R/round_half_up.R            | 25 ++++++++++------
 R/round_to_fraction.R        | 24 +++++++++-------
 R/row_to_names.R             | 22 +++++++-------
 R/single_value.R             |  6 ++--
 R/statistical_tests.R        | 31 +++++++++-----------
 R/tabyl.R                    | 12 ++++----
 R/top_levels.R               |  9 ++++--
 R/utils.R                    |  2 +-
 README.Rmd                   |  4 +--
 README.md                    |  4 +--
 index.Rmd                    |  7 +++--
 index.md                     |  6 ++--
 janitor.Rproj                |  2 ++
 man/add_totals_col.Rd        |  1 +
 man/add_totals_row.Rd        |  1 +
 man/adorn_ns.Rd              |  5 ++--
 man/adorn_pct_formatting.Rd  | 30 +++++++++++++------
 man/adorn_rounding.Rd        |  2 +-
 man/adorn_totals.Rd          |  4 +--
 man/as_tabyl.Rd              | 46 +++++++++++++++++++----------
 man/chisq.test.Rd            | 16 +++++------
 man/clean_names.Rd           | 24 ++++++++--------
 man/compare_df_cols.Rd       | 30 +++++++++----------
 man/compare_df_cols_same.Rd  |  2 +-
 man/convert_to_NA.Rd         |  1 +
 man/convert_to_date.Rd       | 18 ++++++------
 man/describe_class.Rd        | 10 +++----
 man/excel_numeric_to_date.Rd | 18 ++++++------
 man/fisher.test.Rd           | 14 ++++-----
 man/get_dupes.Rd             | 14 ++++++---
 man/get_one_to_one.Rd        |  4 +--
 man/janitor.Rd               |  4 +--
 man/janitor_deprecated.Rd    | 16 +++++------
 man/make_clean_names.Rd      | 16 +++++------
 man/paste_skip_na.Rd         |  4 +--
 man/pipe.Rd                  |  2 +-
 man/remove_constant.Rd       |  2 +-
 man/remove_empty.Rd          |  6 ++--
 man/remove_empty_cols.Rd     |  1 +
 man/remove_empty_rows.Rd     |  1 +
 man/round_half_up.Rd         | 13 +++++++--
 man/round_to_fraction.Rd     | 10 +++----
 man/row_to_names.Rd          |  2 +-
 man/sas_numeric_to_date.Rd   |  4 +--
 man/single_value.Rd          |  2 +-
 man/tabyl.Rd                 |  4 +--
 man/top_levels.Rd            |  8 ++++--
 man/untabyl.Rd               |  4 +--
 man/use_first_valid_of.Rd    |  1 +
 68 files changed, 526 insertions(+), 430 deletions(-)

diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
index 371ad99c..e87e2d90 100644
--- a/.github/CONTRIBUTING.md
+++ b/.github/CONTRIBUTING.md
@@ -23,7 +23,7 @@ If your proposed contribution addresses multiple issues, it should ideally be br
 * Make sure to track progress upstream (i.e., on our version of `janitor` at `sfirke/janitor`) by doing `git remote add upstream https://github.com/sfirke/janitor.git`. Before making changes make sure to pull changes in from upstream by doing either `git fetch upstream` then merge later or `git pull upstream` to fetch and merge in one step
 * Make your changes (bonus points for making changes on a new feature branch)
 * Push up to your account
-* Submit a pull request to the master branch at `sfirke/janitor`
+* Submit a pull request to the main branch at `sfirke/janitor`
 
 ### Prefer to discuss over email?
 Email Sam. His email address is in the `DESCRIPTION` file of this repo.
diff --git a/DESCRIPTION b/DESCRIPTION
index 08153716..a1c01892 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,22 +1,25 @@
 Package: janitor
 Title: Simple Tools for Examining and Cleaning Dirty Data
 Version: 2.2.0.9000
-Authors@R: c(person("Sam", "Firke", email = "samuel.firke@gmail.com", role = c("aut", "cre")),
-    person("Bill", "Denney", email = "wdenney@humanpredictions.com", role = "ctb"),
-    person("Chris", "Haid", email = "chrishaid@gmail.com", role = "ctb"),
-    person("Ryan", "Knight", email = "ryangknight@gmail.com", role = "ctb"),
-    person("Malte", "Grosser", email = "malte.grosser@gmail.com", role = "ctb"),
-    person("Jonathan", "Zadra", email = "jonathan.zadra@sorensonimpact.com", role = "ctb"))
-Description: The main janitor functions can: perfectly format data.frame column
-    names; provide quick counts of variable combinations (i.e., frequency
-    tables and crosstabs); and explore duplicate records. Other janitor functions
-    nicely format the tabulation results. These tabulate-and-report functions
-    approximate popular features of SPSS and Microsoft Excel. This package
-    follows the principles of the "tidyverse" and works well with the pipe function
-    %>%. janitor was built with beginning-to-intermediate R users in mind and is
-    optimized for user-friendliness.
-URL: https://github.com/sfirke/janitor,
-    https://sfirke.github.io/janitor/
+Authors@R: c(
+    person("Sam", "Firke", , "samuel.firke@gmail.com", role = c("aut", "cre")),
+    person("Bill", "Denney", , "wdenney@humanpredictions.com", role = "ctb"),
+    person("Chris", "Haid", , "chrishaid@gmail.com", role = "ctb"),
+    person("Ryan", "Knight", , "ryangknight@gmail.com", role = "ctb"),
+    person("Malte", "Grosser", , "malte.grosser@gmail.com", role = "ctb"),
+    person("Jonathan", "Zadra", , "jonathan.zadra@sorensonimpact.com", role = "ctb")
+  )
+Description: The main janitor functions can: perfectly format data.frame
+    column names; provide quick counts of variable combinations (i.e.,
+    frequency tables and crosstabs); and explore duplicate records. Other
+    janitor functions nicely format the tabulation results. These
+    tabulate-and-report functions approximate popular features of SPSS and
+    Microsoft Excel. This package follows the principles of the
+    "tidyverse" and works well with the pipe function %>%. janitor was
+    built with beginning-to-intermediate R users in mind and is optimized
+    for user-friendliness.
+License: MIT + file LICENSE
+URL: https://github.com/sfirke/janitor, https://sfirke.github.io/janitor/
 BugReports: https://github.com/sfirke/janitor/issues
 Depends:
     R (>= 3.1.2)
@@ -28,14 +31,11 @@ Imports:
     magrittr,
     purrr,
     rlang,
+    snakecase (>= 0.9.2),
     stringi,
     stringr,
-    snakecase (>= 0.9.2),
-    tidyselect (>= 1.0.0),
-    tidyr (>= 1.0.0)
-License: MIT + file LICENSE
-LazyData: true
-RoxygenNote: 7.2.3
+    tidyr (>= 1.0.0),
+    tidyselect (>= 1.0.0)
 Suggests:
     dbplyr,
     knitr,
@@ -45,6 +45,10 @@ Suggests:
     testthat (>= 3.0.0),
     tibble,
     tidygraph
-VignetteBuilder: knitr
-Encoding: UTF-8
+VignetteBuilder:
+    knitr
 Config/testthat/edition: 3
+Encoding: UTF-8
+LazyData: true
+Roxygen: list(markdown = TRUE)
+RoxygenNote: 7.2.3
diff --git a/R/adorn_ns.R b/R/adorn_ns.R
index d9fbd355..97bac414 100644
--- a/R/adorn_ns.R
+++ b/R/adorn_ns.R
@@ -1,18 +1,16 @@
-#' @title Add underlying Ns to a tabyl displaying percentages.
+#' Add underlying Ns to a tabyl displaying percentages.
 #'
-#' @description
-#' This function adds back the underlying Ns to a \code{tabyl} whose percentages were calculated using \code{adorn_percentages()}, to display the Ns and percentages together. You can also call it on a non-tabyl data.frame to which you wish to append Ns.
+#' This function adds back the underlying Ns to a `tabyl` whose percentages were calculated using `adorn_percentages()`, to display the Ns and percentages together. You can also call it on a non-tabyl data.frame to which you wish to append Ns.
 #'
-#' @param dat a data.frame of class \code{tabyl} that has had \code{adorn_percentages} and/or \code{adorn_pct_formatting} called on it. If given a list of data.frames, this function will apply itself to each data.frame in the list (designed for 3-way \code{tabyl} lists).
+#' @param dat a data.frame of class `tabyl` that has had `adorn_percentages` and/or `adorn_pct_formatting` called on it. If given a list of data.frames, this function will apply itself to each data.frame in the list (designed for 3-way `tabyl` lists).
 #' @param position should the N go in the front, or in the rear, of the percentage?
-#' @param ns the Ns to append. The default is the "core" attribute of the input tabyl \code{dat}, where the original Ns of a two-way \code{tabyl} are stored. However, if your Ns are stored somewhere else, or you need to customize them beyond what can be done with `format_func`, you can supply them here.
-#' @param format_func a formatting function to run on the Ns. Consider defining with \code{base::format()}.
-#' @param ... columns to adorn. This takes a tidyselect specification. By default, all columns are adorned except for the first column and columns not of class \code{numeric}, but this allows you to manually specify which columns should be adorned, for use on a data.frame that does not result from a call to \code{tabyl}.
+#' @param ns the Ns to append. The default is the "core" attribute of the input tabyl `dat`, where the original Ns of a two-way `tabyl` are stored. However, if your Ns are stored somewhere else, or you need to customize them beyond what can be done with `format_func`, you can supply them here.
+#' @param format_func a formatting function to run on the Ns. Consider defining with [base::format()].
+#' @param ... columns to adorn. This takes a tidyselect specification. By default, all columns are adorned except for the first column and columns not of class `numeric`, but this allows you to manually specify which columns should be adorned, for use on a data.frame that does not result from a call to `tabyl`.
 #'
 #' @return a data.frame with Ns appended
 #' @export
 #' @examples
-#'
 #' mtcars %>%
 #'   tabyl(am, cyl) %>%
 #'   adorn_percentages("col") %>%
diff --git a/R/adorn_pct_formatting.R b/R/adorn_pct_formatting.R
index 69898298..80ac8ed8 100644
--- a/R/adorn_pct_formatting.R
+++ b/R/adorn_pct_formatting.R
@@ -1,21 +1,34 @@
-#' @title Format a data.frame of decimals as percentages.
+#' Format a `data.frame` of decimals as percentages.
 #'
 #' @description
-#' Numeric columns get multiplied by 100 and formatted as percentages according to user specifications. This function defaults to excluding the first column of the input data.frame, assuming that it contains a descriptive variable, but this can be overridden by specifying the columns to adorn in the \code{...} argument. Non-numeric columns are always excluded.
+#' Numeric columns get multiplied by 100 and formatted as
+#' percentages according to user specifications. This function defaults to
+#' excluding the first column of the input data.frame, assuming that it contains
+#' a descriptive variable, but this can be overridden by specifying the columns
+#' to adorn in the `...` argument. Non-numeric columns are always excluded.
 #'
-#' The decimal separator character is the result of \code{getOption("OutDec")}, which is based on the user's locale. If the default behavior is undesirable,
-#' change this value ahead of calling the function, either by changing locale or with \code{options(OutDec = ",")}. This aligns the decimal separator character with that used in \code{base::print()}.
+#' The decimal separator character is the result of `getOption("OutDec")`, which
+#' is based on the user's locale. If the default behavior is undesirable,
+#' change this value ahead of calling the function, either by changing locale or
+#' with `options(OutDec = ",")`. This aligns the decimal separator character
+#' with that used in `base::print()`.
 #'
-#' @param dat a data.frame with decimal values, typically the result of a call to \code{adorn_percentages} on a \code{tabyl}. If given a list of data.frames, this function will apply itself to each data.frame in the list (designed for 3-way \code{tabyl} lists).
+#' @param dat a data.frame with decimal values, typically the result of a call
+#'   to `adorn_percentages` on a `tabyl`. If given a list of data.frames, this
+#'   function will apply itself to each data.frame in the list (designed for
+#'   3-way `tabyl` lists).
 #' @param digits how many digits should be displayed after the decimal point?
-#' @param rounding method to use for rounding - either "half to even", the base R default method, or "half up", where 14.5 rounds up to 15.
-#' @param affix_sign should the \% sign be affixed to the end?
-#' @param ... columns to adorn. This takes a tidyselect specification. By default, all numeric columns (besides the initial column, if numeric) are adorned, but this allows you to manually specify which columns should be adorned, for use on a data.frame that does not result from a call to \code{tabyl}.
-#'
+#' @param rounding method to use for rounding - either "half to even", the base
+#'   R default method, or "half up", where 14.5 rounds up to 15.
+#' @param affix_sign should the % sign be affixed to the end?
+#' @param ... columns to adorn. This takes a tidyselect specification. By
+#'   default, all numeric columns (besides the initial column, if numeric) are
+#'   adorned, but this allows you to manually specify which columns should be
+#'   adorned, for use on a data.frame that does not result from a call to
+#'   `tabyl`.
 #' @return a data.frame with formatted percentages
 #' @export
 #' @examples
-#'
 #' mtcars %>%
 #'   tabyl(am, cyl) %>%
 #'   adorn_percentages("col") %>%
diff --git a/R/adorn_percentages.R b/R/adorn_percentages.R
index c50094bb..2d7a436a 100644
--- a/R/adorn_percentages.R
+++ b/R/adorn_percentages.R
@@ -1,12 +1,11 @@
-#' @title Convert a data.frame of counts to percentages.
+#' Convert a data.frame of counts to percentages.
 #'
-#' @description
-#' This function defaults to excluding the first column of the input data.frame, assuming that it contains a descriptive variable, but this can be overridden by specifying the columns to adorn in the \code{...} argument.
+#' This function defaults to excluding the first column of the input data.frame, assuming that it contains a descriptive variable, but this can be overridden by specifying the columns to adorn in the `...` argument.
 #'
-#' @param dat a \code{tabyl} or other data.frame with a tabyl-like layout. If given a list of data.frames, this function will apply itself to each data.frame in the list (designed for 3-way \code{tabyl} lists).
+#' @param dat a `tabyl` or other data.frame with a tabyl-like layout. If given a list of data.frames, this function will apply itself to each data.frame in the list (designed for 3-way `tabyl` lists).
 #' @param denominator the direction to use for calculating percentages. One of "row", "col", or "all".
 #' @param na.rm should missing values (including NaN) be omitted from the calculations?
-#' @param ... columns to adorn. This takes a tidyselect specification. By default, all numeric columns (besides the initial column, if numeric) are adorned, but this allows you to manually specify which columns should be adorned, for use on a data.frame that does not result from a call to \code{tabyl}.
+#' @param ... columns to adorn. This takes a tidyselect specification. By default, all numeric columns (besides the initial column, if numeric) are adorned, but this allows you to manually specify which columns should be adorned, for use on a data.frame that does not result from a call to `tabyl`.
 #'
 #' @return Returns a data.frame of percentages, expressed as numeric values between 0 and 1.
 #' @export
diff --git a/R/adorn_rounding.R b/R/adorn_rounding.R
index 1e2d788f..3e3909bb 100644
--- a/R/adorn_rounding.R
+++ b/R/adorn_rounding.R
@@ -1,14 +1,14 @@
-#' @title Round the numeric columns in a data.frame.
+#' Round the numeric columns in a data.frame.
 #'
 #' @description
-#' Can run on any data.frame with at least one numeric column. This function defaults to excluding the first column of the input data.frame, assuming that it contains a descriptive variable, but this can be overridden by specifying the columns to round in the \code{...} argument.
+#' Can run on any data.frame with at least one numeric column. This function defaults to excluding the first column of the input data.frame, assuming that it contains a descriptive variable, but this can be overridden by specifying the columns to round in the `...` argument.
 #'
-#' If you're formatting percentages, e.g., the result of \code{adorn_percentages()}, use \code{adorn_pct_formatting()} instead. This is a more flexible variant for ad-hoc usage. Compared to \code{adorn_pct_formatting()}, it does not multiply by 100 or pad the numbers with spaces for alignment in the results data.frame. This function retains the class of numeric input columns.
+#' If you're formatting percentages, e.g., the result of `adorn_percentages()`, use `adorn_pct_formatting()` instead. This is a more flexible variant for ad-hoc usage. Compared to `adorn_pct_formatting()`, it does not multiply by 100 or pad the numbers with spaces for alignment in the results data.frame. This function retains the class of numeric input columns.
 #'
-#' @param dat a \code{tabyl} or other data.frame with similar layout. If given a list of data.frames, this function will apply itself to each data.frame in the list (designed for 3-way \code{tabyl} lists).
+#' @param dat a `tabyl` or other data.frame with similar layout. If given a list of data.frames, this function will apply itself to each data.frame in the list (designed for 3-way `tabyl` lists).
 #' @param digits how many digits should be displayed after the decimal point?
 #' @param rounding method to use for rounding - either "half to even", the base R default method, or "half up", where 14.5 rounds up to 15.
-#' @param ... columns to adorn. This takes a tidyselect specification. By default, all numeric columns (besides the initial column, if numeric) are adorned, but this allows you to manually specify which columns should be adorned, for use on a data.frame that does not result from a call to \code{tabyl}.
+#' @param ... columns to adorn. This takes a tidyselect specification. By default, all numeric columns (besides the initial column, if numeric) are adorned, but this allows you to manually specify which columns should be adorned, for use on a data.frame that does not result from a call to `tabyl`.
 #'
 #' @return Returns the data.frame with rounded numeric columns.
 #' @export
@@ -39,7 +39,7 @@
 #'
 #' cases %>%
 #'   adorn_percentages(, , ends_with("ed")) %>%
-#'   adorn_rounding(, , one_of(c("recovered", "died")))
+#'   adorn_rounding(, , all_of(c("recovered", "died")))
 adorn_rounding <- function(dat, digits = 1, rounding = "half to even", ...) {
   # if input is a list, call purrr::map to recursively apply this function to each data.frame
   if (is.list(dat) && !is.data.frame(dat)) {
diff --git a/R/adorn_title.R b/R/adorn_title.R
index 3294cb29..15783d44 100644
--- a/R/adorn_title.R
+++ b/R/adorn_title.R
@@ -1,13 +1,13 @@
 #' @title Add column name to the top of a two-way tabyl.
 #'
 #' @description
-#' This function adds the column variable name to the top of a \code{tabyl} for a complete display of information. This makes the tabyl prettier, but renders the data.frame less useful for further manipulation.
+#' This function adds the column variable name to the top of a `tabyl` for a complete display of information. This makes the tabyl prettier, but renders the data.frame less useful for further manipulation.
 #'
-#' @param dat a data.frame of class \code{tabyl} or other data.frame with a tabyl-like layout. If given a list of data.frames, this function will apply itself to each data.frame in the list (designed for 3-way \code{tabyl} lists).
-#' @param placement whether the column name should be added to the top of the tabyl in an otherwise-empty row \code{"top"} or appended to the already-present row name variable (\code{"combined"}). The formatting in the \code{"top"} option has the look of base R's \code{table()}; it also wipes out the other column names, making it hard to further use the data.frame besides formatting it for reporting. The \code{"combined"} option is more conservative in this regard.
-#' @param row_name (optional) default behavior is to pull the row name from the attributes of the input \code{tabyl} object. If you wish to override that text, or if your input is not a \code{tabyl}, supply a string here.
-#' @param col_name (optional) default behavior is to pull the column_name from the attributes of the input \code{tabyl} object. If you wish to override that text, or if your input is not a \code{tabyl}, supply a string here.
-#' @return the input tabyl, augmented with the column title. Non-tabyl inputs that are of class \code{tbl_df} are downgraded to basic data.frames so that the title row prints correctly.
+#' @param dat a data.frame of class `tabyl` or other data.frame with a tabyl-like layout. If given a list of data.frames, this function will apply itself to each data.frame in the list (designed for 3-way `tabyl` lists).
+#' @param placement whether the column name should be added to the top of the tabyl in an otherwise-empty row `"top"` or appended to the already-present row name variable (`"combined"`). The formatting in the `"top"` option has the look of base R's `table()`; it also wipes out the other column names, making it hard to further use the data.frame besides formatting it for reporting. The `"combined"` option is more conservative in this regard.
+#' @param row_name (optional) default behavior is to pull the row name from the attributes of the input `tabyl` object. If you wish to override that text, or if your input is not a `tabyl`, supply a string here.
+#' @param col_name (optional) default behavior is to pull the column_name from the attributes of the input `tabyl` object. If you wish to override that text, or if your input is not a `tabyl`, supply a string here.
+#' @return the input tabyl, augmented with the column title. Non-tabyl inputs that are of class `tbl_df` are downgraded to basic data.frames so that the title row prints correctly.
 #'
 #' @export
 #' @examples
diff --git a/R/adorn_totals.R b/R/adorn_totals.R
index 11cc7b89..f3aa4e43 100644
--- a/R/adorn_totals.R
+++ b/R/adorn_totals.R
@@ -1,15 +1,15 @@
 #' @title Append a totals row and/or column to a data.frame.
 #'
 #' @description
-#' This function defaults to excluding the first column of the input data.frame, assuming that it contains a descriptive variable, but this can be overridden by specifying the columns to be totaled in the \code{...} argument. Non-numeric columns are converted to character class and have a user-specified fill character inserted in the totals row.
+#' This function defaults to excluding the first column of the input data.frame, assuming that it contains a descriptive variable, but this can be overridden by specifying the columns to be totaled in the `...` argument. Non-numeric columns are converted to character class and have a user-specified fill character inserted in the totals row.
 #'
-#' @param dat an input data.frame with at least one numeric column. If given a list of data.frames, this function will apply itself to each data.frame in the list (designed for 3-way \code{tabyl} lists).
-#' @param where one of "row", "col", or \code{c("row", "col")}
+#' @param dat an input data.frame with at least one numeric column. If given a list of data.frames, this function will apply itself to each data.frame in the list (designed for 3-way `tabyl` lists).
+#' @param where one of "row", "col", or `c("row", "col")`
 #' @param fill if there are non-numeric columns, what should fill the bottom row of those columns? If a string, relevant columns will be coerced to character. If `NA` then column types are preserved.
 #' @param na.rm should missing values (including NaN) be omitted from the calculations?
-#' @param name name of the totals row and/or column. If both are created, and \code{name} is a single string, that name is applied to both. If both are created and \code{name} is a vector of length 2, the first element of the vector will be used as the row name (in column 1), and the second element will be used as the totals column name. Defaults to "Total".
-#' @param ... columns to total. This takes a tidyselect specification. By default, all numeric columns (besides the initial column, if numeric) are included in the totals, but this allows you to manually specify which columns should be included, for use on a data.frame that does not result from a call to \code{tabyl}.
-#' @return Returns a data.frame augmented with a totals row, column, or both. The data.frame is now also of class \code{tabyl} and stores information about the attached totals and underlying data in the tabyl attributes.
+#' @param name name of the totals row and/or column. If both are created, and `name` is a single string, that name is applied to both. If both are created and `name` is a vector of length 2, the first element of the vector will be used as the row name (in column 1), and the second element will be used as the totals column name. Defaults to "Total".
+#' @param ... columns to total. This takes a tidyselect specification. By default, all numeric columns (besides the initial column, if numeric) are included in the totals, but this allows you to manually specify which columns should be included, for use on a data.frame that does not result from a call to `tabyl`.
+#' @return a data.frame augmented with a totals row, column, or both. The data.frame is now also of class `tabyl` and stores information about the attached totals and underlying data in the tabyl attributes.
 #' @export
 #' @examples
 #' mtcars %>%
diff --git a/R/as_and_untabyl.R b/R/as_and_untabyl.R
index 5d64c03c..5a14b525 100644
--- a/R/as_and_untabyl.R
+++ b/R/as_and_untabyl.R
@@ -1,24 +1,39 @@
-#' @title Add \code{tabyl} attributes to a data.frame.
+#' Add `tabyl` attributes to a data.frame.
 #'
 #' @description
-#' A \code{tabyl} is a data.frame containing counts of a variable or co-occurrences of two variables (a.k.a., a contingency table or crosstab). This specialized kind of data.frame has attributes that enable \code{adorn_} functions to be called for precise formatting and presentation of results. E.g., display results as a mix of percentages, Ns, add totals rows or columns, rounding options, in the style of Microsoft Excel PivotTable.
+#' A `tabyl` is a data.frame containing counts of a variable or
+#' co-occurrences of two variables (a.k.a., a contingency table or crosstab).
+#' This specialized kind of data.frame has attributes that enable `adorn_`
+#' functions to be called for precise formatting and presentation of results.
+#' E.g., display results as a mix of percentages, Ns, add totals rows or
+#' columns, rounding options, in the style of Microsoft Excel PivotTable.
 #'
-#' A \code{tabyl} can be the result of a call to \code{janitor::tabyl()}, in which case these attributes are added automatically. This function adds \code{tabyl} class attributes to a data.frame that isn't the result of a call to \code{tabyl} but meets the requirements of a two-way tabyl:
-#' 1) First column contains values of variable 1
-#' 2) Column names 2:n are the values of variable 2
-#' 3) Numeric values in columns 2:n are counts of the co-occurrences of the two variables.*
+#' A `tabyl` can be the result of a call to `janitor::tabyl()`, in which case
+#' these attributes are added automatically. This function adds `tabyl` class
+#' attributes to a data.frame that isn't the result of a call to `tabyl` but
+#' meets the requirements of a two-way tabyl: 1) First column contains values of
+#' variable 1 2) Column names 2:n are the values of variable 2 3) Numeric values
+#' in columns 2:n are counts of the co-occurrences of the two variables.*
 #'
-#' * = this is the ideal form of a tabyl, but janitor's \code{adorn_} functions tolerate and ignore non-numeric columns in positions 2:n.
+#' * = this is the ideal form of a tabyl, but janitor's `adorn_` functions tolerate and ignore non-numeric columns in positions 2:n.
 #'
-#' For instance, the result of \code{dplyr::count()} followed by \code{tidyr::spread()} can be treated as a \code{tabyl}.
+#' For instance, the result of [dplyr::count()] followed by [tidyr::spread()]
+#' can be treated as a `tabyl`.
 #'
-#' The result of calling \code{tabyl()} on a single variable is a special class of one-way tabyl; this function only pertains to the two-way tabyl.
+#' The result of calling `tabyl()` on a single variable is a special class of
+#' one-way tabyl; this function only pertains to the two-way tabyl.
 #'
-#' @param dat a data.frame with variable values in the first column and numeric values in all other columns.
-#' @param axes is this a two_way tabyl or a one_way tabyl? If this function is being called by a user, this should probably be "2". One-way tabyls are created by \code{tabyl} but are a special case.
-#' @param row_var_name (optional) the name of the variable in the row dimension; used by \code{adorn_title()}.
-#' @param col_var_name (optional) the name of the variable in the column dimension; used by \code{adorn_title()}.
-#' @return Returns the same data.frame, but with the additional class of "tabyl" and the attribute "core".
+#' @param dat a data.frame with variable values in the first column and numeric
+#'   values in all other columns.
+#' @param axes is this a two_way tabyl or a one_way tabyl? If this function is
+#'   being called by a user, this should probably be "2". One-way tabyls are
+#'   created by `tabyl` but are a special case.
+#' @param row_var_name (optional) the name of the variable in the row dimension;
+#'   used by `adorn_title()`.
+#' @param col_var_name (optional) the name of the variable in the column
+#'   dimension; used by `adorn_title()`.
+#' @return Returns the same data.frame, but with the additional class of "tabyl"
+#'   and the attribute "core".
 #' @export
 #' @examples
 #' as_tabyl(mtcars)
@@ -68,13 +83,12 @@ as_tabyl <- function(dat, axes = 2, row_var_name = NULL, col_var_name = NULL) {
   dat
 }
 
-#' @title Remove \code{tabyl} attributes from a data.frame.
+#' Remove `tabyl` attributes from a data.frame.
 #'
-#' @description
-#' Strips away all \code{tabyl}-related attributes from a data.frame.
+#' Strips away all `tabyl`-related attributes from a data.frame.
 #'
-#' @param dat a data.frame of class \code{tabyl}.
-#' @return Returns the same data.frame, but without the \code{tabyl} class and attributes.
+#' @param dat a `data.frame` of class `tabyl`.
+#' @return the same `data.frame`, but without the `tabyl` class and attributes.
 #' @export
 #' @examples
 #'
diff --git a/R/clean_names.R b/R/clean_names.R
index fc02fcc2..b8fe3b5f 100644
--- a/R/clean_names.R
+++ b/R/clean_names.R
@@ -1,38 +1,38 @@
 #' @title Cleans names of an object (usually a data.frame).
 #'
 #' @description
-#' Resulting names are unique and consist only of the \code{_} character, numbers, and letters.
-#' Capitalization preferences can be specified using the \code{case} parameter.
+#' Resulting names are unique and consist only of the `_` character, numbers, and letters.
+#' Capitalization preferences can be specified using the `case` parameter.
 #'
 #' Accented characters are transliterated to ASCII. For example, an "o" with a
 #' German umlaut over it becomes "o", and the Spanish character "enye" becomes
 #' "n".
 #'
 #' This function takes and returns a data.frame, for ease of piping with
-#' \code{`\%>\%`}. For the underlying function that works on a character vector
-#' of names, see \code{\link[janitor]{make_clean_names}}. \code{clean_names}
-#' relies on the versatile function \code{\link[snakecase]{to_any_case}}, which
+#' `%>%`. For the underlying function that works on a character vector
+#' of names, see [janitor::make_clean_names()]. `clean_names`
+#' relies on the versatile function [snakecase::to_any_case()], which
 #' accepts many arguments. See that function's documentation for ideas on getting
-#' the most out of \code{clean_names}. A few examples are included below.
+#' the most out of `clean_names`. A few examples are included below.
 #'
 #' A common issue is that the micro/mu symbol is replaced by "m" instead of "u".
 #' The replacement with "m" is more correct when doing Greek-to-ASCII
 #' transliteration but less correct when doing scientific data-to-ASCII
 #' transliteration. A warning will be generated if the "m" replacement occurs.
-#' To replace with "u", please add the argument \code{replace=janitor:::mu_to_u}
+#' To replace with "u", please add the argument `replace=janitor:::mu_to_u`
 #' which is a character vector mapping all known mu or micro Unicode code points
 #' (characters) to "u".
 #'
-#' @param dat the input data.frame.
+#' @param dat The input `data.frame`.
 #' @inheritDotParams make_clean_names -string
-#' @return Returns the data.frame with clean names.
+#' @return A `data.frame` with clean names.
 #'
-#' @details \code{clean_names()} is intended to be used on \code{data.frames}
-#'   and \code{data.frame}-like objects. For this reason there are methods to
-#'   support using \code{clean_names()} on \code{sf} and \code{tbl_graph} (from
-#'   \code{tidygraph}) objects as well as on database connections through
-#'   \code{dbplyr}. For cleaning other named objects like named lists
-#'   and vectors, use \code{make_clean_names()}.
+#' @details `clean_names()` is intended to be used on `data.frames`
+#'   and `data.frame`-like objects. For this reason there are methods to
+#'   support using `clean_names()` on `sf` and `tbl_graph` (from
+#'   `tidygraph`) objects as well as on database connections through
+#'   `dbplyr`. For cleaning other named objects like named lists
+#'   and vectors, use `make_clean_names()`.
 #'
 #' @export
 #' @family Set names
@@ -142,9 +142,9 @@ clean_names.tbl_lazy <- function(dat, ...) {
 #' This is a character vector with names of all known Unicode code points that
 #' look like the Greek mu or the micro symbol and values of "u". This is
 #' intended to simplify mapping from mu or micro in Unicode to the character "u"
-#' with \code{clean_names()} and \code{make_clean_names()}.
+#' with `clean_names()` and `make_clean_names()`.
 #'
-#' See the help in \code{clean_names()} for how to use this.
+#' See the help in `clean_names()` for how to use this.
 #'
 #' @family Set names
 mu_to_u <-
diff --git a/R/compare_df_cols.R b/R/compare_df_cols.R
index 3c24a47b..6456a62a 100644
--- a/R/compare_df_cols.R
+++ b/R/compare_df_cols.R
@@ -4,11 +4,11 @@
 #' @details Due to the returned "column_name" column, no input data.frame may be
 #'   named "column_name".
 #'
-#'   The \code{strict_description} argument is most typically used to understand
+#'   The `strict_description` argument is most typically used to understand
 #'   if factor levels match or are bindable. Factors are typically bindable,
 #'   but the behavior of what happens when they bind differs based on the
 #'   binding method ("bind_rows" or "rbind"). Even when
-#'   \code{strict_description} is \code{FALSE}, data.frames may still bind
+#'   `strict_description` is `FALSE`, data.frames may still bind
 #'   because some classes (like factors and characters) can bind even if they
 #'   appear to differ.
 #'
@@ -20,19 +20,19 @@
 #'   "match"ing columns, or only "mismatch"ing columns?
 #' @param bind_method What method of binding should be used to determine
 #'   matches? With "bind_rows", columns missing from a data.frame would be
-#'   considered a match (as in \code{dplyr::bind_rows()}; with "rbind", columns
+#'   considered a match (as in `dplyr::bind_rows()`; with "rbind", columns
 #'   missing from a data.frame would be considered a mismatch (as in
-#'   \code{base::rbind()}.
-#' @param strict_description Passed to \code{describe_class}. Also, see the
+#'   `base::rbind()`.
+#' @param strict_description Passed to `describe_class`. Also, see the
 #'   Details section.
 #' @return A data.frame with a column named "column_name" with a value named
 #'   after the input data.frames' column names, and then one column per
 #'   data.frame (named after the input data.frame). If more than one input has
 #'   the same column name, the column naming will have suffixes defined by
-#'   sequential use of \code{base::merge()} and may differ from expected naming.
+#' sequential use of `base::merge()` and may differ from expected naming. #' The rows within the data.frame-named columns are descriptions of the #' classes of the data within the columns (generated by -#' \code{describe_class}). +#' `describe_class`). #' @examples #' compare_df_cols(data.frame(A = 1), data.frame(B = 2)) #' # user-defined names @@ -162,11 +162,11 @@ compare_df_cols <- function(..., return = c("all", "match", "mismatch"), bind_me #' compare_df_cols #' @param x The data.frame or list of data.frames #' @param class_colname The name for the column-name-defining column -#' @param strict_description Passed to \code{describe_class} +#' @param strict_description Passed to `describe_class` #' @return A 2-column data.frame with the first column naming all the columns of -#' \code{x} and the second column (named after the value in -#' \code{class_colname}) defining the classes using -#' \code{describe_class()}. +#' `x` and the second column (named after the value in +#' `class_colname`) defining the classes using +#' `describe_class()`. #' @noRd compare_df_cols_df_maker <- function(x, class_colname = "class", strict_description) { UseMethod("compare_df_cols_df_maker") @@ -216,10 +216,10 @@ compare_df_cols_df_maker.list <- function(x, class_colname = "class", strict_des #' Do the the data.frames have the same columns & types? #' #' @description Check whether a set of data.frames are row-bindable. Calls -#' \code{compare_df_cols()}and returns TRUE if there are no mis-matching rows. ` +#' `compare_df_cols()`and returns TRUE if there are no mis-matching rows. ` #' @inheritParams compare_df_cols #' @param verbose Print the mismatching columns if binding will fail. -#' @return \code{TRUE} if row binding will succeed or \code{FALSE} if it will +#' @return `TRUE` if row binding will succeed or `FALSE` if it will #' fail. 
#' @family Data frame type comparison #' @examples @@ -240,14 +240,14 @@ compare_df_cols_same <- function(..., bind_method = c("bind_rows", "rbind"), ver #' Describe the class(es) of an object #' #' @details For package developers, an S3 generic method can be written for -#' \code{describe_class()} for custom classes that may need more definition -#' than the default method. This function is called by \code{compare_df_cols}. +#' `describe_class()` for custom classes that may need more definition +#' than the default method. This function is called by `compare_df_cols`. #' #' @param x The object to describe #' @param strict_description Should differing factor levels be treated #' as differences for the purposes of identifying mismatches? -#' \code{strict_description = `TRUE`} is stricter and factors with different -#' levels will be treated as different classes. \code{FALSE} is more +#' `strict_description = TRUE` is stricter and factors with different +#' levels will be treated as different classes. `FALSE` is more #' lenient: for class comparison purposes, the variable is just a "factor". #' @return A character scalar describing the class(es) of an object where if the #' scalar will match, columns in a data.frame (or similar object) should bind diff --git a/R/excel_dates.R b/R/excel_dates.R index 846061bb..a570d5e6 100644 --- a/R/excel_dates.R +++ b/R/excel_dates.R @@ -1,7 +1,7 @@ -#' @title Convert dates encoded as serial numbers to Date class. +#' Convert dates encoded as serial numbers to Date class. #' -#' @description Converts numbers like \code{42370} into date values like -#' \code{2016-01-01}. +#' @description +#' Converts numbers like `42370` into date values like `2016-01-01`. #' #' Defaults to the modern Excel date encoding system. However, Excel for Mac #' 2008 and earlier Mac versions of Excel used a different date system. To @@ -10,25 +10,24 @@ #' it's the old Mac system. 
More on date encoding systems at #' http://support.office.com/en-us/article/Date-calculations-in-Excel-e7fe7167-48a9-4b96-bb53-5612a800b487. #' -#' A list of all timezones is available from \code{base::OlsonNames()}, and the -#' current timezone is available from \code{base::Sys.timezone()}. +#' A list of all timezones is available from `base::OlsonNames()`, and the +#' current timezone is available from `base::Sys.timezone()`. #' #' If your input data has a mix of Excel numeric dates and actual dates, see the -#' more powerful functions \code{convert_to_date()} and \code{convert_to_datetime()}. +#' more powerful functions [convert_to_date()] and `convert_to_datetime()`. #' #' @param date_num numeric vector of serial numbers to convert. -#' @param date_system the date system, either \code{"modern"} or \code{"mac -#' pre-2011"}. +#' @param date_system the date system, either `"modern"` or `"mac pre-2011"`. #' @param include_time Include the time (hours, minutes, seconds) in the output? #' (See details) #' @param round_seconds Round the seconds to an integer (only has an effect when -#' \code{include_time} is \code{TRUE})? -#' @param tz Time zone, used when \code{include_time = TRUE} (see details for +#' `include_time` is `TRUE`)? +#' @param tz Time zone, used when `include_time = TRUE` (see details for #' more information on timezones). -#' @return Returns a vector of class Date if \code{include_time} is -#' \code{FALSE}. Returns a vector of class POSIXlt if \code{include_time} is -#' \code{TRUE}. -#' @details When using \code{include_time=TRUE}, days with leap seconds will not +#' @return Returns a vector of class Date if `include_time` is +#' `FALSE`. Returns a vector of class POSIXlt if `include_time` is +#' `TRUE`. 
+#' @details When using `include_time=TRUE`, days with leap seconds will not #' be accurately handled as they do not appear to be accurately handled by #' Windows (as described in #' https://support.microsoft.com/en-us/help/2722715/support-for-the-leap-second). diff --git a/R/get_dupes.R b/R/get_dupes.R index e97b56e6..b86e409c 100644 --- a/R/get_dupes.R +++ b/R/get_dupes.R @@ -1,11 +1,16 @@ -#' @title Get rows of a \code{data.frame} with identical values for the specified variables. +#' Get rows of a `data.frame` with identical values for the specified variables. #' -#' @description -#' For hunting duplicate records during data cleaning. Specify the data.frame and the variable combination to search for duplicates and get back the duplicated rows. +#' For hunting duplicate records during data cleaning. Specify the data.frame +#' and the variable combination to search for duplicates and get back the +#' duplicated rows. #' -#' @param dat The input data.frame. -#' @param ... Unquoted variable names to search for duplicates. This takes a tidyselect specification. -#' @return Returns a data.frame with the full records where the specified variables have duplicated values, as well as a variable \code{dupe_count} showing the number of rows sharing that combination of duplicated values. If the input data.frame was of class \code{tbl_df}, the output is as well. +#' @param dat The input `data.frame`. +#' @param ... Unquoted variable names to search for duplicates. This takes a +#' tidyselect specification. +#' @return A data.frame with the full records where the specified +#' variables have duplicated values, as well as a variable `dupe_count` +#' showing the number of rows sharing that combination of duplicated values. +#' If the input data.frame was of class `tbl_df`, the output is as well. 
#' @export #' @examples #' get_dupes(mtcars, mpg, hp) diff --git a/R/get_one_to_one.R b/R/get_one_to_one.R index 2f8aea95..6b8d2884 100644 --- a/R/get_one_to_one.R +++ b/R/get_one_to_one.R @@ -1,8 +1,9 @@ #' Find the list of columns that have a 1:1 mapping to each other #' -#' @param dat A data.frame or similar object +#' @param dat A `data.frame` or similar object #' @return A list with one element for each group of columns that map #' identically to each other. +#' @export #' @examples #' foo <- data.frame( #' Lab_Test_Long = c("Cholesterol, LDL", "Cholesterol, LDL", "Glucose"), @@ -12,7 +13,6 @@ #' stringsAsFactors = FALSE #' ) #' get_one_to_one(foo) -#' @export get_one_to_one <- function(dat) { stopifnot(ncol(dat) > 0) stopifnot(!any(duplicated(names(dat)))) diff --git a/R/janitor.R b/R/janitor.R index 38eb4e9a..b636d6ae 100644 --- a/R/janitor.R +++ b/R/janitor.R @@ -3,9 +3,9 @@ #' janitor has simple little tools for examining and cleaning dirty data. #' #' @section Main functions: -#' The main janitor functions can: perfectly format ugly \code{data.frame} column names; isolate +#' The main janitor functions can: perfectly format ugly `data.frame` column names; isolate #' duplicate records for further study; and provide quick one- and two-variable tabulations -#' (i.e., frequency tables and crosstabs) that improve on the base R function \code{table()}. +#' (i.e., frequency tables and crosstabs) that improve on the base R function `table()`. #' #' #' Other functions in the package can format for reporting the results of these tabulations. @@ -13,7 +13,7 @@ #' #' @section Package context: #' This package follows the principles of the "tidyverse" and in particular works well with -#' the \code{\%>\%} pipe function. +#' the `\%>\%` pipe function. 
#' #' #' janitor was built with beginning-to-intermediate R users in mind diff --git a/R/janitor_deprecated.R b/R/janitor_deprecated.R index c2991a6c..bdd78b6b 100644 --- a/R/janitor_deprecated.R +++ b/R/janitor_deprecated.R @@ -2,16 +2,14 @@ #' #' These functions have already become defunct or may be defunct as soon as the next release. #' -#' \itemize{ -#' \item \code{\link{adorn_crosstab}} -#' \item \code{\link{crosstab}} -#' \item \code{\link{use_first_valid_of}} -#' \item \code{\link{convert_to_NA}} -#' \item \code{\link{add_totals_col}} -#' \item \code{\link{add_totals_row}} -#' \item \code{\link{remove_empty_rows}} -#' \item \code{\link{remove_empty_cols}} -#' } +#' * [adorn_crosstab()] +#' * [crosstab()] +#' * [use_first_valid_of()] +#' * [convert_to_NA()] +#' * [add_totals_col()] +#' * [add_totals_row()] +#' * [remove_empty_rows()] +#' * [remove_empty_cols()] #' #' @name janitor_deprecated # EXCLUDE COVERAGE START @@ -24,7 +22,7 @@ NULL #' @param ... arguments #' @keywords internal #' @description -#' This function is deprecated, use \code{tabyl(dat, var1, var2)} instead. +#' This function is deprecated, use `tabyl(dat, var1, var2)` instead. #' @export crosstab <- function(...) { @@ -38,12 +36,12 @@ crosstab <- function(...) { #' @title Add presentation formatting to a crosstabulation table. #' @description -#' This function is deprecated, use the \code{adorn_} family of functions instead. -#' @param dat a data.frame with row names in the first column and numeric values in all other columns. Usually the piped-in result of a call to \code{crosstab} that included the argument \code{percent = "none"}. +#' This function is deprecated, use the `adorn_` family of functions instead. +#' @param dat a data.frame with row names in the first column and numeric values in all other columns. Usually the piped-in result of a call to `crosstab` that included the argument `percent = "none"`. #' @param denom the denominator to use for calculating percentages. 
One of "row", "col", or "all". #' @param show_n should counts be displayed alongside the percentages? #' @param digits how many digits should be displayed after the decimal point? -#' @param show_totals display a totals summary? Will be a row, column, or both depending on the value of \code{denom}. +#' @param show_totals display a totals summary? Will be a row, column, or both depending on the value of `denom`. #' @param rounding method to use for truncating percentages - either "half to even", the base R default method, or "half up", where 14.5 rounds up to 15. #' @return Returns a data.frame. #' @keywords internal @@ -61,15 +59,14 @@ adorn_crosstab <- function(dat, denom = "row", show_n = TRUE, digits = 1, show_t #' @title Append a totals row to a data.frame. #' #' @description -#' This function is deprecated, use \code{adorn_totals} instead. +#' This function is deprecated, use `adorn_totals` instead. #' #' @param dat an input data.frame with at least one numeric column. #' @param fill if there are more than one non-numeric columns, what string should fill the bottom row of those columns? #' @param na.rm should missing values (including NaN) be omitted from the calculations? #' @return Returns a data.frame with a totals row, consisting of "Total" in the first column and column sums in the others. +#' @keywords internal #' @export - - add_totals_row <- function(dat, fill = "-", na.rm = TRUE) { lifecycle::deprecate_stop( when = "2.0.0", @@ -82,10 +79,11 @@ add_totals_row <- function(dat, fill = "-", na.rm = TRUE) { #' @title Append a totals column to a data.frame. #' #' @description -#' This function is deprecated, use \code{adorn_totals} instead. +#' This function is deprecated, use `adorn_totals` instead. #' #' @param dat an input data.frame with at least one numeric column. #' @param na.rm should missing values (including NaN) be omitted from the calculations? +#' @keywords internal #' @return Returns a data.frame with a totals column containing row-wise sums. 
#' @export @@ -102,14 +100,15 @@ add_totals_col <- function(dat, na.rm = TRUE) { #' @title Returns first non-NA value from a set of vectors. #' #' @description -#' At each position of the input vectors, iterates through in order and returns the first non-NA value. This is a robust replacement of the common \code{ifelse(!is.na(x), x, ifelse(!is.na(y), y, z))}. It's more readable and handles problems like \code{ifelse}'s inability to work with dates in this way. +#' At each position of the input vectors, iterates through in order and returns the first non-NA value. This is a robust replacement of the common `ifelse(!is.na(x), x, ifelse(!is.na(y), y, z))`. It's more readable and handles problems like `ifelse`'s inability to work with dates in this way. #' -##' @section Warning: Deprecated, do not use in new code. Use \code{dplyr::coalesce()} instead. +##' @section Warning: Deprecated, do not use in new code. Use `dplyr::coalesce()` instead. #' @param ... the input vectors. Order matters: these are searched and prioritized in the order they are supplied. -#' @param if_all_NA what value should be used when all of the vectors return \code{NA} for a certain index? Default is NA. +#' @param if_all_NA what value should be used when all of the vectors return `NA` for a certain index? Default is NA. #' @return Returns a single vector with the selected values. #' @seealso janitor_deprecated #' @export +#' @keywords internal use_first_valid_of <- function(..., if_all_NA = NA) { lifecycle::deprecate_stop( when = "2.0.0", @@ -118,17 +117,18 @@ use_first_valid_of <- function(..., if_all_NA = NA) { ) } -#' @title Convert string values to true \code{NA} values. +#' @title Convert string values to true `NA` values. #' #' @description -#' Converts instances of user-specified strings into \code{NA}. Can operate on either a single vector or an entire data.frame. +#' Converts instances of user-specified strings into `NA`. Can operate on either a single vector or an entire data.frame. 
#' -#' @section Warning: Deprecated, do not use in new code. Use \code{dplyr::na_if()} instead. +#' @section Warning: Deprecated, do not use in new code. Use `dplyr::na_if()` instead. #' @param dat vector or data.frame to operate on. #' @param strings character vector of strings to convert. -#' @return Returns a cleaned object. Can be a vector, data.frame, or \code{tibble::tbl_df} depending on the provided input. +#' @return Returns a cleaned object. Can be a vector, data.frame, or `tibble::tbl_df` depending on the provided input. #' @seealso janitor_deprecated #' @export +#' @keywords internal #' convert_to_NA <- function(dat, strings) { lifecycle::deprecate_stop( @@ -144,7 +144,7 @@ convert_to_NA <- function(dat, strings) { #' @title Removes empty rows from a data.frame. #' #' @description -#' This function is deprecated, use \code{remove_empty("rows")} instead. +#' This function is deprecated, use `remove_empty("rows")` instead. #' #' @param dat the input data.frame. #' @return Returns the data.frame with no empty rows. @@ -152,6 +152,7 @@ convert_to_NA <- function(dat, strings) { #' # not run: #' # dat %>% remove_empty_rows #' @export +#' @keywords internal remove_empty_rows <- function(dat) { lifecycle::deprecate_stop( @@ -164,7 +165,7 @@ remove_empty_rows <- function(dat) { #' @title Removes empty columns from a data.frame. #' #' @description -#' This function is deprecated, use \code{remove_empty("cols")} instead. +#' This function is deprecated, use `remove_empty("cols")` instead. #' #' @param dat the input data.frame. #' @return Returns the data.frame with no empty columns. 
@@ -172,6 +173,7 @@ remove_empty_rows <- function(dat) { #' # not run: #' # dat %>% remove_empty_cols #' @export +#' @keywords internal remove_empty_cols <- function(dat) { lifecycle::deprecate_stop( diff --git a/R/make_clean_names.R b/R/make_clean_names.R index 8303a4d0..284f0dcf 100644 --- a/R/make_clean_names.R +++ b/R/make_clean_names.R @@ -1,62 +1,62 @@ -#' @title Cleans a vector of text, typically containing the names of an object. +#' Cleans a vector of text, typically containing the names of an object. #' -#' @description Resulting strings are unique and consist only of the \code{_} +#' @description Resulting strings are unique and consist only of the `_` #' character, numbers, and letters. By default, the resulting strings will only #' consist of ASCII characters, but non-ASCII (e.g. Unicode) may be allowed by -#' setting \code{ascii=FALSE}. Capitalization preferences can be specified -#' using the \code{case} parameter. +#' setting `ascii = FALSE`. Capitalization preferences can be specified +#' using the `case` parameter. #' -#' For use on the names of a data.frame, e.g., in a \code{`\%>\%`} pipeline, -#' call the convenience function \code{\link[janitor]{clean_names}}. +#' For use on the names of a data.frame, e.g., in a `%>%` pipeline, +#' call the convenience function [janitor::clean_names()]. #' -#' When \code{ascii=TRUE} (the default), accented characters are transliterated +#' When `ascii = TRUE` (the default), accented characters are transliterated #' to ASCII. For example, an "o" with a German umlaut over it becomes "o", and #' the Spanish character "enye" becomes "n". 
#' #' The order of operations is: make replacements, (optional) ASCII conversion, -#' remove initial spaces and punctuation, apply \code{base::make.names()}, -#' apply \code{snakecase::to_any_case}, and add numeric suffixes +#' remove initial spaces and punctuation, apply `base::make.names()`, +#' apply `snakecase::to_any_case`, and add numeric suffixes #' to resolve any duplicated names. #' -#' This function relies on \code{snakecase::to_any_case} and can take advantage of +#' This function relies on `snakecase::to_any_case` and can take advantage of #' its versatility. For instance, an abbreviation like "ID" can have its -#' capitalization preserved by passing the argument \code{abbreviations = "ID"}. -#' See the documentation for \code{\link[snakecase:to_any_case]{snakecase::to_any_case}} +#' capitalization preserved by passing the argument `abbreviations = "ID"`. +#' See the documentation for [snakecase::to_any_case()] #' for more about how to use its features. #' #' On some systems, not all transliterators to ASCII are available. If this is #' the case on your system, all available transliterators will be used, and a #' warning will be issued once per session indicating that results may be #' different when run on a different system. That warning can be disabled with -#' \code{options(janitor_warn_transliterators=FALSE)}. +#' `options(janitor_warn_transliterators=FALSE)`. #' -#' If the objective of your call to \code{make_clean_names()} is only to translate to +#' If the objective of your call to `make_clean_names()` is only to translate to #' ASCII, try the following instead: -#' \code{stringi::stri_trans_general(x, id="Any-Latin;Greek-Latin;Latin-ASCII")}. +#' `stringi::stri_trans_general(x, id="Any-Latin;Greek-Latin;Latin-ASCII")`. #' #' @param string A character vector of names to clean. 
-#' @param case The desired target case (default is \code{"snake"}) will be -#' passed to \code{snakecase::to_any_case()} with the exception of "old_janitor", +#' @param case The desired target case (default is `"snake"`) will be +#' passed to `snakecase::to_any_case()` with the exception of "old_janitor", #' which exists only to support legacy code (it preserves the behavior of -#' \code{clean_names()} prior to addition of the "case" argument (janitor +#' `clean_names()` prior to addition of the "case" argument (janitor #' versions <= 0.3.1). "old_janitor" is not intended for new code. See -#' \code{\link[snakecase]{to_any_case}} for a wide variety of supported cases, +#' [snakecase::to_any_case()] for a wide variety of supported cases, #' including "sentence" and "title" case. #' @param replace A named character vector where the name is replaced by the #' value. -#' @param ascii Convert the names to ASCII (\code{TRUE}, default) or not -#' (\code{FALSE}). -#' @param use_make_names Should \code{make.names()} be applied to ensure that the -#' output is usable as a name without quoting? (Avoiding \code{make.names()} +#' @param ascii Convert the names to ASCII (`TRUE`, default) or not +#' (`FALSE`). +#' @param use_make_names Should `make.names()` be applied to ensure that the +#' output is usable as a name without quoting? (Avoiding `make.names()` #' ensures that the output is locale-independent but quoting may be required.) -#' @param allow_dupes Allow duplicates in the returned names (\code{TRUE}) or not -#' (\code{FALSE}, the default). +#' @param allow_dupes Allow duplicates in the returned names (`TRUE`) or not +#' (`FALSE`, the default). #' @inheritParams snakecase::to_any_case #' @inheritDotParams snakecase::to_any_case #' #' @return Returns the "cleaned" character vector. 
#' @export -#' @seealso \code{\link[snakecase]{to_any_case}()} +#' @seealso [snakecase::to_any_case()] #' @examples #' #' # cleaning the names of a vector: diff --git a/R/paste_skip_na.R b/R/paste_skip_na.R index 27784c3d..b2a2ede9 100644 --- a/R/paste_skip_na.R +++ b/R/paste_skip_na.R @@ -1,9 +1,9 @@ -#' Like \code{paste()}, but missing values are omitted +#' Like `paste()`, but missing values are omitted #' #' @details If all values are missing, the value from the first argument is #' preserved. #' -#' @param ...,sep,collapse See \code{?paste} +#' @param ...,sep,collapse See [base::paste()] #' @return A character vector of pasted values. #' @examples #' paste_skip_na(NA) # NA_character_ diff --git a/R/remove_empties.R b/R/remove_empties.R index 622be13d..d6b51d94 100644 --- a/R/remove_empties.R +++ b/R/remove_empties.R @@ -1,19 +1,19 @@ -#' @title Remove empty rows and/or columns from a data.frame or matrix. +#' Remove empty rows and/or columns from a data.frame or matrix. #' -#' @description Removes all rows and/or columns from a data.frame or matrix that -#' are composed entirely of \code{NA} values. +#' Removes all rows and/or columns from a data.frame or matrix that +#' are composed entirely of `NA` values. #' #' @param dat the input data.frame or matrix. -#' @param which one of "rows", "cols", or \code{c("rows", "cols")}. Where no +#' @param which one of "rows", "cols", or `c("rows", "cols")`. Where no #' value of which is provided, defaults to removing both empty rows and empty #' columns, declaring the behavior with a printed message. #' @param cutoff What fraction (>0 to <=1) of rows or columns must be empty to #' be removed? -#' @param quiet Should messages be suppressed (\code{TRUE}) or printed -#' (\code{FALSE}) indicating the summary of empty columns or rows removed? +#' @param quiet Should messages be suppressed (`TRUE`) or printed +#' (`FALSE`) indicating the summary of empty columns or rows removed? 
#' @return Returns the object without its missing rows or columns. #' @family remove functions -#' @seealso \code{\link[=remove_constant]{remove_constant()}} for removing +#' @seealso [remove_constant()] for removing #' constant columns. #' @examples #' # not run: @@ -85,11 +85,11 @@ remove_empty <- function(dat, which = c("rows", "cols"), cutoff = 1, quiet = TRU #' @title Remove constant columns from a data.frame or matrix. #' @param dat the input data.frame or matrix. -#' @param na.rm should \code{NA} values be removed when considering whether a -#' column is constant? The default value of \code{FALSE} will result in a -#' column not being removed if it's a mix of a single value and \code{NA}. -#' @param quiet Should messages be suppressed (\code{TRUE}) or printed -#' (\code{FALSE}) indicating the summary of empty columns or rows removed? +#' @param na.rm should `NA` values be removed when considering whether a +#' column is constant? The default value of `FALSE` will result in a +#' column not being removed if it's a mix of a single value and `NA`. +#' @param quiet Should messages be suppressed (`TRUE`) or printed +#' (`FALSE`) indicating the summary of empty columns or rows removed? #' #' @examples #' remove_constant(data.frame(A = 1, B = 1:3)) @@ -100,7 +100,7 @@ remove_empty <- function(dat, which = c("rows", "cols"), cutoff = 1, quiet = TRU #' unique() #' @importFrom stats na.omit #' @family remove functions -#' @seealso \code{\link[=remove_empty]{remove_empty()}} for removing empty +#' @seealso [remove_empty()] for removing empty #' columns or rows. #' @export remove_constant <- function(dat, na.rm = FALSE, quiet = TRUE) { @@ -133,8 +133,8 @@ remove_constant <- function(dat, na.rm = FALSE, quiet = TRUE) { #' Generate the message describing columns or rows that are being removed. #' #' @inheritParams remove_empty -#' @param mask_keep A logical vector of rows or columns to keep (\code{TRUE}) or -#' remove (\code{FALSE}). 
+#' @param mask_keep A logical vector of rows or columns to keep (`TRUE`) or +#' remove (`FALSE`). #' @param reason The reason that rows are being removed (to be used in the #' message. #' @noRd diff --git a/R/round_half_up.R b/R/round_half_up.R index 57a98118..c4f63006 100644 --- a/R/round_half_up.R +++ b/R/round_half_up.R @@ -1,12 +1,19 @@ -#' @title Round a numeric vector; halves will be rounded up, ala Microsoft Excel. +#' Round a numeric vector; halves will be rounded up, ala Microsoft Excel. #' #' @description -#' In base R \code{round()}, halves are rounded to even, e.g., 12.5 and 11.5 are both rounded to 12. This function rounds 12.5 to 13 (assuming \code{digits = 0}). Negative halves are rounded away from zero, e.g., -0.5 is rounded to -1. +#' In base R `round()`, halves are rounded to even, e.g., 12.5 and +#' 11.5 are both rounded to 12. This function rounds 12.5 to 13 (assuming +#' `digits = 0`). Negative halves are rounded away from zero, e.g., -0.5 is +#' rounded to -1. #' -#' This may skew subsequent statistical analysis of the data, but may be desirable in certain contexts. This function is implemented exactly from \url{https://stackoverflow.com/a/12688836}; see that question and comments for discussion of this issue. +#' This may skew subsequent statistical analysis of the data, but may be +#' desirable in certain contexts. This function is implemented exactly from +#' <https://stackoverflow.com/a/12688836>; see that question and comments for +#' discussion of this issue. #' #' @param x a numeric vector to round. #' @param digits how many digits should be displayed after the decimal point? +#' @returns A vector with the same length as `x` #' @export #' @examples #' round_half_up(12.5) @@ -23,17 +30,17 @@ round_half_up <- function(x, digits = 0) { z * posneg } -#' @title Round a numeric vector to the specified number of significant digits; halves will be rounded up. +#' Round a numeric vector to the specified number of significant digits; halves will be rounded up.
#' #' @description -#' In base R \code{signif()}, halves are rounded to even, e.g., -#' \code{signif(11.5, 2)} and \code{signif(12.5, 2)} are both rounded to 12. -#' This function rounds 12.5 to 13 (assuming \code{digits = 2}). Negative halves -#' are rounded away from zero, e.g., \code{signif(-2.5, 1)} is rounded to -3. +#' In base R `signif()`, halves are rounded to even, e.g., +#' `signif(11.5, 2)` and `signif(12.5, 2)` are both rounded to 12. +#' This function rounds 12.5 to 13 (assuming `digits = 2`). Negative halves +#' are rounded away from zero, e.g., `signif(-2.5, 1)` is rounded to -3. #' #' This may skew subsequent statistical analysis of the data, but may be #' desirable in certain contexts. This function is implemented from -#' \url{https://stackoverflow.com/a/1581007/}; see that question and +#' <https://stackoverflow.com/a/1581007/>; see that question and #' comments for discussion of this issue. #' #' @param x a numeric vector to round. diff --git a/R/round_to_fraction.R b/R/round_to_fraction.R index 1b2734b5..b409d43f 100644 --- a/R/round_to_fraction.R +++ b/R/round_to_fraction.R @@ -1,32 +1,34 @@ #' Round to the nearest fraction of a specified denominator. #' -#' @description Round a decimal to the precise decimal value of a specified +#' @description +#' Round a decimal to the precise decimal value of a specified #' fractional denominator. Common use cases include addressing floating point #' imprecision and enforcing that data values fall into a certain set. #' #' E.g., if a decimal represents hours and values should be logged to the nearest -#' minute, \code{round_to_fraction(x, 60)} would enforce that distribution and 0.57 +#' minute, `round_to_fraction(x, 60)` would enforce that distribution and 0.57 #' would be rounded to 0.566667, the equivalent of 34/60. 0.56 would also be rounded #' to 34/60. #' -#' Set \code{denominator = 1} to round to whole numbers. +#' Set `denominator = 1` to round to whole numbers.
#' -#' The \code{digits} argument allows for rounding of the subsequent result. +#' The `digits` argument allows for rounding of the subsequent result. #' -#' @details If \code{digits} is \code{Inf}, \code{x} is rounded to the fraction -#' and then kept at full precision. If \code{digits} is \code{"auto"}, the -#' number of digits is automatically selected as -#' \code{ceiling(log10(denominator)) + 1}. +#' @details +#' If `digits` is `Inf`, `x` is rounded to the fraction +#' and then kept at full precision. If `digits` is `"auto"`, the +#' number of digits is automatically selected as +#' `ceiling(log10(denominator)) + 1`. #' #' @param x A numeric vector #' @param denominator The denominator of the fraction for rounding (a scalar or #' vector positive integer). #' @param digits Integer indicating the number of decimal places to be used -#' after rounding to the fraction. This is passed to \code{base::round()}). -#' Negative values are allowed (see Details). (\code{Inf} indicates no +#' after rounding to the fraction. This is passed to `base::round()`). +#' Negative values are allowed (see Details). (`Inf` indicates no #' subsequent rounding) #' @return the input x rounded to a decimal value that has an integer numerator relative -#' to \code{denominator} (possibly subsequently rounded to a number of decimal +#' to `denominator` (possibly subsequently rounded to a number of decimal #' digits). #' @examples #' round_to_fraction(1.6, denominator = 2) diff --git a/R/row_to_names.R b/R/row_to_names.R index 59530f8c..f24b180f 100644 --- a/R/row_to_names.R +++ b/R/row_to_names.R @@ -1,16 +1,16 @@ #' Elevate a row to be the column names of a data.frame. 
#' #' @param dat The input data.frame -#' @param row_number The row(s) of \code{dat} containing the variable names or the -#' string \code{"find_header"} to use \code{find_header(dat=dat, ...)} to find +#' @param row_number The row(s) of `dat` containing the variable names or the +#' string `"find_header"` to use `find_header(dat=dat, ...)` to find #' the row_number. Allows for multiple rows input as a numeric vector. NA's are -#' ignored, and if a column contains only NA value it will be named \code{"NA"}. -#' @param ... Sent to \code{find_header()}, if -#' \code{row_number = "find_header"}. Otherwise, ignored. -#' @param remove_row Should the row \code{row_number} be removed from the +#' ignored, and if a column contains only NA value it will be named `"NA"`. +#' @param ... Sent to `find_header()`, if +#' `row_number = "find_header"`. Otherwise, ignored. +#' @param remove_row Should the row `row_number` be removed from the #' resulting data.frame? -#' @param remove_rows_above If \code{row_number != 1}, should the rows above -#' \code{row_number} - that is, between \code{1:(row_number-1)} - be removed +#' @param remove_rows_above If `row_number != 1`, should the rows above +#' `row_number` - that is, between `1:(row_number-1)` - be removed #' from the resulting data.frame? #' @param sep A character string to separate the values in the case of multiple #' rows input to `row_number`. @@ -88,12 +88,12 @@ row_to_names <- function(dat, row_number, ..., remove_row = TRUE, remove_rows_ab #' Find the header row in a data.frame #' #' @details -#' If \code{...} is missing, then the first row with no missing values is used. +#' If `...` is missing, then the first row with no missing values is used. #' #' When searching for a specified value or value within a column, the first row #' with a match will be returned, regardless of the completeness of the rest of -#' that row. If \code{...} has a single character argument, then the first -#' column is searched for that value. 
If \code{...} has a named numeric +#' that row. If `...` has a single character argument, then the first +#' column is searched for that value. If `...` has a named numeric #' argument, then the column whose position number matches the value of that #' argument is searched for the name (see the last example below). If more than one #' row is found matching a value that is searched for, the number of the first diff --git a/R/single_value.R b/R/single_value.R index fd934878..45227ab0 100644 --- a/R/single_value.R +++ b/R/single_value.R @@ -1,14 +1,14 @@ #' Ensure that a vector has only a single value throughout. #' #' Missing values are replaced with the single value, and if all values are -#' missing, the first value in \code{missing} is used throughout. +#' missing, the first value in `missing` is used throughout. #' #' @param x The vector which should have a single value -#' @param missing The vector of values to consider missing in \code{x} +#' @param missing The vector of values to consider missing in `x` #' @param warn_if_all_missing Generate a warning if all values are missing? #' @param info If more than one value is found, append this to the warning or #' error to assist with determining the location of the issue. -#' @return \code{x} as the scalar single value found throughout (or an error if +#' @return `x` as the scalar single value found throughout (or an error if #' more than one value is found). #' @examples #' # A simple use case with vectors of input diff --git a/R/statistical_tests.R b/R/statistical_tests.R index 211844c3..239879d9 100644 --- a/R/statistical_tests.R +++ b/R/statistical_tests.R @@ -1,17 +1,15 @@ -#' @title Apply stats::chisq.test to a two-way tabyl +#' Apply `stats::chisq.test()` to a two-way tabyl #' #' @description -#' This generic function overrides stats::chisq.test. If the passed table +#' This generic function overrides `stats::chisq.test`. 
If the passed table #' is a two-way tabyl, it runs it through janitor::chisq.test.tabyl, otherwise -#' it just calls stats::chisq.test. -#' -#' @return -#' The result is the same as the one of stats::chisq.test. If `tabyl_results` -#' is TRUE, the returned tables `observed`, `expected`, `residuals` and `stdres` -#' are converted to tabyls. +#' it just calls `stats::chisq.test()`. #' #' @param x a two-way tabyl, a numeric vector or a factor -#' @param ... other parameters passed to stats::chisq.test +#' @param ... other parameters passed to [stats::chisq.test()] +#' @return The result is the same as the one of `stats::chisq.test()`. +#' If `tabyl_results` is `TRUE`, the returned tables `observed`, `expected`, +#' `residuals` and `stdres` are converted to tabyls. #' #' @examples #' tab <- tabyl(mtcars, gear, cyl) @@ -121,18 +119,17 @@ chisq.test.tabyl <- function(x, tabyl_results = TRUE, ...) { -#' @title Apply stats::fisher.test to a two-way tabyl +#' Apply `stats::fisher.test()` to a two-way tabyl #' -#' @description -#' This generic function overrides stats::fisher.test. If the passed table -#' is a two-way tabyl, it runs it through janitor::fisher.test.tabyl, otherwise -#' it just calls stats::fisher.test. +#' This generic function overrides [stats::fisher.test()]. If the passed table +#' is a two-way tabyl, it runs it through `janitor::fisher.test.tabyl`, otherwise +#' it just calls `stats::fisher.test()`. #' #' @return -#' The result is the same as the one of stats::fisher.test. +#' The same as the one of `stats::fisher.test()`. #' -#' @param x a two-way tabyl, a numeric vector or a factor -#' @param ... other parameters passed to stats::fisher.test +#' @param x A two-way tabyl, a numeric vector or a factor +#' @param ... 
Parameters passed to [stats::fisher.test()] #' #' @examples #' tab <- tabyl(mtcars, gear, cyl) diff --git a/R/tabyl.R b/R/tabyl.R index 03576757..168420a5 100644 --- a/R/tabyl.R +++ b/R/tabyl.R @@ -1,20 +1,20 @@ -#' @title Generate a frequency table (1-, 2-, or 3-way). +#' Generate a frequency table (1-, 2-, or 3-way). #' #' @description -#' A fully-featured alternative to \code{table()}. Results are data.frames and can be formatted and enhanced with janitor's family of \code{adorn_} functions. +#' A fully-featured alternative to `table()`. Results are data.frames and can be formatted and enhanced with janitor's family of `adorn_` functions. #' #' Specify a data.frame and the one, two, or three unquoted column names you want to tabulate. Three variables generates a list of 2-way tabyls, split by the third variable. #' -#' Alternatively, you can tabulate a single variable that isn't in a data.frame by calling \code{tabyl} on a vector, e.g., \code{tabyl(mtcars$gear)}. +#' Alternatively, you can tabulate a single variable that isn't in a data.frame by calling `tabyl` on a vector, e.g., `tabyl(mtcars$gear)`. #' -#' @param dat a data.frame containing the variables you wish to count. Or, a vector you want to tabulate. +#' @param dat a `data.frame` containing the variables you wish to count. Or, a vector you want to tabulate. #' @param var1 the column name of the first variable. #' @param var2 (optional) the column name of the second variable (the rows in a 2-way tabulation). #' @param var3 (optional) the column name of the third variable (the list in a 3-way tabulation). -#' @param show_na should counts of \code{NA} values be displayed? In a one-way tabyl, the presence of \code{NA} values triggers an additional column showing valid percentages(calculated excluding \code{NA} values). +#' @param show_na should counts of `NA` values be displayed? 
In a one-way tabyl, the presence of `NA` values triggers an additional column showing valid percentages(calculated excluding `NA` values). #' @param show_missing_levels should counts of missing levels of factors be displayed? These will be rows and/or columns of zeroes. Useful for keeping consistent output dimensions even when certain factor levels may not be present in the data. #' @param ... the arguments to tabyl (here just for the sake of documentation compliance, as all arguments are listed with the vector- and data.frame-specific methods) -#' @return Returns a data.frame with frequencies and percentages of the tabulated variable(s). A 3-way tabulation returns a list of data.frames. +#' @return A data.frame with frequencies and percentages of the tabulated variable(s). A 3-way tabulation returns a list of data.frames. #' @export #' @examples #' diff --git a/R/top_levels.R b/R/top_levels.R index 325f7ea3..4deae4de 100644 --- a/R/top_levels.R +++ b/R/top_levels.R @@ -1,12 +1,15 @@ -#' @title Generate a frequency table of a factor grouped into top-n, bottom-n, and all other levels. +#' Generate a frequency table of a factor grouped into top-n, bottom-n, and all +#' other levels. #' -#' @description #' Get a frequency table of a factor variable, grouped into categories by level. #' #' @param input_vec the factor variable to tabulate. #' @param n number of levels to include in top and bottom groups #' @param show_na should cases where the variable is NA be shown? -#' @return Returns a data.frame (actually a \code{tbl_df}) with the frequencies of the grouped, tabulated variable. Includes counts and percentages, and valid percentages (calculated omitting \code{NA} values, if present in the vector and \code{show_na = TRUE}.) +#' @return a data.frame (actually a `tbl_df`) with the frequencies of the +#' grouped, tabulated variable. 
Includes counts and percentages, and valid +#' percentages (calculated omitting `NA` values, if present in the vector and +#' `show_na = TRUE`.) #' @export #' @examples #' top_levels(as.factor(mtcars$hp), 2) diff --git a/R/utils.R b/R/utils.R index 6d98c33f..ea8af53f 100644 --- a/R/utils.R +++ b/R/utils.R @@ -2,7 +2,7 @@ #' Pipe operator #' -#' @description Exported from the magrittr package. To learn more, run \code{?magrittr::`\%>\%`}. +#' @description Exported from the magrittr package. To learn more, run `?magrittr::`\%>\%``. #' #' @name %>% #' @rdname pipe diff --git a/README.Rmd b/README.Rmd index 89dbb20a..62aab17f 100644 --- a/README.Rmd +++ b/README.Rmd @@ -24,7 +24,7 @@ options(width = 110) [![R-CMD-check](https://github.com/sfirke/janitor/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/sfirke/janitor/actions/workflows/R-CMD-check.yaml) -[![Coverage Status](https://img.shields.io/codecov/c/github/sfirke/janitor/master.svg)](https://app.codecov.io/github/sfirke/janitor?branch=master) +[![Coverage Status](https://img.shields.io/codecov/c/github/sfirke/janitor/main.svg)](https://app.codecov.io/github/sfirke/janitor?branch=main) [![lifecycle](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable) [![CRAN_Status_Badge](https://www.r-pkg.org/badges/version-ago/janitor)](https://cran.r-project.org/package=janitor) ![!Monthly Downloads](https://cranlogs.r-pkg.org/badges/janitor) @@ -69,7 +69,7 @@ Below are quick examples of how janitor tools are commonly used. 
### Cleaning dirty data -Take this roster of teachers at a fictional American high school, stored in the Microsoft Excel file [dirty_data.xlsx](https://github.com/sfirke/janitor/blob/master/dirty_data.xlsx): +Take this roster of teachers at a fictional American high school, stored in the Microsoft Excel file [dirty_data.xlsx](https://github.com/sfirke/janitor/blob/main/dirty_data.xlsx): ![All kinds of dirty.](man/figures/dirty_data.PNG) Dirtiness includes: diff --git a/README.md b/README.md index cf92f2f0..edebcbde 100644 --- a/README.md +++ b/README.md @@ -18,7 +18,7 @@ [![R-CMD-check](https://github.com/sfirke/janitor/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/sfirke/janitor/actions/workflows/R-CMD-check.yaml) [![Coverage -Status](https://img.shields.io/codecov/c/github/sfirke/janitor/master.svg)](https://app.codecov.io/github/sfirke/janitor?branch=master) +Status](https://img.shields.io/codecov/c/github/sfirke/janitor/main.svg)](https://app.codecov.io/github/sfirke/janitor?branch=main) [![lifecycle](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable) [![CRAN_Status_Badge](https://www.r-pkg.org/badges/version-ago/janitor)](https://cran.r-project.org/package=janitor) ![!Monthly Downloads](https://cranlogs.r-pkg.org/badges/janitor) @@ -80,7 +80,7 @@ Below are quick examples of how janitor tools are commonly used. 
Take this roster of teachers at a fictional American high school, stored in the Microsoft Excel file -[dirty_data.xlsx](https://github.com/sfirke/janitor/blob/master/dirty_data.xlsx): +[dirty_data.xlsx](https://github.com/sfirke/janitor/blob/main/dirty_data.xlsx): ![All kinds of dirty.](man/figures/dirty_data.PNG) Dirtiness includes: diff --git a/index.Rmd b/index.Rmd index 173e15d3..42f58d98 100644 --- a/index.Rmd +++ b/index.Rmd @@ -24,8 +24,8 @@ options(width = 110) *********************** -[![Travis-CI Build Status](https://travis-ci.org/sfirke/janitor.svg?branch=master)](https://travis-ci.org/sfirke/janitor) -[![Coverage Status](https://img.shields.io/codecov/c/github/sfirke/janitor/master.svg)](https://codecov.io/github/sfirke/janitor?branch=master) +[![Travis-CI Build Status](https://travis-ci.org/sfirke/janitor.svg?branch=main)](https://travis-ci.org/sfirke/janitor) +[![Coverage Status](https://img.shields.io/codecov/c/github/sfirke/janitor/main.svg)](https://codecov.io/github/sfirke/janitor?branch=main) [![lifecycle](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable) [![CRAN_Status_Badge](https://www.r-pkg.org/badges/version-ago/janitor)](https://cran.r-project.org/package=janitor) ![!Monthly Downloads](https://cranlogs.r-pkg.org/badges/janitor) @@ -70,7 +70,8 @@ Below are quick examples of how janitor tools are commonly used. 
### Cleaning dirty data -Take this roster of teachers at a fictional American high school, stored in the Microsoft Excel file [dirty_data.xlsx](https://github.com/sfirke/janitor/blob/master/dirty_data.xlsx): + +Take this roster of teachers at a fictional American high school, stored in the Microsoft Excel file [dirty_data.xlsx](https://github.com/sfirke/janitor/blob/main/dirty_data.xlsx): ![All kinds of dirty.](man/figures/dirty_data.PNG) Dirtiness includes: diff --git a/index.md b/index.md index 35947344..06901af0 100644 --- a/index.md +++ b/index.md @@ -15,9 +15,9 @@ ------------------------------------------------------------------------ [![Travis-CI Build -Status](https://travis-ci.org/sfirke/janitor.svg?branch=master)](https://travis-ci.org/sfirke/janitor) +Status](https://travis-ci.org/sfirke/janitor.svg?branch=main)](https://travis-ci.org/sfirke/janitor) [![Coverage -Status](https://img.shields.io/codecov/c/github/sfirke/janitor/master.svg)](https://codecov.io/github/sfirke/janitor?branch=master) +Status](https://img.shields.io/codecov/c/github/sfirke/janitor/main.svg)](https://codecov.io/github/sfirke/janitor?branch=main) [![lifecycle](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable) [![CRAN_Status_Badge](https://www.r-pkg.org/badges/version-ago/janitor)](https://cran.r-project.org/package=janitor) ![!Monthly Downloads](https://cranlogs.r-pkg.org/badges/janitor) @@ -78,7 +78,7 @@ Below are quick examples of how janitor tools are commonly used. 
Take this roster of teachers at a fictional American high school, stored in the Microsoft Excel file -[dirty_data.xlsx](https://github.com/sfirke/janitor/blob/master/dirty_data.xlsx): +[dirty_data.xlsx](https://github.com/sfirke/janitor/blob/main/dirty_data.xlsx): ![All kinds of dirty.](man/figures/dirty_data.PNG) Dirtiness includes: diff --git a/janitor.Rproj b/janitor.Rproj index eaa6b818..a8cdf15f 100644 --- a/janitor.Rproj +++ b/janitor.Rproj @@ -16,3 +16,5 @@ BuildType: Package PackageUseDevtools: Yes PackageInstallArgs: --no-multiarch --with-keep.source PackageRoxygenize: rd,collate,namespace + +SpellingDictionary: en_US diff --git a/man/add_totals_col.Rd b/man/add_totals_col.Rd index 65c38d0c..69fe98b6 100644 --- a/man/add_totals_col.Rd +++ b/man/add_totals_col.Rd @@ -17,3 +17,4 @@ Returns a data.frame with a totals column containing row-wise sums. \description{ This function is deprecated, use \code{adorn_totals} instead. } +\keyword{internal} diff --git a/man/add_totals_row.Rd b/man/add_totals_row.Rd index 60b37acf..e33098b0 100644 --- a/man/add_totals_row.Rd +++ b/man/add_totals_row.Rd @@ -19,3 +19,4 @@ Returns a data.frame with a totals row, consisting of "Total" in the first colum \description{ This function is deprecated, use \code{adorn_totals} instead. } +\keyword{internal} diff --git a/man/adorn_ns.Rd b/man/adorn_ns.Rd index 37ea4b09..4e7ba233 100644 --- a/man/adorn_ns.Rd +++ b/man/adorn_ns.Rd @@ -19,9 +19,9 @@ adorn_ns( \item{position}{should the N go in the front, or in the rear, of the percentage?} -\item{ns}{the Ns to append. The default is the "core" attribute of the input tabyl \code{dat}, where the original Ns of a two-way \code{tabyl} are stored. However, if your Ns are stored somewhere else, or you need to customize them beyond what can be done with `format_func`, you can supply them here.} +\item{ns}{the Ns to append. 
The default is the "core" attribute of the input tabyl \code{dat}, where the original Ns of a two-way \code{tabyl} are stored. However, if your Ns are stored somewhere else, or you need to customize them beyond what can be done with \code{format_func}, you can supply them here.} -\item{format_func}{a formatting function to run on the Ns. Consider defining with \code{base::format()}.} +\item{format_func}{a formatting function to run on the Ns. Consider defining with \code{\link[base:format]{base::format()}}.} \item{...}{columns to adorn. This takes a tidyselect specification. By default, all columns are adorned except for the first column and columns not of class \code{numeric}, but this allows you to manually specify which columns should be adorned, for use on a data.frame that does not result from a call to \code{tabyl}.} } @@ -32,7 +32,6 @@ a data.frame with Ns appended This function adds back the underlying Ns to a \code{tabyl} whose percentages were calculated using \code{adorn_percentages()}, to display the Ns and percentages together. You can also call it on a non-tabyl data.frame to which you wish to append Ns. } \examples{ - mtcars \%>\% tabyl(am, cyl) \%>\% adorn_percentages("col") \%>\% diff --git a/man/adorn_pct_formatting.Rd b/man/adorn_pct_formatting.Rd index 163dc931..13a49c6a 100644 --- a/man/adorn_pct_formatting.Rd +++ b/man/adorn_pct_formatting.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/adorn_pct_formatting.R \name{adorn_pct_formatting} \alias{adorn_pct_formatting} -\title{Format a data.frame of decimals as percentages.} +\title{Format a \code{data.frame} of decimals as percentages.} \usage{ adorn_pct_formatting( dat, @@ -13,27 +13,41 @@ adorn_pct_formatting( ) } \arguments{ -\item{dat}{a data.frame with decimal values, typically the result of a call to \code{adorn_percentages} on a \code{tabyl}. 
If given a list of data.frames, this function will apply itself to each data.frame in the list (designed for 3-way \code{tabyl} lists).} +\item{dat}{a data.frame with decimal values, typically the result of a call +to \code{adorn_percentages} on a \code{tabyl}. If given a list of data.frames, this +function will apply itself to each data.frame in the list (designed for +3-way \code{tabyl} lists).} \item{digits}{how many digits should be displayed after the decimal point?} -\item{rounding}{method to use for rounding - either "half to even", the base R default method, or "half up", where 14.5 rounds up to 15.} +\item{rounding}{method to use for rounding - either "half to even", the base +R default method, or "half up", where 14.5 rounds up to 15.} \item{affix_sign}{should the \% sign be affixed to the end?} -\item{...}{columns to adorn. This takes a tidyselect specification. By default, all numeric columns (besides the initial column, if numeric) are adorned, but this allows you to manually specify which columns should be adorned, for use on a data.frame that does not result from a call to \code{tabyl}.} +\item{...}{columns to adorn. This takes a tidyselect specification. By +default, all numeric columns (besides the initial column, if numeric) are +adorned, but this allows you to manually specify which columns should be +adorned, for use on a data.frame that does not result from a call to +\code{tabyl}.} } \value{ a data.frame with formatted percentages } \description{ -Numeric columns get multiplied by 100 and formatted as percentages according to user specifications. This function defaults to excluding the first column of the input data.frame, assuming that it contains a descriptive variable, but this can be overridden by specifying the columns to adorn in the \code{...} argument. Non-numeric columns are always excluded. +Numeric columns get multiplied by 100 and formatted as +percentages according to user specifications. 
This function defaults to +excluding the first column of the input data.frame, assuming that it contains +a descriptive variable, but this can be overridden by specifying the columns +to adorn in the \code{...} argument. Non-numeric columns are always excluded. -The decimal separator character is the result of \code{getOption("OutDec")}, which is based on the user's locale. If the default behavior is undesirable, -change this value ahead of calling the function, either by changing locale or with \code{options(OutDec = ",")}. This aligns the decimal separator character with that used in \code{base::print()}. +The decimal separator character is the result of \code{getOption("OutDec")}, which +is based on the user's locale. If the default behavior is undesirable, +change this value ahead of calling the function, either by changing locale or +with \code{options(OutDec = ",")}. This aligns the decimal separator character +with that used in \code{base::print()}. } \examples{ - mtcars \%>\% tabyl(am, cyl) \%>\% adorn_percentages("col") \%>\% diff --git a/man/adorn_rounding.Rd b/man/adorn_rounding.Rd index 12696485..5da5df7b 100644 --- a/man/adorn_rounding.Rd +++ b/man/adorn_rounding.Rd @@ -50,5 +50,5 @@ cases <- data.frame( cases \%>\% adorn_percentages(, , ends_with("ed")) \%>\% - adorn_rounding(, , one_of(c("recovered", "died"))) + adorn_rounding(, , all_of(c("recovered", "died"))) } diff --git a/man/adorn_totals.Rd b/man/adorn_totals.Rd index b96c75d3..ce16ccfd 100644 --- a/man/adorn_totals.Rd +++ b/man/adorn_totals.Rd @@ -11,7 +11,7 @@ adorn_totals(dat, where = "row", fill = "-", na.rm = TRUE, name = "Total", ...) \item{where}{one of "row", "col", or \code{c("row", "col")}} -\item{fill}{if there are non-numeric columns, what should fill the bottom row of those columns? If a string, relevant columns will be coerced to character. If `NA` then column types are preserved.} +\item{fill}{if there are non-numeric columns, what should fill the bottom row of those columns? 
If a string, relevant columns will be coerced to character. If \code{NA} then column types are preserved.} \item{na.rm}{should missing values (including NaN) be omitted from the calculations?} @@ -20,7 +20,7 @@ adorn_totals(dat, where = "row", fill = "-", na.rm = TRUE, name = "Total", ...) \item{...}{columns to total. This takes a tidyselect specification. By default, all numeric columns (besides the initial column, if numeric) are included in the totals, but this allows you to manually specify which columns should be included, for use on a data.frame that does not result from a call to \code{tabyl}.} } \value{ -Returns a data.frame augmented with a totals row, column, or both. The data.frame is now also of class \code{tabyl} and stores information about the attached totals and underlying data in the tabyl attributes. +a data.frame augmented with a totals row, column, or both. The data.frame is now also of class \code{tabyl} and stores information about the attached totals and underlying data in the tabyl attributes. } \description{ This function defaults to excluding the first column of the input data.frame, assuming that it contains a descriptive variable, but this can be overridden by specifying the columns to be totaled in the \code{...} argument. Non-numeric columns are converted to character class and have a user-specified fill character inserted in the totals row. diff --git a/man/as_tabyl.Rd b/man/as_tabyl.Rd index 66063731..fc5480c0 100644 --- a/man/as_tabyl.Rd +++ b/man/as_tabyl.Rd @@ -7,30 +7,46 @@ as_tabyl(dat, axes = 2, row_var_name = NULL, col_var_name = NULL) } \arguments{ -\item{dat}{a data.frame with variable values in the first column and numeric values in all other columns.} +\item{dat}{a data.frame with variable values in the first column and numeric +values in all other columns.} -\item{axes}{is this a two_way tabyl or a one_way tabyl? If this function is being called by a user, this should probably be "2". 
One-way tabyls are created by \code{tabyl} but are a special case.} +\item{axes}{is this a two_way tabyl or a one_way tabyl? If this function is +being called by a user, this should probably be "2". One-way tabyls are +created by \code{tabyl} but are a special case.} -\item{row_var_name}{(optional) the name of the variable in the row dimension; used by \code{adorn_title()}.} +\item{row_var_name}{(optional) the name of the variable in the row dimension; +used by \code{adorn_title()}.} -\item{col_var_name}{(optional) the name of the variable in the column dimension; used by \code{adorn_title()}.} +\item{col_var_name}{(optional) the name of the variable in the column +dimension; used by \code{adorn_title()}.} } \value{ -Returns the same data.frame, but with the additional class of "tabyl" and the attribute "core". +Returns the same data.frame, but with the additional class of "tabyl" +and the attribute "core". } \description{ -A \code{tabyl} is a data.frame containing counts of a variable or co-occurrences of two variables (a.k.a., a contingency table or crosstab). This specialized kind of data.frame has attributes that enable \code{adorn_} functions to be called for precise formatting and presentation of results. E.g., display results as a mix of percentages, Ns, add totals rows or columns, rounding options, in the style of Microsoft Excel PivotTable. - -A \code{tabyl} can be the result of a call to \code{janitor::tabyl()}, in which case these attributes are added automatically. This function adds \code{tabyl} class attributes to a data.frame that isn't the result of a call to \code{tabyl} but meets the requirements of a two-way tabyl: -1) First column contains values of variable 1 -2) Column names 2:n are the values of variable 2 -3) Numeric values in columns 2:n are counts of the co-occurrences of the two variables.* - -* = this is the ideal form of a tabyl, but janitor's \code{adorn_} functions tolerate and ignore non-numeric columns in positions 2:n. 
+A \code{tabyl} is a data.frame containing counts of a variable or +co-occurrences of two variables (a.k.a., a contingency table or crosstab). +This specialized kind of data.frame has attributes that enable \code{adorn_} +functions to be called for precise formatting and presentation of results. +E.g., display results as a mix of percentages, Ns, add totals rows or +columns, rounding options, in the style of Microsoft Excel PivotTable. + +A \code{tabyl} can be the result of a call to \code{janitor::tabyl()}, in which case +these attributes are added automatically. This function adds \code{tabyl} class +attributes to a data.frame that isn't the result of a call to \code{tabyl} but +meets the requirements of a two-way tabyl: 1) First column contains values of +variable 1 2) Column names 2:n are the values of variable 2 3) Numeric values +in columns 2:n are counts of the co-occurrences of the two variables.* +\itemize{ +\item = this is the ideal form of a tabyl, but janitor's \code{adorn_} functions tolerate and ignore non-numeric columns in positions 2:n. +} -For instance, the result of \code{dplyr::count()} followed by \code{tidyr::spread()} can be treated as a \code{tabyl}. +For instance, the result of \code{\link[dplyr:count]{dplyr::count()}} followed by \code{\link[tidyr:spread]{tidyr::spread()}} +can be treated as a \code{tabyl}. -The result of calling \code{tabyl()} on a single variable is a special class of one-way tabyl; this function only pertains to the two-way tabyl. +The result of calling \code{tabyl()} on a single variable is a special class of +one-way tabyl; this function only pertains to the two-way tabyl. 
} \examples{ as_tabyl(mtcars) diff --git a/man/chisq.test.Rd b/man/chisq.test.Rd index 029bed38..baac01a5 100644 --- a/man/chisq.test.Rd +++ b/man/chisq.test.Rd @@ -4,7 +4,7 @@ \alias{chisq.test} \alias{chisq.test.default} \alias{chisq.test.tabyl} -\title{Apply stats::chisq.test to a two-way tabyl} +\title{Apply \code{stats::chisq.test()} to a two-way tabyl} \usage{ chisq.test(x, ...) @@ -15,21 +15,21 @@ chisq.test(x, ...) \arguments{ \item{x}{a two-way tabyl, a numeric vector or a factor} -\item{...}{other parameters passed to stats::chisq.test} +\item{...}{other parameters passed to \code{\link[stats:chisq.test]{stats::chisq.test()}}} \item{y}{if x is a vector, must be another vector or factor of the same length} -\item{tabyl_results}{if TRUE and x is a tabyl object, also return `observed`, `expected`, `residuals` and `stdres` as tabyl} +\item{tabyl_results}{if TRUE and x is a tabyl object, also return \code{observed}, \code{expected}, \code{residuals} and \code{stdres} as tabyl} } \value{ -The result is the same as the one of stats::chisq.test. If `tabyl_results` -is TRUE, the returned tables `observed`, `expected`, `residuals` and `stdres` -are converted to tabyls. +The result is the same as the one of \code{stats::chisq.test()}. +If \code{tabyl_results} is \code{TRUE}, the returned tables \code{observed}, \code{expected}, +\code{residuals} and \code{stdres} are converted to tabyls. } \description{ -This generic function overrides stats::chisq.test. If the passed table +This generic function overrides \code{stats::chisq.test}. If the passed table is a two-way tabyl, it runs it through janitor::chisq.test.tabyl, otherwise -it just calls stats::chisq.test. +it just calls \code{stats::chisq.test()}. } \examples{ tab <- tabyl(mtcars, gear, cyl) diff --git a/man/clean_names.Rd b/man/clean_names.Rd index 13568968..23579ffb 100644 --- a/man/clean_names.Rd +++ b/man/clean_names.Rd @@ -19,7 +19,7 @@ clean_names(dat, ...) \method{clean_names}{tbl_lazy}(dat, ...) 
} \arguments{ -\item{dat}{the input data.frame.} +\item{dat}{The input \code{data.frame}.} \item{...}{ Arguments passed on to \code{\link[=make_clean_names]{make_clean_names}} @@ -29,7 +29,7 @@ passed to \code{snakecase::to_any_case()} with the exception of "old_janitor", which exists only to support legacy code (it preserves the behavior of \code{clean_names()} prior to addition of the "case" argument (janitor versions <= 0.3.1). "old_janitor" is not intended for new code. See -\code{\link[snakecase]{to_any_case}} for a wide variety of supported cases, +\code{\link[snakecase:to_any_case]{snakecase::to_any_case()}} for a wide variety of supported cases, including "sentence" and "title" case.} \item{\code{replace}}{A named character vector where the name is replaced by the value.} @@ -67,10 +67,10 @@ You should use this feature with care in case of \code{case = "parsed"}, \code{c }} } \value{ -Returns the data.frame with clean names. +A \code{data.frame} with clean names. } \description{ -Resulting names are unique and consist only of the \code{_} character, numbers, and letters. +Resulting names are unique and consist only of the \verb{_} character, numbers, and letters. Capitalization preferences can be specified using the \code{case} parameter. Accented characters are transliterated to ASCII. For example, an "o" with a @@ -78,9 +78,9 @@ German umlaut over it becomes "o", and the Spanish character "enye" becomes "n". This function takes and returns a data.frame, for ease of piping with -\code{`\%>\%`}. For the underlying function that works on a character vector -of names, see \code{\link[janitor]{make_clean_names}}. \code{clean_names} -relies on the versatile function \code{\link[snakecase]{to_any_case}}, which +\verb{\%>\%}. For the underlying function that works on a character vector +of names, see \code{\link[=make_clean_names]{make_clean_names()}}. 
\code{clean_names} +relies on the versatile function \code{\link[snakecase:to_any_case]{snakecase::to_any_case()}}, which accepts many arguments. See that function's documentation for ideas on getting the most out of \code{clean_names}. A few examples are included below. @@ -94,11 +94,11 @@ which is a character vector mapping all known mu or micro Unicode code points } \details{ \code{clean_names()} is intended to be used on \code{data.frames} - and \code{data.frame}-like objects. For this reason there are methods to - support using \code{clean_names()} on \code{sf} and \code{tbl_graph} (from - \code{tidygraph}) objects as well as on database connections through - \code{dbplyr}. For cleaning other named objects like named lists - and vectors, use \code{make_clean_names()}. +and \code{data.frame}-like objects. For this reason there are methods to +support using \code{clean_names()} on \code{sf} and \code{tbl_graph} (from +\code{tidygraph}) objects as well as on database connections through +\code{dbplyr}. For cleaning other named objects like named lists +and vectors, use \code{make_clean_names()}. } \examples{ diff --git a/man/compare_df_cols.Rd b/man/compare_df_cols.Rd index b2d8dad0..231a4859 100644 --- a/man/compare_df_cols.Rd +++ b/man/compare_df_cols.Rd @@ -32,13 +32,13 @@ Details section.} } \value{ A data.frame with a column named "column_name" with a value named - after the input data.frames' column names, and then one column per - data.frame (named after the input data.frame). If more than one input has - the same column name, the column naming will have suffixes defined by - sequential use of \code{base::merge()} and may differ from expected naming. - The rows within the data.frame-named columns are descriptions of the - classes of the data within the columns (generated by - \code{describe_class}). +after the input data.frames' column names, and then one column per +data.frame (named after the input data.frame). 
If more than one input has +the same column name, the column naming will have suffixes defined by +sequential use of \code{base::merge()} and may differ from expected naming. +The rows within the data.frame-named columns are descriptions of the +classes of the data within the columns (generated by +\code{describe_class}). } \description{ Generate a comparison of data.frames (or similar objects) that indicates if @@ -46,15 +46,15 @@ they will successfully bind together by rows. } \details{ Due to the returned "column_name" column, no input data.frame may be - named "column_name". +named "column_name". - The \code{strict_description} argument is most typically used to understand - if factor levels match or are bindable. Factors are typically bindable, - but the behavior of what happens when they bind differs based on the - binding method ("bind_rows" or "rbind"). Even when - \code{strict_description} is \code{FALSE}, data.frames may still bind - because some classes (like factors and characters) can bind even if they - appear to differ. +The \code{strict_description} argument is most typically used to understand +if factor levels match or are bindable. Factors are typically bindable, +but the behavior of what happens when they bind differs based on the +binding method ("bind_rows" or "rbind"). Even when +\code{strict_description} is \code{FALSE}, data.frames may still bind +because some classes (like factors and characters) can bind even if they +appear to differ. } \examples{ compare_df_cols(data.frame(A = 1), data.frame(B = 2)) diff --git a/man/compare_df_cols_same.Rd b/man/compare_df_cols_same.Rd index 568e25d0..5bbb9d0b 100644 --- a/man/compare_df_cols_same.Rd +++ b/man/compare_df_cols_same.Rd @@ -26,7 +26,7 @@ missing from a data.frame would be considered a mismatch (as in } \value{ \code{TRUE} if row binding will succeed or \code{FALSE} if it will - fail. +fail. } \description{ Check whether a set of data.frames are row-bindable. 
Calls diff --git a/man/convert_to_NA.Rd b/man/convert_to_NA.Rd index 69017d57..cc2aa74e 100644 --- a/man/convert_to_NA.Rd +++ b/man/convert_to_NA.Rd @@ -24,3 +24,4 @@ Converts instances of user-specified strings into \code{NA}. Can operate on eit \seealso{ janitor_deprecated } +\keyword{internal} diff --git a/man/convert_to_date.Rd b/man/convert_to_date.Rd index de326b3f..4765d08a 100644 --- a/man/convert_to_date.Rd +++ b/man/convert_to_date.Rd @@ -25,21 +25,21 @@ convert_to_datetime( \item{x}{The object to convert} \item{...}{Passed to further methods. Eventually may be passed to -`excel_numeric_to_date()`, `base::as.POSIXct()`, or `base::as.Date()`.} +\code{excel_numeric_to_date()}, \code{base::as.POSIXct()}, or \code{base::as.Date()}.} \item{character_fun}{A function to convert non-numeric-looking, non-NA values -in `x` to POSIXct objects.} +in \code{x} to POSIXct objects.} \item{string_conversion_failure}{If a character value fails to parse into the -desired class and instead returns `NA`, should the function return the +desired class and instead returns \code{NA}, should the function return the result with a warning or throw an error?} \item{tz}{The timezone for POSIXct output, unless an object is POSIXt already. Ignored for Date output.} } \value{ -POSIXct objects for `convert_to_datetime()` or Date objects for - `convert_to_date()`. +POSIXct objects for \code{convert_to_datetime()} or Date objects for +\code{convert_to_date()}. } \description{ Convert many date and datetime formats as may be received from Microsoft @@ -47,10 +47,10 @@ Excel } \details{ Character conversion checks if it matches something that looks like - a Microsoft Excel numeric date, converts those to numeric, and then runs - convert_to_datetime_helper() on those numbers. Then, character to Date or - POSIXct conversion occurs via `character_fun(x, ...)` or - `character_fun(x, tz=tz, ...)`, respectively. 
+a Microsoft Excel numeric date, converts those to numeric, and then runs +convert_to_datetime_helper() on those numbers. Then, character to Date or +POSIXct conversion occurs via \code{character_fun(x, ...)} or +\code{character_fun(x, tz=tz, ...)}, respectively. } \section{Functions}{ \itemize{ diff --git a/man/describe_class.Rd b/man/describe_class.Rd index 2cb364b6..2717e693 100644 --- a/man/describe_class.Rd +++ b/man/describe_class.Rd @@ -17,22 +17,22 @@ describe_class(x, strict_description = TRUE) \item{strict_description}{Should differing factor levels be treated as differences for the purposes of identifying mismatches? -\code{strict_description = `TRUE`} is stricter and factors with different +\code{strict_description = TRUE} is stricter and factors with different levels will be treated as different classes. \code{FALSE} is more lenient: for class comparison purposes, the variable is just a "factor".} } \value{ A character scalar describing the class(es) of an object where if the - scalar will match, columns in a data.frame (or similar object) should bind - together without issue. +scalar will match, columns in a data.frame (or similar object) should bind +together without issue. } \description{ Describe the class(es) of an object } \details{ For package developers, an S3 generic method can be written for - \code{describe_class()} for custom classes that may need more definition - than the default method. This function is called by \code{compare_df_cols}. +\code{describe_class()} for custom classes that may need more definition +than the default method. This function is called by \code{compare_df_cols}. 
} \section{Methods (by class)}{ \itemize{ diff --git a/man/excel_numeric_to_date.Rd b/man/excel_numeric_to_date.Rd index d1932b02..22b49409 100644 --- a/man/excel_numeric_to_date.Rd +++ b/man/excel_numeric_to_date.Rd @@ -15,8 +15,7 @@ excel_numeric_to_date( \arguments{ \item{date_num}{numeric vector of serial numbers to convert.} -\item{date_system}{the date system, either \code{"modern"} or \code{"mac -pre-2011"}.} +\item{date_system}{the date system, either \code{"modern"} or \code{"mac pre-2011"}.} \item{include_time}{Include the time (hours, minutes, seconds) in the output? (See details)} @@ -29,12 +28,11 @@ more information on timezones).} } \value{ Returns a vector of class Date if \code{include_time} is - \code{FALSE}. Returns a vector of class POSIXlt if \code{include_time} is - \code{TRUE}. +\code{FALSE}. Returns a vector of class POSIXlt if \code{include_time} is +\code{TRUE}. } \description{ -Converts numbers like \code{42370} into date values like -\code{2016-01-01}. +Converts numbers like \code{42370} into date values like \code{2016-01-01}. Defaults to the modern Excel date encoding system. However, Excel for Mac 2008 and earlier Mac versions of Excel used a different date system. To @@ -47,13 +45,13 @@ A list of all timezones is available from \code{base::OlsonNames()}, and the current timezone is available from \code{base::Sys.timezone()}. If your input data has a mix of Excel numeric dates and actual dates, see the -more powerful functions \code{convert_to_date()} and \code{convert_to_datetime()}. +more powerful functions \code{\link[=convert_to_date]{convert_to_date()}} and \code{convert_to_datetime()}. } \details{ When using \code{include_time=TRUE}, days with leap seconds will not - be accurately handled as they do not appear to be accurately handled by - Windows (as described in - https://support.microsoft.com/en-us/help/2722715/support-for-the-leap-second). 
+be accurately handled as they do not appear to be accurately handled by +Windows (as described in +https://support.microsoft.com/en-us/help/2722715/support-for-the-leap-second). } \examples{ excel_numeric_to_date(40000) diff --git a/man/fisher.test.Rd b/man/fisher.test.Rd index d815136c..dd54d990 100644 --- a/man/fisher.test.Rd +++ b/man/fisher.test.Rd @@ -4,7 +4,7 @@ \alias{fisher.test} \alias{fisher.test.default} \alias{fisher.test.tabyl} -\title{Apply stats::fisher.test to a two-way tabyl} +\title{Apply \code{stats::fisher.test()} to a two-way tabyl} \usage{ fisher.test(x, ...) @@ -13,19 +13,19 @@ fisher.test(x, ...) \method{fisher.test}{tabyl}(x, ...) } \arguments{ -\item{x}{a two-way tabyl, a numeric vector or a factor} +\item{x}{A two-way tabyl, a numeric vector or a factor} -\item{...}{other parameters passed to stats::fisher.test} +\item{...}{Parameters passed to \code{\link[stats:fisher.test]{stats::fisher.test()}}} \item{y}{if x is a vector, must be another vector or factor of the same length} } \value{ -The result is the same as the one of stats::fisher.test. +The same as the one of \code{stats::fisher.test()}. } \description{ -This generic function overrides stats::fisher.test. If the passed table -is a two-way tabyl, it runs it through janitor::fisher.test.tabyl, otherwise -it just calls stats::fisher.test. +This generic function overrides \code{\link[stats:fisher.test]{stats::fisher.test()}}. If the passed table +is a two-way tabyl, it runs it through \code{janitor::fisher.test.tabyl}, otherwise +it just calls \code{stats::fisher.test()}. } \examples{ tab <- tabyl(mtcars, gear, cyl) diff --git a/man/get_dupes.Rd b/man/get_dupes.Rd index c50a058d..76136b85 100644 --- a/man/get_dupes.Rd +++ b/man/get_dupes.Rd @@ -7,15 +7,21 @@ get_dupes(dat, ...) } \arguments{ -\item{dat}{The input data.frame.} +\item{dat}{The input \code{data.frame}.} -\item{...}{Unquoted variable names to search for duplicates. 
This takes a tidyselect specification.} +\item{...}{Unquoted variable names to search for duplicates. This takes a +tidyselect specification.} } \value{ -Returns a data.frame with the full records where the specified variables have duplicated values, as well as a variable \code{dupe_count} showing the number of rows sharing that combination of duplicated values. If the input data.frame was of class \code{tbl_df}, the output is as well. +A data.frame with the full records where the specified +variables have duplicated values, as well as a variable \code{dupe_count} +showing the number of rows sharing that combination of duplicated values. +If the input data.frame was of class \code{tbl_df}, the output is as well. } \description{ -For hunting duplicate records during data cleaning. Specify the data.frame and the variable combination to search for duplicates and get back the duplicated rows. +For hunting duplicate records during data cleaning. Specify the data.frame +and the variable combination to search for duplicates and get back the +duplicated rows. } \examples{ get_dupes(mtcars, mpg, hp) diff --git a/man/get_one_to_one.Rd b/man/get_one_to_one.Rd index 2f303dde..07f4deca 100644 --- a/man/get_one_to_one.Rd +++ b/man/get_one_to_one.Rd @@ -7,11 +7,11 @@ get_one_to_one(dat) } \arguments{ -\item{dat}{A data.frame or similar object} +\item{dat}{A \code{data.frame} or similar object} } \value{ A list with one element for each group of columns that map - identically to each other. +identically to each other. 
} \description{ Find the list of columns that have a 1:1 mapping to each other diff --git a/man/janitor.Rd b/man/janitor.Rd index 986a3109..3b708139 100644 --- a/man/janitor.Rd +++ b/man/janitor.Rd @@ -13,7 +13,6 @@ The main janitor functions can: perfectly format ugly \code{data.frame} column n duplicate records for further study; and provide quick one- and two-variable tabulations (i.e., frequency tables and crosstabs) that improve on the base R function \code{table()}. - Other functions in the package can format for reporting the results of these tabulations. These tabulate-and-report functions approximate popular features of SPSS and Microsoft Excel. } @@ -21,8 +20,7 @@ These tabulate-and-report functions approximate popular features of SPSS and Mic \section{Package context}{ This package follows the principles of the "tidyverse" and in particular works well with -the \code{\%>\%} pipe function. - +the \verb{\\\%>\\\%} pipe function. janitor was built with beginning-to-intermediate R users in mind and is optimized for user-friendliness. 
Advanced users can already do everything diff --git a/man/janitor_deprecated.Rd b/man/janitor_deprecated.Rd index b8edf248..d1591a6d 100644 --- a/man/janitor_deprecated.Rd +++ b/man/janitor_deprecated.Rd @@ -8,13 +8,13 @@ These functions have already become defunct or may be defunct as soon as the nex } \details{ \itemize{ - \item \code{\link{adorn_crosstab}} - \item \code{\link{crosstab}} - \item \code{\link{use_first_valid_of}} - \item \code{\link{convert_to_NA}} - \item \code{\link{add_totals_col}} - \item \code{\link{add_totals_row}} - \item \code{\link{remove_empty_rows}} - \item \code{\link{remove_empty_cols}} +\item \code{\link[=adorn_crosstab]{adorn_crosstab()}} +\item \code{\link[=crosstab]{crosstab()}} +\item \code{\link[=use_first_valid_of]{use_first_valid_of()}} +\item \code{\link[=convert_to_NA]{convert_to_NA()}} +\item \code{\link[=add_totals_col]{add_totals_col()}} +\item \code{\link[=add_totals_row]{add_totals_row()}} +\item \code{\link[=remove_empty_rows]{remove_empty_rows()}} +\item \code{\link[=remove_empty_cols]{remove_empty_cols()}} } } diff --git a/man/make_clean_names.Rd b/man/make_clean_names.Rd index f96d71b1..2486da61 100644 --- a/man/make_clean_names.Rd +++ b/man/make_clean_names.Rd @@ -26,7 +26,7 @@ passed to \code{snakecase::to_any_case()} with the exception of "old_janitor", which exists only to support legacy code (it preserves the behavior of \code{clean_names()} prior to addition of the "case" argument (janitor versions <= 0.3.1). "old_janitor" is not intended for new code. See -\code{\link[snakecase]{to_any_case}} for a wide variety of supported cases, +\code{\link[snakecase:to_any_case]{snakecase::to_any_case()}} for a wide variety of supported cases, including "sentence" and "title" case.} \item{replace}{A named character vector where the name is replaced by the @@ -92,16 +92,16 @@ by the supplied string to this argument.} Returns the "cleaned" character vector. 
} \description{ -Resulting strings are unique and consist only of the \code{_} +Resulting strings are unique and consist only of the \verb{_} character, numbers, and letters. By default, the resulting strings will only consist of ASCII characters, but non-ASCII (e.g. Unicode) may be allowed by -setting \code{ascii=FALSE}. Capitalization preferences can be specified +setting \code{ascii = FALSE}. Capitalization preferences can be specified using the \code{case} parameter. -For use on the names of a data.frame, e.g., in a \code{`\%>\%`} pipeline, -call the convenience function \code{\link[janitor]{clean_names}}. +For use on the names of a data.frame, e.g., in a \verb{\%>\%} pipeline, +call the convenience function \code{\link[=clean_names]{clean_names()}}. -When \code{ascii=TRUE} (the default), accented characters are transliterated +When \code{ascii = TRUE} (the default), accented characters are transliterated to ASCII. For example, an "o" with a German umlaut over it becomes "o", and the Spanish character "enye" becomes "n". @@ -113,7 +113,7 @@ to resolve any duplicated names. This function relies on \code{snakecase::to_any_case} and can take advantage of its versatility. For instance, an abbreviation like "ID" can have its capitalization preserved by passing the argument \code{abbreviations = "ID"}. -See the documentation for \code{\link[snakecase:to_any_case]{snakecase::to_any_case}} +See the documentation for \code{\link[snakecase:to_any_case]{snakecase::to_any_case()}} for more about how to use its features. On some systems, not all transliterators to ASCII are available. 
If this is @@ -143,5 +143,5 @@ make_clean_names(names(x), "small_camel") } \seealso{ -\code{\link[snakecase]{to_any_case}()} +\code{\link[snakecase:to_any_case]{snakecase::to_any_case()}} } diff --git a/man/paste_skip_na.Rd b/man/paste_skip_na.Rd index 1f41563b..e9aae27c 100644 --- a/man/paste_skip_na.Rd +++ b/man/paste_skip_na.Rd @@ -7,7 +7,7 @@ paste_skip_na(..., sep = " ", collapse = NULL) } \arguments{ -\item{..., sep, collapse}{See \code{?paste}} +\item{..., sep, collapse}{See \code{\link[base:paste]{base::paste()}}} } \value{ A character vector of pasted values. @@ -17,7 +17,7 @@ Like \code{paste()}, but missing values are omitted } \details{ If all values are missing, the value from the first argument is - preserved. +preserved. } \examples{ paste_skip_na(NA) # NA_character_ diff --git a/man/pipe.Rd b/man/pipe.Rd index 23b9f737..ee8964de 100644 --- a/man/pipe.Rd +++ b/man/pipe.Rd @@ -7,7 +7,7 @@ lhs \%>\% rhs } \description{ -Exported from the magrittr package. To learn more, run \code{?magrittr::`\%>\%`}. +Exported from the magrittr package. To learn more, run \verb{?magrittr::}\\%>\\%``. } \examples{ mtcars \%>\% diff --git a/man/remove_constant.Rd b/man/remove_constant.Rd index 248cdf4b..85e1b908 100644 --- a/man/remove_constant.Rd +++ b/man/remove_constant.Rd @@ -29,7 +29,7 @@ data.frame(A = 1, B = 1:3) \%>\% } \seealso{ \code{\link[=remove_empty]{remove_empty()}} for removing empty - columns or rows. +columns or rows. Other remove functions: \code{\link{remove_empty}()} diff --git a/man/remove_empty.Rd b/man/remove_empty.Rd index ece490be..35252f5a 100644 --- a/man/remove_empty.Rd +++ b/man/remove_empty.Rd @@ -9,7 +9,7 @@ remove_empty(dat, which = c("rows", "cols"), cutoff = 1, quiet = TRUE) \arguments{ \item{dat}{the input data.frame or matrix.} -\item{which}{one of "rows", "cols", or \code{c("rows", "cols")}. Where no +\item{which}{one of "rows", "cols", or \code{c("rows", "cols")}. 
Where no value of which is provided, defaults to removing both empty rows and empty columns, declaring the behavior with a printed message.} @@ -24,7 +24,7 @@ Returns the object without its missing rows or columns. } \description{ Removes all rows and/or columns from a data.frame or matrix that - are composed entirely of \code{NA} values. +are composed entirely of \code{NA} values. } \examples{ # not run: @@ -46,7 +46,7 @@ dd \%>\% } \seealso{ \code{\link[=remove_constant]{remove_constant()}} for removing - constant columns. +constant columns. Other remove functions: \code{\link{remove_constant}()} diff --git a/man/remove_empty_cols.Rd b/man/remove_empty_cols.Rd index 7d3a6ce0..35778141 100644 --- a/man/remove_empty_cols.Rd +++ b/man/remove_empty_cols.Rd @@ -19,3 +19,4 @@ This function is deprecated, use \code{remove_empty("cols")} instead. # not run: # dat \%>\% remove_empty_cols } +\keyword{internal} diff --git a/man/remove_empty_rows.Rd b/man/remove_empty_rows.Rd index 3f43eb0a..ebaf2192 100644 --- a/man/remove_empty_rows.Rd +++ b/man/remove_empty_rows.Rd @@ -19,3 +19,4 @@ This function is deprecated, use \code{remove_empty("rows")} instead. # not run: # dat \%>\% remove_empty_rows } +\keyword{internal} diff --git a/man/round_half_up.Rd b/man/round_half_up.Rd index 624a1d94..a79778dd 100644 --- a/man/round_half_up.Rd +++ b/man/round_half_up.Rd @@ -11,10 +11,19 @@ round_half_up(x, digits = 0) \item{digits}{how many digits should be displayed after the decimal point?} } +\value{ +A vector with the same length as \code{x} +} \description{ -In base R \code{round()}, halves are rounded to even, e.g., 12.5 and 11.5 are both rounded to 12. This function rounds 12.5 to 13 (assuming \code{digits = 0}). Negative halves are rounded away from zero, e.g., -0.5 is rounded to -1. +In base R \code{round()}, halves are rounded to even, e.g., 12.5 and +11.5 are both rounded to 12. This function rounds 12.5 to 13 (assuming +\code{digits = 0}). 
Negative halves are rounded away from zero, e.g., -0.5 is +rounded to -1. -This may skew subsequent statistical analysis of the data, but may be desirable in certain contexts. This function is implemented exactly from \url{https://stackoverflow.com/a/12688836}; see that question and comments for discussion of this issue. +This may skew subsequent statistical analysis of the data, but may be +desirable in certain contexts. This function is implemented exactly from +\url{https://stackoverflow.com/a/12688836}; see that question and comments for +discussion of this issue. } \examples{ round_half_up(12.5) diff --git a/man/round_to_fraction.Rd b/man/round_to_fraction.Rd index 8cbaaa74..a8544a8d 100644 --- a/man/round_to_fraction.Rd +++ b/man/round_to_fraction.Rd @@ -19,8 +19,8 @@ subsequent rounding)} } \value{ the input x rounded to a decimal value that has an integer numerator relative - to \code{denominator} (possibly subsequently rounded to a number of decimal - digits). +to \code{denominator} (possibly subsequently rounded to a number of decimal +digits). } \description{ Round a decimal to the precise decimal value of a specified @@ -38,9 +38,9 @@ The \code{digits} argument allows for rounding of the subsequent result. } \details{ If \code{digits} is \code{Inf}, \code{x} is rounded to the fraction - and then kept at full precision. If \code{digits} is \code{"auto"}, the - number of digits is automatically selected as - \code{ceiling(log10(denominator)) + 1}. +and then kept at full precision. If \code{digits} is \code{"auto"}, the +number of digits is automatically selected as +\code{ceiling(log10(denominator)) + 1}. 
} \examples{ round_to_fraction(1.6, denominator = 2) diff --git a/man/row_to_names.Rd b/man/row_to_names.Rd index 1cffba75..34e4e48a 100644 --- a/man/row_to_names.Rd +++ b/man/row_to_names.Rd @@ -32,7 +32,7 @@ resulting data.frame?} from the resulting data.frame?} \item{sep}{A character string to separate the values in the case of multiple -rows input to `row_number`.} +rows input to \code{row_number}.} } \value{ A data.frame with new names (and some rows removed, if specified) diff --git a/man/sas_numeric_to_date.Rd b/man/sas_numeric_to_date.Rd index 80dd8c2a..3dffc7af 100644 --- a/man/sas_numeric_to_date.Rd +++ b/man/sas_numeric_to_date.Rd @@ -20,7 +20,7 @@ more information on timezones).} } \value{ If a date and time or datetime are provided, a POSIXct object. If a - date is provided, a Date object. If a time is provided, an hms::hms object +date is provided, a Date object. If a time is provided, an hms::hms object } \description{ Convert a SAS date, time or date/time to an R object @@ -33,7 +33,7 @@ sas_numeric_to_date(time_num = 3600) # 01:00:00 } \references{ SAS Date, Time, and Datetime Values reference (retrieved on - 2022-03-08): https://v8doc.sas.com/sashtml/lrcon/zenid-63.htm +2022-03-08): https://v8doc.sas.com/sashtml/lrcon/zenid-63.htm } \seealso{ Other Date-time cleaning: diff --git a/man/single_value.Rd b/man/single_value.Rd index ed6e3a82..1c34ae87 100644 --- a/man/single_value.Rd +++ b/man/single_value.Rd @@ -18,7 +18,7 @@ error to assist with determining the location of the issue.} } \value{ \code{x} as the scalar single value found throughout (or an error if - more than one value is found). +more than one value is found). } \description{ Missing values are replaced with the single value, and if all values are diff --git a/man/tabyl.Rd b/man/tabyl.Rd index 337760ff..461ab9d5 100644 --- a/man/tabyl.Rd +++ b/man/tabyl.Rd @@ -13,7 +13,7 @@ tabyl(dat, ...) 
\method{tabyl}{data.frame}(dat, var1, var2, var3, show_na = TRUE, show_missing_levels = TRUE, ...) } \arguments{ -\item{dat}{a data.frame containing the variables you wish to count. Or, a vector you want to tabulate.} +\item{dat}{a \code{data.frame} containing the variables you wish to count. Or, a vector you want to tabulate.} \item{...}{the arguments to tabyl (here just for the sake of documentation compliance, as all arguments are listed with the vector- and data.frame-specific methods)} @@ -28,7 +28,7 @@ tabyl(dat, ...) \item{var3}{(optional) the column name of the third variable (the list in a 3-way tabulation).} } \value{ -Returns a data.frame with frequencies and percentages of the tabulated variable(s). A 3-way tabulation returns a list of data.frames. +A data.frame with frequencies and percentages of the tabulated variable(s). A 3-way tabulation returns a list of data.frames. } \description{ A fully-featured alternative to \code{table()}. Results are data.frames and can be formatted and enhanced with janitor's family of \code{adorn_} functions. diff --git a/man/top_levels.Rd b/man/top_levels.Rd index 93b8d2c6..0484d7b2 100644 --- a/man/top_levels.Rd +++ b/man/top_levels.Rd @@ -2,7 +2,8 @@ % Please edit documentation in R/top_levels.R \name{top_levels} \alias{top_levels} -\title{Generate a frequency table of a factor grouped into top-n, bottom-n, and all other levels.} +\title{Generate a frequency table of a factor grouped into top-n, bottom-n, and all +other levels.} \usage{ top_levels(input_vec, n = 2, show_na = FALSE) } @@ -14,7 +15,10 @@ top_levels(input_vec, n = 2, show_na = FALSE) \item{show_na}{should cases where the variable is NA be shown?} } \value{ -Returns a data.frame (actually a \code{tbl_df}) with the frequencies of the grouped, tabulated variable. Includes counts and percentages, and valid percentages (calculated omitting \code{NA} values, if present in the vector and \code{show_na = TRUE}.) 
+a data.frame (actually a \code{tbl_df}) with the frequencies of the +grouped, tabulated variable. Includes counts and percentages, and valid +percentages (calculated omitting \code{NA} values, if present in the vector and +\code{show_na = TRUE}.) } \description{ Get a frequency table of a factor variable, grouped into categories by level. diff --git a/man/untabyl.Rd b/man/untabyl.Rd index ce272bc8..aa8da644 100644 --- a/man/untabyl.Rd +++ b/man/untabyl.Rd @@ -7,10 +7,10 @@ untabyl(dat) } \arguments{ -\item{dat}{a data.frame of class \code{tabyl}.} +\item{dat}{a \code{data.frame} of class \code{tabyl}.} } \value{ -Returns the same data.frame, but without the \code{tabyl} class and attributes. +the same \code{data.frame}, but without the \code{tabyl} class and attributes. } \description{ Strips away all \code{tabyl}-related attributes from a data.frame. diff --git a/man/use_first_valid_of.Rd b/man/use_first_valid_of.Rd index 86143625..2466f43e 100644 --- a/man/use_first_valid_of.Rd +++ b/man/use_first_valid_of.Rd @@ -24,3 +24,4 @@ At each position of the input vectors, iterates through in order and returns the \seealso{ janitor_deprecated } +\keyword{internal}