Update tidy_distribution_summary_tbl() function #211

Closed
spsanderson opened this issue Jun 9, 2022 · 1 comment
@spsanderson

Add `ci_low` and `ci_high` columns, computed with `ci_lo()` and `ci_hi()`, to the output of `tidy_distribution_summary_tbl()`.

@spsanderson spsanderson added the enhancement New feature or request label Jun 9, 2022
@spsanderson spsanderson added this to the TidyDensity v1.2.1 milestone Jun 9, 2022
@spsanderson spsanderson self-assigned this Jun 9, 2022
@spsanderson

Function:

#' Tidy Distribution Summary Statistics Tibble
#'
#' @family Summary Statistics
#' @family Table Data
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @details This function takes in a `tidy_` distribution table and
#' will return a tibble of the following information:
#' -  `sim_number`
#' -  `mean_val`
#' -  `median_val`
#' -  `std_val`
#' -  `min_val`
#' -  `max_val`
#' -  `skewness`
#' -  `kurtosis`
#' -  `range`
#' -  `iqr`
#' -  `variance`
#' -  `ci_low`
#' -  `ci_high`
#'
#' The kurtosis and skewness come from the package `healthyR.ai`
#'
#' @description This function returns a summary statistics tibble. It will use the
#' y column from the `tidy_` distribution function.
#'
#' @param .data The data that is going to be passed from a `tidy_` distribution
#' function.
#' @param ... This is the grouping variable that gets passed to [dplyr::group_by()]
#' and [dplyr::select()].
#'
#' @examples
#' library(dplyr)
#'
#' tn <- tidy_normal(.num_sims = 5)
#' tb <- tidy_beta(.num_sims = 5)
#'
#' tidy_distribution_summary_tbl(tn)
#' tidy_distribution_summary_tbl(tn, sim_number)
#'
#' data_tbl <- tidy_combine_distributions(tn, tb)
#'
#' tidy_distribution_summary_tbl(data_tbl)
#' tidy_distribution_summary_tbl(data_tbl, dist_type)
#'
#' @return
#' A summary stats tibble
#'
#' @export
#'

tidy_distribution_summary_tbl <- function(.data, ...) {
  
  # Get the data attributes
  atb <- attributes(.data)
  
  if (!"tibble_type" %in% names(atb) && !"tibble_type" %in% names(atb$all)) {
    rlang::abort("The data passed must come from a `tidy_` distribution function.")
  }
  
  data_tbl <- dplyr::as_tibble(.data)
  
  summary_tbl <- data_tbl %>%
    dplyr::group_by(...) %>%
    dplyr::select(..., y) %>%
    dplyr::summarise(
      mean_val = mean(y, na.rm = TRUE),
      median_val = stats::median(y, na.rm = TRUE),
      std_val = stats::sd(y, na.rm = TRUE),
      min_val = min(y),
      max_val = max(y),
      skewness = tidy_skewness_vec(y),
      kurtosis = tidy_kurtosis_vec(y),
      range = tidy_range_statistic(y),
      iqr = stats::IQR(y),
      variance = stats::var(y),
      ci_low = ci_lo(y),
      ci_high = ci_hi(y)
    ) %>%
    dplyr::ungroup()
  
  return(summary_tbl)
}
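
The two new columns depend on `ci_lo()` and `ci_hi()`, which are not shown in this issue. As a rough sketch only, assuming they return the 2.5% and 97.5% sample quantiles of `y`, stand-ins could look like the following. The names `ci_lo_sketch()` and `ci_hi_sketch()` and the `.na_rm` argument are hypothetical and used here for illustration; the package's own helpers may be defined differently.

```r
# Hypothetical stand-ins for ci_lo() / ci_hi().
# Assumption: the interval bounds are the 2.5% and 97.5% sample quantiles.
ci_lo_sketch <- function(.x, .na_rm = FALSE) {
  # unname() drops the "2.5%" label so the result is a bare numeric scalar
  unname(stats::quantile(.x, probs = 0.025, na.rm = .na_rm))
}

ci_hi_sketch <- function(.x, .na_rm = FALSE) {
  unname(stats::quantile(.x, probs = 0.975, na.rm = .na_rm))
}

# Small illustration on simulated standard normal draws
set.seed(123)
y <- stats::rnorm(50)

c(ci_low = ci_lo_sketch(y), ci_high = ci_hi_sketch(y))
```

Because each helper returns a single number per vector, it can sit inside `dplyr::summarise()` next to `mean()` and `stats::median()`, which is how `ci_low` and `ci_high` are produced in the function above.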

Example:

> library(dplyr)
> 
> tn <- tidy_normal(.num_sims = 5)
> tb <- tidy_beta(.num_sims = 5)
> 
> tidy_distribution_summary_tbl(tn)
# A tibble: 1 × 12
  mean_val median_val std_val min_val max_val skewness kurtosis range   iqr
     <dbl>      <dbl>   <dbl>   <dbl>   <dbl>    <dbl>    <dbl> <dbl> <dbl>
1   0.0210     0.0807   0.990   -2.68    2.22  -0.0478     2.78  4.90  1.22
# … with 3 more variables: variance <dbl>, ci_low <dbl>, ci_high <dbl>
> tidy_distribution_summary_tbl(tn, sim_number)
# A tibble: 5 × 13
  sim_number mean_val median_val std_val min_val max_val skewness kurtosis range
  <fct>         <dbl>      <dbl>   <dbl>   <dbl>   <dbl>    <dbl>    <dbl> <dbl>
1 1            0.122      0.0777   0.895   -1.45    2.12   0.480      2.74  3.57
2 2            0.0456     0.153    0.945   -1.79    2.02   0.0614     2.51  3.81
3 3           -0.0754    -0.107    1.19    -2.68    1.99  -0.142      2.34  4.67
4 4           -0.167      0.0313   0.887   -2.17    1.95  -0.195      2.54  4.12
5 5            0.180      0.231    1.00    -2.12    2.22  -0.189      3.07  4.34
# … with 4 more variables: iqr <dbl>, variance <dbl>, ci_low <dbl>, ci_high <dbl>
> 
> data_tbl <- tidy_combine_distributions(tn, tb)
> 
> tidy_distribution_summary_tbl(data_tbl)
# A tibble: 1 × 12
  mean_val median_val std_val min_val max_val skewness kurtosis range   iqr
     <dbl>      <dbl>   <dbl>   <dbl>   <dbl>    <dbl>    <dbl> <dbl> <dbl>
1    0.254      0.338   0.764   -2.68    2.22   -0.755     4.52  4.90 0.671
# … with 3 more variables: variance <dbl>, ci_low <dbl>, ci_high <dbl>
> tidy_distribution_summary_tbl(data_tbl, dist_type)
# A tibble: 2 × 13
  dist_type    mean_val median_val std_val  min_val max_val skewness kurtosis range
  <fct>           <dbl>      <dbl>   <dbl>    <dbl>   <dbl>    <dbl>    <dbl> <dbl>
1 Gaussian c(…   0.0210     0.0807   0.990 -2.68      2.22   -0.0478     2.78 4.90 
2 Beta c(1, 1    0.486      0.476    0.282  0.00133   0.997   0.113      1.88 0.995
# … with 4 more variables: iqr <dbl>, variance <dbl>, ci_low <dbl>, ci_high <dbl>
