Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Confusion between scale and rate for the gamma distribution #91

Closed
ovancantfort opened this issue Mar 11, 2022 · 3 comments
Closed

Confusion between scale and rate for the gamma distribution #91

ovancantfort opened this issue Mar 11, 2022 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@ovancantfort
Copy link

ovancantfort commented Mar 11, 2022

It looks like there is a confusion between scale and rate for the gamma distribution in util_gamma_param_estimate.

Remember that rate = 1/scale.

util_gamma_param_estimate$parameters_tbl has "rate" in its title but in fact estimates the scale.

As a result, all the points generated by the simulation are wrong.

using v1.0.0 from CRAN

Thanks for your work on this package :-)

reproducible example:

library(TidyDensity)
library(tidyverse)
set.seed(1234)
data <- rgamma(n=200,shape = 3, scale=0.3)
summary(data)

gamma_fit <- util_gamma_param_estimate(data)
gamma_fit$parameter_tbl
gamma_fit$combined_data_tbl %>% filter(dist_type!="Empirical") %>% summary()
#mean of y is much larger than in the data

gamma_fit$combined_data_tbl %>%
  ggplot(aes(x = dx, y = dy)) +
  geom_line(aes(group = dist_type, color = dist_type)) +
  theme_minimal() +
  theme(legend.position = "bottom")
@spsanderson spsanderson self-assigned this Mar 16, 2022
@spsanderson spsanderson added the bug Something isn't working label Mar 16, 2022
@spsanderson spsanderson added this to the TidyDensity v1.0.1 milestone Mar 16, 2022
@spsanderson
Copy link
Owner

Thank you for pointing this out. I will make the name change and update. Also it looks like I need to actually update the tidy_gamma() function itself as I am using .rate and not the rgamma() scale. I will make the change to scale and then the name change to the util_gamma_param_estimate() function.

@spsanderson
Copy link
Owner

spsanderson commented Mar 16, 2022

  • Fix tidy_gamma change .rate to .scale
  • Fix tidy_autoplot_ functions to look for .scale and not .rate

@spsanderson
Copy link
Owner

New tidy_gamma()

tidy_gamma <- function(.n = 50, .shape = 1, .scale = 1, .num_sims = 1) {
  
  # Tidyeval ----
  n <- as.integer(.n)
  num_sims <- as.integer(.num_sims)
  shp <- .shape
  scle <- .scale
  
  # Checks ----
  if (!is.integer(n) | n < 0) {
    rlang::abort(
      "The parameters '.n' must be of class integer. Please pass a whole
            number like 50 or 100. It must be greater than 0."
    )
  }
  
  if (!is.integer(num_sims) | num_sims < 0) {
    rlang::abort(
      "The parameter `.num_sims' must be of class integer. Please pass a
            whole number like 50 or 100. It must be greater than 0."
    )
  }
  
  if (!is.numeric(shp) | shp < 0) {
    rlang::abort(
      "The parameters of '.shape' and '.scale' must be of class numeric.
            Please pass a numer like 1 or 1.1 etc. and must be greater than 0."
    )
  }
  
  if (!is.numeric(scle) | scle < 0) {
    rlang::abort(
      "The parameters of '.shape' and '.scale' must be of class numeric.
            Please pass a numer like 1 or 1.1 etc."
    )
  }
  
  x <- seq(1, num_sims, 1)
  
  # ps <- seq(-n, n - 1, 2)
  qs <- seq(0, 1, (1 / (n - 1)))
  ps <- qs
  
  df <- dplyr::tibble(sim_number = as.factor(x)) %>%
    dplyr::group_by(sim_number) %>%
    dplyr::mutate(x = list(1:n)) %>%
    dplyr::mutate(y = list(stats::rgamma(n = n, shape = shp, scale = scle))) %>%
    dplyr::mutate(d = list(density(unlist(y), n = n)[c("x", "y")] %>%
                             purrr::set_names("dx", "dy") %>%
                             dplyr::as_tibble())) %>%
    dplyr::mutate(p = list(stats::pgamma(ps, shape = shp, scale = scle))) %>%
    dplyr::mutate(q = list(stats::qgamma(qs, shape = shp, scale = scle))) %>%
    tidyr::unnest(cols = c(x, y, d, p, q)) %>%
    dplyr::ungroup()
  
  param_grid <- dplyr::tibble(.shape, .scale)
  
  # Attach descriptive attributes to tibble
  attr(df, ".shape") <- .shape
  attr(df, ".scale") <- .scale
  attr(df, ".n") <- .n
  attr(df, ".num_sims") <- .num_sims
  attr(df, "tibble_type") <- "tidy_gamma"
  attr(df, "ps") <- ps
  attr(df, "qs") <- qs
  attr(df, "param_grid") <- param_grid
  attr(df, "param_grid_txt") <- paste0(
    "c(",
    paste(param_grid[, names(param_grid)], collapse = ", "),
    ")"
  )
  attr(df, "dist_with_params") <- paste0(
    "Gamma",
    " ",
    paste0(
      "c(",
      paste(param_grid[, names(param_grid)], collapse = ", "),
      ")"
    )
  )
  
  # Return final result as function output
  return(df)
}

Example:

> data <- rgamma(n=200,shape = 3, scale=0.3)
> summary(data)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.03284 0.55713 0.79385 0.92422 1.19792 3.63824 
> tidy_gamma(200, 3, .3) %>% pull(y) %>% summary()
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.09591 0.49764 0.79141 0.89678 1.17260 3.38996 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Development

No branches or pull requests

2 participants