Beta #424

Closed
Tracked by #421
spsanderson opened this issue Apr 24, 2024 · 0 comments
@spsanderson
Owner

spsanderson commented Apr 24, 2024

Function:

#' Calculate Akaike Information Criterion (AIC) for Beta Distribution
#'
#' This function calculates the Akaike Information Criterion (AIC) for a beta 
#' distribution fitted to the provided data.
#'
#' @family Utility
#' @author Steven P. Sanderson II, MPH
#'
#' @description
#' This function estimates the parameters of a beta distribution from the provided 
#' data using maximum likelihood estimation, and then calculates the AIC value 
#' based on the fitted distribution.
#'
#' @param .x A numeric vector containing the data to be fitted to a beta 
#' distribution.
#'
#' @details
#' Initial parameter estimates: The starting values passed to optim() are the 
#' EnvStats_MME estimates from util_beta_param_estimate(); the choice of 
#' starting values can affect whether and where the optimization converges.
#'
#' Optimization method: Other methods available in optim() may converge more 
#' reliably for some data.
#'
#' Data transformation: If any value lies outside the [0, 1] interval, the 
#' data are min-max scaled to that interval before the beta distribution is 
#' fitted.
#'
#' Goodness-of-fit: AIC is useful for comparing models, but the fit of the 
#' chosen model should also be assessed with visualization and other 
#' statistical tests.
#'
#' @examples
#' # Example 1: Calculate AIC for a sample dataset
#' set.seed(123)
#' x <- rbeta(30, 1, 1)
#' util_beta_aic(x)
#'
#' @return
#' The AIC value calculated based on the fitted beta distribution to the 
#' provided data.
#'
#' @name util_beta_aic
NULL

#' @export
#' @rdname util_beta_aic
util_beta_aic <- function(.x) {
  # Coerce input to a numeric vector
  x <- as.numeric(.x)
  
  # Scale data to [0, 1] if not already in that range
  if (any(x < 0) || any(x > 1)) {
    x <- (x - min(x)) / (max(x) - min(x))
  }
  
  # Get parameters
  pe <- TidyDensity::util_beta_param_estimate(x)$parameter_tbl |>
    subset(method == "EnvStats_MME")
  
  # Negative log-likelihood function for beta distribution
  neg_log_lik_beta <- function(par, data) {
    shape1 <- par[1]
    shape2 <- par[2]
    ncp <- par[3]
    -sum(dbeta(data, shape1, shape2, ncp, log = TRUE))
  }
  
  # Fit beta distribution using optim
  fit_beta <- optim(
    c(pe$shape1, pe$shape2, 0), 
    neg_log_lik_beta, 
    data = x
  )
  
  # Extract log-likelihood and number of parameters
  logLik_beta <- -fit_beta$value
  k_beta <- 3 # Number of parameters for beta distribution (shape1, shape2, ncp)
  
  # Calculate AIC
  AIC_beta <- 2 * k_beta - 2 * logLik_beta
  
  # Return AIC
  return(AIC_beta)
}
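
The "NaNs produced" warning in the example output is likely caused by the default Nelder-Mead search in optim(), which is unconstrained and can try non-positive shape values or a negative ncp. A minimal sketch of one way to avoid that, assuming it is acceptable to reject invalid parameters by returning Inf (which the optim() documentation allows for every method except "L-BFGS-B", as long as the starting point is finite). The name `neg_log_lik_guarded` and the starting values `c(1, 1, 0)` are illustrative assumptions standing in for the TidyDensity estimates, not part of the function above.

```r
# Guarded negative log-likelihood: invalid parameters return Inf instead of
# being passed to dbeta(), so no NaN warnings are raised during the search.
neg_log_lik_guarded <- function(par, data) {
  if (par[1] <= 0 || par[2] <= 0 || par[3] < 0) {
    return(Inf)
  }
  -sum(stats::dbeta(data, par[1], par[2], ncp = par[3], log = TRUE))
}

set.seed(123)
x <- stats::rbeta(30, 1, 1)

# Assumed starting values (shape1 = 1, shape2 = 1, ncp = 0)
fit_guarded <- stats::optim(
  par = c(1, 1, 0),
  fn = neg_log_lik_guarded,
  data = x
)

# AIC with k = 3 parameters; fit_guarded$value is the negative log-likelihood
aic_guarded <- 2 * 3 + 2 * fit_guarded$value
```

Because the search path changes once invalid points are rejected, the resulting AIC is not guaranteed to match the value shown in the example exactly.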

Example:

> set.seed(123)
> x <- rbeta(30, 1, 1)
> util_beta_aic(x)
There was no need to scale the data.
[1] 5.691712
Warning message:
In dbeta(data, shape1, shape2, ncp, log = TRUE) : NaNs produced
> fitdistrplus::fitdist(x, "beta", start = list(shape1 = 1, shape2 = 1, ncp = 0))
Fitting of the distribution ' beta ' by maximum likelihood 
Parameters:
        estimate Std. Error
shape1 0.9347277  0.4139958
shape2 1.1679481  0.4025620
ncp    0.9114368  2.0352873
> tst <- fitdistrplus::fitdist(x, "beta", start = list(shape1 = 1, shape2 = 1, ncp = 0))
> tst$aic
[1] 5.691712
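
Both calls report the same AIC, which follows from AIC = 2k - 2 * logLik with k = 3. As a small worked check (a sketch that only rearranges the formula used in the function; the variable names are illustrative), the log-likelihood implied by the reported value can be recovered like this:

```r
# AIC = 2k - 2 * logLik, so logLik = (2k - AIC) / 2.
aic_reported <- 5.691712   # value printed by both calls above
k <- 3                     # shape1, shape2, ncp

loglik_implied <- (2 * k - aic_reported) / 2   # 0.154144

# Round-trip back to the AIC as a sanity check.
aic_roundtrip <- 2 * k - 2 * loglik_implied
```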
@spsanderson spsanderson self-assigned this Apr 24, 2024
@spsanderson spsanderson added the enhancement New feature or request label Apr 24, 2024
@spsanderson spsanderson added this to the TidyDensity 1.4.0 milestone Apr 24, 2024