Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validation functions? #38

Open
philmikejones opened this issue Dec 7, 2016 · 3 comments
Open

Add validation functions? #38

philmikejones opened this issue Dec 7, 2016 · 3 comments
Assignees

Comments

@philmikejones
Copy link
Owner

Internal and external validation steps that could be added:

  • Correlation (vector and total)
  • t-test (2-sided, equal variance) of sim/actual vectors
  • TAE and SAE (vector and total)
  • SEI
  • Percentage error
@philmikejones
Copy link
Owner Author

Example internal validation function:

int_validate <- function(constraints, ind_agg) {

  correlation <- cor(as.numeric(constraints), as.numeric(ind_agg))

  max_abs_diff <- max(abs(ind_agg - constraints))

  tae <- tae(constraints, ind_agg)

  sae <- tae(constraints, ind_agg) / sum(constraints)

  cor_vec <- rep(0, dim(constraints)[1])
  for (i in 1:dim(constraints)[1]) {
    cor_vec[i] <- cor(as.numeric(constraints[i, ]), as.numeric(ind_agg[i, ]))
  }
  # cor_vec <- summary(cor_vec)

  tae_vec <- rep(0, nrow(constraints))
  sae_vec <- rep(0, nrow(constraints))
  for (i in 1:nrow(constraints)) {
    tae_vec[i] <- tae(constraints[i, ], ind_agg[i, ])
    sae_vec[i] <- tae_vec[i] / sum(constraints[i, ])
  }

  worst_zone <- which.max(tae_vec)
  # worst_zone <- tae_vec[worst_zone] / sum(tae_vec))

  out <- list(
    "correlation"  = correlation,
    "max_abs_diff" = max_abs_diff,
    "tae"          = tae,
    "sae"          = sae,
    "cor_vec"      = cor_vec,
    "tae_vec"      = tae_vec,
    "sae_vec"      = sae_vec,
    "worst_zone"   = worst_zone
  )

  out

}

@philmikejones
Copy link
Owner Author

Example external validation function:

ext_validate <- function(llid_val, constraint, simdf, code_geo) {

  llid_val <- llid_val %>%
    select(GEOGRAPHY_CODE, C_DISABILITY_NAME, OBS_VALUE) %>%
    spread(C_DISABILITY_NAME, OBS_VALUE)

  llid_val$llid <- rowSums(llid_val[, 2:3])

  colnames(llid_val) <- c("code", "little", "lot",
                          "llid_no_census", "llid_census")
  llid_val <- select(llid_val, -little, -lot)

  context("Check llid_val object")
  test_that("Population of llid_val matches constraint", {
    expect_equal(sum(llid_val[, 2:3]),
      sum(constraint[, grep("age_[[:digit:]]", colnames(constraint))]))
  })

  # Add sim_llid to zone_simdf_oa
  llid_sim <- simdf %>%
    select(c_llid, zone) %>%
    count(zone, c_llid) %>%
    spread(c_llid, n)

  colnames(llid_sim) <- c("zone", "llid_no_sim", "llid_sim")
  llid_sim$code <- NA
  llid_sim$code <- codes[[as.character(code_geo)]]

  # merge on codes (not rbind) because prison OAs removed!
  llid_val <- left_join(llid_val, llid_sim, by = "code")

  llid_val

}

@philmikejones
Copy link
Owner Author

Example percentage error function:

calc_perr <- function(llid_val) {

  total <- rowSums(llid_val[, c("llid_no_census", "llid_census")])
  perr  <- llid_val$llid_census - llid_val$llid_sim
  perr  <- abs(perr)
  perr  <- perr / total
  perr  <- perr * 100

  perr

}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant