Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New functions for investigating predictor rankings #406

Merged
merged 14 commits into from
May 3, 2023

Conversation

fweber144
Copy link
Collaborator

This PR starts to address #289.

In particular, this adds the new functions ranking() and props(), making output element pct_solution_terms_cv of vsel objects obsolete (but requiring the addition of output element solution_terms_cv to vsel objects). This also adds a new plot() method for visualizing the output of props() (more precisely, two plot() methods are added, but plot.vsel() is only a shortcut for first applying ranking.vsel() and then plot.ranking()). For the user-facing changes, see the changes in NEWS.md.

Quick illustration of the new functions:

library(projpred) # Use devtools::load_all() if installation is not desired.

dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)
fit <- rstanarm::stan_glm(
  y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
  QR = TRUE, chains = 2, iter = 1000, refresh = 0, seed = 9876
)
cvvs <- cv_varsel(fit, cv_method = "kfold", nclusters_pred = 20, seed = 5555)

# Extract predictor rankings:
rk <- ranking(cvvs)

# Compute ranking proportions:
( pr_rk <- props(rk) )
    predictor
size X1 X5 X3  X2  X4
   1  1  0  0 0.0 0.0
   2  0  1  0 0.0 0.0
   3  0  0  1 0.0 0.0
   4  0  0  0 0.6 0.4
   5  0  0  0 0.4 0.6
attr(,"class")
[1] "props"
# Visualize the ranking proportions:
plot(pr_rk)

Rplot-01

# For longer predictor names:
plot(pr_rk, text_angle = 45)

Rplot-02

# Compute cumulated ranking proportions:
( pr_rk_cumul <- props(rk, cumulate = TRUE) )
     predictor
size  X1 X5 X3  X2  X4
  <=1  1  0  0 0.0 0.0
  <=2  1  1  0 0.0 0.0
  <=3  1  1  1 0.0 0.0
  <=4  1  1  1 0.6 0.4
  <=5  1  1  1 1.0 1.0
attr(,"class")
[1] "cumulprops" "props" 
# Visualize the cumulated ranking proportions:
plot(pr_rk_cumul)

Rplot-03

# For longer predictor names:
plot(pr_rk_cumul, text_angle = 45)

Rplot-04

Dummy ranking proportions for illustrating the color gradient:

nterms_dummy <- 10L
pr_dummy <- matrix(
  seq(0, 1, length.out = nterms_dummy^2),
  nrow = nterms_dummy, ncol = nterms_dummy,
  dimnames = list("size" = as.character(seq_len(nterms_dummy)),
                  "predictor" = paste0("x", seq_len(nterms_dummy)))
)
class(pr_dummy) <- "props"
plot(pr_dummy)

Rplot

As mentioned above, this PR is only a first step towards resolving #289 completely. In the future, I will:

  • Illustrate the usage of these new functions in the main vignette.
  • Create a PR adding another column to the print.vselsummary() table, containing the diagonal of the matrix of the non-cumulated ranking proportions.
  • Create a PR adding the full-data solution path on the x-axis of the ordinary plot.vsel() predictive performance plot, again with the diagonal of the matrix of the non-cumulated ranking proportions. This will be done with the option to restrict this solution path and the ranking proportions to the first few terms. By default, I think we should not restrict them, for consistency with the print.vselsummary() table.

Add new functions `ranking()` and `props()`, making output element
`pct_solution_terms_cv` of `vsel` objects obsolete (but requiring the addition
of output element `solution_terms_cv` to `vsel` objects).

Also add new `plot()` methods for visualizing the output of `props()`.
the non-sampled observations consist of `NA`s.
`cv_varsel()`, speed-up the `cv_varsel()` example, and add an example for
`plot.props()` (thereby also implying examples for `ranking()` and `props()`).
instead of throwing an informational message and returning `NULL`. This avoids
that users call `plot()` on `NULL`, which throws the opaque error `need finite
'xlim' values`.
fold-wise predictor rankings, an error is thrown instead of `NULL` being
returned.
@fweber144 fweber144 requested a review from avehtari April 19, 2023 15:43
@avehtari
Copy link
Collaborator

Why props and not proportions?

Otherwise, the illustration looks good

Copy link
Collaborator

@avehtari avehtari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the documentation parts, too

NEWS.md Show resolved Hide resolved
NEWS.md Show resolved Hide resolved
@fweber144
Copy link
Collaborator Author

Why props and not proportions?

proportions() was also my first idea, but it already exists in base R (https://stat.ethz.ch/R-manual/R-devel/library/base/html/proportions.html) and unfortunately, it's not a generic. Since prop.table() is an alias for proportions() in base R, I thought it would make sense to abbreviate proportions() by props().

@fweber144
Copy link
Collaborator Author

Why props and not proportions?

proportions() was also my first idea, but it already exists in base R (https://stat.ethz.ch/R-manual/R-devel/library/base/html/proportions.html) and unfortunately, it's not a generic. Since prop.table() is an alias for proportions() in base R, I thought it would make sense to abbreviate proportions() by props().

We now agreed to rename props() to cv_proportions() in an upcoming PR.

@fweber144 fweber144 requested a review from avehtari May 2, 2023 08:18
Copy link
Collaborator

@avehtari avehtari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, ok now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants