---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/"
)
```
# **`conftree`**: Finding Subgroups with Conformal Trees
<!-- badges: start -->
[![Project Status: Active - The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![R-CMD-check](https://github.com/holgstr/conftree/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/holgstr/conftree/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
This `R` package finds robust subgroups in data with a single continuous response, for either regression or treatment effect models. Subgroups are identified via recursive partitioning, which yields an interpretable tree. Conformal prediction methods (SCR, CV+ and Jackknife+) are used to simultaneously optimize inter-group heterogeneity and intra-group homogeneity. First, predictions are made using an arbitrary regression learner from the [100+ algorithms][1] available in `tidymodels`. Then, the data is split recursively using the robust conformal criterion. In this way, `conftree` is an extension of the R2P algorithm from [Lee et al. (NeurIPS 2020)][2].

**Scope:**
- subgroups for regression problems with `r2p()`
- subgroups for treatment effects with `r2p_hte()`
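
For readers new to conformal prediction, the following is a minimal sketch of split conformal regression (SCR) in base `R`. It is not the code `conftree` runs internally: `lm()` stands in for the learner, the data are simulated, and every variable name is illustrative. It only shows how a calibration set turns point predictions into intervals with a chosen miscoverage rate `alpha`.

```r
# Split conformal regression sketch (illustrative, not conftree internals).
set.seed(1)
n <- 500
dat <- data.frame(x = runif(n, 0, 10))
dat$y <- 2 * dat$x + rnorm(n)

# Split the data into a training half and a calibration half.
idx_train <- sample(n, n / 2)
train <- dat[idx_train, ]
calib <- dat[-idx_train, ]

# Any regression learner could be used here; lm() keeps the sketch simple.
fit <- lm(y ~ x, data = train)

# Conformity scores: absolute residuals on the calibration half.
scores <- abs(calib$y - predict(fit, newdata = calib))

# Finite-sample-corrected quantile of the scores at miscoverage rate alpha.
alpha <- 0.05
n_cal <- length(scores)
k <- ceiling((1 - alpha) * (n_cal + 1))
q_hat <- if (k > n_cal) Inf else sort(scores)[k]

# Prediction intervals for new points: point prediction plus/minus q_hat.
new_x <- data.frame(x = c(2, 5, 8))
pred <- unname(predict(fit, newdata = new_x))
intervals <- data.frame(lower = pred - q_hat, fit = pred, upper = pred + q_hat)
intervals
```

In this sketch every interval has the same width, `2 * q_hat`, so the width summarizes how uncertain the predictions are for the data the scores were computed on.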
## Installation
You can install the current development version from GitHub with:
```{r gh-installation, eval = FALSE}
# Install remotes first if it is not available yet:
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("holgstr/conftree")
```
## Quickstart
Let's find subgroups in the Washington bike share data. We use a random forest from `tidymodels` as the `learner`, a miscoverage rate `alpha` of 5%, and 10 `cv_folds` for CV+ to quantify the uncertainty in the resulting subgroups:
```{r, message = FALSE}
library(conftree)
library(tidymodels)
data(bikes)
set.seed(1234)
# Specify the learner to be used for model training:
forest <- rand_forest() %>%
  set_mode("regression") %>%
  set_engine("ranger")

# Find optimal subgroups:
groups <- r2p(
  data = bikes,
  target = "count",
  learner = forest,
  cv_folds = 10,
  alpha = 0.05,
  gamma = 0.2,
  lambda = 0.5,
  max_groups = 4
)

# Display tree structure:
groups$tree

# Plot:
plot(groups)
```
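To give a rough idea of what `cv_folds` and `alpha` control, here is a minimal base `R` sketch of a CV+ prediction interval on simulated data. It is an illustration under stated assumptions, not the implementation behind `r2p()`: `lm()` replaces the random forest, and the helper `cv_plus_interval()` and all variable names are hypothetical. Each observation gets an out-of-fold residual, and the interval for a new point is read off the lower and upper order statistics of the fold predictions shifted by those residuals.

```r
# CV+ interval sketch (illustrative; cv_plus_interval() is a hypothetical helper).
set.seed(1)
n <- 200
dat <- data.frame(x = runif(n, 0, 10))
dat$y <- 2 * dat$x + rnorm(n)

alpha <- 0.05
cv_folds <- 10

# Randomly assign every observation to one of the folds.
fold <- sample(rep(seq_len(cv_folds), length.out = n))

# One model per fold, each trained with that fold held out.
fits <- lapply(seq_len(cv_folds), function(k) {
  lm(y ~ x, data = dat[fold != k, ])
})

# Out-of-fold absolute residual for every observation.
resid_oof <- numeric(n)
for (k in seq_len(cv_folds)) {
  held_out <- fold == k
  resid_oof[held_out] <- abs(
    dat$y[held_out] - predict(fits[[k]], newdata = dat[held_out, ])
  )
}

# CV+ interval for a single new x value.
cv_plus_interval <- function(x_new) {
  new_data <- data.frame(x = x_new)
  # Prediction at x_new from each fold's model.
  fold_preds <- vapply(
    fits,
    function(f) unname(predict(f, newdata = new_data)),
    numeric(1)
  )
  # For observation i, use the model that did not see i.
  pred_i <- fold_preds[fold]
  k_lo <- floor(alpha * (n + 1))
  k_hi <- ceiling((1 - alpha) * (n + 1))
  lower <- if (k_lo < 1) -Inf else sort(pred_i - resid_oof)[k_lo]
  upper <- if (k_hi > n) Inf else sort(pred_i + resid_oof)[k_hi]
  c(lower = lower, upper = upper)
}

cv_plus_interval(5)
```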
[1]: https://www.tidymodels.org/find/parsnip/
[2]: https://proceedings.neurips.cc/paper/2020/hash/1819020b02e926785cf3be594d957696-Abstract.html