/
speed.Rmd
286 lines (246 loc) · 14.7 KB
/
speed.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
---
title: "Speeding up APES"
author:
- name: Kevin Y.X. Wang
affiliation: School of Mathematics and Statistics, The University of Sydney, Australia
- name: Garth Tarr
affiliation: School of Mathematics and Statistics, The University of Sydney, Australia
- name: Jean Y.H. Yang
affiliation: School of Mathematics and Statistics, The University of Sydney, Australia
- name: Samuel Mueller
affiliation: Department of Mathematics and Statistics, Macquarie University, Australia
output:
rmarkdown::html_vignette:
fig_width: 8
fig_height: 6
vignette: |
%\VignetteIndexEntry{Speeding up APES}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
# Introduction
The APES method relies on a best subset algorithm for linear regression model. In the current implementation of the `APES` package, we allow for two backend options. One is the leaps-and-bound algorithm from the `leaps` package and Mixed Integer Optimisation from the `bestsubset` package with a backend connect to the Gurobi commercial software.
You can learn about how to install Gurobi and the `bestsubset` package from [here](https://kevinwang09.github.io/APES/articles/install_bestsubset.html)
# Running APES with bestsubset
We will use a simulated example to demostrate APES with Gurobi Mixed Integer Optimisation (MIO) via the `bestsubset` package. Invoking this option can be done by specifying `estimator = "mio"`.
```{r}
library(APES)
```
```{r, eval = FALSE, results='hide', echo = FALSE}
## Simulating data
library(APES)
set.seed(10)
n = 100
p = 10
beta = c(1, -1, rep(0, p-2))
x = matrix(rnorm(n*p), ncol = p)
colnames(x) = paste0("X", 1:p)
y = rbinom(n = n, size = 1, prob = expit(x %*% beta))
data = data.frame(y, x)
model = glm(y ~ ., data = data, family = "binomial")
## Running APES
apes_result = apes(model = model, estimator = "mio")
apes_result
class(apes_result)
names(apes_result)
```
``` r
## Simulating data
library(APES)
set.seed(10)
n = 100
p = 10
beta = c(1, -1, rep(0, p-2))
x = matrix(rnorm(n*p), ncol = p)
colnames(x) = paste0("X", 1:p)
y = rbinom(n = n, size = 1, prob = expit(x %*% beta))
data = data.frame(y, x)
model = glm(y ~ ., data = data, family = "binomial")
## Running APES
apes_result = apes(model = model, estimator = "mio")
#> Registered S3 method overwritten by 'bestsubset':
#> method from
#> predict.bs splines
#> 0. Computing max eigenvalue of X^T X, for the step size in projected gradient descent.
#> 1. Solving best subset selection with k=1.
#> 1. Solving best subset selection with k=1.
#> 1. Solving best subset selection with k=1.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 2. Solving best subset selection with k=2.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 3. Solving best subset selection with k=3.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 4. Solving best subset selection with k=4.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 5. Solving best subset selection with k=5.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 6. Solving best subset selection with k=6.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 7. Solving best subset selection with k=7.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 8. Solving best subset selection with k=8.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 9. Solving best subset selection with k=9.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 10. Solving best subset selection with k=10.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
apes_result
#> Time taken: 0.006937699 minutes
#>
#> APES - AIC selected the following variables
#> intercept X1 X2 X3 X4 X5 X6 X7
#> -0.054 0.910 -1.123 0.000 0.000 0.000 0.000 0.000
#> X8 X9 X10
#> 0.000 0.000 0.000
#>
#> APES - BIC selected the following variables
#> intercept X1 X2 X3 X4 X5 X6 X7
#> -0.041 0.912 -1.145 0.000 0.000 0.000 -0.392 0.000
#> X8 X9 X10
#> 0.000 0.000 0.000
class(apes_result)
#> [1] "apes"
names(apes_result)
#> [1] "apes_model_df" "apes_mle_beta" "apes_mle_beta_binary"
#> [4] "time_used" "selected_model_beta" "model_avg_beta"
#> [7] "response_tibble"
```
# Comparing `leaps` and `bestsubset` (Gurobi MIO)
## Solving for `k` directly
Imagine that we have a data with $p = 10$ regressors and we are interested in getting the best model of size $k = 5$, where $k$ is the number of regressors with non-zero coefficient estimate. The `leaps` package is designed in such a way that all models of size $1, \dots, k$ are computed and returned. This is a behaviour that is rooted in the leaps-and-bound algorithm and the intermediate computed models cannot be skipped. For large values of `p`, this could be very computationally heavy and unnecessary if we only desire model of a particular size. On the other hand, the MIO algorithm is able to optimise for a model of size `k` directly and performs well for large vales of `p`.
Continuing with the example above, if we specify an additional `k` parameter in our call of the `apes` function, then we can get the result for only one model using MIO but five models if we solve using `leaps`. Additionally, MIO is able to solve for model sizes in a range, e.g. by setting `k = 5:9`. The sequential behaviour of `leaps` means that it will be slower than MIO for large values of `p` but will be faster than MIO for small values of `p` due to the design of API.
```{r, eval = FALSE, results='hide', echo = FALSE}
## Simulating data
library(APES)
set.seed(10)
n = 100
p = 10
beta = c(1, -1, rep(0, p-2))
x = matrix(rnorm(n*p), ncol = p)
colnames(x) = paste0("X", 1:p)
y = rbinom(n = n, size = 1, prob = expit(x %*% beta))
data = data.frame(y, x)
model = glm(y ~ ., data = data, family = "binomial")
## leaps
leaps_result = apes(model = model, estimator = "leaps", k = 5)
leaps_result$apes_model_df
## mio
mio_result = apes(model = model, estimator = "mio", k = 5)
mio_result$apes_model_df
mio_result2 = apes(model = model, estimator = "mio", k = 5:9)
mio_result2$apes_model_df
```
``` r
## leaps
leaps_result = apes(model = model, estimator = "leaps", k = 5)
leaps_result$apes_model_df
#> # A tibble: 5 x 7
#> model_name model_size ic_opt_models apes_mle_loglike mle_aic mle_bic status
#> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 apes_model… 2 "" -59.4 123. 128. leaps_o…
#> 2 apes_model… 3 "apes_min_bi… -53.3 113. 120. leaps_o…
#> 3 apes_model… 4 "apes_min_ai… -51.8 112. 122. leaps_o…
#> 4 apes_model… 5 "" -51.2 112. 125. leaps_o…
#> 5 apes_model… 6 "" -50.6 113. 129. leaps_o…
## mio
mio_result = apes(model = model, estimator = "mio", k = 5)
#> Registered S3 method overwritten by 'bestsubset':
#> method from
#> predict.bs splines
#> 0. Computing max eigenvalue of X^T X, for the step size in projected gradient descent.
#> 1. Solving best subset selection with k=5.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
mio_result$apes_model_df
#> # A tibble: 1 x 7
#> model_name model_size ic_opt_models apes_mle_loglike mle_aic mle_bic status
#> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 apes_model… 6 apes_min_aicap… -50.6 113. 129. OPTIM…
mio_result2 = apes(model = model, estimator = "mio", k = 5:9)
#> 0. Computing max eigenvalue of X^T X, for the step size in projected gradient descent.
#> 1. Solving best subset selection with k=5.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 2. Solving best subset selection with k=6.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 3. Solving best subset selection with k=7.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 4. Solving best subset selection with k=8.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
#> 5. Solving best subset selection with k=9.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: OPTIMAL.
mio_result2$apes_model_df
#> # A tibble: 5 x 7
#> model_name model_size ic_opt_models apes_mle_loglike mle_aic mle_bic status
#> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 apes_model… 6 "apes_min_aica… -50.6 113. 129. OPTIM…
#> 2 apes_model… 7 "" -50.1 114. 132. OPTIM…
#> 3 apes_model… 8 "" -50.1 116. 137. OPTIM…
#> 4 apes_model… 9 "" -50.1 118. 142. OPTIM…
#> 5 apes_model… 10 "" -50.0 120. 146. OPTIM…
```
## Time limit
Another advantage of MIO over the leaps-and-bound algorithm is that MIO can stop the computation before a full convergence in the optimisation and yet still return the model that is considered to be optimal at that point in time. This is a very attractive property as we can specify a time limit when searching for a model instead of waiting for an unknown amount of time.
In `APES`, this can be specified with the `time_limit` parameter. Note that the unit is in seconds. In the result below, we specified `time_limit = 0.001` which is small enough to trigger a time limit. In such a case, MIO will return a result (`mio_result3`) with a status of `TIME_LIMIT`. However, notice how the computed result is identical to the computed result where the computed is global optimal (`mio_result`), except the time taken. This is known as the sub-optimal property of the MIO algorithm.
```{r, eval = FALSE, results='hide', echo = FALSE}
## Simulating data
library(APES)
set.seed(10)
n = 100
p = 10
beta = c(1, -1, rep(0, p-2))
x = matrix(rnorm(n*p), ncol = p)
colnames(x) = paste0("X", 1:p)
y = rbinom(n = n, size = 1, prob = expit(x %*% beta))
data = data.frame(y, x)
model = glm(y ~ ., data = data, family = "binomial")
mio_result = apes(model = model, estimator = "mio", k = 5)
mio_result$apes_model_df
mio_result3 = apes(model = model, estimator = "mio", k = 5, time_limit = 0.001)
mio_result3$apes_model_df
all.equal(mio_result, mio_result3)
```
``` r
mio_result3 = apes(model = model, estimator = "mio", k = 5, time_limit = 0.001)
#> 0. Computing max eigenvalue of X^T X, for the step size in projected gradient descent.
#> 1. Solving best subset selection with k=5.
#> a. Performing projected gradient runs: 1 ... 10 ... 20 ... 30 ... 40 ... 50 ...
#> b. Running Gurobi's mixed integer program solver ... Return status: TIME_LIMIT.
mio_result3$apes_model_df
#> # A tibble: 1 x 7
#> model_name model_size ic_opt_models apes_mle_loglike mle_aic mle_bic status
#> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 apes_model… 6 apes_min_aicap… -50.6 113. 129. TIME_…
all.equal(mio_result, mio_result3)
#> [1] "Component \"apes_model_df\": Component \"status\": 1 string mismatch"
#> [2] "Component \"time_used\": Mean relative difference: 0.599496"
```
## Computational speed
We will conclude this vignette with a figure that shows the different computation options and the time cost associated with each option. We hope this can help the end-users to select the best combination of input parameters for their data problems.
In the plot below, we fix `k = 15` and `n = 500`, where `n` is the number of observations. We grdually increase number of regressors (`p`) in simulated data and plot it on the x-axis. There are four options that we explored:
1. APES-leaps, which uses `leaps` to sequentially search for the optimal model of size up to `k`. This can be invoked using the command `apes(model = model, estimator = "leaps", k = k)` or `apes(model = model)` which is the default.
2. APES-MIO-sequential, which uses MIO to sequentially search for the optimal models
up to size `k`. We do not impose a time limit in this case. This can be invoked using the command `apes(model = model, estimator = "mio", k = 1:k)`.
3. APES-MIO-sequential-threshold, which uses MIO to sequentially search for the
optimal models up to size `k`. We impose a time limit of 5 seconds in this case. This can be invoked using the command `apes(model = model, estimator = "mio", k = 1:k, time_limit = 5)`.
4. APES-MIO-direct, which uses MIO to directly search for the optimal model of size `k`. This can be invoked using the command `apes(model = model, estimator = "mio", k = k)`.
![](figures/APES_timing.png){width=100%}
It can be seen that the leaps implementation can be orders of magnitudes faster than the MIO implementations when $p < 35$. When p increases from 40 to 50, leaps overtook all MIO implementations with a much steeper increase in time taken. This simple demonstration shows that leaps is preferred in cases where p is small or moderate and MIO is virtually the only viable option for $p > 50$.
# Session Info
```{r}
sessionInfo()
```