
Documentation | Contributors | Release Notes

# compboost: Fast and Flexible Component-Wise Boosting Framework

Component-wise boosting applies the boosting framework to statistical models, e.g., generalized additive models using component-wise smoothing splines. Boosting these kinds of models maintains interpretability and enables unbiased model selection in high-dimensional feature spaces.
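
As a brief sketch of the underlying idea (generic notation, not taken from the package documentation): in each iteration, every base-learner is fit to the current pseudo-residuals, and only the best-fitting one is added to the model, scaled by the learning rate.

```latex
% Component-wise boosting update (generic notation): in iteration m, the
% base-learner with the best fit to the pseudo-residuals r^{[m]} is
% selected and added with learning rate \nu.
\hat{f}^{[m]}(x) = \hat{f}^{[m-1]}(x) + \nu \cdot \hat{b}_{j^*}^{[m]}\big(x_{j^*}\big),
\qquad
j^* = \operatorname*{arg\,min}_{j} \sum_{i=1}^{n} \Big( r_i^{[m]} - \hat{b}_j^{[m]}(x_{i,j}) \Big)^2
```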

The R package compboost is an alternative implementation of component-wise boosting written in C++ to obtain high runtime performance and full memory control. The main idea is to provide a modular class system that can be extended without editing the source code. Therefore, it is possible to use R functions as well as C++ functions to define custom base-learners, losses, logging mechanisms, or stopping criteria.
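
To illustrate the extension mechanism, here is a minimal sketch of a custom loss written as plain R functions. It assumes a `LossCustom` class that takes a loss function, its gradient, and a constant-initializer function, as described in the package vignettes; check `?LossCustom` for the exact constructor signature.

```r
# Sketch of a custom quadratic loss defined in R (LossCustom and its
# three-argument constructor are assumptions; verify with ?LossCustom).
lossQuadratic = function (truth, prediction) {
  0.5 * (truth - prediction)^2
}
# Gradient of the loss with respect to the prediction:
gradQuadratic = function (truth, prediction) {
  prediction - truth
}
# Loss-optimal constant initialization, here the mean of the target:
constInitQuadratic = function (truth) {
  mean(truth)
}
my.loss = LossCustom$new(lossQuadratic, gradQuadratic, constInitQuadratic)
```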

For an introduction to and an overview of the functionality, visit the project page.

## Installation

CRAN version:

```r
install.packages("compboost")
```

Developer version:

```r
devtools::install_github("schalkdaniel/compboost")
```

## Examples

These examples were rendered using compboost 0.1.0.

To be as flexible as possible, one should use the R6 API to define base-learners, losses, stopping criteria, or optimizers as desired. Another option is to use the wrapper functions described on the project page, as sketched below.
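
For example, a hedged sketch of the wrapper route, assuming a `boostSplines()` function that registers a spline base-learner for every feature; the argument names here are illustrative, so see the project page for the actual interface:

```r
# Sketch: one call that adds a spline base-learner per feature and trains
# the model (boostSplines() and its arguments are assumptions; the data
# set is the one loaded in the walkthrough below).
cboost = boostSplines(data = PimaIndiansDiabetes, target = "diabetes",
  loss = LossBinomial$new(), iterations = 1000L)
```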

```r
library(compboost)

# Check installed version:
packageVersion("compboost")
#> [1] '0.1.0'

# Load data set with binary classification task:
data(PimaIndiansDiabetes, package = "mlbench")
# Create categorical feature:
PimaIndiansDiabetes$pregnant.cat = ifelse(PimaIndiansDiabetes$pregnant == 0, "no", "yes")

# Define Compboost object:
cboost = Compboost$new(data = PimaIndiansDiabetes, target = "diabetes", loss = LossBinomial$new())
cboost
#> Component-Wise Gradient Boosting
#> 
#> Trained on PimaIndiansDiabetes with target diabetes
#> Number of base-learners: 0
#> Learning rate: 0.05
#> Iterations: 0
#> Positive class: neg
#> 
#> LossBinomial Loss:
#> 
#>   Loss function: L(y,x) = log(1 + exp(-2yf(x))
#> 
#> 

# Add p-spline base-learner with default parameter:
cboost$addBaselearner(feature = "pressure", id = "spline", bl.factory = BaselearnerPSpline)

# Add another p-spline learner with custom parameters:
cboost$addBaselearner(feature = "age", id = "spline", bl.factory = BaselearnerPSpline, degree = 3, 
  n.knots = 10, penalty = 4, differences = 2)

# Add categorical feature (as single linear base-learner):
cboost$addBaselearner(feature = "pregnant.cat", id = "category", bl.factory = BaselearnerPolynomial,
  degree = 1, intercept = FALSE)

# Check all registered base-learner:
cboost$getBaselearnerNames()
#> [1] "pressure_spline"           "age_spline"               
#> [3] "pregnant.cat_yes_category" "pregnant.cat_no_category"

# Train model:
cboost$train(1000L, trace = 200L)
#>    1/1000: risk = 0.66
#>  200/1000: risk = 0.58
#>  400/1000: risk = 0.57
#>  600/1000: risk = 0.57
#>  800/1000: risk = 0.57
#> 1000/1000: risk = 0.57
#> 
#> 
#> Train 1000 iterations in 0 Seconds.
#> Final risk based on the train set: 0.57
cboost
#> Component-Wise Gradient Boosting
#> 
#> Trained on PimaIndiansDiabetes with target diabetes
#> Number of base-learners: 4
#> Learning rate: 0.05
#> Iterations: 1000
#> Positive class: neg
#> Offset: 0.3118
#> 
#> LossBinomial Loss:
#> 
#>   Loss function: L(y,x) = log(1 + exp(-2yf(x))
#> 
#> 

cboost$getBaselearnerNames()
#> [1] "pressure_spline"           "age_spline"               
#> [3] "pregnant.cat_yes_category" "pregnant.cat_no_category"

selected.features = cboost$getSelectedBaselearner()
table(selected.features)
#> selected.features
#>               age_spline pregnant.cat_no_category          pressure_spline 
#>                      434                      150                      416

params = cboost$getEstimatedCoef()
str(params)
#> List of 4
#>  $ age_spline              : num [1:14, 1] 2.99 1.501 0.588 -0.535 -0.119 ...
#>  $ pregnant.cat_no_category: num [1, 1] -0.299
#>  $ pressure_spline         : num [1:24, 1] -0.8087 -0.4274 -0.0602 0.2226 0.3368 ...
#>  $ offset                  : num 0.312
```

```r
cboost$train(3000)
#> 
#> You have already trained 1000 iterations.
#> Train 2000 additional iterations.
```

```r
cboost$plot("age_spline", iters = c(100, 500, 1000, 2000, 3000)) +
  ggthemes::theme_tufte() + 
  ggplot2::scale_color_brewer(palette = "Spectral")
```
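
After training, predictions can be obtained from the same object. A minimal sketch, assuming the R6 `predict()` method accepts an optional `newdata` data frame (check the `Compboost` class documentation):

```r
# Sketch: scores on the training data and on new observations
# (the newdata argument is an assumption; see the Compboost class docs).
pred.train = cboost$predict()
pred.new = cboost$predict(newdata = PimaIndiansDiabetes[1:5, ])
```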

## Benchmark

To get an idea of the performance of compboost, we conducted a small benchmark comparing compboost with mboost with respect to runtime behavior and memory consumption. The results of the benchmark can be read here.

## Citing

To cite compboost in publications, please use:

Schalk et al., (2018). compboost: Modular Framework for Component-Wise Boosting. Journal of Open Source Software, 3(30), 967, https://doi.org/10.21105/joss.00967

```bibtex
@article{schalk2018compboost,
  author = {Daniel Schalk and Janek Thomas and Bernd Bischl},
  title = {compboost: Modular Framework for Component-Wise Boosting},
  url = {https://doi.org/10.21105/joss.00967},
  year = {2018},
  publisher = {Journal of Open Source Software},
  volume = {3},
  number = {30},
  pages = {967},
  journal = {JOSS}
}
```

## Testing

### On your local machine

To test the package functionality, you can use devtools on your local machine:

```r
devtools::test()
```

### Using Docker

You can test the package locally using Docker and the compboost-test repository:

- Latest R release:

  ```bash
  docker run schalkdaniel/compboost-test
  ```

- Latest R devel build:

  ```bash
  docker run schalkdaniel/compboost-test:devel
  ```