C++ implementation and R API for componentwise boosting


Documentation | Contributors | Release Notes

compboost: Fast and Flexible Component-Wise Boosting Framework

Component-wise boosting applies the boosting framework to statistical models, e.g., generalized additive models using component-wise smoothing splines. Boosting these kinds of models maintains interpretability and enables unbiased model selection in high-dimensional feature spaces.

The R package compboost is an alternative implementation of component-wise boosting written in C++ to obtain high runtime performance and full memory control. The main idea is to provide a modular class system that can be extended without editing the source code. Therefore, it is possible to use R functions as well as C++ functions for custom base-learners, losses, logging mechanisms, or stopping criteria.
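As an illustration of this extensibility, a loss can be written with plain R functions and plugged into the framework. The sketch below assumes a `LossCustom` class taking three functions (loss, gradient, and constant initializer), as described in the package's custom-extension material at the time; class and argument names may differ in other versions, so consult the current documentation.

```r
library(compboost)

# Quadratic loss expressed as plain R functions
# (assumed LossCustom interface: loss, gradient, constant initializer):
lossQuadratic = function (truth, prediction) 0.5 * (truth - prediction)^2
gradQuadratic = function (truth, prediction) prediction - truth  # dL/df
initQuadratic = function (truth) mean(truth)                     # risk-optimal constant

my.loss = LossCustom$new(lossQuadratic, gradQuadratic, initQuadratic)

# my.loss can then be passed to Compboost$new() like any built-in loss object.
```

Custom base-learners, loggers, and stopping criteria follow the same pattern: implement the required functions in R (or C++ for speed) and register them with the corresponding class.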

For an introduction and an overview of the functionality, visit the project page.

Installation

CRAN version:

install.packages("compboost")

Developer version:

devtools::install_github("schalkdaniel/compboost")

Examples Binder

These examples were rendered using compboost 0.1.0.

For maximal flexibility, use the R6 API to define base-learners, losses, stopping criteria, or optimizers as desired. Another option is to use the wrapper functions described on the project page.

library(compboost)

# Check installed version:
packageVersion("compboost")
#> [1] '0.1.0'

# Load data set with binary classification task:
data(PimaIndiansDiabetes, package = "mlbench")
# Create categorical feature:
PimaIndiansDiabetes$pregnant.cat = ifelse(PimaIndiansDiabetes$pregnant == 0, "no", "yes")

# Define Compboost object:
cboost = Compboost$new(data = PimaIndiansDiabetes, target = "diabetes", loss = LossBinomial$new())
cboost
#> Component-Wise Gradient Boosting
#> 
#> Trained on PimaIndiansDiabetes with target diabetes
#> Number of base-learners: 0
#> Learning rate: 0.05
#> Iterations: 0
#> Positive class: neg
#> 
#> LossBinomial Loss:
#> 
#>   Loss function: L(y,x) = log(1 + exp(-2yf(x))
#> 
#> 

# Add p-spline base-learner with default parameter:
cboost$addBaselearner(feature = "pressure", id = "spline", bl.factory = BaselearnerPSpline)

# Add another p-spline learner with custom parameters:
cboost$addBaselearner(feature = "age", id = "spline", bl.factory = BaselearnerPSpline, degree = 3, 
  n.knots = 10, penalty = 4, differences = 2)

# Add categorical feature (as single linear base-learner):
cboost$addBaselearner(feature = "pregnant.cat", id = "category", bl.factory = BaselearnerPolynomial,
  degree = 1, intercept = FALSE)

# Check all registered base-learner:
cboost$getBaselearnerNames()
#> [1] "pressure_spline"           "age_spline"               
#> [3] "pregnant.cat_yes_category" "pregnant.cat_no_category"

# Train model:
cboost$train(1000L, trace = 200L)
#>    1/1000: risk = 0.66
#>  200/1000: risk = 0.58
#>  400/1000: risk = 0.57
#>  600/1000: risk = 0.57
#>  800/1000: risk = 0.57
#> 1000/1000: risk = 0.57
#> 
#> 
#> Train 1000 iterations in 0 Seconds.
#> Final risk based on the train set: 0.57
cboost
#> Component-Wise Gradient Boosting
#> 
#> Trained on PimaIndiansDiabetes with target diabetes
#> Number of base-learners: 4
#> Learning rate: 0.05
#> Iterations: 1000
#> Positive class: neg
#> Offset: 0.3118
#> 
#> LossBinomial Loss:
#> 
#>   Loss function: L(y,x) = log(1 + exp(-2yf(x))
#> 
#> 

cboost$getBaselearnerNames()
#> [1] "pressure_spline"           "age_spline"               
#> [3] "pregnant.cat_yes_category" "pregnant.cat_no_category"

selected.features = cboost$getSelectedBaselearner()
table(selected.features)
#> selected.features
#>               age_spline pregnant.cat_no_category          pressure_spline 
#>                      434                      150                      416

params = cboost$getEstimatedCoef()
str(params)
#> List of 4
#>  $ age_spline              : num [1:14, 1] 2.99 1.501 0.588 -0.535 -0.119 ...
#>  $ pregnant.cat_no_category: num [1, 1] -0.299
#>  $ pressure_spline         : num [1:24, 1] -0.8087 -0.4274 -0.0602 0.2226 0.3368 ...
#>  $ offset                  : num 0.312

cboost$train(3000)
#> 
#> You have already trained 1000 iterations.
#> Train 2000 additional iterations.

cboost$plot("age_spline", iters = c(100, 500, 1000, 2000, 3000)) +
  ggthemes::theme_tufte() + 
  ggplot2::scale_color_brewer(palette = "Spectral")
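A trained model can also be used for prediction. This short sketch assumes the `Compboost` object exposes a `$predict()` member that takes a data frame with the same feature columns as the training data; check the reference documentation for the exact signature and the scale of the returned values.

```r
# Scores for new observations (here, the training data is reused for brevity).
# For LossBinomial the returned values are assumed to live on the link scale,
# not the probability scale:
scores = cboost$predict(PimaIndiansDiabetes)
head(scores)
```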

Benchmark

To get an idea of the performance of compboost, we conducted a small benchmark comparing the runtime behavior and memory consumption of compboost and mboost. The results of the benchmark can be read here.

Citing

To cite compboost in publications, please use:

Schalk et al., (2018). compboost: Modular Framework for Component-Wise Boosting. Journal of Open Source Software, 3(30), 967, https://doi.org/10.21105/joss.00967

@article{schalk2018compboost,
  author = {Daniel Schalk and Janek Thomas and Bernd Bischl},
  title = {compboost: Modular Framework for Component-Wise Boosting},
  URL = {https://doi.org/10.21105/joss.00967},
  year = {2018},
  publisher = {Journal of Open Source Software},
  volume = {3},
  number = {30},
  pages = {967},
  journal = {JOSS}
}

Testing

You can test the package locally using Docker and the compboost-test repository:

  • Latest R release:

    docker run schalkdaniel/compboost-test
    
  • Latest R devel build:

    docker run schalkdaniel/compboost-test:devel