autoxgboostMC - Multiple-Criteria tuning and fitting of xgboost models.


Status

The software is still heavily under construction!

Examples of how to use the software can be found in the vignette!


  • Installing the development version

    # Install requirements
    install.packages("devtools")
    devtools::install_github("compstat-lmu/paper_2019_iml_measures")
    devtools::install_github("johnmyleswhite/log4r")
    devtools::install_github("mlr-org/mlrMBO")
    # Install package
    devtools::install_github("pfistfl/autoxgboostMC")

General overview

autoxgboost aims to find an optimal xgboost model automatically using the machine learning framework mlr and the Bayesian optimization framework mlrMBO.

Work in progress!

AutoxgboostMC embraces R6 for a cleaner design. See the example code below for the new API.
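Before running the examples, load the required packages. A minimal sketch; mlr supplies the example task (pid.task), subsetTask() and the performance measures used below:

library(mlr)            # example task, subsetTask(), measures such as auc and timepredict
library(autoxgboostMC)  # the AutoxgboostMC R6 class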

First we split our data into a train and a test set.

# Randomly assign each observation to the train or the test set
train = sample(c(TRUE, FALSE), getTaskSize(pid.task), replace = TRUE)
task_train = subsetTask(pid.task, subset = train)
task_test = subsetTask(pid.task, subset = !train)

Training and Testing

Then we start the AutoML process:

# Instantiate the object with a list of measures to optimize.
axgb = AutoxgboostMC$new(task_train, measures = list(auc, timepredict))
# Set hyperparameters (we want to work on two threads).
axgb$nthread(2L)
# Fit for 15 seconds.
axgb$fit(time_budget = 15L)

After the search has finished, we can use the best model found to predict:

p = axgb$predict(task_test)
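To quantify how well the selected model performs on the held-out data, the prediction can be scored with mlr's performance() function. A minimal sketch, assuming $predict() returns a standard mlr Prediction object with probabilities:

# Score the test-set prediction (assumes p is an mlr Prediction)
performance(p, measures = auc)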

Visualizing the Process

Several options are available for plotting:

axgb$plot_opt_path()
axgb$plot_opt_result()
axgb$plot_pareto_front()

[Figure: Result of an AutoxgboostMC run]

Pipeline

AutoxgboostMC currently searches over and optimizes the following pipeline:

fix_factors %>% impact_encoding | dummy_encoding %>% drop_constant_feats %>% learner %>% tune_threshold
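Conceptually, a similar preprocessing chain can be written down with mlrCPO's composition operator %>>%. The following is an illustrative sketch only, not the package's actual internals; the operator names (cpoFixFactors, cpoDummyEncode, cpoDropConstants) come from the mlrCPO package and stand in for the steps above:

library(mlrCPO)
# Compose preprocessing operators, then attach the xgboost learner at the end
pipeline = cpoFixFactors() %>>%   # repair factor levels between train and test
  cpoDummyEncode() %>>%           # one alternative to impact encoding for factors
  cpoDropConstants() %>>%         # drop constant features
  makeLearner("classif.xgboost", predict.type = "prob")

The final tune_threshold step corresponds to tuning the classification threshold on the fitted learner's predicted probabilities.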

To be added:

  • Categorical Encoding using mixed models
  • Imputation
  • Fairness Post-Processing

Eventually:

  • Ensemble Stacking
  • Model Compression

autoxgboost - How to Cite

The Automatic Gradient Boosting framework was presented at the ICML/IJCAI-ECAI 2018 AutoML Workshop (poster). Please cite our ICML AutoML workshop paper on arXiv. You can get citation info via citation("autoxgboost") or copy the following BibTeX entry:

@inproceedings{autoxgboost,
  title={Automatic Gradient Boosting},
  author={Thomas, Janek and Coors, Stefan and Bischl, Bernd},
  booktitle={International Workshop on Automatic Machine Learning at ICML},
  year={2018}
}
