autoxgboostMC - Multiple-Criteria tuning and fitting of xgboost models.


Status

The software is still heavily under construction!

Examples of how to use the software can be found in the vignette!


  • Installing the development version

    # Install requirements
    install.packages("devtools")
    devtools::install_github("compstat-lmu/paper_2019_iml_measures")
    devtools::install_github("johnmyleswhite/log4r")
    devtools::install_github("mlr-org/mlrMBO")
    # Install package
    devtools::install_github("pfistfl/autoxgboostMC")

General overview

autoxgboost aims to find an optimal xgboost model automatically using the machine learning framework mlr and the Bayesian optimization framework mlrMBO.

Work in progress!

AutoxgboostMC embraces R6 for a cleaner design. See the example code below for the new API.
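Before running the examples, load the required packages. A minimal sketch; mlr supplies the example task (pid.task), subsetTask() and the performance measures used below:

library(mlr)            # example task, subsetTask(), measures such as auc and timepredict
library(autoxgboostMC)  # the AutoxgboostMC R6 class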

First we split our data into a train and a test set.

# Randomly assign each observation to the train or the test set
train = sample(c(TRUE, FALSE), getTaskSize(pid.task), replace = TRUE)
task_train = subsetTask(pid.task, subset = train)
task_test = subsetTask(pid.task, subset = !train)

Training and Testing

Then we start the AutoML process:

# Instantiate the object with a list of measures to optimize.
axgb = AutoxgboostMC$new(task_train, measures = list(auc, timepredict))
# Set hyperparameters (we want to work on two threads).
axgb$nthread(2L)
# Fit for 15 seconds.
axgb$fit(time_budget = 15L)

After the search has finished, we can use the best model found to predict:

p = axgb$predict(task_test)
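To quantify how well the selected model performs on the held-out data, the prediction can be scored with mlr's performance() function. A minimal sketch, assuming $predict() returns a standard mlr Prediction object with probabilities:

# Score the test-set prediction (assumes p is an mlr Prediction)
performance(p, measures = auc)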

Visualizing the Process

Several options are available for plotting:

axgb$plot_opt_path()
axgb$plot_opt_result()
axgb$plot_pareto_front()

[Figure: Result of an AutoxgboostMC run]

Pipeline

AutoxgboostMC currently searches over and optimizes the following pipeline:

fix_factors %>% impact_encoding | dummy_encoding %>% drop_constant_feats %>% learner %>% tune_threshold
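Conceptually, a similar preprocessing chain can be written down with mlrCPO's composition operator %>>%. The following is an illustrative sketch only, not the package's actual internals; the operator names (cpoFixFactors, cpoDummyEncode, cpoDropConstants) come from the mlrCPO package and stand in for the steps above:

library(mlrCPO)
# Compose preprocessing operators, then attach the xgboost learner at the end
pipeline = cpoFixFactors() %>>%   # repair factor levels between train and test
  cpoDummyEncode() %>>%           # one alternative to impact encoding for factors
  cpoDropConstants() %>>%         # drop constant features
  makeLearner("classif.xgboost", predict.type = "prob")

The final tune_threshold step corresponds to tuning the classification threshold on the fitted learner's predicted probabilities.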

To be added:

  • Categorical Encoding using mixed models
  • Imputation
  • Fairness Post-Processing

Eventually:

  • Ensemble Stacking
  • Model Compression

autoxgboost - How to Cite

The Automatic Gradient Boosting framework was presented at the ICML/IJCAI-ECAI 2018 AutoML Workshop (poster). Please cite our ICML AutoML workshop paper on arXiv. You can get citation info via citation("autoxgboost") or copy the following BibTeX entry:

@inproceedings{autoxgboost,
  title={Automatic Gradient Boosting},
  author={Thomas, Janek and Coors, Stefan and Bischl, Bernd},
  booktitle={International Workshop on Automatic Machine Learning at ICML},
  year={2018}
}
