This is the implementation for the paper Machine Unlearning in Gradient Boosting Decision Trees (accepted at KDD 2023). OnlineBoosting supports training, unlearning, and tuning. This implementation is based on the ABCBoost toolkit.
Run the following commands to build OnlineBoosting from source:
git clone https://github.com/huawei-lin/OnlineBoosting.git
cd OnlineBoosting
mkdir build
cd build
cmake ..
make -j
cd ..
This will create four executables (abcboost_train, abcboost_predict, abcboost_unlearn, and abcboost_clean) in the OnlineBoosting directory.
abcboost_train is the executable to train models.
abcboost_predict is the executable to validate and run inference with trained models.
abcboost_unlearn is the executable to unlearn a given collection of training data from a trained model.
abcboost_clean is the executable to clean CSV data.
Two datasets are provided under the data/ folder: pendigits and optdigits.
We support both Robust LogitBoost and MART. Because Robust LogitBoost uses second-order information to compute the gain for tree splits, it often outperforms MART. Users can replace robustlogit with mart to test the different algorithms (a MART example is shown below).
./abcboost_train -method robustlogit -data ./data/optdigits.train.csv -v 0.1 -J 20 -iter 100 -feature_split_sample_rate 0.1
This command will generate optdigits.train.csv_robustlogit_J20_v0.1.model, which is used in the unlearning and tuning steps below.
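For example, to train the same model with MART instead of Robust LogitBoost, only the -method value changes (note that the generated model file name will then contain mart instead of robustlogit):
./abcboost_train -method mart -data ./data/optdigits.train.csv -v 0.1 -J 20 -iter 100 -feature_split_sample_rate 0.1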
Here we would like to unlearn (delete) the 9th data sample from optdigits.train.csv_robustlogit_J20_v0.1.model. Please note that unlearning needs to load the original training data of the model.
echo 9 > unids.txt # Unlearn 9-th data sample
./abcboost_unlearn -data ./data/optdigits.train.csv -model optdigits.train.csv_robustlogit_J20_v0.1.model -unlearning_ids_path unids.txt
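To unlearn several samples at once, the ids file can list multiple indices (a sketch, assuming the same one-index-per-line format as the example above; the indices here are arbitrary):
printf "3\n9\n27\n" > unids.txt # unlearn the 3rd, 9th, and 27th data samples
./abcboost_unlearn -data ./data/optdigits.train.csv -model optdigits.train.csv_robustlogit_J20_v0.1.model -unlearning_ids_path unids.txt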
Here we would like to evaluate the original and the unlearned models on ./data/optdigits.test.csv.
./abcboost_predict -data ./data/optdigits.test.csv -model optdigits.train.csv_robustlogit_J20_v0.1.model
./abcboost_predict -data ./data/optdigits.test.csv -model optdigits.train.csv_robustlogit_J20_v0.1_unlearn.model
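To also save the predicted class probabilities, the -save_prob flag listed below can be added; for example:
./abcboost_predict -data ./data/optdigits.test.csv -model optdigits.train.csv_robustlogit_J20_v0.1.model -save_prob 1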
Configuration options:

Data:
-data_min_bin_size: minimum size of a bin
-data_max_n_bins: maximum number of bins (default 1000)
-data_path, -data: path to train/test data

Tree:
-tree_max_n_leaves, -J (default 20)
-tree_min_node_size (default 10)
-tree_n_random_layers (default 0)
-feature_split_sample_rate (default 1.0)

Model:
-model_data_sample_rate (default 1.0)
-model_feature_sample_rate (default 1.0)
-model_shrinkage, -shrinkage, -v: the learning rate (default 0.1)
-model_n_iterations, -iter (default 1000)
-model_n_classes (default 0): the maximum number of classes allowed in this model (>= the number of classes in the current dataset; 0 means no specific class number is set)
-model_name, -method: regression/lambdarank/mart/abcmart/robustlogit/abcrobustlogit (default robustlogit)

Unlearning:
-unlearning_ids_path: path to the unlearning indices
-lazy_update_freq (default 1)

Threads:
-n_threads, -threads (default 1)

Output:
-save_log: 0/1 (default 0), whether to save the runtime log to file
-save_model: 0/1 (default 1)
-save_prob: 0/1 (default 0), whether to save the prediction probabilities for classification tasks
-save_importance: 0/1 (default 0), whether to save the feature importance computed during training
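As an illustration of combining these options, the sketch below trains with data sampling, multiple threads, and feature importance output, then unlearns with a larger lazy update frequency; the flag values are arbitrary choices for demonstration, and the model file name assumes the naming pattern shown earlier:
./abcboost_train -method robustlogit -data ./data/optdigits.train.csv -J 20 -v 0.1 -iter 100 -model_data_sample_rate 0.8 -n_threads 4 -save_importance 1
./abcboost_unlearn -data ./data/optdigits.train.csv -model optdigits.train.csv_robustlogit_J20_v0.1.model -unlearning_ids_path unids.txt -lazy_update_freq 5 -n_threads 4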
If you find OnlineBoosting useful in your research or applications, please cite the following articles:
@inproceedings{DBLP:conf/kdd/LinCL023,
author = {Huawei Lin and
Jun Woo Chung and
Yingjie Lao and
Weijie Zhao},
title = {Machine Unlearning in Gradient Boosting Decision Trees},
booktitle = {Proceedings of the 29th {ACM} {SIGKDD} Conference on Knowledge Discovery
and Data Mining, {KDD}},
address = {Long Beach, CA},
pages = {1374--1383},
year = {2023}
}
@article{DBLP:journals/corr/abs-2207-08770,
author = {Ping Li and
Weijie Zhao},
title = {Package for Fast ABC-Boost},
journal = {CoRR},
volume = {abs/2207.08770},
year = {2022}
}
OnlineBoosting is provided under the Apache-2.0 license.