This repository contains the Java code of PLASTIC and its ablations (CustomEFDT, CustomHT, EFHAT). The code for running our experiments and plotting can be found at github.com/heymarco/CapyMOA-PLASTIC
Access the paper via link.springer.com/chapter/10.1007/978-3-031-70362-1_3
Commonly used incremental decision trees for mining data streams include Hoeffding Trees (HT) and Extremely Fast Decision Trees (EFDT). EFDT exhibits faster learning than HT. However, due to its split revision procedure, EFDT suffers from sudden and unpredictable accuracy decreases caused by subtree pruning. To overcome this, we propose PLASTIC, an incremental decision tree that restructures the otherwise pruned subtree. This is possible due to decision tree plasticity: one can alter a tree's structure without affecting its predictions. We conduct extensive evaluations comparing PLASTIC with state-of-the-art methods on synthetic and real-world data streams. Our results show that PLASTIC improves EFDT's worst-case accuracy by up to 50 % and outperforms the current state of the art on real-world data. We provide an open-source implementation of PLASTIC within the MOA framework for mining high-speed data streams.
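The plasticity property described above can be illustrated with a toy example. The following is a minimal sketch, not the restructuring procedure implemented in this repository: it assumes binary attributes and a tree whose root has two children that split on the same attribute. All names (`PlasticitySketch`, `Node`, `swapTopTwoLevels`, `samePredictions`, `exampleTree`) are hypothetical and chosen for illustration only.

```java
// Toy illustration of decision tree plasticity. Hypothetical names throughout;
// this is NOT the PLASTIC implementation from this repository.
final class PlasticitySketch {

    /** A node is either a leaf (attribute == -1) or a binary split on one attribute. */
    static final class Node {
        final int attribute;  // index of the binary attribute tested here, -1 for a leaf
        final String label;   // predicted class, only set for leaves
        final Node ifFalse;   // child followed when the attribute is false
        final Node ifTrue;    // child followed when the attribute is true

        private Node(int attribute, String label, Node ifFalse, Node ifTrue) {
            this.attribute = attribute;
            this.label = label;
            this.ifFalse = ifFalse;
            this.ifTrue = ifTrue;
        }

        static Node leaf(String label) {
            return new Node(-1, label, null, null);
        }

        static Node split(int attribute, Node ifFalse, Node ifTrue) {
            return new Node(attribute, null, ifFalse, ifTrue);
        }

        boolean isLeaf() {
            return attribute < 0;
        }

        String predict(boolean[] x) {
            Node n = this;
            while (!n.isLeaf()) {
                n = x[n.attribute] ? n.ifTrue : n.ifFalse;
            }
            return n.label;
        }
    }

    /**
     * Swaps the top two levels of a tree whose root tests attribute a and whose
     * two children both test attribute b. The result tests b first and a second,
     * with the four grandchildren re-attached so that every input reaches the
     * same grandchild as before.
     */
    static Node swapTopTwoLevels(Node root) {
        Node f = root.ifFalse;
        Node t = root.ifTrue;
        if (root.isLeaf() || f.isLeaf() || t.isLeaf() || f.attribute != t.attribute) {
            throw new IllegalArgumentException("expected two children splitting on the same attribute");
        }
        int a = root.attribute;
        int b = f.attribute;
        Node bFalse = Node.split(a, f.ifFalse, t.ifFalse); // inputs with b == false
        Node bTrue = Node.split(a, f.ifTrue, t.ifTrue);    // inputs with b == true
        return Node.split(b, bFalse, bTrue);
    }

    /** Compares both trees on every possible binary input vector. */
    static boolean samePredictions(Node left, Node right, int numAttributes) {
        for (int bits = 0; bits < (1 << numAttributes); bits++) {
            boolean[] x = new boolean[numAttributes];
            for (int i = 0; i < numAttributes; i++) {
                x[i] = ((bits >> i) & 1) == 1;
            }
            if (!left.predict(x).equals(right.predict(x))) {
                return false;
            }
        }
        return true;
    }

    /** Root splits on attribute 0; both children split on attribute 1. */
    static Node exampleTree() {
        Node whenFalse = Node.split(1, Node.leaf("A"), Node.leaf("B"));
        Node whenTrue = Node.split(1, Node.leaf("C"), Node.leaf("D"));
        return Node.split(0, whenFalse, whenTrue);
    }

    public static void main(String[] args) {
        Node original = exampleTree();
        Node restructured = swapTopTwoLevels(original);
        System.out.println("root attribute before: " + original.attribute);
        System.out.println("root attribute after: " + restructured.attribute);
        System.out.println("same predictions: " + samePredictions(original, restructured, 2));
    }
}
```

The sketch only shows that the split order can change while the predictions stay the same. The sketch does not cover how the desired root attribute is chosen, how subtrees whose children do not yet split on that attribute are handled, or how node statistics are maintained.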
If you want to cite this paper, use:
@inproceedings{heyden2024leveraging,
title={Leveraging Plasticity in Incremental Decision Trees},
author={Heyden, Marco and Gomes, Heitor Murilo and Fouch{\'e}, Edouard and Pfahringer, Bernhard and B{\"o}hm, Klemens},
booktitle={Joint European Conference on Machine Learning and Knowledge Discovery in Databases},
pages={38--54},
year={2024},
organization={Springer}
}
MOA is the most popular open-source framework for data stream mining, with a very active and growing community. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection, and recommender systems) and tools for evaluation. Related to the WEKA project, MOA is also written in Java, while scaling to more demanding problems.
MOA performs big data stream mining in real time and large-scale machine learning. MOA can be extended with new mining algorithms, stream generators, and evaluation measures. The goal is to provide a benchmark suite for the stream mining community.
- MOA users: http://groups.google.com/group/moa-users
- MOA developers: http://groups.google.com/group/moa-development
If you want to refer to MOA in a publication, please cite the following JMLR paper:
Albert Bifet, Geoff Holmes, Richard Kirkby, Bernhard Pfahringer (2010); MOA: Massive Online Analysis; Journal of Machine Learning Research 11: 1601-1604
