This repository provides the source code, drift detectors, classifiers, experimental setup, and results for the experimental study on locality of concept drifts. The manuscript preprint is available at arXiv.
This website provides interactive plots to display the metrics over time and result files for each experiment, algorithm, and benchmark.
The experiments were run using Python 3.11, and all required packages are listed in the file `requirements.txt`.
Adapting to drifting data streams is a significant challenge in online learning. Concept drift must be detected for effective model adaptation to evolving data properties. Concept drift can impact the data distribution entirely or partially, which makes it difficult for drift detectors to accurately identify the concept drift. Despite the numerous concept drift detectors in the literature, standardized procedures and benchmarks for comprehensive evaluation considering the locality of the drift are lacking. We present a novel categorization of concept drift based on its locality and scale. A systematic approach leads to a set of 2,760 benchmark problems, reflecting various difficulty levels following our proposed categorization. We conduct a comparative assessment of 8 state-of-the-art drift detectors across diverse difficulties, highlighting their strengths and weaknesses for future research. We examine how drift locality influences the classifier performance and propose strategies for different drift categories to minimize the recovery time. Lastly, we provide lessons learned and recommendations for future concept drift research.
The package `drift_detectors` contains 7 state-of-the-art drift detector algorithms plus 2 from the `river` package. Some of the drift detectors were wrapped in a new class to better fit the experiment script.
Algorithm | Script |
---|---|
ADWIN | drift_detectors.ADWINDW |
PageHinkley | drift_detectors.PHDW |
HDDM | drift.binary.HDDM_W() |
KSWIN | drift_detectors.KSWINDW |
DDM | drift.binary.DDM() |
RDDM | drift_detectors.RDDM_M |
STEPD | drift_detectors.STEPD_M |
ECDD | drift_detectors.ECDDWT_M |
EDDM | drift_detectors.EDDM_M |
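The idea of wrapping a detector so that it fits an experiment loop can be sketched as follows. This is a minimal illustration, assuming the loop only needs an `update(value)` call and a `drift_detected` flag (the convention used by `river` detectors). `SimplePageHinkley` and `DetectorWrapper` are hypothetical names written in plain Python for this example; they are not the classes shipped in `drift_detectors`, and the default parameter values are assumptions.

```python
class SimplePageHinkley:
    """Textbook Page-Hinkley test for an increase in the mean (illustrative stand-in)."""

    def __init__(self, delta=0.005, threshold=5.0, min_instances=30):
        self.delta = delta                # tolerated magnitude of change
        self.threshold = threshold        # detection threshold (lambda)
        self.min_instances = min_instances
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0        # cumulative deviation from the running mean
        self.cum_min = 0.0    # smallest cumulative deviation seen so far
        self.drift_detected = False

    def update(self, x):
        # Start from a clean state after a drift was signalled.
        if self.drift_detected:
            self.reset()
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        self.drift_detected = (
            self.n >= self.min_instances
            and self.cum - self.cum_min > self.threshold
        )


class DetectorWrapper:
    """Hypothetical wrapper that records the positions where drift is signalled."""

    def __init__(self, detector):
        self.detector = detector
        self.detections = []
        self._position = 0

    def update(self, value):
        self.detector.update(value)
        if self.detector.drift_detected:
            self.detections.append(self._position)
        self._position += 1
        return self.detector.drift_detected


# Error indicators of a classifier: always correct for 200 instances, then always wrong.
errors = [0] * 200 + [1] * 200
wrapper = DetectorWrapper(SimplePageHinkley())
for error in errors:
    wrapper.update(error)

print(wrapper.detections)
```

With this synthetic error stream, the wrapper records a single detection shortly after position 200, where the error rate changes. Keeping the bookkeeping in the wrapper means the experiment loop can treat every detector the same way.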
Single-class drifts are generated using the file `generators/single_class.py`.
The following code is an illustrative example of generating streams with single-class drifts. It generates streams with local drifts for 3, 5, and 10 classes and for 2 and 5 features. It includes only sudden drifts, denoted by a `drift_width` value of 1.
```python
from generators.single_class import generate_streams

streams = generate_streams(
    n_classes=[3, 5, 10],   # number of classes in each stream
    n_features=[2, 5],      # number of features in each stream
    drift_width=[1],        # 1 = sudden drift
    locality=["local"],
)
```
[Figures: single-class drift examples, Local and Global panels]
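To see how many stream configurations the parameter lists in the single-class example describe, the sketch below enumerates them. It assumes that one stream is generated per combination of parameter values; that is a guess about how `generate_streams` combines its arguments, not something confirmed by the repository, and the `grid` and `configs` names are only illustrative.

```python
from itertools import product

# Same parameter lists as in the single-class example.
grid = {
    "n_classes": [3, 5, 10],
    "n_features": [2, 5],
    "drift_width": [1],
    "locality": ["local"],
}

# Assumption: one stream per combination of parameter values (Cartesian product).
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

for config in configs:
    print(config)
```

Under that assumption, the example describes 3 x 2 x 1 x 1 = 6 configurations, all with sudden local drifts.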
Multi-class drifts are generated using the file `generators/multi_class.py`.
The following code is an illustrative example of generating streams with multi-class drifts. It generates streams with local drifts for 3, 5, and 10 classes and for 2 and 5 features. It also specifies the number of affected classes in each scenario. For instance, with 3 classes, only 1 class is affected, while with 5 classes, there are streams with 2 and 3 affected classes, and so on.
```python
from generators.multi_class import generate_streams

streams = generate_streams(
    n_classes=[3, 5, 10],
    n_features=[2, 5],
    # Numbers of affected classes, one list per entry of n_classes
    classes_affected=[[1], [2, 3], [3, 5]],
    drift_width=[1],        # 1 = sudden drift
    locality=["local"],
)
```
[Figures: multi-class drift examples, Local and Global panels]
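The pairing between `n_classes` and `classes_affected` described for the multi-class example can be sketched as below. It assumes the lists are matched by position (the i-th entry of `classes_affected` holds the affected-class counts for the i-th entry of `n_classes`), which agrees with the description but is not verified against `generators/multi_class.py`. The dictionary keys and the `configs` name are illustrative.

```python
from itertools import product

n_classes = [3, 5, 10]
n_features = [2, 5]
classes_affected = [[1], [2, 3], [3, 5]]
drift_width = [1]
locality = ["local"]

configs = []
# Assumption: classes_affected[i] lists the affected-class counts for n_classes[i].
for total, affected_options in zip(n_classes, classes_affected):
    for affected, features, width, loc in product(
        affected_options, n_features, drift_width, locality
    ):
        configs.append(
            {
                "n_classes": total,
                "classes_affected": affected,
                "n_features": features,
                "drift_width": width,
                "locality": loc,
            }
        )

for config in configs:
    print(config)
```

Under this reading, the example describes (1 + 2 + 2) x 2 = 10 configurations, and the number of affected classes is always smaller than the total number of classes.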
This website provides interactive plots to display the metrics over time and result tables for each experiment, algorithm, and benchmark.
Complete CSV results for all experiments, algorithms, and benchmarks reported in the manuscript are available for download to support the transparency, reproducibility, and extensibility of the experimental study.
@misc{aguiar2023local,
author={Aguiar, Gabriel and Cano, Alberto},
title={A comprehensive analysis of concept drift locality in data streams},
year={2023},
eprint={2311.06396},
archivePrefix={arXiv},
primaryClass={cs.LG}
}