Frouros is a Python library for drift detection in Machine Learning problems.

Frouros provides a combination of classical and more recent drift detection algorithms for supervised and unsupervised settings, as well as some semi-supervised ones. The library has two main objectives:

1. to integrate easily into a machine learning model development pipeline built with the scikit-learn library;
2. to unify in a single library concept drift detection and adaptation (traditionally researched and used for evolving data streams and incremental learning) and change detection in covariate distributions (also known as data shift, and related to statistical two-sample testing and to methods that measure the distance between distributions).

Quickstart

As a quick example, we can generate samples from two bivariate normal distributions and apply an unsupervised method such as MMD (Maximum Mean Discrepancy). This method tests whether the two sets of samples come from the same distribution. If they come from different distributions, there is data drift.

from sklearn.gaussian_process.kernels import RBF
import numpy as np
from frouros.unsupervised.distance_based import MMD

np.random.seed(31)
# Reference samples come from a normal distribution with mean = [1, 1] and cov = [[2, 0], [0, 2]]
x_mean = np.ones(2)
x_cov = 2 * np.eye(2)
# Test samples come from a normal distribution with mean = [0, 0] and cov = [[2, 1], [1, 2]]
y_mean = np.zeros(2)
y_cov = np.eye(2) + 1

num_samples = 200
X_ref = np.random.multivariate_normal(x_mean, x_cov, num_samples)
X_test = np.random.multivariate_normal(y_mean, y_cov, num_samples)

alpha = 0.01  # significance level for the hypothesis test

detector = MMD(num_permutations=1000, kernel=RBF(length_scale=1.0), random_state=31)
detector.fit(X=X_ref)
detector.transform(X=X_test)
mmd, p_value = detector.distance

print(p_value < alpha)
# True: drift detected. We can reject H0 (both samples come from the same distribution).
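
To make the idea behind the test concrete, the following is a minimal sketch of an MMD permutation test written directly with NumPy and the scikit-learn RBF kernel. It is not the Frouros implementation; the function name `mmd_permutation_test` and its parameters are assumptions made for illustration. It computes the biased squared MMD estimate and approximates the p-value by repeatedly shuffling the pooled samples.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF


def mmd_permutation_test(X, Y, kernel, num_permutations=200, seed=0):
    """Hypothetical helper: biased squared MMD plus a permutation p-value."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([X, Y])
    n = len(X)
    # Compute the kernel matrix once and reuse it for every permutation.
    K = kernel(pooled)

    def statistic(idx):
        a, b = idx[:n], idx[n:]
        return (
            K[np.ix_(a, a)].mean()
            + K[np.ix_(b, b)].mean()
            - 2.0 * K[np.ix_(a, b)].mean()
        )

    observed = statistic(np.arange(len(pooled)))
    # Count how often a random relabeling gives a statistic at least as large.
    exceed = sum(
        statistic(rng.permutation(len(pooled))) >= observed
        for _ in range(num_permutations)
    )
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (exceed + 1) / (num_permutations + 1)
    return observed, p_value


# Same two distributions as in the example above.
rng = np.random.default_rng(31)
X_ref = rng.multivariate_normal(np.ones(2), 2 * np.eye(2), 200)
X_test = rng.multivariate_normal(np.zeros(2), np.eye(2) + 1, 200)

mmd_value, p_value = mmd_permutation_test(X_ref, X_test, RBF(length_scale=1.0))
print(mmd_value, p_value)
```

The kernel matrix is computed once for the pooled samples, so each permutation only re-indexes it. This keeps the sketch fast enough for a few hundred permutations.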

More examples can be found here.

Installation

Frouros supports Python 3.8, 3.9 and 3.10. It can be installed via pip:

pip install frouros

There is also the option to use PyTorch models with the help of skorch (the quotes keep shells such as zsh from interpreting the brackets):

pip install "frouros[pytorch]"

The latest changes on the main branch can be installed via:

pip install git+https://github.com/IFCA/frouros.git

Drift detection methods

The currently supported methods are listed in the following table. They are divided into three main categories depending on the type of drift that they can detect and how they detect it.

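| Type            | Subtype              | Method                   |
|-----------------|----------------------|--------------------------|
| Supervised      | CUSUM Based          | CUSUM                    |
| Supervised      | CUSUM Based          | Geometric Moving Average |
| Supervised      | CUSUM Based          | Page Hinkley             |
| Supervised      | DDM Based            | DDM                      |
| Supervised      | DDM Based            | ECDD-WT                  |
| Supervised      | DDM Based            | EDDM                     |
| Supervised      | DDM Based            | HDDM-A                   |
| Supervised      | DDM Based            | HDDM-W                   |
| Supervised      | DDM Based            | RDDM                     |
| Supervised      | DDM Based            | STEPD                    |
| Supervised      | Window Based         | ADWIN                    |
| Supervised      | Window Based         | KSWIN                    |
| Semi-supervised | Margin Density Based | MD3-SVM                  |
| Semi-supervised | Margin Density Based | MD3-RS                   |
| Unsupervised    | Distance Based       | EMD                      |
| Unsupervised    | Distance Based       | Histogram Intersection   |
| Unsupervised    | Distance Based       | JS                       |
| Unsupervised    | Distance Based       | KL                       |
| Unsupervised    | Distance Based       | MMD                      |
| Unsupervised    | Distance Based       | PSI                      |
| Unsupervised    | Statistical Test     | Chi-Square               |
| Unsupervised    | Statistical Test     | CVM                      |
| Unsupervised    | Statistical Test     | KS                       |
| Unsupervised    | Statistical Test     | Welch's T-test           |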
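
Supervised detectors monitor a stream of model errors rather than the input features. As an illustration, here is a minimal sketch of the Page Hinkley test in plain Python. It is not the Frouros implementation: the class name `PageHinkleySketch`, its parameters and its default values are assumptions chosen for the example. It signals drift when the cumulative deviation of the errors from their running mean rises far enough above its historical minimum.

```python
class PageHinkleySketch:
    """Minimal Page Hinkley test for an increase in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=5.0, min_instances=30):
        self.delta = delta                  # tolerated magnitude of change
        self.threshold = threshold          # detection threshold (lambda)
        self.min_instances = min_instances  # warm-up before signaling drift
        self.n = 0
        self.mean = 0.0
        self.cumulative = 0.0
        self.minimum = 0.0

    def update(self, value):
        """Consume one value and return True if drift is signaled."""
        self.n += 1
        # Incremental running mean of all values seen so far.
        self.mean += (value - self.mean) / self.n
        # Cumulative deviation from the mean, minus the tolerance.
        self.cumulative += value - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        return (
            self.n >= self.min_instances
            and self.cumulative - self.minimum > self.threshold
        )


# Hypothetical error stream: the model is always right, then always wrong.
errors = [0] * 200 + [1] * 50
detector = PageHinkleySketch()
drift_index = next(
    (i for i, error in enumerate(errors) if detector.update(error)), None
)
print(drift_index)
```

The detector fires a few observations after the change at position 200, because the cumulative deviation needs several consecutive errors to exceed the threshold.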

Datasets

Some well-known datasets and synthetic generators are provided and listed in the following table.

| Type      | Dataset |
|-----------|---------|
| Real      | Elec2   |
| Synthetic | SEA     |
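
To show what a synthetic generator with concept drift looks like, the following NumPy sketch reproduces the SEA concepts as they are commonly described: three features drawn uniformly from [0, 10], with the label set by whether the sum of the first two features is at most a threshold that changes between concepts. This is not the Frouros generator API; `sea_block`, `SEA_THRESHOLDS` and the block size are assumptions made for illustration.

```python
import numpy as np

# Thresholds commonly used for the four SEA concepts (assumed here).
SEA_THRESHOLDS = (8.0, 9.0, 7.0, 9.5)


def sea_block(num_samples, threshold, rng, noise=0.0):
    """Hypothetical helper: one block of samples drawn from a single concept."""
    X = rng.uniform(0.0, 10.0, size=(num_samples, 3))
    # Only the first two features matter; the third is irrelevant.
    y = (X[:, 0] + X[:, 1] <= threshold).astype(int)
    # Optionally flip a fraction of the labels to simulate class noise.
    flip = rng.random(num_samples) < noise
    y[flip] = 1 - y[flip]
    return X, y


rng = np.random.default_rng(0)
blocks = [sea_block(1000, threshold, rng) for threshold in SEA_THRESHOLDS]
# Concatenating the blocks gives a stream with abrupt drift every 1000 samples.
X_stream = np.vstack([X for X, _ in blocks])
y_stream = np.concatenate([y for _, y in blocks])
print(X_stream.shape, y_stream.shape)
```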
