First, we must import the TabularDrift detector from the alibi-detect package, as well
as the relevant packages for loading and splitting the data:

In [2]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from alibi_detect.cd import TabularDrift



Next, we must get and split the data:

In [3]:
wine_data = load_wine()
feature_names = wine_data.feature_names
X, y = wine_data.data, wine_data.target
X_ref, X_test, y_ref, y_test = train_test_split(
    X, y, test_size=0.50, random_state=42)

## Detecting Data Drift

Next, we must initialize our drift detector with the reference data and the p-value
threshold, p_val, that the statistical significance tests will use. A larger p_val makes the
detector more sensitive, so it triggers on smaller differences in the data distribution, at
the cost of more false alarms:

In [4]:
cd = TabularDrift(x_ref=X_ref, p_val=0.05)
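
To see why a larger p_val makes the detector more sensitive, it may help to sketch the decision rule. To our understanding, TabularDrift runs one univariate test per feature and, by default, applies a Bonferroni correction, so each feature's p-value is compared against p_val divided by the number of features. The p-values below are made up for illustration, and `batch_drift` is a hypothetical helper, not part of alibi-detect.

```python
def batch_drift(feature_p_values, p_val):
    """Flag drift if any feature's p-value is below the Bonferroni-corrected threshold."""
    threshold = p_val / len(feature_p_values)
    return any(p < threshold for p in feature_p_values), threshold

# Made-up p-values for 13 features (the wine dataset has 13), for illustration only.
feature_p_values = [0.40, 0.75, 0.012, 0.33, 0.91, 0.58, 0.22,
                    0.67, 0.05, 0.49, 0.81, 0.36, 0.19]

strict_flag, strict_threshold = batch_drift(feature_p_values, p_val=0.05)
loose_flag, loose_threshold = batch_drift(feature_p_values, p_val=0.20)

print(f"p_val=0.05 -> threshold={strict_threshold:.4f}, drift={strict_flag}")
print(f"p_val=0.20 -> threshold={loose_threshold:.4f}, drift={loose_flag}")
```

With the same per-feature evidence, only the larger p_val flags drift, because the smallest p-value (0.012) sits between the two corrected thresholds.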



We can now check for drift in the test dataset against the reference dataset:

In [5]:
preds = cd.predict(X_test)
labels = ['No', 'Yes']
print('Drift: {}'.format(labels[preds['data']['is_drift']]))

Drift: No


Although there was no drift in this case, we can easily simulate a scenario where the
chemical apparatus used for measuring the chemical properties experienced a
calibration error, so that all the values are recorded as 7% higher than their true values.
If we run drift detection again against the same reference dataset, we get the
following output:

In [6]:
X_test_error = X_test * 1.07
preds = cd.predict(X_test_error)
labels = ['No', 'Yes']
print('Drift: {}'.format(labels[preds['data']['is_drift']]))

Drift: Yes


This returns 'Drift: Yes', showing that the drift has been successfully detected.
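
A rough way to see why a small multiplicative error is detectable is to compute a two-sample Kolmogorov-Smirnov statistic by hand, which is, to our understanding, the kind of test TabularDrift applies to each numerical feature. The sketch below uses a synthetic feature rather than the wine data, and `ks_statistic` is a hypothetical helper written for illustration. When a feature's mean is large relative to its spread, a 7% scaling moves the whole distribution by about one standard deviation.

```python
import random
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for value in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, value) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, value) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

rng = random.Random(42)
# Synthetic feature: mean 13.0, standard deviation 0.8 (values chosen for illustration).
reference = [rng.gauss(13.0, 0.8) for _ in range(500)]
test = [rng.gauss(13.0, 0.8) for _ in range(500)]
test_error = [x * 1.07 for x in test]  # simulated 7% calibration error

ks_clean = ks_statistic(reference, test)
ks_error = ks_statistic(reference, test_error)
print(f"K-S statistic without error: {ks_clean:.3f}")
print(f"K-S statistic with 7% error: {ks_error:.3f}")
```

The statistic for the scaled sample should be several times larger than for the clean sample, which is the signal the significance test turns into a small p-value.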

The first drift detection example was very simple and showed us how to detect a basic case of
one-off data drift, specifically feature drift. We will now show an example of detecting label drift,
which works in the same way, except that the labels serve as the reference and comparison
datasets.

We will use the initial labels as our baseline dataset:

In [8]:
cd = TabularDrift(x_ref=y_ref, p_val=0.05)



In [9]:
preds = cd.predict(y_test)
labels = ['No', 'Yes']
print('Drift: {}'.format(labels[preds['data']['is_drift']]))

Drift: No


This can also be used as a sanity check to validate that the training and test data follow the same distribution and that the sampled test data is representative. If we scale the test labels in the same way as we scaled the features, drift is detected:

In [10]:
preds = cd.predict(y_test*1.07)
labels = ['No', 'Yes']
print('Drift: {}'.format(labels[preds['data']['is_drift']]))

Drift: Yes
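
Because the labels are class indices, multiplying them by 1.07 produces values (0, 1.07 and 2.14) that never occur in the reference labels, which is why drift is flagged. In practice, label drift more often appears as a change in class proportions. The following is a minimal sketch of that idea using made-up label lists; `class_proportions` and `max_proportion_gap` are hypothetical helpers, not alibi-detect functions. For categorical data, TabularDrift also accepts a `categories_per_feature` argument, which is not used here.

```python
from collections import Counter

def class_proportions(labels):
    """Return the fraction of samples in each class, keyed by class label."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: counts[cls] / total for cls in sorted(counts)}

def max_proportion_gap(ref, new):
    """Largest absolute change in any class's share between two label sets."""
    p_ref, p_new = class_proportions(ref), class_proportions(new)
    classes = set(p_ref) | set(p_new)
    return max(abs(p_ref.get(c, 0.0) - p_new.get(c, 0.0)) for c in classes)

# Made-up label lists for illustration; classes 0, 1, 2 as in the wine dataset.
y_reference = [0] * 30 + [1] * 35 + [2] * 24
y_similar = [0] * 29 + [1] * 36 + [2] * 24   # nearly the same class mix
y_skewed = [0] * 60 + [1] * 20 + [2] * 9     # class 0 now dominates

gap_similar = max_proportion_gap(y_reference, y_similar)
gap_skewed = max_proportion_gap(y_reference, y_skewed)
print(f"largest share change, similar mix: {gap_similar:.3f}")
print(f"largest share change, skewed mix:  {gap_skewed:.3f}")
```

A statistical test on class counts would formalize this comparison; the sketch only shows the quantity such a test reacts to.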


## Detecting Concept Drift

The alibi_detect package, which we have already been using, has several useful methods for
online drift detection that can be used to find concept drift as it happens and impacts model
performance. Online here refers to the fact that the drift detection takes place at the level of a
single data point, so this can happen even if data comes in completely sequentially in production.

As an example, let us walk through creating and using one of these online detectors,
the online Maximum Mean Discrepancy method. The following example assumes that, in addition
to the reference dataset, X_ref, we have also defined variables for the expected run time, ert,
and the window size, window_size. The expected run time is the average number of data points
the detector should process, in the absence of drift, before it raises a false positive detection.
You want the expected run time to be large so that false alarms are rare, but as it grows the
detector becomes less sensitive to actual drift, so a balance must be struck. The window_size is
the size of the sliding window of data used to calculate the drift test statistic. A smaller
window_size tunes the detector to find sharp changes in the data or performance
within a short time frame, whereas a larger window_size tunes it to look for more
subtle drift effects over longer periods of time.
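
One way to build intuition for the expected run time is to assume that, while no drift is present, each time step carries a fixed false alarm probability of 1/ert. This is a simplifying assumption made for illustration and is not a description of how MMDDriftOnline calibrates its thresholds. Under that assumption, the number of steps until the first false alarm follows a geometric distribution whose mean is ert, which the simulation below checks; `run_length_until_false_alarm` is a hypothetical helper.

```python
import random

def run_length_until_false_alarm(ert, rng):
    """Count steps until the first false alarm, assuming a per-step alarm probability of 1/ert."""
    steps = 1
    while rng.random() >= 1.0 / ert:
        steps += 1
    return steps

rng = random.Random(0)
mean_runs = {}
for ert in (50, 200):
    runs = [run_length_until_false_alarm(ert, rng) for _ in range(5000)]
    mean_runs[ert] = sum(runs) / len(runs)
    print(f"ert={ert}: mean simulated run length = {mean_runs[ert]:.1f}")
```

Raising ert lowers the per-step false alarm probability, which corresponds to a stricter detection threshold and therefore a slower reaction to real drift.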

In [18]:
from alibi_detect.cd import MMDDriftOnline

In [19]:
ert = 50
window_size = 10
cd = MMDDriftOnline(X_ref, ert, window_size, backend='tensorflow',
                    n_bootstraps=2500)

Note that the tensorflow backend is an optional dependency of alibi-detect. If it is not installed, the cell above raises an ImportError stating that MMDDriftOnline cannot be initialized with the tensorflow backend; the missing dependencies can be installed using `pip install alibi-detect[tensorflow]`.