# Drift Detection

When a model is deployed, data arrive sequentially and we wish to detect drift as soon as possible. Online detectors assume the reference data is large and fixed, and they operate on single data points rather than batches. Each new data point is added to a sliding test-window, and a test-statistic between the reference data and the test-window is computed at each time-step.
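
To make the test-window mechanism concrete, here is a minimal sketch on synthetic one-dimensional data. It is not how `LSDDDriftOnline` works internally: the statistic is a plain absolute difference of means, used as a stand-in, and all names and sizes (`x_ref`, `stream`, `window_size`) are assumptions chosen for illustration.

```python
from collections import deque

import numpy as np

rng = np.random.default_rng(0)

# Fixed reference sample, drawn once before deployment (synthetic stand-in).
x_ref = rng.normal(loc=0.0, scale=1.0, size=1000)
ref_mean = x_ref.mean()

window_size = 30
window = deque(maxlen=window_size)  # the oldest point is dropped automatically

# Stream: 200 in-distribution points followed by 200 points with a shifted mean.
stream = np.concatenate([
    rng.normal(0.0, 1.0, size=200),
    rng.normal(2.0, 1.0, size=200),
])

stats = []
for x_t in stream:
    window.append(x_t)
    if len(window) < window_size:
        stats.append(np.nan)  # window not yet full, so no statistic
        continue
    # Stand-in statistic: absolute difference of means (not LSDD).
    stats.append(abs(sum(window) / window_size - ref_mean))

stats = np.array(stats)
print("mean statistic before shift:", np.nanmean(stats[:200]))
print("mean statistic after shift: ", np.nanmean(stats[250:]))
```

The statistic is recomputed at every time-step, so it starts to rise as soon as shifted points enter the window rather than only once a full batch has been collected.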

In [80]:
import random
import glob
import numpy as np
from PIL import Image
from keras.models import Model, load_model
from alibi_detect.cd import LSDDDriftOnline

For high-dimensional data, we typically want to reduce the dimensionality before passing it to the detector. Since we deal with image data, we will use the encoder part of the trained autoencoder to map images into a lower-dimensional space.

In [81]:
model = load_model("../models/model.tf", compile=False)
encoder = Model(inputs=model.input, outputs=model.get_layer(name='embedding').output)

We use the online Least Squares Density Difference detector to compute the test-statistic. When the test-statistic exceeds a preconfigured threshold, drift is detected. Configuration of the threshold requires an expected run-time (ERT) which specifies how many time-steps the detector, on average, should run for in the absence of drift before making a false detection. It also requires specification of a test-window size, with smaller windows allowing faster response to severe drift and larger windows allowing more power to detect slight drift.
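
One simplified way to see how an ERT translates into a threshold: if the statistics at different time-steps were independent, a per-step false-detection rate of 1/ERT would give an expected run-time of ERT, so the threshold would sit at the (1 - 1/ERT) quantile of the no-drift statistic. The sketch below does this on synthetic data with a mean-difference stand-in statistic. It ignores the correlation between overlapping windows, so it should be read as an illustration of the idea and not as the library's configuration procedure; `n_bootstrap` and the synthetic `x_ref` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

ert = 500            # desired expected run-time, as in the text
window_size = 30     # test-window size, as in the text
n_bootstrap = 20000  # number of simulated no-drift windows (assumption)

# Synthetic stand-in for the reference data.
x_ref = rng.normal(0.0, 1.0, size=3000)
ref_mean = x_ref.mean()

# Draw many no-drift windows from the reference data and compute the statistic.
windows = rng.choice(x_ref, size=(n_bootstrap, window_size), replace=True)
null_stats = np.abs(windows.mean(axis=1) - ref_mean)

# Per-step false-detection rate of 1/ERT -> (1 - 1/ERT) quantile as threshold.
threshold = np.quantile(null_stats, 1.0 - 1.0 / ert)
false_positive_rate = (null_stats > threshold).mean()
print(f"threshold={threshold:.3f}, empirical FPR={false_positive_rate:.4f}")
```

A larger `window_size` shrinks the spread of the no-drift statistic, which lowers the threshold and makes slight drift easier to detect, at the cost of a slower response because more drifted points are needed to fill the window.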

In [88]:
x_ref = np.load("../data/processed/x.npy")[:3000]
detector = LSDDDriftOnline(x_ref, ert=500, window_size=30, preprocess_fn=encoder.predict)



Computing thresholds: 100%|██████████| 29/29 [00:35<00:00,  1.23s/it]


We simulate a stream of images by sampling the Fashion MNIST dataset.

In [89]:
sample_list = glob.glob("../data/raw/fashion_mnist/*.png")

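When we apply the detector to a stream of normal images, the run-time until a false detection should, averaged over many runs, be close to the desired ERT. The cell below performs a single run, which can deviate considerably from that average.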

In [91]:
detector.reset()  # clear the test-window and the time-step counter
drift = False
while not drift:
    # Draw a random image from the stream and scale pixels to [0, 1]
    image = Image.open(random.choice(sample_list))
    image = np.expand_dims(image, axis=-1) / 255.0  # add a channel axis
    prediction = detector.predict(image)
    drift = prediction['data']['is_drift']
print("Images processed until drift: ", prediction['data']['time'])

Images processed until drift:  876
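
A single run of 876 steps against an ERT of 500 is not a sign of a misconfigured detector. Under the idealised assumption that the run-time without drift is geometrically distributed with mean ERT, individual runs spread widely, and a run of at least 876 steps has a probability of roughly 17%. The sketch below simulates that idealised model only; it does not call the detector.

```python
import numpy as np

ert = 500
p = 1.0 / ert  # per-step false-detection probability under the idealised model

rng = np.random.default_rng(0)
run_times = rng.geometric(p, size=100_000)  # support starts at 1

# P(T >= 876) for a geometric run-time: no detection in the first 875 steps.
p_at_least_876 = (1.0 - p) ** 875

print("simulated mean run-time:", run_times.mean())
print("5th / 50th / 95th percentiles:", np.percentile(run_times, [5, 50, 95]))
print("P(run-time >= 876):", round(p_at_least_876, 3))
```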


We can simulate a drifted distribution by rotating the images by 30 degrees. Drift should now be detected much sooner than the ERT.

In [92]:
detector.reset()  # clear the test-window and the time-step counter
drift = False
while not drift:
    image = Image.open(random.choice(sample_list))
    image = image.rotate(30)  # simulate drift: rotate by 30 degrees
    image = np.expand_dims(image, axis=-1) / 255.0  # add a channel axis
    prediction = detector.predict(image)
    drift = prediction['data']['is_drift']
print("Images processed until drift: ", prediction['data']['time'])

Images processed until drift:  22
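
Because each cell above reports a single run, comparing the normal and rotated streams is more convincing when averaged over repeated runs. The helper below is a sketch of how that could be done, using only the `reset()` and `predict()` calls and the `is_drift` and `time` fields seen above. `run_until_drift`, `max_steps` and `StubDetector` are hypothetical names introduced here; the stub exists only so the example runs without the trained encoder or the image files.

```python
import random
import statistics


def run_until_drift(detector, sample_fn, max_steps=10_000):
    """Feed samples to an online detector until it flags drift.

    Returns the detector-reported time-step, or None if max_steps is reached.
    """
    detector.reset()
    for _ in range(max_steps):
        prediction = detector.predict(sample_fn())
        if prediction['data']['is_drift']:
            return prediction['data']['time']
    return None


class StubDetector:
    """Hypothetical stand-in with the same reset/predict interface.

    It flags drift at random with a fixed per-step probability.
    """

    def __init__(self, p_detect, seed=0):
        self.p_detect = p_detect
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0

    def predict(self, x):
        self.t += 1
        is_drift = int(self.rng.random() < self.p_detect)
        return {'data': {'is_drift': is_drift, 'time': self.t}}


stub = StubDetector(p_detect=1 / 500)
times = [run_until_drift(stub, sample_fn=lambda: None) for _ in range(200)]
times = [t for t in times if t is not None]
print("mean run-time over", len(times), "runs:", statistics.mean(times))
```

With the real detector, `detector` would replace the stub and `sample_fn` would load and scale a random image (optionally rotated) as in the cells above. The `max_steps` cap also guards against a loop that never terminates.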
