# Scikit Multiflow

## Setup

In [None]:
!pip install -U scikit-multiflow

## Train and test a stream classification model in scikit-multiflow

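In this example, we use a synthetic data stream (WaveformGenerator) to train a HoeffdingTreeClassifier and measure its performance with prequential evaluation, also known as test-then-train: each new sample is first used to test the model and then to train it, so every prediction is made on data the model has not seen yet.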
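The test-then-train loop behind prequential evaluation fits in a few lines. The sketch below is a simplified stand-in for what `EvaluatePrequential` does, not its actual code: it handles one sample at a time, reports plain accuracy only, and uses a hypothetical toy `MajorityClassModel` (not part of scikit-multiflow) that merely borrows the `predict`/`partial_fit` method names of scikit-multiflow models. Real scikit-multiflow models work on 2-D arrays of samples rather than single values.

```python
from collections import Counter


class MajorityClassModel:
    """Hypothetical toy incremental classifier (not part of scikit-multiflow):
    it always predicts the most frequent label seen so far and ignores features."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def partial_fit(self, x, y):
        self.counts[y] += 1  # incremental update with a single labelled sample


def prequential_accuracy(model, samples, pretrain_size=0):
    """Test-then-train: the first `pretrain_size` samples are only used for
    training; every later sample is predicted first and then learned."""
    correct = 0
    tested = 0
    for i, (x, y) in enumerate(samples):
        if i >= pretrain_size:
            tested += 1
            if model.predict(x) == y:  # 1. test on the unseen sample
                correct += 1
        model.partial_fit(x, y)  # 2. then train on the same sample
    return correct / tested if tested else 0.0


# Usage: a tiny labelled "stream"; the feature vector is irrelevant for this toy model.
samples = [([0.0], label) for label in ["a", "a", "b", "a", "b", "a"]]
print(prequential_accuracy(MajorityClassModel(), samples, pretrain_size=2))  # 0.5
```

`EvaluatePrequential` adds what this sketch leaves out: drawing batches from the stream, further metrics such as Kappa, and optional plotting.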

In [None]:
from skmultiflow.data import WaveformGenerator
from skmultiflow.trees import HoeffdingTreeClassifier
from skmultiflow.evaluation import EvaluatePrequential

# 1. Create a stream
stream = WaveformGenerator()

# 2. Instantiate the HoeffdingTreeClassifier
ht = HoeffdingTreeClassifier()

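# 3. Set up the evaluator: pre-train on the first 200 samples,
#    then evaluate prequentially until 20000 samples have been processed
evaluator = EvaluatePrequential(show_plot=False,
                                pretrain_size=200,
                                max_samples=20000)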

# 4. Run evaluation
evaluator.evaluate(stream=stream, model=ht)

Prequential Evaluation
Evaluating 1 target(s).
Pre-training on 200 sample(s).
Evaluating...
 #################### [100%] [20.98s]
Processed samples: 20000
Mean performance:
M0 - Accuracy     : 0.7890
M0 - Kappa        : 0.6835


[HoeffdingTreeClassifier(binary_split=False, grace_period=200,
                         leaf_prediction='nba', max_byte_size=33554432,
                         memory_estimate_period=1000000, nb_threshold=0,
                         no_preprune=False, nominal_attributes=None,
                         remove_poor_atts=False, split_confidence=1e-07,
                         split_criterion='info_gain', stop_mem_management=False,
                         tie_threshold=0.05)]

![](https://scikit-multiflow.readthedocs.io/en/stable/_images/example_classifier_plot.gif)
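The Kappa value reported next to accuracy is Cohen's kappa, which corrects accuracy for the agreement expected by chance: κ = (p0 − pc) / (1 − pc), where p0 is the observed accuracy and pc is the chance agreement computed from how often each label occurs among the true labels and among the predictions. Below is a minimal sketch of that formula in plain Python; it is not scikit-multiflow's own implementation, and the `cohen_kappa` helper is hypothetical.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    # p0: observed agreement, i.e. plain accuracy
    p0 = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # pc: agreement expected by chance from the label frequencies of both sequences
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    pc = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if pc == 1.0:
        return float("nan")  # a single label everywhere: kappa is undefined
    return (p0 - pc) / (1.0 - pc)


print(cohen_kappa([0, 0, 1, 1], [0, 1, 1, 1]))  # 0.5

# Sanity check against the evaluation output: with three equally frequent true
# classes, pc = 1/3 whatever the predictions are, so kappa follows from accuracy.
print(round((0.7890 - 1 / 3) / (1 - 1 / 3), 4))  # 0.6835
```

The waveform stream has three classes; if they are roughly balanced, pc ≈ 1/3 and the reported Kappa of 0.6835 is exactly what an accuracy of 0.7890 implies.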

## Adaptive Sliding Window (ADWIN) for concept-drift detection

ADWIN (ADaptive WINdowing) keeps a window of the most recent stream values. Whenever the window can be split into an older and a newer sub-window whose means differ by more than a threshold ε (epsilon), which depends on a confidence parameter δ and on the sizes of the two sub-windows, it concludes that the data distribution has changed and drops the older sub-window. The window therefore adapts to the data: when a change is taking place, the window shrinks automatically, and when the data is stationary, it grows, which improves the accuracy of its estimates.
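For reference, the original ADWIN paper by Bifet and Gavaldà states the test with a Hoeffding-style threshold. Split the current window W of length n into an older sub-window W_0 and a newer sub-window W_1 of lengths n_0 and n_1, and let δ be the confidence parameter (the `delta` argument of `ADWIN`). Implementations, including scikit-multiflow's as far as I can tell, use a tighter variance-based variant, so treat this as an illustration of the idea rather than the exact test the library runs.

$$
m = \frac{1}{1/n_0 + 1/n_1}, \qquad
\delta' = \frac{\delta}{n}, \qquad
\varepsilon_{\mathrm{cut}} = \sqrt{\frac{1}{2m}\ln\frac{4}{\delta'}}, \qquad
\text{drop } W_0 \text{ if } \left|\hat{\mu}_{W_0} - \hat{\mu}_{W_1}\right| \ge \varepsilon_{\mathrm{cut}}
$$

As a worked example, take the δ = 0.0002 used in the ADWIN code further down, about 1000 values from the old concept and 50 from the new one:

$$
m = \frac{1}{1/1000 + 1/50} \approx 47.6, \qquad
\delta' = \frac{0.0002}{1050} \approx 1.9 \times 10^{-7}, \qquad
\varepsilon_{\mathrm{cut}} \approx \sqrt{\frac{16.86}{2 \cdot 47.6}} \approx 0.42
$$

A shift of the mean from 0 to 1 is far above this threshold. A smaller δ makes ε_cut larger, trading slower detection for fewer false alarms.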

The idea behind ADWIN is to keep statistics over a window of variable size while detecting concept drift. To try it out, I used NumPy to simulate a data stream drawn from a normal distribution, distorted part of it, and fed it to scikit-multiflow's ADWIN detector.
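To make the adaptive-window idea concrete, here is a deliberately naive sketch in NumPy. It is not scikit-multiflow's implementation: the hypothetical `NaiveAdwin` class stores every value, re-checks every split after each insertion, and drops the oldest element until no split shows a significant difference, using the Hoeffding-style threshold ε_cut = sqrt(ln(4/δ') / (2m)) with m = 1/(1/n0 + 1/n1) and δ' = δ/n from the original ADWIN paper. The real ADWIN compresses the window into buckets to stay fast and memory-efficient. The Hoeffding-style bound is derived for values in [0, 1], so on unbounded Gaussian data it is only a heuristic. The `min_sub_window` guard is my own assumption, and unlike scikit-multiflow's ADWIN, `add_element` here directly returns whether the window shrank.

```python
import numpy as np


class NaiveAdwin:
    """Hypothetical, unoptimised ADWIN-style detector.

    Stores every value and checks every split of the window, so each
    insertion costs O(n) work; the real ADWIN uses compressed buckets instead.
    """

    def __init__(self, delta=0.002, min_sub_window=5):
        self.delta = delta                    # confidence parameter
        self.min_sub_window = min_sub_window  # assumed guard: smallest sub-window compared
        self.window = []

    @property
    def width(self):
        return len(self.window)

    def add_element(self, value):
        """Append a value; return True if old data had to be dropped."""
        self.window.append(float(value))
        changed = False
        # Drop the oldest element until no split shows a significant difference.
        while self._has_cut():
            del self.window[0]
            changed = True
        return changed

    def _has_cut(self):
        n = len(self.window)
        k = self.min_sub_window
        if n < 2 * k:
            return False
        prefix = np.cumsum(self.window)
        n0 = np.arange(k, n - k + 1)                # sizes of the older sub-window W0
        n1 = n - n0                                  # sizes of the newer sub-window W1
        mean0 = prefix[n0 - 1] / n0
        mean1 = (prefix[-1] - prefix[n0 - 1]) / n1
        m = 1.0 / (1.0 / n0 + 1.0 / n1)              # harmonic mean of n0 and n1
        delta_prime = self.delta / n
        eps_cut = np.sqrt(np.log(4.0 / delta_prime) / (2.0 * m))
        return bool(np.any(np.abs(mean0 - mean1) >= eps_cut))


# Usage: the mean jumps from 0 to 1 at index 500.
rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0.0, 0.25, 500), rng.normal(1.0, 0.25, 500)])
detector = NaiveAdwin(delta=0.002)
for i, value in enumerate(stream):
    if detector.add_element(value):
        print(f"Window shrank at index {i}, new width {detector.width}")
print("Final width:", detector.width)
```

After the jump, the old values are dropped within a few dozen samples, and the window then keeps growing on the new concept.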

The code below tries to catch concept drift in a stream drawn from a normal distribution with a mean of 0 and a standard deviation of 0.25. I replaced the stream values at indices 1000 to 1999 with samples from a different normal distribution (mean 1, standard deviation 0.5). Hence, I expected the window width to decrease shortly after index 1000 (and again after index 2000, when the original distribution returns) and to increase while the stream is stationary, up to the end of the stream.

In [None]:
import numpy as np

from skmultiflow.drift_detection.adwin import ADWIN

# delta is ADWIN's confidence parameter (default 0.002); a smaller value
# means fewer false alarms but slower detection
adwin = ADWIN(delta=0.0002)
SEED = 42
np.random.seed(SEED)  # np.random.seed() returns None, so its result is not stored

# Simulate a data stream drawn from a normal distribution
mu, sigma = 0, 0.25  # mean and standard deviation
data_stream = np.random.normal(mu, sigma, 4000)

# Change the data concept for indices 1000 to 1999
mu_broken, sigma_broken = 1, 0.5
data_stream[1000:2000] = np.random.normal(mu_broken, sigma_broken, 1000)

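# Record (window width, variance, index) after every element for later inspection
width_vs_variance = []

# Add the stream elements to ADWIN one at a time and check whether drift was detected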
for idx in range(4000):

    adwin.add_element(data_stream[idx])

    if adwin.detected_change():
        print(f"Change in index {idx} for stream value {data_stream[idx]}")

    width_vs_variance.append((adwin.width, adwin.variance, idx))

Change in index 1055 for stream value 1.1031808856627254
Change in index 1087 for stream value 1.0399007420293664
Change in index 1119 for stream value 0.4800967344865579
Change in index 1151 for stream value 1.5638901253169493
Change in index 1247 for stream value 2.0981115710764
Change in index 2079 for stream value -0.1125473162128529
Change in index 2111 for stream value -0.04069822651807355
Change in index 2143 for stream value 0.060988299477125064
Change in index 2175 for stream value 0.5223846804236933
Change in index 2367 for stream value -0.4268394893982367
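The detection indices above follow a visible pattern. The first detection comes within about 55 samples of the change at index 1000 and about 80 samples of the change back at index 2000. Every detection is also reported when the number of elements seen so far is a multiple of 32. As far as I can tell from the scikit-multiflow source, ADWIN only runs its cut test once every 32 insertions, so a change can be reported up to 31 samples after it becomes detectable. The repeated detections after each change presumably come from successive checks that keep dropping old data. The snippet below only verifies the pattern on the printed indices; the 32-element check interval is my reading of the library, not something this notebook demonstrates.

```python
# Detection indices copied from the output above (0-based stream positions).
detections = [1055, 1087, 1119, 1151, 1247, 2079, 2111, 2143, 2175, 2367]

# Number of elements ADWIN had received when each change was reported.
elements_seen = [idx + 1 for idx in detections]
print(all(n % 32 == 0 for n in elements_seen))  # True

# Gaps between consecutive detections.
print([later - earlier for earlier, later in zip(detections, detections[1:])])
# [32, 32, 32, 96, 832, 32, 32, 32, 192]
```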
