# Scikit-multiflow

`scikit-multiflow` is an open-source framework for multi-output/multi-label
and stream data mining. It is developed by **Télécom ParisTech**,
**École Polytechnique** and the **University of Waikato**. For more details,
see the [framework's website](https://scikit-multiflow.github.io/).

### Classification in Data Streams

The goal is to make predictions on a data stream with the `scikit-multiflow`
framework. This notebook applies two stream classifiers to the
`Electricity` dataset:

- kNN
- Hoeffding Tree

### Evaluation

The code below evaluates both classifiers and plots the results online while
the stream is processed. It reports the current and global values of
`accuracy` and `kappa`. The "current" value of a metric measures the model's
performance on the most recent window of samples, while the "global" value
measures its performance on the whole stream seen so far.
`EvaluatePrequential` also prints an evaluation summary that includes the
`Evaluation Time`.
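
To make the difference between "current" and "global" values concrete, here is a minimal sketch in plain Python that does not depend on `scikit-multiflow`. The function names, the window size and the toy label sequences are assumptions made for illustration only; this is not how `EvaluatePrequential` computes its metrics internally.

```python
from collections import Counter, deque


def accuracy(pairs):
    """Fraction of (true, predicted) pairs whose labels agree."""
    pairs = list(pairs)
    return sum(t == p for t, p in pairs) / len(pairs)


def kappa(pairs):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    pairs = list(pairs)
    n = len(pairs)
    p_observed = accuracy(pairs)
    true_counts = Counter(t for t, _ in pairs)
    pred_counts = Counter(p for _, p in pairs)
    # Chance agreement: probability that the labels match if predictions
    # were drawn independently with the same class frequencies.
    p_chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_chance == 1.0:
        return 0.0  # degenerate case (single class); a choice made for this sketch
    return (p_observed - p_chance) / (1.0 - p_chance)


def current_and_global(y_true, y_pred, window_size=4):
    """Return, per sample, the metrics on the last window and on all samples."""
    seen = []                           # every pair so far -> "global"
    window = deque(maxlen=window_size)  # most recent pairs -> "current"
    history = []
    for pair in zip(y_true, y_pred):
        seen.append(pair)
        window.append(pair)
        history.append({
            "current_accuracy": accuracy(window),
            "global_accuracy": accuracy(seen),
            "current_kappa": kappa(window),
            "global_kappa": kappa(seen),
        })
    return history


# Toy stream: the class distribution changes halfway, the predictions do not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1]
last = current_and_global(y_true, y_pred, window_size=4)[-1]
print(last)
```

In this toy stream the global accuracy still reflects the early correct predictions, while the current accuracy drops as soon as the window contains only the changed part of the stream. This is why the two values are plotted side by side.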

In [1]:
%matplotlib notebook

In [None]:
from skmultiflow.trees import HoeffdingTreeClassifier
from skmultiflow.lazy import KNNClassifier
from skmultiflow.evaluation.evaluate_prequential import EvaluatePrequential
from skmultiflow.data.file_stream import FileStream

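# `interactive` lives in the top-level matplotlib package, so use the
# conventional `mpl` alias rather than `plt` (usually matplotlib.pyplot)
import matplotlib as mpl

mpl.interactive(True)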


# Create a stream
stream = FileStream("./elec.csv")
stream.prepare_for_use()   # Not required for v0.5.0+

# Instantiate the KNNClassifier and HoeffdingTreeClassifier
h = [
    KNNClassifier(n_neighbors=10, max_window_size=100, leaf_size=30),
    HoeffdingTreeClassifier(),
]

# Setup the evaluator
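evaluator = EvaluatePrequential(pretrain_size=1000,   # samples used for training before evaluation starts
                                max_samples=20000,    # maximum number of samples to process
                                show_plot=True,       # plot the metrics while the evaluation runs
                                metrics=['accuracy', 'kappa'],
                                batch_size=1)         # samples handled per iteration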

# Run
evaluator.evaluate(stream=stream, model=h, model_names=["kNN", "HT"])
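
Prequential evaluation is an interleaved test-then-train scheme: each sample is first used to test the model and only then to train it. The sketch below illustrates that loop in plain Python. `MajorityClassLearner` and `prequential_accuracy` are hypothetical names introduced for illustration, and the handling of `pretrain_size` and `max_samples` is an assumption of this sketch, not a description of the `EvaluatePrequential` internals.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy incremental model (hypothetical): predicts the most frequent label seen."""

    def __init__(self):
        self.counts = Counter()

    def partial_fit(self, x, y):
        # Update the label counts with a single sample.
        self.counts[y] += 1

    def predict(self, x):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]


def prequential_accuracy(samples, model, pretrain_size=0, max_samples=None):
    """Test-then-train over (x, y) pairs and return the overall accuracy."""
    samples = list(samples)
    if max_samples is not None:
        # Assumption of this sketch: the limit includes the pretraining samples.
        samples = samples[:max_samples]

    # Warm-up phase: train only, no evaluation.
    for x, y in samples[:pretrain_size]:
        model.partial_fit(x, y)

    correct = 0
    tested = 0
    for x, y in samples[pretrain_size:]:
        if model.predict(x) == y:  # 1) test on the unseen sample
            correct += 1
        tested += 1
        model.partial_fit(x, y)    # 2) then train on the same sample
    return correct / tested if tested else 0.0


# Toy stream of (features, label) pairs; the features are ignored by the toy model.
toy_stream = [((0,), label) for label in ["a", "a", "b", "a", "b", "b"]]
score = prequential_accuracy(toy_stream, MajorityClassLearner(), pretrain_size=2)
print(score)
```

Because every sample is tested before the model has trained on it, no separate hold-out set is needed, which suits streams where data arrives continuously.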