# Scikit-multiflow

`scikit-multiflow` is an open-source framework for multi-output/multi-label and stream data mining. It is being developed by **Télécom ParisTech** and **École Polytechnique**.  
For more details, please visit the page of the [framework](https://scikit-multiflow.github.io/).

### Classification in Data Streams

The goal is to make predictions on a data stream using the `scikit-multiflow` framework. In this notebook we will use two data stream classifiers on the Electricity dataset:
- kNN
- Hoeffding Tree

### Evaluation

The following code evaluates the classifiers and plots the results while the stream is being processed. It computes the current and global values of `accuracy` and `kappa`. The current value of a metric measures the model's performance on the most recent window of samples, and the global value measures its performance on the whole data stream seen so far.
`EvaluatePrequential` also gives a summary of the evaluation, including the `Evaluation Time`.
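
To make the difference between current and global values concrete, here is a minimal, library-independent sketch of prequential (test-then-train) evaluation. It is not how `EvaluatePrequential` is implemented. The `MajorityClass` model, the `prequential` helper, the window size of 200 samples, and the toy stream are all assumptions made for illustration. Kappa is computed here as Cohen's kappa, that is, the observed agreement corrected for the agreement expected by chance.

```python
from collections import Counter, deque


def accuracy(pairs):
    """Fraction of (y_true, y_pred) pairs where the prediction is correct."""
    return sum(t == p for t, p in pairs) / len(pairs)


def kappa(pairs):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(pairs)
    p_observed = accuracy(pairs)
    true_counts = Counter(t for t, _ in pairs)
    pred_counts = Counter(p for _, p in pairs)
    p_chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_chance == 1.0:
        # Only one class was seen and predicted; returning 1.0 is a
        # convention chosen for this sketch.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


class MajorityClass:
    """Hypothetical toy model: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Before any training, fall back to label 0.
        return self.counts.most_common(1)[0][0] if self.counts else 0

    def partial_fit(self, x, y):
        self.counts[y] += 1


def prequential(stream, model, window_size=200):
    """Test-then-train loop over a non-empty stream of (x, y) samples."""
    seen = []                           # every (y_true, y_pred): global metrics
    window = deque(maxlen=window_size)  # most recent pairs only: current metrics
    for x, y in stream:
        y_pred = model.predict(x)       # 1. test on the new sample first
        seen.append((y, y_pred))
        window.append((y, y_pred))
        model.partial_fit(x, y)         # 2. then train on that same sample
    return {
        "current_accuracy": accuracy(window),
        "current_kappa": kappa(window),
        "global_accuracy": accuracy(seen),
        "global_kappa": kappa(seen),
    }


# Toy stream (made-up data): 300 samples of class 0, then 100 of class 1.
toy_stream = [((i,), 0 if i < 300 else 1) for i in range(400)]
print(prequential(toy_stream, MajorityClass()))
```

On this toy stream the model keeps predicting class 0 after the labels change, so the current accuracy (last 200 samples) drops below the global accuracy. Kappa stays at 0 in both cases because a constant prediction does no better than chance, which is why it is useful to track alongside accuracy.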

In [1]:
%matplotlib notebook

In [None]:
from skmultiflow.trees import HoeffdingTree
from skmultiflow.evaluation.evaluate_prequential import EvaluatePrequential
from skmultiflow.lazy.knn_adwin import KNN
from skmultiflow.data.file_stream import FileStream
import matplotlib.pyplot as plt

# Turn on interactive mode so the plot updates while the evaluation runs
plt.ion()


# 1. Create a stream
stream = FileStream("./elec.csv", n_targets=1, target_idx=-1)

# 2. Prepare for use
stream.prepare_for_use()

# 3. Instantiate the classifiers: kNN and Hoeffding Tree
h = [
    KNN(n_neighbors=10, max_window_size=100, leaf_size=30),
    HoeffdingTree(),
]

# 4. Set up the evaluator
evaluator = EvaluatePrequential(pretrain_size=1000, max_samples=20000, show_plot=True, 
                                metrics=['accuracy', 'kappa'], 
                                batch_size=1)

# 5. Run
evaluator.evaluate(stream=stream, model=h)