# Advanced model performance visualization with `Holoviews` (for clustering problems)

This notebook demonstrates how to build a real-time visualization for clustering problems with `Holoviews`. Running it requires the following libraries:

* `holoviews`: HoloViews is an open-source Python library designed to make data analysis and visualization seamless and simple.
* `panel`: Panel is an open-source Python library that lets you create custom interactive web apps and dashboards by connecting user-defined widgets to plots, images, tables, or text.
* `streamz`: Streamz helps you build pipelines to manage continuous streams of data. It is simple to use in simple cases, but also supports complex pipelines that involve branching, joining, flow control, feedback, back pressure, and so on.

In addition to `river`, `river-extra` should also be installed. This package contains additional estimators that have not been merged into the main `river` package. It currently provides a range of clustering metrics in the `metrics.cluster` submodule; these are useful, although less commonly used than the metrics already available in the main package.

All of these libraries can be installed via `pip`.

## Problems with using classification metrics for clustering problems

The principal problem with using classification metrics for clustering problems is that these metrics cannot account for the fact that cluster labels are interchangeable: permuting the labels of the groups/clusters does not change the clustering itself. We consider a widely used metric, `F1`, as an example.

In [None]:
from river import metrics

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 2, 2, 0, 0]

metric = metrics.FBeta(beta=1)  # When `beta` equals 1, this metric is equivalent to the F1 score.
for yt, yp in zip(y_true, y_pred):
    metric.update(yt, yp)  # Updates in place; do not rely on the return value
    
metric

The F1 score for this clustering result is $0\%$, which makes no sense, since the predicted result is merely a permutation of the actual labels. It is therefore essential to use clustering-specific metrics. In `River`, we have implemented the largest family of these metrics among currently available Python packages for conventional/online machine learning, all in a fully incremental fashion.
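
To make the permutation argument concrete, the sketch below computes the Rand index by hand on the same `y_true` and `y_pred`. It is a plain batch version written for illustration (the helper name `rand_index` is hypothetical), not River's incremental implementation. The score only asks whether each pair of samples is grouped together or apart, so the label names never matter.

```python
from itertools import combinations


def rand_index(y_true, y_pred):
    """Fraction of sample pairs on which two labelings agree.

    A pair "agrees" when both labelings put the two samples in the same
    group, or both put them in different groups. Label names never enter
    the computation, so permuting them cannot change the score.
    """
    pairs = list(combinations(range(len(y_true)), 2))
    agreements = sum(
        (y_true[i] == y_true[j]) == (y_pred[i] == y_pred[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 2, 2, 0, 0]

print(rand_index(y_true, y_pred))  # 1.0: the two partitions are identical
```

This pairwise loop is quadratic in the number of samples, so it is only suitable for small examples like this one.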

The list of implemented metrics includes (but is not limited to):

- **EXTERNAL CLUSTERING METRICS**: These metrics require external data, i.e. the ground truth or true labels. In other words, they measure the correlation, or agreement, between the predicted and actual results. These metrics usually lie within the range $[0, 1]$, and the higher the metric, the better the clustering result.

    This class of metrics includes the following:
    
    - Fowlkes-Mallows Index
    - Jaccard Index
    - Matthews Correlation Coefficient
    - Prevalence Threshold
    - Purity
    - Q0 and Q2 Indices
    - (Adjusted) Rand Index
    - Variation of Information
    - VBeta (including Homogeneity and Completeness)


- **INTERNAL CLUSTERING METRICS**: As the name suggests, this class of metrics uses only the internal information generated by the clustering problem, including the positions of cluster centers, predicted labels, distances from the new data point to cluster centers, etc., without the need for any external information. These metrics do not have a definitive range; however, the lower these metrics are, the better the clustering result normally is.

    This class of metrics includes the following:

    - Bayesian Information Criterion (BIC)
    - Davies-Bouldin (DB) Index
    - Generalized Dunn's indices (GD43 and GD53)
    - (Root) Mean Squared Standard Deviation
    - SD Validity Index
    - Separation
    - Silhouette
    - Sum of Squares Between Clusters (including Calinski-Harabasz (CH) Index, Hartigan Index (H-Index) and WB Index)
    - Sum of Squares Within Clusters
    - Xie-Beni index
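
As a rough illustration of how an internal metric can be maintained incrementally, the sketch below accumulates the squared distance from each arriving point to its assigned cluster center and reports the average. `RunningSSW` and its `update` signature are hypothetical and are not River's API; the centers are also kept fixed here for simplicity, whereas in a real online setting they move as the model learns.

```python
class RunningSSW:
    """Toy incremental within-cluster sum of squares, averaged per point.

    Hypothetical helper for illustration only; it is not River's class and
    its `update` signature is an assumption.
    """

    def __init__(self):
        self.total = 0.0  # accumulated squared distances
        self.n = 0        # number of points seen

    def update(self, x, label, centers):
        # Squared Euclidean distance from the point to its assigned center
        center = centers[label]
        self.total += sum((x[k] - center[k]) ** 2 for k in x)
        self.n += 1

    def get(self):
        # Average squared distance; lower means tighter clusters
        return self.total / self.n if self.n else 0.0


# Assumed fixed centers and four points with their predicted labels
centers = {0: {"a": 1.0, "b": 0.0}, 1: {"a": 5.0, "b": 6.0}}
points = [
    ({"a": 0.0, "b": 0.0}, 0),
    ({"a": 2.0, "b": 0.0}, 0),
    ({"a": 5.0, "b": 5.0}, 1),
    ({"a": 5.0, "b": 7.0}, 1),
]

ssw = RunningSSW()
for x, label in points:
    ssw.update(x, label, centers)

print(ssw.get())  # 1.0: every point is at squared distance 1 from its center
```

No true labels are involved anywhere, which is what distinguishes this family from the external metrics.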

Now, we will consider another example to see the difference between a classification-specific metric and a clustering-specific metric: in this case, the Cohen-Kappa index and the VBeta index.

In [None]:
from river import metrics

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 2, 2, 0, 0]

kappa = metrics.CohenKappa()
vbeta = metrics.VBeta(beta=1.0)

for yt, yp in zip(y_true, y_pred):
    kappa.update(yt, yp)  # Update in place; do not rely on the return value
    vbeta.update(yt, yp)
    
print(kappa)
print(vbeta)

In this case, we can easily see that VBeta does a better job of capturing the agreement between `y_true` and `y_pred`, while CohenKappa returns a negative value that has no meaningful interpretation.
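
For readers who want to check these two numbers by hand, here is a minimal batch sketch of both scores using only the standard library. The function names are hypothetical and the code follows the textbook definitions (chance-corrected agreement for kappa, entropy-based homogeneity and completeness for the V-measure); it is not River's incremental implementation and may differ from it in edge cases.

```python
import math
from collections import Counter


def cohen_kappa(y_true, y_pred):
    n = len(y_true)
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Agreement expected by chance if labels were assigned independently
    p_expected = sum(true_counts[k] * pred_counts[k] for k in true_counts) / n**2
    return (p_observed - p_expected) / (1 - p_expected)


def entropy(a):
    n = len(a)
    return -sum(c / n * math.log(c / n) for c in Counter(a).values())


def conditional_entropy(a, b):
    """H(a | b) for two label sequences of equal length."""
    n = len(a)
    joint, b_counts = Counter(zip(a, b)), Counter(b)
    return -sum(c / n * math.log(c / b_counts[bk]) for (_, bk), c in joint.items())


def v_measure(y_true, y_pred, beta=1.0):
    h_true, h_pred = entropy(y_true), entropy(y_pred)
    homogeneity = 1.0 if h_true == 0 else 1 - conditional_entropy(y_true, y_pred) / h_true
    completeness = 1.0 if h_pred == 0 else 1 - conditional_entropy(y_pred, y_true) / h_pred
    if homogeneity + completeness == 0:
        return 0.0
    return (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 2, 2, 0, 0]

print(round(cohen_kappa(y_true, y_pred), 3))  # -0.5
print(v_measure(y_true, y_pred))              # 1.0
```

Kappa is negative because no label matches literally (observed agreement is 0) while chance agreement is 1/3. The V-measure is perfect because each predicted cluster contains exactly one true class and vice versa.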

Next, we will visualize clustering algorithms in depth using `Holoviews`. River currently implements a total of 6 clustering algorithms (5 camera-ready, 1 pending), based on different methods:

- CluStream
- DenStream
- DBSTREAM
- Incremental KMeans
- STREAMKMeans
- EvoStream (a fairly new clustering method, developed based on evolutionary algorithms)

To date, River is also the package that includes the largest number of clustering algorithms within a unified framework.

In [None]:
import time
import itertools
from functools import reduce
from collections import namedtuple, defaultdict

import river
from river import cluster
from river.metrics.report import ClassificationReport
from river.tree import (
    HoeffdingTreeClassifier,
    HoeffdingAdaptiveTreeClassifier,
    SGTClassifier,
    ExtremelyFastDecisionTreeClassifier,
)
from river.stream import iter_pandas

import pandas as pd

import panel as pn

import streamz
import streamz.dataframe

import holoviews as hv
from holoviews.streams import Buffer
from holoviews import opts

import copy

hv.extension('bokeh')

opts.defaults(opts.Curve(width=900, height=350, show_grid=True, tools=['hover'], framewise=True))
opts.defaults(opts.Table(width=895, height=100))

In [None]:
hv.extension('bokeh')

window_size = 1000

metrics = [river.metrics.MutualInfo(), river.metrics.FowlkesMallows()]
rolling_metrics = [river.utils.Rolling(metric, window_size=window_size) 
                   for metric in metrics]
metric_names = [metric.__class__.__name__ for metric in metrics]

TrackedModel = namedtuple('TrackedModel', ['model', 'rolling_metrics', 'metrics'])

tracked_models = [
    TrackedModel(cluster.DenStream(decaying_factor=0.01, beta=0.5, mu=2.5, epsilon=0.5, n_samples_init=10),
                 copy.deepcopy(rolling_metrics), copy.deepcopy(metrics)),
    TrackedModel(cluster.DBSTREAM(clustering_threshold=1.5, fading_factor=0.05, cleanup_interval=10,
                                  intersection_factor=0.5, minimum_weight=1),
                 copy.deepcopy(rolling_metrics), copy.deepcopy(metrics))
]
n_models = len(tracked_models)

model_names = [item.model.__class__.__name__ for item in tracked_models]

# metrics = ['acc', 'kappa']

# This section creates the streaming dataframe
df = pd.DataFrame([], columns=['model', 'metric', 'sample', 'current', 'mean']).set_index('sample')
streaming_df = streamz.dataframe.DataFrame(streamz.Stream(), example=df)

# Create dictionary to store plot items
metric_dict = defaultdict(list)

PlotItem = namedtuple('PlotItem', ['model_id', 'curve_1', 'curve_2'])

# This section creates the DynamicMap objects (curves and tables)
for model_name, metric_name in itertools.product(model_names, metric_names):
    # Create one pair of DynamicMaps per model/metric combination:
    # `curve_1` plots the rolling ("current") score and
    # `curve_2` plots the cumulative ("mean") score.
    item = PlotItem(
        model_id=model_name,
        curve_1=hv.DynamicMap(hv.Curve, streams=[
            Buffer(streaming_df[(streaming_df['model']==model_name) & (streaming_df['metric']==metric_name)]['current'])]),
        curve_2=hv.DynamicMap(hv.Curve, streams=[
            Buffer(streaming_df[(streaming_df['model']==model_name) & (streaming_df['metric']==metric_name)]['mean'])])
    )
    item.curve_2.opts(opts.Curve(line_dash='dashed'))

    metric_dict[metric_name].append(item)

# This section overlays the curves of each metric
# (e.g. the current and mean scores of every model) and
# groups each overlay with its respective table
# into the appropriate panel tab.

# Default variables.
layout_tabs = pn.Tabs()

for metric, metric_elements in metric_dict.items():
    curves = []
    for i, item in enumerate(metric_elements):
        curves.append(item.curve_1.relabel(f'current_{i}'))
        curves.append(item.curve_2.relabel(f'mean_{i}'))
    table=hv.DynamicMap(hv.Table, streams=[
        Buffer(streaming_df[streaming_df['metric']==metric][['model', 'current', 'mean']], length=n_models)
    ])
    overlayed_dmap = reduce((lambda x, y: x * y), curves)
    overlayed_dmap.opts(legend_position='right', legend_offset=(20, 0), xlabel='sample', ylabel='score')
    overlayed_dmap.opts(hv.opts.Curve(color=hv.Cycle('Category20'), bgcolor='#fafafa'))
    overlayed_dmap = pn.pane.HoloViews(overlayed_dmap, linked_axes=False)
    dmap_layout = pn.Column(overlayed_dmap, table)
    layout_tabs.append((metric, dmap_layout))      

# This variable is the layout of the tabs of metrics.
layout_tabs

In [None]:
def run_experiment(dfstream, models):
    n_wait = 100

    # Load data
    data = pd.read_csv("../datasets/agr_a_20k.csv")
    features = data.columns[:-2]
    stream = iter_pandas(X=data[features], y=data['class'])
    

    for sample_cnt, (x, y_true) in enumerate(stream):
        for component in models:
            y_pred = component.model.predict_one(x)
            for metric in component.metrics:
                metric.update(y_true, y_pred)
            for rolling_metric in component.rolling_metrics:
                rolling_metric.update(y_true, y_pred)
            # Clustering is unsupervised: the true label is used for evaluation only
            component.model.learn_one(x)
        
        if (sample_cnt + 1) % n_wait == 0:
            results = []
            for component in models:
                model_id = component.model.__class__.__name__
                # `metrics` and `rolling_metrics` are ordered like `metric_names`
                for idx, metric_name in enumerate(metric_names):
                    mean_score = component.metrics[idx].get()
                    curr_score = component.rolling_metrics[idx].get()
                    results.append((model_id, metric_name, sample_cnt + 1, curr_score, mean_score))
            dfstream.emit(
                pd.DataFrame(results,
                             columns=['model', 'metric', 'sample', 'current', 'mean']).set_index('sample'))
            

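    # Reset DF. Note: these names are local to the function, so the assignments
    # below do not affect the module-level `streaming_df` that the plots read from.
    df_metrics = pd.DataFrame(index=[], columns=['model', 'metric', 'sample', 'current', 'mean']).set_index('sample')
    streaming_df = streamz.dataframe.DataFrame(streamz.Stream(), example=df_metrics)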

In [None]:
run_experiment(streaming_df, tracked_models)
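
In `run_experiment`, the `mean` column comes from metrics that accumulate over the whole stream, while the `current` column comes from the same metrics restricted to the last `window_size` samples. The toy sketch below illustrates that difference with a plain average instead of a clustering metric; `CumulativeMean` and `RollingMean` are hypothetical helpers written for this example, not River classes.

```python
from collections import deque


class CumulativeMean:
    """Average over every value seen so far (the 'mean' curve)."""

    def __init__(self):
        self.total, self.n = 0.0, 0

    def update(self, value):
        self.total += value
        self.n += 1

    def get(self):
        return self.total / self.n if self.n else 0.0


class RollingMean:
    """Average over the most recent `window_size` values (the 'current' curve)."""

    def __init__(self, window_size):
        # A bounded deque drops the oldest value automatically
        self.window = deque(maxlen=window_size)

    def update(self, value):
        self.window.append(value)

    def get(self):
        return sum(self.window) / len(self.window) if self.window else 0.0


cumulative, rolling = CumulativeMean(), RollingMean(window_size=3)

# Scores drop abruptly halfway through the stream
for score in [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]:
    cumulative.update(score)
    rolling.update(score)

print(cumulative.get())  # 0.5: still influenced by the early good scores
print(rolling.get())     # 0.0: reflects only the last three samples
```

This is why the solid `current` curves react faster to changes in the stream than the dashed `mean` curves.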