# Stability analysis of the model

The objective now is to select the best model and to check that it remains valid over time.

How can we evaluate our model choice? This is the protocol I will follow for each of the selected time periods: month, trimester, semester.

1. Train the best model on the first period of time, __m<sup>1</sup>__
2. Train the best model on the _n<sup>th</sup>_ period of time, __m<sup>n</sup>__
3. At each iteration, compare the clusterings produced by __m<sup>1</sup>__ and __m<sup>n</sup>__ using the ARI (Adjusted Rand Index) score
4. Plot the scores and decide over which period of time the model remains stable
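
The protocol above can be sketched roughly as follows. This is only an illustration under several assumptions that are not stated in the text: KMeans stands in for "the best model", the _n<sup>th</sup>_ model is refit on all data observed up to period _n_, both models label the same customers before the ARI is computed, and the data is synthetic. The name `stability_scores` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def stability_scores(X, period, n_clusters=4, random_state=0):
    """ARI between the first-period model and a model refit at each later period.

    X      : (n_samples, n_features) array of already scaled features
    period : (n_samples,) integer array, 1 for the first period, 2 for the next...
    """
    # Step 1: fit the reference model on the first period only.
    first = X[period == 1]
    reference = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=random_state
    ).fit(first)

    scores = {}
    for n in range(2, int(period.max()) + 1):
        # Step 2 (assumption): refit on all data observed up to period n.
        current = X[period <= n]
        refit = KMeans(
            n_clusters=n_clusters, n_init=10, random_state=random_state
        ).fit(current)
        # Step 3: label the same customers with both models and compare.
        scores[n] = adjusted_rand_score(
            reference.predict(current), refit.predict(current)
        )
    return scores


# Synthetic stand-in for the customer features (not the project's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 3))
period = np.repeat(np.arange(1, 7), 100)
scores = stability_scores(X, period)
```

For step 4, `scores` can be plotted against the period index; the model would be retrained once the ARI falls below a threshold chosen for the project.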

These are the results of the experiments:

| Clustering   | Silhouette Score | Clusters | Model |
|--------------|------------------|----------|-------|
| KMeans       | 0.35             | 4        | RFM   |
| CAH          | 0.43             | 5        | RFM   |
| DBSCAN       | 0.64             | 2        | RFM   |
| KMeans       | 0.30             | 6        | RFMS  |
| K-Prototypes | 0.30             | 4        | RFMS  |
| DBSCAN       | 0.61             | 2        | RFMS  |
| CAH          | 0.34             | 2        | RFMS  |
| KMeans       | 0.35             | 7        | RMS   |
| DBSCAN       | 0.76             | 2        | RMS   |
| CAH          | 0.42             | 2        | RMS   |


Based on the experiments so far, DBSCAN achieves the highest silhouette scores, but it only finds 2 clusters, which is too few to work with.
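
One possible way to read the table with that constraint in mind is to drop the two-cluster results and rank the rest by silhouette score. The sketch below copies the values from the table; the column names and the "more than two clusters" filter are assumptions made for illustration, not necessarily the selection rule used in this analysis.

```python
import pandas as pd

# Values copied from the results table above.
results = pd.DataFrame(
    [
        ("KMeans", 0.35, 4, "RFM"),
        ("CAH", 0.43, 5, "RFM"),
        ("DBSCAN", 0.64, 2, "RFM"),
        ("KMeans", 0.30, 6, "RFMS"),
        ("K-Prototypes", 0.30, 4, "RFMS"),
        ("DBSCAN", 0.61, 2, "RFMS"),
        ("CAH", 0.34, 2, "RFMS"),
        ("KMeans", 0.35, 7, "RMS"),
        ("DBSCAN", 0.76, 2, "RMS"),
        ("CAH", 0.42, 2, "RMS"),
    ],
    columns=["clustering", "silhouette", "clusters", "model"],
)

# Keep only segmentations with more than two clusters, best silhouette first.
usable = results[results["clusters"] > 2].sort_values("silhouette", ascending=False)
best = usable.iloc[0]
print(usable)
```

With this filter, CAH on RFM (silhouette 0.43, 5 clusters) ranks first. Whether it is the model to carry forward depends on more than the silhouette score alone.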

```python
import pandas as pd
import plotly.graph_objects as go


def plot_clusters_radars(df: pd.DataFrame) -> None:
    """
    Display a radar for every cluster

    :param df: the dataframe with the clusters; "cluster" is expected to be
        the first column, followed by the variables scaled to [0, 1]
    :return: None
    """
    fig = go.Figure()

    # Iterate over distinct labels so that each cluster gets exactly one trace
    for cluster in df["cluster"].unique():
        fig.add_trace(
            go.Scatterpolar(
                # Every column after "cluster" becomes one axis of the radar
                r=df[df["cluster"] == cluster].iloc[:, 1:].values.reshape(-1),
                theta=df.columns[1:],
                fill="toself",
                name="Cluster " + str(cluster),
            )
        )

    fig.update_layout(
        # The radial axis is fixed to [0, 1], so values outside it are clipped
        polar=dict(radialaxis=dict(visible=True, range=[0, 1])),
        showlegend=True,
        title={
            "text": "Mean Comparison of Variables per Cluster",
            "y": 0.95,
            "x": 0.5,
            "xanchor": "center",
            "yanchor": "top",
        },
        title_font_color="blue",
        title_font_size=18,
    )

    fig.show()
```
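
Judging from the code, `plot_clusters_radars` seems to expect one row per cluster, with `cluster` as the first column and every other value in the range [0, 1]. A minimal sketch of how such a frame could be prepared is shown below. The helper name `cluster_profile`, the min-max scaling, and the toy values are all assumptions made for illustration.

```python
import pandas as pd


def cluster_profile(features: pd.DataFrame, labels) -> pd.DataFrame:
    """Return one row per cluster with min-max scaled feature means."""
    # Scale every feature to [0, 1] so that all radar axes share the same range.
    span = (features.max() - features.min()).replace(0, 1)
    scaled = (features - features.min()) / span
    # Average the scaled features per cluster; "cluster" ends up as first column.
    return scaled.assign(cluster=list(labels)).groupby("cluster").mean().reset_index()


# Toy values for illustration only (not the project's data).
features = pd.DataFrame(
    {
        "recency": [10, 200, 30, 180],
        "frequency": [5, 1, 4, 1],
        "monetary": [500, 50, 450, 80],
    }
)
profile = cluster_profile(features, labels=[0, 1, 0, 1])
# plot_clusters_radars(profile) would then draw one polygon per cluster.
```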