# Covariance matrix change-point detection under graph stationarity assumption


## Problem formulation

Let $y = (y_1, \ldots, y_t, \ldots, y_T)$, with $y_t \in \mathbb{R}^{N}$, be a graph signal lying on the nodes of the graph $G = (V, E, W)$, with $N = |V|$.

We aim to detect changes in the (spatial) covariance matrix $\Sigma_t$ of the graph signals $y_t$. We assume that there exists an unknown set of change-points $\mathcal{T} = (t_1, \ldots, t_K) \subset [1, T]$, of unknown cardinality, such that the covariance matrix of the graph signals is constant over any segment $[t_k, t_{k+1}]$. We make the following hypotheses:

1. the signals $y_t$ follow a multivariate Gaussian distribution with a fixed covariance matrix over each segment and a known mean $\mu$, i.e.:
$$\forall k \in [1, K] ~ \forall t \in [t_k, t_{k+1}] \quad y_t \sim \mathcal{N}(\mu, \Sigma_k)$$

2. over each segment, the signals $y_t$ satisfy second-order (wide-sense) graph stationarity:
$$\forall k \in [1, K] \quad \Sigma_k = U \text{diag}(\gamma_k)U^T $$

where the matrix $U$ contains the eigenvectors of the graph combinatorial Laplacian matrix $L = D - W$ in its columns. 

The Graph Fourier Transform $\tilde{y}$ of a signal $y$ is defined by $\tilde{y} = U^T y $.
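To make the two definitions above concrete, here is a minimal sketch on a toy graph. The 4-node path graph, the spectrum `gamma` and the sample size are illustrative assumptions, not values used in the experiments. It builds $L = D - W$, takes $U$ from its eigendecomposition, draws graph-stationary Gaussian signals, and checks that the covariance of the GFT coefficients is approximately diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy graph: a 4-node path with unit weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W  # combinatorial Laplacian

# Columns of U are orthonormal eigenvectors of L.
_, U = np.linalg.eigh(L)

# Graph-stationary covariance: Sigma = U diag(gamma) U^T (gamma chosen arbitrarily).
gamma = np.array([0.5, 1.0, 2.0, 4.0])
Sigma = U @ np.diag(gamma) @ U.T
y = rng.multivariate_normal(np.zeros(4), Sigma, size=20000)  # shape (T, N)

# GFT of every sample at once: row t holds U^T y_t.
y_tilde = y @ U

# The empirical covariance of the GFT coefficients should be close to diag(gamma).
cov_tilde = np.cov(y_tilde, rowvar=False)
print(np.round(cov_tilde, 2))
```

Since the samples are stored as rows, `y @ U` applies $U^T$ to each $y_t$; this is the same layout the cost class below relies on.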


Based on the above assumptions, the cost derived from the maximum log-likelihood over a segment $[a, b-1]$ reads:

\begin{align*}
    c_s(y_{a}, \ldots, y_{b-1}) = ~ & (b - a) \sum_{n=1}^N \log \hat{\gamma}_{a..b}^{(n)} ~ + ~ \sum_{t=a}^{b-1} \sum_{n=1}^N \frac{\left(\tilde{y}_t^{(n)} - \hat{\tilde{\mu}}_T^{(n)}\right)^2}{\hat{\gamma}_{a..b}^{(n)}} = ~ (b - a) \sum_{n=1}^N \log \hat{\gamma}_{a..b}^{(n)} ~ + N(b-a)
\end{align*}

where:

- $\hat{\mu}_{T}$ is the empirical mean of the process over $[0, T]$, and $\hat{\tilde{\mu}}_T = U^T \hat{\mu}_T$ is its Graph Fourier Transform
- $\hat{\gamma}_{a..b}^{(n)}$ is the $n$-th coefficient of the (empirical) biased correlogram/periodogram of the process over $[a, b-1]$: $\hat{\gamma}_{a..b}^{(n)} = \frac{1}{b-a} \sum_{t=a}^{b-1} \left(\tilde{y}_t^{(n)} - \hat{\tilde{\mu}}_T^{(n)}\right)^2$
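
The second equality in the cost follows directly from the definition of the periodogram. Spelled out, by swapping the two sums:

\begin{align*}
    \sum_{t=a}^{b-1} \sum_{n=1}^N \frac{\left(\tilde{y}_t^{(n)} - \hat{\tilde{\mu}}_T^{(n)}\right)^2}{\hat{\gamma}_{a..b}^{(n)}}
    = ~ & \sum_{n=1}^N \frac{1}{\hat{\gamma}_{a..b}^{(n)}} \sum_{t=a}^{b-1} \left(\tilde{y}_t^{(n)} - \hat{\tilde{\mu}}_T^{(n)}\right)^2 \\
    = ~ & \sum_{n=1}^N \frac{(b-a) ~ \hat{\gamma}_{a..b}^{(n)}}{\hat{\gamma}_{a..b}^{(n)}}
    = ~ N(b-a)
\end{align*}

Summed over the segments of any segmentation, this term equals $N$ times the total number of samples, whatever the change-points are. It therefore plays no role in the minimization, which is why the implementation below only evaluates the $\log$ term.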

### Computation of the cost

In [None]:
import numpy as np

from scipy.linalg import eigh
from ruptures.base import BaseCost

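class CostGraphStatioNormal(BaseCost):

    """Cost function for covariance change-point detection in Gaussian signals
    under the graph stationarity assumption."""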

    model = "graph_sationary_normal_cost"

    def __init__(self, laplacian_mat) -> None:
        """
        Args:
            laplacian_mat (array): the discrete Laplacian matrix of the graph: D - W
            where D is the diagonal matrix diag(d_i) of the node degrees and W the adjacency matrix
        """
        self.graph_laplacian_mat = laplacian_mat
        self.signal = None
        self.gft_square_cumsum = None
        self.gft_mean = None
        self.min_size = laplacian_mat.shape[0]
        super().__init__()
    
    def fit(self, signal):
        """Performs pre-computations for per-segment approximation cost.

        NOTE: the number of dimensions of the signal and their ordering
        must match those of the nodes of the graph.
        The function eigh used below returns the eigenvector corresponding to 
        the ith eigenvalue in the ith column eigvect[:, i]

        Args:
            signal (array): of shape [n_samples, n_dim].
        """
        self.signal = signal
        # Computation of the Graph Fourier Transform (GFT) of the signal
        _, eigvects = eigh(self.graph_laplacian_mat)
        gft =  signal @ eigvects # equals signal.dot(eigvects) = eigvects.T.dot(signal.T).T
        self.gft_mean = np.mean(gft, axis=0)
        # Computation of the per-segment cost utils
        self.gft_square_cumsum = np.concatenate([np.zeros((1, signal.shape[1])), np.cumsum((gft - self.gft_mean[None, :])**2, axis=0)], axis=0)
        return self

    def error(self, start, end):
        """Returns the cost of the segment [start, end).

        Args:
            start (int): start of the segment
            end (int): end of the segment

        Returns:
            float: segment cost
        """
        if end - start < self.min_size:
            raise ValueError(f'end - start should be at least {self.min_size}')
        sub_square_sum = self.gft_square_cumsum[end] - self.gft_square_cumsum[start]
        return (end  - start) * np.sum(np.log(sub_square_sum / (end - start)))
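
The cumulative-sum pre-computation used in `fit` makes every call to `error` independent of the segment length. The sketch below checks, with NumPy only and on hypothetical random data standing in for the centred GFT coefficients, that the cumulative-sum version returns the same value as a direct evaluation of $(b - a) \sum_n \log \hat{\gamma}_{a..b}^{(n)}$. The helper names `cost_from_cumsum` and `cost_direct` are illustrative and not part of the class.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 200, 5

# Stand-in for the centred GFT coefficients (hypothetical data, shape (T, N)).
gft_centred = rng.normal(size=(T, N))

# Same pre-computation as in fit: a leading row of zeros, then cumulative sums.
square_cumsum = np.concatenate(
    [np.zeros((1, N)), np.cumsum(gft_centred**2, axis=0)], axis=0
)

def cost_from_cumsum(start, end):
    # Sum of squares over [start, end) obtained as a difference of two rows.
    seg_sum = square_cumsum[end] - square_cumsum[start]
    return (end - start) * np.sum(np.log(seg_sum / (end - start)))

def cost_direct(start, end):
    # Periodogram estimated from scratch on the segment [start, end).
    gamma_hat = np.mean(gft_centred[start:end] ** 2, axis=0)
    return (end - start) * np.sum(np.log(gamma_hat))

print(cost_from_cumsum(30, 90), cost_direct(30, 90))
```

The leading row of zeros is what lets `square_cumsum[end] - square_cumsum[start]` cover exactly the samples `start, ..., end - 1`, including when `start = 0`.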


## Experiments: synthetic data

### Observation on the minimum distance between consecutive change points

We require that the different change points $(t_1, \ldots, t_K)$ verify:

$$|t_{k+1} - t_k| \geq l ~ \forall k \in [1, K-1] $$

where $l$ can be seen as the minimum segment length. In this paragraph we give a meaningful lower bound for this parameter. This lower bound is related to the computation of the cost functions over the segments $[a, b] \subset [0, T]$, namely the graph stationary normal cost function $c_s$ described above and the standard normal cost function $c_n$:

- $ c_n(y_{a}, \ldots, y_{b-1}) = (b - a) \log  \left[ \det \left( \hat \Sigma_{a..b} \right) \right]$
- $ c_s(y_{a}, \ldots, y_{b-1}) = (b - a) \sum_{n=1}^N \log \hat{\gamma}_{a..b}^{(n)} $

Based on the formula of the periodogram $\hat{\gamma}_{a..b}$ given in the problem formulation, there are no numerical constraints on the feasibility of the computation of $c_s$. However, the $\log [ \det ( \cdot ) ]$ function used in the formula for $c_n$ is only defined for invertible matrices $\hat \Sigma_{a..b}$. We therefore focus on the conditions under which the matrix:

$$ \hat \Sigma_{a..b} = \frac{1}{b-a} \sum_{t=a}^{b-1} (y_t - \hat{\mu}_T) (y_t - \hat{\mu}_T)^T \quad \text{ with } y_t \sim \mathcal{N}(\mu, \Sigma)  $$

is invertible. Such conditions have already been established in Random Matrix Theory (RMT). For instance, it is shown in [[Izenman2008](#Izenman2008)] that $(b-a) \hat \Sigma_{a..b} \sim \mathcal{W}(b-a, \Sigma)$, i.e. it follows the Wishart distribution. In this framework, it is possible to study the distribution of the eigenvalues of $\hat \Sigma_{a..b}$ and to deduce that:

$$ \text{If } b-a \geq N \text{ with } N  \text{ the dimension of } y_t, \text{ then } \hat \Sigma_{a..b} \text{ is almost surely invertible }   $$

Conversely, if $ b-a < N $ (the number of observations is lower than the number of variables), the matrix $\hat \Sigma_{a..b}$ is not invertible: it is a sum of $b-a$ rank-one matrices, so its rank is at most $b-a < N$.

Thus, the right lower bound should be $\boxed{l = N}$, which is consistent with the statement in [[Ryan2023](#Ryan2023)].

Note: with segments of length $l$, one is not guaranteed to obtain good estimates of the covariance matrix, but the computation is at least almost surely admissible.
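
The role of the segment length can be observed numerically. The sketch below is an illustration under simplifying assumptions (zero-mean, identity-covariance Gaussian samples, a hypothetical dimension `N = 8`, and a rank tolerance of `1e-10`): it estimates the covariance matrix from `n_obs` observations and reports its rank and log-determinant for a segment that is too short, one of exactly the minimal length, and a longer one.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8  # signal dimension (hypothetical value)

def empirical_cov(n_obs):
    # Zero-mean Gaussian samples; covariance estimated with the known mean 0.
    y = rng.normal(size=(n_obs, N))
    return y.T @ y / n_obs

for n_obs in (N - 1, N, 4 * N):
    cov = empirical_cov(n_obs)
    rank = np.linalg.matrix_rank(cov, tol=1e-10)
    # slogdet avoids overflow/underflow of the determinant itself.
    sign, logdet = np.linalg.slogdet(cov)
    print(f"n_obs={n_obs:3d}  rank={rank}  logdet={logdet:.2f}")
```

With fewer than `N` observations the rank is capped by `n_obs`, so the log-determinant is either `-inf` or a very large negative number produced by round-off; from `N` observations on, the matrix has full rank.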


### Utils and visualization

In [None]:
import time

import networkx as nx
import ruptures as rpt
import matplotlib.pyplot as plt

from tqdm import tqdm

In [None]:
import subprocess

def get_git_head_short_hash() -> str:
    return subprocess.check_output(['git', 'rev-parse', '--short', 'HEAD']).decode('ascii').strip()

In [None]:
def turn_all_list_of_dict_into_str(data:dict):
    new_dict = {}
    for key, val in data.items():
        if isinstance(val, list):
            new_dict[key] = str(val)
        elif isinstance(val, dict):
            new_dict[key] = turn_all_list_of_dict_into_str(val)
        else:
            new_dict[key] = val
    return new_dict

In [None]:
def generate_random_er_graphs(params_rng, nx_graph_seed, min_n_nodes=10, max_n_nodes=30, min_edge_p=0.15, max_edge_p=0.5):
    n_nodes = params_rng.integers(low=min_n_nodes, high=max_n_nodes+1)
    edge_p = min_edge_p + (max_edge_p-min_edge_p) * params_rng.random()
    G = nx.erdos_renyi_graph(n=n_nodes, p=edge_p, seed=nx_graph_seed)
    return G

In [None]:
graph_seed = 0
params_seed = 1
fig, axes = plt.subplots(1, 5, figsize=(4*5, 3))
params_rng = np.random.default_rng(seed=params_seed)
for i in range(5):
    G = generate_random_er_graphs(params_rng, graph_seed)
    coord = nx.spring_layout(G, seed=0)
    nx.draw_networkx(G, with_labels=False, pos=coord, node_size=50, ax=axes[i])

In [None]:
def generate_gaus_signal_with_cov_diag_in_basis(n_dims, n_samples, eigvects, signal_rng, diag_cov_max=1):
    # randomly draw diagonal coef (in the fourier space)
    diag_coefs = diag_cov_max * signal_rng.random(n_dims)
    diag_mat = np.diag(diag_coefs)
    # compute the corresponding covariance matrix and signal 
    cov_mat = eigvects @ diag_mat @ eigvects.T
    signal = signal_rng.multivariate_normal(np.zeros(n_dims), cov_mat, size=n_samples)
    return signal

In [None]:
from typing import Literal

seg_length = Literal["large", "minimal"]

def get_min_size_for_hyp(n_dims, hyp:seg_length = "minimal"):
    # the minimal segment length for admissible computations
    min_size = n_dims
    if hyp == "large":
        # for segments long enough for good estimates (integer division keeps min_size an int)
        min_size = n_dims * (n_dims - 1) // 2
    return min_size

In [None]:
def draw_bkps_with_gap_constraint(n_samples, bkps_gap, bkps_rng, max_tries=10000, n_bkps_max=10):
    # randomly pick an admissible number of bkps
    n_bkps = bkps_rng.integers(low=1, high=min(n_bkps_max, n_samples // bkps_gap))
    bkps = []
    n_tries = 0
    # select admissible randomly drawn bkps
    while n_tries < max_tries and len(bkps) < n_bkps:
        new_bkp = bkps_rng.integers(low=bkps_gap, high=n_samples-bkps_gap)
        to_keep = True
        for bkp in bkps:
            if abs(new_bkp - bkp) < bkps_gap:
                to_keep = False
                break
        if to_keep:
            bkps.append(new_bkp)
        n_tries+=1
    bkps.sort()
    return bkps + [n_samples]

In [None]:
from ruptures.metrics import precision_recall
from ruptures.metrics import hausdorff

def update_results(true_bkps, pred_bkps, results_dic, prec_rec_margin):
    preci, recall = precision_recall(true_bkps, pred_bkps, prec_rec_margin)
    hsdrf = hausdorff(true_bkps, pred_bkps)
    results_dic["precision"]['raw'].append(round(preci, 4))
    results_dic["recall"]['raw'].append(round(recall, 4))
    results_dic["hausdorff"]['raw'].append(hsdrf)

In [None]:
def compute_and_add_stat_on_results(model_results:dict):
    for model_res in model_results.values():
        for metric_name, res in model_res.items():
            model_res[metric_name]['mean'] = round(np.mean(res['raw']), ndigits=4)
            model_res[metric_name]['std'] = round(np.std(res['raw']), ndigits=4)
    return model_results

In [None]:
import json
import os
from datetime import datetime

def save_results(results_per_models, stats, dir, comment=''):
    now = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    to_save = {"date_time": now, 'comment': comment}
    to_save["hyper-parameters"] = stats
    to_save["results"] = results_per_models
    to_save = turn_all_list_of_dict_into_str(to_save)
    path = os.path.join(dir, now)
    with open(f"{path}.json", 'w+') as res_f:
        json.dump(to_save, res_f, indent=4)

### A. Data verifying the hypothesis of the model

In this experiment, we generate data according to the hypotheses presented in the [problem formulation](#problem-formulation) and we compare our method to the cost function for standard covariance change detection in Gaussian models (which is supposed to cover our hypotheses). More precisely, we randomly generate a graph and a corresponding multivariate Gaussian signal, undergoing a (known) random number of covariance change points. The comparison between the two methods relies on the precision, recall and Hausdorff metrics.
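
For readers unfamiliar with these metrics, here is a small NumPy-only sketch of how they can be computed from two lists of breakpoints. It is an illustrative reimplementation, not the code of `ruptures.metrics`. The function names are hypothetical, and two conventions are assumptions here: the last element of each list is the number of samples and is ignored, and a true change point counts as detected when a prediction lies strictly within the margin.

```python
import numpy as np

def hausdorff_sketch(true_bkps, pred_bkps):
    # Both lists end with n_samples, which is not a change point: drop it.
    t = np.asarray(true_bkps[:-1]).reshape(-1, 1)
    p = np.asarray(pred_bkps[:-1]).reshape(1, -1)
    dist = np.abs(t - p)  # pairwise distances, shape (n_true, n_pred)
    # Largest distance from any change point to its closest counterpart.
    return max(dist.min(axis=1).max(), dist.min(axis=0).max())

def precision_recall_sketch(true_bkps, pred_bkps, margin):
    true_cp, pred_cp = true_bkps[:-1], pred_bkps[:-1]
    # A true change point is detected if a prediction is strictly within the margin.
    detected = {t for t in true_cp if any(abs(t - p) < margin for p in pred_cp)}
    precision = len(detected) / len(pred_cp)
    recall = len(detected) / len(true_cp)
    return precision, recall

true_bkps = [100, 250, 400, 500]
pred_bkps = [101, 260, 400, 500]
print(hausdorff_sketch(true_bkps, pred_bkps))            # 10
print(precision_recall_sketch(true_bkps, pred_bkps, 2))  # (0.666..., 0.666...)
```

In this toy case, the prediction at 260 misses the true change point at 250 by 10 samples, which sets the Hausdorff distance and costs one detection out of three with a margin of 2.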

In [None]:
def generate_rd_signal_in_hyp(G:nx.Graph, signal_rng:np.random.Generator, n_samples:int=500, diag_cov_max=1, n_bkps_max=10):
    # randomly draw a set of admissible change points
    n_dims = G.number_of_nodes()
    min_size = get_min_size_for_hyp(n_dims=n_dims)
    bkps = draw_bkps_with_gap_constraint(n_samples, min_size, signal_rng, n_bkps_max=n_bkps_max)
    # generate the signal
    _, eigvects = eigh(nx.laplacian_matrix(G).toarray())
    signal_gen_func = lambda size: generate_gaus_signal_with_cov_diag_in_basis(n_dims, size, eigvects, signal_rng, diag_cov_max)
    signal = signal_gen_func(bkps[0])
    # add each sub-segment
    for i in range(1, len(bkps)):
        sub_signal = signal_gen_func(bkps[i] - bkps[i-1])
        signal = np.concatenate([signal, sub_signal], axis=0)
    return bkps, signal

In [None]:
nx_graph_seed = 1
graph_seed = 2
signal_seed = 3

signal_rng = np.random.default_rng(seed=signal_seed)
graph_rng = np.random.default_rng(seed=graph_seed)

G = generate_random_er_graphs(graph_rng, nx_graph_seed)
bkps, s = generate_rd_signal_in_hyp(G, signal_rng, n_samples=1000, diag_cov_max=10)

fig, axes = plt.subplot_mosaic(mosaic='ABB', figsize=(16, 3))
nx.draw_networkx(G, ax=axes['A'])
for i in range(6):
    axes['B'].plot(10*i+s[:, i])
axes['B'].set_yticks([])
for bkp in bkps[:-1]:
    axes['B'].axvline(x=bkp, c='k')

In [None]:
import warnings

N_EXP1 = 50
NX_GRAPH_SEED = 1
GRAPH_SEED = 2
SIGNAL_SEED = 3
PRECI_RECALL_MARGIN = 2
MAX_N_NODES = 25
N_SAMPLES = 1000
MAX_N_BKPS = 10

signal_rng = np.random.default_rng(seed=SIGNAL_SEED)
graph_rng = np.random.default_rng(seed=GRAPH_SEED)

# initialization
exp_statistics_1 = {"commit hash":get_git_head_short_hash(), "n_iter": N_EXP1, "graph_seed": GRAPH_SEED, "nx_graph_seed": NX_GRAPH_SEED, "signal_seed": SIGNAL_SEED, "metrics_margin": PRECI_RECALL_MARGIN, "n_samples": N_SAMPLES, "max_n_nodes": MAX_N_NODES, "max_n_bkps": MAX_N_BKPS, "n_nodes": [], "bkps":[]}
statio_results_1 = {"recall": {'raw': []}, "precision": {'raw': []}, "hausdorff": {'raw': []}, "time": {"raw": []}}
normal_results_1 = {"recall": {'raw': []}, "precision": {'raw': []}, "hausdorff": {'raw': []}, "time": {"raw": []}}

for exp_id in tqdm(range(N_EXP1), desc='Running experiment'):
    
    # data and ground truth generation
    G = generate_random_er_graphs(graph_rng, NX_GRAPH_SEED, max_n_nodes=MAX_N_NODES)
    gt_bkps, signal = generate_rd_signal_in_hyp(G, signal_rng, n_samples=N_SAMPLES, n_bkps_max=MAX_N_BKPS)
    min_size = get_min_size_for_hyp(G.number_of_nodes())

    # prediction with standard normal cost
    t1 = time.perf_counter()
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=UserWarning)
        normal_cost = rpt.costs.CostNormal()
        algo_normal = rpt.Dynp(custom_cost=normal_cost, jump=1, min_size=min_size).fit(signal)
        normal_bkps = algo_normal.predict(n_bkps=len(gt_bkps)-1)
    t2 = time.perf_counter()
    normal_results_1["time"]['raw'].append(round(t2 - t1, ndigits=3))

    # prediction with stationary normal cost
    t1 = time.perf_counter()
    statio_cost = CostGraphStatioNormal(nx.laplacian_matrix(G).toarray())
    algo_statio = rpt.Dynp(custom_cost=statio_cost, jump=1, min_size=min_size).fit(signal)
    statio_bkps = algo_statio.predict(n_bkps=len(gt_bkps)-1)
    t2 = time.perf_counter()
    statio_results_1["time"]['raw'].append(round(t2 - t1, ndigits=3))

    # performances evaluation
    update_results(gt_bkps, normal_bkps, normal_results_1, PRECI_RECALL_MARGIN)
    update_results(gt_bkps, statio_bkps, statio_results_1, PRECI_RECALL_MARGIN)

    # statistics collection
    exp_statistics_1["bkps"].append((len(gt_bkps)-1, gt_bkps[:-1]))
    exp_statistics_1["n_nodes"].append(G.number_of_nodes())

# results post-processing and saving
full_results_1 = {"statio normal cost": statio_results_1, "normal cost": normal_results_1}
full_results_1 = compute_and_add_stat_on_results(full_results_1)
save_results(full_results_1, exp_statistics_1, 'results/within_hypothesis', comment='1. bkps drawn with minimal gap constraints')

In [None]:
print("STATISTICS:\t", exp_statistics_1)
print("WITH STATIO:\t", statio_results_1)
print("WITHOUT STATIO:\t", normal_results_1)

### B. Data not verifying the hypothesis of the model

In what follows, we work with signals verifying hypothesis 1 from the [problem formulation](#problem-formulation), but not the second hypothesis. More precisely, we generate covariance matrices that are diagonalizable in a basis different from the graph Fourier basis.
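
The sketch below illustrates what violating the second hypothesis means. It uses NumPy only, with a hypothetical helper `random_laplacian` that mimics an Erdos-Renyi draw and arbitrary values for `N` and the edge probability. A covariance matrix built from the Laplacian eigenvectors of one graph is diagonal in that basis, but generally not in the Fourier basis of another graph.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6  # number of nodes (hypothetical value)

def random_laplacian(p):
    # Symmetric 0/1 adjacency matrix with edge probability p, then L = D - W.
    upper = np.triu(rng.random((N, N)) < p, k=1).astype(float)
    W = upper + upper.T
    return np.diag(W.sum(axis=1)) - W

_, U = np.linalg.eigh(random_laplacian(0.5))        # Fourier basis assumed by the cost
_, U_other = np.linalg.eigh(random_laplacian(0.5))  # basis used to build the covariance

gamma = rng.random(N) + 0.1
Sigma = U_other @ np.diag(gamma) @ U_other.T

def off_diagonal_mass(mat):
    # Sum of the absolute values of the off-diagonal entries.
    return np.abs(mat - np.diag(np.diag(mat))).sum()

print(off_diagonal_mass(U_other.T @ Sigma @ U_other))  # numerically zero
print(off_diagonal_mass(U.T @ Sigma @ U))              # generally non-zero
```

In such a setting the graph stationary cost only models the diagonal of $U^T \Sigma U$ and ignores the remaining correlations, whereas the standard normal cost makes no assumption on the basis.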

In [None]:
def generate_rd_signal_from_other_basis(G:nx.Graph, signal_rng:np.random.Generator, n_samples:int=500, diag_cov_max=1, n_bkps_max=10):
    # randomly draw a set of admissible change points
    n_dims = G.number_of_nodes()
    min_size = get_min_size_for_hyp(n_dims=n_dims)
    bkps = draw_bkps_with_gap_constraint(n_samples, min_size, signal_rng, n_bkps_max=n_bkps_max)
    # generate another graph to compute the signal covariance matrices
    G_for_cov = generate_random_er_graphs(signal_rng, NX_GRAPH_SEED, n_dims, n_dims, min_edge_p=0.01, max_edge_p=1)
    _, eigvects = eigh(nx.laplacian_matrix(G_for_cov).toarray())
    signal_gen_func = lambda size: generate_gaus_signal_with_cov_diag_in_basis(n_dims, size, eigvects, signal_rng, diag_cov_max)
    signal = signal_gen_func(bkps[0])
    # add each sub-segment
    for i in range(1, len(bkps)):
        sub_signal = signal_gen_func(bkps[i] - bkps[i-1])
        signal = np.concatenate([signal, sub_signal], axis=0)
    return bkps, signal

In [None]:
N_EXP2 = 50

signal_rng = np.random.default_rng(seed=SIGNAL_SEED)
graph_rng = np.random.default_rng(seed=GRAPH_SEED)

# initialization
exp_statistics_2 = {"commit hash":get_git_head_short_hash(), "n_iter": N_EXP2, "graph_seed": GRAPH_SEED, "nx_graph_seed": NX_GRAPH_SEED, "signal_seed": SIGNAL_SEED, "metrics_margin": PRECI_RECALL_MARGIN, "n_samples": N_SAMPLES, "max_n_nodes": MAX_N_NODES, "max_n_bkps": MAX_N_BKPS, "n_nodes": [], "bkps":[]}
statio_results_2 = {"recall": {'raw': []}, "precision": {'raw': []}, "hausdorff": {'raw': []}, "time": {"raw": []}}
normal_results_2 = {"recall": {'raw': []}, "precision": {'raw': []}, "hausdorff": {'raw': []}, "time": {"raw": []}}

for exp_id in tqdm(range(N_EXP2), desc='Running experiment'):
    
    # data and ground truth generation
    G = generate_random_er_graphs(graph_rng, NX_GRAPH_SEED, max_n_nodes=MAX_N_NODES)
    gt_bkps, signal = generate_rd_signal_from_other_basis(G, signal_rng, n_samples=N_SAMPLES, n_bkps_max=MAX_N_BKPS)
    min_size = get_min_size_for_hyp(G.number_of_nodes())

    # prediction with standard normal cost
    t1 = time.perf_counter()
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=UserWarning)
        normal_cost = rpt.costs.CostNormal()
        algo_normal = rpt.Dynp(custom_cost=normal_cost, jump=1, min_size=min_size).fit(signal)
        normal_bkps = algo_normal.predict(n_bkps=len(gt_bkps)-1)
    t2 = time.perf_counter()
    normal_results_2["time"]['raw'].append(round(t2 - t1, ndigits=3))

    # prediction with stationary normal cost
    t1 = time.perf_counter()
    statio_cost = CostGraphStatioNormal(nx.laplacian_matrix(G).toarray())
    algo_statio = rpt.Dynp(custom_cost=statio_cost, jump=1, min_size=min_size).fit(signal)
    statio_bkps = algo_statio.predict(n_bkps=len(gt_bkps)-1)
    t2 = time.perf_counter()
    statio_results_2["time"]['raw'].append(round(t2 - t1, ndigits=3))

    # performances evaluation
    update_results(gt_bkps, normal_bkps, normal_results_2, PRECI_RECALL_MARGIN)
    update_results(gt_bkps, statio_bkps, statio_results_2, PRECI_RECALL_MARGIN)

    # statistics collection
    exp_statistics_2["bkps"].append((len(gt_bkps)-1, gt_bkps[:-1]))
    exp_statistics_2["n_nodes"].append(G.number_of_nodes())

# results post-processing and saving
full_results_2 = {"statio normal cost": statio_results_2, "normal cost": normal_results_2}
full_results_2 = compute_and_add_stat_on_results(full_results_2)
save_results(full_results_2, exp_statistics_2, 'results/out_of_hypothesis', comment='1. the covariance matrix is diagonal in the Fourier basis of another random Erdos-Renyi graph \n2. bkps drawn with minimal gap constraints')

In [None]:
print("STATISTICS:\t", exp_statistics_2)
print("WITH STATIO:\t", statio_results_2)
print("WITHOUT STATIO:\t", normal_results_2)

### C. Data verifying the hypothesis of the models, with node dropping to simulate breakdowns

In what follows, we work with signals verifying the two hypotheses from the [problem formulation](#problem-formulation). Additionally, we randomly select a (very) small number of nodes and simulate the breakdown of the corresponding sensors by setting the signal on these nodes to 0 for a random time length.

In [None]:
def modify_signal_to_simulate_breakdown(signal, signal_rng, n_breakdown_max):
    # initialization
    n_samples = signal.shape[0]
    n_breakdown = signal_rng.integers(1, n_breakdown_max+1)
    # randomly pick the location and time length of the breakdowns
    breakdowns = {}
    broken_node_ids = signal_rng.integers(0, signal.shape[1], size=(n_breakdown))
    for node_id in broken_node_ids:
        start = signal_rng.integers(0, n_samples-1)
        end = signal_rng.integers(start, n_samples)
        signal[start:end, node_id] = 0
        breakdowns[node_id] = (start, end)
    return signal, breakdowns

In [None]:
nx_graph_seed = 1
graph_seed = 2
signal_seed = 5

signal_rng = np.random.default_rng(seed=signal_seed)
graph_rng = np.random.default_rng(seed=graph_seed)

G = generate_random_er_graphs(graph_rng, nx_graph_seed)
bkps, s = generate_rd_signal_in_hyp(G, signal_rng, n_samples=G.number_of_nodes()**2, diag_cov_max=1)
s, breakdowns = modify_signal_to_simulate_breakdown(s, signal_rng, G.number_of_nodes()//10)

print("The generated breakdowns are:", breakdowns)

fig, ax = plt.subplots(1, 1, figsize=(12,3))
for i in range(5):
    ax.plot(10*i+s[:, i])
for i, node_id in enumerate(breakdowns.keys()):
    ax.plot(10*(i+5)+s[:, node_id])

In [None]:
N_EXP3 = 50

signal_rng = np.random.default_rng(seed=SIGNAL_SEED)
graph_rng = np.random.default_rng(seed=GRAPH_SEED)

# initialization
exp_statistics_3 = {"commit hash":get_git_head_short_hash(), "n_iter": N_EXP3, "graph_seed": GRAPH_SEED, "nx_graph_seed": NX_GRAPH_SEED, "signal_seed": SIGNAL_SEED, "metrics_margin": PRECI_RECALL_MARGIN, "n_samples": N_SAMPLES, "max_n_nodes": MAX_N_NODES, "max_n_bkps": MAX_N_BKPS, "n_nodes": [], "bkps":[], "breakdowns": []}
statio_results_3 = {"recall": {'raw': []}, "precision": {'raw': []}, "hausdorff": {'raw': []}, "time": {"raw": []}}
normal_results_3 = {"recall": {'raw': []}, "precision": {'raw': []}, "hausdorff": {'raw': []}, "time": {"raw": []}}

for exp_id in tqdm(range(N_EXP3), desc='Running experiment'):
    
    # data and ground truth generation
    G = generate_random_er_graphs(graph_rng, NX_GRAPH_SEED, max_n_nodes=MAX_N_NODES)
    n_nodes = G.number_of_nodes()
    gt_bkps, signal = generate_rd_signal_in_hyp(G, signal_rng, n_samples=N_SAMPLES, n_bkps_max=MAX_N_BKPS)
    signal, breakdowns = modify_signal_to_simulate_breakdown(signal, signal_rng, n_nodes//10)
    min_size = get_min_size_for_hyp(G.number_of_nodes())

    # prediction with standard normal cost
    t1 = time.perf_counter()
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=UserWarning)
        normal_cost = rpt.costs.CostNormal()
        algo_normal = rpt.Dynp(custom_cost=normal_cost, jump=1, min_size=min_size).fit(signal)
        normal_bkps = algo_normal.predict(n_bkps=len(gt_bkps)-1)
    t2 = time.perf_counter()
    normal_results_3["time"]['raw'].append(round(t2 - t1, ndigits=3))

    # prediction with stationary normal cost
    t1 = time.perf_counter()
    statio_cost = CostGraphStatioNormal(nx.laplacian_matrix(G).toarray())
    algo_statio = rpt.Dynp(custom_cost=statio_cost, jump=1, min_size=min_size).fit(signal)
    statio_bkps = algo_statio.predict(n_bkps=len(gt_bkps)-1)
    t2 = time.perf_counter()
    statio_results_3["time"]['raw'].append(round(t2 - t1, ndigits=3))
    

    # performances evaluation
    update_results(gt_bkps, normal_bkps, normal_results_3, PRECI_RECALL_MARGIN)
    update_results(gt_bkps, statio_bkps, statio_results_3, PRECI_RECALL_MARGIN)

    # statistics collection
    exp_statistics_3["bkps"].append((len(gt_bkps)-1, gt_bkps[:-1]))
    exp_statistics_3["n_nodes"].append(G.number_of_nodes())
    exp_statistics_3["breakdowns"].append(breakdowns)

# results post-processing and saving
full_results_3 = {"statio normal cost": statio_results_3, "normal cost": normal_results_3}
full_results_3 = compute_and_add_stat_on_results(full_results_3)
save_results(full_results_3, exp_statistics_3, 'results/breakdowns_in_hyp', comment='1. bkps drawn with minimal gap constraints \n2. sensor breakdowns')

In [None]:
print("STATISTICS:\t", exp_statistics_3)
print("WITH STATIO:\t", statio_results_3)
print("WITHOUT STATIO:\t", normal_results_3)

## References

<a id="Izenman2008">[Izenman2008]</a>
Alan J. Izenman. Introduction to Random-Matrix Theory. 2008. [asc.ohio-state.edu]

<a id="Ryan2023">[Ryan2023]</a>
Sean Ryan and Rebecca Killick. Detecting Changes in Covariance via Random Matrix Theory. Technometrics, 65(4):480–491, October 2023. Publisher: Taylor & Francis