# Anomaly Detection For Timeseries of Networks

There is a particular type of sea slug that has gills on the outside of its body. When you squirt water at these gills, they withdraw into the slug. The interesting thing about this type of slug is that the brain network involved in this gill withdrawal reflex is entirely mapped out, from the neurons which detect and transmit information about the water into the slug's brain, to the neurons that leave the brain and fire at its muscles. (For the interested, this is a real thing: look up Eric Kandel's research on Aplysia!)

Say you're a researcher studying these sea slugs, and you have a bunch of brain networks of the same slug. Each node is a single neuron, and edges denote connections between neurons. Each of the brain networks that you have was taken at a different time point: some before water started getting squirted at the slug's gills, and some as the water was getting squirted. Your goal is to reconstruct when water started to get squirted, using only the networks themselves. You hypothesize that there should be some signal change in your networks which can tell you the particular time at which water started getting squirted. Given the network data you have, how do you figure out which time point this is?

The broader class of problems this question addresses is called *anomaly detection*. The idea in general is that you have a bunch of snapshots of the same network over time. Although the nodes are the same, the edges are changing at each time point. Your goal is to figure out which time points correspond to the most change, either in the entire network or in particular groups of nodes. You can think of a network as "anomalous" with respect to time if some potentially small group of nodes within the network concurrently changes behavior at some point in time compared to the recent past, while the remaining nodes continue with whatever noisy, normal behavior they had.

In particular, what we would really like to do is separate the signal from the noise. All of the nodes in the network are likely changing a bit over time, since there is some variability intrinsic in the system. Random noise might just dictate that some edges get randomly deleted and some get randomly created, but we want to figure out when neurons are changing as the result of the squirting in particular.

For instance, let's call the latent positions for our network $X^{(t)}$ when we've taken a snapshot of the network at time $t$. Under normal behavior, we can assume that $||X^{(t)} - X^{(t-1)}|| < c$: in other words, that the norm of the difference between the latent positions at time $t$ and the latent positions at time $t-1$ is less than some constant $c$. The problem essentially boils down to finding the times for which $||X^{(t)} - X^{(t-1)}|| > c$.
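
To make the thresholding idea concrete, here is a minimal sketch on toy data. It assumes the latent positions are known exactly (in practice they would have to be estimated from the networks), uses the Frobenius norm, and picks the constant `c` by hand; the jitter size, the shift, and the variable names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latent positions: 12 time points, 100 nodes, 1 dimension.
# Every snapshot is the same base positions plus a small jitter (the noise).
n_times, n_nodes = 12, 100
base = rng.uniform(0.2, 0.8, size=(n_nodes, 1))
X = np.stack([base + rng.normal(0, 0.005, size=base.shape)
              for _ in range(n_times)])

# Hypothetical anomaly: the first 20 nodes shift by 0.1 from time point 5 onwards.
X[5:, :20] += 0.1

# Frobenius norm of the change between consecutive snapshots.
# diffs[k] compares time point k + 1 with time point k.
diffs = np.linalg.norm(X[1:] - X[:-1], axis=(1, 2))

c = 0.2  # hypothetical threshold, chosen by hand for this toy example
change_times = np.flatnonzero(diffs > c) + 1
print(change_times)
```

With these settings the noise-only differences have norm around 0.07, while the shift contributes roughly 0.45, so only the transition into time point 5 should exceed `c`.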

## Simulating a timeseries of networks

For this data generation, we're going to assemble a set of 12 time-points for a network directly from its latent positions (we'll assume that each time-point for the network is drawn from an RDPG). Ten of these time points will just have natural variability, and two will have a subset of nodes whose latent positions were perturbed a bit. These two will be the anomalies.

We'll say that the latent positions for the network are one-dimensional, and that each time point has 100 nodes. There will be the same number of adjacency matrices as there are time points, since our network will be changing over time.
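
As a reminder of what drawing a network from an RDPG means, the sketch below samples one adjacency matrix by hand with NumPy. In an RDPG, the probability of an edge between two nodes is the dot product of their latent positions. The sketch assumes an undirected network without self-loops, and its variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

n_nodes = 100
# One-dimensional latent positions, one row per node.
X = rng.uniform(0.2, 0.8, size=(n_nodes, 1))

# Edge probabilities are dot products of latent positions.
P = X @ X.T

# Sample the upper triangle (no self-loops), then mirror it so the
# network is undirected.
upper = np.triu(rng.uniform(size=P.shape) < P, k=1)
A = (upper | upper.T).astype(int)
```

Because every latent position lies between 0.2 and 0.8, every entry of `P` lies between 0.04 and 0.64, so all of them are valid probabilities.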

For each of the ten normal time points, we'll:
1. Generate 100 random latent positions. Each latent position will be a random number between 0.2 and 0.8.
2. Use graspologic's `rdpg` function to sample an adjacency matrix using these latent positions.

And for each of the two perturbed time points, we'll:
1. Generate 100 random latent positions, in the same way as above.
2. Shift 20 of the latent positions by a small amount: add 0.1 to ten of them and subtract 0.1 from another ten.
3. Generate an adjacency matrix as above.

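```python
import numpy as np
from graspologic.simulations import rdpg


def gen_timepoint(perturbed=False, n_perturbed=20, perturbation=.1):
    """Sample one time point: an RDPG adjacency matrix and its latent positions."""
    nodes = 100
    # one-dimensional latent positions, uniform on [.2, .8]
    X = np.random.uniform(.2, .8, size=nodes)
    if perturbed:
        # shift the first n_perturbed//2 nodes up, the next n_perturbed//2
        # nodes down, and leave the remaining nodes unchanged
        baseline = np.array([1, -1, 0])
        delta = np.repeat(baseline, (n_perturbed//2,
                                     n_perturbed//2,
                                     nodes-n_perturbed))
        X += (delta * perturbation)
    X = X[:, np.newaxis]  # column vector with shape (nodes, 1)
    A = rdpg(X)
    return A, X


time_points = 12
networks = []
latents = []

# ten normal time points
for time in range(time_points-2):
    A, X = gen_timepoint()
    networks.append(A)
    latents.append(X)

# insert the two perturbed time points at indices 5 and 6
for perturbed_time in range(5, 7):
    A, X = gen_timepoint(perturbed=True)
    networks.insert(perturbed_time, A)
    latents.insert(perturbed_time, X)

networks = np.array(networks)  # shape (12, 100, 100)
latents = np.array(latents)    # shape (12, 100, 1)
```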

```python
import matplotlib.pyplot as plt
from graphbook_code import heatmap
import seaborn as sns


def rm_ticks(ax, x=False, y=False, **kwargs):
    """Show or hide the x and y axes of `ax`, then remove its spines."""
    if x is not None:
        ax.axes.xaxis.set_visible(x)
    if y is not None:
        ax.axes.yaxis.set_visible(y)
    sns.despine(ax=ax, **kwargs)


fig = plt.figure()

perturbed_points = {5, 6}
for i in range(time_points):
    # stack the heatmaps diagonally; the perturbed time points go in a
    # second stack, offset to the right
    if i not in perturbed_points:
        ax = fig.add_axes([.02*i, -.02*i, .8, .8])
    else:
        ax = fig.add_axes([.02*i+.8, -.02*i, .8, .8])
    ax = heatmap(networks[i], ax=ax, cbar=False)
    if i == 0:
        ax.set_title("Ten Normal Time Points", loc="left", fontsize=16)
    if i == 5:
        ax.set_title("Two Perturbed Time Points", loc="left", fontsize=16)
    rm_ticks(ax, top=False, right=False)
```

## Detecting anomalies for individual nodes

We define individual vertex anomaly detection for the $i$-th vertex at time point $t^*$ as a test of the null hypothesis $H_{0i}^{(t^*)}$ that $t^*$ is not an anomaly time for vertex $i$, against the alternative that $t^*$ is an anomaly time for vertex $i$.
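
One simple way to picture a per-vertex test is to compute a statistic for each vertex and flag the vertices whose statistic is unusually large. The sketch below is only an illustration under strong assumptions: it uses known latent positions rather than estimates, takes the distance each vertex moved between two consecutive time points as the statistic, and uses a hand-picked cutoff in place of a proper null distribution. All names and values in it are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

n_nodes = 100
# Latent positions at time t* - 1, and at time t* with a small jitter.
X_prev = rng.uniform(0.2, 0.8, size=(n_nodes, 1))
X_curr = X_prev + rng.normal(0, 0.005, size=X_prev.shape)

# Hypothetical anomaly: the first ten vertices shift by 0.1 at time t*.
X_curr[:10] += 0.1

# One statistic per vertex: how far its latent position moved.
vertex_stat = np.linalg.norm(X_curr - X_prev, axis=1)

threshold = 0.05  # hypothetical cutoff, picked by hand for this toy example
anomalous_vertices = np.flatnonzero(vertex_stat > threshold)
print(anomalous_vertices)
```

Here rejecting the null hypothesis for vertex $i$ corresponds to `vertex_stat[i]` exceeding the cutoff. A real test would calibrate that cutoff from the behavior of the statistic at non-anomalous times.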

## Detecting anomalies in the whole network

## Counting Changed Edges

One of the simplest approaches to this problem might just be to figure out which node has the highest count of edge changes across your timeseries. For each node, you'd count the number of new edges that appeared and the number of existing edges that were deleted between consecutive time points. Whichever node has the highest count is your candidate for the anomalous node.

This might give you a rough estimate, but it doesn't account for other important pieces of information. You might be interested in which other nodes new edges were formed with, and which nodes edges were deleted from, for instance.
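
The counting approach can be sketched in a few lines of NumPy. The helper name `edge_change_counts` and the tiny three-node example are illustrative assumptions; the function expects a stack of binary adjacency matrices over the same nodes, ordered by time.

```python
import numpy as np


def edge_change_counts(networks):
    """Count, for each node, how many of its edges appear or disappear
    between consecutive snapshots."""
    networks = np.asarray(networks)
    # flips[t, i, j] is True when edge (i, j) differs between t and t + 1
    flips = networks[1:] != networks[:-1]
    # sum over time and over each node's row of potential neighbors
    return flips.sum(axis=(0, 2))


# Tiny hand-built example: 3 nodes, 3 time points.
A0 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
A1 = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])  # edge 0-2 appears
A2 = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]])  # edge 0-1 disappears

counts = edge_change_counts([A0, A1, A2])
most_changed = int(np.argmax(counts))
print(counts, most_changed)
```

In this example node 0 is involved in both changes, so it gets a count of 2 while nodes 1 and 2 each get a count of 1. Note that the count treats an edge appearing and an edge disappearing identically, which is part of the information this approach throws away.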

## Notes

guodong's stuff: uses MASE and OMNI combined with DCORR to do hypothesis testing
- vivek did something similar for MCC