# Latent Two-Sample Hypothesis Testing

You have learned a lot thus far about network statistics, and now we want to apply some of that knowledge to two-sample network testing. Imagine you are part of an alien race called the Moops. You live in harmony with another race, called the Moors, on the planet Zinthar. The evil overlord Zelu takes a random number of Moops and Moors and puts them on an island. You want to find your fellow Moops, but the Moops and Moors look very similar to each other. However, you suspect that if you could look at a network representing the brain connections of a Moop and another network representing the brain connections of a Moor, there might be a difference between the two. What do you do?

To make this a little bit more concrete, let's develop our example. The Moops and Moors each have brains with $n=100$ brain areas. If two areas of the brain can communicate with one another (pass information back and forth), an edge exists; if they cannot communicate with one another, an edge does not exist. In this case, these networks will each be SBMs. The networks look like this:
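One way to simulate a pair of networks like these is sketched below using only NumPy. The community sizes `ns` and the block matrices `Bmoop` and `Bmoor` are hypothetical values chosen purely for illustration, and `sample_sbm` is a helper written for this sketch rather than a library function.

```python
import numpy as np

def sample_sbm(ns, B, rng):
    """Sample an undirected, loopless SBM adjacency matrix."""
    labels = np.repeat(np.arange(len(ns)), ns)  # community of each node
    P = B[np.ix_(labels, labels)]               # n x n edge probabilities
    n = P.shape[0]
    # sample each pair of nodes once (upper triangle), then symmetrize
    upper = np.triu((rng.random((n, n)) < P).astype(int), k=1)
    return upper + upper.T

rng = np.random.default_rng(0)
ns = [50, 50]  # hypothetical: two communities of 50 brain areas each
# hypothetical block matrices, assumed only for this example
Bmoop = np.array([[0.5, 0.2], [0.2, 0.5]])
Bmoor = np.array([[0.9, 0.2], [0.2, 0.9]])

AMoop = sample_sbm(ns, Bmoop, rng)
AMoor = sample_sbm(ns, Bmoor, rng)
print(AMoop.shape, AMoor.shape)
```

Each adjacency matrix is symmetric with zeros on the diagonal, so the networks are undirected and have no self-loops.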

## Two-sample tests, a quick refresher

This problem is another iteration of a question you encountered in the last section when [testing for differences between groups of edges](#link?), called the two-sample test. You have two samples of data: a network from a Moop and a network from a Moor, and you want to characterize whether these two networks are *different*. For our purposes, we will call the Moop network $A^{(p)}$ and the Moor network $A^{(r)}$. As you know by now, each of these networks has an underlying random network, $\mathbf A^{(p)}$ and $\mathbf A^{(r)}$, of which the Moop and Moor networks you see, $A^{(p)}$ and $A^{(r)}$, are realizations. The key issue is that you don't actually get to see the underlying random networks: what you will need to do is characterize differences between $\mathbf A^{(p)}$ and $\mathbf A^{(r)}$ using *only* $A^{(p)}$ and $A^{(r)}$. 

Since $\mathbf A^{(p)}$ and $\mathbf A^{(r)}$ are *random*, you can't really study them directly. But what you can study, as it turns out, are the *parameters* that govern $\mathbf A^{(p)}$ and $\mathbf A^{(r)}$. 

This is just like how, in the coin flip example you learned about previously, you don't look for differences in the coins themselves, but rather, you look for differences in the *probabilities* that each coin lands on heads or tails. You construct hypotheses about the *probabilities*, not the coins. This is because the coins are the same (they both have a heads and a tails side, all of the coin flips are performed without regard for the outcomes of other coin flips, and so on), *other* than the fact that they land on heads and tails at different rates. This rate, the underlying probability, is therefore the element of the random coin that you want to home in on to test whether the coins are different. Remember, you made a *null hypothesis* that there was no difference between the probabilities of the coins, $H_0 : p_1 = p_2$, and had an *alternative hypothesis* that there was a difference between the probabilities of the coins, $H_A: p_1 \neq p_2$. Next, you produced estimates of the coin probabilities using the samples, $\hat p_1$ and $\hat p_2$, and then used those estimates to decide whether your samples gave you enough evidence to support $H_A$, that $p_1 \neq p_2$. 

### Two-sample tests and Random Networks

In this example, however, we're going to go in a slightly different direction. We're going to describe $\mathbf A^{(p)}$ and $\mathbf A^{(r)}$ as Random Dot Product Graphs (RDPGs). What we're going to say is that $\mathbf A^{(p)}$ is a random network where the probability that each edge exists is described by the probability matrix $P^{(p)}$, whose entries $p_{ij}^{(p)}$ are the probabilities that the edge $\mathbf a_{ij}^{(p)}$ between nodes $i$ and $j$ exists. You do the same thing for the Moor random network $\mathbf A^{(r)}$, with the probability matrix $P^{(r)}$. 

In this case, you want to test $H_0 : P^{(p)} = P^{(r)}$ against $H_A : P^{(p)} \neq P^{(r)}$. However, you have a slight problem: unlike the coin, which you can flip many times, you only observe each edge of each network once, so you can't really use your sample to describe $P^{(p)}$ and $P^{(r)}$ directly. Instead, you need to make assumptions about the random networks in order to learn things about them, which was why we introduced [Statistical Models](#link?). You first need to choose a statistical model to describe $\mathbf A^{(p)}$ and $\mathbf A^{(r)}$.

### Assuming a statistical model

To test whether the probability matrices are different, we're going to make the assumption that $\mathbf A^{(p)}$ and $\mathbf A^{(r)}$ are each Random Dot Product Graphs (RDPGs), with latent position matrices $X^{(p)}$ and $X^{(r)}$, respectively. Remember that for an RDPG with latent position matrix $X$, the probability matrix is $P = XX^\top$. What this means is that if $P^{(p)} = X^{(p)}X^{(p)\top}$ and $P^{(r)} = X^{(r)}X^{(r)\top}$, then $P^{(p)}$ and $P^{(r)}$ are the same if $X^{(p)}$ and $X^{(r)}$ are the same, and different if $X^{(p)}$ and $X^{(r)}$ are different, right?

Unfortunately, this reasoning is *close*, but *not quite* correct. Remember that back in [Chapter 6](#link?) we introduced the idea of a [rotation matrix](#link?). As it turns out, a $d \times d$ rotation matrix $W$ satisfies $WW^\top = I_{d \times d}$, the $d \times d$ identity matrix, which is the matrix equivalent of the number one: any matrix times the identity matrix is just itself. So what if $X^{(p)}$ and $X^{(r)}$ are just rotations of each other? Stated another way, what if $X^{(p)} = X^{(r)} W$ is just $X^{(r)}$, but rotated?

Even if $X^{(r)}$ and $X^{(p)}$ are different, as long as they are just rotations of one another, the probability matrices are *identical*. This is because, if $P^{(r)} = X^{(r)}X^{(r)\top}$ and $X^{(p)} = X^{(r)}W$, then:
\begin{align*}
    P^{(p)} &= X^{(p)}X^{(p)\top} \\
    &= X^{(r)}W W^\top X^{(r)\top},\;\;\;\;\text{we used that }X^{(p)} = X^{(r)}W \\
    &= X^{(r)}I_{d \times d}X^{(r)\top},\;\;\;\;\text{we used that $W$ is a rotation, so }WW^\top = I_{d \times d} \\
    &= X^{(r)}X^{(r)\top} = P^{(r)},
\end{align*}
which shows that the probability matrices would still be the same! 
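The same fact can be checked numerically. The snippet below is a small sketch in which the latent positions `Xr`, the dimension `d = 2`, and the rotation angle `theta` are all arbitrary values assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 2
# hypothetical latent positions; entries are small enough that
# every entry of Xr @ Xr.T is a valid probability
Xr = rng.uniform(0.1, 0.7, size=(n, d))

theta = np.pi / 3  # arbitrary rotation angle
W = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Xp = Xr @ W  # Xp is Xr, but rotated

same_positions = np.allclose(Xp, Xr)
same_probabilities = np.allclose(Xp @ Xp.T, Xr @ Xr.T)
print(same_positions, same_probabilities)
```

The latent position matrices disagree entry by entry, yet the two probability matrices agree up to floating-point error.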

What this means for you is that you *can't* just compare the latent position matrices directly; rather, you have to compare the latent position matrices while allowing for *any* possible rotation matrix! Stated another way, we would say that $P^{(p)}$ and $P^{(r)}$ are equal if $X^{(p)}$ can be obtained by rotating $X^{(r)}$, and they are not equal if $X^{(p)}$ cannot be obtained by rotating $X^{(r)}$. We write this down as a hypothesis by saying that $H_0 : X^{(p)} = X^{(r)}W$ for some rotation matrix $W$, against $H_A : X^{(p)} \neq X^{(r)}W$ for every rotation matrix $W$. With the assumption that $\mathbf A^{(p)}$ and $\mathbf A^{(r)}$ are RDPGs, this is *exactly* the same as saying that $H_0 : P^{(p)} = P^{(r)}$, against $H_A : P^{(p)} \neq P^{(r)}$, which was the original statement we wanted to test.

## Latent position test

So, now we are ready to implement your idea. You have two random networks, $\mathbf A^{(p)}$ and $\mathbf A^{(r)}$, and you assume that they are both RDPGs with latent position matrices $X^{(p)}$ and $X^{(r)}$ respectively. You want to test whether the latent position matrices are the same up to a rotation, $H_0 : X^{(p)} = X^{(r)}W$ for some rotation $W$, or whether they are different for every possible rotation $W$, $H_A: X^{(p)} \neq X^{(r)}W$.

To do this, the first step is to figure out whether $X^{(p)}$ and $X^{(r)}$ are rotations of one another. We do this by first trying to find the best possible rotation of $X^{(r)}$ to $X^{(p)}$. The problem can be written down as follows:
\begin{align*}
    \text{find $W$ where }||X^{(p)} - X^{(r)}W||_{F}\text{ is minimized}
\end{align*}
The term $||A - B||_F$ is the same Frobenius norm of the difference that you've come across a few times over the last few sections, which here, is just going to give you a sense of how different the two matrices are. It will be relatively big if $A$ and $B$ are not very similar, and relatively small if $A$ and $B$ are similar.

Basically, the idea is that if $X^{(p)}$ and $X^{(r)}$ are rotations of one another (or close to it), then with the right rotation $W$, $X^{(p)}$ and $X^{(r)}W$ will be identical (and $||X^{(p)} - X^{(r)}W||_{F}$ will be zero) or nearly identical (and $||X^{(p)} - X^{(r)}W||_{F}$ will be small). If they are not rotations of one another, then the quantity $||X^{(p)} - X^{(r)}W||_{F}$ is going to have a comparatively high value, no matter which $W$ you choose. This is called the [*orthogonal Procrustes problem*](https://en.wikipedia.org/wiki/Orthogonal_Procrustes_problem), and there are a variety of ways you can come up with a pretty good guess at what $W$ is, but we won't need to go into details for our purposes.
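For readers who want to see one concrete solution, the sketch below uses the standard singular value decomposition approach to the orthogonal Procrustes problem. The helper name `best_rotation` and the example latent positions are assumptions made for this illustration. Note that the solution ranges over all orthogonal matrices, which include reflections as well as rotations.

```python
import numpy as np

def best_rotation(X_target, X_source):
    """Orthogonal W minimizing ||X_target - X_source @ W||_F, via the SVD."""
    U, _, Vt = np.linalg.svd(X_source.T @ X_target)
    return U @ Vt

rng = np.random.default_rng(1)
Xr = rng.uniform(0.1, 0.7, size=(100, 2))  # hypothetical latent positions

theta = 0.8  # arbitrary rotation angle
W_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
Xp = Xr @ W_true  # a rotated copy of Xr

W_hat = best_rotation(Xp, Xr)
# np.linalg.norm of a 2-D array is the Frobenius norm by default
stat_rotated = np.linalg.norm(Xp - Xr @ W_hat)

# unrelated latent positions that are not a rotation of Xr
Xother = rng.uniform(0.1, 0.7, size=(100, 2))
stat_other = np.linalg.norm(Xother - Xr @ best_rotation(Xother, Xr))
print(stat_rotated, stat_other)
```

When one matrix is an exact rotation of the other, the minimized norm is zero up to floating-point error; for unrelated latent positions it stays well away from zero.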

Next, you need to figure out how to actually use this to implement a statistical test of $H_0 : X^{(p)} = X^{(r)}W$ against $H_A: X^{(p)} \neq X^{(r)}W$. You can estimate values for $X^{(p)}$ and $X^{(r)}$ by using [Adjacency Spectral Embedding](#link?). This gives you $\hat X^{(p)}$ and $\hat X^{(r)}$, respectively, which are your estimates of the latent position matrices. You then do your best to find a rotation of $X^{(r)}$ onto $X^{(p)}$ by solving the orthogonal Procrustes problem with your estimates $\hat X^{(p)}$ and $\hat X^{(r)}$ plugged in for $X^{(p)}$ and $X^{(r)}$.
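As a reminder of what the embedding step does, here is a minimal NumPy sketch of adjacency spectral embedding for a symmetric adjacency matrix. It is an illustration under stated assumptions, not graspologic's implementation: the helper `ase`, the block matrix `B`, and the embedding dimension `d = 2` are all chosen for this example.

```python
import numpy as np

def ase(A, d):
    """Embed A using its top-d eigenpairs (by magnitude), scaled by sqrt|eigenvalue|."""
    evals, evecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(evals))[::-1][:d]
    return evecs[:, top] * np.sqrt(np.abs(evals[top]))

rng = np.random.default_rng(3)
labels = np.repeat([0, 1], 50)           # two communities of 50 nodes
B = np.array([[0.5, 0.2], [0.2, 0.5]])   # hypothetical block matrix
P = B[np.ix_(labels, labels)]            # true probability matrix

# sample one undirected, loopless network with edge probabilities P
upper = np.triu((rng.random(P.shape) < P).astype(int), k=1)
A = upper + upper.T

Xhat = ase(A, d=2)
Phat = Xhat @ Xhat.T                     # estimated probability matrix
mean_abs_error = np.abs(Phat - P).mean()
print(Xhat.shape, round(mean_abs_error, 3))
```

The estimate $\hat X \hat X^\top$ is close to $P$ but not equal to it, which is the estimation noise the test has to account for.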

Unfortunately, you're not going to find a matrix $W$ where $|| \hat X^{(p)} - \hat X^{(r)}W||_{F} = 0$, even if $X^{(p)}$ and $X^{(r)}$ really are equal up to a rotation. This is because you are just using estimates of the latent positions, so even if $X^{(p)}$ and $X^{(r)}$ are identical up to a rotation in reality, your estimates won't be. The most you can expect is that, if $\hat X^{(p)}$ and $\hat X^{(r)}$ are *close* up to a rotation, $|| \hat X^{(p)} - \hat X^{(r)}W||_{F}$ is going to take a *relatively* small value.

Relative to *what*, exactly?

### Generating a null distribution via parametric bootstrapping

In an ideal world, you would be able to characterize how far apart the estimates $\hat X^{(p)}$ and $\hat X^{(r)}$ would be if the quantities they were estimating, $X^{(p)}$ and $X^{(r)}$, were really identical up to a rotation. However, in reality, you don't *know* $X^{(p)}$ nor $X^{(r)}$, so you can't exactly say anything directly about how big $|| \hat X^{(p)} - \hat X^{(r)}W||_{F}$ should be if $X^{(p)}$ and $X^{(r)}$ are identical up to a rotation (and $H_0$ is true). If you did, you wouldn't need to do any of this statistical testing in the first place!

So what you do is the next best thing. You instead use $\hat X^{(p)}$ as the parameter to generate two new networks, $A^{(1)}$ and $A^{(2)}$, where the latent positions really *are* identical (and equal to $\hat X^{(p)}$).

You then estimate the latent positions $\hat X^{(1)}$ and $\hat X^{(2)}$ using Adjacency Spectral Embedding again, and now you compute the value of $|| \hat X^{(1)} - \hat X^{(2)}W_i||_{F}$ for the best possible rotation $W_i$ of $\hat X^{(2)}$ onto $\hat X^{(1)}$.

Finally, you compare $|| \hat X^{(p)} - \hat X^{(r)}W||_{F}$ to $|| \hat X^{(1)} - \hat X^{(2)}W_i||_{F}$. If $X^{(p)}$ and $X^{(r)}$ are identical up to a rotation, then you would expect that $|| \hat X^{(p)} - \hat X^{(r)}W||_{F}$ would be similar to $|| \hat X^{(1)} - \hat X^{(2)}W_i||_{F}$. If they are not, you would expect that $|| \hat X^{(p)} - \hat X^{(r)}W||_{F}$ would be much bigger than $|| \hat X^{(1)} - \hat X^{(2)}W_i||_{F}$.

You keep repeating this process again and again, and over time, you gradually get some idea of what $||\hat X^{(p)} - \hat X^{(r)}W||_{F}$ would look like if the true latent positions were identical up to a rotation. This is called *parametric resampling* (or a *parametric bootstrap*). It is called a *resampling* because you are *sampling* what $|| \hat X^{(1)} - \hat X^{(2)}W_i||_{F}$ would look like if an assumption were true; namely, if the underlying latent position matrices were the same. It is called *parametric* because you use a parametric model, the RDPG with the estimated latent positions as its parameter, to generate the networks from which $\hat X^{(1)}$ and $\hat X^{(2)}$ are estimated. When you do this dozens (or more) times, you start to notice a trend developing. Repeating this process $100$ times and plotting the values of $|| \hat X^{(1)} - \hat X^{(2)}W_i||_{F}$ as a histogram shows the number of times the value lands in a particular range of the $x$-axis.

What you see is that $||\hat X^{(p)} - \hat X^{(r)}W||_{F}$ is much larger than *almost all* of the values of $|| \hat X^{(1)} - \hat X^{(2)}W_i||_{F}$ that you calculated; in fact, it is larger than $97\%$ of them! We will use this to estimate a $p$-value: the $p$-value of $H_0$ against $H_A$ is the fraction of repetitions in which, with the underlying latent positions equal, the value of $|| \hat X^{(1)} - \hat X^{(2)}W_i||_{F}$ exceeded the value of $||\hat X^{(p)} - \hat X^{(r)}W||_{F}$ which we estimated from our sample. This is called the *latent position test*, and it is implemented directly by graspologic. Note that the $p$-value that you obtain from this process is going to differ every time you run the test, since there is randomness in how the $A^{(1)}$s and the $A^{(2)}$s are generated in each repetition. Making the number of repetitions larger, by setting the `n_bootstraps` argument to a higher value, will tend to yield more *stable* $p$-value estimates, as is generally the case when $p$-values are estimated using resampling techniques:

In [1]:
from graspologic.inference import latent_position_test

nreps = 100  # the number of times to repeat the process of estimating X1hat and X2hat
lpt_moop2moor = latent_position_test(AMoop, AMoor, n_bootstraps=nreps, n_components=n_components)
# recent graspologic releases return (stat, pvalue, misc_dict); index 1 is the p-value
print("estimate of p-value: {:.3f}".format(lpt_moop2moor[1]))

The $p$-value is low, which means you have evidence against the null hypothesis: the latent position matrix for a Moop brain network appears to differ from the latent position matrix for a Moor brain network.
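To build intuition for what the test does internally, the resampling procedure described above can be sketched with NumPy alone. This is a simplified illustration and not graspologic's implementation, which differs in its details. The helper functions and the block matrices `Bmoop` and `Bmoor` are hypothetical and chosen only for this example.

```python
import numpy as np

def sample_network(P, rng):
    """Sample an undirected, loopless network with edge probabilities P."""
    upper = np.triu((rng.random(P.shape) < P).astype(int), k=1)
    return upper + upper.T

def ase(A, d):
    """Adjacency spectral embedding via the top-d eigenpairs (by magnitude) of A."""
    evals, evecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(evals))[::-1][:d]
    return evecs[:, top] * np.sqrt(np.abs(evals[top]))

def aligned_distance(X1, X2):
    """Minimum over orthogonal W of ||X1 - X2 @ W||_F (SVD Procrustes solution)."""
    U, _, Vt = np.linalg.svd(X2.T @ X1)
    return np.linalg.norm(X1 - X2 @ (U @ Vt))

def bootstrap_null(Xhat, d, n_reps, rng):
    """Statistics from pairs of networks that share the latent positions Xhat."""
    P = np.clip(Xhat @ Xhat.T, 0, 1)  # estimated probabilities, kept in [0, 1]
    null = np.empty(n_reps)
    for i in range(n_reps):
        X1 = ase(sample_network(P, rng), d)
        X2 = ase(sample_network(P, rng), d)
        null[i] = aligned_distance(X1, X2)
    return null

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)  # hypothetical: two communities of 50 nodes
# hypothetical block matrices, assumed only for this sketch
Bmoop = np.array([[0.5, 0.2], [0.2, 0.5]])
Bmoor = np.array([[0.9, 0.2], [0.2, 0.9]])
AMoop = sample_network(Bmoop[np.ix_(labels, labels)], rng)
AMoor = sample_network(Bmoor[np.ix_(labels, labels)], rng)

d = 2
XMoop, XMoor = ase(AMoop, d), ase(AMoor, d)
observed = aligned_distance(XMoop, XMoor)             # statistic from the sample
null = bootstrap_null(XMoop, d, n_reps=100, rng=rng)  # statistics under H_0
pvalue = np.mean(null >= observed)                    # fraction at least as large
print("observed: {:.3f}, estimated p-value: {:.3f}".format(observed, pvalue))
```

Because these made-up block matrices differ substantially, the observed statistic should sit well above the bulk of the resampled values, giving a small estimated $p$-value.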

What if we had another network from a Moor, and we compared the network of our first Moor to our new Moor? We can generate a new comparison:

In [None]:
from graspologic.simulations import sbm

# generate a new Moor network with the same block matrix, and hence,
# the same latent position matrix as AMoor
AMoor2 = sbm(ns, Bmoor)

lpt_moor2moor = latent_position_test(AMoor, AMoor2, n_bootstraps=nreps, n_components=n_components)
# recent graspologic releases return (stat, pvalue, misc_dict); index 1 is the p-value
print("estimate of p-value: {:.3f}".format(lpt_moor2moor[1]))