# Multivariate Distance Matrix Regression

In [1]:
from mgcpy.independence_tests.mdmr import MDMR
from mgcpy.benchmarks.simulations import linear_sim

MDMR is a statistical technique that uses a permutation test to assess the significance of the association between a set of predictors and a multivariate outcome summarized by a distance matrix. Suppose that an appropriate distance measure is used to calculate an $n \times n$ distance matrix
$\bf{D} = (d_{ij}),\ i, j = 1, ..., n$, where $d_{ij}$ quantifies the dissimilarity between subjects $i$ and $j$. Let $\bf{y}$ be an $n \times q$ matrix that will be regressed onto an $n \times p$ matrix $\bf{x}$ of $p$ predictor variables, together with a column of ones for the intercept. MDMR partitions the sum of squared distances (SSD), which is
$$
    SSD = \sum_{i < j} d_{ij}^2 = \sum_{j < i} d_{ij}^2,
$$
into a portion attributed to $\bf{x}$ and a portion attributed to the residual. A symmetric projection matrix can be calculated as $\bf{L} = \bf{x} {\left( {\bf{x}} ^T \bf{x} \right)}^{-1} {\bf{x}} ^T$. In addition, the matrix to be centered is $\bf{A} = (a_{ij})$ with $a_{ij} = -d_{ij}^2 / 2$. Given that $\bf{I}$ is the identity matrix of size $n \times n$ and $\bf{J}$ is a matrix of ones of the same size, the Gower-centered matrix (Gower, 1966) is
$$
\bf{G} = \left( \bf{I} - \frac{1}{n} \bf{J} \right) \bf{A} \left( \bf{I} - \frac{1}{n} \bf{J} \right).
$$
This centering is used because $\text{tr}(\bf{D}) = 0$, whereas $\text{tr}(\bf{G}) = SSD/n$, so the trace of $\bf{G}$ carries the total variation to be partitioned. The MDMR test statistic can thus be written as

$$\begin{equation}
    \text{MDMR}_n = \frac{\text{tr}\left( \bf{L} \bf{G} \bf{L} \right) / p}{\text{tr}\left( \left( \bf{I} - \bf{L} \right) \bf{G} \left( \bf{I} - \bf{L} \right) \right) / \left( n - p - 1 \right)}.
\end{equation}$$

Since the degrees of freedom $p$ and $n - p - 1$ are constant across permutations and therefore do not affect permutation p-values, they are typically omitted from the MDMR statistic.
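
To make the formulas above concrete, the following is a minimal NumPy sketch of the Gower centering and the resulting statistic. It is not the `mgcpy` implementation: the helper names `gower_center` and `mdmr_statistic`, the Euclidean distance, and the random test data are all assumptions made for illustration.

```python
import numpy as np


def gower_center(D):
    """Return G = (I - J/n) A (I - J/n), where A = -D**2 / 2 (hypothetical helper)."""
    n = D.shape[0]
    A = -0.5 * D ** 2
    C = np.eye(n) - np.ones((n, n)) / n  # centering matrix I - J/n
    return C @ A @ C


def mdmr_statistic(X, D):
    """Pseudo-F statistic for predictors X (n x p, without intercept) and distances D (hypothetical helper)."""
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])  # prepend the intercept column
    L = X1 @ np.linalg.pinv(X1)            # projection matrix x (x^T x)^{-1} x^T
    G = gower_center(D)
    R = np.eye(n) - L                      # residual projection I - L
    numerator = np.trace(L @ G @ L) / p
    denominator = np.trace(R @ G @ R) / (n - p - 1)
    return numerator / denominator


# Illustrative data only: a linear relationship plus noise.
rng = np.random.default_rng(0)
n, p, q = 20, 2, 3
X = rng.normal(size=(n, p))
Y = X @ rng.normal(size=(p, q)) + rng.normal(size=(n, q))

# Pairwise Euclidean distances between the rows of Y.
diff = Y[:, None, :] - Y[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

G = gower_center(D)
ssd = (np.triu(D, k=1) ** 2).sum()        # sum over i < j of d_ij^2
trace_gap = abs(np.trace(G) - ssd / n)    # should be close to zero
stat = mdmr_statistic(X, D)
print("tr(G) - SSD/n gap: %.2e" % trace_gap)
print("MDMR statistic (sketch): %.2f" % stat)
```

The `trace_gap` check confirms numerically that $\text{tr}(\bf{G}) = SSD/n$. Dropping the divisions by $p$ and $n - p - 1$ rescales every permuted statistic by the same constant, which is why the degrees of freedom can be omitted without changing the p-value.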

As with the other tests, simply create an `MDMR` object and then call its `test_statistic` method. This is done below, using a simulation to generate data and then calculating the MDMR test statistic from it:

In [2]:
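# Simulate a small linearly related data set
x, y = linear_sim(10, 1)

# Create the test object and take the first returned element, the statistic
mdmr = MDMR()
test_stat = mdmr.test_statistic(x, y)[0]
print("MDMR test statistic: %.2f" % test_stat)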

MDMR test statistic: 1.21


P-values are calculated via permutation tests, as with the other tests in the package. This is done by repeatedly permuting $y$ and recalculating the test statistic to build a null distribution. The p-value is the number of permuted test statistics that are greater than or equal to the observed statistic, divided by the number of replications. This is shown below:

In [3]:
p_value = mdmr.p_value(x, y)[0]
print("MDMR p-value: %.2f" % p_value)

MDMR p-value: 0.27
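
The permutation procedure described above can be sketched generically as follows. This is an illustration under stated assumptions rather than the `mgcpy` code: `permutation_p_value` and the stand-in statistic `abs_corr` are hypothetical names, and the absolute Pearson correlation is used only so that the example stays self-contained.

```python
import numpy as np


def permutation_p_value(stat_fn, x, y, n_perm=1000, seed=0):
    """Fraction of permuted statistics >= the observed one (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    observed = stat_fn(x, y)
    exceed = 0
    for _ in range(n_perm):
        y_perm = y[rng.permutation(len(y))]  # shuffle the rows of y
        if stat_fn(x, y_perm) >= observed:
            exceed += 1
    return exceed / n_perm


def abs_corr(x, y):
    """Stand-in test statistic: absolute Pearson correlation."""
    return abs(np.corrcoef(x.ravel(), y.ravel())[0, 1])


# Illustrative data only: one dependent and one independent response.
rng = np.random.default_rng(1)
x = rng.normal(size=(50, 1))
y_dep = 2 * x + 0.1 * rng.normal(size=(50, 1))
y_ind = rng.normal(size=(50, 1))

p_dep = permutation_p_value(abs_corr, x, y_dep)
p_ind = permutation_p_value(abs_corr, x, y_ind)
print("dependent p-value: %.3f" % p_dep)
print("independent p-value: %.3f" % p_ind)
```

Any statistic with the signature `stat_fn(x, y)` could be passed in place of `abs_corr`. Permuting the rows of `y` breaks the pairing with `x`, so the permuted statistics approximate the distribution under the null hypothesis of no association.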
