# ANOVA Decomposition

The *ANOVA decomposition* is defined for any multidimensional function $f: \mathbb{R}^N \to \mathbb{R}$ whose inputs are independently distributed. It writes $f$ as a sum of mutually orthogonal functions, $f = \sum_{\alpha} f_{\alpha}$, one for each of the $2^N$ subsets $\alpha$ of the input variables $\{x_0, \dots, x_{N-1}\}$, and thereby partitions the total variance of the model as a sum of their variances: $\mathrm{Var}[f] = \sum_{\alpha} \mathrm{Var}[f_{\alpha}]$. Each $f_{\alpha}$ depends only on the variables contained in $\alpha$ and is constant with respect to the rest.
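To make the definition concrete, here is a minimal NumPy sketch (not tntorch's implementation) for a small full tensor, assuming every axis is an independent variable uniformly distributed over its grid points. Each term is obtained by averaging out the variables outside $\alpha$ and subtracting all lower-order terms; `anova_terms` is a hypothetical helper name.

```python
import itertools
import numpy as np

def anova_terms(t):
    """Hypothetical helper: return {alpha: f_alpha} for a full tensor t.

    Assumes every axis is an independent variable, uniformly distributed over
    its grid points. Each f_alpha keeps size-1 axes for the variables it does
    not depend on, so it broadcasts against t.
    """
    n = t.ndim
    terms = {}
    for size in range(n + 1):  # lower-order terms are computed first
        for alpha in itertools.combinations(range(n), size):
            rest = tuple(i for i in range(n) if i not in alpha)
            # E[f | x_alpha]: average out every variable outside alpha.
            cond = t.mean(axis=rest, keepdims=True) if rest else t.copy()
            # Subtract all lower-order terms f_beta with beta a proper subset of alpha.
            for beta, f_beta in terms.items():
                if set(beta) < set(alpha):
                    cond = cond - f_beta
            terms[alpha] = cond
    return terms

rng = np.random.default_rng(0)
t = rng.random((5, 6, 7))                   # N = 3 variables
terms = anova_terms(t)
print(len(terms))                           # 8 = 2**3 subsets

# The terms add back up to f (each one broadcast to the full grid) ...
recon = sum(np.broadcast_to(f, t.shape) for f in terms.values())
print(np.allclose(recon, t))                # True
# ... and, being orthogonal, their variances add up to Var[f].
print(np.isclose(sum(f.var() for f in terms.values()), t.var()))  # True
```

The constant term $f_{\emptyset}$ has zero variance, so only the $2^N - 1$ non-empty subsets contribute to $\mathrm{Var}[f]$.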

Reference: [*"Sobol Tensor Trains for Global Sensitivity Analysis"*](https://arxiv.org/abs/1712.00233), R. Ballester-Ripoll, E. G. Paredes, R. Pajarola (2017).

```python
import torch
import tntorch as tn

N = 4
# Random 4D tensor with 32 points per axis, in the TT format with TT-ranks 5
t = tn.rand([32]*N, ranks_tt=5)
```

Let's compute all ANOVA terms in a single tensor network:

```python
anova = tn.anova_decomposition(t)
print(anova)
```

```
4D TT-Tucker tensor:

 33  33  33  33
  |   |   |   |
 32  32  32  32
 (0) (1) (2) (3)
 / \ / \ / \ / \
1   5   5   5   1
```
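The `33` above each `32` suggests that every mode gains one extra index. One plausible reading, which is an assumption here and not taken from tntorch's documentation, is that slot 0 of each mode holds the average over that variable and slots 1 to 32 hold the centered values, so an entry with index 0 along mode $n$ belongs to a term $f_\alpha$ with $x_n \notin \alpha$. The NumPy sketch below builds such an extended tensor for a small 2D full tensor with uniform weights; `extend_factor` and `extended_tensor` are hypothetical helper names.

```python
import numpy as np

def extend_factor(size):
    """Hypothetical helper: (size+1) x size matrix.

    Row 0 averages the mode (uniform weights, an assumption);
    rows 1..size subtract that average from each grid point.
    """
    mean_row = np.full((1, size), 1.0 / size)
    return np.concatenate([mean_row, np.eye(size) - mean_row], axis=0)

def extended_tensor(t):
    """Hypothetical helper: multiply every mode of the full tensor t by its extended factor."""
    a = t
    for n in range(t.ndim):
        u = extend_factor(t.shape[n])
        # tensordot puts the new axis first; move it back to position n.
        a = np.moveaxis(np.tensordot(u, a, axes=([1], [n])), 0, n)
    return a

rng = np.random.default_rng(0)
t = rng.random((4, 5))          # two variables: x (axis 0) and y (axis 1)
a = extended_tensor(t)
print(a.shape)                  # (5, 6): one extra slot per mode

mean = t.mean()
print(np.isclose(a[0, 0], mean))                          # constant term f_emptyset
print(np.allclose(a[0, 1:], t.mean(axis=0) - mean))       # f_y
print(np.allclose(a[1:, 0], t.mean(axis=1) - mean))       # f_x
print(np.allclose(a[1:, 1:],
                  t - t.mean(axis=0) - t.mean(axis=1)[:, None] + mean))  # f_xy

# Undoing: at grid point (k, l), add slot 0 and slot k+1 (resp. l+1) in every combination.
recon = a[0, 0] + a[0, 1:][None, :] + a[1:, 0][:, None] + a[1:, 1:]
print(np.allclose(recon, t))    # True
```

Under this reading, the extra slot costs only one row per Tucker factor, so the TT cores (ranks 1, 5, 5, 5, 1) are left untouched.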
This tensor `anova` indexes *all* $2^N$ functions $f_{\alpha}$ of the ANOVA decomposition of $f$, and we can access them using our [tensor masks](https://github.com/rballester/tntorch/blob/master/tutorials/logic.ipynb). For example, calling the four variables $x, y, z, w$, let's keep only the terms that do *not* interact with $w$:

```python
x, y, z, w = tn.symbols(N)  # one logic symbol per input variable
anova_cut = tn.mask(anova, ~w)
```
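Conceptually, the condition `~w` acts as a filter over the $2^N = 16$ subsets $\alpha$: it keeps the $2^{N-1} = 8$ subsets that do not contain $w$. The plain-Python sketch below only enumerates those subsets to make the count concrete; it does not use tntorch.

```python
import itertools

names = ["x", "y", "z", "w"]
# One subset alpha per ANOVA term f_alpha: 2**4 = 16 in total.
subsets = [frozenset(c)
           for k in range(len(names) + 1)
           for c in itertools.combinations(names, k)]
# Keeping the terms that do not interact with w means keeping the subsets without w.
kept = [s for s in subsets if "w" not in s]
print(len(subsets), len(kept))  # 16 8
```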

We can undo the decomposition to obtain a regular tensor again:

```python
t_cut = tn.undo_anova_decomposition(anova_cut)
```

As expected, our truncated tensor `t_cut` has become constant with respect to the fourth variable $w$:

```python
t_cut[0, 0, 0, :].full()
```

```
tensor([10.3290, 10.3290, 10.3290, 10.3290, 10.3290, 10.3290, 10.3290, 10.3290,
        10.3290, 10.3290, 10.3290, 10.3290, 10.3290, 10.3290, 10.3290, 10.3290,
        10.3290, 10.3290, 10.3290, 10.3290, 10.3290, 10.3290, 10.3290, 10.3290,
        10.3290, 10.3290, 10.3290, 10.3290, 10.3290, 10.3290, 10.3290, 10.3290])
```
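Why is the result constant in $w$? Every term $f_\alpha$ with $w \in \alpha$ averages to zero along $w$, so dropping all of them leaves exactly the average of $f$ over $w$. The NumPy sketch below checks this on a small 4D full tensor, assuming (as a hypothesis about the extended layout, not tntorch's documented internals) that slot 0 of each mode holds the average over that variable and the remaining slots hold centered values. Masking with `~w` then zeroes the centered slots of $w$'s mode, and undoing sums slot 0 with slot $k+1$ along every mode. All helper names are hypothetical.

```python
import numpy as np

def mode_product(a, m, n):
    """Hypothetical helper: multiply axis n of array a by matrix m (shape new_size x a.shape[n])."""
    return np.moveaxis(np.tensordot(m, a, axes=([1], [n])), 0, n)

def extend_factor(size):
    # Slot 0: uniform average (assumption); slots 1..size: centered values.
    mean_row = np.full((1, size), 1.0 / size)
    return np.concatenate([mean_row, np.eye(size) - mean_row], axis=0)

def undo_factor(size):
    # size x (size+1): output index k adds slot 0 and slot k+1.
    return np.concatenate([np.ones((size, 1)), np.eye(size)], axis=1)

rng = np.random.default_rng(0)
t = rng.random((6, 6, 6, 6))                       # axes: x, y, z, w

a = t
for n in range(t.ndim):
    a = mode_product(a, extend_factor(t.shape[n]), n)

a_cut = a.copy()
a_cut[..., 1:] = 0.0                               # keep only slot 0 along w: drop terms with w

t_cut = a_cut
for n in range(t.ndim):
    t_cut = mode_product(t_cut, undo_factor(t.shape[n]), n)

print(np.allclose(t_cut, t_cut[..., :1]))                 # True: constant along w
print(np.allclose(t_cut, t.mean(axis=3, keepdims=True)))  # True: equals the average over w
```

In other words, under these assumptions the mask-and-undo round trip is the same as replacing $f$ by its conditional mean $\mathbb{E}[f \mid x, y, z]$.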

How much of the variance did we lose by discarding every term that involves $w$?

```python
print('The truncated tensor accounts for {:g}% of the original variance.'.format(tn.var(t_cut) / tn.var(t) * 100))
```

```
The truncated tensor accounts for 49.1902% of the original variance.
```
... which is also what [Sobol's method](https://github.com/rballester/tntorch/blob/master/tutorials/sobol.ipynb) gives us:

```python
tn.sobol(t, ~w) * 100
```

```
tensor(49.1902)
```

or, equivalently,

```python
tn.sobol(t, tn.only(x | y | z)) * 100
```

```
tensor(49.1902)
```
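The variance ratio and the Sobol index agree because removing all terms involving $w$ leaves the conditional mean $\mathbb{E}[f \mid x, y, z]$, whose variance relative to $\mathrm{Var}[f]$ is the closed Sobol index of $\{x, y, z\}$. By the law of total variance, this equals one minus the total-effect index of $w$. The NumPy sketch below checks that identity on a small full tensor, assuming independent, uniformly distributed variables; it does not use tntorch.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.random((6, 6, 6, 6))                       # axes: x, y, z, w

var_f = t.var()
# Closed index of {x, y, z}: Var[E[f | x, y, z]] / Var[f] (average out w first).
closed_xyz = t.mean(axis=3).var() / var_f
# Total-effect index of w: E[Var[f | x, y, z]] / Var[f] (variance along w, then average).
total_w = t.var(axis=3).mean() / var_f

print(f"kept: {100 * closed_xyz:.2f}%, lost: {100 * total_w:.2f}%")
print(np.isclose(closed_xyz + total_w, 1.0))       # True: law of total variance
```

Because the constant term $f_{\emptyset}$ has zero variance, whether a mask includes it or not does not change the resulting Sobol index.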