# Sobol Indices

*Sobol's method* is one of the most popular methods for global sensitivity analysis. It builds on the [ANOVA decomposition](https://github.com/rballester/tntorch/blob/master/tutorials/anova.ipynb).

In [1]:
import tntorch as tn
import torch

N = 10
t = tn.rand([32]*N, ranks_tt=5)
t

10D TT tensor:

 32  32  32  32  32  32  32  32  32  32
  |   |   |   |   |   |   |   |   |   |
 (0) (1) (2) (3) (4) (5) (6) (7) (8) (9)
 / \ / \ / \ / \ / \ / \ / \ / \ / \ / \
1   5   5   5   5   5   5   5   5   5   1

With *tntorch* we can handle all Sobol indices (i.e. for all subsets $\pmb{\alpha} \subseteq \{0, \dots, N-1\}$) at once. We can access and aggregate them using the function `sobol()` with the appropriate mask.

### Single Variables

The effect attributable to one variable only (without interactions with others) is known as its *variance component*. Let's compute it for the first variable $x$:

In [2]:
x, y, z = tn.symbols(N)[:3]
tn.sobol(t, mask=tn.only(x))

tensor(0.2568)

(see [this notebook](https://github.com/rballester/tntorch/blob/master/tutorials/logic.ipynb) for more on symbols and masks)
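To make the definition concrete, here is a minimal brute-force sketch in plain Python (it does not use tntorch). It computes the variance component of a single variable as $\mathrm{Var}(E[f \mid x]) / \mathrm{Var}(f)$ on a small uniform grid. The function `first_order_index` and the toy model are hypothetical names introduced only for this illustration.

```python
import itertools

def first_order_index(f, shape, axis):
    """Variance component of one variable for f on a uniform discrete grid."""
    points = list(itertools.product(*[range(n) for n in shape]))
    values = [f(*p) for p in points]
    mean = sum(values) / len(values)
    total_var = sum((v - mean) ** 2 for v in values) / len(values)

    # Conditional mean E[f | x_axis = k], averaging over all other variables
    cond_means = []
    for k in range(shape[axis]):
        subset = [v for p, v in zip(points, values) if p[axis] == k]
        cond_means.append(sum(subset) / len(subset))
    cond_var = sum((m - mean) ** 2 for m in cond_means) / len(cond_means)
    return cond_var / total_var

# Hypothetical toy model: additive in x and y, while z has no effect at all
toy = lambda x, y, z: 2 * x + y
shape = (3, 3, 3)
s_x = first_order_index(toy, shape, 0)
s_y = first_order_index(toy, shape, 1)
s_z = first_order_index(toy, shape, 2)
print(s_x, s_y, s_z)
```

Because the toy model is purely additive, the three variance components add up to 1 and the inert variable `z` gets an index of 0.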

Input parameters $x, y, \dots$ need to be independently distributed. By default, uniform marginal distributions are assumed, but you can specify others with the `marginals` argument (list of vectors). For instance, if the first variable can only take one value, then its sensitivity indices will be 0 (no matter how strong its effect on the multidimensional model is!):

In [3]:
marginals = [None]*N  # By default, None means uniform
marginals[0] = torch.zeros(32)
marginals[0][0] = 1  # The marginal PMF is all zeros except for the first value
tn.sobol(t, tn.only(x), marginals=marginals)

tensor(0.)
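The same effect can be reproduced without tntorch. The sketch below is a plain-Python analogue, not the library's implementation: it weights every grid point by the product of the marginal PMFs (which is where independence is assumed). The helper `first_order_index` and the toy model are hypothetical.

```python
import itertools

def first_order_index(f, marginals, axis):
    """Variance component of variable `axis` under independent marginal PMFs."""
    grid = [range(len(m)) for m in marginals]
    cond_mean = [0.0] * len(marginals[axis])  # E[f | x_axis = k]
    for p in itertools.product(*grid):
        w_others = 1.0
        for i, k in enumerate(p):
            if i != axis:
                w_others *= marginals[i][k]
        cond_mean[p[axis]] += w_others * f(*p)

    pmf = marginals[axis]
    mean = sum(pk * m for pk, m in zip(pmf, cond_mean))
    explained = sum(pk * (m - mean) ** 2 for pk, m in zip(pmf, cond_mean))

    # Total variance, weighting each grid point by the product of the PMFs
    total = 0.0
    for p in itertools.product(*grid):
        w = 1.0
        for i, k in enumerate(p):
            w *= marginals[i][k]
        total += w * (f(*p) - mean) ** 2
    return explained / total

toy = lambda x, y: 2 * x + y      # hypothetical toy model
uniform = [1 / 3] * 3
point_mass = [1.0, 0.0, 0.0]      # x can only take its first value

s_uniform = first_order_index(toy, [uniform, uniform], axis=0)
s_degenerate = first_order_index(toy, [point_mass, uniform], axis=0)
print(s_uniform, s_degenerate)
```

With a point-mass marginal the conditional mean of the model never changes along `x`, so its index vanishes even though the coefficient of `x` in the model is the largest one.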

The effect that also includes $x$'s interactions with other variables is called the *total Sobol index* (it is never smaller than the variance component, and equals it only when $x$ does not interact with any other variable):

In [4]:
tn.sobol(t, x)

tensor(0.3192)
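One standard way to see the gap between the two indices is to write the total index of $x$ as one minus the share of variance explained by all the *other* variables together. The sketch below checks this by brute force on a uniform grid; it is plain Python, independent of tntorch, and `conditional_mean_variance` and the toy model are hypothetical names.

```python
import itertools

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def conditional_mean_variance(f, shape, fixed_axes):
    """Variance of E[f | variables in fixed_axes] on a uniform grid."""
    groups = {}
    for p in itertools.product(*[range(n) for n in shape]):
        key = tuple(p[i] for i in fixed_axes)
        groups.setdefault(key, []).append(f(*p))
    # All groups have the same size, so an unweighted variance is correct
    return variance([sum(g) / len(g) for g in groups.values()])

# Hypothetical toy model with a pure interaction between x and y
toy = lambda x, y: x * y
shape = (2, 2)

total_var = conditional_mean_variance(toy, shape, (0, 1))  # Var(f)
component_x = conditional_mean_variance(toy, shape, (0,)) / total_var
total_x = 1 - conditional_mean_variance(toy, shape, (1,)) / total_var
print(component_x, total_x)
```

Here the total index of `x` is twice its variance component, because the interaction term accounts for as much variance as `x` alone.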

### Tuples of Variables

What are the indices for the first and third variables $x$ and $z$?

In [5]:
tn.sobol(t, tn.only(x & z))  # Variance component

tensor(0.0032)

In [6]:
tn.sobol(t, x | z)  # Total index

tensor(0.3807)

For tuples of variables two additional kinds of indices exist. The *closed index* aggregates all components for tuples *included* in $\pmb{\alpha}$, and for tuple $\{x, z\}$ it can be computed as follows:

In [7]:
tn.sobol(t, tn.only(x | z))

tensor(0.3076)

The *superset index* aggregates all components for tuples *that include* $\pmb{\alpha}$:

In [8]:
tn.sobol(t, x & z)

tensor(0.0050)
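The four kinds of indices differ only in which variance components they add up. The sketch below makes the selection rules explicit with Python sets. The component values are made up for illustration (they are not taken from the tensor `t`), and `aggregate` is a hypothetical helper, not a tntorch function.

```python
# Hypothetical variance components S_alpha for three variables (0, 1, 2);
# the values are invented for illustration and sum to 1.
components = {
    frozenset({0}): 0.40,
    frozenset({1}): 0.20,
    frozenset({2}): 0.10,
    frozenset({0, 1}): 0.12,
    frozenset({0, 2}): 0.08,
    frozenset({1, 2}): 0.06,
    frozenset({0, 1, 2}): 0.04,
}

def aggregate(components, keep):
    """Sum the components whose variable tuple satisfies `keep`."""
    return sum(s for alpha, s in components.items() if keep(alpha))

target = frozenset({0, 2})  # plays the role of the tuple {x, z}

# Mask analogues from the text: only(x & z), only(x | z), x & z, x | z
variance_component = aggregate(components, lambda a: a == target)
closed = aggregate(components, lambda a: a <= target)      # subsets of target
superset = aggregate(components, lambda a: a >= target)    # supersets of target
total = aggregate(components, lambda a: bool(a & target))  # touches x or z
print(variance_component, closed, superset, total)
```

By construction the variance component is the smallest of the four, and the total index is at least as large as both the closed and the superset index.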

We can also easily aggregate the influence of all $k$-tuples of variables for a given $k$. For example, for $k = 1$ (all single variables combined):

In [9]:
tn.sobol(t, tn.weight_mask(N, weight=[1]))

tensor(0.8661)

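Note that the same index can often be obtained by combining effects in different ways. For example, the total index of $\{x, z\}$ computed earlier (0.3807) can be recovered in two more ways: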

In [10]:
print(tn.sobol(t, x & ~z) + tn.sobol(t, ~x & z) + tn.sobol(t, x & z))
print(tn.sobol(t, x) + tn.sobol(t, z) - tn.sobol(t, x & z))

tensor(0.3807)
tensor(0.3807)
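Both lines are instances of inclusion-exclusion over the variance components. Writing $S_{\pmb{\alpha}}$ for the variance component of tuple $\pmb{\alpha}$ (the summation notation below is introduced here for illustration only), the total index of $\{x, z\}$ collects every $\pmb{\alpha}$ that contains $x$ or $z$:

$$
\begin{aligned}
\sum_{\pmb{\alpha} \cap \{x, z\} \neq \emptyset} S_{\pmb{\alpha}}
&= \sum_{x \in \pmb{\alpha},\, z \notin \pmb{\alpha}} S_{\pmb{\alpha}}
 + \sum_{x \notin \pmb{\alpha},\, z \in \pmb{\alpha}} S_{\pmb{\alpha}}
 + \sum_{\{x, z\} \subseteq \pmb{\alpha}} S_{\pmb{\alpha}} \\
&= \sum_{x \in \pmb{\alpha}} S_{\pmb{\alpha}}
 + \sum_{z \in \pmb{\alpha}} S_{\pmb{\alpha}}
 - \sum_{\{x, z\} \subseteq \pmb{\alpha}} S_{\pmb{\alpha}}
\end{aligned}
$$

The first line corresponds to the masks `x & ~z`, `~x & z` and `x & z`; the second to the total indices of $x$ and $z$ minus the superset index of $\{x, z\}$, which would otherwise be counted twice.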


### The Mean Dimension

Variance components are the basis for an important advanced sensitivity metric, the *mean dimension*, which is a measure of the average complexity of any multidimensional function. It's defined as $D_S := \sum_{\pmb{\alpha}} |\pmb{\alpha}| \cdot S_{\pmb{\alpha}}$ and computed as:

In [11]:
tn.mean_dimension(t)

tensor(1.1492)
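As a small illustration of the definition, the sketch below evaluates $D_S$ directly from a dictionary of variance components. The values are hypothetical (they do not come from the tensor `t`) and the code is plain Python rather than a tntorch call.

```python
# Hypothetical variance components S_alpha, keyed by the tuple of variables
# they belong to; the values are invented and sum to 1.
components = {
    ("x",): 0.5,
    ("y",): 0.3,
    ("x", "y"): 0.2,
}

# D_S = sum over alpha of |alpha| * S_alpha
mean_dimension = sum(len(alpha) * s for alpha, s in components.items())
print(mean_dimension)
```

A purely additive function has only order-1 components, so its mean dimension is exactly 1; interactions push the value above 1.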

We can also compute it in one line by weighting the Sobol indices by their tuple weight (according to the definition of mean dimension):

In [12]:
tn.sobol(t, tn.weight(t.ndim))

tensor(1.1492)

### The Dimension Distribution

Finally, the *dimension distribution* gathers the relevance of $k$-tuples of variables for each $k = 1, \dots, N$:

In [13]:
tn.dimension_distribution(t)

tensor([8.6610e-01, 1.1992e-01, 1.2766e-02, 1.1256e-03, 8.5203e-05, 5.6002e-06,
        3.1852e-07, 1.5335e-08, 6.0670e-10, 1.8989e-11])

And, of course, this vector must always sum to $1$:

In [14]:
sum(tn.dimension_distribution(t))

tensor(1.0000)
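The dimension distribution also ties the previous results together. As a quick plain-Python check (using the values printed above, copied by hand and therefore rounded), weighting entry $k$ by $k$ should reproduce the mean dimension, and the first entry should match the aggregated index of all single variables:

```python
# Dimension distribution as printed in the notebook output above
distribution = [8.6610e-01, 1.1992e-01, 1.2766e-02, 1.1256e-03, 8.5203e-05,
                5.6002e-06, 3.1852e-07, 1.5335e-08, 6.0670e-10, 1.8989e-11]

# Entry k-1 holds the summed variance components of all tuples of size k
mean_dimension = sum(k * d for k, d in enumerate(distribution, start=1))
total = sum(distribution)
single_variables = distribution[0]

print(round(mean_dimension, 4), round(total, 4), single_variables)
```

Small deviations in the last digits are expected, since the printed values are rounded to five significant figures.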