otkerneldesign==0.1.1
openturns>=1.16
numpy==1.20.1
pandas==1.3.0
matplotlib
This numerical experiment aims at comparing methods for estimating the central tendency of a random variable.
Transfer theorem:
$$ \mu(g) = \mathbb{E}_{\boldsymbol{X}}\left[g(\boldsymbol{X})\right] = \int_{\mathcal{D}_{\boldsymbol{X}}} g(\boldsymbol{x}) f_{\boldsymbol{X}}(\boldsymbol{x}) \,\mathrm{d}\boldsymbol{x} $$
This statistic can be closely estimated using a large $m$-sized sample $\boldsymbol{X}_m = \left\{\boldsymbol{x}^{(1)}, \dots, \boldsymbol{x}^{(m)}\right\}$ following the distribution $f_{\boldsymbol{X}}$ (e.g., generated by a deterministic Sobol' sequence).
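As a minimal sketch of this reference estimation, the arithmetic mean of $g$ over a large sample approximates $\mu(g)$; the test function and input distribution below are illustrative assumptions, not the notebook's actual model:

```python
import numpy as np

# Hypothetical costly function g and input distribution (standard normal),
# stand-ins for the simulator and f_X of the text.
def g(x):
    return np.sum(x**2, axis=1)

rng = np.random.default_rng(0)
m = 100_000                        # large reference sample size
X_m = rng.standard_normal((m, 2))  # m-sized sample following f_X

# Reference estimate of mu(g) = E[g(X)] by the arithmetic mean.
mu_hat = g(X_m).mean()
```

For this toy $g$, the exact value is $\mathbb{E}[X_1^2 + X_2^2] = 2$, so `mu_hat` should be close to 2.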
The goal is to select the smallest $n$-sized design of points that estimates our quantity with the highest accuracy and precision. Let us denote this design of experiments $\boldsymbol{X}_n$.
Two types of methods can be distinguished:
- Direct estimators. The arithmetic mean of $g$ over the generated design provides an estimation of the expectation. Some methods additionally provide confidence intervals; otherwise these are computed by bootstrap. Monte Carlo, low-discrepancy sequences, Latin Hypercube Sampling, Kernel Herding, and Support Points are the methods considered.
- Metamodel-based estimators. Metamodels are supposed to emulate the response of a costly function with an almost-free-to-call statistical model. A strategy in our context is then to first fit an inexpensive metamodel of $g$ on a design $\left(\boldsymbol{X}_n, y_n\right)$, then use it to estimate our statistic on $\boldsymbol{X}_m$. Note that the estimation error becomes negligible compared to the metamodel error, and fixing the sample $\boldsymbol{X}_m$ allows analyzing only the error generated by the metamodel.
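To illustrate the direct estimators above, here is a numpy-only sketch comparing crude Monte Carlo with a basic Latin Hypercube Sample on a toy smooth function over $[0, 1]^2$ (the function, sample sizes, and repetition count are assumptions made for the example):

```python
import numpy as np

def g(u):
    # Smooth additive toy function with known mean E[g] = 2/3.
    return u[:, 0]**2 + u[:, 1]**2

def lhs(n, d, rng):
    """Latin Hypercube Sample on [0, 1]^d: one point per stratum, per dimension."""
    u = (rng.random((n, d)) + np.arange(n)[:, None]) / n  # stratified in order
    for j in range(d):
        rng.shuffle(u[:, j])  # decorrelate strata across dimensions
    return u

rng = np.random.default_rng(0)
n, reps = 50, 200
mc_means = [g(rng.random((n, 2))).mean() for _ in range(reps)]
lhs_means = [g(lhs(n, 2, rng)).mean() for _ in range(reps)]

# Stratification reduces the variance of the mean estimator on smooth functions.
std_mc, std_lhs = np.std(mc_means), np.std(lhs_means)
```

On this smooth additive function the LHS estimator shows a visibly smaller spread than crude Monte Carlo at equal design size, which is the kind of accuracy/precision trade-off the comparison targets.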
When $g$ is approximated by a Gaussian process $\xi$, the estimator itself becomes a random variable. Note that this can be written for an estimator of another statistic such as a quantile or an exceedance probability. In the case of the central tendency, $\hat{a}_n = \mathbb{E}\left[\mathbb{E}_{\boldsymbol{X}}[\xi]\right] = \mathbb{E}_{\boldsymbol{X}}\left[\mathbb{E}[\xi]\right]$.
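A minimal numpy sketch of the metamodel-based strategy with a Gaussian-process-style surrogate: fit on a small design $(\boldsymbol{X}_n, y_n)$, then average the posterior mean over the fixed sample $\boldsymbol{X}_m$, i.e. $\hat{a}_n = \mathbb{E}_{\boldsymbol{X}}\left[\mathbb{E}[\xi]\right]$. The toy function, RBF kernel, length-scale, and nugget are illustrative assumptions (in practice a full Kriging implementation would be used):

```python
import numpy as np

def g(x):
    # Hypothetical costly 1-D function; E[g(X)] = 0 for X ~ U(-3, 3).
    return np.sin(x)

def rbf(a, b, ls=1.0):
    # Squared-exponential kernel matrix between 1-D point sets a and b.
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)

rng = np.random.default_rng(2)
X_n = np.linspace(-3, 3, 10)            # small n-sized training design
y_n = g(X_n)                            # costly evaluations
K = rbf(X_n, X_n) + 1e-6 * np.eye(10)   # nugget for numerical stability
alpha = np.linalg.solve(K, y_n)         # GP regression weights

X_m = rng.uniform(-3, 3, 50_000)        # fixed m-sized sample of f_X
post_mean = rbf(X_m, X_n) @ alpha       # E[xi(x)] at each sample point
a_hat = post_mean.mean()                # hat a_n = E_X[ E[xi] ]
```

Because $\boldsymbol{X}_m$ is fixed and large, the remaining discrepancy between `a_hat` and $\mu(g)$ is dominated by the surrogate's approximation error, as noted above.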