
Central tendency benchmark

Installation

Core requirements

otkerneldesign>=0.1.1
openturns>=1.16
numpy>=1.20.1
pandas>=1.3.0
matplotlib

Objective

This numerical experiment compares methods for estimating the central tendency of a random variable $Y$ resulting from the propagation of a random vector $\boldsymbol{X} \in \mathcal{D}_{\boldsymbol{X}} \subset \mathbb{R}^d$ (defined by its joint pdf $f_{\boldsymbol{X}}$) through a deterministic model $g: \mathbb{R}^d \rightarrow \mathbb{R}$. The study focuses in particular on the case of a costly simulation code, which orients it towards active methods, mostly based on surrogate models or metamodels (e.g., Gaussian processes), that iteratively add points to the design of experiments to improve the estimation.

Transfer theorem:

$$ \mu(g) = \mathbb{E}_{\boldsymbol{X}}\left[g(\boldsymbol{X})\right] = \int_{\mathcal{D}_{\boldsymbol{X}}} g(\boldsymbol{x}) f_{\boldsymbol{X}}(\boldsymbol{x}) \,\mathrm{d}\boldsymbol{x} $$

This statistic can be closely estimated using a large $m$-sized sample $\boldsymbol{X}_m = \left\{\boldsymbol{x}^{(1)}, \dots, \boldsymbol{x}^{(m)}\right\}$ following the distribution $f_{\boldsymbol{X}}$ (e.g., generated by a deterministic Sobol' sequence).

$$ a_m(g) := \frac{1}{m} \sum_{i=1}^m g(\boldsymbol{x}^{(i)}), \qquad a_m(g) \approx \mu(g) $$
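
For illustration, here is a minimal openturns sketch of this reference estimation. The model $g$, the input distribution, and the sample size $m$ are placeholder choices for the example, not the benchmark's actual test cases:

```python
import openturns as ot

# Placeholder test case: g(x) = x1^2 + x2^2 with X ~ standard normal in d = 2.
d = 2
g = ot.SymbolicFunction(["x1", "x2"], ["x1^2 + x2^2"])
distribution = ot.ComposedDistribution([ot.Normal()] * d)

# Large m-sized deterministic Sobol' sample used as the reference design X_m.
m = 2**14
X_m = ot.LowDiscrepancyExperiment(ot.SobolSequence(d), distribution, m).generate()

# Reference estimate a_m(g) of mu(g); the exact value here is E[x1^2 + x2^2] = 2.
a_m = g(X_m).computeMean()[0]
print(a_m)
```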

The goal is to select the smallest $n$-sized design of points, and the corresponding strategy, to estimate our quantity with the highest accuracy and precision. Let us denote this design of experiments $\left(\boldsymbol{X}_n, y_n\right)$ such that $y_n=\left[g(\boldsymbol{x}^{(1)}), \dots, g(\boldsymbol{x}^{(n)})\right]$. Note that $n \ll m$ and that $\boldsymbol{X}_n$ is not necessarily included in $\boldsymbol{X}_m$, although in practice it might be.

Two different types of methods can be distinguished:

  • Direct estimators. The arithmetic mean of the generated design provides an estimation of the expectation. Some methods additionally provide confidence intervals; for the others, confidence intervals are computed by bootstrap. The methods considered are Monte Carlo, low-discrepancy sequences, Latin hypercube sampling, kernel herding, and support points (a minimal sketch follows the formula below).

$$ a_n(g) = \frac{1}{n} \sum_{i=1}^n g(\boldsymbol{x}^{(i)}) $$
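
As a sketch of how such direct estimators can be compared, assuming the toy $g$ and distribution from the previous snippet; the kernel herding call follows the usage documented by otkerneldesign, but its exact signature and defaults may vary across versions:

```python
import openturns as ot
import otkerneldesign as otkd

n = 64  # small design size, n << m

# Standard designs generated with openturns.
designs = {
    "Monte Carlo": ot.MonteCarloExperiment(distribution, n).generate(),
    "Sobol'": ot.LowDiscrepancyExperiment(ot.SobolSequence(d), distribution, n).generate(),
    "LHS": ot.LHSExperiment(distribution, n).generate(),
}

# Kernel herding design (otkerneldesign API as documented; a default kernel
# is assumed to be built when none is passed).
kh = otkd.KernelHerding(distribution=distribution, candidate_set_size=2**12)
designs["Kernel herding"] = kh.select_design(n)

# Each direct estimator is the arithmetic mean a_n(g) over its design X_n.
for name, X_n in designs.items():
    print(name, g(X_n).computeMean()[0])
```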

  • Metamodel-based estimators. Metamodels emulate the response of a costly function with a statistical model that is almost free to call. In our context, the strategy is to first fit an inexpensive metamodel of $g$ on a design $\left(\boldsymbol{X}_n, y_n\right)$, which is then used to estimate our statistic on $\boldsymbol{X}_m$. Note that the estimation error becomes negligible compared to the metamodel error, and fixing the sample $\boldsymbol{X}_m$ allows one to analyze only the error generated by the metamodel.

When $g$ is approximated by a Gaussian process $\xi$ (see the sketch after the formulas below):

$$ a_m(\xi) = \frac{1}{m} \sum_{i=1}^m \xi(\boldsymbol{x}^{(i)}) $$

$$ \hat{a}_n = \mathbb{E}\left[ a_m(\xi) \mid \mathcal{F}_n\right] $$
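
A minimal sketch of this plug-in estimator with an openturns Kriging metamodel, reusing `g`, `distribution`, and `X_m` from the sketches above; the Matérn covariance and constant trend are illustrative defaults, not necessarily the benchmark's settings:

```python
import openturns as ot

# Small n-sized design X_n and its costly evaluations y_n.
n = 32
X_n = ot.LowDiscrepancyExperiment(ot.SobolSequence(d), distribution, n).generate()
y_n = g(X_n)

# Fit a Gaussian process (Kriging) metamodel of g on (X_n, y_n).
basis = ot.ConstantBasisFactory(d).build()
covariance = ot.MaternModel([1.0] * d, 2.5)  # illustrative kernel choice
algo = ot.KrigingAlgorithm(X_n, y_n, covariance, basis)
algo.run()
metamodel = algo.getResult().getMetaModel()

# Plug-in estimate hat{a}_n: posterior mean of xi averaged over the fixed X_m.
a_hat_n = metamodel(X_m).computeMean()[0]
print(a_hat_n)
```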

Note that this can be written for an estimator of another statistic, such as a quantile or an exceedance probability. In the case of the central tendency, $\hat{a}_n = \mathbb{E}\left[\mathbb{E}_{\boldsymbol{X}}[\xi] \mid \mathcal{F}_n\right] = \mathbb{E}_{\boldsymbol{X}}\left[\mathbb{E}[\xi \mid \mathcal{F}_n]\right]$ by Fubini's theorem.
