# Sobol Sensitivity Analysis

In this notebook we apply Sobol sensitivity analysis to a building design problem.
We determine the sensitivity of the objective (electricity use) to each of the design parameters.

In [None]:
import time

import numpy as np
import pandas as pd
from besos import eppy_funcs as ef
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic
from sklearn.model_selection import GridSearchCV

import sampling
from SALib.analyze import sobol as sanalysis
from SALib.sample import saltelli as ssampling
from evaluator import EvaluatorEP, EvaluatorGeneric
from parameter_sets import parameter_set
from problem import EPProblem

## Build an EnergyPlus Evaluator

In [None]:
parameters = parameter_set(7)  # use a pre-defined parameter set
problem = EPProblem(parameters, ["Electricity:Facility"])
building = ef.get_building()  # use the example building
evaluator = EvaluatorEP(problem, building)
inputs = sampling.dist_sampler(
    sampling.lhs, problem, 50
)  # get 50 samples of the input space

## Fit the Surrogate Model

Evaluate the samples to get training data.

In [None]:
outputs = evaluator.df_apply(inputs, processes=1)

Set up the surrogate and fit it.

In [None]:
hyperparameters = {
    "kernel": [
        None,
        1.0 * RBF(length_scale=1.0, length_scale_bounds=(1e-1, 10.0)),
        1.0 * RationalQuadratic(length_scale=1.0, alpha=0.5),
        # ConstantKernel(0.1, (0.01, 10.0))*(DotProduct(sigma_0=1.0, sigma_0_bounds=(0.1, 10.0))**2),
        1.0 * Matern(length_scale=1.0, length_scale_bounds=(1e-1, 10.0)),
    ]
}
folds = 3
gp = GaussianProcessRegressor(normalize_y=True)
# The `iid` argument was removed from GridSearchCV in scikit-learn 0.24, so it is omitted here.
clf = GridSearchCV(gp, hyperparameters, cv=folds)

clf.fit(inputs, outputs)

print(f"The best performing model R^2 score on the validation set: {clf.best_score_}")
print(f"The best performing model's parameters: {clf.best_params_}")
# print(f"The best performing model R^2 score on a separate test set: {clf.best_estimator_.score(test_in, test_out)}")

Wrap the fitted surrogate in an `Evaluator` so it can be used in place of the EnergyPlus simulation.

In [None]:
def evaluation_func(ind):
    return ((clf.predict([ind])[0][0],), ())


GP_SM = EvaluatorGeneric(evaluation_func, problem)

## Sobol Analysis

We can now derive the Sobol indices of the given design parameters.
This is a global variance-based sensitivity analysis method.
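The resulting indices tell us how much of the output variance is explained by each of the inputs.
Sobol analysis can be very sample-intensive, often needing on the order of 1000 samples per input; with Saltelli sampling and second-order indices, N base samples over D inputs require N(2D + 2) model evaluations.
Running that many EnergyPlus simulations would be very time-consuming, so in this example we evaluate a surrogate model instead. [[1]] [[2]]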

[1]: https://www.sciencedirect.com/science/article/pii/S1364032112007101
[2]: http://statweb.stanford.edu/~owen/pubtalks/siamUQ.pdf
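
To make the estimators behind these indices concrete, here is a minimal NumPy sketch that is independent of SALib and of the building model. It assumes a hypothetical additive `toy_model` with independent inputs that are uniform on [0, 1], and it uses the Saltelli (2010) first-order estimator and the Jansen total-order estimator; the function names and the base sample size are illustrative choices, and SALib's implementation may differ in its details.

```python
import numpy as np


def sobol_indices(model, n_inputs, n_base=2**16, seed=0):
    """Estimate first-order and total-order Sobol indices by Monte Carlo.

    Assumes independent inputs, each uniform on [0, 1].
    """
    rng = np.random.default_rng(seed)
    a = rng.random((n_base, n_inputs))  # first independent sample matrix
    b = rng.random((n_base, n_inputs))  # second independent sample matrix
    f_a = model(a)
    f_b = model(b)
    variance = np.var(np.concatenate([f_a, f_b]))

    first = np.empty(n_inputs)
    total = np.empty(n_inputs)
    for i in range(n_inputs):
        # Matrix A with column i taken from B.
        ab_i = a.copy()
        ab_i[:, i] = b[:, i]
        f_ab = model(ab_i)
        # First-order index: variance explained by input i alone.
        first[i] = np.mean(f_b * (f_ab - f_a)) / variance
        # Total-order index: input i including all of its interactions.
        total[i] = 0.5 * np.mean((f_a - f_ab) ** 2) / variance
    return first, total


def toy_model(x):
    """Hypothetical additive model: y = 1*x1 + 2*x2 + 3*x3."""
    return x @ np.array([1.0, 2.0, 3.0])


first_order, total_order = sobol_indices(toy_model, n_inputs=3)
print("first-order:", first_order)
print("total-order:", total_order)
```

For an additive model like this one there are no interactions, so the first-order and total-order indices of each input should both approach the squared coefficient divided by the sum of squared coefficients (1/14, 4/14 and 9/14 here).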

In [None]:
names = [parameters[i].name for i in range(len(parameters))]
bounds = [
    [parameters[i].value_descriptor.min, parameters[i].value_descriptor.max]
    for i in range(len(parameters))
]

problem = {"num_vars": len(parameters), "names": names, "bounds": bounds}

X = np.round(ssampling.sample(problem, N=10000, calc_second_order=True), decimals=3)
inputs = pd.DataFrame(data=X, columns=names)

print(f"This Sobol analysis will require {len(inputs)} design evaluations.")
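
The count printed by the cell above follows from the Saltelli scheme: with second-order indices enabled, `N` base samples over `D` inputs yield `N * (2D + 2)` rows, and `N * (D + 2)` rows otherwise. The helper below is a small sketch of that arithmetic; `saltelli_sample_count` is a hypothetical name, not part of SALib, and the input counts in the loop are arbitrary.

```python
def saltelli_sample_count(n_base, n_vars, calc_second_order=True):
    """Number of model evaluations needed for n_base Saltelli base samples."""
    if calc_second_order:
        return n_base * (2 * n_vars + 2)
    return n_base * (n_vars + 2)


# How the cost grows with the number of inputs for N = 10000 base samples.
for n_vars in (2, 5, 10):
    print(n_vars, saltelli_sample_count(10_000, n_vars))
```

Because the cost grows linearly in both `N` and `D`, a surrogate that evaluates in milliseconds keeps even large analyses tractable.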

In [None]:
outputs = GP_SM.df_apply(inputs)
Y = outputs.values.ravel()

In [None]:
now = time.time()
Si = sanalysis.analyze(
    problem,
    Y,
    conf_level=0.95,
    print_to_console=True,
    parallel=True,
    n_processors=4,
)
print(time.time() - now)
# pd.DataFrame(data=Si["ST"], index=problem["names"]).sort_values(by=0)
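
One way to read the result is to rank the inputs by their total-order index and compare it with the first-order index. The sketch below assumes a dictionary with `"S1"` and `"ST"` entries, as returned by SALib's Sobol analysis; `rank_by_total_order` is a hypothetical helper, and the names and numbers are placeholders for illustration, not results from this notebook.

```python
def rank_by_total_order(names, si):
    """Return (name, S1, ST) tuples sorted by total-order index, largest first."""
    rows = zip(names, si["S1"], si["ST"])
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Placeholder values for illustration only; not results from this notebook.
example_names = ["param_a", "param_b", "param_c"]
example_si = {"S1": [0.10, 0.55, 0.05], "ST": [0.20, 0.60, 0.08]}

ranked = rank_by_total_order(example_names, example_si)
for name, s1, st in ranked:
    # ST - S1 estimates the variance share due to interactions involving this input.
    print(f"{name}: S1={s1:.2f}, ST={st:.2f}, interactions={st - s1:.2f}")
```

With the real analysis, the same call would be `rank_by_total_order(problem["names"], Si)`.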