# Calibration of the flooding model with linear least squares

The goal of this example is to calibrate the flooding model simulator using linear least squares.



In [1]:
import openturns as ot
import numpy as np

## Read the observations

We begin by reading the observations from the data file. There are 100 observations of the pair (Q,H).

In [2]:
observedSample = ot.Sample.ImportFromCSVFile("calibration-flooding-observations.csv")
nbobs = observedSample.getSize()
Qobs = observedSample[:,0]
Hobs = observedSample[:,1]
nbobs

100

In [3]:
Hobs.setDescription(["Height (m)"])

## Define the model

We define the model, which has four inputs (Q, Ks, Zv, Zm) and one output H.

In [4]:
def functionFloodingModel(X):
    Q, K_s, Z_v, Z_m = X
    L = 5.0e3
    B = 300.0
    alpha = (Z_m - Z_v)/L
    H = (Q/(K_s*B*np.sqrt(alpha)))**(3.0/5.0)
    return [H]

In [5]:
modelPyFunc = ot.PythonFunction(4, 1, functionFloodingModel)
modelPyFunc.setDescription(["Q", "Ks", "Zv", "Zm","H"])
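
As a quick sanity check, the formula can be evaluated by hand outside the library. The sketch below reuses the same expression for H with plain NumPy; the flow rate `Q_example` is a hypothetical value chosen for illustration, and `flooding_height` is an illustrative name that is not part of the original notebook.

```python
import numpy as np

def flooding_height(Q, K_s, Z_v, Z_m, L=5.0e3, B=300.0):
    """Water height H (m), same formula as functionFloodingModel."""
    alpha = (Z_m - Z_v) / L
    return (Q / (K_s * B * np.sqrt(alpha))) ** (3.0 / 5.0)

# Hypothetical flow rate; K_s, Z_v, Z_m are the reference values of this example.
Q_example = 1000.0
H_example = flooding_height(Q_example, 20.0, 49.0, 51.0)
print(H_example)
```

With these inputs, `alpha` is 4e-4, so the denominator is 20 × 300 × 0.02 = 120 and H is (1000/120) raised to the power 3/5, a height of a few meters.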

We define the reference value of the $\theta$ parameter. In the Bayesian framework, this is called the mean of the *prior* Gaussian distribution. In the data assimilation framework, this is called the *background*.

In [6]:
KsInitial = 20.
ZvInitial = 49.
ZmInitial = 51.
thetaBackground = ot.Point([KsInitial,ZvInitial,ZmInitial])

The following statement creates the calibrated function from the model. The calibrated parameters Ks, Zv, Zm are at indices 1, 2, 3 in the input arguments of the model.

In [7]:
calibratedIndices = [1,2,3]
mycf = ot.ParametricFunction(modelPyFunc, calibratedIndices, thetaBackground)
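
The idea behind a parametric function can be sketched in plain Python: some inputs are frozen at given values and the remaining ones stay free. The helper `freeze_inputs` below is hypothetical and only mimics the concept; it is not how OpenTURNS implements `ParametricFunction`, and the flow rate 1000 is an illustrative value.

```python
def flooding_model(X):
    Q, K_s, Z_v, Z_m = X
    L = 5.0e3
    B = 300.0
    alpha = (Z_m - Z_v) / L
    return [(Q / (K_s * B * alpha ** 0.5)) ** (3.0 / 5.0)]

def freeze_inputs(model, input_dim, frozen_indices, frozen_values):
    """Hypothetical helper: return g(x) evaluating `model` with some inputs fixed."""
    free_indices = [i for i in range(input_dim) if i not in frozen_indices]

    def g(x):
        full = [0.0] * input_dim
        # Fill the frozen slots (the parameters theta).
        for i, v in zip(frozen_indices, frozen_values):
            full[i] = v
        # Fill the free slots (here only Q).
        for i, v in zip(free_indices, x):
            full[i] = v
        return model(full)

    return g

# Freeze Ks, Zv, Zm (indices 1, 2, 3) at the reference values; Q stays free.
h = freeze_inputs(flooding_model, 4, [1, 2, 3], [20.0, 49.0, 51.0])
print(h([1000.0]))
```

The resulting function `h` takes only Q as input, which is the form the calibration algorithm expects: observed inputs on one side, parameters to calibrate on the other.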

In [8]:
import CalibrationGraphics as cg

## Calibration

Linear least squares calibration needs only the reference point of the parameters $\theta$; no covariance matrix of the parameters is defined here.

The `LinearLeastSquaresCalibration` class performs the linear least squares calibration by linearizing the model in the neighbourhood of the reference point.

In [9]:
algo = ot.LinearLeastSquaresCalibration(mycf, Qobs, Hobs, thetaBackground,"SVD")

The `run` method computes the solution of the problem.

In [10]:
algo.run()

In [11]:
calibrationResult = algo.getResult()
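
The linearization step can be illustrated without the library. The sketch below is not the OpenTURNS implementation: it uses synthetic, noise-free observations generated at a hypothetical value of Ks, hand-derived analytic derivatives of H, and `numpy.linalg.lstsq`, which returns the minimum-norm solution. That solution need not coincide with what the library returns when the Jacobian is rank-deficient.

```python
import numpy as np

L_RIVER, B_RIVER = 5.0e3, 300.0

def height(Q, theta):
    K_s, Z_v, Z_m = theta
    alpha = (Z_m - Z_v) / L_RIVER
    return (Q / (K_s * B_RIVER * np.sqrt(alpha))) ** 0.6

def jacobian(Q, theta):
    """Analytic derivatives of H with respect to (K_s, Z_v, Z_m)."""
    K_s, Z_v, Z_m = theta
    H = height(Q, theta)
    dZ = Z_m - Z_v
    return np.column_stack([-0.6 * H / K_s, 0.3 * H / dZ, -0.3 * H / dZ])

theta_ref = np.array([20.0, 49.0, 51.0])
Q_syn = np.linspace(500.0, 3000.0, 20)               # hypothetical flow rates
H_syn = height(Q_syn, np.array([25.0, 49.0, 51.0]))  # synthetic observations

# Linearize around theta_ref: H_syn - h(theta_ref) ~ J (theta - theta_ref).
J = jacobian(Q_syn, theta_ref)
residual = H_syn - height(Q_syn, theta_ref)
delta, _, rank, _ = np.linalg.lstsq(J, residual, rcond=None)
theta_lin = theta_ref + delta
print(rank, theta_lin)
```

The numerical rank reported by `lstsq` is a first hint of whether all three parameters can be determined from the observations.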

The `getParameterMAP` method returns the maximum of the posterior distribution of $\theta$.

In [12]:
thetaStar = calibrationResult.getParameterMAP()
thetaStar

In this case, the optimum lies far from the reference value of $\theta$: the estimated values seem too large in magnitude, and the optimal value of $K_s$ is nonpositive. This points to an identification problem: the Jacobian matrix is rank-deficient.

## Diagnostic of the identification issue

In this section, we show how to diagnose the identification problem.

The `getParameterPosterior` method returns the posterior Gaussian distribution of $\theta$.

In [13]:
distributionPosterior = calibrationResult.getParameterPosterior()
distributionPosterior

We see that the diagonal entries of the covariance matrix are large.

Let us compute a 95% confidence interval for the solution $\theta^\star$.

In [14]:
distributionPosterior.computeBilateralConfidenceIntervalWithMarginalProbability(0.95)[0]

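The confidence interval is *very* large.

To understand why, we compute the Jacobian matrix of the model with respect to $\theta$ at the reference point and examine its singular values.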

In [15]:
mycf.setParameter(thetaBackground)

In [16]:
thetaDim = thetaBackground.getDimension()

In [17]:
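# Row i is the gradient of H with respect to (Ks, Zv, Zm) at the i-th observed flow rate.
jacobianMatrix = ot.Matrix(nbobs,thetaDim)
for i in range(nbobs):
    jacobianMatrix[i,:] = mycf.parameterGradient(Qobs[i]).transpose()
jacobianMatrix[0:5,:]  # display the first five rows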

In [18]:
jacobianMatrix.computeSingularValues()

We can see that the singular values associated with $Z_v$ and $Z_m$ are very close to zero compared to the singular value associated with $K_s$.

This explains why the Jacobian matrix is close to being rank-deficient.

In other words, the value of $K_s$ can be calibrated, but not $Z_v$ or $Z_m$.
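
The rank deficiency can also be checked with hand-derived derivatives. Differentiating the formula for H gives $\partial H/\partial K_s = -0.6\,H/K_s$, $\partial H/\partial Z_v = 0.3\,H/(Z_m - Z_v)$ and $\partial H/\partial Z_m = -0.3\,H/(Z_m - Z_v)$, so every column of the Jacobian is a constant multiple of H. The NumPy sketch below builds this matrix for hypothetical flow rates (not the observed data) and computes its singular values.

```python
import numpy as np

K_s, Z_v, Z_m = 20.0, 49.0, 51.0
L, B = 5.0e3, 300.0
Q = np.linspace(500.0, 3000.0, 50)  # hypothetical flow rates

alpha = (Z_m - Z_v) / L
H = (Q / (K_s * B * np.sqrt(alpha))) ** 0.6

J = np.column_stack([
    -0.6 * H / K_s,          # dH/dK_s
    0.3 * H / (Z_m - Z_v),   # dH/dZ_v
    -0.3 * H / (Z_m - Z_v),  # dH/dZ_m
])

singular_values = np.linalg.svd(J, compute_uv=False)
# Ratios to the largest singular value: values near zero signal rank deficiency.
print(singular_values / singular_values[0])
```

Since the three columns are proportional to each other, only one direction in the parameter space is constrained by the observations, whatever the flow rates are.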

## Conclusion

There are several methods to solve the problem.
* Given that the problem is not identifiable, we can use a regularization method. Two methods are provided in the library: the Gaussian linear least squares `BLUE` and the Gaussian nonlinear least squares `ThreeDVar`.
* We can replace the problem with one that is identifiable. In the flooding model, replacing $Z_m-Z_v$ with a single parameter $\Delta Z$ resolves the issue.
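
The motivation for the second option can be seen directly from the formula: H depends on $Z_v$ and $Z_m$ only through their difference. The sketch below, with hypothetical flow rates and an arbitrary shift of 10 m, shows that moving both values by the same amount leaves the output unchanged, so the observations cannot separate the two parameters.

```python
import numpy as np

def height(Q, K_s, Z_v, Z_m, L=5.0e3, B=300.0):
    alpha = (Z_m - Z_v) / L
    return (Q / (K_s * B * np.sqrt(alpha))) ** 0.6

Q = np.array([500.0, 1000.0, 2000.0])  # hypothetical flow rates

H_ref = height(Q, 20.0, 49.0, 51.0)
H_shifted = height(Q, 20.0, 59.0, 61.0)  # both Z_v and Z_m shifted by 10 m

# Largest difference between the two sets of predicted heights.
print(np.max(np.abs(H_ref - H_shifted)))
```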