# Tuning GCM Parameterizations

*The objective of this notebook is to show how GCM closures can be tuned in practice. We assume a specific functional form for the closure and estimate its parameters with a standard optimization procedure borrowed from data assimilation (DA).*

**Resources**: We have used material from Emmanuel Cosme's nice GitHub [repository](https://github.com/ecosme38/Data-Assimilation-Notebooks).

## The GCM Parameterization Problem

Our starting point is {doc}`gcm-parameterization-problem`.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as opt

# L96 provides the "real world"; L96_eq1_xdot returns the resolved part of the X tendency
# (the right-hand side of the first L96 equation, without the coupling to Y)
from L96_model import L96, RK2, RK4, EulerFwd, L96_eq1_xdot

In [None]:
# Setting the seed gives us reproducible results
np.random.seed(13)

# Create a "real world" with K=8 and J=32
W = L96(8, 32, F=18)

# Run "real world" for 3 days to forget initial conditions
# (store=True saves the final state as an initial condition for the next run)
W.run(0.05, 3.0, store=True);

From here on, we can use `W.X` as perfect initial conditions for a model and sample the real world using `W.run(dt, T)`.

In [None]:
class GCM:
    """Single-scale L96 model in which the coupling to Y is replaced by a closure."""

    def __init__(self, F, parameterization, time_stepping=EulerFwd):
        self.F = F  # forcing
        self.parameterization = parameterization  # closure, called as P(param, X)
        self.time_stepping = time_stepping  # called as time_stepping(rhs, dt, X, param)

    def rhs(self, X, param):
        # Resolved L96 tendency minus the parameterized subgrid term
        return L96_eq1_xdot(X, self.F) - self.parameterization(param, X)

    def __call__(self, X0, dt, nt, param=(0,)):
        # X0 - initial conditions, dt - time increment, nt - number of forward steps to take
        # param - parameters of our closure (default (0,) gives P(X) = 0, i.e. no closure)
        time = dt * np.arange(nt + 1)
        hist = np.full((nt + 1, len(X0)), np.nan)
        X = X0.copy()
        hist[0] = X

        for n in range(nt):
            X = self.time_stepping(self.rhs, dt, X, param)
            hist[n + 1] = X
        return hist, time
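
For readers without `L96_model` at hand, here is a minimal, self-contained sketch of what `GCM.rhs` and a forward-Euler `time_stepping` compute. It assumes that `L96_eq1_xdot` implements the resolved single-scale Lorenz-96 tendency $-X_{k-1}(X_{k-2}-X_{k+1}) - X_k + F$ with cyclic indices, and that `EulerFwd` advances one step as $X + \Delta t\,\mathrm{rhs}(X, \text{param})$. The names `l96_tendency`, `euler_forward`, and `closed_rhs` are hypothetical stand-ins, not the `L96_model` API.

```python
import numpy as np


def l96_tendency(X, F):
    """Hypothetical stand-in for L96_eq1_xdot (resolved tendency only).

    dX_k/dt = -X_{k-1} (X_{k-2} - X_{k+1}) - X_k + F, with cyclic indices:
    np.roll(X, 1)[k] == X[k-1], np.roll(X, 2)[k] == X[k-2], np.roll(X, -1)[k] == X[k+1].
    """
    return -np.roll(X, 1) * (np.roll(X, 2) - np.roll(X, -1)) - X + F


def euler_forward(rhs, dt, X, *params):
    """Hypothetical stand-in for EulerFwd: one forward-Euler step."""
    return X + dt * rhs(X, *params)


def closed_rhs(X, param, F=18.0):
    """GCM tendency: resolved dynamics minus the polynomial closure P(X)."""
    return l96_tendency(X, F) - np.polyval(param, X)


# A uniform state X_k = F is a fixed point of the resolved dynamics, so with
# the closure switched off (param=[0]) one Euler step leaves it unchanged.
X0 = np.full(8, 18.0)
X1 = euler_forward(closed_rhs, 0.01, X0, [0])
```

With a nonzero closure, for example `param=[1.0, 0.0]` (that is, $P(X) = X$), the same step damps the uniform state to $18\,(1 - 0.01) = 17.82$.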

As a first step, we introduce a polynomial parameterization into the GCM.

In [None]:
def naive_parameterization(param, X):
    # Polynomial closure: with param = [p1, p2] this is p1 * X + p2
    return np.polyval(param, X)
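
`np.polyval` expects coefficients ordered from the highest degree down, so `param=[p1, p2]` gives the linear closure $P(X) = p_1 X + p_2$, applied to each $X_k$ independently. A quick check with arbitrary illustrative values:

```python
import numpy as np

X = np.array([-1.0, 0.0, 2.0])
p1, p2 = 0.5, 1.0  # illustrative values only

# Coefficients go from highest to lowest degree: [p1, p2] -> p1 * X + p2
P = np.polyval([p1, p2], X)  # -> [0.5, 1.0, 2.0]

# A third coefficient adds a quadratic term: [a, p1, p2] -> a * X**2 + p1 * X + p2
P_quad = np.polyval([1.0, p1, p2], X)  # -> [1.5, 1.0, 6.0]
```

The same closure can therefore be made more expressive simply by passing a longer `param` list; the optimization below does not need to change.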

In [None]:
F, dt, T = 18, 0.01, 5.0
gcm = GCM(F, naive_parameterization)
# Initial guess for the closure parameters [p1, p2] (reused as the prior below)
X, t = gcm(W.X, dt, int(T / dt), param=[0.85439536, 1.75218026])

### Comparing model and true trajectories.

In [None]:
# This samples the real world with the same time interval as "dt" used by the model
X_true, _, _ = W.run(dt, T)

In [None]:
plt.figure(dpi=150)
plt.plot(t, X_true[:, 4], label="Truth")
plt.plot(t, X[:, 4], label="Model")
plt.xlabel("$t$")
plt.ylabel("$X_4(t)$")
plt.legend(fontsize=7)

We also reuse the distance metrics defined in {doc}`gcm-parameterization-problem`:

In [None]:
# - pointwise distance: mean absolute difference per variable over the window t < L
#   (relies on the global time array `t` from the most recent model run)
def pointwise(X1, X2, L=1.0):
    D = (X1 - X2)[t < L]
    return np.abs(D).mean(axis=0)

In [None]:
# - mean state metric: absolute difference of the time means over the window t < L
def dist_mean(X1, X2, L=1.0):
    _X1 = X1[t < L]
    _X2 = X2[t < L]
    return np.abs(_X1.mean(axis=0) - _X2.mean(axis=0))

In [None]:
# - initial tendency metric: mean absolute difference of the first-step increments
def norm_initial_tendency(X1, X2):
    T1 = X1[1, :] - X1[0, :]
    T2 = X2[1, :] - X2[0, :]
    return np.abs(T1 - T2).mean(axis=0)
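
`pointwise` and `dist_mean` select the window $t < L$ using the global array `t` left by the most recent model run. The sketch below, under the assumption that passing `t` explicitly is acceptable, rewrites them with the time array as an argument (the `_explicit` names are hypothetical) and checks them on synthetic trajectories. It illustrates that the metrics measure different things: a zero-mean oscillation is far from a zero trajectory pointwise, yet identical to it in the mean.

```python
import numpy as np


def pointwise_explicit(X1, X2, t, L=1.0):
    """Mean absolute difference per variable over the window t < L."""
    mask = t < L
    return np.abs(X1[mask] - X2[mask]).mean(axis=0)


def dist_mean_explicit(X1, X2, t, L=1.0):
    """Absolute difference of the per-variable time means over t < L."""
    mask = t < L
    return np.abs(X1[mask].mean(axis=0) - X2[mask].mean(axis=0))


def initial_tendency_explicit(X1, X2):
    """Mean absolute difference of the first-step increments X[1] - X[0]."""
    return np.abs((X1[1] - X1[0]) - (X2[1] - X2[0])).mean()


# Synthetic trajectories: 11 time levels on [0, 1], one variable each
t = np.linspace(0.0, 1.0, 11)
X_zero = np.zeros((11, 1))
X_osc = np.where(np.arange(11) % 2 == 0, 1.0, -1.0)[:, None]  # +1, -1, +1, ...

# Over t < 0.35 (four time levels: +1, -1, +1, -1)
d_pointwise = pointwise_explicit(X_osc, X_zero, t, L=0.35)  # -> [1.0]
d_mean = dist_mean_explicit(X_osc, X_zero, t, L=0.35)  # -> [0.0]
d_tend = initial_tendency_explicit(X_osc, X_zero)  # first increment is -2 -> 2.0
```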

## Variational estimation of optimal parameters for a predefined closure

Here we estimate the parameters of `naive_parameterization` with a variational approach.

In [None]:
# Fix the functional form of the closure; only its parameters will be tuned
gcm = GCM(F, naive_parameterization)

### Estimating parameters based on one initial condition and one time step

#### Cost function 

What we do here is very close to classical variational data assimilation, where the state or the parameters of a model are estimated by minimizing a cost function $J$. It is also very close to how parameterizations encoded as neural networks are trained.

We introduce a cost function $J(p)$ that depends on the parameters of the closure:

$$J(p) = ||X_p - X_{true}||_{d}$$

where $p=[p_1,p_2]$, $X_p$ is the GCM solution computed with parameters $p$, and $||\cdot||_{d}$ is one of the distances above.


In [None]:
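def cost_function(param):
    # A single model step (T = dt) from the true initial state W.X
    dt, T = 0.01, 0.01
    X_gcm, _ = gcm(W.X, dt, int(T / dt), param=param)
    # Compare first-step increments with the truth sampled at the same dt
    return norm_initial_tendency(X_true, X_gcm)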
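
With the default forward-Euler `time_stepping` and a single step, this cost can be written out explicitly. Assume the sampled truth starts from the same state $X^0$ (`W.X`) as the model, write $f_k$ for the resolved tendency returned by `L96_eq1_xdot`, and $\Delta X_{true,k} = X_{true,k}(\Delta t) - X^0_k$ for the true one-step increment. Since the model increment is $\Delta t\,[f_k(X^0) - (p_1 X^0_k + p_2)]$, `norm_initial_tendency` gives

$$
J(p) = \frac{1}{K}\sum_{k=1}^{K}\left|\Delta t\left[f_k(X^0) - \left(p_1 X^0_k + p_2\right)\right] - \Delta X_{true,k}\right|
= \frac{\Delta t}{K}\sum_{k=1}^{K}\left|\left(p_1 X^0_k + p_2\right) - \left(f_k(X^0) - \frac{\Delta X_{true,k}}{\Delta t}\right)\right|
$$

In other words, the one-step problem is a least-absolute-deviation fit of the line $p_1 X + p_2$ to the residual tendency $f_k(X^0) - \Delta X_{true,k}/\Delta t$ at only $K = 8$ points, so it is informed by very little data.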

#### Minimization 

Since the problem is small (only two parameters), we can use an efficient derivative-free optimization method, here Powell's method.
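
To see what `opt.minimize(..., method="Powell")` does without running the GCM, here is a minimal sketch on a synthetic two-parameter problem whose answer is known in advance. The data and `toy_cost` are illustrative assumptions. Powell's method only evaluates the cost function (no gradients are needed), performing successive line searches along a set of search directions that it updates as it goes.

```python
import numpy as np
import scipy.optimize as opt

# Synthetic data from a known linear relation y = 0.5 * x + 2 (illustrative only)
x = np.linspace(-5.0, 15.0, 50)
y = 0.5 * x + 2.0


def toy_cost(param):
    """Mean squared misfit of the linear model p1 * x + p2."""
    return np.mean((np.polyval(param, x) - y) ** 2)


# Start from p = [0, 0]; only toy_cost evaluations are used
res = opt.minimize(toy_cost, x0=np.array([0.0, 0.0]), method="Powell")
# res.x should be close to [0.5, 2.0]; res.nfev counts the cost evaluations
```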


In [None]:
prior = np.array([0.85439536, 1.75218026])  # prior: the initial guess used above
res = opt.minimize(cost_function, prior, method="Powell")
opt_param = res["x"]

In [None]:
# Display the optimized parameters
opt_param

##### Let's test our closure.

In [None]:
F, dt, T = 18, 0.01, 100.0
gcm = GCM(F, naive_parameterization)
X_optimized, t = gcm(W.X, dt, int(T / dt), param=opt_param)
X_prior, t = gcm(W.X, dt, int(T / dt), param=prior)

# Sample the true state over the same period
X_true, _, _ = W.run(dt, T)

##### Results

In [None]:
plt.figure(dpi=150)
plt.plot(t[:500], X_true[:500, 4], label="Truth")
plt.plot(t[:500], X_prior[:500, 4], label="Initial GCM")
plt.plot(t[:500], X_optimized[:500, 4], label="Optimized GCM")
plt.xlabel("$t$")
plt.ylabel("$X_4(t)$")
plt.legend(fontsize=7);

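The results are better, but not great. This is an instance of the *a priori* versus *a posteriori* skill problem known from closures for large-eddy simulation (LES): a closure tuned to match the tendency over a single time step (*a priori*) does not necessarily produce accurate trajectories once it runs freely inside the model (*a posteriori*).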

### Estimating parameters which optimize longer trajectories

In [None]:
F, dt, T = 18, 0.01, 5.0
gcm = GCM(F, naive_parameterization)
X_true, _, _ = W.run(dt, T)

In [None]:
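# Run once with the closure switched off (p = [0, 0]); this also defines the
# global time array `t` that the distance metrics use to select t < L
X_gcm, t = gcm(W.X, dt, int(T / dt), param=[0, 0])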

In [None]:
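def cost_function(param):
    # Run the closed model over the whole window (T = 5) from W.X
    dt, T = 0.01, 5
    X_gcm, _ = gcm(W.X, dt, int(T / dt), param=param)
    # Sum over variables of the pointwise (mean absolute) error for t < 5
    return pointwise(X_true, X_gcm, L=5.0).sum()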
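
Written out, with $X_{p,k}(t_n)$ the $k$-th variable of the GCM solution at time level $t_n$ and $N$ the number of time levels with $t_n < L$, this trajectory cost is

$$
J(p) = \sum_{k=1}^{K}\frac{1}{N}\sum_{n:\,t_n < L}\left|X_{p,k}(t_n) - X_{true,k}(t_n)\right|, \qquad L = 5
$$

Here $p$ enters through the whole nonlinear trajectory rather than through a single increment, so the optimization directly targets *a posteriori* skill over the fitting window.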

In [None]:
prior = np.array([0.85439536, 1.75218026])  # prior: the same initial guess as before
res = opt.minimize(cost_function, prior, method="Powell")
opt_param = res["x"]

##### Let's test our closure.

In [None]:
F, dt, T = 18, 0.01, 100.0
gcm = GCM(F, naive_parameterization)
X_optimized, t = gcm(W.X, dt, int(T / dt), param=opt_param)
X_prior, t = gcm(W.X, dt, int(T / dt), param=prior)

X_true, _, _ = W.run(dt, T)

##### Results

In [None]:
plt.figure(dpi=150)
plt.plot(t[:500], X_true[:500, 4], label="Truth")
plt.plot(t[:500], X_prior[:500, 4], label="Initial GCM")
plt.plot(t[:500], X_optimized[:500, 4], label="Optimized GCM")
plt.xlabel("$t$")
plt.ylabel("$X_4(t)$")
plt.legend(fontsize=7);

Our closure now produces better results, but since it was tuned on a single trajectory, it is not clear how well it generalizes to unseen initial conditions.

## Discussion and possible next steps

 - Estimate the parameters over a single time step, but using an ensemble of initial conditions.
 - Encode the closure as a neural network.
 - Develop differentiable GCMs, so that gradients of the cost with respect to the closure parameters are available.
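
The first of these steps can be sketched as a cost that averages the one-step increment error over an ensemble of $M$ initial conditions instead of a single one. This is a minimal sketch under stated assumptions: `ensemble_one_step_cost` and `toy_rhs` are hypothetical. In the notebook setting, `rhs` would play the role of `GCM.rhs` with forward-Euler stepping, and the true increments would come from sampling the real world one step ahead of each initial state.

```python
import numpy as np


def ensemble_one_step_cost(param, rhs, X0_ens, dX_true_ens, dt):
    """Mean absolute one-step increment error, averaged over an ensemble.

    param:       closure parameters passed through to rhs
    rhs:         callable rhs(X, param) returning the closed-model tendency
    X0_ens:      array (M, K) of initial states
    dX_true_ens: array (M, K) of true increments X(dt) - X(0), one row per state
    dt:          time step
    """
    # Forward-Euler increment of the closed model from each initial state
    dX_model = np.array([dt * rhs(X0, param) for X0 in X0_ens])
    return np.abs(dX_model - dX_true_ens).mean()


def toy_rhs(X, param):
    """Hypothetical toy model whose tendency is just minus the polynomial closure."""
    return -np.polyval(param, X)


# Synthetic "truth" generated with p = [0.5, 1.0] from 20 random initial states
rng = np.random.default_rng(0)
X0_ens = rng.normal(size=(20, 8))
dt = 0.01
dX_true_ens = np.array([dt * toy_rhs(X0, [0.5, 1.0]) for X0 in X0_ens])

cost_at_truth = ensemble_one_step_cost([0.5, 1.0], toy_rhs, X0_ens, dX_true_ens, dt)
cost_no_closure = ensemble_one_step_cost([0.0, 0.0], toy_rhs, X0_ens, dX_true_ens, dt)
```

Because the signature is `cost(param, *args)`, such a function could be handed to `opt.minimize(ensemble_one_step_cost, prior, args=(rhs, X0_ens, dX_true_ens, dt), method="Powell")` in the same way as `cost_function` above.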