```
This notebook tests and demonstrates dynamic time stepping.

Copyright (C) 2019  SINTEF Digital

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.
```

# Variable $\Delta t$ 

The maximum time step allowed for CDKLM is given by the CFL condition
$$\Delta t \leq \frac14 \min \left\{ 
            \frac{\Delta x}{\max_{\Omega} \left| u \pm \sqrt{gh} \right|},
            \frac{\Delta y}{\max_{\Omega} \left| v \pm \sqrt{gh} \right|}
\right\}.$$
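To check the magnitudes involved, the sketch below evaluates this bound with NumPy. `cfl_max_dt` is a hypothetical helper written for illustration, not part of SWESimulators. It assumes a flat water depth $h = 60$ m, $g = 9.81$ m/s$^2$ and $\Delta x = \Delta y = 200$ m, which mirror the setup used later in this notebook. Since $\sqrt{gh} \geq 0$, the maximum of $|u \pm \sqrt{gh}|$ is simply $|u| + \sqrt{gh}$.

```python
import numpy as np

def cfl_max_dt(h, u, v, dx, dy, g=9.81, factor=0.25):
    """Largest dt allowed by the CFL condition (hypothetical helper).

    h, u, v: 2D arrays of water depth and velocities at cell centres.
    factor:  the 1/4 in front of the min in the CFL condition.
    """
    c = np.sqrt(g * h)  # gravity-wave speed sqrt(gh)
    # max over the domain of |u +- c| equals max(|u| + c) because c >= 0
    max_speed_x = np.max(np.abs(u) + c)
    max_speed_y = np.max(np.abs(v) + c)
    return factor * min(dx / max_speed_x, dy / max_speed_y)

# Ocean at rest: only gravity waves limit dt
h = np.full((210, 128), 60.0)
u = np.zeros_like(h)
v = np.zeros_like(h)
dt = cfl_max_dt(h, u, v, dx=200.0, dy=200.0)
print(round(dt, 2))  # 2.06

# A uniform 0.5 m/s current in x barely changes the bound
dt_current = cfl_max_dt(h, u + 0.5, v, dx=200.0, dy=200.0)
print(round(dt_current, 2))  # 2.02
```

With these numbers the gravity-wave speed is about 24 m/s, so a current of 0.5 m/s lowers the admissible $\Delta t$ by only about 2%.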

In deterministic ocean models where the dynamics are dominated by geostrophic balance and $H$ is constant, the currents $u$ and $v$ are typically small compared to the gravity-wave speed $\sqrt{gh}$.
There is therefore typically no strong need to check this condition.
With additive model errors, however, there is a slim chance that $u$ and $v$ at a given point increase more than expected, so that the CFL condition is violated and the scheme becomes unstable.

The chance for this to happen in a single simulation is small, but when we run an ensemble over a long time, the chance that it happens somewhere, at some point, becomes much larger.
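To make this argument concrete, assume (as a simplification) that each ensemble member violates the CFL condition in any given time step independently, with a small probability $p$. The probability that at least one of $N$ members fails at least once during $M$ steps is then $1 - (1-p)^{NM}$. The helper name `prob_any_violation` and the numbers below are illustrative only; they are not measured from any simulation.

```python
import math

def prob_any_violation(p, num_members, num_steps):
    """P(at least one CFL violation), assuming independent events with probability p each."""
    n = num_members * num_steps
    # 1 - (1 - p)**n, computed with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(n * math.log1p(-p))

# Illustrative numbers: a one-in-a-million chance per member and step
p = 1e-6
print(prob_any_violation(p, num_members=1, num_steps=1000))      # about 0.001
print(prob_any_violation(p, num_members=100, num_steps=100000))  # about 0.99995
```

Even a one-in-a-million event per member and step becomes almost certain for a large, long-running ensemble.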

## How this was implemented in OpenCL (branch variable-dt)
- The CFL condition is computed during the flux computations, since the velocities reconstructed at the faces are available at this point. This gives a per-thread max dt, which is stored in shared memory (shmem).
- At the end of the step kernel, a reduction is carried out over shmem, and the per-block max dt is written to global memory.
- A global reduction of the per-block buffer gives the global max dt at index `[0][0]` in the dt buffer.
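The two-level reduction can be emulated on the CPU to check the logic. Note that each thread holds the largest *admissible* time step, so the reduction operator is a minimum even though the text speaks of "max dt". The NumPy sketch below is an illustration, not the OpenCL kernel: it reduces each block with the pairwise halving pattern commonly used on shared memory, stores one value per block, and then reduces that buffer to a single global value. The 16 x 16 block size and the function names are assumptions.

```python
import numpy as np

def block_tree_min(values):
    """Pairwise (tree) min-reduction of a 1D array whose length is a power of two,
    mimicking a shared-memory reduction where half the threads stay active each round."""
    buf = values.copy()
    stride = buf.size // 2
    while stride > 0:
        buf[:stride] = np.minimum(buf[:stride], buf[stride:2 * stride])
        stride //= 2
    return buf[0]

def two_level_min(dt_cells, block=(16, 16)):
    """Per-block tree reduction followed by a global reduction of the block buffer."""
    ny, nx = dt_cells.shape
    by, bx = block
    nby, nbx = -(-ny // by), -(-nx // bx)  # ceil division: number of blocks
    # Pad with +inf so partial blocks at the edges do not affect the minimum
    padded = np.full((nby * by, nbx * bx), np.inf)
    padded[:ny, :nx] = dt_cells
    per_block = np.empty((nby, nbx))
    for j in range(nby):
        for i in range(nbx):
            tile = padded[j * by:(j + 1) * by, i * bx:(i + 1) * bx].ravel()
            per_block[j, i] = block_tree_min(tile)
    # Global reduction of the per-block buffer (plays the role of the value at [0][0])
    return per_block, per_block.min()

rng = np.random.default_rng(0)
dt_cells = rng.uniform(1.0, 3.0, size=(210, 128))  # synthetic per-cell dt values
per_block, dt_global = two_level_min(dt_cells)
print(per_block.shape, dt_global == dt_cells.min())
```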

## How we implement this now in CUDA
- Since it is sufficient to check the CFL condition only now and then, we decouple this functionality from the step kernel.
- Use the latest state to find the per-thread (per-cell) max dt, and write it to shmem.
- Write a single per-block max dt to global memory.
- Run a global reduction kernel to find dt across the whole domain.
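The same pipeline can be mocked up in NumPy, starting from the state rather than from reconstructed face values. This is a hedged sketch, not the CUDA kernel: `per_cell_dt` and `reduce_blocks` are hypothetical names, the bathymetry `H` is assumed to be given at cell corners (one more row and column than the state, as in the cells below) and averaged to cell centres, and ghost cells are ignored. Because $\Delta x / a$ decreases with $a$, taking the minimum of per-cell bounds gives the same result as the domain-wide CFL formula above.

```python
import numpy as np

def per_cell_dt(eta, hu, hv, H_corners, dx, dy, g=9.81):
    """Admissible dt in every cell from the CFL condition (with the 1/4 factor)."""
    # Assumption: bathymetry at cell corners, averaged to cell centres
    H = 0.25 * (H_corners[:-1, :-1] + H_corners[1:, :-1]
                + H_corners[:-1, 1:] + H_corners[1:, 1:])
    h = H + eta
    u, v = hu / h, hv / h
    c = np.sqrt(g * h)
    return 0.25 * np.minimum(dx / (np.abs(u) + c), dy / (np.abs(v) + c))

def reduce_blocks(dt_cells, by=16, bx=16):
    """One min per (by x bx) block, i.e. what each thread block would write to global memory."""
    ny, nx = dt_cells.shape
    nby, nbx = -(-ny // by), -(-nx // bx)
    padded = np.full((nby * by, nbx * bx), np.inf)  # +inf padding for partial blocks
    padded[:ny, :nx] = dt_cells
    return padded.reshape(nby, by, nbx, bx).min(axis=(1, 3))

ny, nx = 210, 128
H_corners = np.full((ny + 1, nx + 1), 60.0)
eta = np.zeros((ny, nx))
hu = np.zeros((ny, nx))
hv = np.zeros((ny, nx))
hu[100, 64] = 60.0 * 2.0  # illustrative: a single cell with u = 2 m/s

dt_cells = per_cell_dt(eta, hu, hv, H_corners, dx=200.0, dy=200.0)
dt_blocks = reduce_blocks(dt_cells)  # per-block buffer
dt = dt_blocks.min()                 # global reduction
print(dt_blocks.shape, dt)
```

Since this runs outside the step kernel, it only needs to be launched when we want to check the CFL condition, not in every time step.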


In [None]:
#Let's have matplotlib "inline"
%matplotlib inline
#%config InlineBackend.figure_format = 'retina'

#Import packages we need
import numpy as np
from matplotlib import animation, rc
from matplotlib import pyplot as plt
import matplotlib.gridspec as gridspec

import os
import datetime
import sys

from importlib import reload

sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), '../')))

#Set large figure sizes
#rc('figure', figsize=(16.0, 12.0))
#rc('animation', html='html5')
plt.rcParams["animation.html"] = "jshtml"

#Import our simulator
from SWESimulators import CDKLM16, PlotHelper, Common, WindStress, IPythonMagic
#Import initial condition and bathymetry generating functions:
from SWESimulators.BathymetryAndICs import *

In [None]:
%setup_logging --out compareschemes2d.log
%cuda_context_handler gpu_ctx

In [None]:
# Set initial conditions common to all simulators
sim_args = {
"gpu_ctx": gpu_ctx,
"nx": 128, "ny": 210,
"dx": 200.0, "dy": 200.0,
"dt": 1,
"g": 9.81,
#"f": 0,
"f": 0.0012,
"coriolis_beta": 1.0e-6,
"r": 0.0,
"num_threads_dt": 8
}

In [None]:
def sim_step_plot(simulator, T):
    # Advance the simulation by 10*T units of model time
    simulator.step(10*T)
    eta1, u1, v1 = simulator.download(interior_domain_only=True)
    # Recompute the CFL-limited dt and download the per-block values
    simulator.updateDt()
    dt = simulator.downloadDt()
    
    #Create figure and plot the current state and the per-block dt
    fig = plt.figure(figsize=(16, 8))
    domain_extent = [0, simulator.nx*simulator.dx, 0, simulator.ny*simulator.dy]
    
    ax_eta = plt.subplot(1,4,1)
    sp_eta = ax_eta.imshow(eta1, interpolation="none", origin='lower', vmin=-1.5, vmax=1.5, extent=domain_extent)
    
    ax_u = plt.subplot(1,4,2)
    sp_u = ax_u.imshow(u1, interpolation="none", origin='lower', vmin=-1.5, vmax=1.5, extent=domain_extent)
    
    ax_v = plt.subplot(1,4,3)
    sp_v = ax_v.imshow(v1, interpolation="none", origin='lower', vmin=-1.5, vmax=1.5, extent=domain_extent)
    
    ax_dt = plt.subplot(1,4,4)
    sp_dt = ax_dt.imshow(dt, interpolation="none", origin='lower', extent=domain_extent)
    #plt.colorbar(mappable=sp_dt, ax=ax_dt)
    
    fig.suptitle("Time = {:04.0f} s ({:s}), {:d} steps".format(simulator.t, simulator.__class__.__name__, simulator.num_iterations), fontsize=18)


In [None]:
reload(CDKLM16)

ghosts = np.array([2,2,2,2]) # north, east, south, west
dataShape = (sim_args["ny"] + ghosts[0]+ghosts[2], 
             sim_args["nx"] + ghosts[1]+ghosts[3])

H = np.ones((dataShape[0]+1, dataShape[1]+1), dtype=np.float32) * 60.0
eta0 = np.zeros(dataShape, dtype=np.float32)
u0 = np.zeros(dataShape, dtype=np.float32)
v0 = np.zeros(dataShape, dtype=np.float32)

#Create a central bump in eta0 for testing
addCentralBump(eta0, sim_args["nx"], sim_args["ny"], sim_args["dx"], sim_args["dy"], ghosts)

#Initialize simulator
cdklm_args = {"H": H, "eta0": eta0, "hu0": u0, "hv0": v0, "rk_order": 2}
sim = CDKLM16.CDKLM16(**cdklm_args, **sim_args)


In [None]:
for i in range(3):
    sim_step_plot(sim, T=10)
for i in range(3):
    sim_step_plot(sim, T=20)



In [None]:
sim.updateDt(courant_number=1.0)
dt_blocks = sim.downloadDt()
print(np.min(dt_blocks), 0.8*np.min(dt_blocks), sim.dt)
print(dt_blocks)