# A short introduction to Xarray-simlab

[Xarray-simlab](https://xarray-simlab.readthedocs.io) is both a framework for building (or assembling) models and an xarray extension for driving those models (i.e., setting up and running simulations).

In this notebook, we'll see how to import an existing xarray-simlab model, explore its components and run simulations. We'll use Fastscape (a landscape evolution model) as an example (more info at https://fastscape.org/).

## Environment setup

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
import xsimlab as xs

from dask.distributed import LocalCluster, Client

import hvplot.xarray
from ipyfastscape import TopoViz3d

%load_ext xsimlab.ipython

## Import and inspect an Xarray-simlab model

Let's import `basic_model` from the `fastscape` package: 

In [None]:
from fastscape.models import basic_model

This model simulates the long-term evolution of topographic surface elevation (hereafter noted $h$) on a 2D regular grid. The local rate of elevation change, $\partial h/\partial t$, is determined by the balance between uplift (uniform in space and time) $U$ and erosion $E$.

$$\frac{\partial h}{\partial t} = U - E$$

Total erosion $E$ is the combined effect of the erosion of (bedrock) river channels, noted $E_r$, and erosion-transport on hillslopes, noted $E_d$:

$$E = E_r + E_d$$

Erosion of river channels is given by the stream power law:

$$E_r = K_r A^m (\nabla h)^n$$

where $A$ is the drainage area and $K_r$, $m$ and $n$ are parameters.

Erosion on hillslopes is given by a linear diffusion law:

$$E_d = K_d \nabla^2 h$$
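
As a rough numerical illustration of the stream power law, the sketch below evaluates $E_r$ for a few drainage areas, along with the slope at which river erosion alone would balance uplift ($\partial h/\partial t = 0$ with the hillslope term neglected). The function names and the sample values are illustrative assumptions and are not part of `fastscape`.

```python
import numpy as np


def stream_power_erosion(area, slope, k_coef, area_exp, slope_exp):
    """River channel erosion rate E_r = K_r * A**m * S**n (S is the local slope)."""
    return k_coef * np.asarray(area) ** area_exp * np.asarray(slope) ** slope_exp


def steady_state_slope(area, uplift_rate, k_coef, area_exp, slope_exp):
    """Slope at which E_r equals the uplift rate U (hillslope erosion neglected)."""
    return (uplift_rate / (k_coef * np.asarray(area) ** area_exp)) ** (1.0 / slope_exp)


# Illustrative values only (lengths in m, time in yr)
area = np.array([1e4, 1e6, 1e8])
params = dict(k_coef=1e-4, area_exp=0.4, slope_exp=1.0)

slope = steady_state_slope(area, uplift_rate=1e-3, **params)
erosion = stream_power_erosion(area, slope, **params)
```

With these values, `erosion` equals the uplift rate everywhere by construction, and `slope` decreases as the drainage area grows: large rivers need gentler slopes to keep up with the same uplift.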

Here, `basic_model` is an [xsimlab.Model](https://xarray-simlab.readthedocs.io/en/latest/_api_generated/xsimlab.Model.html#xsimlab.Model) object, i.e., a collection of inter-dependent components (or "processes") that together form a computational model.  Just typing `basic_model` shows the ordered list of components as well as all model inputs (parameters), grouped by the component to which they belong:

In [None]:
basic_model

To get a better picture of all processes (and inputs and/or variables) in the model, we can visualize it as a graph. Processes are in blue and inputs are in yellow. The order in the graph corresponds to the order in which the processes will be executed during a simulation.

In [None]:
basic_model.visualize(show_inputs=True)
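
To make the idea of an execution order derived from inter-dependent processes more concrete, here is a minimal sketch that sorts a toy dependency graph with Python's standard `graphlib` module. The process names and their dependencies are hypothetical (loosely inspired by the model above), and this is not a claim about how xarray-simlab performs the sorting internally.

```python
from graphlib import TopologicalSorter

# Hypothetical processes; each one maps to the set of processes it depends on
dependencies = {
    "grid": set(),
    "init_topography": {"grid"},
    "flow": {"init_topography"},
    "drainage": {"flow"},
    "spl": {"drainage"},
    "diffusion": {"init_topography"},
    "erosion": {"spl", "diffusion"},
}

# A valid execution order runs every process after all of its dependencies
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

Several orders may be valid for the same graph (here, "diffusion" could run before or after "flow"); what matters is that no process runs before the ones it depends on.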

More information can be shown for each process in the model, e.g., for the grid component below. We can see all the variables defined in that component (thus not only those that are inputs of `basic_model`):

In [None]:
basic_model.grid

Xarray-simlab also automatically generates documentation (docstrings) for each model component:

In [None]:
basic_model.topography?

## Customize a model (using existing components)

Xarray-simlab is a modular framework: models can be easily customized by adding, dropping or replacing processes.

The `basic_model` imported above computes flow paths using a single flow direction algorithm. We can switch to a multiple flow direction algorithm by replacing the "flow" process with another process, `MultipleFlowRouter`, available in `fastscape`:

In [None]:
from fastscape.processes import MultipleFlowRouter

model = basic_model.update_processes({'flow': MultipleFlowRouter})

Let's visualize this custom model:

In [None]:
# note the additional input for the "flow" process (multiple flow partition slope exponent)

model.visualize(show_inputs=True)
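
To get a feeling for what a partition slope exponent does, here is a minimal sketch assuming the common multiple flow direction formulation in which each downslope neighbor receives a share of the flow proportional to its slope raised to that exponent. Whether `MultipleFlowRouter` implements exactly this rule is not confirmed here; the function `mfd_weights` and the slope values are hypothetical.

```python
import numpy as np


def mfd_weights(slopes, slope_exp):
    """Fraction of flow sent to each neighbor, proportional to slope**slope_exp.

    Only downslope neighbors (slope > 0) receive flow. Assumes that at least
    one neighbor is downslope.
    """
    slopes = np.asarray(slopes, dtype=float)
    downslope = slopes > 0
    weights = np.zeros_like(slopes)
    weights[downslope] = slopes[downslope] ** slope_exp
    return weights / weights.sum()


slopes = [0.1, 0.2, -0.05, 0.4]  # hypothetical slopes towards four neighbors

equal_split = mfd_weights(slopes, slope_exp=0.0)    # same share for each downslope neighbor
proportional = mfd_weights(slopes, slope_exp=1.0)   # shares proportional to slope
steep_biased = mfd_weights(slopes, slope_exp=10.0)  # nearly all flow to the steepest neighbor
```

Under this assumption, a large exponent approaches single flow direction routing, while an exponent of zero spreads the flow evenly over all downslope neighbors.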

## Run one simulation

Let's create a simulation setup using the `model` object created above:

In [None]:
# %create_setup model -v -d
import xsimlab as xs

ds_in = xs.create_setup(
    model=model,
    clocks={
        'tstep': np.linspace(0., 1e6, 101),   # time steps in years
        'time': np.linspace(0., 1e6, 51),     # output snapshots every 2 steps 
    },
    master_clock='tstep',
    input_vars={
        # nb. of grid nodes in (y, x)
        'grid__shape': [201, 201],
        # total grid length in (y, x)
        'grid__length': [2e4, 2e4],
        # node status at borders
        'boundary__status': ['looped', 'looped', 'fixed_value', 'fixed_value'],
        # uplift rate
        'uplift__rate': 1e-3,
        # random seed
        'init_topography__seed': None,
        # MFD partitioner slope exponent
        'flow__slope_exp': 1.0,
        # bedrock channel incision coefficient
        'spl__k_coef': 1e-4,
        # drainage area exponent
        'spl__area_exp': 0.4,
        # slope exponent
        'spl__slope_exp': 1,
        # diffusivity (transport coefficient)
        'diffusion__diffusivity': 1e-1,
    },
    output_vars={
        'topography__elevation': 'time',
        'drainage__area': 'time',
        'erosion__rate': 'time'
    }
)
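
The two clocks in this setup can be checked with plain NumPy. The sketch below confirms the step length and the snapshot interval, and verifies that every output time coincides with a time of the `tstep` clock, which is what xarray-simlab expects from additional clocks as far as its documentation describes. The variable names are only illustrative.

```python
import numpy as np

tstep = np.linspace(0.0, 1e6, 101)  # main clock: 100 model steps
time = np.linspace(0.0, 1e6, 51)    # output clock: 50 intervals between snapshots

step_length = np.diff(tstep)        # duration of each model step (years)
snapshot_interval = np.diff(time)   # time between two saved snapshots (years)

# every output time should also be a time of the main clock
is_aligned = bool(np.isin(time, tstep).all())
```

Here each model step lasts 10,000 years and a snapshot is saved every 20,000 years, i.e., every second step.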


The simulation setup is stored in an `xarray.Dataset` object:

In [None]:
ds_in

Let's run the model...

In [None]:
with xs.monitoring.ProgressBar():
    ds_out = ds_in.xsimlab.run(model=model)

The simulation outputs are stored in another `xarray.Dataset`:

In [None]:
# note the output variables "drainage__area" and "topography__elevation" present in this dataset

ds_out

Let's visualize the results using an interactive widget (ipyfastscape):

In [None]:
app = TopoViz3d(ds_out, canvas_height=500, time_dim="time")

app.show()

In [None]:
app.widget.close()

### Hands-on

- Try extracting and plotting cross-sections of the topography at various time steps
- Build an interactive plot of cross-sections (using `hvplot`)
- Compute erosion rates from the elevation output snapshots and compute the spatial average

In [None]:
((ds_out.uplift__rate-ds_out.topography__elevation.differentiate('time'))
 .mean(('x','y'))
 .plot(label=r'$U-\frac{\partial h}{\partial t}$'))
ds_out.erosion__rate.mean(('x','y')).assign_attrs({'units':r'$m yr^{-1}$'}).plot(label=r'$\dot \epsilon$')
plt.legend()
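
The same calculation can be sketched with NumPy only, which makes the arithmetic behind `differentiate` easier to follow: `np.gradient` also uses central differences (one-sided at both ends). The elevation series below is synthetic and hypothetical; it is not a model output.

```python
import numpy as np

uplift_rate = 1e-3                 # m/yr, same value as in the simulation setup
time = np.linspace(0.0, 1e6, 51)   # snapshot times (yr)

# Synthetic mean elevation (hypothetical): the surface rises at 0.4 mm/yr,
# so erosion must be removing 0.6 mm/yr
elevation = 4e-4 * time

dh_dt = np.gradient(elevation, time)  # rate of elevation change between snapshots
erosion_rate = uplift_rate - dh_dt    # E = U - dh/dt
```

Since the synthetic elevation is linear in time, the recovered erosion rate is constant. With real model outputs it varies over time and only approximates the instantaneous `erosion__rate`, because it is averaged over the snapshot interval.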

## Run simulation batches

Xarray-simlab allows defining extra dimensions for the model input variables. Here we'll use this feature to explore the influence of the flow routing slope partition exponent on the modelled topography.

Xarray-simlab leverages Xarray + Dask + Zarr so that simulations can be run in parallel and model outputs can be saved on disk (or any other storage supported by Zarr) while the simulation is running.

Let's first create a dask local cluster:

In [None]:
# note fastscape is not thread-safe -> cannot run multiple simulations in parallel with threads
# see also https://xarray-simlab.readthedocs.io/en/latest/run_parallel.html#multi-models-parallelism

cluster = LocalCluster(threads_per_worker=1)
client = Client(cluster)
client

Instead of re-creating a new simulation setup from scratch, we'll reuse the previous one and just update some input variables:

In [None]:
ds_in_batch = ds_in.xsimlab.update_vars(
    model=model,
    input_vars={
        'flow__slope_exp': ("flow__slope_exp", np.arange(0., 4.))   # (dimension name, values) tuple
    }
)

In [None]:
# note the dimension for the flow partition slope exponent
# since the dimension name matches the variable name, "flow__slope_exp"
# is promoted as a coordinate

ds_in_batch

Let's run a batch of simulations:

In [None]:
ds_out_batch = ds_in_batch.xsimlab.run(
    model=model,
    batch_dim="flow__slope_exp",  # dimension name used to pick input values for each simulation
    store="flow_runs.zarr",       # zarr (directory) store where to save the outputs
    parallel=True,                # run the simulations in parallel with dask 
    scheduler=client,             # use the dask local cluster created above
)

In [None]:
# note "drainage__area" and "topography__elevation" which have 4 dimensions!

ds_out_batch
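
A quick back-of-the-envelope calculation shows why writing batch outputs to a Zarr store is convenient. The sketch assumes 64-bit floating point values and the clock, grid and batch sizes used in this notebook.

```python
# Size of one 4-dimensional output variable (batch, time, y, x), assuming float64 values
n_batch, n_time, n_y, n_x = 4, 51, 201, 201
bytes_per_value = 8

n_values = n_batch * n_time * n_y * n_x
size_mb = n_values * bytes_per_value / 1e6
print(f"{n_values} values, about {size_mb:.1f} MB per variable")
```

This is still small, but the size grows linearly with the number of simulations in the batch and with the number of snapshots.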

In [None]:
app = TopoViz3d(ds_out_batch, canvas_height=500, time_dim="time")

app.show()

In [None]:
app.widget.close()

### Hands-on

- Create facet plots showing the topographic elevation for different values of "flow__slope_exp" (rows) and different time steps (cols). 

In [None]:
np.log10(ds_out_batch.drainage__area).isel(time=[0,26,-1]).plot(row='flow__slope_exp', col='time', size=5, aspect=1.2);

## Time-varying input values

For model variables that are not "static", it is possible to provide input values with a time dimension (i.e., the dimension of the simulation main clock).

For example, let's explore variable block uplift rates as model external forcing (sudden 2x decrease in the middle of the simulation):

In [None]:
# we can leverage the xarray `.where` function here

u_t = ds_in.uplift__rate.where(ds_in.tstep < 5e5,
                               ds_in.uplift__rate / 2)
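
The expression above works because the scalar `uplift__rate` is broadcast against the condition, which has one value per time step. The NumPy sketch below reproduces the same logic with plain arrays: `np.where(cond, a, b)` picks `a` where the condition is true and `b` elsewhere, mirroring `DataArray.where(cond, other)`. The variable names are illustrative.

```python
import numpy as np

tstep = np.linspace(0.0, 1e6, 101)  # same main clock as in the setup
uplift_rate = 1e-3                  # m/yr

# full rate before 500,000 years, half the rate from then on
u_t = np.where(tstep < 5e5, uplift_rate, uplift_rate / 2)

n_full_rate_steps = int((u_t == uplift_rate).sum())
```

The result has one uplift rate per time step: the first 50 values keep the original rate and the remaining 51 are halved.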

In [None]:
u_t.plot();

Let's run the simulation:

In [None]:
# note: it is possible to chain methods on xarray objects
# like below to update the simulation setup then run the simulation

in_vars = {'uplift__rate': u_t}

with model, xs.monitoring.ProgressBar():
    ds_out_ut = (
        ds_in
        .xsimlab.update_vars(input_vars=in_vars)
        .xsimlab.run()
    )

In [None]:
ds_out_ut

### Hands-on

- Show the influence of the sudden change of the uplift rate by plotting a time-series built from the model outputs.

In [None]:
ds_out_ut.topography__elevation.mean(('x','y')).plot()

## Further exercises

### Time + space varying values

The `uplift__rate` parameter in `model` also accepts values defined on a 2D grid, which we can combine with the time (steps) dimension.

In [None]:
# note the allowed dimensions: () or ('y', 'x')
# "()" means scalar value (empty tuple)

model.uplift.rate?

- Run a new simulation with space and time varying uplift rate
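
As a starting point, the values of such an input can be built with NumPy broadcasting. The spatial pattern below is hypothetical (uplift increasing linearly along x). To use the result as a model input it would presumably need to be wrapped in a `DataArray` with dimensions `('tstep', 'y', 'x')`, which is left out of this sketch.

```python
import numpy as np

tstep = np.linspace(0.0, 1e6, 101)  # main clock (yr)
ny, nx = 201, 201                   # grid shape (y, x)

# time factor: full rate during the first half of the run, halved afterwards
time_factor = np.where(tstep < 5e5, 1.0, 0.5)  # shape (tstep,)

# hypothetical spatial pattern: uplift rate from 0 to 1e-3 m/yr along x
rate_yx = np.broadcast_to(np.linspace(0.0, 1e-3, nx), (ny, nx))  # shape (y, x)

# broadcasting (tstep, 1, 1) * (y, x) gives (tstep, y, x)
rate_tyx = time_factor[:, None, None] * rate_yx
```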

### Batch + time + space varying values

- What about running a batch of simulations with different overall magnitudes for the time + space varying uplift rates?
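
One possible way to prepare such an input is to add a leading batch dimension holding one multiplier per simulation. The sketch below uses a small constant array as a hypothetical stand-in for a `(tstep, y, x)` uplift array, and the multipliers are arbitrary. The new leading dimension would then play the same role as `flow__slope_exp` in the batch run above.

```python
import numpy as np

# hypothetical stand-in for a (tstep, y, x) uplift array, kept small here
rate_tyx = np.full((101, 5, 5), 1e-3)

# one multiplier per simulation in the batch (arbitrary values)
magnitude = np.array([0.5, 1.0, 2.0])

# broadcasting (batch, 1, 1, 1) * (tstep, y, x) gives (batch, tstep, y, x)
rate_batch = magnitude[:, None, None, None] * rate_tyx
```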