<img SRC="https://avatars2.githubusercontent.com/u/31697400?s=400&u=a5a6fc31ec93c07853dd53835936fd90c44f7483&v=4" WIDTH=125 ALIGN="right">

# Caching

*O.N. Ebbens, Artesia, 2021*

Groundwater flow models are often data-intensive. Execution times can be shortened significantly by caching data. This notebook shows some examples of caching using the nlmod package.

### Contents<a name="TOC"></a>
1. [Cache directory](#cachedir)
2. [Caching in nlmod](#cachingnlmod)

In [9]:
import matplotlib.pyplot as plt
import flopy
import os
import geopandas as gpd

import nlmod

print(f'nlmod version: {nlmod.__version__}')

nlmod version: 0.0.2b


### [1. Cache directory](#TOC)<a name="cachedir"></a>

When you create a model you usually start by assigning a model workspace. This is a directory where model data is stored. The `nlmod.util.get_model_dirs()` function can be used to create a file structure in two steps.
First, the model workspace directory is created if it does not exist yet. Second, two subdirectories are created: 'figure' and 'cache'. Calling the function below creates the `figdir` and `cachedir` variables, which hold the paths of these subdirectories. In this notebook we will use `cachedir` to write and read cached data. You can also define your own cache directory instead.

In [4]:
model_ws = 'model5'

# Model directories
figdir, cachedir = nlmod.util.get_model_dirs(model_ws)

print(model_ws)
print(figdir)
print(cachedir)

model5
model5\figure
model5\cache
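
The two steps described above can be sketched with the standard library. This is only an illustration of the idea, not the nlmod implementation; the name `make_model_dirs` is hypothetical.

```python
import os


def make_model_dirs(model_ws):
    """Hypothetical stand-in for the two steps: create the workspace,
    then the 'figure' and 'cache' subdirectories inside it."""
    figdir = os.path.join(model_ws, "figure")
    cachedir = os.path.join(model_ws, "cache")
    # os.makedirs also creates model_ws itself when it is missing;
    # exist_ok=True makes the call safe to repeat on an existing workspace.
    os.makedirs(figdir, exist_ok=True)
    os.makedirs(cachedir, exist_ok=True)
    return figdir, cachedir
```

Because of `exist_ok=True`, re-running the cell on an existing workspace leaves its contents untouched, which is what you want when the cache directory already holds data.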


### [2. Caching in nlmod](#TOC)<a name="cachingnlmod"></a>

In the nlmod package you can use NetCDF files to cache model data. NetCDF files can easily be read and written as an `xarray.Dataset`. As you can see in the example notebook [01_basic_model](01_basic_model.ipynb), an `xarray.Dataset` is used to store most of our model data. Running the code below creates a Dataset `layer_model` with data from REGIS.

In [5]:
# layer model
layer_model = nlmod.read.regis.get_layer_models(extent=[95000.0, 105000.0, 494000.0, 500000.0],
                                                delr=100., delc=100., use_geotop=False)

layer_model

The `get_layer_models` function takes some time to complete because the data is read from a server and projected onto the desired model grid. Every time you run this function you have to wait for this process to finish, which results in long execution times and an unhealthy number of coffee breaks. This is where caching comes into play.

If you cache the data when you run a function, you can use the cached data every time you re-run the same function, reducing the execution time significantly. The `get_layer_models` function has some options to do exactly this. If we use the keyword arguments `use_cache=True`, `fname_netcdf='combined_layer_ds.nc'` and `cachedir=cachedir`, these steps are completed:
1. See if there is a NetCDF file with the name 'combined_layer_ds.nc' in the cache directory. If the file exists go to step 2, otherwise go to step 3.
2. Check whether the cached dataset has the same properties as the desired dataset. In this case that means the extent, delr and delc of the cached dataset match those of the desired dataset. If so, return the cached dataset; otherwise go to step 3.
3. Call the `get_combined_layer_models` function to obtain a new dataset. Save this dataset as 'combined_layer_ds.nc' in the cache directory and return the dataset.
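
The three steps can be sketched in a few lines. This is a simplified illustration, not the nlmod code: it stores the data with `pickle` instead of NetCDF so that it runs with the standard library only, and the names `load_or_compute` and `compute` are hypothetical.

```python
import os
import pickle


def load_or_compute(cachedir, fname, compute, **props):
    """Return cached data when its stored properties match `props`,
    otherwise call `compute(**props)`, cache the result and return it."""
    path = os.path.join(cachedir, fname)

    # Step 1: is there a cache file with this name?
    if os.path.isfile(path):
        with open(path, "rb") as f:
            cached = pickle.load(f)
        # Step 2: do the cached properties (e.g. extent, delr, delc) match?
        if cached["props"] == props:
            return cached["data"]

    # Step 3: build the data from scratch, write it to the cache, return it.
    data = compute(**props)
    with open(path, "wb") as f:
        pickle.dump({"props": props, "data": data}, f)
    return data
```

The property check in step 2 is what keeps the cache safe: changing `delr`, `delc` or the extent overwrites the cached file instead of silently returning data for a different grid.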

When you run the function below twice, you will see that the second run is significantly faster because it uses the cached dataset. With `verbose=True`, some information about the caching is printed.

In [8]:
# layer model
layer_model = nlmod.read.regis.get_layer_models(extent=[95000.0, 105000.0, 494000.0, 500000.0],
                                                delr=100., delc=100., use_geotop=False,
                                                use_cache=True, fname_netcdf='combined_layer_ds.nc',
                                                cachedir=cachedir, verbose=True)
layer_model

found cached combined_layer_ds.nc, loading cached dataset
delr of current grid is the same as cached grid
delc of current grid is the same as cached grid
extent of current grid is the same as cached grid


To be explained:
- how `util.get_cache_netcdf` works.

Drawbacks of caching in its current form:
- there are two functions for everything (`get_layer_models` and `get_combined_layer_models`). As a result, the default values of every function are defined twice. The current naming of these functions also does not make clear which caching function belongs to which original function. This could probably be done better with a decorator or something similar.
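
A decorator along these lines might look like the sketch below. It is only an illustration of the idea and not part of nlmod: the name `cache_to_file` and the keyword arguments `cachedir` and `fname` are assumptions, and `pickle` is used in place of NetCDF to keep the example self-contained.

```python
import functools
import inspect
import os
import pickle


def cache_to_file(func):
    """Hypothetical decorator: adds `cachedir` and `fname` keyword
    arguments to `func` and caches its result on disk."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, cachedir=None, fname=None, **kwargs):
        # Without a cache location, behave exactly like the original function.
        if cachedir is None or fname is None:
            return func(*args, **kwargs)

        # Resolve all arguments, including defaults, so the defaults
        # only need to be defined once, on the original function.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        key = dict(bound.arguments)

        path = os.path.join(cachedir, fname)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                cached = pickle.load(f)
            # Reuse the cache only when it was built with the same arguments.
            if cached["key"] == key:
                return cached["data"]

        data = func(*args, **kwargs)
        with open(path, "wb") as f:
            pickle.dump({"key": key, "data": data}, f)
        return data

    return wrapper
```

With this approach there is a single function per data source, so the default values live in one place and it is clear which function the caching belongs to.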