# Data Management
This notebook outlines the data management utility functions included in this project. These functions save and load simulation data of cellular automaton (CA) grids, which are stored as lists of 2D arrays. They make the data analysis described in notebooks 01-04 more efficient, since the data can be generated once and reloaded at any time.

The data management utilities also include a function for running and saving simulations in parallel, which speeds up the generation of multiple datasets.

## Setting up

In [None]:
import sys
import time
from importlib import reload
from pathlib import Path

import numpy as np

# make the project root importable so that the src package can be found
project_root = Path("..").resolve()
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

import src.CA_model as CA
import src.utils as ut

### Reloading our own modules in case they were updated

In [None]:
# run to reload CA_model.py and utils.py for updated code
reload(CA)
reload(ut)

## Generating a single evolution
Once the desired parameters are set, the evolve_CA function of the CA_model module generates a sequence of grids over time.

In [None]:
# parameter settings
size = 100                          # width and height of the grid
p = 0.5                             # starting fraction of vegetation
update_rule = CA.update_Scanlon2007 # function containing update rule
true_frac = 0.2                     # 'natural' (equilibrium) fraction of vegetation
k = 3                               # strength of local interactions
M = 10                              # radius of neighbourhood
N_steps = 200                       # number of iterations
skip = 100                          # iterations to skip (equilibration period)
seed = 0                            # seed for the random number generator

grids = CA.evolve_CA(
    size=size,
    p=p,
    update_rule=update_rule,
    true_frac=true_frac,
    k=k,
    M=M,
    N_steps=N_steps,
    skip=skip,
    seed=seed,
)
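
As a quick sanity check on such an evolution, one can track the vegetated fraction of each grid over time. The sketch below uses random stand-in grids rather than the output of CA.evolve_CA, and it assumes that each grid is a 2D array in which non-zero cells are vegetated. The helper name vegetation_fraction is hypothetical and not part of the project.

```python
import numpy as np


def vegetation_fraction(grids):
    """Return the fraction of non-zero cells in each grid of the sequence."""
    return np.array([np.count_nonzero(g) / g.size for g in grids])


# Stand-in data (assumption): 5 random 10x10 grids with roughly 20% occupied cells.
rng = np.random.default_rng(0)
demo_grids = [(rng.random((10, 10)) < 0.2).astype(int) for _ in range(5)]

fractions = vegetation_fraction(demo_grids)
print(fractions)
```

With real simulation output, these fractions would be expected to settle near true_frac after the equilibration period, provided the update rule behaves as its parameter description suggests.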

### Saving the data
The utils.py module contains a function for saving all grids of one full evolution. It takes the list of simulated grids and the corresponding parameter settings, which are encoded in the filename. The data is saved in the folder data/[update_rule_name], which is created if it does not already exist.

In [None]:
ut.save_data(grids, size, update_rule, true_frac, k, M, N_steps, skip, seed=seed)
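
To illustrate the idea of encoding the parameter settings in the filename, here is a minimal sketch of how such a path could be built. The filename pattern, the .npy extension, and the helper names build_data_path and update_example are assumptions made for illustration; the actual naming scheme is whatever utils.py defines.

```python
from pathlib import Path


def build_data_path(root, update_rule, size, true_frac, k, M, N_steps, skip, seed):
    """Build <root>/data/<update_rule_name>/<parameters>.npy (hypothetical scheme)."""
    folder = Path(root) / "data" / update_rule.__name__
    filename = (
        f"size={size}_true_frac={true_frac:.2f}_k={k}_M={M}"
        f"_N_steps={N_steps}_skip={skip}_seed={seed}.npy"
    )
    return folder / filename


def update_example(grid):
    """Placeholder standing in for an update rule such as CA.update_Scanlon2007."""
    return grid


path = build_data_path(".", update_example, 100, 0.2, 3, 10, 200, 100, seed=0)
print(path)
```

Formatting true_frac with a fixed number of decimals is a deliberate choice in this sketch: floating-point values such as 0.1 + 0.05 print as 0.15000000000000002, which would otherwise leak into the filename.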

### Loading the data
The utils.py module also contains a function for loading data that was saved using the save_data function.

In [None]:
loaded_grids = ut.load_data(size, update_rule, true_frac, k, M, N_steps, skip, seed=seed)

# check that the loaded data matches the original data (shapes and values)
assert np.array_equal(loaded_grids, grids)
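
The same round trip can be tried in isolation. The sketch below stacks a list of 2D grids into one 3D array, writes it to a temporary folder with np.save, and reads it back with np.load. Whether utils.py stores the data in this way is an assumption, and the file name is made up for the example.

```python
import tempfile
from pathlib import Path

import numpy as np

# Stand-in data (assumption): 4 random binary 8x8 grids.
rng = np.random.default_rng(1)
demo_grids = [rng.integers(0, 2, size=(8, 8)) for _ in range(4)]

with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "demo_grids.npy"
    np.save(file_path, np.stack(demo_grids))  # one array of shape (steps, rows, cols)
    restored = np.load(file_path)             # read fully into memory before cleanup

# array_equal compares shapes as well as values
round_trip_ok = np.array_equal(restored, np.stack(demo_grids))
print(restored.shape, round_trip_ok)
```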

## Generating and saving several datasets
Several datasets can be generated either sequentially, by looping over a list of parameter values, or in parallel. Both approaches are shown below.

### Sequentially
In the example below, we run the same set of parameters for different seeds.

In [None]:
size = 100                          # width and height of the grid
p = 0.5                             # starting fraction of vegetation
update_rule = CA.update_Scanlon2007 # function containing update rule
true_frac = 0.2                     # 'natural' (equilibrium) fraction of vegetation
k = 3                               # strength of local interactions
M = 10                              # radius of neighbourhood
N_steps = 20                        # number of iterations
skip = 0                            # iterations to skip (equilibration period)
starting_seed = 0                   # seed of the first evolution

N_evolutions = 5                    # number of full evolutions to generate for this set of parameters

for i in range(N_evolutions):
    start = time.time()
    seed = starting_seed+i
    grids = CA.evolve_CA(
        size=size,
        p=p,
        update_rule=update_rule,
        true_frac=true_frac,
        k=k,
        M=M,
        N_steps=N_steps,
        skip=skip,
        seed=seed,
    )
    end = time.time()
    print(f"Grid evolution {i+1} out of {N_evolutions} completed in {end-start:.2f} seconds.")

    ut.save_data(grids, size, update_rule, true_frac, k, M, N_steps, skip, seed=seed)
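
Because each evolution is saved under its own seed, rerunning such a loop does not have to repeat work that is already on disk. The following sketch shows one possible load-or-generate pattern. It uses a random stand-in simulate function in place of CA.evolve_CA, and the names simulate and load_or_generate as well as the file naming are hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np


def simulate(seed, n_steps=3, size=6):
    """Stand-in for one CA evolution: random binary grids for a given seed."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=(n_steps, size, size))


def load_or_generate(folder, seed):
    """Return (grids, was_cached): load from disk if present, else simulate and save."""
    file_path = Path(folder) / f"seed={seed}.npy"
    if file_path.exists():
        return np.load(file_path), True
    grids = simulate(seed)
    file_path.parent.mkdir(parents=True, exist_ok=True)
    np.save(file_path, grids)
    return grids, False


with tempfile.TemporaryDirectory() as tmp:
    first = [load_or_generate(tmp, seed) for seed in range(3)]   # simulates and saves
    second = [load_or_generate(tmp, seed) for seed in range(3)]  # loads from disk

reused_first = [was_cached for _, was_cached in first]
reused_second = [was_cached for _, was_cached in second]
print(reused_first, reused_second)
```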

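### In parallel
In the example below, we vary the equilibrium fraction of vegetation instead of the seed. The array of values is passed to generate_parallel_true_fracs, which runs and saves the simulations in parallel.
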
In [None]:
size = 200                          # width and height of the grid
p = 0.5                             # starting fraction of vegetation
update_rule = CA.update_Scanlon2007 # function containing update rule
true_fracs = np.arange(0.05, 0.7, 0.05)  # 'natural' (equilibrium) fractions of vegetation to simulate
k = 3                               # strength of local interactions
M = 20                              # radius of neighbourhood
N_steps = 200                       # number of iterations
skip = 100                          # iterations to skip (equilibration period)
starting_seed = 0                   # seed of the first evolution

ut.generate_parallel_true_fracs(size, p, update_rule, true_fracs, k, M, N_steps, skip, starting_seed)
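
For readers curious about the general pattern, the sketch below runs one stand-in simulation per true_frac value concurrently using the standard library's concurrent.futures module. It is not the implementation of generate_parallel_true_fracs: the functions simulate and run_all are hypothetical, and assigning starting_seed + i to the i-th value is an assumption. A thread pool is used so that the sketch runs as-is inside a notebook; for CPU-bound simulations a process pool (ProcessPoolExecutor) is the usual choice, with the worker function defined in an importable module.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def simulate(true_frac, seed, n_steps=3, size=6):
    """Stand-in for one evolution: random grids with density close to true_frac."""
    rng = np.random.default_rng(seed)
    return (rng.random((n_steps, size, size)) < true_frac).astype(int)


def run_all(true_fracs, starting_seed=0, max_workers=4):
    """Run one stand-in simulation per true_frac value concurrently."""
    # Assumption: every parameter value gets its own seed.
    seeds = [starting_seed + i for i in range(len(true_fracs))]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order, so results[i] belongs to true_fracs[i]
        return list(pool.map(simulate, true_fracs, seeds))


true_fracs = np.round(np.arange(0.05, 0.7, 0.05), 2)
results = run_all(true_fracs)
print(len(results), results[0].shape)
```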