# Nested Runs

In this tutorial, we will explore the `nestedRuns` module from the postopus package, which allows you to analyze multiple runs at once. This is useful when working with a large number of simulation runs, because data from all of them can be accessed and processed through a single object.

In [None]:
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

from postopus import nestedRuns

The input file is already defined in the folder (see the GitLab repo); otherwise we recommend defining it in the notebook:

In [None]:
cd ../octopus_data/nested_runs/

For this example, we need to trigger Octopus from a Python script.

In [None]:
!python3 create_runs.py

Initializing a `nestedRuns` object is very similar to initializing a `Run` object. As an argument, we pass a `pathlib.Path` pointing to the folder that contains the multiple runs.

In [None]:
n = nestedRuns(Path("."))

The `nestedRuns` object contains nested dictionaries. These reflect the path traversed from the current working directory to the folder with the data (in this case the `nested_runs` folder). At the data level, we also see the `nestedObjects` object, which holds the initialised `Run` objects used to retrieve the data. To keep the structure of the `nestedRuns` object as simple as possible, initialize it in the same folder where your data is stored. This makes data access much easier, as we will see below.

In [None]:
n

To access the data, we need to traverse the nested dictionary tree until we reach the data-level sub-tree. We can use dot notation and tab completion for traversing, except where a path component contains a dot, which the Python interpreter cannot parse as an attribute name. In these cases, we need to use the usual dictionary access syntax `[""]`. Since we are already in the correct folder, we don't need to traverse further.

In [None]:
nruns = n

In [None]:
nruns
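
The limitation of dot notation described above is a property of Python syntax, not of postopus. The following minimal sketch illustrates it with a hypothetical `AttrDict` class and made-up folder names; it is not the postopus implementation.

```python
class AttrDict(dict):
    """Hypothetical stand-in: a dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as err:
            raise AttributeError(name) from err


# Made-up folder names, used only for illustration.
tree = AttrDict({"results": AttrDict({"deltax_0.6": "run A", "coarse": "run B"})})

# Keys that are valid Python identifiers work with dot notation.
dot_value = tree.results.coarse

# A key containing a dot has to be accessed with square brackets.
bracket_value = tree.results["deltax_0.6"]


def parses(source):
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# Python reads `deltax_0.6` as the name `deltax_0` followed by the number `.6`,
# so the dotted form is rejected before any lookup happens.
dot_parses = parses("tree.results.deltax_0.6")
```

Here `dot_parses` is `False`, while both `dot_value` and `bracket_value` are retrieved without problems.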

We can extract individual `Run` objects from the original `nestedRuns` object and perform all the operations that we know from the other tutorials. Note that here we need to use the square-bracket syntax because the individual run folders have a dot in their names (e.g. `deltax_0.6`).

In [None]:
run = nruns["deltax_0.6"]

In [None]:
run

In [None]:
run.default.scf.density().plot()

We can also use the `nestedObjects.apply` method to retrieve data from every run object at once. For example, here we collect the convergence data of three different `octopus` runs in one MultiIndex DataFrame.

In [None]:
convergence = pd.concat(nruns.apply(lambda run: run.default.scf.convergence()))

In [None]:
convergence
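
To see where the two index levels come from, here is a small stand-alone sketch. It assumes that `apply` returns a mapping from run name to DataFrame, which is what `pd.concat` needs to build the outer index level. The numbers are made up, and the column names simply mirror the ones used later in this tutorial.

```python
import pandas as pd

# Made-up per-run convergence tables, keyed by run name (assumed layout).
per_run = {
    "deltax_0.4": pd.DataFrame(
        {"energy": [-10.0, -10.4, -10.5], "rel_dens": [1e-1, 1e-3, 1e-6]},
        index=pd.Index([1, 2, 3], name="iteration"),
    ),
    "deltax_0.6": pd.DataFrame(
        {"energy": [-9.0, -9.5], "rel_dens": [1e-1, 1e-5]},
        index=pd.Index([1, 2], name="iteration"),
    ),
}

# Passing a mapping to pd.concat uses its keys as the outer index level,
# so each row is addressed by (run name, iteration).
combined = pd.concat(per_run)

# Selecting on the outer level recovers the table of a single run.
single_run = combined.loc["deltax_0.6"]
```

The outer level of `combined` is what `groupby(level=0)` operates on in the cells below.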

Taking this combined dataframe we can produce complex plots with a few lines of code:

In [None]:
def get_parameter_from_path(path):
    # Hack: assume the last three characters of the path are the spacing, e.g. "0.6"
    return float(path[-3:])
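
Slicing the last three characters only works while every spacing has exactly one decimal digit. As a hedged alternative, the sketch below parses whatever follows a prefix in the final path component. The function name `get_parameter_from_name` and the `prefix` argument are hypothetical, and the code assumes the keys are strings (or path-like objects) ending in something like `deltax_0.6`.

```python
def get_parameter_from_name(name, prefix="deltax_"):
    """Return the number that follows `prefix` in the last path component."""
    # Keep only the final component, so "a/b/deltax_0.6" and "deltax_0.6" both work.
    tail = str(name).rstrip("/").rsplit("/", 1)[-1]
    if not tail.startswith(prefix):
        raise ValueError(f"{tail!r} does not start with {prefix!r}")
    return float(tail[len(prefix):])


spacing_short = get_parameter_from_name("deltax_0.6")
spacing_long = get_parameter_from_name("some/dir/deltax_0.25")
```

Unlike the slice, this version also handles a name such as `deltax_0.25` and raises an error for folders that do not follow the naming scheme.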

In [None]:
def get_converged_data(convergence):
    # get only the information from the last iteration for each run
    converged = convergence.groupby(level=0).tail(1).droplevel(1)
    i = converged.index
    combined = converged.set_index(i.map(get_parameter_from_path)).sort_index()
    return combined

In [None]:
converged = get_converged_data(convergence)

In [None]:
converged
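
The chain of pandas calls in `get_converged_data` can be followed step by step on a toy table. The values below are made up; only the index layout (run name, iteration) is assumed to match the real data.

```python
import pandas as pd

# Toy stand-in for the combined convergence table: (run name, iteration) index.
index = pd.MultiIndex.from_tuples(
    [
        ("deltax_0.6", 1),
        ("deltax_0.6", 2),
        ("deltax_0.4", 1),
        ("deltax_0.4", 2),
        ("deltax_0.4", 3),
    ]
)
toy = pd.DataFrame({"energy": [-9.0, -9.5, -10.0, -10.4, -10.5]}, index=index)

# Step 1: keep the last row of each run, i.e. the final SCF iteration.
last = toy.groupby(level=0).tail(1)

# Step 2: drop the iteration level so that only the run name is left as index.
last = last.droplevel(1)

# Step 3: replace the run name by the numeric spacing and sort by it.
spacing = last.index.map(lambda name: float(name[-3:]))
last = last.set_index(spacing).sort_index()
```

After these steps `last` has one row per run, indexed by the spacing in ascending order, which is the shape the energy plot below relies on.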

In [None]:
width = 5
f, ax = plt.subplots(1, 1, figsize=(width, width * 0.6), sharex=True)
ax.plot(converged.index, converged.energy)
ax.set_ylabel("Total energy [eV]")
ax.set_xlabel(r"Spacing [$\AA$]")
f.tight_layout()
f.savefig("convergence.png")

f, ax = plt.subplots(1, 1, figsize=(width, width * 0.6), sharex=True)
for k, group in convergence.groupby(level=0):
    ax.semilogy(
        group.loc[k].index,
        group.rel_dens,
        label=rf"Spacing {get_parameter_from_path(k)} $\AA$",
    )
ax.legend()
ax.set_ylabel("Relative density change")
ax.set_xlabel(r"Iteration number")
f.tight_layout()

We can also retrieve field data in an analogous manner.

In [None]:
nruns.apply(lambda run: run.default.scf.density())