# Detailed overview
## Introduction

Postopus is a post-processing tool for [Octopus](https://octopus-code.org/) (POSTprocessing for OctoPUS). It provides a user-friendly interface to find and read data written by Octopus throughout a simulation and offers common operations to evaluate this data.

Octopus takes an input file describing the systems and the simulation parameters.
The input file is called `inp` and is a text file with no extension.
The command `octopus` must then be executed in the folder that contains this `inp` file.
Since Octopus does not allow custom names for the input file, a possible project structure could look like the following:

                  
    ├── benzene                    # Project folder
    │   ├── benzene.xyz            # Geometry file or other supporting files
    │   └── inp                    # Input file
    ├── h-atom
    │   └── inp
    ├── he
    │   └── inp
    ├── methane                     # Project folder in case of a multi-stage calculation
    │   ├── calculation_gs          # Ground state calculation
    │   │   └── inp
    │   ├── calculation_td          # Time dependent calculation
    │   │   └── inp
    │   └── inp                     # Input file for the whole calculation (The other files must be placed here one by one in each stage)
    └── recipe
        └── inp

To run one of these simulations, execute the command `octopus` in the respective project folder.
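
Because every runnable folder is identified by its `inp` file, the project folders can be listed programmatically. The following is a minimal sketch using only the Python standard library; the helper name `find_project_folders` and the placeholder file contents are assumptions made for illustration and are not part of Octopus or Postopus.

```python
import tempfile
from pathlib import Path


def find_project_folders(root):
    """Return all folders below `root` that contain an `inp` file, sorted."""
    return sorted(p.parent for p in Path(root).rglob("inp") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    # Recreate a small part of the layout shown above (placeholder contents)
    for name in ["benzene", "h-atom", "methane/calculation_gs"]:
        folder = Path(tmp, name)
        folder.mkdir(parents=True)
        (folder / "inp").write_text("CalculationMode = gs\n")
    found = [p.relative_to(tmp).as_posix() for p in find_project_folders(tmp)]

print(found)
```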

## Running the simulation
As mentioned before, running a simulation involves two steps:
1. Change the directory to the project folder (that contains the `inp` file)
2. Run the command `octopus` (optionally store the Octopus output in a log file by calling `octopus > out_gs.log 2>&1`).
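
The two steps above can also be scripted from Python. This is a minimal sketch, not a Postopus feature: the helper `run_in_folder` is a hypothetical name, and a harmless stand-in command is used so that the example runs even where Octopus is not installed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_folder(command, folder, log_name="out.log"):
    """Run `command` inside `folder`, writing stdout and stderr to a log file."""
    folder = Path(folder)
    log_path = folder / log_name
    with open(log_path, "w") as log_file:
        # cwd= replaces the `cd`; stderr=STDOUT mirrors the shell's `2>&1`
        result = subprocess.run(
            command, cwd=folder, stdout=log_file, stderr=subprocess.STDOUT
        )
    return result.returncode, log_path


# With Octopus in your PATH one could call:
#   run_in_folder(["octopus"], "benzene", "out_gs.log")
# Stand-in command so the sketch also runs without Octopus:
with tempfile.TemporaryDirectory() as tmp:
    code, log = run_in_folder([sys.executable, "-c", "print('hello')"], tmp)
    log_text = log.read_text()

print(code, log_text.strip())
```

Passing the working directory explicitly avoids changing the directory of the notebook process itself.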

The above two steps could theoretically be executed in a separate shell. However, since the Jupyter notebook has terminal capabilities, it is recommended to execute all steps of a workflow (from defining the input file to the final analysis) in a single notebook (when computationally feasible), as we will do in the following. The notebook then serves as a record of all the steps taken to achieve a particular result. This increases the reproducibility of your work and makes it easier for others to understand and reuse your conclusions. For the same reason, we also recommend using `!pip freeze` in the notebook to record the versions of the Python packages used, and `octopus -v` to print the version of Octopus used to generate the data.



In [None]:
!octopus -v

In [None]:
!pip freeze

In [None]:
!mkdir -p "examples/interference"

In [None]:
cd examples/interference/

We now create the input file (inp). To do this we use the magic command `%%writefile inp` of the Jupyter Notebook to write the contents of the cell to a file called `inp` (our input file) in the current directory.

In [None]:
%%writefile inp

stdout = "td_stdout.txt"
stderr = "td_stderr.txt"

CalculationMode = td
ExperimentalFeatures = yes
FromScratch = yes

%Systems
  'Maxwell' | maxwell
%

Maxwell.ParStates = no

# Maxwell box variables
lsize_mx = 10.0

Maxwell.BoxShape = parallelepiped

%Maxwell.Lsize
 lsize_mx | lsize_mx | lsize_mx
%

dx_mx = 0.5

%Maxwell.Spacing
 dx_mx | dx_mx | dx_mx
%

# Maxwell calculation variables
%MaxwellBoundaryConditions
 plane_waves | plane_waves | plane_waves
%

%MaxwellAbsorbingBoundaries
 not_absorbing | not_absorbing | not_absorbing
%

# Output variables
OutputFormat = plane_x + plane_y + plane_z + axis_x + axis_y + axis_z

# Maxwell output variables
%MaxwellOutput
 electric_field
 magnetic_field
 maxwell_energy_density
 trans_electric_field
%
MaxwellOutputInterval = 10
MaxwellTDOutput = maxwell_energy + maxwell_total_e_field + maxwell_total_b_field

# Time step variables
TDSystemPropagator = prop_expmid
dt = 1 / ( sqrt(c^2/dx_mx^2 + c^2/dx_mx^2 + c^2/dx_mx^2) )
TDTimeStep = dt
TDPropagationTime = 0.35

# laser propagates in x direction
k_1_x  =  0.707107
k_1_y  = -0.707107
k_2_x  = -0.447214
k_2_y  = -0.223607
E_1_z  =  0.5
E_2_z  =  0.5
pw_1   =  5.0
pw_2   =  7.5
ps_1_x = -sqrt(1/2) * 20.0
ps_1_y =  sqrt(1/2) * 20.0
ps_2_x =  sqrt(2/3) * 20.0
ps_2_y =  sqrt(1/3) * 20.0

%MaxwellIncidentWaves
  plane_wave_mx_function | electric_field | 0 | 0 | E_1_z | "plane_waves_function_1"
  plane_wave_mx_function | electric_field | 0 | 0 | E_2_z | "plane_waves_function_2"
%

%MaxwellFunctions
  "plane_waves_function_1" | mxf_cosinoidal_wave | k_1_x | k_1_y | 0 | ps_1_x | ps_1_y | 0 | pw_1
  "plane_waves_function_2" | mxf_cosinoidal_wave | k_2_x | k_2_y | 0 | ps_2_x | ps_2_y | 0 | pw_2
%


Assuming you have octopus in your PATH:

In [None]:
!octopus

## Loading Data with Postopus
To load data with Postopus, the path to the output directory of the Octopus simulation is required. All output data, as well as the input file `inp`, are expected in this folder. Data is found automatically and can be discovered by listing all found systems, fields, etc., or by using auto-completion at run time, e.g. in a Jupyter notebook.

The entry point for users to Postopus is the `Run` class.

In [None]:
from postopus import Run

run = Run()

The `Run` object discovers available data on the file system and builds a data structure allowing access. If no path to a directory containing the inp file is passed to the run object (i.e. `run = Run()`), then the run object searches in the current working directory. The run object can also be instantiated with a specific path (i.e. `run = Run("path/to/inpfile_directory/")`). In general, the data structure allows choosing data with the following syntax:

run.*systemname*.*calculationmode*.*output_name*

Parameters set in italics must be replaced with values that mostly correspond to values set in the input file. A closer look at those will be taken in the following sections.

## System selection

The first parameter to select is the system's name. Octopus allows simulating multiple systems at once, as well as so-called "multisystems", which build a hierarchy of encapsulated systems.  
Checking out the "Systems" block in the `inp`, the Maxwell system can be found:  
```
%Systems
  'Maxwell' | maxwell
%
```  
This defines one system with the name "Maxwell" of type "maxwell". The types are relevant for Octopus; for us, the system names are of interest.  
Be aware that simulations with Octopus are also possible without defining any systems. In that case, the system's type will be set (by Octopus) to "electronic_system". As Postopus requires a name for this system, it will be named "**default**" (while not having a name in Octopus). Also, the "default" system will always exist, as it is used to store global parameters read from the `inp`, but it will never contain any data when the "Systems" block is defined in `inp`.

Besides reading these names from the `inp`, it is also possible to access them via Postopus. Use:

In [None]:
run

## System data - Calculation modes and subsystems

To load data from a system, we now call: `run.Maxwell`.
This gives a list of the calculation modes which are available via Postopus:

In [None]:
run.Maxwell

This is expected, as the `CalculationMode` variable in the `inp` is set to "td". As a time-dependent calculation ("td") generally requires a previous self-consistent field calculation ("scf"), this data could also be present in the output folder. If this were the case, one would see
```
System(name='Maxwell', rootpath='.'):
Found calculation modes:
    'scf'
    'td'
```
as output and select between these two. For multisystem examples like the [celestial_bodies tutorial](https://octopus-code.org/documentation/13/tutorial/multisystem/solar_system/), we would also have the keys `Moon`, `Earth` and `Sun` as subsystems, for example `run.SolarSystem.Earth` and `run.SolarSystem.Moon` or, if three levels of nesting are used, `run.SolarSystem.Earth.Terra` and `run.SolarSystem.Earth.Luna`.

## Outputs

Getting a list of all available outputs can be done with:

In [None]:
run.Maxwell.td

In [None]:
run.Maxwell.td.maxwell_energy

Call the output to get the provided data:

In [None]:
run.Maxwell.td.maxwell_energy()

Octopus produces output files across multiple folders on the file system. Postopus tries to group them together and provide them in a single object.

As an example, Octopus outputs the files "e_field-x", "e_field-y" and "e_field-z" (with multiple extensions ".x=0", ".y=0", ".x=0,y=0", ...) every `n` steps of the simulation in "output_iter/td.0000000", "output_iter/td.0000010", "output_iter/td.0000020", ...
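
To make the naming scheme concrete, the following sketch splits such a path into field name, component, source (file extension) and step. It only illustrates the convention described above; the helper `parse_output_path` and both regular expressions are hypothetical and are not Postopus's actual implementation.

```python
import re

# Hypothetical helper for illustration; not Postopus's actual implementation.
FILE_PATTERN = re.compile(r"^(?P<field>.+)-(?P<component>[xyz])\.(?P<source>.+)$")
STEP_PATTERN = re.compile(r"^td\.(?P<step>\d+)$")


def parse_output_path(path):
    """Split e.g. 'output_iter/td.0000010/e_field-y.x=0' into its parts."""
    _, folder, filename = path.split("/")
    step = int(STEP_PATTERN.match(folder)["step"])
    parts = FILE_PATTERN.match(filename)
    return parts["field"], parts["component"], parts["source"], step


paths = [
    "output_iter/td.0000000/e_field-x.x=0",
    "output_iter/td.0000010/e_field-y.x=0",
    "output_iter/td.0000020/e_field-z.x=0,y=0",
]
for path in paths:
    print(parse_output_path(path))
```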

In Postopus all these files are united and provided by the "e_field" output:


In [None]:
run.Maxwell.td.e_field

In [None]:
data = run.Maxwell.td.e_field(source=".x=0")
data

The data is provided as an `xarray.Dataset` here. By selecting `data.sel(t=0.21, method="nearest")` one would access the files in "output_iter/td.0000100" (step 100 is the written step closest to t=0.21 for the time step of this example). By accessing the component `data.vy` one would read from `e_field-y.x=0`.

Note that the data is accessed "lazily". This means that as long as one does not work with the data, no files are accessed. Only when the data is needed (e.g. when doing `data.values`, `data.min()`, ...) will the files be loaded.
Also, only the files which match the selection are used. In `data.sel(t=0.21, method="nearest").values`, files in "output_iter/td.0000000" won't be touched.
Be aware that accessing the values before selecting specific steps may take a while when dealing with long simulations.

The data provided by Postopus is in most cases either a [pandas](https://pandas.pydata.org/docs/index.html) DataFrame (as in `run.Maxwell.td.maxwell_energy()`) or an [xarray](https://docs.xarray.dev/en/stable/) DataArray/Dataset (as in `run.Maxwell.td.e_field(source=".x=0")`). Outputs returning an xarray object are referred to as "fields".

For outputs where different sources (file extensions) are available, the source has to be provided as a parameter.

## Working with field data

After we have discovered all available data, we finally want to work with the values.

### Get data

To get the data we call the output. 
If we are dealing with `td` data, the call will transform the `step` into the time `t` (`step * TDTimeStep`, with the value of `TDTimeStep` taken from parser.log).
Our example `inp` has defined `MaxwellOutputInterval` (also could be `OutputInterval`) with a value of 10, meaning Octopus will write all fields every 10 simulation steps.
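
As a worked example of this conversion, the sketch below recomputes the time step from the `dt` expression in the example `inp` and lists the times at which output is written. The numerical value of the speed of light `c` in atomic units and the assumption that only full steps up to `TDPropagationTime` are taken are ours; the authoritative `TDTimeStep` is the one Octopus reports in parser.log.

```python
import math

c = 137.035999  # speed of light in atomic units (assumed value)
dx_mx = 0.5  # grid spacing from the example inp
propagation_time = 0.35  # TDPropagationTime
output_interval = 10  # MaxwellOutputInterval

# Same expression as `dt` in the example inp
dt = 1 / math.sqrt(3 * c**2 / dx_mx**2)

n_steps = int(propagation_time / dt)  # assumption: only full steps are taken
output_steps = list(range(0, n_steps + 1, output_interval))
output_times = [step * dt for step in output_steps]

print(f"dt = {dt:.6f}")
print(f"last written step = {output_steps[-1]}, t = {output_times[-1]:.3f}")
```

Under these assumptions the last written step is 160 at t of about 0.337, which agrees with the values used for the last iteration further below.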

To load the z component of the e_field in the xy plane at z=0 for the times nearest to t=0.1 and t=0.2, we use:

In [None]:
e_field_plane = (
    run.Maxwell.td.e_field(source="z=0").sel(t=[0.1, 0.2], method="nearest").vz
)

The returned data is an `xarray.DataArray` object. It has the following attributes:

- `values` contains the data as a NumPy array
- `coords` provides the coordinates for every data point in `values`
- `dims` gives the names of the dimensions of the data (and thereby their number)

In [None]:
# Xarray object
e_field_plane

In [None]:
# Actual values
e_field_plane.values

In [None]:
# Shape of the values
e_field_plane.values.shape

In [None]:
# Dimensions of the data
e_field_plane.dims

In [None]:
# Coordinates of the data
e_field_plane.coords

In [None]:
print(e_field_plane.coords["x"].shape)
print(e_field_plane.coords["y"].shape)

Plotting this could now be done with Matplotlib's `imshow()` or `contour()`, or one could use `xarray`'s plot or `holoviews`. More information can be found in the [Plotting Documentation](xarray-plots1.ipynb).

The `source` parameter can be omitted if there is only one source (file extension), in which case Postopus will use the one available extension for the requested files. If there is more than one source, Postopus will raise a `ValueError`, showing the user the different available sources:

In [None]:
run.Maxwell.td.e_field()

You can also select the data by index with the `isel` method. Indices can also be negative, like in Python `list`s. One can also pass a `list` of indices or a `slice` to `isel`. This method comes in handy if, for example, you don't want to look up the step number of the last iteration:

In [None]:
run.Maxwell.td.e_field(source="z=0").isel(t=-1).vz

The field above is identical to the following one, which holds the data for the last iteration:

In [None]:
run.Maxwell.td.e_field(source="z=0").sel(t=0.337, method="nearest").vz

Note that selection by step can be enabled by using `set_xindex("step")`. Use with caution (or drop the index at a later point), as `t` and `step` then refer to the same axis, which might confuse other Python packages (they will try things like selecting `e_field.sel(t=0.5, step=10)`, which then raises an error).

In [None]:
run.Maxwell.td.e_field(source="z=0").set_xindex("step").sel(step=160).vz
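
The three selection styles can be compared without any Octopus output. The sketch below builds a small synthetic dataset that mimics the layout described above (an indexed `t` dimension plus a non-index `step` coordinate along it); the array contents and the rounded time step are made up for illustration, and `set_xindex` requires a reasonably recent xarray release.

```python
import numpy as np
import xarray as xr

steps = np.arange(0, 170, 10)  # 0, 10, ..., 160
times = steps * 0.0021066  # assumed time step, roughly that of the example
x = np.linspace(-1.0, 1.0, 5)

# Synthetic stand-in for a field: `t` is the indexed dimension,
# `step` is an additional (non-index) coordinate along `t`.
ds = xr.Dataset(
    {"vz": (("t", "x"), np.outer(steps, x))},
    coords={"t": times, "step": ("t", steps), "x": x},
)

by_position = ds.isel(t=-1).vz  # last entry along t
by_time = ds.sel(t=0.337, method="nearest").vz  # closest time
by_step = ds.set_xindex("step").sel(step=160).vz  # exact step number

print(int(by_position.step), int(by_time.step), int(by_step.step))
```

All three selections end up at the same entry along `t`, so which one to use is mostly a matter of which quantity (position, time or step) is known.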

## Plotting

Plotting the data is well integrated with common packages, such as matplotlib and holoviews:

In [None]:
import matplotlib.pyplot as plt

e_field_plane = run.Maxwell.td.e_field(source="z=0").sel(t=0.15, method="nearest")

plt.imshow(e_field_plane.vz.values);

Xarray also provides a convenient `.plot` method. It uses matplotlib internally and makes use of the additional metadata such as dimensions and coordinates to create axis labels and a colorbar. For more details please refer to the [plotting Documentation](xarray-plots1.ipynb).

In [None]:
e_field_plane.vz.plot(x="x");

To generate dynamic and/or higher dimensional plots we recommend using `holoviews`, details can be found in the [holoviews tutorial](holoviews_with_postopus.ipynb):

In [None]:
import holoviews as hv

hv.extension("bokeh")  # Allow for interactive plots

In [None]:
# Note for web users: You should have an active notebook session to interact with the plot
e_field_over_time = run.Maxwell.td.e_field(source="z=0")
hv_ds = hv.Dataset(e_field_over_time.vz)
hv_im = hv_ds.to(hv.Image, kdims=["x", "y"])
hv_im

## Postprocessing 

Manipulating xarray data is in general fairly simple, as xarray integrates many built-in methods from the SciPy and NumPy libraries, among others (see the [documentation](https://docs.xarray.dev/en/stable/user-guide/computation.html)). This has the advantage that operations involving spatial manipulation are more intuitive (see also, for example, the [xrft tutorial](xrft.ipynb)). The results of the computations are themselves xarray objects, so it is still possible to use all the plotting methods presented above.

In [None]:
integrated_field = e_field_over_time.vz.integrate(coord="y")

In [None]:
integrated_field
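
To see what `integrate` does independently of the simulation data, here is a small sketch on a synthetic field (the field f(x, y) = y and its grid are made up for illustration). xarray applies the trapezoidal rule along the given coordinate and removes that dimension from the result.

```python
import numpy as np
import xarray as xr

x = np.linspace(0.0, 1.0, 3)
y = np.linspace(0.0, 2.0, 5)

# Synthetic field f(x, y) = y, constant along x
field = xr.DataArray(
    np.tile(y, (x.size, 1)),
    dims=("x", "y"),
    coords={"x": x, "y": y},
    name="vz",
)

# Trapezoidal integration over y; the `y` dimension disappears
integrated = field.integrate(coord="y")

print(integrated.dims)
print(integrated.values)
```

Since the integrand is linear in `y`, the trapezoidal rule is exact here and every entry equals the analytical value 2.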

As we integrated over one of the coordinates, we are now dealing with 1D data that evolves in time. Thus we are not going to use `holoviews.Image`, but `holoviews.Curve`.

In [None]:
# Note for web users: You should have an active notebook session to interact with the plot
hv_ds_int = hv.Dataset(integrated_field)
hv_im_int = hv_ds_int.to(hv.Curve, kdims=["x"], dynamic=True)
hv_im_int