# Post processing and standard plots

In this tutorial we learn the basics of handling data generated by Struphy simulations. As an example we use data generated by the notebook `run_main.ipynb`, so you need to run that notebook before starting this tutorial. Moreover, you need to be in the same working directory from which you launched `run_main.ipynb`. Here, we will

1. Look at the generated data from the simulation in `run_main.ipynb`.
2. Run the main post processing file `struphy/post_processing/pproc_struphy/main.py`, which is executed upon calling
    ```
    $ struphy pproc -d DIR
    ```
    from the console (here we do not use the console but call the routine directly from the notebook).
3. Inspect the data generated during post processing.
4. Extract the time grid.
5. Look at binning data of the kinetic distribution function (in v and in x-v space).
6. Look at 1D snapshots of one component of the magnetic field.

At first, we look at the raw data coming from the simulation (before post processing):

In [None]:
import os

path_out = os.path.join(os.getcwd(), 'struphy_run_main/')

os.listdir(path_out)

Two files and one folder named `data/` have been created by the simulation. The file `parameters.yml` is a copy of the parameter file used in the simulation. Let us check the metadata in `meta.txt`, where we can find some useful information about the simulation, such as date of execution, operating system, number of processes etc.:

In [None]:
with open(os.path.join(path_out, 'meta.txt')) as file:
    print(file.read())

Let us now inspect the content of the `data/` folder:

In [None]:
path_data = os.path.join(path_out, 'data/')

os.listdir(path_data)

Since the simulation was run with only one MPI process from a notebook, only one `.hdf5` file has been created (on process 0). In general, one such file is created for each process. Let us inspect the content of the file:

In [None]:
import h5py

with h5py.File(os.path.join(path_data, 'data_proc0.hdf5'), "r") as f:
    for key in f.keys():
        print(key + '/')
        for subkey in f[key].keys():
            print('    ' + subkey + '/')

We can see five top-level keys under which data is stored:

1. `feec` stores the finite element coefficients of the electromagnetic fields (only the magnetic field `b2` in this example) and of each fluid species (only one species `mhd` in this case).
2. `kinetic` stores, for each kinetic species (only one species `energetic_ions` in this case), the binning data of the distribution function and a certain number of selected marker trajectories (can be specified in the parameters file).
3. `restart` stores data in case the simulation has been interrupted.
4. `scalar` stores the scalar quantities of the simulation, such as energies for instance.
5. `time` stores the time grid.
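
To see the shape of every dataset and not only the first two levels of keys, the file can be walked recursively. The following is a minimal sketch: the helper name `iter_datasets` is hypothetical and not part of Struphy, and it only assumes that groups offer `.items()` and datasets offer `.shape`, which holds for h5py objects. Nested dictionaries with invented sub-key names stand in for the real file so that the example runs on its own.

```python
import numpy as np


def iter_datasets(group, prefix=""):
    """Yield (path, shape) for every dataset below `group`.

    Works on anything that behaves like an h5py group: containers need
    `.items()`, leaves need a `.shape` attribute.
    """
    for name, node in group.items():
        path = f"{prefix}/{name}"
        if hasattr(node, "items"):  # sub-group: descend one level
            yield from iter_datasets(node, path)
        else:  # dataset: report its shape
            yield path, node.shape


# Nested dictionaries stand in for an open h5py file; the sub-key names
# 'value' and 'energy' are invented for this demonstration.
demo_file = {
    "time": {"value": np.zeros(6)},
    "scalar": {"energy": np.zeros(6)},
}
demo_listing = list(iter_datasets(demo_file))
for path, shape in demo_listing:
    print(path, shape)
```

With the real data, the same function could be called on the file object `f` inside the `with h5py.File(...)` block shown above.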

As a user, we do not need to access the above data directly. This is handled by Struphy's main post processing routine, which should usually be called from the console:

        $ struphy pproc -d DIR
        
Here, we look at this routine in a bit more detail:

In [None]:
from struphy.post_processing.pproc_struphy import main
main?

We can see one mandatory argument, namely the `path` to the simulation output folder. Let us perform the post processing on our example folder. This will perform the following steps:

1. Creation of Psydac FemFields.
2. Evaluation of fields on the grid specified by `celldivide`.
3. Creation of `.vtk` files for further diagnostics in Paraview.
4. Evaluation of marker orbits on the mapped domain (Euclidean space).
5. Collection of binning data of the distribution function (and evaluation of background in case of df-methods) from all processes.

In [None]:
main(path_out)

If we inspect again the simulation folder after post processing, we see that an additional folder `post_processing` has been created:

In [None]:
os.listdir(path_out)

Let us look at its content:

In [None]:
data_path = os.path.join(path_out, 'post_processing')
os.listdir(data_path)

We start by extracting the time grid of the simulation:

In [None]:
import numpy as np

t_grid = np.load(os.path.join(data_path, 't_grid.npy'))
t_grid
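
When a particular physical time is of interest rather than a step number, the matching index can be looked up in the time grid. The following is a minimal sketch; the helper name `nearest_step` and the sample grid are illustrative and not part of Struphy.

```python
import numpy as np


def nearest_step(t_grid, t):
    """Return the index of the entry in `t_grid` that is closest to time `t`."""
    t_grid = np.asarray(t_grid, dtype=float)
    return int(np.argmin(np.abs(t_grid - t)))


# Stand-in for the loaded `t_grid` (values are made up for illustration).
demo_t_grid = np.linspace(0.0, 1.0, 11)
demo_step = nearest_step(demo_t_grid, 0.33)
print(demo_step, demo_t_grid[demo_step])
```

The returned index can then be used wherever the cells below index by time step.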

In [None]:
kinetic_path = os.path.join(data_path, 'kinetic_data')
fluid_path = os.path.join(data_path, 'fields_data')

print(os.listdir(kinetic_path))
print(os.listdir(fluid_path))

We can see that `kinetic_data` contains one folder for each kinetic species (here just one species `energetic_ions`), whereas `fields_data` contains binary files for physical and logical grids as well as the following folders:

1. `em_fields` for the evaluated electromagnetic fields.
2. `mhd` for the evaluated MHD variables of the single fluid species. Note that the magnetic field is stored under `em_fields`.
3. `vtk` for viewing FEEC variables in Paraview.

Let us work with the data of the `energetic_ions` first. There are two kinds of kinetic data:  

1. the binned `distribution_function`,
2. particle `orbits` of selected markers, which can be chosen in the parameter file.

In [None]:
ep_path = os.path.join(kinetic_path, 'energetic_ions')

os.listdir(ep_path)

In [None]:
f_path = os.path.join(ep_path, 'distribution_function')
orbits_path = os.path.join(ep_path, 'orbits')

print(os.listdir(f_path))
print(os.listdir(orbits_path))
print(len(os.listdir(orbits_path)))

In [None]:
print(os.listdir(os.path.join(f_path, 'v1')))
print(os.listdir(os.path.join(f_path, 'e1_v1')))

Under `distribution_function` we find two folders, one for each binning of $f$ during the simulation. The binning directions have been defined in the parameter file before the run. You can do several binnings during one simulation. For an *n*-dimensional binning in phase space, *n+1* `.npy` files are created in the respective folder: one file `f_binned.npy` with the binned distribution function, and *n* files with the corresponding 1d phase space grids in each direction. The naming convention is as follows:

1. Suppose we want 1-dimensional binning in the direction `v1` in velocity space. The folder is then called `v1`, the binned distribution function is saved under `v1/f_binned.npy` and the velocity binning grid is saved under `v1/grid_v1.npy`. The same could be done for instance in position space along the coordinate `e2`, which would lead to files `e2/f_binned.npy` and `e2/grid_e2.npy`, respectively.

2. Suppose we want to do 2-dimensional binning in the $(e_1,v_1)$ subspace of the phase space. In this case the folder is called `e1_v1`, the distribution function is saved under `e1_v1/f_binned.npy`, the $e_1$-grid is under `e1_v1/grid_e1.npy` and the $v_1$-grid is under `e1_v1/grid_v1.npy`. 

3. The binnings to perform are defined in the parameter file under `kinetic/<species_name>/save_data/f/slices`. There you can define a list where each string entry defines one binning. For example, `['v1', 'e1_v1', 'e1_e2', 'v1_v2_v3']` would define four binnings in different subspaces of the phase space.
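
The naming convention above makes it possible to load any binning from its folder name alone. The following is a minimal sketch under that convention; the helper `load_binning` is hypothetical and not part of Struphy, and the demonstration writes synthetic files with made-up shapes into a temporary directory instead of touching the simulation output.

```python
import os
import tempfile

import numpy as np


def load_binning(f_path, name):
    """Load one binning folder, e.g. name='v1' or name='e1_v1'.

    Returns the binned distribution function and a dict holding one 1d grid
    per direction encoded in the folder name.
    """
    folder = os.path.join(f_path, name)
    f_binned = np.load(os.path.join(folder, "f_binned.npy"))
    grids = {
        direction: np.load(os.path.join(folder, f"grid_{direction}.npy"))
        for direction in name.split("_")
    }
    return f_binned, grids


# Self-contained demonstration on synthetic files (shapes are made up).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "e1_v1"))
    np.save(os.path.join(tmp, "e1_v1", "f_binned.npy"), np.zeros((3, 4, 5)))
    np.save(os.path.join(tmp, "e1_v1", "grid_e1.npy"), np.linspace(0, 1, 4))
    np.save(os.path.join(tmp, "e1_v1", "grid_v1.npy"), np.linspace(-2, 2, 5))
    demo_f, demo_grids = load_binning(tmp, "e1_v1")

print(demo_f.shape, {key: grid.shape for key, grid in demo_grids.items()})
```

In this tutorial the call would read `load_binning(f_path, 'e1_v1')`, with `f_path` as defined above.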

Now let's see how the marker `orbits` are stored in Struphy. For each time step *n* (including *n=0*) there is one `.txt` file named `<species_name>_n.txt`. This file holds the positions **in physical space** of the designated markers (four in this example).

In [None]:
# read the marker positions saved at time step 1
with open(os.path.join(orbits_path, 'energetic_ions_01.txt')) as file:
    orbit01_str = file.read()

# one list entry per line of the file
orbit01 = orbit01_str.split('\n')
orbit01
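
Since `os.listdir` returns file names in arbitrary order, it helps to sort the orbit files by their time-step index before following a marker through time. The following is a minimal sketch that relies only on the `<species_name>_n.txt` convention described above; the helper name `sorted_orbit_files` and the sample listing are made up for illustration.

```python
import re


def sorted_orbit_files(file_names, species):
    """Return the orbit files of `species`, ordered by time-step index."""
    pattern = re.compile(rf"^{re.escape(species)}_(\d+)\.txt$")
    indexed = []
    for name in file_names:
        match = pattern.match(name)
        if match:  # skip anything that does not follow the convention
            indexed.append((int(match.group(1)), name))
    return [name for _, name in sorted(indexed)]


# Made-up listing in arbitrary order, as os.listdir may return it.
demo_names = [
    "energetic_ions_10.txt",
    "energetic_ions_02.txt",
    "notes.md",
    "energetic_ions_00.txt",
]
demo_sorted = sorted_orbit_files(demo_names, "energetic_ions")
print(demo_sorted)
```

Here the call would be `sorted_orbit_files(os.listdir(orbits_path), 'energetic_ions')`.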

Let us now plot the binned distribution function. In this example, two binnings have been performed during the simulation, namely `v1` and `e1_v1`, each stored as `f_binned.npy` in its own folder. We shall perform the 1d plots first:

In [None]:
grid_v1 = np.load(os.path.join(f_path, 'v1/', 'grid_v1.npy'))
f_binned = np.load(os.path.join(f_path, 'v1/', 'f_binned.npy'))

print(grid_v1.shape)
print(f_binned.shape)

As we can see, the first index of the binning data `v1/f_binned.npy` denotes the time index, whereas the second index is the grid index. Let us plot the data at four different instances in time:

In [None]:
from matplotlib import pyplot as plt

plt.figure(figsize=(12, 12))

steps = [0, 1, 2, -1]
for n, step in enumerate(steps):
    plt.subplot(2, 2, n + 1)
    plt.plot(grid_v1, f_binned[step], label=f'time = {t_grid[step]}')
    plt.xlabel('v1')
    plt.ylabel('f(v1)')
    plt.legend()
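
Beyond plotting, the `(time, grid)` layout makes it easy to reduce the binned data to one number per time step. The following is a minimal sketch that computes the zeroth velocity moment and the mean velocity with a simple Riemann sum. It assumes a uniformly spaced `v1` grid, which has not been checked against Struphy's binning; the helper name `velocity_moments` and the synthetic Maxwellian are illustrative only.

```python
import numpy as np


def velocity_moments(grid_v1, f_binned):
    """Zeroth moment and mean velocity of f(v1) for every saved time step.

    Assumes a uniform grid and `f_binned` of shape (n_times, n_v1).
    """
    dv = grid_v1[1] - grid_v1[0]
    density = np.sum(f_binned, axis=1) * dv
    mean_v = np.sum(f_binned * grid_v1, axis=1) * dv / density
    return density, mean_v


# Synthetic data: a Maxwellian drifting with velocity 0.5, repeated at two
# time steps (not Struphy output).
demo_v = np.linspace(-6.0, 7.0, 261)
demo_maxwellian = np.exp(-0.5 * (demo_v - 0.5) ** 2) / np.sqrt(2 * np.pi)
demo_f = np.tile(demo_maxwellian, (2, 1))
demo_density, demo_mean = velocity_moments(demo_v, demo_f)
print(demo_density, demo_mean)
```

With the arrays loaded above, the call would be `velocity_moments(grid_v1, f_binned)`.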

The 2d plots work analogously. Here, we use pyplot's `pcolor` to display 2d data:

In [None]:
grid_e1 = np.load(os.path.join(f_path, 'e1_v1/', 'grid_e1.npy'))
grid_v1 = np.load(os.path.join(f_path, 'e1_v1/', 'grid_v1.npy'))
f_binned = np.load(os.path.join(f_path, 'e1_v1/', 'f_binned.npy'))

print(grid_e1.shape)
print(grid_v1.shape)
print(f_binned.shape)

In [None]:
plt.figure(figsize=(12, 12))

steps = [0, 1, 2, -1]
for n, step in enumerate(steps):
    plt.subplot(2, 2, n + 1)
    plt.pcolor(grid_e1, grid_v1, f_binned[step].T, label=f'time = {t_grid[step]}')
    plt.xlabel('e1')
    plt.ylabel('v1')
    plt.title('f(e1, v1)')
    plt.legend()
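
The 2d binning carries a time index followed by one index per binning direction, so a reduction over one direction is a single sum along the corresponding axis. The following is a minimal sketch that integrates $f(e_1, v_1)$ over $e_1$. It assumes a uniform `e1` grid and the axis order `(time, e1, v1)`; the helper name `marginal_v1` and the synthetic data are illustrative only.

```python
import numpy as np


def marginal_v1(grid_e1, f_e1_v1):
    """Integrate f(e1, v1) over e1 for every time step.

    Assumes a uniform e1 grid and an array of shape (n_times, n_e1, n_v1).
    """
    de1 = grid_e1[1] - grid_e1[0]
    return np.sum(f_e1_v1, axis=1) * de1


# Synthetic data: constant in e1 (10 bins on the unit interval), linear in v1.
demo_e1 = (np.arange(10) + 0.5) / 10
demo_v1 = np.linspace(-1.0, 1.0, 5)
demo_f2d = np.broadcast_to(demo_v1, (3, 10, 5))
demo_marginal = marginal_v1(demo_e1, demo_f2d)
print(demo_marginal.shape)
```

With the arrays loaded above, the call would be `marginal_v1(grid_e1, f_binned)`.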

Last but not least we look at some `fields_data`. Since this data is stored in binary format, we use `pickle` for loading. We see that `grids_phy.bin` yields a list of length three, each entry being a meshgrid of the respective direction.

By contrast, `b2_phy.bin` leads to a dictionary with *n+1* keys, where *n* is the number of time steps saved during the simulation. The keys are the actual time (not the index!) of the time step and the values are lists holding the three components of the magnetic field in physical space (pushed forward 2-form). We shall plot the *z*-component of the *B*-field as a function of *x*, at eight different instances in time:

In [None]:
import pickle

with open(os.path.join(fluid_path, 'grids_phy.bin'), 'rb') as file:
    x_grid, y_grid, z_grid = pickle.load(file)
    
with open(os.path.join(fluid_path, 'em_fields', 'b2_phy.bin'), 'rb') as file:
    b2 = pickle.load(file)
    
print(type(x_grid))
print(type(b2))
print(x_grid.shape)
print(len(b2))

In [None]:
for key, val in b2.items():
    print(key)
    print(type(val))
    for va in val:
        print(va.shape)

In [None]:
plt.figure(figsize=(12, 12))

steps = [0, 1, 2, 3, 4, 5, 6, -1]
for n, step in enumerate(steps):
    t = t_grid[step]
    plt.subplot(4, 2, n + 1)
    plt.plot(x_grid[:, 0, 0], b2[t][2][:, 0, 0], label=f'time = {t}')
    plt.xlabel('x')
    plt.ylabel('$B_z$(x)')
    plt.legend()
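
For space-time diagnostics it can be more convenient to hold one field component in a single array instead of a dictionary keyed by time. The following is a minimal sketch that stacks a 1d cut of one component over all saved times, assuming the dictionary layout described above (time mapped to a list of three 3d arrays). The helper name `stack_component` and the synthetic dictionary are illustrative only.

```python
import numpy as np


def stack_component(field, component, j=0, k=0):
    """Stack a 1d cut of one field component over all saved times.

    `field` maps time -> list of three 3d arrays. The cut runs along the
    first grid direction at fixed indices (j, k) of the other two.
    Returns the sorted times and an array of shape (n_times, n_x).
    """
    times = sorted(field)
    values = np.stack([field[t][component][:, j, k] for t in times])
    return np.array(times), values


# Synthetic stand-in for the dictionary loaded from `b2_phy.bin`: the third
# component is set equal to the time so that the ordering is easy to check.
demo_field = {
    t: [np.zeros((4, 1, 1)), np.zeros((4, 1, 1)), np.full((4, 1, 1), t)]
    for t in (0.2, 0.0, 0.1)
}
demo_times, demo_bz = stack_component(demo_field, component=2)
print(demo_times, demo_bz.shape)
```

With the data loaded above, `stack_component(b2, 2)` would return the *z*-component along *x* for all time steps.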