# 00 - PADMe - Basic Usage

Before anything else, make sure the conda environment is installed. In a terminal at the root directory of this project, run:

`make create_environment`

and then:

`make requirements`

If both commands finish without errors, we are good to go.

Welcome to the very first Jupyter Notebook using PADMe for Dynamic Mode Decomposition processing. We start by importing the necessary packages:

In [1]:
%load_ext autoreload
%autoreload 2

import os
from pathlib import Path 

from natsort import natsorted # Required for sorting filenames in natural (numeric) order
from dotenv import find_dotenv # Required for locating project's root dir

from src import DMD # DMD class for ingesting and processing data
from src import snapshots_assembly # Function for assembling the Snapshots Matrix
from src import (
    load_notebook_params, # Function that loads predefined examples and paths
    generate_paths, # Function that creates the paths to datasets
    download_dataset, # Function that downloads the dataset from Kaggle
    unpack_kaggle_dataset, # Function that unzips the downloaded dataset
)

from src import PostProcessingDMD # Class used for postprocessing and dataviz

root_dir = os.path.dirname(find_dotenv())

Let's check that `root_dir` is set correctly. It is used to build the paths where data will be stored, so it must point to the root directory of this project.

In [2]:
root_dir

'/home/bombra/library/padme'
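
A quick sanity check can catch a wrong `root_dir` early. The sketch below is not part of PADMe: `looks_like_project_root` is a hypothetical helper, and the marker files (`.env`, which `find_dotenv` searches for, and the `Makefile` used by the `make` commands above) are assumptions about what the project root contains.

```python
from pathlib import Path


def looks_like_project_root(path, markers=(".env", "Makefile")):
    """Return True if `path` is an existing directory holding every marker file."""
    # find_dotenv() returns an empty string when no .env file is found,
    # which makes os.path.dirname(...) an empty string as well.
    if not str(path):
        return False
    root = Path(path)
    return root.is_dir() and all((root / name).exists() for name in markers)
```

In the notebook, `looks_like_project_root(root_dir)` returning `False` would suggest that the `.env` file is missing or that the notebook was started from outside the project.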

Now let's get all notebook params. This function reads from a `.yaml` file in the `notebooks/` directory that contains example file names and paths.

In [3]:
notebook_params = load_notebook_params()

Before downloading the dataset, you need to create an account on Kaggle. Instructions for using the Kaggle API are in the **Installation** and **Authentication** sections of https://www.kaggle.com/docs/api.

Once your `~/.kaggle/kaggle.json` file is set up, we can download the datasets used in this example.
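
Before calling the download helpers, it can be useful to confirm that the credentials file is in place. The following is a minimal sketch using only the standard library; `kaggle_credentials_ok` is a hypothetical helper, and it assumes the usual layout of `kaggle.json` with a `username` and a `key` field.

```python
import json
from pathlib import Path


def kaggle_credentials_ok(path=None):
    """Return True if the credentials file exists and has non-empty 'username' and 'key'."""
    # Default location used by the Kaggle API client
    path = Path(path) if path is not None else Path.home() / ".kaggle" / "kaggle.json"
    if not path.is_file():
        return False
    try:
        creds = json.loads(path.read_text())
    except json.JSONDecodeError:
        return False
    return isinstance(creds, dict) and bool(creds.get("username")) and bool(creds.get("key"))
```

Calling `kaggle_credentials_ok()` with no argument checks the default location in your home directory.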

For this notebook, we will use the `covid-19-spread-in-lombardy-italy` dataset. To download it, set the `example_dataset` variable to either `sird` or `seird`. The dataset was generated by modeling the COVID-19 spread in the Lombardy region (Italy) with PDE-based compartmental models, spatially discretized using the Finite Element Method.

The zip file is 991 MB and contains two versions of the simulation.
The datasets are:
- `covid-sird-lombardy-freefem`, where a SIRD model was approximated using the FreeFEM++ library. More details can be found in: `cite Alex's article here`
- `covid-seird-lombardy-libmesh`, where a SEIRD model was approximated using the libMesh library. More details can be found in: `cite Malu's article here`

In [10]:
example_dataset = "sird"

dict_paths = generate_paths(example_dataset, root_dir, notebook_params)
print(dict_paths)

download_dataset(dict_paths, example_dataset, notebook_params, FORCE_DOWNLOAD=False)
unpack_kaggle_dataset(dict_paths)

{'complete_filepath': PosixPath('/home/bombra/library/padme/data/00_raw'), 'complete_filename': PosixPath('/home/bombra/library/padme/data/00_raw/covid-19-spread-in-lombardy-italy.zip'), 'snapshots_filepath': PosixPath('/home/bombra/library/padme/data/00_raw/covid-sird-lombardy-freefem/covid-sird-lombardy-freefem')}


If the previous cell ran correctly, your `padme/data/00_raw/` folder now holds a zip file and two directories, one for each of the datasets described above. The data are in raw form, that is, exactly as the numerical simulations wrote them.
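
The expected folder contents can be confirmed with a few lines of standard-library code. This is only a sketch: `summarize_raw_dir` is a hypothetical helper, not a PADMe function.

```python
from pathlib import Path


def summarize_raw_dir(raw_dir):
    """Return (zip file names, sub-directory names) found directly under raw_dir."""
    entries = sorted(Path(raw_dir).iterdir())
    zips = [p.name for p in entries if p.is_file() and p.suffix == ".zip"]
    dirs = [p.name for p in entries if p.is_dir()]
    return zips, dirs
```

In this notebook, `summarize_raw_dir(dict_paths["complete_filepath"])` should report one zip file and the two dataset directories.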

Handling raw simulation output is one of the most tedious parts of a DMD workflow, and **PADMe** aims to automate it. If you explore the two generated folders, you will see that the `susceptible` snapshots in the `covid-sird-lombardy-freefem` dataset have the form:
- `covid-sird-lombardy-freefem/covid-sird-lombardy-freefem/4susceptible.vtk`
- `covid-sird-lombardy-freefem/covid-sird-lombardy-freefem/8susceptible.vtk`
- `covid-sird-lombardy-freefem/covid-sird-lombardy-freefem/12susceptible.vtk`

while the `susceptible` snapshots in the `covid-seird-lombardy-libmesh` dataset have the form:
- `covid-seird-lombardy-libmesh/covid-seird-lombardy-libmesh/step0/out_1_000_00000.h5`
- `covid-seird-lombardy-libmesh/covid-seird-lombardy-libmesh/step0/out_1_000_00001.h5`
- `covid-seird-lombardy-libmesh/covid-seird-lombardy-libmesh/step0/out_1_000_00002.h5`

where `s` is the dataset inside each `.h5` file that contains the susceptible data.
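
Both layouts number their snapshots, but the FreeFEM files are not zero-padded, so plain alphabetical sorting puts them in the wrong time order. The notebook relies on `natsorted` for this; the sketch below reproduces the idea with the standard library only, and `natural_key` is a hypothetical helper written for illustration.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so that 8 sorts before 12."""
    return [int(chunk) if chunk.isdigit() else chunk for chunk in re.split(r"(\d+)", name)]


names = ["12susceptible.vtk", "4susceptible.vtk", "8susceptible.vtk"]
print(sorted(names))                   # alphabetical order: 12, 4, 8
print(sorted(names, key=natural_key))  # numeric-aware order: 4, 8, 12
```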

This difference in structure slows down the application of DMD to multiple datasets and usually costs the engineer time and attention. With **PADMe**, we only need to create a list of strings (or `Path` objects) containing the absolute filenames of the snapshots. The following cell does that for the `infected` snapshots of the first dataset.

In [14]:
os_walk_files = next(os.walk(dict_paths["snapshots_filepath"]))
files = natsorted(os_walk_files[2])

filenames = [
    dict_paths["snapshots_filepath"] / Path(f"{str(i)}infected.vtk")
    for i, _ in zip(range(4, 480, 4), files)
]

Let's check that the first five and last five filenames are correct:

In [15]:
filenames[:5]

[PosixPath('/home/bombra/library/padme/data/00_raw/covid-sird-lombardy-freefem/covid-sird-lombardy-freefem/4infected.vtk'),
 PosixPath('/home/bombra/library/padme/data/00_raw/covid-sird-lombardy-freefem/covid-sird-lombardy-freefem/8infected.vtk'),
 PosixPath('/home/bombra/library/padme/data/00_raw/covid-sird-lombardy-freefem/covid-sird-lombardy-freefem/12infected.vtk'),
 PosixPath('/home/bombra/library/padme/data/00_raw/covid-sird-lombardy-freefem/covid-sird-lombardy-freefem/16infected.vtk'),
 PosixPath('/home/bombra/library/padme/data/00_raw/covid-sird-lombardy-freefem/covid-sird-lombardy-freefem/20infected.vtk')]

In [16]:
filenames[-5:]

[PosixPath('/home/bombra/library/padme/data/00_raw/covid-sird-lombardy-freefem/covid-sird-lombardy-freefem/460infected.vtk'),
 PosixPath('/home/bombra/library/padme/data/00_raw/covid-sird-lombardy-freefem/covid-sird-lombardy-freefem/464infected.vtk'),
 PosixPath('/home/bombra/library/padme/data/00_raw/covid-sird-lombardy-freefem/covid-sird-lombardy-freefem/468infected.vtk'),
 PosixPath('/home/bombra/library/padme/data/00_raw/covid-sird-lombardy-freefem/covid-sird-lombardy-freefem/472infected.vtk'),
 PosixPath('/home/bombra/library/padme/data/00_raw/covid-sird-lombardy-freefem/covid-sird-lombardy-freefem/476infected.vtk')]
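
Inspecting the two ends of the list does not guarantee that every file in between exists. A complementary check, sketched below with a hypothetical `find_missing` helper, reports any path that does not point to a file on disk.

```python
from pathlib import Path


def find_missing(paths):
    """Return the subset of `paths` that does not point to an existing file."""
    return [Path(p) for p in paths if not Path(p).is_file()]
```

An empty result from `find_missing(filenames)` means every snapshot file was found.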

In [17]:
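snapshot_ingestion_parameters = {
    "filenames": filenames,
    # First and last line of the data block to read in each .vtk file
    "starting_line": 125939,
    "ending_line": 210239,
}

In [None]:
# The filenames built above are FreeFEM .vtk files, so use the matching reader
dataset = snapshots_assembly(
    "vtk_freefem", snapshot_ingestion_parameters=snapshot_ingestion_parameters
)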

In [None]:
dataset.shape
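
For orientation, a snapshots matrix is conventionally a 2-D array in which each column holds the field values of one time step. Whether PADMe uses exactly this layout is an assumption here; the sketch below only illustrates the convention with NumPy, and `stack_snapshots` is a hypothetical helper.

```python
import numpy as np


def stack_snapshots(snapshots):
    """Stack equally sized 1-D fields as the columns of a 2-D matrix."""
    return np.column_stack([np.asarray(s, dtype=float).ravel() for s in snapshots])


# Four made-up snapshots, each with three mesh values
fields = [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [1.2, 2.2, 3.2], [1.3, 2.3, 3.3]]
matrix = stack_snapshots(fields)
print(matrix.shape)  # (3, 4): 3 values per snapshot, 4 snapshots
```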

In [None]:
example_dataset = "seird"

dict_paths = generate_paths(example_dataset, root_dir, notebook_params)
print(dict_paths)

download_dataset(dict_paths, example_dataset, notebook_params, FORCE_DOWNLOAD=False)
unpack_kaggle_dataset(dict_paths)

In [11]:
os_walk_files = next(os.walk(dict_paths["snapshots_filepath"]))
folders = natsorted(os_walk_files[1])

filenames = [
    dict_paths["snapshots_filepath"]
    / Path(snapshot_folder)
    / Path(f"out_1_000_{str(i).zfill(5)}.h5")
    for i, snapshot_folder in enumerate(folders)
]

In [None]:
filenames[:5]

In [None]:
filenames[-5:]

In [None]:
snapshot_ingestion_parameters = {
    "filenames": filenames,
    "dataset": "s",  # dataset inside each .h5 file holding the susceptible data
}

In [None]:
# The filenames built above are libMesh .h5 files, so use the matching reader
dataset = snapshots_assembly(
    "h5_libmesh", snapshot_ingestion_parameters=snapshot_ingestion_parameters
)

In [None]:
dmd_parameters = {
    "factorization_algorithm": "randomized_svd",
    "basis_vectors": 50,
    "randomized_svd_parameters": {
        "power_iterations": 1,
        "oversampling": 20,
    },
    "starting_step": 20,
    "dt_simulation": 0.05,
}
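
The `randomized_svd` option refers to a family of algorithms that approximate a truncated SVD by first projecting the matrix onto a random low-dimensional subspace. The sketch below is a textbook version written with NumPy, not PADMe's implementation; mapping `basis_vectors`, `oversampling` and `power_iterations` onto the arguments `rank`, `oversampling` and `power_iterations` is an assumption based on their names.

```python
import numpy as np


def randomized_svd(a, rank, oversampling=20, power_iterations=1, seed=0):
    """Approximate the leading `rank` singular triplets of a real matrix `a`."""
    rng = np.random.default_rng(seed)
    # Sample a slightly larger subspace than requested to improve accuracy
    k = min(rank + oversampling, min(a.shape))
    omega = rng.standard_normal((a.shape[1], k))
    y = a @ omega
    for _ in range(power_iterations):
        # Power iterations sharpen the separation between singular values
        q = np.linalg.qr(y)[0]
        y = a @ (a.T @ q)
    q = np.linalg.qr(y)[0]           # orthonormal basis for the sampled range
    b = q.T @ a                      # small matrix with k rows
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q @ u_small                  # lift the left vectors back to full size
    return u[:, :rank], s[:rank], vt[:rank]
```

With the values from the cell above, this would correspond to `randomized_svd(matrix, rank=50, oversampling=20, power_iterations=1)`.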

In [None]:
dmd = DMD(dataset, dmd_parameters)

In [None]:
dmd.factorization()

In [None]:
dmd.dmd_approximation["s"]

In [None]:
dmd.dmd_core()
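
To make the two steps above less opaque, here is a compact sketch of the standard "exact DMD" algorithm in NumPy. It is a generic illustration, not the code behind `dmd.factorization()` and `dmd.dmd_core()`; the function name `exact_dmd` and its arguments are hypothetical, and the snapshot matrix is assumed to store one time step per column.

```python
import numpy as np


def exact_dmd(snapshots, rank):
    """Return DMD eigenvalues and modes for a snapshot matrix (columns = time steps)."""
    x1, x2 = snapshots[:, :-1], snapshots[:, 1:]       # time-shifted pairs
    # Factorization step: truncated SVD of the first snapshot block
    u, s, vh = np.linalg.svd(x1, full_matrices=False)
    u, s, vh = u[:, :rank], s[:rank], vh[:rank]
    # Core step: project the best-fit linear map onto the leading left vectors
    x2_v_sinv = (x2 @ vh.conj().T) / s                 # X2 V S^{-1}
    a_tilde = u.conj().T @ x2_v_sinv
    eigvals, w = np.linalg.eig(a_tilde)
    modes = x2_v_sinv @ w                              # exact DMD modes
    return eigvals, modes
```

For data generated by a linear map, the returned eigenvalues match those of the map, which is a convenient way to test such a routine on synthetic snapshots.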

In [None]:
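import numpy as np


def compute_frobenius_norm(mat_a, mat_b):
    """Relative error between two matrices, measured in the Frobenius norm."""
    # np.linalg.norm defaults to the Frobenius norm for 2-D arrays
    return np.linalg.norm(mat_a - mat_b) / np.linalg.norm(mat_a)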

In [None]:
compute_frobenius_norm(dmd.snapshots_matrix, dmd.dmd_approximation["dmd_matrix"])

In [None]:
dmd.dmd_approximation.keys()

In [None]:
dmd_visualizer = PostProcessingDMD(dmd.dmd_approximation)

In [None]:
dmd.dmd_approximation["s"].shape

In [None]:
dmd_visualizer.plot_singular_values("plotly")
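
A singular value plot is often read together with a cumulative "energy" criterion to decide how many basis vectors to keep. The sketch below shows one common rule of thumb; it is not a PADMe feature, and `rank_for_energy` and its `threshold` argument are hypothetical.

```python
from itertools import accumulate


def rank_for_energy(singular_values, threshold=0.99):
    """Smallest rank whose squared singular values reach `threshold` of the total."""
    energies = [s * s for s in singular_values]
    total = sum(energies)
    for rank, partial in enumerate(accumulate(energies), start=1):
        if partial >= threshold * total:
            return rank
    return len(energies)
```

For example, `rank_for_energy(dmd.dmd_approximation["s"])` would apply the rule to the singular values stored above, assuming that entry is a 1-D sequence of singular values in decreasing order.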

In [None]:
dmd.dmd_approximation["eigenvals_original"]

In [None]:
dmd_visualizer.plot_eigenvalues("matplotlib")
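
In discrete-time DMD, the modulus of each eigenvalue tells whether its mode grows, decays, or persists from one snapshot to the next, which is why eigenvalues are usually inspected against the unit circle. The sketch below applies that reading to a list of complex numbers; `classify_eigenvalues` and its tolerance are hypothetical and independent of PADMe.

```python
def classify_eigenvalues(eigenvalues, tol=1e-8):
    """Label each discrete-time eigenvalue by its position relative to the unit circle."""
    labels = []
    for lam in eigenvalues:
        modulus = abs(lam)
        if modulus > 1 + tol:
            labels.append("growing")
        elif modulus < 1 - tol:
            labels.append("decaying")
        else:
            labels.append("neutral")
    return labels


print(classify_eigenvalues([0.5, 1j, 1.2 + 0j]))  # ['decaying', 'neutral', 'growing']
```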

In [None]:
dmd.dmd_approximation.keys()

In [None]:
dmd_visualizer.compute_temporal_l2_norm("plotly")