# Tutorial 0: Overview of the OF-DFT codebase of the SCIAI-Lab

### What our codebase is all about 
This tutorial introduces our large and, in places, quite complicated codebase. The codebase exists to enable machine-learned orbital-free density functional theory (OF-DFT). The idea is summarized well in the following figure (see https://pubs.acs.org/doi/full/10.1021/jacs.5c06219 for details):

<img src="tutorial_fig1.png" alt="Figure from https://pubs.acs.org/doi/full/10.1021/jacs.5c06219" style="width: 90%">

In very simple terms: we take molecules from different public datasets (e.g. QM9 or QMugs) and compute the energy that these molecules have for different electron densities (= distributions of the electrons around the atomic nuclei). For each molecule, the densities are described by a linear superposition of atom-centered basis functions (functions of 3D space localized around the different atoms in the molecule).
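
As a minimal sketch of what a linear superposition of atom-centered basis functions means, the following toy example builds a density from two s-type Gaussians. The function names, exponents and coefficients are made up for illustration and are not taken from our codebase; the actual basis handling is covered in the datamodule tutorial.

```python
import numpy as np


def gaussian_s(points, center, alpha):
    """Normalized s-type Gaussian centered at `center`, evaluated at `points` of shape (N, 3)."""
    norm = (alpha / np.pi) ** 1.5  # makes the function integrate to 1 over 3D space
    sq_dist = np.sum((points - center) ** 2, axis=-1)
    return norm * np.exp(-alpha * sq_dist)


def density(points, coeffs, centers, alphas):
    """Density as a linear superposition: rho(r) = sum_mu coeffs[mu] * basis_mu(r)."""
    basis_values = np.stack(
        [gaussian_s(points, c, a) for c, a in zip(centers, alphas)], axis=0
    )  # shape (n_basis, N)
    return coeffs @ basis_values


# Hypothetical toy "molecule": two atoms on the z-axis, one basis function each
centers = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]])
alphas = np.array([1.0, 1.5])  # made-up Gaussian exponents
coeffs = np.array([1.2, 0.8])  # made-up density coefficients

# Evaluate the density at both atoms and at the midpoint between them
points = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.7], [0.0, 0.0, 1.4]])
print(density(points, coeffs, centers, alphas))
```

Because each toy basis function integrates to one, the coefficients in this example sum to the total number of electrons (2.0 here).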

From the computation of the energies we also get gradients that tell us how to change the density in order to decrease the energy. These computations, traditionally done with so-called Kohn-Sham DFT (KS-DFT), are very expensive, especially for large molecules. Therefore, our goal is to train a neural network on KS-DFT data that reproduces the energies and gradients for the electron densities in our datasets and hopefully generalizes to larger molecules.

The trained neural network can then be used to follow the gradients to lower and lower energies until we obtain a good estimate of the ground-state electron density. We call this process density optimization.
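
The idea of density optimization can be sketched as plain gradient descent on the density coefficients. In the sketch below, a simple quadratic function stands in for the trained neural network; `toy_energy_and_gradient`, the matrix `A`, the vector `b`, the step size and the number of steps are all hypothetical and chosen only so that the example runs. The toy also ignores any physical constraints (for example a fixed number of electrons). The actual procedure is covered in the density optimization tutorial.

```python
import numpy as np

# Hypothetical quadratic "energy surface" standing in for the trained model
A = np.array([[2.0, 0.5], [0.5, 1.0]])  # symmetric positive definite
b = np.array([1.0, 1.0])


def toy_energy_and_gradient(coeffs):
    """Return the toy energy and its gradient with respect to the coefficients."""
    energy = 0.5 * coeffs @ A @ coeffs - b @ coeffs
    gradient = A @ coeffs - b
    return energy, gradient


def optimize_density(initial_coeffs, step_size=0.1, n_steps=200):
    """Follow the negative gradient and record the energy at every step."""
    coeffs = np.asarray(initial_coeffs, dtype=float).copy()
    energies = []
    for _ in range(n_steps):
        energy, gradient = toy_energy_and_gradient(coeffs)
        energies.append(energy)
        coeffs -= step_size * gradient  # move towards lower energy
    return coeffs, energies


final_coeffs, energies = optimize_density([0.0, 0.0])
print("final coefficients:", final_coeffs)
print("first and last energy:", energies[0], energies[-1])
```

For this toy energy, the recorded energies decrease at every step and the coefficients approach the minimizer of the quadratic.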

For a broader overview of what OF-DFT is and what we as a group are doing, consider watching the following lecture video: [part 1](https://www.youtube.com/watch?v=CoZUTMjU8C8) and [part 2](https://www.youtube.com/watch?v=iyx1C4vaP7k).

### What this tutorial covers
This tutorial covers a broad range of topics: from loading and handling the molecule dataset (whose samples combine an electron density with a target energy and a target energy gradient), through visualizing molecules and electron densities, all the way to understanding how our machine learning model is trained and used for density optimization.

In summary, this tutorial will guide you through the following topics:
1) [**datamodule**](./tutorial_1_datamodule.ipynb): Understanding the following classes that handle the loading and processing of our data: OFDataset, OFData, OFBatch, OFLoader, OFDataModule, BasisInfo

2) [**visualization**](./tutorial_2_visualization.ipynb): Demonstration of how to visualize molecules and electron densities in 3D via molview.org and via PyVista

3) [**transforms**](./tutorial_3_transforms.ipynb): Demystifying the MasterTransformation class that handles the transformation of data samples before they can be passed to the model for training (including a visualization of the basis transforms). Keywords: Global symmetric natrep, local frames, gradient projection.

4) [**hydra & omegaconf**](./tutorial_4_hydra_omegaconf.ipynb): Understanding how configs are managed with Hydra and OmegaConf, including Hydra overrides, OmegaConf syntax and OmegaConf resolvers.

5) [**mldftlitmodule**](./tutorial_5_mldftlitmodel.ipynb): Understanding how to train a model based on our PyTorch Lightning class MLDFTLitModule, including a closer look at forward, training_step, backpropagate and the usage of dataset statistics in our model.

6) [**density optimization**](./tutorial_6_density_optimization.ipynb): Demonstration of how to load a trained model from a checkpoint and perform density optimization, including important plots to evaluate the density optimization process.  

Before starting the tutorial, please take a look at the [README](../../README.md) to set up your virtual environment.

Please execute the following cell to download two small sample datasets from our Hugging Face model repository (https://huggingface.co/sciai-lab/structures25/tree/main). One dataset contains QM9 molecules and the other contains the larger QMugs molecules.

In [None]:
import os

# download a small dataset from Hugging Face that contains QM9 and QMugs data
# https://huggingface.co/docs/huggingface_hub/guides/manage-cache
# With CACHE_DIR = None, snapshot_download uses the default hub cache
# (typically `~/.cache/huggingface/hub`).
# You can change it by setting this variable to any path you like
CACHE_DIR = None  # e.g. change it to "./hf_cache"


# https://huggingface.co/sciai-lab/structures25/tree/main
print("Downloading minimal dataset for QM9 and QMugs from huggingface...")
# set before the import below to avoid problems with the progress bar in some environments
os.environ["HF_HUB_DISABLE_PROGRESS_BARS"] = "1"
from huggingface_hub import snapshot_download

data_path = snapshot_download(
    repo_id="sciai-lab/minimal_data_QM9_QMugs", cache_dir=CACHE_DIR, repo_type="dataset"
)

print(f"Successfully downloaded data to the following path {data_path}.")

If you want to train models properly beyond this tutorial (on the IWR servers), please set the environment variables `DFT_DATA` and `DFT_MODELS` in your `.bashrc` or `.zshrc` file:

In [None]:
# overall hint: to be able to go through this tutorial and execute all cells,
# you should be able to access our data and models folder
# and should have set the two environment variables DFT_DATA and DFT_MODELS:

print("DFT_DATA:", os.getenv("DFT_DATA"))  # set it to /export/scratch/ialgroup/dft_data
print(
    "DFT_MODELS:", os.getenv("DFT_MODELS")
)  # set this one to where you want to save your trained models
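
If you want a slightly stricter check than printing the values, the following sketch reports whether each variable is set and whether it points to an existing directory. The helper `check_env_dirs` is hypothetical and not part of our codebase; note that the `DFT_MODELS` directory may legitimately not exist yet if you have not trained a model.

```python
import os
from pathlib import Path


def check_env_dirs(names=("DFT_DATA", "DFT_MODELS")):
    """Return {name: (value, is_existing_directory)} for each environment variable."""
    report = {}
    for name in names:
        value = os.getenv(name)
        exists = value is not None and Path(value).is_dir()
        report[name] = (value, exists)
    return report


for name, (value, exists) in check_env_dirs().items():
    if value is None:
        print(f"{name} is not set")
    elif not exists:
        print(f"{name}={value} does not exist (yet) or is not a directory")
    else:
        print(f"{name}={value} (ok)")
```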