## Quantum Deep Field: Data-Driven Wave Function, Electron Density Generation, and Atomization Energy Prediction and Extrapolation with Machine Learning

Deep neural networks (DNNs) have been used to successfully predict molecular properties calculated with Kohn-Sham density functional theory (KS-DFT). Although such predictions are fast and accurate, we believe that a DNN model for KS-DFT must not only predict the properties but also provide the electron density of a molecule. This Letter presents the quantum deep field (QDF), which provides the electron density through unsupervised yet end-to-end, physics-informed modeling by learning the atomization energy on a large-scale dataset. QDF performed well at atomization energy prediction, generated valid electron densities, and demonstrated extrapolation.

QDF is a machine learning model that provides the electron density ρ of molecules by learning their atomization energy E on a large dataset (e.g., the QM9 dataset [1]). The QDF model has one linear component, the linear combination of atomic orbitals (LCAO) [2], and two nonlinear components, the energy functional and the Hohenberg-Kohn map [3]. The two nonlinear components are implemented as deep neural networks (DNNs) (see the figure above). In particular, the DNN-based Hohenberg-Kohn map imposes a physical constraint on ψ (the Kohn-Sham molecular orbitals) through the external potential while the energy functional E = F[ψ] is learned, following density functional theory.
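
The linear LCAO component and the resulting density can be illustrated with a small NumPy sketch. It assumes real coefficients, normalized s-type Gaussian basis functions φ_k(r) = (2α_k/π)^(3/4) exp(-α_k |r - R_k|²), orbitals ψ_i(r) = Σ_k C_ki φ_k(r), and the Kohn-Sham form ρ(r) = Σ_i |ψ_i(r)|². The function names, the basis choice, and the two-center toy system are hypothetical; the repository's own basis set, normalization, and learned coefficients may differ.

```python
import numpy as np


def gaussian_s_basis(points, centers, exponents):
    """Evaluate normalized s-type Gaussians phi_k(r) = (2a_k/pi)^(3/4) exp(-a_k |r - R_k|^2).

    points:    (M, 3) grid points r
    centers:   (K, 3) basis-function centers R_k (e.g., atomic positions)
    exponents: (K,)   Gaussian exponents a_k
    Returns an (M, K) array of basis values.
    """
    # Squared distance between every grid point and every center: shape (M, K)
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    norm = (2.0 * exponents / np.pi) ** 0.75
    return norm[None, :] * np.exp(-exponents[None, :] * d2)


def lcao_density(points, centers, exponents, coefficients):
    """LCAO orbitals psi_i(r) = sum_k C[k, i] phi_k(r) and density rho(r) = sum_i psi_i(r)^2.

    coefficients: (K, N) real LCAO coefficients for N occupied orbitals (assumed real).
    Returns (psi, rho) with shapes (M, N) and (M,).
    """
    phi = gaussian_s_basis(points, centers, exponents)
    psi = phi @ coefficients       # linear combination of atomic orbitals
    rho = (psi ** 2).sum(axis=1)   # electron density at each grid point
    return psi, rho


# Toy example: two centers on the x axis sharing one hand-picked orbital.
centers = np.array([[-0.7, 0.0, 0.0], [0.7, 0.0, 0.0]])
exponents = np.array([1.0, 1.0])
coefficients = np.array([[0.5], [0.5]])
points = np.array([[0.3, 0.1, 0.0], [-0.3, 0.1, 0.0], [0.0, 0.0, 0.0]])
psi, rho = lcao_density(points, centers, exponents, coefficients)
print(psi.shape, rho.shape)
```

Because the orbitals are real here, |ψ|² reduces to ψ². The toy coefficients are not orthonormalized; they only show how the linear component maps coefficients and basis functions to a density that a nonlinear energy functional could then consume.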

### Characteristics

* This implementation is easy for beginners to use and understand.
* You can train a QDF model on properties in the QM9 dataset (e.g., the atomization energy and the HOMO and LUMO energies) by running only two commands.
* You can predict the properties of new molecules with the pre-trained QDF models already provided in this repository by running only two commands.
* You can also train a QDF model on your own dataset and then use that trained model to predict the properties of your molecules.

Link to paper: https://journals.aps.org/prl/pdf/10.1103/PhysRevLett.125.206401

Credit: https://github.com/masashitsubaki/QuantumDeepField_molecule

Google Colab: https://colab.research.google.com/drive/1alOQI6HLD_sCgS9UFYreWeodZoUqRyCr?usp=sharing

```python
# Clone the repository and cd into the directory
!git clone https://github.com/masashitsubaki/QuantumDeepField_molecule.git
%cd QuantumDeepField_molecule
```

```
Cloning into 'QuantumDeepField_molecule'...
remote: Enumerating objects: 166, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 166 (delta 0), reused 2 (delta 0), pack-reused 161
Receiving objects: 100% (166/166), 155.48 MiB | 31.79 MiB/s, done.
Resolving deltas: 100% (65/65), done.
/content/QuantumDeepField_molecule
```


```python
# Install requirements / dependencies
!pip install mayavi PyQt5
```
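
To confirm that the packages installed above can be located, you can use a small helper built on the standard library's `importlib.util.find_spec`, which looks a module up without importing it. The helper `missing_modules` is hypothetical and not part of the repository. A successful lookup does not guarantee a successful import: Mayavi is a 3D visualization library and may still fail to import in a headless environment if its graphics backend is unavailable.

```python
import importlib.util


def missing_modules(names):
    """Return the module names in `names` that cannot be located on the current path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["mayavi", "PyQt5"])
print("Missing modules:", missing if missing else "none")
```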

```python
# Change into the dataset folder and examine the files
%cd /content/QuantumDeepField_molecule/dataset
!ls
```

```
/content/QuantumDeepField_molecule/dataset
QM9full_atomizationenergy_eV.zip         QM9under7atoms_atomizationenergy_eV
QM9full_homolumo_eV.zip                  QM9under7atoms_homolumo_eV
QM9over15atoms_atomizationenergy_eV.zip  yourdataset_property_unit
QM9under14atoms_atomizationenergy_eV
```
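
The directory and archive names in `dataset` appear to follow a `<dataset>_<property>_<unit>` pattern (compare `yourdataset_property_unit` with `QM9full_atomizationenergy_eV`). The sketch below splits a name along that pattern, which can help when naming a custom dataset folder. The pattern is read off the listing above rather than taken from the repository's code, so treat it as an assumption; `parse_dataset_name` is a hypothetical helper.

```python
from pathlib import Path


def parse_dataset_name(name):
    """Split '<dataset>_<property>_<unit>[.zip]' into its three parts.

    The naming pattern is inferred from the dataset/ listing (an assumption).
    The dataset part may contain underscores; property and unit may not.
    """
    stem = Path(name).name
    if stem.endswith(".zip"):
        stem = stem[: -len(".zip")]
    parts = stem.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"{name!r} does not match <dataset>_<property>_<unit>")
    dataset, prop, unit = parts
    return dataset, prop, unit


for n in ["QM9full_homolumo_eV.zip", "QM9under7atoms_atomizationenergy_eV", "yourdataset_property_unit"]:
    print(n, "->", parse_dataset_name(n))
```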


```python
# Unzip the dataset archives, then return to the repository root
!unzip QM9full_atomizationenergy_eV.zip
!unzip QM9full_homolumo_eV.zip
!unzip QM9over15atoms_atomizationenergy_eV.zip
%cd ..
```
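
If the `unzip` command-line tool is not available (for example, outside Colab), the archives can be extracted with Python's standard `zipfile` module instead. This is a sketch, not part of the repository, and `extract_all_zips` is a hypothetical name. Unlike `unzip`, which asks before replacing files, `ZipFile.extractall` overwrites existing files silently.

```python
import zipfile
from pathlib import Path


def extract_all_zips(dataset_dir):
    """Extract every *.zip archive in dataset_dir into dataset_dir itself.

    Returns the names of the archives that were extracted.
    """
    dataset_dir = Path(dataset_dir)
    extracted = []
    for archive in sorted(dataset_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dataset_dir)  # overwrites existing files without prompting
        extracted.append(archive.name)
    return extracted


# Usage, assuming the current directory is the repository root:
# extract_all_zips("dataset")
```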

### Train a QDF model with the QM9 dataset

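With a dataset in the `dataset` directory, you can train a QDF model by running the following commands. In the run below, the preprocessing step uses the small QM9under7atoms_atomizationenergy_eV dataset, as the log shows: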

```python
%cd /content/QuantumDeepField_molecule/train
!bash preprocess.sh
!bash train.sh
```

```
/content/QuantumDeepField_molecule/train
Preprocess QM9under7atoms_atomizationenergy_eV dataset.
The preprocessed dataset is saved in ../dataset/QM9under7atoms_atomizationenergy_eV/ directory.
If the dataset size is large, it takes a long time and consume storage.
Wait for a while...
--------------------------------------------------
Training dataset...
10％ has finished.
50％ has finished.
90％ has finished.
--------------------------------------------------
Validation dataset...
10％ has finished.
50％ has finished.
--------------------------------------------------
Test dataset...
10％ has finished.
50％ has finished.
--------------------------------------------------
The preprocess has finished.
The code uses a GPU.
--------------------------------------------------
# of training samples:  34
# of validation samples:  4
# of test samples:  5
--------------------------------------------------
Set a QDF model.
# of model parameters: 392562
--------------------------------------------------
```
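
The log above reports 34 training, 4 validation, and 5 test samples, 43 in total. A shuffled 80/10/10 split with floor rounding reproduces exactly these counts, as the sketch below shows. Whether `preprocess.sh` uses these ratios, this rounding, or a fixed seed is an assumption; `split_dataset` and its defaults are hypothetical and only illustrate how such counts can arise.

```python
import numpy as np


def split_dataset(samples, train_ratio=0.8, val_ratio=0.1, seed=1234):
    """Shuffle samples and split them into train/validation/test lists.

    Sizes use floor rounding; the test set takes whatever remains.
    The ratios and seed are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(samples))
    n_train = int(len(samples) * train_ratio)
    n_val = int(len(samples) * val_ratio)
    train = [samples[i] for i in order[:n_train]]
    val = [samples[i] for i in order[n_train:n_train + n_val]]
    test = [samples[i] for i in order[n_train + n_val:]]
    return train, val, test


train, val, test = split_dataset(list(range(43)))
print(len(train), len(val), len(test))  # prints: 34 4 5
```

Assigning the remainder to the test set guarantees that every sample lands in exactly one split, regardless of rounding.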


### Predict a property of molecules using the pre-trained QDF model

We have already trained several QDF models and provide them in the `pretrained_model` directory. Using a pre-trained QDF model, you can predict the property of new molecules by running the following commands:

```python
%cd /content/QuantumDeepField_molecule/predict
!bash preprocess.sh
!bash predict.sh
```

```
/content/QuantumDeepField_molecule/predict
Preprocess QM9over15atoms_atomizationenergy_eV dataset.
The preprocessed dataset is saved in ../dataset/QM9over15atoms_atomizationenergy_eV/ directory.
If the dataset size is large, it takes a long time and consume storage.
Wait for a while...
--------------------------------------------------
10％ has finished.
50％ has finished.
90％ has finished.
--------------------------------------------------
The preprocess has finished.
Start predicting for QM9over15atoms_atomizationenergy_eV dataset.
using the pretrained model with QM9under14atoms_atomizationenergy_eV dataset.
The prediction result is saved in the output directory.
Wait for a while...
The prediction will finish in about 3 hours 49 minutes.
MAE: 0.12999092042446136
The prediction has finished.
```
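
The log reports the mean absolute error, MAE = (1/n) Σ_i |y_i - ŷ_i|, between predicted and reference values; given the dataset name, it is presumably in eV. Because the model was trained on QM9under14atoms_atomizationenergy_eV and evaluated on QM9over15atoms_atomizationenergy_eV, this number reflects extrapolation to larger molecules. The sketch below recomputes an MAE from two arrays. The format of the files written to the output directory is not shown here, so loading them is left to you, and the energies in the example are made up.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """MAE = mean(|y_true - y_pred|), in the same unit as the inputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: {y_true.shape} vs {y_pred.shape}")
    return float(np.mean(np.abs(y_true - y_pred)))


# Made-up atomization energies (eV), for illustration only.
reference = [-10.0, -20.0, -30.0]
predicted = [-10.1, -19.9, -30.2]
print(mean_absolute_error(reference, predicted))
```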
