PySCF-IPU is built on top of the PySCF package, porting some of the PySCF algorithms to the Graphcore IPU.
Only a small portion of PySCF is currently ported, specifically Restricted Kohn Sham DFT (based on RKS, KohnShamDFT and hf.RHF).
The package is under active development, to broaden its scope and applicability. Current limitations are:
- Number of atomic orbitals at most 70 (`mol.nao_nr() <= 70`).
- Larger numerical errors due to `np.float32` instead of `np.float64`.
- Limited support for `jax.grad(.)`.
To generate datasets based on the paper Repurposing Density Functional Theory to Suit Deep Learning, presented at the Syns & ML Workshop, ICML 2023, start with the DFT Dataset Generation notebook and the file `density_functional_theory.py`.
We also provide a lightweight implementation of the SCF algorithm, optimized for readability and hackability, in the nanoDFT demo notebook and in the `nanodft` folder.
Additional notebooks in the `notebooks` folder demonstrate other aspects of the computation.
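The Kohn-Sham equations solved by the SCF loop take the Roothaan-Hall form F C = S C ε. Here F is the Kohn-Sham (Fock) matrix, S the overlap matrix and C the orbital coefficients. The NumPy sketch below shows one standard way to solve this generalized eigenproblem, symmetric (Löwdin) orthogonalization, and how to build the closed-shell density matrix. It illustrates these textbook definitions and is not the nanoDFT code, which may solve the problem differently.

```python
import numpy as np


def solve_roothaan_hall(F: np.ndarray, S: np.ndarray):
    """Solve F C = S C diag(eps) via symmetric (Lowdin) orthogonalization."""
    # X = S^{-1/2}, so that X^T S X = I (S is symmetric positive definite)
    s_vals, s_vecs = np.linalg.eigh(S)
    X = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T
    # Diagonalize F in the orthonormal basis, then map coefficients back
    eps, C_prime = np.linalg.eigh(X.T @ F @ X)
    C = X @ C_prime
    return eps, C


def closed_shell_density(C: np.ndarray, n_occ: int) -> np.ndarray:
    """Restricted (closed-shell) density matrix D = 2 C_occ C_occ^T."""
    C_occ = C[:, :n_occ]  # eigh sorts eigenvalues ascending, so these are the lowest orbitals
    return 2.0 * C_occ @ C_occ.T
```

In a full SCF iteration, F is rebuilt from the new density D (one-electron, Coulomb and exchange-correlation terms), and the two steps repeat until D stops changing. As a sanity check, trace(D S) equals the number of electrons, 2 × `n_occ`.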
PySCF on IPU requires Python 3.8, JAX IPU experimental, the TessellateIPU library and the Graphcore Poplar SDK 3.2.
We recommend upgrading `pip` to the latest stable release to prepare your environment:
pip install -U pip
This project is currently under active development, so we recommend installing `pyscf-ipu` from the latest `main` branch. For CPU simulations:
pip install pyscf-ipu[cpu]@git+https://github.com/graphcore-research/pyscf-ipu
and on IPU-equipped machines:
pip install pyscf-ipu[ipu]@git+https://github.com/graphcore-research/pyscf-ipu
The following commands may be useful to check the installation. Each command runs a test case that compares PySCF against our DFT computation using different options.
python density_functional_theory.py -methane -backend cpu # defaults to float64 as used in PySCF
python density_functional_theory.py -methane -backend cpu -float32
python density_functional_theory.py -methane -backend ipu -float32
These commands compare our DFT computation against PySCF for methane (CH₄) and report the numerical errors.
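A short NumPy check puts the float32 limitation and the `-float32` flag in perspective. Methane's total energy is about -40 Hartree. The sketch uses the magnitude 40 as an illustrative value, not a number printed by `density_functional_theory.py`.

```python
import numpy as np

# Illustrative magnitude of a small molecule's total energy in Hartree
# (assumption for this sketch, not output of the test commands above).
energy = 40.0

# Gap between adjacent representable numbers at this magnitude
ulp32 = float(np.spacing(np.float32(energy)))  # 2**-18, about 3.8e-6 Hartree
ulp64 = float(np.spacing(np.float64(energy)))  # 2**-47, about 7.1e-15 Hartree

# A 1 microhartree correction is below half a float32 ulp, so it is rounded away
lost_in_float32 = np.float32(energy) + np.float32(1e-6) == np.float32(energy)
kept_in_float64 = energy + 1e-6 != energy

# Chemical accuracy, 1 kcal/mol, expressed in Hartree
CHEMICAL_ACCURACY = 1.0 / 627.509

print(ulp32, ulp64, lost_in_float32, kept_in_float64)
```

A single float32 rounding at this magnitude is about 3.8 µHartree, well below chemical accuracy (about 1.6 mHartree). The larger errors reported by the test are therefore presumably due to rounding accumulated across the many operations of an SCF run.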
This section shows how to generate a DFT dataset based on GDB. You do not need this step if you only want to train on the QM1B dataset (to be released soon).
Download the `gdb11.tgz` file from https://zenodo.org/record/5172018 and extract its contents into the `gdb/` directory:
wget -p -O ./gdb/gdb11.tgz https://zenodo.org/record/5172018/files/gdb11.tgz\?download\=1
tar -xvf ./gdb/gdb11.tgz --directory ./gdb/
To take advantage of caching, sort the SMILES strings by the number of hydrogens RDKit adds to them. After sorting, consecutive molecules `i` and `i+1` usually have the same number of hydrogens, which allows our code to reuse the cached computational graph for DFT. Run the following Python script to sort the file:
python ./gdb/sortgdb.py ./gdb/gdb11_size09.smi
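The sorting step can be sketched in plain Python. This is not the contents of `gdb/sortgdb.py`. It assumes RDKit's `Chem.MolFromSmiles` and `Chem.AddHs` for counting hydrogens. `count_graph_rebuilds` is a hypothetical helper that counts how often consecutive molecules change hydrogen count. That count is a proxy for how often a shape-keyed cache, such as JAX's `jit` cache, would have to compile a new graph.

```python
from itertools import groupby
from typing import Callable, Iterable, List


def count_hydrogens_rdkit(smiles: str) -> int:
    """Number of hydrogens RDKit adds when making hydrogens explicit."""
    from rdkit import Chem  # lazy import: the rest of the sketch runs without RDKit

    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    return sum(1 for atom in mol.GetAtoms() if atom.GetSymbol() == "H")


def sort_by_hydrogens(smiles: Iterable[str],
                      count_h: Callable[[str], int] = count_hydrogens_rdkit) -> List[str]:
    # sorted() is stable, so molecules with equal hydrogen counts keep their file order
    return sorted(smiles, key=count_h)


def count_graph_rebuilds(h_counts: List[int]) -> int:
    # Hypothetical proxy: one graph build per run of equal consecutive hydrogen counts
    return sum(1 for _ in groupby(h_counts))
```

For example, take methane (`C`, 4 H), water (`O`, 2 H), ethane (`CC`, 6 H), ammonia (`N`, 3 H) and methanol (`CO`, 4 H). In that order they trigger five rebuilds. Sorted, they trigger four, one per distinct count. The savings grow with the number of molecules that share each hydrogen count.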
You can then generate a dataset locally on CPU with the following command:
python density_functional_theory.py -generate -save -fname dataset_name -level 0 -plevel 0 -gdb 9 -backend cpu -float32
You can speed up the generation by using IPUs; see the DFT Dataset Generation notebook.
Training SchNet on QM1B
We used PySCF on IPU to generate the QM1B dataset with one billion training examples (to be released soon). See Training SchNet on QM1B for an example implementation of a neural network trained on this dataset.
Copyright (c) 2023 Graphcore Ltd. The project is licensed under the Apache License 2.0, with the exception of the folders `electron_repulsion/` and `exchange_correlation/`.
The library is built on top of the following main dependencies:
Component | Description | License |
---|---|---|
pyscf | Python-based Simulations of Chemistry Framework | Apache License 2.0 |
libcint | Open source library for analytical Gaussian integrals | BSD 2-Clause “Simplified” License |
xcauto | Arbitrary order exchange-correlation functional derivatives | MPL-2.0 license |
Please use the following citation for the pyscf-ipu project:
@inproceedings{mathiasen2023qm1b,
title={Generating QM1B with PySCF$_{\text{IPU}}$},
author={Mathiasen, Alexander and Helal, Hatem and Klaeser, Kerstin and Balanca, Paul and Dean, Josef and Luschi, Carlo and Beaini, Dominique and Fitzgibbon, Andrew William and Masters, Dominic},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2023}
}