GPU-accelerated coupled cluster with density fitting

GPU DF-CC plugin for Psi4

OVERVIEW

This plugin to Psi4[1] performs GPU-accelerated density-fitted (DF) coupled-cluster computations with single and double excitations (CCSD)[2]. The perturbative triples correction to the correlation energy, (T), is also implemented, but its performance in this plugin is not much better than that of the DF-CCSD(T)[3,4] implementation in the current release of Psi4. For additional information on the performance of other GPU-accelerated coupled-cluster algorithms, see Refs. 5 and 6.
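
Density fitting replaces each four-index electron repulsion integral with a contraction of three-index tensors, (ia|jb) ≈ Σ_Q B^Q_ia B^Q_jb, where Q runs over the auxiliary basis (see DF_BASIS_CC below). This cuts integral storage from O(N^4) to O(N^3) and turns the work into matrix products that map well onto BLAS-style (cuBLAS) kernels. The pure-Python sketch below only illustrates this standard factorization on random toy tensors; the function name `df_eri` and the `B[Q][i][a]` layout are hypothetical and are not how the plugin stores or contracts its tensors on the GPU.

```python
import random

def df_eri(B, i, a, j, b):
    """Approximate (ia|jb) as sum_Q B[Q][i][a] * B[Q][j][b].

    B is a nested list indexed as B[Q][i][a] (hypothetical layout:
    auxiliary index first, then occupied i, then virtual a).
    """
    return sum(BQ[i][a] * BQ[j][b] for BQ in B)

# Toy dimensions: 2 occupied, 3 virtual, 4 auxiliary functions.
nocc, nvir, naux = 2, 3, 4
random.seed(0)
B = [[[random.uniform(-1.0, 1.0) for _ in range(nvir)]
      for _ in range(nocc)]
     for _ in range(naux)]

# Assemble the full (ia|jb) tensor from the three-index factors.
eri = [[[[df_eri(B, i, a, j, b) for b in range(nvir)]
         for j in range(nocc)]
        for a in range(nvir)]
       for i in range(nocc)]

# The factorized integrals keep the (ia|jb) = (jb|ia) symmetry.
assert eri[0][1][1][2] == eri[1][2][0][1]

# Numbers stored: naux*nocc*nvir for B versus (nocc*nvir)**2 for the full tensor.
print(naux * nocc * nvir, (nocc * nvir) ** 2)  # prints 24 36
```

At realistic sizes the auxiliary basis is only a few times larger than the orbital basis, so storing B instead of the full integral tensor is what makes the approach practical on memory-limited GPUs.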

INSTALLATION

To build and run the Psi4 plugin gpu_dfcc:

  • Download and install psi4public from GitHub (https://github.com/psi4/psi4public). You can obtain the source with git:

    git clone git@github.com:psi4/psi4public.git

    Then install Psi4 as described at http://www.psicode.org/.

  • Configure gpu_dfcc by editing the do-configure file in the gpu_dfcc/ directory. Specify the location of your cuBLAS library, then run the configure script:

    ./do-configure

    Make sure that your LD_LIBRARY_PATH contains the directory of your cuBLAS library.

  • Compile the plugin:

    make

  • Run the test in this directory:

    psi4 input.dat -n 2

    Note that the gpu_dfcc plugin requires that Psi4 be run with at least two threads (set with -n). In general, the code requires one more thread than the number of GPUs on your system.
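
The two run-time requirements above (threads = GPUs + 1, and cuBLAS visible on LD_LIBRARY_PATH) can be checked before launching Psi4. The following is a minimal pre-flight sketch; the helpers `required_threads` and `cublas_on_ld_path` are hypothetical and are not part of the plugin or of Psi4, and /usr/local/cuda/lib64 is only an example cuBLAS location.

```python
import os

def required_threads(num_gpus):
    """Threads gpu_dfcc needs: one more than the number of GPUs."""
    if num_gpus < 1:
        raise ValueError("gpu_dfcc needs at least one GPU")
    return num_gpus + 1

def cublas_on_ld_path(cublas_dir, ld_library_path=None):
    """Return True if cublas_dir is one of the LD_LIBRARY_PATH entries."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    wanted = os.path.normpath(cublas_dir)
    # LD_LIBRARY_PATH is a colon-separated list; ignore empty entries.
    return any(os.path.normpath(entry) == wanted
               for entry in ld_library_path.split(":") if entry)

if __name__ == "__main__":
    # One GPU -> run "psi4 input.dat -n 2", as in the test above.
    print(required_threads(1))  # prints 2
    # Trailing slashes are normalized away before comparing.
    print(cublas_on_ld_path("/usr/local/cuda/lib64",
                            "/opt/lib:/usr/local/cuda/lib64/"))  # prints True
```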

INPUT OPTIONS

  • NUM_GPUS (int):

    the number of GPUs to use. The code automatically determines the available resources, so this keyword is useful if you want to use fewer GPUs than are available, for example when one GPU drives your monitor in addition to a compute-oriented card.

  • ACTIVE_GPUS (string):

    the specific GPUs you would like to use. If this is not specified, the code automatically selects the GPU(s) with the most memory, up to the number given by NUM_GPUS. For example, active_gpus '1-3' selects the 1st and 3rd GPUs in the system but not the 0th or 2nd.

  • MAX_MAPPED_MEMORY (int):

    the maximum amount of pinned CPU memory, in MB. The code pins an amount of CPU memory equal to the amount of GPU memory, up to this maximum.

  • CC_TIMINGS (bool):

    whether to time each CC diagram.

  • E_CONVERGENCE (double):

    convergence threshold for the CCSD energy.

  • R_CONVERGENCE (double):

    convergence threshold for the CCSD amplitudes.

  • MAXITER (int):

    the maximum number of CCSD iterations.

  • DIIS_MAX_VECS (int):

    the maximum number of DIIS vectors stored on disk.

  • NAT_ORBS (bool):

    whether to truncate the virtual space using MP2 natural orbitals.

  • OCC_TOLERANCE (double):

    occupation-number threshold below which MP2 natural virtual orbitals are neglected.

  • DF_BASIS_CC (str):

    auxiliary basis set for DF-CCSD(T).

  • CHOLESKY_TOLERANCE (double):

    tolerance for Cholesky decomposition of the ERI tensor (only used if DF_BASIS_CC=cholesky or SCF_TYPE=cd).
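
The GPU-resource options above interact as follows: without ACTIVE_GPUS the code picks the devices with the most memory, up to NUM_GPUS, and it pins host memory equal to GPU memory, capped at MAX_MAPPED_MEMORY. The sketch below models that behavior under stated assumptions. The function names and the device-to-memory dictionary are hypothetical. The parser treats '-' as a separator between device indices only because that reproduces the documented '1-3' example. The cap is applied per GPU, since the text does not say whether memory is summed over devices.

```python
def parse_active_gpus(spec):
    """Parse an ACTIVE_GPUS string such as '1-3' into device indices.

    Assumption: '-' separates individual indices, which reproduces the
    documented example ('1-3' selects GPUs 1 and 3, not 0 or 2).
    """
    return [int(tok) for tok in spec.split("-") if tok.strip()]

def select_gpus(gpu_memory_mb, num_gpus=None, active_gpus=None):
    """Choose device indices from a {device: memory in MB} mapping.

    An explicit ACTIVE_GPUS string wins; otherwise take the devices with
    the most memory, up to NUM_GPUS (default: all devices).
    """
    if active_gpus:
        return parse_active_gpus(active_gpus)
    if num_gpus is None:
        num_gpus = len(gpu_memory_mb)
    ranked = sorted(gpu_memory_mb, key=lambda dev: gpu_memory_mb[dev],
                    reverse=True)
    return sorted(ranked[:num_gpus])

def pinned_memory_mb(gpu_mem_mb, max_mapped_mb):
    """Pinned host memory matches GPU memory, capped at MAX_MAPPED_MEMORY."""
    return min(gpu_mem_mb, max_mapped_mb)

# A workstation with a small display GPU (device 0) and two compute cards.
devices = {0: 2048, 1: 16384, 2: 12288}
print(select_gpus(devices, num_gpus=2))   # prints [1, 2]
print(parse_active_gpus("1-3"))           # prints [1, 3]
print(pinned_memory_mb(devices[1], 8000)) # prints 8000
```

Setting NUM_GPUS to 2 here leaves the display GPU idle, which is the use case the NUM_GPUS description mentions.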
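
With NAT_ORBS enabled, the virtual space is built from MP2 natural orbitals. Virtual orbitals whose MP2 occupation number falls below OCC_TOLERANCE are dropped. Because the most expensive CCSD term (the particle-particle ladder) scales as o^2 v^4, shrinking the virtual space pays off quickly. The sketch below assumes the natural-orbital occupation numbers are already available and keeps orbitals whose occupation is at least the tolerance. The function names and occupation values are hypothetical, and the plugin may use a strict rather than non-strict comparison.

```python
def truncate_virtuals(occupations, occ_tolerance):
    """Return indices of MP2 virtual natural orbitals kept in the FNO space.

    Assumption: an orbital is kept when its occupation number is at least
    occ_tolerance; indices are returned in order of decreasing occupation.
    """
    kept = [k for k, n in enumerate(occupations) if n >= occ_tolerance]
    return sorted(kept, key=lambda k: occupations[k], reverse=True)

def virtual_cost_ratio(n_kept, n_virtual):
    """Relative cost of an o^2 v^4 term after truncating the virtual space."""
    return (n_kept / n_virtual) ** 4

# Hypothetical MP2 virtual natural-orbital occupation numbers.
occupations = [2.1e-2, 8.0e-3, 1.5e-3, 4.0e-4, 9.0e-5, 3.0e-6, 7.0e-7, 1.0e-8]
kept = truncate_virtuals(occupations, occ_tolerance=1.0e-6)
ratio = virtual_cost_ratio(len(kept), len(occupations))
print(len(kept), ratio)  # prints 6 0.31640625
```

Keeping 6 of 8 virtual orbitals thus cuts the o^2 v^4 work to roughly a third in this toy case.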
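
When DF_BASIS_CC=cholesky (or SCF_TYPE=cd), the three-index factors come from a Cholesky decomposition of the ERI matrix rather than from an auxiliary basis. Rows and columns are orbital pairs pq and rs, so (pq|rs) ≈ Σ_k L^k_pq L^k_rs. This has the same form as density fitting, with Cholesky vectors in place of B^Q. CHOLESKY_TOLERANCE sets when the decomposition stops. The sketch below is a textbook pivoted (incomplete) Cholesky decomposition of a small symmetric positive semidefinite matrix. The function name and toy matrix are hypothetical, and the plugin's own implementation may differ, for example by computing integral columns on the fly.

```python
import math

def pivoted_cholesky(M, tol):
    """Incomplete (pivoted) Cholesky decomposition of a symmetric PSD matrix.

    Returns a list of vectors L_k with M[p][q] ~= sum_k L_k[p] * L_k[q].
    Stops once the largest remaining diagonal element falls below tol.
    """
    n = len(M)
    diag = [M[p][p] for p in range(n)]  # diagonal of the residual matrix
    vectors = []
    while len(vectors) < n:
        piv = max(range(n), key=lambda p: diag[p])
        if diag[piv] < tol:
            break
        root = math.sqrt(diag[piv])
        # Residual column at the pivot, scaled to give the next Cholesky vector.
        vec = [(M[p][piv] - sum(L[p] * L[piv] for L in vectors)) / root
               for p in range(n)]
        vectors.append(vec)
        diag = [diag[p] - vec[p] ** 2 for p in range(n)]
    return vectors

# Toy rank-2 "ERI matrix" built as v v^T + w w^T with v = (1, 2, 3), w = (0, 1, 1).
M = [[1.0, 2.0, 3.0],
     [2.0, 5.0, 7.0],
     [3.0, 7.0, 10.0]]
vecs = pivoted_cholesky(M, tol=1e-10)
print(len(vecs))  # prints 2
```

The number of vectors tracks the numerical rank of the matrix, so a looser tolerance gives fewer vectors and a cheaper but less accurate factorization.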

KNOWN ISSUES

  • The program tends to exit with a segmentation fault, which is innocuous.

  • Compilation with CUDA 6.5 fails. For now, we recommend CUDA 5.5, 8.0, or 9.0.

REFERENCES

[1] J. M. Turney, A. C. Simmonett, R. M. Parrish, E. G. Hohenstein, F. A. Evangelista, J. T. Fermann, B. J. Mintz, L. A. Burns, J. J. Wilke, M. L. Abrams, N. J. Russ, M. L. Leininger, C. L. Janssen, E. T. Seidl, W. D. Allen, H. F. Schaefer, R. A. King, E. F. Valeev, C. D. Sherrill, and T. D. Crawford, WIREs: Comp. Molec. Sci. 2, 556 (2012). "Psi4: an open-source ab initio electronic structure program"

[2] A. E. DePrince III, M. R. Kennedy, B. G. Sumpter, and C. D. Sherrill, Mol. Phys. 112, 844 (2014). "Density-fitted singles and doubles coupled cluster on graphics processing units"

[3] A. E. DePrince III and C. D. Sherrill, J. Chem. Theory Comput. 9, 2687 (2013). "Accuracy and efficiency of coupled-cluster theory using density fitting / Cholesky decomposition, frozen natural orbitals, and a t1-transformed Hamiltonian"

[4] A. E. DePrince III and C. D. Sherrill, J. Chem. Theory Comput. 9, 293 (2013). "Accurate noncovalent interaction energies using truncated basis sets based on frozen natural orbitals"

[5] A. E. DePrince III and J. R. Hammond, J. Chem. Theory Comput. 7, 1287 (2011). "Coupled cluster theory on graphics processing units I: The coupled cluster doubles method"

[6] A. E. DePrince III and J. R. Hammond, 2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC), 131-140 (2011). "Quantum chemical many-body theory on heterogeneous nodes"