GPU DF-CC plugin in PSI4
This plugin to Psi4 [1] performs GPU-accelerated density-fitted (DF) singles and doubles coupled cluster (CCSD) computations [2]. The perturbative triples contribution to the correlation energy, (T), is also implemented, but its performance in the present plugin is not much better than that of the DF-CCSD(T) [3,4] implementation in the current release of Psi4. For additional information on the performance of other GPU-accelerated coupled-cluster algorithms, see Refs. 5 and 6.
To run the psi4 plugin gpu_dfcc:
Download psi4public from GitHub (https://github.com/psi4/psi4public). You can obtain the source using git:
git clone git@github.com:psi4/psi4public.git
Install psi4 as described on http://www.psicode.org/.
Configure gpu_dfcc by editing the do-configure file in the gpu_dfcc/ directory to specify the location of your cuBLAS library, then run the configure script.
Make sure that your LD_LIBRARY_PATH contains the location of your cuBLAS library.
Compile the plugin:
Run the test in this directory:
psi4 input.dat -n 2
Note that the gpu_dfcc plugin requires that psi4 be run with at least two threads. In general, the code requires one more thread than the number of GPUs on your system.
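The two run-time requirements above, a cuBLAS library on LD_LIBRARY_PATH and one more psi4 thread than there are GPUs, can be checked before launching a job. The following is a minimal Python sketch, not part of gpu_dfcc; the helper names cublas_dirs, required_threads, and psi4_command are hypothetical, and the libcublas.so* file-name pattern is an assumption about a typical Linux CUDA install.

```python
import os


def cublas_dirs(ld_library_path=None):
    """Directories on LD_LIBRARY_PATH that contain a libcublas.so* file.

    Assumes the usual Linux naming of the cuBLAS shared library.
    """
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    found = []
    for directory in ld_library_path.split(":"):  # LD_LIBRARY_PATH is colon-separated
        if directory and os.path.isdir(directory):
            if any(name.startswith("libcublas.so") for name in os.listdir(directory)):
                found.append(directory)
    return found


def required_threads(num_gpus):
    """psi4 thread count for gpu_dfcc: one more than the number of GPUs (so at least 2)."""
    if num_gpus < 1:
        raise ValueError("gpu_dfcc needs at least one GPU")
    return num_gpus + 1


def psi4_command(input_file, num_gpus):
    """Argument list for running psi4 with the required number of threads."""
    return ["psi4", input_file, "-n", str(required_threads(num_gpus))]


if __name__ == "__main__":
    if not cublas_dirs():
        print("warning: no libcublas.so* found on LD_LIBRARY_PATH")
    print(" ".join(psi4_command("input.dat", 1)))  # psi4 input.dat -n 2
```

With a single GPU the sketch reproduces the test command shown above, psi4 input.dat -n 2.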
Keywords:
NUM_GPUS: the number of GPUs to use. The code automatically determines the available resources, so this keyword is useful only if you want to use fewer GPUs than are available, for example when one GPU drives your monitor in addition to a compute-oriented card.
ACTIVE_GPUS: the specific GPUs to use. If this keyword is not specified, the code automatically selects the GPU(s) with the most memory, up to the number given by NUM_GPUS. For example, active_gpus '1-3' selects GPUs 1 and 3 (devices are numbered from 0) but not GPUs 0 or 2; the dash separates device indices rather than denoting a range.
the maximum amount of pinned CPU memory, in MB. The code pins an amount of CPU memory equal to the amount of GPU memory, up to this maximum.
whether to time each CC diagram.
convergence threshold for the CCSD energy.
convergence threshold for the CCSD amplitudes.
the maximum number of CCSD iterations.
the maximum number of DIIS vectors stored on disk.
whether to truncate the virtual space using MP2 natural orbitals.
occupation threshold below which MP2 natural virtual orbitals are neglected.
DF_BASIS_CC: auxiliary basis set for DF-CCSD(T).
tolerance for Cholesky decomposition of the ERI tensor (only used if DF_BASIS_CC=cholesky or SCF_TYPE=cd).
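The device-selection rules described for NUM_GPUS and ACTIVE_GPUS, and the rule for pinning CPU memory, can be modeled in a few lines. This is a hypothetical Python sketch of the documented behavior, not the plugin's own code: the names Gpu, parse_active_gpus, select_gpus, and pinned_memory_mb are illustrative, ACTIVE_GPUS taking precedence over NUM_GPUS is an assumption, and since the text does not say whether the pinning cap applies per device or in total, pinned_memory_mb simply applies it to whatever amount you pass in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    index: int      # device number, counting from 0
    memory_mb: int  # device memory in MB


def parse_active_gpus(spec):
    """Parse an ACTIVE_GPUS string such as '1-3' into device indices.

    Following the '1-3' example, the dash separates indices; it is not
    a range, so '1-3' yields [1, 3].
    """
    return [int(tok) for tok in spec.split("-") if tok.strip()]


def select_gpus(gpus, num_gpus=None, active_gpus=None):
    """Hypothetical model of the NUM_GPUS / ACTIVE_GPUS selection rules.

    Assumption: an explicit ACTIVE_GPUS list takes precedence over NUM_GPUS.
    """
    if active_gpus:
        by_index = {g.index: g for g in gpus}
        wanted = parse_active_gpus(active_gpus)
        missing = [i for i in wanted if i not in by_index]
        if missing:
            raise ValueError(f"no such GPU(s): {missing}")
        return [by_index[i] for i in wanted]
    # Default: devices with the most memory first (ties keep device order),
    # keeping at most num_gpus of them when NUM_GPUS is given.
    ranked = sorted(gpus, key=lambda g: g.memory_mb, reverse=True)
    return ranked if num_gpus is None else ranked[:num_gpus]


def pinned_memory_mb(gpu_memory_mb, max_pinned_mb):
    """Pinned CPU memory: as much as the GPU memory, capped at the user maximum."""
    return min(gpu_memory_mb, max_pinned_mb)


if __name__ == "__main__":
    # A small display card (device 0) next to larger compute cards.
    gpus = [Gpu(0, 2048), Gpu(1, 12288), Gpu(2, 6144), Gpu(3, 12288)]
    print([g.index for g in select_gpus(gpus, num_gpus=2)])         # [1, 3]
    print([g.index for g in select_gpus(gpus, active_gpus="1-3")])  # [1, 3]
    print(pinned_memory_mb(12288, 8000))                             # 8000
```

In this toy setup the default rule skips the small display card on its own, which matches the use case given for NUM_GPUS; ACTIVE_GPUS is only needed when you want a selection that memory size alone would not produce.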
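The DIIS keyword above bounds how many vectors enter the DIIS (Pulay's direct inversion in the iterative subspace) extrapolation of the CCSD amplitudes. A minimal pure-Python sketch of DIIS with a bounded history is given below to show what that limit controls. It is an illustration only: the plugin keeps these vectors on disk, whereas this sketch keeps them in memory, the names solve, dot, and Diis are hypothetical, and it does not guard against a singular B matrix (linearly dependent error vectors), which a real implementation would need to handle.

```python
from collections import deque


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


class Diis:
    """Pulay DIIS keeping at most max_vecs (vector, error) pairs in memory."""

    def __init__(self, max_vecs):
        # Oldest entries are dropped automatically once max_vecs is reached.
        self.vecs = deque(maxlen=max_vecs)
        self.errs = deque(maxlen=max_vecs)

    def extrapolate(self, vec, err):
        """Store (vec, err) and return the DIIS-extrapolated vector."""
        self.vecs.append(list(vec))
        self.errs.append(list(err))
        n = len(self.vecs)
        # Bordered B matrix: B_ij = <e_i|e_j>, plus the constraint sum_i c_i = 1.
        b = [[dot(self.errs[i], self.errs[j]) for j in range(n)] + [-1.0] for i in range(n)]
        b.append([-1.0] * n + [0.0])
        rhs = [0.0] * n + [-1.0]
        c = solve(b, rhs)[:n]  # drop the Lagrange multiplier
        return [sum(c[i] * self.vecs[i][k] for i in range(n)) for k in range(len(vec))]
```

A larger history gives the extrapolation more information to work with at the cost of storing more vectors, which is the trade-off the keyword exposes.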
Known issues:
The program often exits with a segmentation fault at the end of a run; this is innocuous.
Compilation with CUDA 6.5 fails. For now, we recommend CUDA 5.5, 8.0, or 9.0.
References:
[1] J. M. Turney, A. C. Simmonett, R. M. Parrish, E. G. Hohenstein, F. A. Evangelista, J. T. Fermann, B. J. Mintz, L. A. Burns, J. J. Wilke, M. L. Abrams, N. J. Russ, M. L. Leininger, C. L. Janssen, E. T. Seidl, W. D. Allen, H. F. Schaefer, R. A. King, E. F. Valeev, C. D. Sherrill, and T. D. Crawford, WIREs Comput. Mol. Sci. 2, 556 (2012). "Psi4: an open-source ab initio electronic structure program"
[2] A. E. DePrince III, M. R. Kennedy, B. G. Sumpter, and C. D. Sherrill, Mol. Phys. 112, 844 (2014). "Density-fitted singles and doubles coupled cluster on graphics processing units"
[3] A. E. DePrince III and C. D. Sherrill, J. Chem. Theory Comput. 9, 2687 (2013). "Accuracy and efficiency of coupled-cluster theory using density fitting / Cholesky decomposition, frozen natural orbitals, and a t1-transformed Hamiltonian"
[4] A. E. DePrince III and C. D. Sherrill, J. Chem. Theory Comput. 9, 293 (2013). "Accurate noncovalent interaction energies using truncated basis sets based on frozen natural orbitals"
[5] A. E. DePrince III and J. R. Hammond, J. Chem. Theory Comput. 7, 1287 (2011). "Coupled cluster theory on graphics processing units I: The coupled cluster doubles method"
[6] A. E. DePrince III and J. R. Hammond, 2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC), 131-140 (2011). "Quantum chemical many-body theory on heterogeneous nodes"