Description
import cupy as cp
import numpy as np
from pyscf import gto, dft
from gpu4pyscf import dft as gdft
# Seed CuPy's global RNG in an attempt to make the GPU runs repeatable
cp.random.seed(12345)
mol = gto.Mole()
mol.atom = '''
O 0.000000 0.000000 0.000000
H 0.758602 0.000000 0.504284
H -0.758602 0.000000 0.504284
'''
mol.basis = 'sto-3g'
mol.build()
print("\n=== CPU SCF + TDDFT ===")
mf_cpu = dft.RKS(mol).set(xc='lda')
mf_cpu.kernel()
td_cpu = mf_cpu.TDDFT()
td_cpu.nstates = 3
td_cpu.kernel()
print("Excited state energies (CPU):", td_cpu.e)
print("\n=== GPU SCF + TDDFT ===")
mf_gpu = gdft.RKS(mol).set(xc='lda')
mf_gpu.kernel()
td_gpu = mf_gpu.TDDFT()
td_gpu.nstates = 3
td_gpu.kernel()
print("Excited state energies (GPU):", td_gpu.e)
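I used the code above to compare excited-state energies computed on GPU and on CPU. In every run, the GPU results deviated from the CPU results and also changed from run to run (non-deterministic). The ground-state SCF energy does not show this problem: it is reproducible and agrees with the CPU value to within 1e-10 Hartree, as shown below.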
Run 1:
=== CPU SCF + TDDFT ===
converged SCF energy = -74.0345163178308
Excited State energies (eV)
[12.21307739 13.672496 15.5405123 ]
Excited state energies (CPU): [0.44882232 0.50245496 0.5711033 ]
=== GPU SCF + TDDFT ===
converged SCF energy = -74.0345163178773
Excited State energies (eV)
[12.63043364 26.57117295 32.13114449]
Excited state energies (GPU): [0.46415988 0.9764726 1.18079779]
Run 2:
=== CPU SCF + TDDFT ===
converged SCF energy = -74.0345163178308
Excited State energies (eV)
[12.21307739 13.672496 15.5405123 ]
Excited state energies (CPU): [0.44882232 0.50245496 0.5711033 ]
=== GPU SCF + TDDFT (with CPU guess) ===
converged SCF energy = -74.0345163178773
TD-SCF states [0, 1, 2] not converged.
Excited State energies (eV)
[14.4972545 15.50424422 16.52782828]
Excited state energies (GPU): [0.53276428 0.56977047 0.60738649]
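To put numbers on the discrepancy, here is a minimal sketch that compares the excitation energies printed above state by state. The helper name `compare_energies` and the conversion constant are assumptions introduced for illustration; they are not part of PySCF or gpu4pyscf. The input values are copied from the two runs above.

```python
import numpy as np

# Approximate Hartree -> eV factor (assumed here, not read from PySCF)
HARTREE_TO_EV = 27.211386

def compare_energies(e_ref, e_test):
    """Per-state absolute difference in eV between two sets of
    excitation energies given in Hartree (hypothetical helper)."""
    e_ref = np.asarray(e_ref, dtype=float)
    e_test = np.asarray(e_test, dtype=float)
    return np.abs(e_test - e_ref) * HARTREE_TO_EV

# Excitation energies in Hartree, copied from the output above
cpu = [0.44882232, 0.50245496, 0.5711033]
gpu_run1 = [0.46415988, 0.9764726, 1.18079779]
gpu_run2 = [0.53276428, 0.56977047, 0.60738649]

for label, gpu in [("run 1", gpu_run1), ("run 2", gpu_run2)]:
    diff = compare_energies(cpu, gpu)
    print(f"GPU {label}: |GPU - CPU| per state (eV) =", np.round(diff, 3))
```

When applying this to live results, pass `td_cpu.e` and `td_gpu.e` instead of the hard-coded lists. If `td_gpu.e` turns out to be a CuPy array rather than a NumPy array, it would need to be copied to the host first (for example with `.get()`).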
This behavior does not depend on the chosen xc functional, basis set, or solvation model. After some searching through the documentation, I suspect it comes from differences in how the TDDFT step is computed on GPU versus CPU, such as fused multiply-add (FMA) and possibly other implementation differences.
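For a sense of scale, the sketch below shows how much a double-precision sum changes when only the order of accumulation changes, which is the kind of effect that FMA and parallel reductions introduce. It is a toy NumPy example on the CPU with made-up random data, not a reproduction of the TDDFT code path, and by itself it does not establish what causes the deviations reported here.

```python
import numpy as np

# Toy data: 100,000 random doubles in [0, 1)
rng = np.random.default_rng(12345)
x = rng.random(100_000)

# Same numbers, three different accumulation orders
s_numpy = float(np.sum(x))      # NumPy's built-in reduction

s_forward = 0.0
for v in x:                     # naive left-to-right accumulation
    s_forward += float(v)

s_reverse = 0.0
for v in x[::-1]:               # naive right-to-left accumulation
    s_reverse += float(v)

sums = [s_numpy, s_forward, s_reverse]
rel_spread = (max(sums) - min(sums)) / abs(s_numpy)
print(f"relative spread between summation orders: {rel_spread:.2e}")
```

The relative spread printed here can be compared with the relative deviations in the excitation energies above to judge whether rounding-order effects alone are a plausible explanation.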
I would really appreciate any insight into the cause and advice on how to fix it.
Thank you for your help.