Inga Ulusoy, Computational modelling in python, SoSe2020 

# Quantum chemistry using python

You need to install a couple of programs and libraries, and the installation is not entirely trivial.

If you do not manage to install the libraries, and therefore cannot complete today's task, then please upload a screenshot or otherwise detail why it failed in your case and I will grade your attempt as completed. This will also help me prepare the material for future semesters.

In the following, the instructions for __Windows__ are detailed; for Mac/Unix please skip further down.

## Windows

If you experience issues during the install, the occasional reboot of the operating system works wonders.

For both programs that we will be using - PySCF and Psi4 - you will need the Windows Subsystem for Linux. You can download this directly from the Microsoft Store following the directions detailed here:

https://www.windowscentral.com/install-windows-subsystem-linux-windows-10

I recommend installing the Ubuntu subsystem. 

In case you are wondering __how to access your Linux files from Windows, or your Windows files from the Linux subsystem__, you can find some pointers here: \
https://superuser.com/questions/1066261/how-to-access-windows-folders-from-bash-on-ubuntu-on-windows
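
As a minimal sketch of the path mapping described in the linked answer: WSL mounts each Windows drive under `/mnt/<drive letter>`, so a Windows path can be translated as shown below. The helper name `windows_to_wsl` and the example path are hypothetical and only serve as an illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl(path):
    """Translate a Windows path with a drive letter into its WSL location (hypothetical helper)."""
    win = PureWindowsPath(path)
    drive = win.drive.rstrip(":").lower()
    if len(drive) != 1:
        raise ValueError("expected a path with a drive letter, got %r" % path)
    # parts[0] is the drive root (e.g. 'C:\\'); the rest are folder and file names
    return str(PurePosixPath("/mnt", drive, *win.parts[1:]))


print(windows_to_wsl(r"C:\Users\student\Problem15.ipynb"))
```

You can then `cd` to the printed directory in the Ubuntu terminal to reach files stored on the Windows side.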

Once you have this working, open an Ubuntu terminal. The first time it asks you to create a username and password.

Next you need to install some compilers and the current anaconda distribution into your Linux shell. Copy the commands below line by line and execute them in the terminal. `sudo` means that you are executing a command with root (administrator) privileges, and you will need to provide your password.

`sudo apt-get install cmake` \
`sudo apt-get install -y build-essential` \
`sudo apt-get install libblas-dev liblapack-dev`\
`wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh` \
`bash Anaconda3-2020.02-Linux-x86_64.sh` \
`echo 'export PATH="$HOME/anaconda3/bin:$PATH"' >> ~/.bashrc` \
`rm Anaconda3-2020.02-Linux-x86_64.sh` \
`source ~/.bashrc`\
`conda update conda`

Now you should have the basic libraries installed. We will start off with PySCF (https://sunqm.github.io/pyscf/; see https://www.researchgate.net/publication/339615992_Recent_developments_in_the_PySCF_program_package for an overview of available methods), which you can get using

`pip install pyscf`

We will also need the Berny solver for the geometry optimization:

`pip install -U pyberny`

You will now need to start the jupyter notebook from the Ubuntu terminal and not through your Windows anaconda install! 

`jupyter-notebook Problem15.ipynb`

Most likely, the jupyter window will not open automatically, so you need to copy and paste the link printed in the terminal into the browser of your choice. See https://medium.com/@sayanghosh_49221/jupyter-notebook-in-windows-subsystem-for-linux-wsl-f075f7ec8691

## MacOS and Unix

The installation is straightforward. Execute in your terminal:

`pip install pyscf`\
`pip install -U pyberny`
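
To check from Python that both packages are visible to the interpreter you are running, a small sketch like the following can help. It assumes that the `pyberny` package is imported under the module name `berny`; `check_modules` is a hypothetical helper, not part of either library.

```python
import importlib.util


def check_modules(names=("pyscf", "berny")):
    """Return {module name: True/False} depending on whether the module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in check_modules().items():
    print("{:8s} {}".format(name, "found" if found else "MISSING"))
```

If a module is reported as missing inside the notebook but the `pip install` succeeded, the notebook is most likely running with a different Python installation than the one you installed into.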

In [None]:
from scipy.constants import physical_constants
au2eV = physical_constants['Hartree energy in eV'][0]
bohr2ang = physical_constants['Bohr radius'][0]*1e10

import pyscf
from pyscf import gto #basis sets
from pyscf import scf #electronic-structure methods

In [None]:
#build the molecule object
mol = gto.Mole()
mol.atom = '''O 0 0 0; H  0 1 0; H 0 0 1'''
mol.basis = '6-31G*'
mol.build()

In [None]:
#run a HF calculation
m = scf.RHF(mol)
print('E(HF) = %g' % m.kernel())

In [None]:
#Orbital energies, Mulliken population etc.
m.analyze()

In [None]:
#use different initial guesses - the default is 'minao' (superposition of atomic densities); '1e' is the one-electron (core Hamiltonian) guess
m.init_guess = '1e'
m.kernel()

In [None]:
#this initial guess is the Hueckel guess
m.init_guess = 'huckel'
EHF = m.kernel()

In [None]:
#an MP2 calculation can be run like this re-using the HF calculation
from pyscf import mp
mp2 = mp.MP2(m)
Ecorr = mp2.kernel()[0]
EMP2 = EHF + Ecorr
print('E(MP2) = {:3.8f}'.format(EMP2))


In [None]:
#number of atomic orbitals
nao = mol.nao_nr()
print(nao)

In [None]:
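#mol.nbas is the number of basis-function shells, not the number of MOs
print(mol.nbas)
#number of molecular orbitals = number of columns of the MO coefficient matrix
print(m.mo_coeff.shape[1])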

In [None]:
#write molden files for visualization purposes
#note that pyscf is not always writing these files correctly (particularly, the orbitals)
from pyscf.tools import molden

In [None]:
with open('H2Omo.molden', 'w') as f1:
    molden.header(mol, f1)
    molden.orbital_coeff(mol, f1, m.mo_coeff, ene=m.mo_energy, occ=m.mo_occ)

In [None]:
#write cube files for visualization purposes
#note that pyscf is not always writing these files correctly (particularly, the orbitals)
from pyscf.tools import cubegen
cubegen.density(mol, 'h2o_den.cube', m.make_rdm1())

## Visualization

You may use orbkit or VMD for the visualization. Orbkit has the advantage that it provides another API and runs directly in your ipython notebook. The disadvantage is that its visualization requires mayavi, and I have not had much success so far in getting mayavi to work on python3 systems. You may try nevertheless, as orbkit is a useful program that computes densities quite nicely, and I included some example code below. At this point, however, we will use VMD.

VMD can be downloaded from here:
https://www.ks.uiuc.edu/Research/vmd/

It is not entirely intuitive to use, but it is very powerful. It can even be used to generate movies and spectra, and it is particularly popular in the Molecular Dynamics (MD) community, where it is used to visualize large biomolecules.

The orbkit installation instructions can be found here:\
https://github.com/orbkit/orbkit

You also need mayavi, which you can get through either pip or conda:

`pip install mayavi` \
`conda install mayavi`

mayavi then needs to be enabled in the jupyter notebooks:

`jupyter nbextension install --py mayavi --user` \
`jupyter nbextension enable mayavi --user --py`

Here are some alternative ways of installing mayavi: \
https://docs.enthought.com/mayavi/mayavi/installation.html

## Orbkit - just as a reference if you are interested

In [None]:
from orbkit import atomic_populations as ap
from orbkit import read
from orbkit import analytical_integrals as ai

In [None]:
# Number of processes
numproc = 2

# Read data with ORBKIT
#provide a molden file in your directory
qc = read.main_read('LiH.molden',itype='molden',all_mo=True)

# Compute AO overlap matrix
#ao_overlap_matrix = ai.get_ao_overlap(qc.geo_spec,qc.geo_spec,qc.ao_spec,ao_spherical=qc.ao_spherical)
ao_overlap_matrix = ai.get_ao_overlap(qc.geo_spec,qc.geo_spec,qc.ao_spec)

# Compute MO overlap matrix
moom = ai.get_mo_overlap_matrix(qc.mo_spec,qc.mo_spec,ao_overlap_matrix,numproc=numproc)

# Compute electron number
# Diagonal of MO overlap matrix times MO occupation number
en = 0
for i in range(len(qc.mo_spec)):
    en += moom.diagonal()[i]*qc.mo_spec[i]['occ_num']
print('The analytical electron number is %.8f' % en)

# Calculate Mulliken charges
pop_mull = ap.mulliken(qc)
print('\nMulliken Charges')
for i in pop_mull['charge']:
    print('ORBKIT: %.8f' % i)

# Calculate Lowdin charges
pop_low = ap.lowdin(qc)
print('\nLowdin Charges')
for i in pop_low['charge']:
    print('ORBKIT: %.8f' % i)


In [None]:
from orbkit import options, main

In [None]:
options.filename = 'LiH.molden'
options.outputname = 'LiH'
options.otype = ['mayavi']
options.adjust_grid = [5,0.1]
options.calc_mo = 'homo-1:homo'

In [None]:
mo_list, mo_info = main.run_orbkit()

## An example: FeCO

FeCO is a transition-metal complex that has two very close-lying spin states: the triplet ground state (called "LS" for low-spin hereafter) and a quintet state ("HS" for high-spin). We will now optimize the geometry of both the LS and the HS compound and compute the energy difference between LS and HS at each of the geometries; the cells below show the procedure for the LS geometry. We will start off with Hartree-Fock.

In [None]:
#Set up the input
feco = gto.M(atom='''
            Fe 0.00, 0.00, 0.00
            C  0.00, 0.00, 1.70
            O  0.00, 0.00, 2.90''', basis='cc-pVDZ')
feco.build()

In [None]:
#run the HF calculation to check the input
hf_feco = scf.UHF(feco).run()

In [None]:
#the default is singlet, check the spin state
feco.pack()

In [None]:
feco_3 = feco.copy()
feco_3.spin = 2 #this is 2S = alpha_electrons - beta_electrons; we have two more alpha than beta
feco_3.build() #rebuild so that the new spin is applied consistently
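
The relation between `spin` (2S), the multiplicity and the number of alpha and beta electrons can be sketched as follows. This is only an illustration of the bookkeeping, not PySCF code; the helper `electron_counts` and the small table of atomic numbers are hypothetical.

```python
ATOMIC_NUMBERS = {"Fe": 26, "C": 6, "O": 8}


def electron_counts(symbols, spin, charge=0):
    """Return (n_alpha, n_beta, multiplicity) for spin = 2S = n_alpha - n_beta."""
    nelec = sum(ATOMIC_NUMBERS[s] for s in symbols) - charge
    if spin < 0 or spin > nelec or (nelec + spin) % 2 != 0:
        raise ValueError("spin = %d is not compatible with %d electrons" % (spin, nelec))
    n_alpha = (nelec + spin) // 2
    n_beta = nelec - n_alpha
    return n_alpha, n_beta, spin + 1


for spin in (0, 2, 4):
    n_alpha, n_beta, mult = electron_counts(["Fe", "C", "O"], spin)
    print("spin = {}: {} alpha, {} beta, multiplicity {}".format(spin, n_alpha, n_beta, mult))
```

FeCO has 40 electrons, so only even values of `spin` are possible: 0 (singlet), 2 (triplet), 4 (quintet) and so on.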

In [None]:
#hf_feco_3 = scf.UHF(feco_3).run()
hf_feco_3 = scf.UHF(feco_3).run(max_cycle=100)

In [None]:
#optimize the geometry
from pyscf.geomopt.berny_solver import optimize
opt_feco_3 = optimize(hf_feco_3)

In [None]:
#the optimized coordinates
opt_coords = opt_feco_3.atom_coords()
print(opt_coords)

In [None]:
#compare to experimental bond lengths
print('Fe-C distance: {:3.5f} a0'.format(abs(opt_coords[0,2]-opt_coords[1,2])))
print('C-O distance: {:3.5f} a0'.format(abs(opt_coords[1,2]-opt_coords[2,2])))
print('Experimental: Fe-C distance: {:3.5f} a0'.format(1.727/bohr2ang))
print('Experimental: C-O distance: {:3.5f} a0'.format(1.1586/bohr2ang))
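
The z-differences in the cell above give the bond lengths only because FeCO is placed along the z axis. A minimal sketch of the general case follows; it assumes coordinates given in Angstrom and converts them to Bohr with the CODATA 2018 value of the Bohr radius. `distance` and `angstrom_to_bohr` are hypothetical helper names, and the coordinates are the starting geometry from the input cell, not optimized values.

```python
import math

BOHR_IN_ANGSTROM = 0.529177210903  # CODATA 2018 Bohr radius in Angstrom


def angstrom_to_bohr(length):
    """Convert a length from Angstrom to Bohr (a0)."""
    return length / BOHR_IN_ANGSTROM


def distance(r1, r2):
    """Euclidean distance between two points given as (x, y, z)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))


# starting geometry of FeCO from the input cell, in Angstrom
fe, c, o = (0.0, 0.0, 0.0), (0.0, 0.0, 1.70), (0.0, 0.0, 2.90)
print('Fe-C distance: {:3.5f} a0'.format(angstrom_to_bohr(distance(fe, c))))
print('C-O distance: {:3.5f} a0'.format(angstrom_to_bohr(distance(c, o))))
```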

In [None]:
opt_feco_3.pack()
opt_feco_3.build()

In [None]:
#final total energy
LS = scf.UHF(opt_feco_3)
LS.max_cycle=200
LS.init_guess = 'huckel'
ELS = LS.kernel()
#get the quintet energy at the triplet geometry
feco_5 = opt_feco_3.copy()
feco_5.spin = 4 
feco_5.build()
HS = scf.UHF(feco_5)
HS.max_cycle=200
HS.init_guess = 'huckel'
EHS = HS.kernel()

In [None]:
#au2eV was defined at the top of the notebook; 1 eV corresponds to 8065.544 cm-1
print('LS-HS energy difference: {} Eh, {} eV, {} cm-1'.format(ELS-EHS,(ELS-EHS)*au2eV,(ELS-EHS)*au2eV*8065.544))

The experimental energy difference, including ZPE differences, is 1135 cm$^{-1}$. The pure electronic energy difference is about 450-500 cm$^{-1}$, with the LS state lying lower. UHF clearly favors the HS state: as a mean-field method it describes exchange between same-spin electrons correctly but neglects electron correlation, which biases it towards the state with more unpaired electrons.
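
To get a feeling for how small these energy differences are on the Hartree scale, here is a short conversion sketch. The conversion factors are the CODATA 2018 values; the function names are hypothetical.

```python
HARTREE_IN_EV = 27.211386245988         # 1 Eh in eV (CODATA 2018)
HARTREE_IN_WAVENUMBER = 219474.6313632  # 1 Eh in cm^-1 (CODATA 2018)


def hartree_to_ev(energy):
    return energy * HARTREE_IN_EV


def hartree_to_wavenumber(energy):
    return energy * HARTREE_IN_WAVENUMBER


def wavenumber_to_hartree(wavenumber):
    return wavenumber / HARTREE_IN_WAVENUMBER


# the experimental LS-HS gap quoted above
gap = wavenumber_to_hartree(1135.0)
print('1135 cm-1 = {:.6f} Eh = {:.4f} eV'.format(gap, hartree_to_ev(gap)))
```

A gap of this size is only a few millihartree, which is why the SCF convergence and the choice of method matter so much for the LS-HS ordering.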

We will now try this with a DFT approach. The DFT functionals that are available are listed here:
https://sunqm.github.io/pyscf/_modules/pyscf/dft/libxc.html

In [None]:
from pyscf import dft

fecoDFT_3 = opt_feco_3.copy()
fecoDFT_3.build()
fecoDFT_3 = dft.ROKS(fecoDFT_3)
fecoDFT_3.xc = 'tpss0'
#fecoDFT_3.xc = 'b3lyp'
fecoDFT_3.verbose=4
fecoDFT_3.max_cycle=200
#fecoDFT_3.init_guess='hcore'
fecoDFT_3.init_guess='huckel'
print(fecoDFT_3.kernel())

In [None]:
fecoDFT_3.xc = 'b3lyp'
opt_feco_3 = optimize(fecoDFT_3)

In [None]:
#print the final coordinates
opt_coords = opt_feco_3.atom_coords()
print(opt_coords)
#compare to experimental bond lengths
print('Fe-C distance: {:3.5f} a0'.format(abs(opt_coords[0,2]-opt_coords[1,2])))
print('C-O distance: {:3.5f} a0'.format(abs(opt_coords[1,2]-opt_coords[2,2])))
print('Experimental: Fe-C distance: {:3.5f} a0'.format(1.727/bohr2ang))
print('Experimental: C-O distance: {:3.5f} a0'.format(1.1586/bohr2ang))

fecoDFT_3 = opt_feco_3.copy()
fecoDFT_3.build()
fecoDFT_3 = dft.ROKS(fecoDFT_3)
fecoDFT_3.xc = 'tpss0'
#fecoDFT_3.xc = 'b3lyp'
fecoDFT_3.verbose=4
fecoDFT_3.max_cycle=200
#fecoDFT_3.init_guess='hcore'
fecoDFT_3.init_guess='huckel'
ELS = fecoDFT_3.kernel()

In [None]:
fecoDFT_5 = opt_feco_3.copy()
fecoDFT_5.spin = 4
fecoDFT_5.build()
fecoDFT_5 = dft.ROKS(fecoDFT_5)
fecoDFT_5.xc = 'tpss0'
#fecoDFT_5.xc = 'b3lyp'
fecoDFT_5.verbose=4
fecoDFT_5.max_cycle=200
#fecoDFT_5.init_guess='hcore'
fecoDFT_5.init_guess='huckel'
EHS = fecoDFT_5.kernel()

In [None]:
#au2eV was defined at the top of the notebook; 1 eV corresponds to 8065.544 cm-1
print('LS-HS energy difference: {} Eh, {} eV, {} cm-1'.format(ELS-EHS,(ELS-EHS)*au2eV,(ELS-EHS)*au2eV*8065.544))

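In [None]:
#We can also compute vibrational frequencies.
#The Hessian has to come from a converged mean-field object of the molecule of interest
#(`mol` is the water molecule from the beginning of the notebook).
#Assumption: analytic Hessians are available for unrestricted (UKS) but not for
#restricted open-shell (ROKS) objects, so the triplet is recomputed with UKS/B3LYP,
#the functional used in the geometry optimization.
from pyscf.hessian import thermo
feco_uks = dft.UKS(fecoDFT_3.mol)
feco_uks.xc = 'b3lyp'
feco_uks.max_cycle = 200
feco_uks.kernel()
hess = feco_uks.Hessian().kernel()
print(hess)

In [None]:
freq_info = thermo.harmonic_analysis(feco_uks.mol, hess)
print(freq_info['freq_wavenumber'])
print(freq_info)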

# Task 1

Locate the experimental geometry of formaldehyde, H$_2$CO, on the CCCBDB database:\
https://cccbdb.nist.gov/exp1x.asp

Calculate the total energy using the
1. cc-pVDZ,
2. cc-pVTZ,
3. cc-pVQZ,
4. aug-cc-pVDZ,
5. aug-cc-pVTZ,
6. aug-cc-pVQZ

basis sets and the HF, MP2 and DFT (B3LYP) methods. Plot the total energy for the different basis sets for each of the three methods, and upload this plot to moodle. Check that the trends in your results are consistent with what you expect when the basis set is enlarged.

# Task 2
Perform a frequency analysis using B3LYP/cc-pVTZ for H$_2$CO. Obtain the peak intensities $I$ from the CCCBDB database. Generate a plot of the spectrum using Lorentzian broadening of the peaks with a width $\Gamma$ of your choice:
\begin{align}
L(\omega-\omega_0) = I \cdot \frac{\Gamma}{(\omega-\omega_0)^2+\Gamma^2} 
\end{align}

Generate a spectrum of the experimental frequencies from CCCBDB and compare. Upload your plots to moodle.
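
A minimal sketch of the broadening step, assuming a Lorentzian of the form $I \cdot \Gamma / ((\omega-\omega_0)^2+\Gamma^2)$. The peak positions and intensities below are made-up placeholders, not formaldehyde data, and the function names are hypothetical; the plotting itself is left to you.

```python
def lorentzian(omega, omega0, intensity, gamma):
    """Lorentzian line centered at omega0 with width parameter gamma."""
    return intensity * gamma / ((omega - omega0) ** 2 + gamma ** 2)


def spectrum(grid, peaks, gamma):
    """Sum the broadened lines of all (omega0, intensity) pairs on a frequency grid."""
    return [sum(lorentzian(w, w0, inten, gamma) for w0, inten in peaks) for w in grid]


# placeholder peaks (position in cm^-1, intensity), NOT formaldehyde values
peaks = [(1000.0, 1.0), (1500.0, 0.5)]
grid = [500.0 + 2.0 * k for k in range(751)]  # 500 to 2000 cm^-1 in steps of 2
values = spectrum(grid, peaks, gamma=10.0)
print('maximum of the broadened spectrum: {:.4f}'.format(max(values)))
```

With this form, the line has its maximum $I/\Gamma$ at $\omega_0$ and drops to half of that at $\omega_0 \pm \Gamma$, so $\Gamma$ is the half width at half maximum.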



# Psi4

--- with contributions from Lucas E. Aebersold ---

See http://psicode.org/psi4manual/master/index.html

Psi4 can be run directly or through its API. An API is an Application Programming Interface; here it passes your ipython commands on to the Psi4 program, which is written in C++. You will learn about other APIs when we look at machine learning. You first need to install the program; the route that worked best for me was

`conda create -n p4env python=3.7 psi4 psi4-rt -c psi4/label/dev -c psi4` 

This will generate a specific conda environment for Psi4, as it heavily relies on specific versions of some modules (e.g., numpy). To run Psi4, you first need to activate the environment in the terminal you started jupyter from:

`conda activate p4env`

Most likely you need to install a few additional libraries into that environment for the API to work with jupyter

`conda install --yes matplotlib`\
`conda install --yes ipykernel`\
`python -m ipykernel install --user --name p4env`

To check if the `p4env` is available to jupyter, execute

`jupyter kernelspec list`

and you should see the python3 and p4env kernels.
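
Once the notebook is running, you can check from within a cell which interpreter the kernel actually uses. The sketch below assumes that a conda environment lives in a directory named after the environment, so the last component of `sys.prefix` should read `p4env`; `environment_name` is a hypothetical helper.

```python
import os
import sys


def environment_name(prefix=None):
    """Return the last path component of the interpreter prefix (the conda environment folder)."""
    prefix = sys.prefix if prefix is None else prefix
    return os.path.basename(os.path.normpath(prefix))


print('Interpreter:', sys.executable)
print('Environment:', environment_name())
```

If the printed environment is not `p4env`, select the `p4env` kernel in the notebook's Kernel menu before importing psi4.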


In [None]:
import psi4
from numpy import *
import os, sys
from scipy.linalg import eigh, eig, inv
from scipy.constants import physical_constants
au2eV = physical_constants['Hartree energy in eV'][0]
bohr2ang = physical_constants['Bohr radius'][0]*1e10

If you wish to redirect the output to a file, use `psi4.core.set_output_file('output.dat', False)`, where you replace the filename (and directory) as you wish. Otherwise, everything is printed to the screen, as we will do here.

In [None]:
#! Sample HF/cc-pVDZ H2O Computation

psi4.set_memory('500 MB')

#this builds the molecule - here using a z-matrix, but you may also use xyz coordinates
h2o = psi4.geometry("""
O
H 1 0.96
H 1 0.96 2 104.5
""")

#perform a HF calculation with the cc-pVDZ basis set
EHF = psi4.energy('scf/cc-pvdz')
print('EHF is {} Eh'.format(EHF))

In [None]:
psi4.set_options({'reference': 'rhf','basis': 'cc-pvdz'})
scf_e, scf_wfn = psi4.energy('SCF', return_wfn=True)
print(asarray(scf_wfn.molecule().geometry()))

In [None]:
print(scf_e,EHF)

In [None]:
psi4.optimize('scf/cc-pvdz', molecule=h2o)

In [None]:
#print geometry - optimize() updates the geometry of h2o in place
print(asarray(h2o.geometry()))

In [None]:
#frequency analysis
scf_e, scf_wfn = psi4.frequency('scf/cc-pvdz', molecule=h2o, return_wfn=True)

In [None]:
h2o_freqs = scf_wfn.frequencies().to_array() 
print(h2o_freqs)

In [None]:
#let's try a CI calculation on this
ECISD,wfn = psi4.energy('cisd',return_wfn=True)
ECISDT,wfn = psi4.energy('cisdt',return_wfn=True)

In [None]:
#ECISDT = psi4.energy('cisdt')
print('ECISD is {} Eh and ECISDT is {} Eh'.format(ECISD,ECISDT))

Below you can find an example of how to use Psi4 in your own SCF program, using a DIIS solver:

In [None]:
def diag_F(F, norb):
    Fp = dot(A, dot(F, A))
    e, Cp = linalg.eigh(Fp)
    C = dot(A, Cp)
    C_occ = C[:, :norb]
    P = einsum('pi,qi->pq', C_occ, C_occ)
    return (C, P, e)
def formG(P):
    J = einsum('pqrs,rs->pq', Vee, P, optimize=True)
    K = einsum('prqs,rs->pq', Vee, P, optimize=True)
    G = 2*J - K
    return G
def diis_xtrap(F_list, diis_resid):

    B_dim = len(F_list) + 1
    B = empty((B_dim, B_dim))
    B[-1, :] = -1
    B[ :,-1] = -1
    B[-1,-1] = 0
    for i in range(len(F_list)):
        for j in range(len(F_list)):
            B[i,j] = sum(diis_resid[i]*diis_resid[j])

    rhs = zeros((B_dim))
    rhs[-1] = -1

    coeff = linalg.solve(B, rhs)

    F_diis = zeros_like(F_list[0])
    for ix in range(coeff.shape[0] - 1):
        F_diis += coeff[ix]*F_list[ix]

    return F_diis
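
To see what the linear system in `diis_xtrap` does, here is a small self-contained sketch with two made-up residual vectors. It rebuilds only the coefficient part with an explicit `numpy` namespace; `diis_coefficients` is a hypothetical helper, and the residuals are chosen such that they cancel exactly when averaged.

```python
import numpy as np


def diis_coefficients(residuals):
    """Solve the DIIS equations and return the extrapolation coefficients."""
    n = len(residuals)
    B = np.empty((n + 1, n + 1))
    B[-1, :] = -1.0
    B[:, -1] = -1.0
    B[-1, -1] = 0.0
    for i in range(n):
        for j in range(n):
            # overlap of residual i with residual j
            B[i, j] = np.sum(residuals[i] * residuals[j])
    rhs = np.zeros(n + 1)
    rhs[-1] = -1.0
    # the last entry of the solution is the Lagrange multiplier, which is dropped
    return np.linalg.solve(B, rhs)[:-1]


# made-up residuals of two iterations
r1 = np.array([1.0, 0.0])
r2 = np.array([-1.0, 0.0])
coeff = diis_coefficients([r1, r2])
print(coeff, coeff.sum())
```

The coefficients sum to one (this is the constraint enforced by the last row of `B`), and the combination `coeff[0]*r1 + coeff[1]*r2` has the smallest possible norm, here zero.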

In [None]:
bond_dist = 1.4632*bohr2ang
cmpd = 'HeH'
# here is how we define the geometry
mol = psi4.geometry("""
        He 
        H 1 {: .5f}
        """.format(bond_dist))

# set charge to positive 1
#mol.set_molecular_charge(0)
mol.set_molecular_charge(1)
mol.set_multiplicity(1)

#the_basis = '6-31g'
#the_basis = 'sto-3g'
#the_basis = 'cc-pvdz'
the_basis = '6-31G**'
# set our calculation options

psi4.set_options({'guess':'core',
    'basis':'{}'.format(the_basis),
    'e_convergence':1e-8,
    'scf_type':'pk',
    'reference':'rhf'
    })


# our guess is the core guess, -> can also be sad (superposition of atomic densities)
# scf_type -> ERI algorithm, pk is default
# reference -> rhf, uhf, or rohf; HeH+ is closed-shell, so rhf matches the hand-written RHF loop below

# compute static 1e- and 2e- quantities in Psi4
# Class initialization

wfn = psi4.core.Wavefunction.build(mol,psi4.core.get_global_option('basis'))
# mints is the integral helper
mints = psi4.core.MintsHelper(wfn.basisset())

# the Smat is the atomic orbital overlap
Smat = asarray(mints.ao_overlap()) 
# number of basis functions, alpha orbitals -> rhf so just call alpha
#nbf = Smat.shape[0]
nbf = wfn.nso()
ndocc = wfn.nalpha() 

# Build core Hamiltonian
Tmat = asarray(mints.ao_kinetic())
Vmat = asarray(mints.ao_potential())
Hmat = Tmat + Vmat

# build the nasty two-electron repulsion integral
Vee = asarray(mints.ao_eri())

# Construct AO orthogonalization matrix A
# this is the Psi4 way, which is for symmetric orthog
# A = mints.ao_overlap()
# A.power(-0.5, 1.0e-16)
# A = asarray(A)

# get nuclear repulsion energy from Psi4
E_nuc = mol.nuclear_repulsion_energy() 
print(E_nuc)

# symmetric orthogonalization: A = S^(-1/2)
# we'll keep our way in here, it works the same
u, V = eigh(Smat)
U = diag(u**-0.5)
A = dot(V, dot(U, V.T))

# maximum scf iterations
maxiter = 40

# energy convergence criterion
E_conv = 1.0e-6
D_conv = 1.0e-4

# pre-iteration step
# scf & previous energy
SCF_E = 0.0
E_old = 0.0
# form core guess
C, P, epsilon = diag_F(Hmat, ndocc)
print(epsilon)

# trial and residual vector lists
F_list = []
R_list = []
diis_resid = []

print('Number of occupied orbitals {}'.format(ndocc))
print('Number of basis functions {}'.format(nbf))

print('==> Starting SCF Iterations <==\n')

# uncomment to write the initial wavefunction
# f = open('init_wfn.dat', 'w')
# for i in range(C.shape[0]):
#     for j in range(C.shape[1]):
#         print('{: 23.15f}'.format(C[i,j]), file=f)
# f.close()

for scf_iter in range(maxiter):
    # Build the Fock matrix
    # We will build the G matrix in a slightly different way, using
    # the einsum function
    F = Hmat + formG(P)

    # for the diis
    # A * (F*P*S - S*P*F) * A
    M = dot(F, dot(P, Smat)) - dot(Smat, dot(P, F))
    # here 
#   diis_r = dot(A, dot(M, A))
    diis_r = dot(A.T, dot(M, A))

    F_list.append(F)
    R_list.append(diis_r)

    SCF_E = sum(P*(Hmat + F)) + E_nuc

    dE = SCF_E - E_old

    dRMS = mean(diis_r**2)**0.5
    print('SCF Iteration {:3d}: Energy = {: 4.16f} dE = {: 1.5e} dRMS = {:1.5e}'.format(scf_iter+1, SCF_E, dE, dRMS))

    if (abs(dE) < E_conv) and (dRMS < D_conv):
        break
    E_old = SCF_E

    if scf_iter >= 2:
        F = diis_xtrap(F_list, R_list)

    C, P, epsilon = diag_F(F, ndocc)

    if scf_iter == maxiter - 1:
        psi4.core.clean()
        raise Exception("Maximum number of SCF iterations exceeded.")

print('\nSCF Converged.')
print('Final RHF Energy: {: .8f} [Eh]'.format(SCF_E))

# print the final wavefunction
# f = open('final_wfn.dat', 'w')
# for i in range(C.shape[0]):
#     for j in range(C.shape[1]):
#         print('{: 23.15f}'.format(C[i,j]), file=f)
# f.close()

SCF_E_psi, wfn = psi4.energy('SCF', return_wfn=True)

psi4.compare_values(SCF_E_psi, SCF_E, 6, 'SCF Energy')

# remove any molden file with our compound's name if it exists
os.system('rm -f {}.rhf.molden'.format(cmpd))
# create new molden file
psi4.molden(wfn, '{}.rhf.molden'.format(cmpd))

# FCI energy for comparison (comment out to skip)
FCI_E_psi = psi4.energy('FCI')
print(FCI_E_psi)
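
The orthogonalization matrix `A` used in the SCF program above is the symmetric (Loewdin) choice $A = S^{-1/2}$, which satisfies $A^T S A = 1$. A minimal numpy sketch that checks this property on a made-up 2x2 overlap matrix; `symmetric_orthogonalizer` is a hypothetical helper name.

```python
import numpy as np


def symmetric_orthogonalizer(S):
    """Return S^(-1/2) built from the eigendecomposition of the overlap matrix S."""
    eigvals, eigvecs = np.linalg.eigh(S)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T


# made-up overlap matrix of two normalized basis functions with overlap 0.4
S = np.array([[1.0, 0.4],
              [0.4, 1.0]])
A = symmetric_orthogonalizer(S)
print(A.T @ S @ A)
```

The printed matrix should be the identity up to rounding errors, which is the condition the Fock matrix transformation `dot(A, dot(F, A))` in `diag_F` relies on.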

# Optional task

Calculate CISD energies for the basis sets in Task 1 and add these to your energy vs. basis plot.