Welcome to the PySCF tutorial!

This component of the tutorial is primarily concerned with computational chemistry calculations and will not cover the theoretical details of the methods utilized [i.e., Hartree--Fock (HF), Kohn--Sham Density Functional Theory (KS-DFT), Møller--Plesset Perturbation Theory (MP), Coupled Cluster Theory (CC), etc.]. For specifics regarding these methods, please see [TODO].

We will go through the steps necessary to carry out a variety of relevant calculations. These include:

* Single Point Energies
  * Ground State
    1. atomization energies and bond dissociation energies
    2. reaction energies
    3. electron affinities, ionization potentials, and proton affinities (vertical and adiabatic)
    4. isomerization (relative) energies
    5. binding energies of non-covalent complexes (weak/hydrogen/rare-gas)
    6. barrier heights
  * Excited State (Time-Dependent)
    1. TD-HF/TD-DFT
    2. Wavefunction Stability Analysis
    3. Visualization?

* Forces and Geometry Optimizations
  * Ground State
      1. Forces at the equilibrium structure (to show that they vanish)
      2. Forces at a non-equilibrium structure
      3. Transition-state (TS) search
      4. Geometry optimization
  * Excited State (Time-Dependent)
      1. can this be done?

* Frequencies
  * Ground State
    1. vibrational modes
    2. free-energy
    3. Geometry optimization local minimum
  * Excited State (Time-Dependent)
    1. can this be done?

Furthermore, we will demonstrate the effect of the most important computational settings on the final calculated results using a handful of the above calculations:
  * Relevant Settings
    1. basis set (minimal vs. double vs. triple vs. quadruple vs. basis set limit) and basis set incompleteness error (BSIE)
    2. DFT integration grid (coarse vs. fine, etc.)
    3. method (HF/DFT/MP2/CC/etc.)
    4. counterpoise corrections (an extension of the basis set category), including a discussion of BSIE/BSSE
    5. integral thresholds and convergence criteria
    6. Cartesian vs. spherical functions?
    7. restricted vs unrestricted

---

In order to get started, we need to import the pyscf module, which is accomplished by the command in the cell below. If this command fails, please see the [quick setup guide](http://sunqm.github.io/pyscf/tutorial.html#quick-setup) or the [detailed installation instructions](http://sunqm.github.io/pyscf/install.html).

In [1]:
import pyscf

The next step is to define the molecule, which requires importing the gto (gaussian type orbital) submodule:

In [2]:
from pyscf import gto

# gto.Mole object

This Markdown cell will go through the steps involved in setting up a molecule, including a description of the eight most important attributes. The following Code cell combines the content explained here into a useful sample input. The more advanced attributes are listed (commented) at the bottom of the code cell and the expert user can find more information about them [here](http://sunqm.github.io/pyscf/gto.html).

The molecule used in this demonstration is the water dimer (two hydrogen-bonded water molecules) from the widely-used S22 dataset of non-covalent interactions, and can be found [here](http://www.begdb.com/index.php?action=oneMolecule&state=show&id=82). The molecule is pictured here:

<img style="float: left;" src="images/water_dimer.jpg"><br clear="all" />

The first step is to create the molecule object:
```
mol=gto.Mole()
```

## atom attribute

PySCF allows for a variety of [molecular input formats](http://sunqm.github.io/pyscf/gto.html#geometry), but the one that is most suitable for the current example is the use of triple-quotes. This allows one to simply copy and paste the geometry from the BEGDB website without further modification:
```
mol.atom="""
O  -1.551007  -0.114520   0.000000
H  -1.934259   0.762503   0.000000
H  -0.599677   0.040712   0.000000
O   1.350625   0.111469   0.000000
H   1.680398  -0.373741  -0.758561
H   1.680398  -0.373741   0.758561"""
```
Another useful way to set up a molecule is to read the coordinates from an existing file. The file can either be an XYZ (see geom/water_dimer.xyz), contain only the coordinates (see geom/water_dimer.mol), or contain the coordinates prepended by a line containing the charge and spin separated by a space (see geom/water_dimer.qc). As long as the file is in one of the three aforementioned formats, PySCF can automatically detect and parse the file. Reading in a molecule file can be accomplished via:
```
mol.atom=read_file('water_dimer.mol')
```
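The snippet above calls `read_file`, which is not imported anywhere in this tutorial. As a minimal sketch of what such a helper could do for the XYZ case, the hypothetical function `read_xyz_geometry` below (an illustration only, not a PySCF function) skips the atom-count and comment lines of a standard XYZ file and returns the remaining coordinate lines as a single string. The file name and the three-atom test geometry are assumptions made for the example.

```python
import os
import tempfile


def read_xyz_geometry(path):
    """Return the coordinate block of a standard XYZ file as one string.

    Hypothetical helper for illustration; it is not part of PySCF.
    Assumes line 1 holds the atom count and line 2 is a comment line.
    """
    with open(path) as handle:
        lines = [line.strip() for line in handle]
    natoms = int(lines[0])
    # Keep the first `natoms` non-empty lines after the two header lines.
    coords = [line for line in lines[2:] if line][:natoms]
    if len(coords) != natoms:
        raise ValueError("expected %d atoms, found %d" % (natoms, len(coords)))
    return "\n".join(coords)


# Write a small XYZ file (the first water monomer of the dimer) and read it back.
xyz_text = """3
water monomer
O  -1.551007  -0.114520   0.000000
H  -1.934259   0.762503   0.000000
H  -0.599677   0.040712   0.000000
"""
with tempfile.TemporaryDirectory() as tmpdir:
    xyz_path = os.path.join(tmpdir, "water.xyz")
    with open(xyz_path, "w") as handle:
        handle.write(xyz_text)
    geometry = read_xyz_geometry(xyz_path)

print(geometry)
```

The returned string has the same layout as the triple-quoted geometry shown earlier, so it could be assigned directly with `mol.atom = geometry`.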

## basis attribute

The simplest way to set up the basis is to specify the same basis set for all atoms:
```
mol.basis='aug-cc-pVDZ'
```
In this case, the aug-cc-pVDZ basis set is used for all of the hydrogen and oxygen atoms.

Alternatively, one can use a different basis set for different elements:
```
mol.basis={'H': 'cc-pVDZ', 'O': 'aug-cc-pVDZ'}
```
In this case, the cc-pVDZ basis set is used for all of the hydrogen atoms, while the aug-cc-pVDZ basis set is used for all of the oxygen atoms.

One can also use different basis sets for different atoms of the same element by labeling the atoms with integers:
```
mol.atom="""
O1  -1.551007  -0.114520   0.000000
H2  -1.934259   0.762503   0.000000
H3  -0.599677   0.040712   0.000000
O4   1.350625   0.111469   0.000000
H5   1.680398  -0.373741  -0.758561
H6   1.680398  -0.373741   0.758561"""
mol.basis={'O1': 'cc-pVDZ', 'H2': 'cc-pVDZ', 'H3': 'aug-cc-pVDZ', 'O4': 'aug-cc-pVDZ', 'H5': 'cc-pVDZ', 'H6': 'cc-pVDZ'}
```
In this case, the cc-pVDZ basis set is used for all of the atoms except for the two involved in the hydrogen bond (H3 and O4), which are assigned the more diffuse aug-cc-pVDZ basis set.

Finally, one common way of reducing basis set superposition error (BSSE) is to use counterpoise corrections by employing ghost functions on ghost atoms. A useful guide to BSSE can be found [here](http://vergil.chemistry.gatech.edu/notes/cp.pdf). In PySCF, counterpoise corrections can be applied via simple modifications to the atom and basis attributes.

For example, running the water dimer calculation with ghost functions on the second water monomer can be accomplished via:
```
mol.atom="""
O  -1.551007  -0.114520   0.000000
H  -1.934259   0.762503   0.000000
H  -0.599677   0.040712   0.000000
GhostO   1.350625   0.111469   0.000000
GhostH   1.680398  -0.373741  -0.758561
GhostH   1.680398  -0.373741   0.758561"""
mol.basis={'H': 'aug-cc-pVDZ', 'O': 'aug-cc-pVDZ', 'GhostH': gto.basis.load('aug-cc-pVDZ','H'), 'GhostO': gto.basis.load('aug-cc-pVDZ','O')}
```
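A counterpoise-corrected interaction energy needs each monomer evaluated in the full dimer basis, so two ghosted geometries are required (one per monomer). Rather than editing the labels by hand, the relabeling can be scripted. The sketch below uses a hypothetical helper, `ghost_fragment` (not part of PySCF), that simply prefixes the chosen atom labels with `Ghost`, following the labeling convention used in the example above. Atoms are numbered from 0 in the order they appear.

```python
def ghost_fragment(atom_block, ghost_indices):
    """Prefix the element label of the selected atoms with 'Ghost'.

    Hypothetical helper for illustration; it only manipulates the text
    of the geometry and does not call PySCF.
    """
    ghost_indices = set(ghost_indices)
    new_lines = []
    for index, line in enumerate(atom_block.strip().splitlines()):
        label, x, y, z = line.split()
        if index in ghost_indices:
            label = "Ghost" + label
        new_lines.append("%s  %s  %s  %s" % (label, x, y, z))
    return "\n".join(new_lines)


dimer = """
O  -1.551007  -0.114520   0.000000
H  -1.934259   0.762503   0.000000
H  -0.599677   0.040712   0.000000
O   1.350625   0.111469   0.000000
H   1.680398  -0.373741  -0.758561
H   1.680398  -0.373741   0.758561"""

# First monomer is real and the second is ghosted, then the other way around.
monomer_a_in_dimer_basis = ghost_fragment(dimer, [3, 4, 5])
monomer_b_in_dimer_basis = ghost_fragment(dimer, [0, 1, 2])

print(monomer_a_in_dimer_basis)
```

Each returned string can be assigned to `mol.atom` together with a basis dictionary that contains the `GhostH` and `GhostO` entries, as in the example above.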

Advanced users who are interested in using custom basis sets or basis sets that are not pre-defined in PySCF are directed [here](http://sunqm.github.io/pyscf/gto.html#input-basis). Useful resources for the latter include the [EMSL basis set exchange](https://bse.pnl.gov/bse/portal) and the [PSI4 basis set page](http://www.psicode.org/psi4manual/master/basissets_byfamily.html). PySCF can parse both NWChem and Gaussian94 format basis set files. Additionally, PySCF can read a file containing the desired basis set.

## cart attribute

The cart attribute determines whether the d and/or higher basis functions are taken to be *spherical* (i.e., d=5 functions, f=7 functions, g=9 functions, etc.) or *Cartesian* (i.e., d=6 functions, f=10 functions, g=15 functions, etc.). mol.cart=0 (default) specifies *spherical* functions and mol.cart=1 specifies *Cartesian* functions. If an existing basis set in PySCF is being used, the cart attribute is automatically set based on how the basis set itself was optimized. Thus, it is not necessary to define this attribute for most calculations.

There are two instances where specifying mol.cart is necessary, and this will pertain primarily to more advanced users:

1.  If either a custom basis set is being used or one that is not pre-defined in PySCF, it is important to determine the appropriate value of mol.cart and set it accordingly. For example, the [UGBS basis set](https://aip.scitation.org/doi/abs/10.1063/1.475959) is commonly used to determine absolute atomic energies. This basis set was optimized for use with spherical basis functions. Since UGBS is not pre-defined in PySCF, it is important to set mol.cart=0.

2.  If the user is interested in experimenting with the effect of using spherical vs. Cartesian functions for a given basis set, it is possible to override the default for pre-defined basis sets. For example, the Dunning basis sets (cc-pVXZ and aug-cc-pVXZ) are meant to be used with spherical functions (mol.cart=0), yet one can set mol.cart=1 to gauge the sensitivity of absolute and relative energies to this setting.
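The function counts quoted above follow from two simple formulas: a shell of angular momentum l contributes 2l+1 spherical functions or (l+1)(l+2)/2 Cartesian functions. The short sketch below tabulates both and then counts the basis functions for the water dimer, assuming the aug-cc-pVDZ contraction patterns [4s3p2d] for oxygen and [3s2p] for hydrogen. The helper names are ours and are not PySCF functions.

```python
def n_spherical(l):
    """Number of spherical functions in a shell of angular momentum l."""
    return 2 * l + 1


def n_cartesian(l):
    """Number of Cartesian functions in a shell of angular momentum l."""
    return (l + 1) * (l + 2) // 2


def count_functions(shells, cartesian=False):
    """Count basis functions for a contraction such as {'s': 4, 'p': 3, 'd': 2}."""
    per_shell = n_cartesian if cartesian else n_spherical
    return sum(n * per_shell("spdfg".index(name)) for name, n in shells.items())


for l, name in enumerate("spdfg"):
    print(name, n_spherical(l), n_cartesian(l))

# Assumed aug-cc-pVDZ contraction patterns: [4s3p2d] for O and [3s2p] for H.
oxygen = {"s": 4, "p": 3, "d": 2}
hydrogen = {"s": 3, "p": 2}

totals = {}
for kind, cartesian in (("spherical", False), ("Cartesian", True)):
    totals[kind] = (2 * count_functions(oxygen, cartesian)
                    + 4 * count_functions(hydrogen, cartesian))
    print(kind, totals[kind])
```

Only the d shells on oxygen differ between the two conventions here, so the Cartesian total exceeds the spherical total by four functions.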

## charge attribute

The charge attribute sets the charge of the molecule, and is 0 for neutral systems (default), 1 for cations, 2 for dications, -1 for anions, -2 for dianions, etc.

## ecp attribute

When heavy elements are present in a molecule, standard Gaussian basis sets are often both insufficient (because they cannot capture the relativistic effects that become important in heavier elements) and impractical (because, as one descends the rows of the periodic table, Gaussian basis sets tend to become very large and heavily contracted due to the increased number of electrons). For this reason, a great deal of work has been devoted to developing effective core potentials (ECPs), which replace the core electrons around a nucleus with pseudopotentials.

## spin attribute

The spin attribute simply sets the number of unpaired electrons. By default, a closed-shell system is assumed and the value of mol.spin=0. For open-shell systems, the spin must be set. For example, boron has a single unpaired electron, so mol.spin=1. Carbon has two unpaired electrons, so mol.spin=2. And finally, nitrogen has three unpaired electrons, so mol.spin=3. The same concept applies to open-shell molecules.
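The charge and spin settings have to be consistent with each other: the number of electrons is the sum of the atomic numbers minus the charge, and the number of unpaired electrons must have the same parity as the electron count. The sketch below checks this with two hypothetical helpers (`count_electrons` and `spin_is_consistent`, neither of which is a PySCF function) and a small hand-written table of atomic numbers.

```python
# Atomic numbers for the elements mentioned in this tutorial section.
ATOMIC_NUMBERS = {"H": 1, "B": 5, "C": 6, "N": 7, "O": 8}


def count_electrons(symbols, charge=0):
    """Total number of electrons for a list of element symbols and a charge."""
    return sum(ATOMIC_NUMBERS[s] for s in symbols) - charge


def spin_is_consistent(nelec, spin):
    """True if `spin` unpaired electrons are possible with `nelec` electrons.

    The unpaired count must lie between 0 and nelec and leave an even
    number of electrons to be paired.
    """
    return 0 <= spin <= nelec and (nelec - spin) % 2 == 0


water_dimer = ["O", "H", "H", "O", "H", "H"]

nelec_neutral = count_electrons(water_dimer, charge=0)
nelec_cation = count_electrons(water_dimer, charge=1)

print(nelec_neutral, spin_is_consistent(nelec_neutral, 0))  # closed shell
print(nelec_cation, spin_is_consistent(nelec_cation, 0))    # cation cannot be closed shell
print(spin_is_consistent(count_electrons(["N"]), 3))        # nitrogen atom, three unpaired
```

For the neutral water dimer this gives 20 electrons, in agreement with the `num electrons` line that PySCF prints when the molecule is built.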

## unit attribute

The unit attribute can be either set to 'Angstrom' (default) or 'Bohr'. This entirely depends on the coordinates specified in mol.atom.
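If coordinates are available in one unit but needed in the other, the conversion is a single multiplication. The sketch below assumes the CODATA 2010 value of the Bohr radius, 0.52917721092 Angstrom; this constant is our assumption, chosen because it reproduces the Bohr coordinates that PySCF 1.4.2 prints in the output of this notebook. The function names are illustrative and not part of PySCF.

```python
# Assumed Bohr radius in Angstrom (CODATA 2010 value).
BOHR_IN_ANGSTROM = 0.52917721092


def angstrom_to_bohr(value):
    """Convert a length from Angstrom to Bohr."""
    return value / BOHR_IN_ANGSTROM


def bohr_to_angstrom(value):
    """Convert a length from Bohr to Angstrom."""
    return value * BOHR_IN_ANGSTROM


# x and y coordinates of the first oxygen and first hydrogen of the water dimer.
for coordinate in (-1.551007, -0.114520, -1.934259, 0.762503):
    print("%12.6f Angstrom = %16.12f Bohr" % (coordinate, angstrom_to_bohr(coordinate)))
```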

## verbose attribute

The verbose attribute controls the print level for the Mole object. Setting mol.verbose=0 will print little to no information, while setting mol.verbose=4 prints useful information about the basis and number of basis functions. Users who want to see detailed information should set mol.verbose=10.

The final step is to build the molecule object:
```
mol.build()
```

In [3]:
#The code below combines the content introduced in the Markdown cell above into a useable input.

mol=gto.Mole()

mol.atom="""
O  -1.551007  -0.114520   0.000000
H  -1.934259   0.762503   0.000000
H  -0.599677   0.040712   0.000000
O   1.350625   0.111469   0.000000
H   1.680398  -0.373741  -0.758561
H   1.680398  -0.373741   0.758561"""
mol.basis='aug-cc-pVDZ'
mol.cart=False
mol.charge=0
mol.ecp={}
mol.spin=0
mol.unit='Angstrom'
mol.verbose=4

mol.build()

#ADVANCED ATTRIBUTES -- SEE MANUAL
#mol.groupname=
#mol.incore_anyway=
#mol.irrep_id=
#mol.irrep_name=
#mol.max_memory=
#mol.nucmod=
#mol.output=
#mol.symm_orb=
#mol.symmetry=
#mol.symmetry_subgroup=
#mol.topgroup=

System: ('Linux', 'gund', '4.13.0-36-generic', '#40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018', 'x86_64', 'x86_64')  Threads 1
Python 2.7.12 (default, Nov 20 2017, 18:23:56) 
[GCC 5.4.0 20160609]
numpy 1.12.1  scipy 0.19.0
Date: Thu Mar  8 22:04:32 2018
PySCF version 1.4.2
PySCF path  /home/narbe/pyscf/pyscf
GIT ORIG_HEAD e02cd77709ad27dd6f216c61695ac08626be4f4e
GIT HEAD      ref: refs/heads/dev
GIT dev branch  1520d69620cbf726e21f732f9c78f5e8057c8b03

[INPUT] VERBOSE 4
[INPUT] num atoms = 6
[INPUT] num electrons = 20
[INPUT] charge = 0
[INPUT] spin (= nelec alpha-beta = 2S) = 0
[INPUT] symmetry False subgroup None
[INPUT]  1 O     -1.551007000000  -0.114520000000   0.000000000000 AA   -2.930978447283  -0.216411435785   0.000000000000 Bohr
[INPUT]  2 H     -1.934259000000   0.762503000000   0.000000000000 AA   -3.655219763975   1.440921839159   0.000000000000 Bohr
[INPUT]  3 H     -0.599677000000   0.040712000000   0.000000000000 AA   -1.133225293201   0.076934529983   0.00000000

Warn: Ipython shell catchs sys.args


<pyscf.gto.mole.Mole at 0x7f033f9646d0>

---

After the molecule has been built, it is necessary to follow a similar procedure to initialize the SCF object [Hartree--Fock (HF) or Kohn--Sham Density Functional Theory (KS-DFT)]. For most users, the relevant SCF classes will be RHF (Restricted Hartree--Fock), UHF (Unrestricted Hartree--Fock), RKS (Restricted Kohn--Sham), and UKS (Unrestricted Kohn--Sham). RHF and RKS can also be used to run ROHF (Restricted Open-Shell Hartree--Fock) and ROKS (Restricted Open-Shell Kohn--Sham) as long as an open-shell molecule is defined.

An overview of how to use an SCF class will be demonstrated with RHF.

In order to create an SCF object, it is necessary to import the scf (self-consistent field) submodule:

In [4]:
from pyscf import scf

# scf.RHF object

The first step is to create the method object (RHF in this case):
```
mf=scf.RHF(mol)
```
In this example, the object is named mf (mean-field). The input object is the molecule (mol) that was discussed previously.
Note: The name for the mean-field object can be set to anything the user desires (e.g., ```calc=scf.RHF(mol)```).

## conv_tol attribute
In order for an SCF calculation to converge, PySCF requires two criteria to be met. The first is controlled by the conv_tol attribute, which sets the threshold on the change in the SCF energy between two successive cycles.

## conv_tol_grad attribute
The second criterion is controlled by the conv_tol_grad attribute, which sets the threshold on the root-mean-square of the orbital gradient. The orbital gradient is a vector of length nocc\*nvirt.
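To make the two criteria concrete, the sketch below combines them into a single check. It mirrors the description given in this tutorial (an energy change plus the root-mean-square of the orbital gradient) and is not taken from PySCF's source, whose internal test may differ in detail. The function names and the numbers in the usage lines are made up purely for illustration.

```python
import math


def rms(values):
    """Root-mean-square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def scf_converged(e_previous, e_current, orbital_gradient, conv_tol, conv_tol_grad):
    """Return True only if both convergence criteria are satisfied."""
    energy_ok = abs(e_current - e_previous) < conv_tol
    gradient_ok = rms(orbital_gradient) < conv_tol_grad
    return energy_ok and gradient_ok


# Illustrative (made-up) values: the energy change is small in both cases,
# but only the first gradient is below the threshold.
small_gradient = [1.0e-6, -2.0e-6, 5.0e-7]
large_gradient = [1.0e-3, -2.0e-3, 5.0e-4]

print(scf_converged(-1.0, -1.0 - 5e-10, small_gradient, 1e-9, 1e-5))
print(scf_converged(-1.0, -1.0 - 5e-10, large_gradient, 1e-9, 1e-5))
```

The second call shows why both criteria matter: a flat energy alone does not guarantee that the orbitals have stopped changing.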


## direct_scf_tol attribute
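The direct_scf_tol attribute sets the integral screening threshold used in direct SCF: two-electron integral contributions estimated to be smaller than this value are neglected.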


## init_guess attribute
A good initial guess is vital for an efficient SCF procedure. There are four initial guess options in PySCF:
1. minao
2. atom
3. 1e
4. chkfile

The first option (```mf.init_guess='minao'```) is the default option and generates an initial guess for the density matrix based on the ANO basis, and then projects this onto the basis set specified in ```mol.basis```.

The second option (```mf.init_guess='atom'```) generates an initial guess based on a superposition of atomic densities.

The third option (```mf.init_guess='1e'```) starts from a zero density matrix (P=0), so the first Fock matrix reduces to the one-electron (core) Hamiltonian.

The last option (```mf.init_guess='chkfile'```) is somewhat advanced and can read in an existing density matrix from disk. Further information on this option can be found [here](http://sunqm.github.io/pyscf/scf.html#pyscf.scf.hf.SCF.init_guess_by_chkfile).

## max_cycle attribute
The max_cycle attribute simply sets the maximum number of SCF cycles that should be carried out before the calculation terminates. The default value is 50, but for systems that are notoriously difficult to converge, the value should be increased to 100 or even 1000.


## max_memory attribute
The max_memory attribute determines the maximum amount of memory that PySCF is allowed to utilize during the SCF procedure. This should be set by the user based on the memory limitations of the computer/server utilized.

## verbose attribute
The verbose attribute controls the print level for the RHF object. Setting mf.verbose=0 will print only the SCF energy, while setting mf.verbose=4 prints useful information about SCF settings as well as the SCF energy per iteration, HOMO/LUMO energies, and convergence metrics. Users who want to see detailed information should set mf.verbose=10, which provides additional information such as molecular orbital energies.

The final step is to run the SCF calculation:
```
mf.kernel()
```

In [16]:
mf=scf.RHF(mol)

mf.conv_tol=1e-12
mf.conv_tol_grad=1e-8
mf.direct_scf_tol=1e-13
mf.init_guess='1e'
mf.max_cycle=100
mf.max_memory=8000
mf.verbose=0

mf.kernel()

#ADVANCED ATTRIBUTES -- SEE MANUAL
#mf.chkfile=
#mf.conv_check=
#mf.damp=
#mf.diis=
#mf.diis_file=
#mf.diis_space=
#mf.diis_space_rollback=
#mf.diis_start_cycle=
#mf.direct_scf=
#mf.level_shift=

-152.08859934750342

---

In order to run an unrestricted Hartree--Fock calculation, the steps taken above can be followed, except the initial object should be initiated as:
```
mf=scf.UHF(mol)
```

Density 

In [6]:
#TODO
#bare minimum is: atom, basis, w/o it should crash
#allow mol.atom to take something like @H or !H or something to indicate Ghost
#mol.cart needs to be set based on the basis set
#be able to read mol.atom from any file, and the file can either be just the XYZ, be an actual XYZ, or be q-chem-like
#one should just be able to copy and paste from an XYZ w/o the ;. that is not commonly or ever present
#D3 HAS to be implemented, and VV10 sped up, and maybe D2, D3(BJ), and the other newer dispersion corrections
#what about fragment monomers?
#just specify a single basis for everything: mol.basis={'aug-cc-pVDZ'}
#be able to take input of Gaussian94 basis
#be able to read basis from file on disk
#maybe for mol.cart instead of 0 and 1, do spherical and Cartesian
#there should be a verbose level for Mole that doesn't print basis set coefficients and similar details
#lin dep should be automatically removed!!!
#shouldn't extra cycle for SCF (conv_check) only be initiated IF there is a level shift??
#some properties should be computed for free and by default -- atomic charges, S^2, multipole moments
#stability analysis / diis/gdm
#should we be using the norm of the orbital gradient for convergence of the rms
#printing HOMO/LUMO on each cycle is unnecessary or at least we should be able to turn it off
#we should print like orbital energies at end, split into occ and virt
#what exactly is this minao guess?