# Basis Sets - Defining Vector Spaces

Three questions have to be addressed before tackling an electronic
structure problem: Which computer code is best suited for a given
problem, which computational method will give the most accurate results
in a reasonable time, and what basis set offers the best compromise of
accuracy and efficiency? Throughout this course, you shall always be
using the same code (*Psi4*) - but you will get to try out some
of the different approaches discussed in the lecture. Before the first
practical example - applying the Hartree-Fock-Roothaan scheme (that you
have just treated in the lecture) - to the hydrogen atom, there remains
one issue to be resolved: What is the basis in which we want to expand
our wavefunction, which is described by the (in principle infinite)
expansion

$$
 \Psi(\mathbf{r}_1,\dots,\mathbf{r}_N) = \sum_j c_j \psi_j(\mathbf{r}_1,\dots,\mathbf{r}_N)
$$ (wf_expansion)

## One-Electron Wavefunctions: Slater-Type Orbitals

By defining a basis set, we define a *vector space* in which the
Schrödinger equation is to be solved - and we wish this space to be as
close as possible to the complete space that defines the accurate
solution. You have already seen that the Hartree-Fock scheme makes a
convenient (but not always accurate) approximation to $\Psi$, in that it
is assumed that one Slater determinant is enough to accurately describe
the problem. Therefore, in Hartree-Fock theory, the expansion {eq}`wf_expansion` reduces to:

$$
 \Psi(\mathbf{r}_1,\dots,\mathbf{r}_N) = \psi(\mathbf{r}_1,\dots,\mathbf{r}_N),
$$ (HF_wf)

where

$$ 
 \psi(\mathbf{r}_1,\dots,\mathbf{r}_N) = \det\left|\phi_1(\mathbf{r}_1),\dots,\phi_N(\mathbf{r}_N)\right|
$$ (slater_det)

is a Slater determinant to account for the antisymmetry requirement as
discussed in the preceding chapter, and the $\left\{\phi\right\}$ are
one-electron orbitals. Although an expression for the many-electron
wavefunction in terms of one-particle wavefunctions is now given, the
latter are not yet specified. An intuitive approach to the one-electron
orbitals may be based on the *LCAO* (Linear Combination of Atomic
Orbitals) theory, where one-particle molecular orbitals are formed from
one-particle atomic orbitals. This implies that $\phi_m(\mathbf{r}_m)$
will be expanded in terms of all *atomic one-particle orbitals* of the
system, a set of *atomic basis functions* 

$$
 \phi_m(\mathbf{r}_m) = \sum_n D_{mn} \chi_n(\mathbf{r}_m),
$$ (LCAO)

where the $\left\{\chi\right\}$ are the atomic orbitals and $D_{mn}$ is
the expansion coefficient (the contribution) of the n$^{th}$ atomic
orbital to the single-particle molecular orbital $\phi_m$. As the
Hartree-Fock many-electron wavefunction is expressed as a single Slater
determinant, the coefficients $c_j$ as defined in the introduction
vanish, and the only coefficients left in the definition are the
$D_{mn}$. These are the expansion coefficients that are optimised in a
Hartree-Fock calculation.\
 \
Still, the question of how to define the single-particle atomic orbitals is
not yet resolved. In principle, the condition that there be a cusp at
the nuclei and that the orbital fall off exponentially at large
distances from the nuclei dictates a certain form. One suitable form was
proposed by Slater in the 1930s:

$$
 \chi_{\zeta,n,l,m}(\mathbf{r},\theta,\phi) = N \cdot Y_{lm}(\theta,\phi)\cdot r^{n-1}\cdot e^{-\zeta r}
$$ (STO)

A Slater-type orbital is composed of an angular part that is taken from
the exact solution of the hydrogen atom $Y_{lm}$ (the spherical
harmonics), an exponential part (to ensure the right long-range decay)
and a polynomial. However, products of these functions will need to be
evaluated - and these are impractically expensive to compute. It is
therefore more convenient to choose basis functions that offer some
computational advantages. *Gaussian functions* would be especially
suited, as products of Gaussians will simply yield another Gaussian that
is placed off the initial centres. Frank Boys therefore proposed to
approximate Slater-type orbitals with a linear combination of
Gaussian-type functions. These Gaussian-type basis functions are
referred to as *contraction functions*. This implies that the atomic
basis function $\chi$ is in turn defined by several basis functions (the
term contraction is chosen to avoid confusion between the atomic basis
functions, and the linear combination of Gaussians they are based upon):

$$
 \chi_{\xi,n,l,m}^{STO-3G}(\mathbf{r},\theta,\phi) = \sum_{i=1}^3 d_i \cdot N_i \cdot Y_{lm}(\theta,\phi) \cdot r^{2n-2-l} \cdot e^{-\xi_i r^2},
$$ (STO-3G)

where $N_i$ is a normalisation constant, and $\xi_i$ is the i$^{th}$
prefactor in the exponent that guarantees an optimal fit to the
Slater-type orbitals. This defines a *minimal Gaussian basis set* known
as *STO-3G* (STO stands for Slater-type orbital and refers to the origin
of the Gaussian expansion). The term minimal basis does not refer to the
number of contractions, but to the number of basis functions: For each
orbital, there is one basis function. Minimal bases create minimal
computational overhead, but will often not provide sufficient
flexibility to accurately describe the system's wavefunction - there is
always a certain trade-off between the desired accuracy and the
efficiency of a calculation. For more details, you may refer to the main
course script.
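To make the difference between a Slater-type orbital and its Gaussian fit concrete, the following minimal sketch compares the hydrogen 1s STO with its STO-3G contraction. The exponents and contraction coefficients are the commonly tabulated STO-3G values for hydrogen (fitted to a Slater exponent of 1.24); the function names are chosen here for illustration only and are not part of *Psi4*.

```python
import math

# Commonly tabulated STO-3G parameters for the hydrogen 1s orbital
# (assumption: fitted to a Slater exponent zeta = 1.24).
ZETA = 1.24
EXPONENTS = [3.42525091, 0.62391373, 0.16885540]
COEFFS = [0.15432897, 0.53532814, 0.44463454]


def sto_1s(r, zeta=ZETA):
    """Normalised 1s Slater-type orbital at distance r (in bohr)."""
    return math.sqrt(zeta**3 / math.pi) * math.exp(-zeta * r)


def sto3g_1s(r):
    """Contraction of three normalised s-type Gaussians at distance r."""
    return sum(
        d * (2.0 * a / math.pi) ** 0.75 * math.exp(-a * r * r)
        for d, a in zip(COEFFS, EXPONENTS)
    )


def contraction_norm():
    """<chi|chi>, using the analytic overlap of two normalised s Gaussians."""
    total = 0.0
    for d_i, a_i in zip(COEFFS, EXPONENTS):
        for d_j, a_j in zip(COEFFS, EXPONENTS):
            overlap = (2.0 * math.sqrt(a_i * a_j) / (a_i + a_j)) ** 1.5
            total += d_i * d_j * overlap
    return total


if __name__ == "__main__":
    print(f"norm of the contraction: {contraction_norm():.6f}")
    for r in (0.0, 0.5, 1.0, 2.0, 6.0):
        print(f"r = {r:3.1f}   STO = {sto_1s(r):.5f}   STO-3G = {sto3g_1s(r):.5f}")
```

The comparison illustrates the two known weaknesses of Gaussians: the contraction has no cusp at the nucleus (its value at $r=0$ lies below that of the Slater function), and it decays too quickly at large $r$.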


### Pople-Type Split-Valence Basis Sets

Core and valence orbitals are equally important for the energetics of a
system, but bonding is dictated by the valence electrons. One may
therefore want to improve over the STO-3G basis by allowing for
additional flexibility in the description of valence electrons. In a
*split-valence basis set*, the number of basis functions that is
assigned to core orbitals differs from the one for the valence orbitals.
Usually, core electrons are described by one function, which is in turn
composed of a certain number of Gaussian functions (*i.e.*
contractions). For the description of the valence electrons, multiple
functions will be included (most often 2 to 6), and each of these
functions will in turn be expressed by a varying number of Gaussian
contractions.\
 \
An example of a split-valence basis set is John Pople's 3-21G. The
notation encodes information about the contraction: The number on the
left of the hyphen denotes the number of contractions for the core
orbitals, which consist of a single basis function per orbital only. The
information on the right describes the contraction of the valence
orbitals: There are two numbers, hence there are two basis functions
$\chi$ per orbital. These basis functions, in turn, are constructed by
two and one Gaussian contraction(s) respectively.


```{figure} ../../images/orbitals.png
---
name: orbitals
---
Explanation of what the numbers in the 3-21G basis set notation mean. 
```
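The notation can also be decoded programmatically. The helper below is a small sketch (the function name `parse_pople` is hypothetical) that splits a plain split-valence name into the number of Gaussians for the core function and for each valence function. Polarisation and diffuse markers are deliberately not handled.

```python
import re


def parse_pople(name):
    """Split a Pople basis-set name such as '6-31G' into its contraction pattern.

    Returns (core, valence): the number of Gaussians in the single core
    function, and a list with the number of Gaussians in each valence function.
    """
    match = re.fullmatch(r"(\d)-(\d+)G", name.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a plain split-valence name: {name!r}")
    core = int(match.group(1))
    valence = [int(digit) for digit in match.group(2)]
    return core, valence


if __name__ == "__main__":
    for basis in ("3-21G", "6-31G", "6-311G"):
        core, valence = parse_pople(basis)
        print(f"{basis}: core function with {core} Gaussians, "
              f"{len(valence)} valence functions with {valence} Gaussians")
```

The length of the valence list is the number of basis functions per valence orbital, which is why 6-311G is referred to as a triple-split basis.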

Consider, as a practical example, carbon with the electronic
configuration $1s^22s^22p^2$ in the 3-21G basis. The core orbital (1s)
is given by a contraction over <span style="color:green">three</span> Gaussians. 

$$
 \chi(1s)=\sum_{k=1}^3 \alpha_{1s,k}\mathrm{e}^{-\zeta_{1s,k}\mathbf{r}^2}
$$ (example_C_core)

To every valence orbital (2s and 2p), one function containing <span style="color:blue"> two </span>
  Gaussians and one function containing
<span style="color:red"> one </span> Gaussian is attributed. 

$$
\begin{aligned}
\begin{split}
\chi(2s)^{(2)} & = \sum_{k=1}^2 \alpha_{2s,k} \ \mathrm{e}^{-\zeta_{2s,k}\mathbf{r}^2} \\
\chi(2s)^{(1)} & = \alpha'_{2s} \ \mathrm{e}^{-\zeta'_{2s}\mathbf{r}^2}
\end{split}
\end{aligned}
$$ (example_C_valence1)

$$
\begin{aligned}
\begin{split}
\chi(2p)^{(2)}_{\Gamma} & = \sum_{k=1}^2 \alpha_{2p,k} \ \Gamma_p(\mathbf{r}) \ \mathrm{e}^{-\zeta_{2p,k}\mathbf{r}^2} \\
\chi(2p)^{(1)}_{\Gamma}  & =\alpha'_{2p} \ \Gamma_p(\mathbf{r}) \ \mathrm{e}^{-\zeta'_{2p}\mathbf{r}^2}
\end{split}
\end{aligned}
$$ (example_C_valence2)


where $\Gamma_p(\mathbf{r})=x,y,z$ accounts
for orbitals $p_x$, $p_y$, $p_z$. Fixed coefficients are added in front
of each Gaussian, denoted by $\alpha$.\
For each atom, there are individual sets of parameters $\alpha$ and
$\zeta$, which were determined back when the basis set was designed.
These contraction parameters are *never* changed during an electronic
structure calculation. Recall that the molecular one-electron
wavefunctions are variable linear combinations of *fixed* atomic
orbitals; changing the contraction parameters during the calculation
would change and therefore mess up the atomic basis functions. The
values for standard basis sets are usually hard-coded in the electronic
structure codes. 

For instance, *Psi4* represents the basis set
parameters in the following format:


```{figure} ../../images/basis_set_param_noted.png
---
name: basis_set_param
---
Example of a basis set parameter file
```

which are the 3-21G basis set parameters for a carbon atom (from https://github.com/psi4/psi4/blob/master/psi4/share/psi4/basis/3-21g.gbs).   
The $S$ entry contains information about the core, the $SP$ entries about the valence orbitals. The first number after $S$ or $SP$ gives the number of Gaussians in the contraction, over which the index $k$ runs. In the lines below each entry, the first column gives the exponents
$\zeta_k$, the second column gives the $\alpha_{s,k}$ and the third the
$\alpha_{p,k}$. Note that if there is just one Gaussian in the contraction, then
$\alpha_{l,1} = 1$. Within an $SP$ entry, the s and p orbitals share the same
$\zeta_k$ and differ only in $\alpha_{l,k}$.
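The layout described above can be read with a few lines of Python. This is only a sketch: the parser `parse_shells` is a hypothetical helper, not a *Psi4* function, and the numbers in `CARBON_3_21G` are meant to mirror the carbon entry of the linked file, so check them against the file itself before relying on them.

```python
# Assumed transcription of the carbon 3-21G entry (verify against 3-21g.gbs).
CARBON_3_21G = """
S   3   1.00
    172.2560000    0.0617669
     25.9109000    0.3587940
      5.5333500    0.7007130
SP  2   1.00
      3.6649800   -0.3958970    0.2364600
      0.7705450    1.2158400    0.8606190
SP  1   1.00
      0.1958570    1.0000000    1.0000000
"""


def parse_shells(text):
    """Parse shell blocks: a header 'TYPE n_gaussians scale' plus n_gaussians rows."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    shells = []
    i = 0
    while i < len(rows):
        shell_type, n_gauss = rows[i][0].upper(), int(rows[i][1])
        # Fortran-style exponents (1.0D+00) are converted to Python floats.
        data = [[float(x.replace("D", "E")) for x in row]
                for row in rows[i + 1:i + 1 + n_gauss]]
        shells.append({
            "type": shell_type,
            "exponents": [row[0] for row in data],
            "coefficients": [row[1:] for row in data],
        })
        i += 1 + n_gauss
    return shells


if __name__ == "__main__":
    for shell in parse_shells(CARBON_3_21G):
        print(shell["type"], len(shell["exponents"]), "Gaussian(s)")
```

Each $SP$ row carries two coefficients, one for the s and one for the p function, which share the exponent in the first column.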



```{admonition} Exercise 1
:class: exercise 
A minimal basis set...\
    a) ...always gives the lowest energy.\
    b) ...is optimized for small molecules.\
    c) ...contains one basis function for each atomic orbital only.
```


```{admonition} Exercise 2
:class: exercise 
A split-valence basis set...\
    a) ...contains two basis functions for each valence atomic orbital.\
    b) ...doubles the CPU time of the calculation.\
    c) ...attributes a different number of basis functions to valence and
    core orbitals.
```


```{admonition} Exercise 3
:class: exercise 
Which of the following basis sets does not contain polarisation functions?\
    a) 6-31G$^\ast$\
    b) 6-31G(d,p)\
    c) 3-21+G\
    d) DZP
```


```{admonition} Exercise 4
:class: exercise 
Diffuse functions are added to a basis set to...\
    a) ...save CPU time.\
    b) ...better represent electronic effects at larger distances from the nuclei.\
    c) ...take polarisation into account.\
    d) ...enhance the description of core orbitals.
```


```{admonition} Exercise 5
:class: exercise 
Using the information given about the 3-21G contraction coefficients:\
    a) Give the basis functions corresponding to the 1s, 2s and 2p orbitals of Carbon (**Hint**: use information from **Fig. 2.2**).\
    b) If you wish to calculate the Hartree-Fock energy of a carbon atom,
    how many coefficients are *optimised* during the calculation?
```


```{admonition} Exercise 6
:class: exercise 
You wish to calculate the wavefunction of ethylene C$_2$H$_4$ using the 6-31G\* basis. \
   Indicate the number of basis functions and the number of Gaussian primitives that will be used in the calculation.
```

# First Steps in *Psi4*: The Hydrogen Atom

You will learn how to use *Psi4* by putting your hands on a simple
example: The total energy of the hydrogen atom in Hartree-Fock theory.
This is a tutorial - you are not only invited to type the commands that
are being introduced, you are obliged to.

## Electronic Structure Software

*Ab initio* electronic structure software packages make it possible to
calculate numerically a variety of properties of a given system, based
only on physical constants and the system's Hamiltonian. The only
approximations that need to be made are in the method and basis set that
have to be chosen, in order to allow for a reasonable computational
time. (The stronger your workstation, the more approximations you may
drop, and the more elaborate your approach can be.) There are plenty of
*ab initio* quantum chemical packages on the market; they differ in
their capabilities, license policy and pricing. Widely used packages
include GAMESS US, turbomole, DALTON, CP2K, CPMD and the Gaussian set of
programs.

*Psi4* is a free and open-source ab initio electronic structure program providing implementations of Hartree–Fock, density functional theory, many-body perturbation theory, configuration interaction, density cumulant theory, symmetry-adapted perturbation theory, and coupled-cluster theory. Most of the methods are quite efficient, thanks to density fitting and multi-core parallelism. The program is a hybrid of C++ and Python, and calculations may be run with very simple text files or using the Python API, facilitating post-processing and complex workflows. [Reference: Smith DG, *et al.* PSI4 1.4: Open-source software for high-throughput quantum chemistry. *The Journal of chemical physics* (2020) DOI: https://doi.org/10.1063/5.0006002 ]

The current version of *Psi4*  that you will be using is v1.4rc1. 

## Preliminary steps

Before starting to work with *Psi4* we need to set up the environment, importing the required modules:

```python
import psi4
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.style.use(['seaborn-poster', 'seaborn-ticks'])
```

Then we can set the maximum resources that can be used: the memory available to *Psi4* is specified with the `psi4.set_memory()` function, and the number of threads to use in SMP parallel computations with the `psi4.set_num_threads()` function.

```python
psi4.set_memory('2 GB')
psi4.set_num_threads(2)
```

## Writing An Input For *Psi4* and Invoking the Program

The geometry of the $H$ atom can now be defined by passing it as a string to the `psi4.geometry()` function in either Z-matrix (see note below) or Cartesian format. The triple-quote `"""string"""` syntax is used to allow the string to break over multiple lines. 

```python
h = psi4.geometry("""
0 2
H 0.0 0.0 0.0
""")
```

The first line contains the total charge of the system (0), followed by the spin multiplicity (2), and the following lines contain all the atoms of your system, one atom per line.

:::{admonition} Atomic coordinates representation
:class: dropdown

* **Z-matrix**: each line gives the position of a single atom in terms of its internal coordinates (bond length, bond angle, and dihedral angle) relative to previously defined atoms. As an example, the Z-matrix for hydrogen peroxide is:

```
H
O 1 0.9
O 2 1.4 1 105.0
H 3 0.9 2 105.0 1 120.0
```

The first line defines the first atom, an H in this case. The next line defines the second atom, an O, and specifies the internuclear distance (0.9 Å) to the first atom (1). The next line defines the third atom, another O, and specifies the internuclear distance (1.4 Å) to the second atom (2) and the angle (105.0°) with atom 2 and atom 1 (O-O-H angle). The last line defines the fourth atom, another H, and specifies the internuclear distance (0.9 Å) to the third atom (3), the angle (105.0°) with atom 3 and atom 2 (H-O-O angle), and the dihedral angle (120.0°) with atoms 3, 2, and 1 (H-O-O-H angle).\
Note that if more atoms were present, each of them would need an internuclear distance, an angle and a dihedral specified with respect to previously defined atoms.


* **Cartesian coordinates**: each line gives the $x$, $y$, and $z$ coordinates of a single atom, with respect to the origin of the Cartesian axes. The Cartesian coordinates for hydrogen peroxide are:

```
H 0.000 0.000 0.000
O 0.900 0.000 0.000
O 1.262 1.352 0.000
H 1.742 1.465 0.753
```
Note that Z-matrices can be converted to Cartesian coordinates and back, as the structural information content is identical.
:::
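The equivalence of the two representations can be checked with a short script. The sketch below converts the hydrogen peroxide Z-matrix from the note above into Cartesian coordinates using plain Python. It assumes the usual convention (first atom at the origin, second atom on the $x$ axis, third atom in the $xy$ plane); all function names are illustrative and not part of *Psi4*.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _add_scaled(a, b, s):
    """Return a + s * b."""
    return tuple(x + s * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    length = math.sqrt(sum(x * x for x in a))
    return tuple(x / length for x in a)


def place_atom(bond_atom, angle_atom, dihedral_atom, r, theta_deg, phi_deg):
    """Position of a new atom from its distance, angle and dihedral."""
    theta, phi = math.radians(theta_deg), math.radians(phi_deg)
    bc = _unit(_sub(bond_atom, angle_atom))               # along the last bond
    n = _unit(_cross(_sub(angle_atom, dihedral_atom), bc))  # normal to the plane
    m = _cross(n, bc)                                     # in-plane, perpendicular to bc
    local = (-r * math.cos(theta),
             r * math.sin(theta) * math.cos(phi),
             r * math.sin(theta) * math.sin(phi))
    pos = bond_atom
    for component, axis in zip(local, (bc, m, n)):
        pos = _add_scaled(pos, axis, component)
    return pos


def h2o2_from_zmatrix():
    """Cartesian coordinates (in Angstrom) of H2O2 built from its Z-matrix."""
    h1 = (0.0, 0.0, 0.0)
    o2 = (0.9, 0.0, 0.0)
    theta = math.radians(105.0)
    o3 = (o2[0] - 1.4 * math.cos(theta), 1.4 * math.sin(theta), 0.0)
    h4 = place_atom(o3, o2, h1, 0.9, 105.0, 120.0)
    return [("H", h1), ("O", o2), ("O", o3), ("H", h4)]


if __name__ == "__main__":
    for symbol, (x, y, z) in h2o2_from_zmatrix():
        print(f"{symbol} {x:6.3f} {y:6.3f} {z:6.3f}")
```

The sign of the out-of-plane coordinate depends on the handedness chosen for the dihedral angle; with the convention used here the result agrees with the Cartesian coordinates listed in the note.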

The energy of the system can be computed using the function `psi4.energy()`, passing as first argument a string with the desired method and the basis set (`'METHOD/BASIS'`), and the target molecule as second argument. The output is the total electronic energy in Hartree.

Calling `psi4.energy()` will perform a so-called single point calculation and will not change the geometry. During the calculation the program will perform a wavefunction optimization to find the combination of wavefunction coefficients with the lowest energy. 

```python
basissets = ['STO-3G', '6-31G', '6-311G']  # these are the basis sets we are going to use

psi4.set_options({'reference': 'UHF'})  # we are using Unrestricted Hartree-Fock

for basis in basissets:
    psi4.core.set_output_file(f'{basis}-output.log', False)  # save in separate log files
    E = psi4.energy(f'hf/{basis}', molecule=h)  # single point energy calculation, once per basis set

    print(basis, E)
```

```
STO-3G -0.4665818495572754
6-31G -0.4982329107290703
6-311G -0.4998098152732822
```
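Energy differences are easier to judge in kcal$\cdot$mol$^{-1}$ than in Hartree. The following sketch takes the three energies printed above and expresses them relative to the largest basis set. The conversion factor of 627.5095 kcal$\cdot$mol$^{-1}$ per Hartree is the commonly quoted value, and the helper name `relative_energies` is hypothetical.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # commonly quoted conversion factor

# Energies in Hartree, copied from the output of the loop above.
energies = {
    'STO-3G': -0.4665818495572754,
    '6-31G': -0.4982329107290703,
    '6-311G': -0.4998098152732822,
}


def relative_energies(energies, reference):
    """Energies relative to energies[reference], converted to kcal/mol."""
    ref = energies[reference]
    return {basis: (value - ref) * HARTREE_TO_KCAL_PER_MOL
            for basis, value in energies.items()}


if __name__ == "__main__":
    relative = relative_energies(energies, '6-311G')
    print(f"{'basis':<8} {'E / Hartree':>14} {'dE / kcal/mol':>15}")
    for basis, value in energies.items():
        print(f"{basis:<8} {value:>14.6f} {relative[basis]:>15.2f}")
```

In your own notebook the dictionary can be filled inside the loop instead of being typed in by hand.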


```{admonition} Exercise 7
:class: exercise 
Include a table of the calculated energies using the three different basis sets, specifying the number of basis functions used. Compare the energies with the analytical value for the H atom, given by the expression: 

$$
 E = -\frac{1}{2} m_e c^2\alpha^2,
$$
where $\alpha$ is the fine structure constant. 

$$
\begin{aligned}
m_e &= 0.910953\cdot10^{-30}\ \mathrm{kg}\\
c &= 2.99792458\cdot10^8\ \mathrm{m\,s^{-1}}\\
\alpha &= 7.2973525376\cdot10^{-3}\\
N_A &= 6.0221367\cdot10^{23}\ \mathrm{mol^{-1}}
\end{aligned}
$$
Pay attention to the units - use atomic units or kcal$\cdot$mol$^{-1}$ throughout!
```

```{admonition} Exercise 8
:class: exercise 
What is the influence of the basis set size on the accuracy of the result? How do the split-valence bases compare to STO-3G?
```

# RHF vs. UHF

This exercise is already a preparation for the next set of exercises -
you will now calculate a molecular structure, rather than an isolated
atom. Using the same basis throughout (6-31G), you should compare $H_2$
at equilibrium distance (0.7414 Å) and at a larger distance of 5.6 Å at
two different levels of Hartree-Fock: Restricted Hartree-Fock (RHF) and
Unrestricted Hartree-Fock (UHF). 

Define the geometry for the $H_2$ molecule at the equilibrium distance and at the larger distance. Note the change in the spin multiplicity, from 2 to 1.

```python
h2_eq = psi4.geometry("""
0 1
symmetry c1
H 0.0 0.0 0.0
H 0.0 0.0 0.7414
""")

h2_large = psi4.geometry("""
0 1
symmetry c1
H 0.0 0.0 0.0
H 0.0 0.0 5.6000
""")
```

The energy of the system can be computed as before (for each new calculation, make sure to use a new output file so that earlier results are not overwritten).


```python
# H2 at the equilibrium distance
print('Equilibrium distance:')

psi4.core.set_output_file('eq_RHF-output.log', False)  # save in separate log files
psi4.set_options({'reference': 'rhf', 'guess_mix': "False"})
E_RHF = psi4.energy('hf/6-31G', molecule=h2_eq)
print('RHF: ', E_RHF)

psi4.core.set_output_file('eq_UHF-output.log', False)  # save in separate log files
psi4.set_options({'reference': 'uhf', 'guess': 'gwh', 'guess_mix': "True"})
E_UHF = psi4.energy('hf/6-31G', molecule=h2_eq)
print('UHF: ', E_UHF)

# H2 at the larger distance
print('Large distance:')

psi4.core.set_output_file('large_RHF-output.log', False)  # save in separate log files
psi4.set_options({'reference': 'rhf', 'guess_mix': "False"})
E_RHF = psi4.energy('hf/6-31G', molecule=h2_large)
print('RHF: ', E_RHF)

psi4.core.set_output_file('large_UHF-output.log', False)  # save in separate log files
psi4.set_options({'reference': 'uhf', 'guess': 'gwh', 'guess_mix': "True"})
E_UHF = psi4.energy('hf/6-31G', molecule=h2_large)
print('UHF: ', E_UHF)
```

```{admonition} Exercise 9
:class: exercise 
Include a table of the calculated energies using the two reference wavefunctions. Also compute their difference, for both bond distances.\
The energy to break a chemical bond is usually between 20 and 100 kcal/mol. Explain the roots and physical origin of the difference $E_{UHF-RHF}$ (in your own words). Why is the energy gap between UHF and RHF larger at a larger bond distance?
```

# Output Files 

So far we have only focused on the energy given as output by the `psi4.energy()` function. However, for each calculation a .log file is produced, containing detailed information on the procedure.

List the files in your directory (type `ls` in your terminal) to see what new files have been generated. You
should find the .log files created by Psi4. As these files may be very large and we have no intention of editing them, you may use `less` to display them. 

As an example, type `less` followed by the name of the last .log file created (*large_UHF-output.log*):

:::{admonition} Terminal commands in Jupyter Notebooks
:class: dropdown

You can access the terminal on JupyterHub by clicking on the plus icon in the toolbar and launching a new terminal in the `Quantum Chemistry` environment. This allows you to navigate the file system (e.g using `cd` and `ls`), to view files (using `less`, `tail`, `head`) or to edit files (using `vi` or `nano`). 

In Jupyter Notebooks you can also execute terminal commands by prefixing them with `!` and running them inside a python cell. In the actual terminal commands do not need a prefix.

You can even use python variables for this: 

```
myvariable = 'searchstring'  # this is python
!grep $myvariable filetosearch.txt 
```
The command above will search the file `filetosearch.txt` for occurrences of the contents of the variable and return the matching lines. 
:::

```
!less large_UHF-output.log
```

```
*** tstart() called on Zeus
*** at Mon Jun 28 08:53:04 2021

   => Loading Basis Set <=

    Name: 6-31G
    Role: ORBITAL
    Keyword: BASIS
    atoms 1-2 entry H          line    26 file /root/miniconda3/envs/iesm/share/psi4/basis/6-31g.gbs 


         ---------------------------------------------------------
                                   SCF
               by Justin Turney, Rob Parrish, Andy Simmonett
                          and Daniel G. A. Smith
                              UHF Reference
                        4 Threads,   1907 MiB Core
         ---------------------------------------------------------

  ==> Geometry <==
```

When using the terminal, after typing `less` you may ‘jump’ through the file using the enter key, or you may directly drop to its end using a capital `G` (i.e. shift and g). Pop back to the top by typing a lowercase `g`. For a more finely tuned navigation, simply resort to the arrow keys. Typing `/` followed by a word will jump to said word in the file and pressing `n` will make you go through all the occurrences of that word in the file.

If you are interested in extracting the lines of a file containing one particular keyword another useful command is `grep`.
For example, considering again the last .log file created, it is possible to extract the final energy with the command:

```
!grep 'Final Energy' large_UHF-output.log
```

```
  @DF-UHF Final Energy:    -0.99646589192706
```
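If the value is needed for further processing in Python rather than just for display, the same line can be picked out with a regular expression. This is a minimal sketch: the helper name `final_energy` is hypothetical, and the pattern assumes that the line has the layout shown in the `grep` output above.

```python
import re

# Matches lines such as "  @DF-UHF Final Energy:    -0.99646589192706"
FINAL_ENERGY = re.compile(r"@\S+\s+Final Energy:\s+(-?\d+\.\d+)")


def final_energy(log_text):
    """Return the last 'Final Energy' value (in Hartree) found in a log text."""
    matches = FINAL_ENERGY.findall(log_text)
    if not matches:
        raise ValueError("no 'Final Energy' line found")
    return float(matches[-1])


if __name__ == "__main__":
    sample = "  @DF-UHF Final Energy:    -0.99646589192706\n"
    print(final_energy(sample))
```

For a real log file, the text can be read with `pathlib.Path('large_UHF-output.log').read_text()` and passed to the function.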


```{admonition} Exercise 10
:class: exercise
Have a look at the log files produced so far and answer the following questions:
1.  What is the significance of the statement *Energy and wave function converged*?

2.  What is the meaning of the different *iter* preceding *Energy and wave function converged*? 
    Compare the number of cycles for the different basis sets.
```

```{admonition} Exercise 11
:class: exercise
Why did we have to change the spin multiplicity when moving from an atom to a molecule? How do you calculate the spin multiplicity of a species?
```