# Quantum chemistry practical
Guido Raos, Politecnico di Milano, March 2021 (guido.raos@polimi.it)

In this practical we will learn how to perform simple electronic structure calculations with the [ORCA code](https://orcaforum.kofo.mpg.de/app.php/portal). The code has an extensive manual. It is not necessary to read it all, but skimming through the first sections can be quite useful. Further "unofficial" information about the program can also be found in the [ORCA input library](https://sites.google.com/site/orcainputlibrary/).


## Carbon monoxide - single point

CO is a very interesting molecule: read about it on [Wikipedia](https://en.wikipedia.org/wiki/Carbon_monoxide) or in [this paper](https://doi.org/10.1002/jcc.20477). Here is the first paragraph from the introduction of the paper:
> Describing the electronic structure of, and the nature of the bond in, carbon monoxide in terms of simple bonding models is not a trivial task because of the unusual chemical and physical properties of the molecule, the only monocoordinated carbon compound that is stable under normal conditions. It has been called “an isolated embarrassment for introductory chemistry teachers,” and it exhibits several surprising features: (i) the triple bond between the atoms is a very unusual atomic valence state for both atoms but particularly for oxygen; there is also a mismatch between the formal oxidation states of the atoms (+2 for carbon and −2 for oxygen) and the triple bond; (ii) the dipole moment of the molecule is small (0.11 D) with the negative end at the carbon atom although carbon is clearly less electronegative than oxygen; (iii) the bond dissociation energy (BDE) of CO (255.7 kcal/mol) is significantly higher than that of isoelectronic N2 (225.1 kcal/mol); (iv) the carbon–oxygen bond in CO becomes stronger when the carbon atom forms a σ bond with another atom but becomes weaker when the oxygen atom forms a σ bond with another atom. 

We will use it for our first ORCA calculation. This is the input (to be copied or typed into the input file `CO.rhf.inp`):
```
# A simple single point RHF calculation
! RHF def2-svp LargePrint
# Carbon Monoxide at the experimental geometry
* xyz 0 1
C 0 0 0
O 0 0 1.128
*

```

The code performs a single-point RHF calculation (the default).
Run it from your command prompt (indicated below with `%>`). This should work on all operating systems (Windows, Mac or Linux):
```
%> orca CO.rhf.inp > CO.rhf.log
```

Once the calculation has finished, open the output file and take a close look at it. Note that many other files have been generated. Here are some things to look for:
* Information about the basis set (number and type of basis functions).
* No. of electrons, charge, multiplicity, etc.
* Progress of the SCF cycles (energy and density change, etc.).
* Has the SCF calculation converged? (look for the string `SUCCESS`).
* The value of the total energy and its components.
* What do the MO's look like? Inspect the orbital energies and the LCAO coefficients (can you see their symmetries?).
* The Mulliken and Loewdin population analyses: what are the charges on the atoms?
* The Mayer population analysis: what is the order of the bond between C and O?
* The value of the calculated dipole moment. How does it compare with experiment (see above)?
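
Some of these checks can also be scripted. The following is a minimal Python sketch, assuming that the output contains the `SUCCESS` marker mentioned above and a line starting with `FINAL SINGLE POINT ENERGY` (confirm the exact wording in your own output). The function name `summarize_orca_log` is a hypothetical helper, not part of ORCA.

```python
from pathlib import Path

def summarize_orca_log(text):
    """Return (scf_converged, final_energy) from the text of an ORCA output.

    final_energy is in hartree, or None if no matching line is found.
    """
    scf_converged = "SUCCESS" in text
    final_energy = None
    for line in text.splitlines():
        # Assumed line format: "FINAL SINGLE POINT ENERGY   <value>"
        if line.strip().startswith("FINAL SINGLE POINT ENERGY"):
            final_energy = float(line.split()[-1])  # keep the last occurrence
    return scf_converged, final_energy

# Only runs if the output file from the calculation above is present.
log_file = Path("CO.rhf.log")
if log_file.exists():
    converged, energy = summarize_orca_log(log_file.read_text(errors="replace"))
    print("SCF converged:", converged)
    print("Final energy (E_h):", energy)
```

A script like this is no substitute for reading the output, but it becomes handy when comparing many calculations.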

There are many ways of plotting the MO's and the total electron density. Try running this program from the command line:
```
%> orca_plot CO.rhf.gbw -i
```
The program reads the gbw (geometry, basis, wavefunction) file and offers many different options. Note that the gbw is a binary file, so unlike the input and output files it is not human-readable.
One possibility is to produce a _cube file_ (a standard format in molecular modelling, originated by Gaussian), which contains the values of a function (e.g., the electron density) on a regularly spaced 3D grid of points (cube files are ASCII, so you can open and inspect them with a text editor). Once you have produced them, you can visualize the cube files using VMD or another graphics program.
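
Since cube files are plain text, they can also be read with a few lines of Python. The sketch below assumes the usual cube layout (two comment lines, one line with the number of atoms and the grid origin, three lines with the number of points and step vector along each axis, one line per atom, then the grid values) with a positive atom count and a single dataset. The function names `parse_cube` and `integrate` are hypothetical helpers.

```python
import numpy as np

def parse_cube(text):
    """Parse the text of a cube file holding a single dataset.

    Returns (atomic_numbers, origin, steps, grid), where steps is a 3x3
    array with one grid step vector per row.
    """
    lines = text.splitlines()
    header = lines[2].split()              # lines 0 and 1 are comments
    n_atoms = int(header[0])               # assumed positive
    origin = np.array(header[1:4], dtype=float)
    shape, steps = [], []
    for line in lines[3:6]:                # one line per grid axis
        fields = line.split()
        shape.append(int(fields[0]))
        steps.append([float(x) for x in fields[1:4]])
    atom_lines = lines[6:6 + n_atoms]
    atomic_numbers = [int(line.split()[0]) for line in atom_lines]
    values = np.array(" ".join(lines[6 + n_atoms:]).split(), dtype=float)
    grid = values.reshape(shape)           # x index outermost, z innermost
    return atomic_numbers, origin, np.array(steps), grid

def integrate(steps, grid):
    """Integrate the gridded function: sum of the values times the voxel volume."""
    voxel_volume = abs(np.linalg.det(steps))
    return grid.sum() * voxel_volume
```

For an electron density cube, `integrate(steps, grid)` should come out close to the number of electrons (14 for CO), provided the grid is fine and large enough and the density and the step vectors use the same length unit.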

## Carbon monoxide - geometry optimization

Our next step will be to attempt a geometry optimization. The program will compute the gradient of the total energy with respect to the nuclear coordinates, and use this information to minimize the total energy. After a few optimization cycles, the calculation should converge to the energy minimum (zero gradient). Here is the input (to be inserted in a new file, such as `CO.rhf_opt.inp`):
```
# A simple RHF optimization and frequency calculation
! RHF def2-SVP TightOpt
! TightSCF
! AllPop
! Freq
* xyz 0 1
C 0 0 0
O 0 0 1.3
*
```

Some comments on the input:
* We have deleted the `LargePrint` option, to get a more compact output. 
* The initial geometry is away from the experimental minimum (but not completely unrealistic!).
* The `TightSCF` keyword controls the convergence of the SCF cycles, whereas the `TightOpt` keyword controls the geometry optimization (maximum value of the gradient, to declare the optimization successful).
* The two previous keywords are necessary, in order to make the final frequency calculation reliable.
* The frequency calculation is performed once, at the end of the optimization. It does not make much sense to do a frequency calculation away from a stationary point (a minimum or a saddle point on the potential energy surface). The frequency calculation requires the evaluation of the Hessian matrix (second derivatives of the energy with respect to the nuclear coordinates). The evaluation of the Hessian is significantly more expensive than the evaluation of the gradient.
* The geometry optimization of CO is guaranteed to converge to a minimum, because of its simplicity. This is not always the case for more complex, polyatomic molecules. In some cases, one might have to relax the convergence criterion (standard `Opt` instead of `TightOpt`). The frequency calculation can be important, in order to check that the final geometry corresponds to a stable minimum (a saddle point has "imaginary frequencies", corresponding to the normal modes along the unstable coordinate).
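
The roles of the gradient and the Hessian can be illustrated on a one-dimensional model. The sketch below is not what ORCA does internally: it replaces the quantum-chemical energy with a Morse potential whose parameters are illustrative placeholders (not fitted data), locates the minimum by Newton steps, and converts the second derivative at the minimum into a harmonic wavenumber through $\tilde{\nu} = \frac{1}{2\pi c}\sqrt{k/\mu}$. Production optimizers are more elaborate, and all names here are hypothetical.

```python
import math

# Illustrative Morse parameters (placeholders of plausible magnitude, NOT fitted data)
D_E = 11.2      # well depth, eV
A = 2.3         # width parameter, 1/angstrom
R_E = 1.128     # position of the minimum, angstrom

def gradient(r):
    """First derivative dV/dr of V(r) = D_E * (1 - exp(-A * (r - R_E)))**2."""
    u = math.exp(-A * (r - R_E))
    return 2.0 * D_E * A * u * (1.0 - u)

def hessian(r):
    """Second derivative d2V/dr2 of the same Morse potential."""
    u = math.exp(-A * (r - R_E))
    return 2.0 * D_E * A**2 * u * (2.0 * u - 1.0)

def optimize(r, tol=1e-10, max_steps=50):
    """Newton iteration: r <- r - gradient/hessian, until |gradient| < tol."""
    for step in range(max_steps):
        g = gradient(r)
        if abs(g) < tol:
            return r, step
        r -= g / hessian(r)
    raise RuntimeError("optimization did not converge")

def wavenumber(k_ev_per_a2, m1_amu, m2_amu):
    """Harmonic wavenumber (cm^-1) of a diatomic from its force constant."""
    k_si = k_ev_per_a2 * 1.602176634e-19 / 1.0e-20                 # eV/A^2 -> N/m
    mu = m1_amu * m2_amu / (m1_amu + m2_amu) * 1.66053906660e-27   # reduced mass, kg
    c_cm = 2.99792458e10                                           # speed of light, cm/s
    return math.sqrt(k_si / mu) / (2.0 * math.pi * c_cm)

r_opt, n_steps = optimize(1.3)   # same starting distance as in the input above
print(f"r = {r_opt:.4f} angstrom after {n_steps} Newton steps")
print(f"harmonic wavenumber = {wavenumber(hessian(r_opt), 12.0, 15.995):.0f} cm^-1")
```

The convergence test on the gradient plays the role of the `TightOpt` threshold, and a negative second derivative at the stationary point would correspond to an "imaginary frequency".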

Run the program as before, for example:
```
%> orca CO.rhf_opt.inp > CO.rhf_opt.log
```
This should take a little longer, but not too much (less than a minute). Notice that many files have been produced by the calculation (below, `W%>` indicates a command to be executed under Windows, `U%>` a command to be executed under Unix-like Linux/Mac):
```
W%> dir CO.rhf_opt.*
U%> ls CO.rhf_opt.*
```
(a useful reference for the command-line commands under all operating systems is https://ss64.com/).

Once the calculation has converged, take a look at the output:
* Has the calculation converged? Look for the string `HURRAY`.
* What is the value of the CO bond length? How does it compare to experiment?
* What is the value of the calculated vibrational frequency? How does it compare to experiment?

A good source of experimental data is the NIST chemistry webbook: https://webbook.nist.gov/chemistry/. Look for data on CO using the "Formula" option. In the case of diatomic molecules, in addition to standard chemical data (e.g. thermochemistry) there is a specialized section entitled "Constants of diatomic molecules", with very detailed and accurate gas-phase spectroscopic data.

## Carbon monoxide - further work

Try doing a calculation with a correlated wavefunction, such as MP2:
```
# A simple MP2 optimization and frequency calculation
! MP2 def2-SVP TightOpt NoFrozenCore
! TightSCF
! AllPop
! Freq
* xyz 0 1
C 0 0 0
O 0 0 1.128
*
```
Note that the MP2 frequency calculation requires the `NoFrozenCore` option, so that excitations from the 1s orbitals of C and O are included in the calculation. This is needed for algorithmic reasons; it should not be necessary from a "chemical" point of view. Luckily, the extra computational effort is not a big deal in this case.

Once the calculation has finished, look again at the results and compare them with the RHF calculation. How have the equilibrium bond length, the dipole moment and the vibrational frequency of CO changed? Have the results improved, in comparison to experiment?

Next, try using a larger basis set (e.g., def2-TZVP instead of def2-SVP), both with and without electron correlation (MP2 or RHF). Again, how do the results change?
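
When several method/basis combinations are to be compared, the input files can be generated from a template. This is a minimal sketch that only uses keywords already shown above; the file naming scheme and the helper `build_input` are hypothetical choices.

```python
from itertools import product

GEOMETRY = """* xyz 0 1
C 0 0 0
O 0 0 1.128
*
"""

def build_input(method, basis):
    """Return the text of an ORCA input for an optimization + frequency run on CO."""
    keywords = [method, basis, "TightOpt", "TightSCF", "AllPop", "Freq"]
    if method == "MP2":
        keywords.append("NoFrozenCore")   # required for MP2 frequencies (see above)
    header = f"# {method}/{basis} optimization and frequency calculation\n"
    return header + "! " + " ".join(keywords) + "\n" + GEOMETRY

# One input per combination, keyed by a (hypothetical) file name
inputs = {
    f"CO.{method.lower()}_{basis.lower()}_opt.inp": build_input(method, basis)
    for method, basis in product(["RHF", "MP2"], ["def2-SVP", "def2-TZVP"])
}

for name, text in inputs.items():
    print(name)
    # To actually write the file, uncomment the next two lines:
    # with open(name, "w") as handle:
    #     handle.write(text)
```

Each generated file can then be run exactly as before, e.g. `orca CO.mp2_def2-tzvp_opt.inp > CO.mp2_def2-tzvp_opt.log`.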



## Hydrogen peroxide (H$_2$O$_2$) - DFT optimization

We now consider a slightly more complex molecule. Hydrogen peroxide is also quite interesting, _per se_. Have a look at the [Wikipedia article](https://en.wikipedia.org/wiki/Hydrogen_peroxide). The molecule is small enough to be comfortably studied by highly correlated wavefunction-based methods (e.g., CCSD or CCSD(T)), with a large basis set. However, below we will switch to density functional theory. We will use the PBE functional, a pure GGA functional. This costs much less than hybrid functionals such as B3LYP and PBE0, which include a fraction of exact (Hartree-Fock) exchange.

Input file for a DFT geometry optimization:
```
# - A simple DFT optimization
! RHF PBE RI def2-SVP def2/J TightOpt
# - Add LJ-like correction for dispersion interactions
#! D3BJ
! TightSCF
# - Use implicit solvent model
#! CPCM(Water)
! PrintBasis
! AllPop
! Freq
# - NMR chemical shifts - GIAO requires auxiliary basis set
! NMR
* int 0 1
O 0 0 0 0.0000   0.000   0.00
O 1 0 0 1.5500   0.000   0.00
H 2 1 0 1.0000 109.500   0.00
H 1 2 3 1.0000 109.500  90.00
*
```

Comments on the input file:
* The `PBE` keyword implies a DFT calculation (despite the `RHF` keyword preceding it).
* The `RI` keyword requests a "Resolution of the Identity" calculation. This is an approximation (well tested, with minimal loss of accuracy) that uses an auxiliary basis set (`def2/J`) in order to speed up the evaluation of the two-electron integrals.
* Two keywords have been commented out, but may be uncommented if desired. `D3BJ` specifies usage of Grimme's empirical correction for dispersion (London) interactions. This is a long-range electron correlation effect, which is missing from most density functionals. The correction is unimportant in this case, but may be important in larger and flexible molecules, or when studying intermolecular interactions (e.g., between two H$_2$O$_2$ molecules). `CPCM(Water)` can be used to model the effect of a surrounding solvent, by the Conductor-like Polarizable Continuum Model. Other solvents can be used in place of water (see the ORCA manual). Without this keyword, the program performs a standard gas-phase calculation.
* After convergence, the program calculates the vibrational frequencies (`Freq` keyword) and the magnetic shielding tensors for all nuclei (`NMR` keyword).
* The starting geometry is specified in "internal coordinates" (`int` keyword): bond lengths, bond angles, and dihedral angles. There is one line for each atom. The spatial relationship among them is described using a [Z-matrix](https://en.wikipedia.org/wiki/Z-matrix_(chemistry)) (the precise format of the Z-matrix depends on the program, e.g. Gaussian or ORCA or GAMESS-US).
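
To see what the internal coordinates mean geometrically, the Z-matrix can be converted to Cartesian coordinates by hand. The sketch below uses a common construction (each new atom is placed from a distance, an angle and a dihedral relative to three already placed atoms) and mirrors the column order of the input above: symbol, three reference atoms (numbered from 1, with 0 meaning "unused"), bond length, bond angle, dihedral angle. The function names are hypothetical, and the overall orientation of the molecule is an arbitrary choice that need not match the one used by ORCA.

```python
import numpy as np

def place_atom(a, b, c, r, angle_deg, dihedral_deg):
    """Position at distance r from a, with angle (new, a, b) and dihedral (new, a, b, c)."""
    theta, phi = np.radians(angle_deg), np.radians(dihedral_deg)
    bc = (a - b) / np.linalg.norm(a - b)      # unit vector from b to a
    n = np.cross(b - c, bc)                   # normal to the plane of c, b, a
    n /= np.linalg.norm(n)
    m = np.cross(n, bc)                       # completes the local frame
    local = np.array([-r * np.cos(theta),
                      r * np.sin(theta) * np.cos(phi),
                      r * np.sin(theta) * np.sin(phi)])
    return a + local[0] * bc + local[1] * m + local[2] * n

def zmatrix_to_cartesian(rows):
    """Convert rows (symbol, na, nb, nc, r, angle, dihedral) to an (N, 3) array."""
    coords = []
    for i, (symbol, na, nb, nc, r, angle, dihedral) in enumerate(rows):
        if i == 0:
            xyz = np.zeros(3)                                   # first atom at the origin
        elif i == 1:
            xyz = coords[na - 1] + np.array([0.0, 0.0, r])      # second atom along z
        else:
            a, b = coords[na - 1], coords[nb - 1]
            # The third atom has no dihedral reference: use a dummy point off the z axis
            c = b + np.array([1.0, 0.0, 0.0]) if i == 2 else coords[nc - 1]
            xyz = place_atom(a, b, c, r, angle, dihedral)
        coords.append(xyz)
    return np.array(coords)

# Starting geometry of H2O2, copied from the input above
ZMATRIX = [
    ("O", 0, 0, 0, 0.00,   0.0,  0.0),
    ("O", 1, 0, 0, 1.55,   0.0,  0.0),
    ("H", 2, 1, 0, 1.00, 109.5,  0.0),
    ("H", 1, 2, 3, 1.00, 109.5, 90.0),
]

for (symbol, *_), xyz in zip(ZMATRIX, zmatrix_to_cartesian(ZMATRIX)):
    print(f"{symbol} {xyz[0]:10.5f} {xyz[1]:10.5f} {xyz[2]:10.5f}")
```

The printed lines have the layout of an `xyz` coordinate block, which makes it easy to compare them with the Cartesian coordinates that ORCA reports at the start of the optimization (up to a rigid rotation and translation).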

Run the calculation as before and open the output file, once it is complete:
* Has the geometry optimization converged? Look for the string `HURRAY`.
* How does the geometry compare with the experimental one? And the vibrational frequencies? And the dipole moment? Look for relevant data on the NIST web site(s): https://webbook.nist.gov/chemistry/ and the computational chemistry database https://cccbdb.nist.gov/alldata1.asp.
* In the NMR part of the calculation, the program computes the full (paramagnetic + diamagnetic) shielding tensor of each nucleus. The ordinary "chemical shifts" can be obtained from its isotropic part, $(\sigma_{11}+\sigma_{22}+\sigma_{33})/3$, taken relative to the isotropic shielding of a reference compound. Read the relevant part of the ORCA manual and look for the results in the output. Note: comparison with experimental NMR spectra of H$_2$O$_2$ may be more difficult than for the vibrational spectra.
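
The arithmetic behind the last point is simple enough to write down. In this sketch the isotropic shielding is one third of the trace of the shielding tensor, and the chemical shift is taken as the difference from the isotropic shielding of a reference compound, which is assumed to have been computed with the same method and basis set. The function names and the numbers in the demonstration are arbitrary placeholders, not ORCA results.

```python
import numpy as np

def isotropic_shielding(tensor):
    """Isotropic shielding (ppm): one third of the trace of the 3x3 shielding tensor."""
    return np.trace(np.asarray(tensor, dtype=float)) / 3.0

def chemical_shift(sigma_iso, sigma_ref):
    """Chemical shift (ppm) relative to a reference shielding from the same method."""
    return sigma_ref - sigma_iso

# Arbitrary placeholder principal components, only to show the call sequence
demo_tensor = np.diag([20.0, 30.0, 40.0])
sigma = isotropic_shielding(demo_tensor)
print("isotropic shielding:", sigma)
print("shift vs. a placeholder reference of 31.5 ppm:", chemical_shift(sigma, 31.5))
```

Because the trace does not depend on the choice of axes, the same function works on the full tensor and on the diagonal matrix of its principal components.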

Among the files produced by the geometry optimization, there should be one named `h2o2.dft_opt_trj.xyz` (or something similar, depending on the name of your input file). Open it with a text editor: it contains the total energy and geometry at each step of the optimization. The molecular geometries within the file can be visualized as a "movie" with VMD (try it!). In more complicated molecules, a geometry optimization may take hours or even days. In these cases, it is useful to "keep an eye" on the calculation, to make sure that nothing wrong is happening (and restart it from a different geometry or using different options, if necessary).

In addition to looking at the change in the molecular geometry, it can be useful to monitor the energy change during (or after, in this case) the optimization. Is the energy decreasing steadily? Or does it increase or start oscillating after a few steps? Total energies can be inspected in the output or the trajectory file. It is even better to plot them. This can be done as follows.

First of all, we need to collect all the energies in a "clean" file (see https://ss64.com/ for a description of the Windows and Linux/Mac commands):
```
W%> find "Total Energy" h2o2.dft_opt.log > energy.dat
U%> grep "Total Energy" h2o2.dft_opt.log > energy.dat
```
Having done so, we can read and plot the data using NumPy and Matplotlib (move the `energy.dat` file to a different directory, if necessary):

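```python
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# usecols=3 reads the energy in E_h, usecols=5 the energy in eV.
# skiprows=2 is meant for the header that Windows `find` writes before the matches;
# `grep` writes no header, so use skiprows=0 with its output.
Energies = np.loadtxt('energy.dat', skiprows=2, usecols=3)
#print(Energies)
Iteration = range(len(Energies))
plt.plot(Iteration, Energies, 'r-o')
plt.xlabel('Iteration')
plt.ylabel('Energy')
```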
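
Total energies are large numbers that differ only in the last few digits, so it often helps to look at energy differences instead. The sketch below converts energies to kJ/mol relative to the final step and lists the steps at which the energy went up; the function names are hypothetical helpers.

```python
import numpy as np

HARTREE_TO_KJ_MOL = 2625.4996   # 1 E_h expressed in kJ/mol

def relative_energies(energies_hartree):
    """Energies relative to the last (final) step, in kJ/mol."""
    e = np.asarray(energies_hartree, dtype=float)
    return (e - e[-1]) * HARTREE_TO_KJ_MOL

def uphill_steps(energies_hartree):
    """Indices of the steps whose energy is higher than that of the previous step."""
    e = np.asarray(energies_hartree, dtype=float)
    return [int(i) + 1 for i in np.flatnonzero(np.diff(e) > 0.0)]
```

With the `Energies` array read from `energy.dat`, `relative_energies(Energies)` can be plotted in place of the raw energies, and an empty list from `uphill_steps(Energies)` means that the energy decreased at every step.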

## Hydrogen peroxide (H$_2$O$_2$) - further work

Study the effect on the results of one or two of these input parameters: basis set, DFT functional (or use a wavefunction method), inclusion of D3BJ correction, inclusion of CPCM model (water or other solvents). Look at the numbers and try to discuss them (do they change in a sensible way? is comparison with experimental results improved?).

Plot and try to interpret/describe the molecular orbitals. These will be the "canonical" Kohn-Sham orbitals. 
Afterwards, localize the orbitals and re-plot them. The localization can be performed with this input file (read the ORCA manual for further information):
```
# - Orbital localization and potential-derived charges
! RHF PBE RI def2-SVP def2/J TightOpt
%loc    LocMet PipekMezey    end
! chelpg
! moread
%moinp "h2o2.dft_opt.gbw"
* xyzfile 0 1 h2o2.dft_opt.xyz
```
Notice that we have read the final geometry and molecular orbitals from the previous calculation from two external files.

The `chelpg` keyword in the previous input requests the calculation of potential-derived charges. Compare these charges with those from the Mulliken and the Loewdin population analyses. Write a little Python script which, given a set of charges ($q_i$) and the atomic coordinates ($\mathbf{r}_i$), computes the molecular dipole moment:
\begin{equation*}
 \boldsymbol{\mu} = \sum_{i=1}^N q_i \mathbf{r}_i
\end{equation*}
How do the dipole moments calculated from these three sets of charges compare with the "exact" one, computed from the full electron density?
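
One possible starting point for such a script is sketched below. It assumes charges in units of the elementary charge and coordinates in Ångström, and converts the result to debye (1 e·Å ≈ 4.8032 D); the function name is hypothetical. Note that the point-charge dipole is independent of the choice of origin only if the charges sum to zero.

```python
import numpy as np

E_ANGSTROM_TO_DEBYE = 4.80320   # 1 e*angstrom expressed in debye

def dipole_from_charges(charges, coordinates):
    """Dipole vector (debye) of point charges q_i (in e) at positions r_i (in angstrom)."""
    q = np.asarray(charges, dtype=float)          # shape (N,)
    r = np.asarray(coordinates, dtype=float)      # shape (N, 3)
    return (q @ r) * E_ANGSTROM_TO_DEBYE          # sum_i q_i * r_i

# Check on a trivial case: charges +1 and -1 separated by 1 angstrom along z
mu = dipole_from_charges([1.0, -1.0], [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
print("dipole vector (D):", mu)
print("magnitude (D):", np.linalg.norm(mu))
```

To apply it to H$_2$O$_2$, pass the charges from each population analysis together with the optimized coordinates from `h2o2.dft_opt.xyz`.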


## Larger problems

Below are suggestions for larger problems, which could be submitted as part of the end-of-course examination. You may need to read parts of the ORCA manual, in order to perform some of these tasks.

1. Find out how to perform constrained geometry optimizations in ORCA. Re-optimize the geometry under the constraint that the H-O-O-H dihedral remains equal to a given value (e.g. 90°). Repeat the calculation at a whole set of values (e.g., from 0° to 180°, in steps of 15°). This task can also be automated as a "potential energy scan" (look for it within the ORCA manual). Collect the results and plot the total energy and any other interesting quantities (e.g., O-O bond length, atomic charges, etc.) as a function of this dihedral angle. Discuss the results.

2. Construct a new input file with H$_2$O$_2$ surrounded by a reasonable number of water molecules (you can "draw" them using [Avogadro](https://avogadro.cc/)). Re-optimize the geometry, starting from several different initial coordinates. How many local energy minima can you find? Which is the most favourable one? Produce some plots to illustrate these results. Compare the computed properties of H$_2$O$_2$ in this "explicit solvent" with those from the "implicit solvent" calculation (see above).

3. Use the previous methods to study one larger, completely different molecule. For example: diborane, N-methylacetamide, biphenyl, para-nitroaniline, nickel tetracarbonyl, etc. Find information about the molecule and perform some interesting calculations on it. Discuss the results.