# Part 1: PySCF Workflow for Molecular Thermochemistry

## Introduction

This notebook introduces the PySCF workflow that underpins all thermochemical
calculations in this lab. The primary goal of **Part 1** is to become fluent
in the sequence of electronic-structure calculations required to connect a
molecular geometry to thermodynamic quantities.

Unlike Parts 2 and 3, which focus on specific chemical case studies, this
notebook emphasizes *computational methodology*. You will learn how to use PySCF to

- define a molecular system and level of theory,
- perform a geometry optimization,
- compute a Hessian and vibrational frequencies, and
- assemble zero-point and thermal contributions to obtain thermodynamic functions.

The same workflow will be reused in later parts of the lab; the difference is
that later notebooks focus more on chemical interpretation than on new PySCF
mechanics.

By the end of Part 1, you should be able to run a complete PySCF thermochemistry
calculation starting from a molecular geometry, and to interpret the reported
thermodynamic quantities in terms of translational, rotational, vibrational,
and electronic contributions.


## Setup

This notebook includes a small utility patch and a few standard imports used in
the subsequent sections. Run the setup cells once before continuing.



:::{note} Note on `import patch`

The `patch` module applies a small fix to PySCF’s handling of rotational
constants, which are computed from the principal moments of inertia of the
geometry. For some (near-)linear molecules, numerical noise in these moments
can produce slightly negative or inconsistently ordered rotational constants.
Importing `patch` replaces PySCF’s routine with a more robust version that
enforces non-negative, sorted values.

You may reuse `patch.py` in your own projects: simply copy it into your
working directory and import it.
:::
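
To make the idea of "non-negative, sorted values" concrete, here is a minimal
sketch of that kind of clean-up. It is not the contents of `patch.py`: the
function name `sanitize_rotational_constants`, the tolerance `tol`, and the
descending sort order are all assumptions made for illustration, and the input
numbers are placeholders.

```python
def sanitize_rotational_constants(constants, tol=1e-8):
    """Hypothetical helper: clamp tiny negative values and sort.

    This only illustrates the idea described in the note; it is NOT the
    actual implementation in patch.py.
    """
    cleaned = []
    for value in constants:
        if value < -tol:
            # A large negative value is a real problem, not numerical noise.
            raise ValueError(f"unphysical rotational constant: {value}")
        # Values in [-tol, 0) are treated as noise and clamped to zero.
        cleaned.append(max(value, 0.0))
    # Assumed convention: largest constant first (A >= B >= C).
    return sorted(cleaned, reverse=True)


# Placeholder values in arbitrary units, with one slightly negative entry.
print(sanitize_rotational_constants([3.0, -1e-12, 5.0]))
```

Raising an error for clearly negative values, instead of silently clamping
them, keeps genuine problems (for example a broken geometry) visible.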

In [None]:
# Import the main PySCF modules used in this workflow.
from pyscf import dft, gto
from pyscf.geomopt.geometric_solver import optimize
from pyscf.hessian import thermo

In [None]:
# Apply a small PySCF fix for rotational constants (robust handling of near-linear cases).
import patch

## Workflow Overview

The calculation in this notebook follows a typical **ab initio thermochemistry
workflow**:

1. **Build the molecule** using Cartesian coordinates and select a basis set.
2. **Run an electronic-structure calculation** (here: DFT with PBE0-D4).
3. **Optimize the geometry** to locate an energy minimum on the potential
   energy surface.
4. **Compute the Hessian** (matrix of second derivatives) at the optimized
   geometry.
5. **Perform a frequency analysis** to obtain normal modes and vibrational
   frequencies.
6. **Evaluate thermodynamic functions** at a given temperature and pressure
   using the rigid-rotor / harmonic-oscillator / ideal-gas models.

As you read and execute each code cell, identify which step of this workflow
it implements and which quantities are being approximated.
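
As a compact summary of step 6, the standard ideal-gas relations combine the
electronic energy with the zero-point and thermal terms roughly as follows.
The symbols are introduced here for illustration (per molecule; multiply
$k_\mathrm{B}$ by Avogadro's number for molar quantities) and are not names
used in the code.

$$
\begin{aligned}
E_{0\,\mathrm{K}} &= E_\mathrm{elec} + E_\mathrm{ZPE}, \\
U(T) &= E_{0\,\mathrm{K}} + \Delta E_\mathrm{trans}(T) + \Delta E_\mathrm{rot}(T) + \Delta E_\mathrm{vib}(T), \\
H(T) &= U(T) + k_\mathrm{B} T, \\
G(T, P) &= H(T) - T\,S(T, P).
\end{aligned}
$$

Here $\Delta E_\mathrm{vib}(T)$ denotes the thermal vibrational energy above
the zero-point level, and $S$ is the sum of the translational, rotational,
vibrational, and electronic entropies. Only the translational entropy depends
on the pressure $P$.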


## Molecule Setup

In this section you define the **molecular system** for the calculation.
We use hydrogen fluoride (HF) as a simple diatomic example.

Key ingredients:

- The `atom` block specifies element symbols and Cartesian coordinates
  (in Ångström by default).
- The `basis` keyword chooses the one-electron basis set (here: `def2-TZVPPD`,
  a triple-zeta valence basis with polarization and diffuse functions).
- The `verbose` flag controls how much output PySCF prints.

In [None]:
# HF molecule used in this example.
mol = gto.M(
    atom="""
    H 0.0 0.0 0.0
    F 0.0 0.0 1.0
    """,  # Cartesian coordinates in Å
    basis="def2-TZVPPD",  # Diffuse, triple-zeta basis
    verbose=3,
)
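
PySCF accepts the coordinates in Ångström but works internally in atomic units
(Bohr), so it helps to be comfortable converting between the two. The
following standalone sketch uses only the Python standard library; the
`atoms` list and the constant name are illustrative and not part of the PySCF
API.

```python
import math

# CODATA 2018 Bohr radius expressed in Å.
BOHR_IN_ANGSTROM = 0.529177210903

# Same starting coordinates as the HF example above (in Å).
atoms = [("H", (0.0, 0.0, 0.0)), ("F", (0.0, 0.0, 1.0))]

# Euclidean distance between the two nuclei.
r_angstrom = math.dist(atoms[0][1], atoms[1][1])
r_bohr = r_angstrom / BOHR_IN_ANGSTROM

print(f"Initial H-F distance: {r_angstrom:.4f} Å = {r_bohr:.4f} Bohr")
```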

## Geometry Optimization

Once the molecule is defined, the next step is to **optimize the geometry**.

Conceptually:

- We are searching the potential energy surface for a local minimum.
- At a minimum, the gradient (forces on all atoms) vanishes.
- The optimized structure is then used as the reference point for the Hessian
  and vibrational analysis.
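
The idea of "stepping until the gradient vanishes" can be illustrated on a
one-dimensional model. The sketch below minimizes a Morse potential with
Newton steps. It is a toy model, not what the PySCF optimizer does internally,
and the parameters `D_E`, `A`, and `R_E` are illustrative assumptions rather
than values fitted to a calculation.

```python
import math

# Illustrative Morse parameters (assumptions, not fitted to PySCF results).
D_E = 0.225  # well depth in Hartree
A = 2.2      # width parameter in 1/Å
R_E = 0.92   # equilibrium distance in Å


def gradient(r):
    """First derivative of V(r) = D_E * (1 - exp(-A (r - R_E)))**2."""
    e = math.exp(-A * (r - R_E))
    return 2.0 * D_E * A * (1.0 - e) * e


def curvature(r):
    """Second derivative of the same Morse potential."""
    e = math.exp(-A * (r - R_E))
    return 2.0 * D_E * A**2 * e * (2.0 * e - 1.0)


def optimize_bond(r_start, gtol=1e-10, max_steps=50):
    """Take Newton steps until the gradient magnitude drops below gtol."""
    r = r_start
    for step in range(max_steps):
        g = gradient(r)
        if abs(g) < gtol:
            return r, step
        r -= g / curvature(r)
    raise RuntimeError("optimization did not converge")


r_opt, n_steps = optimize_bond(1.0)  # start from the 1.0 Å initial guess
print(f"Converged to r = {r_opt:.6f} Å after {n_steps} Newton steps")
```

A real optimizer works in many dimensions and usually approximates the
curvature instead of computing it exactly, but the stopping criterion is the
same: the forces on all atoms must be close to zero.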

In the next cell you will

- set up a DFT calculation with the PBE0-D4 functional,
- embed the molecule in a polarizable continuum model (PCM) to mimic solvent
  effects (here: water, with dielectric constant 78.3553); this step is
  optional, and
- call the geometry optimizer to relax the structure.

After you run the cell, look at the output:

- How many optimization steps were needed?
- Has the H–F distance changed relative to the initial guess?
- What is the final electronic energy at the optimized geometry?

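In [None]:
# Geometry optimization with PBE0-D4 and optional PCM water.
mf = dft.RKS(mol)  # restricted Kohn-Sham DFT (closed-shell molecule)
mf.xc = "PBE0-D4"  # PBE0 hybrid functional with D4 dispersion correction

# Optional: wrap the calculation in a PCM solvent model.
mf = mf.PCM()
mf.with_solvent.eps = 78.3553  # static dielectric constant of water

# Relax the geometry; the optimizer returns the molecule at the final structure.
mol_opt = optimize(mf)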

## Thermochemistry

The final stage of the workflow is to compute **thermodynamic functions** from
the vibrational frequencies and rotational constants.

In the next code cell you will

- run a single-point energy calculation at the optimized geometry,
- compute the Hessian and perform a harmonic frequency analysis, and
- evaluate thermodynamic properties at a specified temperature `T` and
  pressure `P` using `thermo.thermo`.

A later cell prints a summary table with quantities such as internal energy,
enthalpy, entropy, heat capacity, and Gibbs free energy.

After executing these cells, take a moment to interpret the output:

- Identify the contributions from translation, rotation, and vibration.
- Compare the electronic energy to the enthalpy and Gibbs free energy.
- Try changing `T` (for example, 200 K or 400 K) and see how the entropy and
  Gibbs free energy respond.
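
To build intuition for the temperature dependence before running the full
calculation, the rigid-rotor / harmonic-oscillator / ideal-gas formulas can be
evaluated by hand for a diatomic. The sketch below is independent of PySCF.
The molecular inputs (`MASS_AMU`, `WAVENUMBER`, `ROT_CONST`) are approximate,
illustrative values for hydrogen fluoride, not results from this notebook, and
the rotational entropy uses the high-temperature limit for a linear molecule.

```python
import math

# Physical constants (SI, CODATA 2018).
H_PLANCK = 6.62607015e-34  # J s
K_B = 1.380649e-23         # J / K
C_CM = 2.99792458e10       # speed of light in cm / s
N_A = 6.02214076e23        # 1 / mol
AMU = 1.66053906660e-27    # kg
R_GAS = K_B * N_A          # J / (mol K)

# Illustrative, approximate inputs for HF (assumptions, not PySCF output).
MASS_AMU = 20.006    # molecular mass in u
WAVENUMBER = 4138.0  # harmonic vibrational wavenumber in cm^-1
ROT_CONST = 20.95    # rotational constant B in cm^-1
SIGMA = 1            # symmetry number of a heteronuclear diatomic


def rrho_diatomic(T, P=1.0e5):
    """Return (H_corr in J/mol, S in J/(mol K), G_corr in J/mol)."""
    theta_vib = H_PLANCK * C_CM * WAVENUMBER / K_B  # vibrational temperature
    theta_rot = H_PLANCK * C_CM * ROT_CONST / K_B   # rotational temperature
    x = theta_vib / T

    # Thermal energies: translation, rotation (linear), vibration incl. ZPE.
    e_trans = 1.5 * R_GAS * T
    e_rot = R_GAS * T
    e_vib = R_GAS * theta_vib * (0.5 + 1.0 / math.expm1(x))
    h_corr = e_trans + e_rot + e_vib + R_GAS * T  # H = U + RT (ideal gas)

    # Entropies: Sackur-Tetrode, high-T rigid rotor, harmonic oscillator.
    mass = MASS_AMU * AMU
    q_trans = (2.0 * math.pi * mass * K_B * T / H_PLANCK**2) ** 1.5 * K_B * T / P
    s_trans = R_GAS * (math.log(q_trans) + 2.5)
    s_rot = R_GAS * (math.log(T / (SIGMA * theta_rot)) + 1.0)
    s_vib = R_GAS * (x / math.expm1(x) - math.log(-math.expm1(-x)))
    s_tot = s_trans + s_rot + s_vib

    return h_corr, s_tot, h_corr - T * s_tot


for temperature in (200.0, 298.15, 400.0):
    h_corr, s_tot, g_corr = rrho_diatomic(temperature)
    print(f"T = {temperature:7.2f} K  S = {s_tot:7.2f} J/(mol K)  "
          f"G_corr = {g_corr / 1000.0:8.2f} kJ/mol")
```

These corrections exclude the electronic energy, so they are not directly
comparable to the totals PySCF reports, but the trends should match: the
entropy grows with temperature, and the Gibbs free energy correction falls.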

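In [None]:
# Single-point energy, Hessian, frequencies, and RRHO thermochemistry.
# The optimizer returned only the relaxed molecule, so rebuild the mean-field
# object at the optimized geometry with the same settings as before.
mf_opt = dft.RKS(mol_opt)
mf_opt.xc = "PBE0-D4"
mf_opt = mf_opt.PCM()
mf_opt.with_solvent.eps = 78.3553
energy = mf_opt.kernel()  # converged electronic energy in Hartree

# Hessian (second derivatives of the energy), then harmonic analysis.
hess_opt = mf_opt.Hessian().kernel()
freq_info = thermo.harmonic_analysis(mol_opt, hess_opt)

T = 298.15  # temperature in K
P = 100000.0  # pressure in Pa (1 bar)
thermo_info = thermo.thermo(mf_opt, freq_info["freq_au"], T, P)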
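
Conceptually, a harmonic analysis mass-weights the Hessian and diagonalizes
it; the eigenvalues are squared angular frequencies. For a diatomic restricted
to motion along the bond, this reduces to a 2×2 problem that can be solved in
closed form. The sketch below is a standalone illustration, not PySCF code,
and the force constant is an assumed, approximate value for the H–F stretch.

```python
import math

AMU = 1.66053906660e-27    # kg
C_CM = 2.99792458e10       # speed of light in cm / s
H_PLANCK = 6.62607015e-34  # J s
N_A = 6.02214076e23        # 1 / mol

# Illustrative inputs (assumptions, not PySCF output).
FORCE_CONSTANT = 966.0  # N / m, roughly appropriate for the H-F stretch
M_H = 1.008 * AMU
M_F = 18.998 * AMU

# 1D Hessian along the bond for a harmonic spring: [[k, -k], [-k, k]].
# Mass-weighting divides element (i, j) by sqrt(m_i * m_j).
a = FORCE_CONSTANT / M_H
d = FORCE_CONSTANT / M_F
b = -FORCE_CONSTANT / math.sqrt(M_H * M_F)

# Closed-form eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]].
mean = 0.5 * (a + d)
spread = math.hypot(0.5 * (a - d), b)
eig_translation = mean - spread  # ~0: both atoms shift together
eig_vibration = mean + spread    # squared angular frequency of the stretch

omega = math.sqrt(eig_vibration)             # rad / s
wavenumber = omega / (2.0 * math.pi * C_CM)  # cm^-1
zpe_kj_mol = 0.5 * H_PLANCK * C_CM * wavenumber * N_A / 1000.0

print(f"Stretch wavenumber: {wavenumber:.1f} cm^-1")
print(f"Zero-point energy:  {zpe_kj_mol:.2f} kJ/mol")
```

The near-zero eigenvalue corresponds to translation along the bond axis. In a
full three-dimensional analysis, the translational and rotational modes show
up the same way and are removed before the vibrational frequencies are
reported.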

### Reporting with `dump_thermo`

PySCF provides a convenience function, `thermo.dump_thermo(...)`, that **formats and prints**
a thermochemistry summary to the notebook output. It does not recompute the underlying
quantities; it simply takes the molecular information and the results stored in `thermo_info`
and produces a human-readable report (including common totals such as $H$, $S$, and $G$ at the
chosen $T$ and $P$).

In the next cell, we call `dump_thermo` to display the results.

In [None]:
thermo.dump_thermo(mf_opt.mol, thermo_info)

### Understanding `thermo_info`

The object `thermo_info` returned by `thermo.thermo(...)` is a **Python dictionary** (often
called a *dict*). A dictionary stores **key–value pairs**: the *key* is typically a short
string (a label), and the *value* is the associated number, string, list, or other object.

Why this matters here: PySCF returns not only final thermodynamic totals, but also
intermediate contributions (for example, zero-point and thermal corrections). These are all
packaged into `thermo_info` so you can programmatically reuse them (e.g., for reaction
thermochemistry, plotting temperature dependence, or exporting data).

Run the next cells to (1) display the full dictionary in the notebook output,
(2) list the available keys, and (3) extract a single entry by key. Note that
PySCF stores many quantities as a `(value, unit)` pair.

In Parts 2 and 3 you will use selected entries from `thermo_info` (rather than relying only
on the printed report) to compute reaction and equilibrium quantities.


In [None]:
# (1) display the full dictionary in the notebook output
thermo_info

In [None]:
# (2) list the available keys
thermo_info.keys()

In [None]:
# (3) extract a single entry by key
thermo_info["H_tot"]  # example: total enthalpy (value, unit)
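
As a preview of how such entries can be reused, the sketch below computes a
reaction enthalpy and Gibbs free energy from two dictionaries laid out as
`(value, unit)` pairs. The dictionaries are placeholders with made-up numbers,
the helper names are hypothetical, and the unit label `"Eh"` (Hartree) is an
assumption that you should check against the actual `thermo_info` output.

```python
HARTREE_TO_KJ_MOL = 2625.4996394799  # CODATA 2018

# Placeholder dictionaries mimicking the (value, unit) layout.
# The numbers are made up for illustration and are NOT computed results.
thermo_reactant = {"H_tot": (-100.40, "Eh"), "G_tot": (-100.42, "Eh")}
thermo_product = {"H_tot": (-100.41, "Eh"), "G_tot": (-100.44, "Eh")}


def value_in_hartree(info, key):
    """Return the numerical part of an entry, checking the unit label."""
    value, unit = info[key]
    if unit != "Eh":
        raise ValueError(f"{key} has unit {unit!r}, expected 'Eh'")
    return value


def reaction_delta_kj_mol(product, reactant, key):
    """Product minus reactant for one quantity, converted to kJ/mol."""
    delta = value_in_hartree(product, key) - value_in_hartree(reactant, key)
    return delta * HARTREE_TO_KJ_MOL


delta_h = reaction_delta_kj_mol(thermo_product, thermo_reactant, "H_tot")
delta_g = reaction_delta_kj_mol(thermo_product, thermo_reactant, "G_tot")
print(f"ΔH = {delta_h:.2f} kJ/mol, ΔG = {delta_g:.2f} kJ/mol")
```

Checking the unit label before doing arithmetic guards against accidentally
mixing energies with entries that carry other units, such as entropies.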