# Part 3: Acid–Base Thermochemistry and Solvent Effects

## Introduction

In this notebook you will apply the PySCF RRHO workflow from **Part 1** to an **acid–base equilibrium**
and examine how an implicit solvent model (PCM water) changes the result.

We use the deprotonation of acetic acid, written in a way that avoids an explicit free proton:

$$
\mathrm{CH_3COOH + H_2O \rightleftharpoons CH_3COO^- + H_3O^+}.
$$

## Learning goals

After completing Part 3, you should be able to:

- build simple molecular starting geometries from SMILES strings using RDKit,
- compute RRHO Gibbs free energies for neutral and charged species,
- assemble $\Delta_r G^\circ$ for an acid–base reaction and convert it to an equilibrium constant, and
- estimate $\mathrm{pK_a}$ values and explain the role of the water standard-state correction and PCM.

:::{note} Conventions
Unless stated otherwise, we use $T = 298.15\,\mathrm{K}$ and $P = 1\,\mathrm{bar}$.
Pressure is provided to PySCF in Pa, so $1\,\mathrm{bar} = 100000\,\mathrm{Pa}$.
:::
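
RRHO free energies computed at a given pressure refer to an ideal-gas standard state, whereas aqueous equilibrium constants are usually referred to $1\,\mathrm{mol\,L^{-1}}$. The sketch below is not part of the original workflow; it estimates the shift between the two standard states, $\Delta n\,RT\ln(RT/P)$ with the molar volume expressed in litres, and illustrates that the shift vanishes for a reaction with the same number of species on both sides. The function name `standard_state_shift` is an illustrative assumption.

```python
import math

R = 8.314462618  # J mol^-1 K^-1


def standard_state_shift(delta_n, T=298.15, P=100000.0):
    """Shift (J/mol) from an ideal-gas standard state at pressure P (Pa)
    to a 1 mol/L standard state, for a reaction whose number of species
    changes by delta_n (products minus reactants)."""
    molar_volume_L = R * T / P * 1000.0  # ideal-gas molar volume in L/mol
    return delta_n * R * T * math.log(molar_volume_L)


# HA + H2O -> A- + H3O+ has two species on each side, so delta_n = 0.
shift_reaction = standard_state_shift(delta_n=0)
# For comparison: the shift for a single species (delta_n = 1).
shift_per_species = standard_state_shift(delta_n=1)

print(f"Shift for the reaction: {shift_reaction:.1f} J/mol")
print(f"Shift per species: {shift_per_species / 1000:.2f} kJ/mol")
```

Because the shift cancels for this reaction, the only concentration-related term that remains is the one for liquid water.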


## From $\Delta_r G^\circ$ to $\mathrm{pK_a}$

In this notebook we evaluate the reaction

$$
\mathrm{HA + H_2O \rightleftharpoons A^- + H_3O^+}
$$

and compute its standard reaction Gibbs free energy, $\Delta_r G^\circ$, from RRHO free energies.

Experimental $\mathrm{pK_a}$ values are defined for dissociation in *water as solvent*, where the activity
of liquid water is taken as approximately unity ($a_{\mathrm{H_2O}}\approx 1$):

$$
K_a = \frac{a_{\mathrm{A^-}}\,a_{\mathrm{H_3O^+}}}{a_{\mathrm{HA}}}.
$$

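If we instead define an equilibrium constant $K_{\mathrm{rxn}}$ for the reaction that **includes water as a reactant**,
with water referred to the same $1\,\mathrm{mol\,L^{-1}}$ standard state as the solutes, then the two constants differ
by the molar concentration of pure liquid water near room temperature: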

$$
K_a \approx K_{\mathrm{rxn}}\,[\mathrm{H_2O}],
\qquad [\mathrm{H_2O}]\approx 55.5\ \mathrm{mol\,L^{-1}}.
$$

Equivalently,

$$
\Delta G^\circ_a \approx \Delta_r G^\circ - RT\ln(55.5),
\qquad
\mathrm{pK_a} = \frac{\Delta G^\circ_a}{2.303\,RT}.
$$
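
The relations above can be sketched as a small helper. This is a minimal illustration, not the notebook's final code: the function name `pka_from_reaction_gibbs` and the input value are assumptions chosen only to show the arithmetic, and $\Delta_r G^\circ$ is expected in J/mol.

```python
import math

R = 8.314462618  # J mol^-1 K^-1


def pka_from_reaction_gibbs(delta_g_rxn_jmol, T=298.15, water_conc=55.5):
    """Return (K_rxn, pK_rxn, pKa) for HA + H2O <=> A- + H3O+.

    delta_g_rxn_jmol is the reaction Gibbs free energy in J/mol.
    """
    rt = R * T
    k_rxn = math.exp(-delta_g_rxn_jmol / rt)
    pk_rxn = delta_g_rxn_jmol / (rt * math.log(10))
    # K_a ~ K_rxn * [H2O]  =>  pKa ~ pK_rxn - log10([H2O])
    pka = pk_rxn - math.log10(water_conc)
    return k_rxn, pk_rxn, pka


# Hypothetical input (40 kJ/mol), used only to illustrate the conversion.
k_rxn, pk_rxn, pka = pka_from_reaction_gibbs(40000.0)
print(f"K_rxn = {k_rxn:.3e}, pK_rxn = {pk_rxn:.2f}, pKa = {pka:.2f}")
```

Computing $\mathrm{pK_{rxn}}$ directly from $\Delta_r G^\circ$ rather than from $-\log_{10}K_{\mathrm{rxn}}$ avoids taking the logarithm of a number that may underflow to zero when $\Delta_r G^\circ$ is very large.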

:::{admonition} What we will report
We compute **both** (i) $K_{\mathrm{rxn}}$ from $\Delta_r G^\circ$ and (ii) an estimated $\mathrm{pK_a}$
using the water-concentration correction above.
:::


## Setup

Run the next cell once to import modules used throughout the notebook.


In [None]:
import math

import patch
from pyscf import dft, gto
from pyscf.geomopt.geometric_solver import optimize
from pyscf.hessian import thermo
from rdkit import Chem
from rdkit.Chem import AllChem

## Helper: RRHO thermochemistry for one molecule

The function below reproduces the Part 1 workflow for a single species:

1. geometry optimization (DFT),
2. Hessian and harmonic frequency analysis, and
3. RRHO thermochemistry via `thermo.thermo(...)`.

It returns the `thermo_info` dictionary. In this notebook we extract the total Gibbs free energy from
the key `"G_tot"`, which is stored as a `(value, unit)` pair with the value in hartree.
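
As a minimal sketch of how such a `(value, unit)` pair can be unpacked safely, the snippet below uses a mock dictionary with made-up numbers in place of real PySCF output. The helper name `gibbs_in_hartree` is hypothetical, and the unit label `"Eh"` is an assumption about how the pair is labelled; adjust the check if your output differs.

```python
HARTREE_TO_KJMOL = 2625.499748  # kJ mol^-1 per hartree


def gibbs_in_hartree(thermo_info, key="G_tot"):
    """Return the free energy stored under `key`, checking its unit label."""
    value, unit = thermo_info[key]
    if unit != "Eh":  # assumed label for hartree
        raise ValueError(f"Expected hartree ('Eh') for {key}, got {unit!r}")
    return float(value)


# Mock dictionary with made-up numbers, standing in for real output.
mock_thermo_info = {"G_tot": (-76.4, "Eh")}

g_hartree = gibbs_in_hartree(mock_thermo_info)
print(f"G = {g_hartree:.6f} Eh = {g_hartree * HARTREE_TO_KJMOL:.1f} kJ/mol")
```

Checking the unit label once, at the point of extraction, makes later unit-conversion mistakes easier to spot.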


In [None]:
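def compute_thermo_for_molecule(
    mol, T=298.15, P=100000.0, use_pcm=False, eps_water=78.3553
):
    """Geometry optimization + RRHO thermochemistry for one molecule.

    T is in K and P in Pa. With use_pcm=True, the optimization and the
    frequency calculation both use PCM with dielectric constant eps_water.
    """
    # Step 1: optimize the geometry at the DFT level.
    mf = dft.RKS(mol)
    mf.xc = "PBE0-D4"
    if use_pcm:
        mf = mf.PCM()
        mf.with_solvent.eps = eps_water

    mol_opt = optimize(mf)

    # Rebuild the mean-field object on the optimized geometry.
    mf_opt = dft.RKS(mol_opt)
    mf_opt.xc = "PBE0-D4"
    if use_pcm:
        mf_opt = mf_opt.PCM()
        mf_opt.with_solvent.eps = eps_water

    _ = mf_opt.kernel()

    # Step 2: Hessian and harmonic frequencies at the optimized geometry.
    hess_opt = mf_opt.Hessian().kernel()
    freq_info = thermo.harmonic_analysis(mol_opt, hess_opt)

    # Step 3: RRHO thermochemistry at temperature T and pressure P.
    return thermo.thermo(mf_opt, freq_info["freq_au"], T, P)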

## Building molecules from SMILES with RDKit

For this project you can generate initial geometries directly from **SMILES** strings.

A SMILES string encodes connectivity and (when needed) formal charges. For the species used here:

- Acetic acid (HA): `CC(=O)O`
- Acetate (A⁻): `CC(=O)[O-]`
- Water: `O`
- Hydronium: `[OH3+]`

RDKit can generate a reasonable 3D starting structure via distance-geometry embedding. PySCF will then
optimize the geometry at the chosen level of theory.

:::{tip} Why this is sufficient
These are small molecules with simple bonding; a plain 3D embed is typically adequate as an initial guess.
If you encounter convergence issues, you can re-run the embed with a different random seed.
:::
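
Because the total charge is passed to PySCF separately from the SMILES string, it is easy for the two to disagree. The sketch below is a deliberately simplified, standard-library check that sums the charges written on bracket atoms; the function name `formal_charge_from_smiles` is an illustrative assumption, and the parser is not a general SMILES reader (for real work, RDKit's own formal-charge query on the parsed molecule is the more robust route).

```python
import re


def formal_charge_from_smiles(smiles):
    """Sum the charges written on bracket atoms, e.g. [O-] or [OH3+].

    Simplified on purpose: it expects the charge at the end of the bracket
    and does not handle atom-map labels such as [O-:1].
    """
    total = 0
    for atom in re.findall(r"\[([^\]]+)\]", smiles):
        match = re.search(r"(\++|-+)(\d*)$", atom)
        if match is None:
            continue  # bracket atom without a charge, e.g. [13CH4]
        signs, digits = match.groups()
        sign = 1 if signs[0] == "+" else -1
        total += sign * (int(digits) if digits else len(signs))
    return total


species = {
    "acetic acid": "CC(=O)O",
    "acetate": "CC(=O)[O-]",
    "water": "O",
    "hydronium": "[OH3+]",
}
charges = {name: formal_charge_from_smiles(s) for name, s in species.items()}
print(charges)

# Charge must balance across HA + H2O <=> A- + H3O+.
reactants = charges["acetic acid"] + charges["water"]
products = charges["acetate"] + charges["hydronium"]
print("Charge balanced:", reactants == products)
```

A charge mismatch between the SMILES string and the `charge` argument would give PySCF the wrong electron count, so a quick check of this kind is worth running before any expensive calculation.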


In [None]:
def mol_from_smiles(
    smiles: str,
    charge: int,
    basis: str = "def2-TZVPPD",
    seed: int = 1,
    verbose: int = 3,
):
    """Create a PySCF Mole object from a SMILES string using RDKit 3D embedding."""
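    parsed = Chem.MolFromSmiles(smiles)
    if parsed is None:
        # MolFromSmiles returns None for strings it cannot parse.
        raise ValueError(f"RDKit could not parse SMILES: {smiles}")
    rdmol = Chem.AddHs(parsed)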

    params = AllChem.ETKDGv3()
    params.randomSeed = int(seed)
    ok = AllChem.EmbedMolecule(rdmol, params)
    if ok != 0:
        raise RuntimeError(f"RDKit embedding failed for SMILES: {smiles}")

    conf = rdmol.GetConformer()
    atom_lines = []
    for atom in rdmol.GetAtoms():
        pos = conf.GetAtomPosition(atom.GetIdx())
        atom_lines.append(f"{atom.GetSymbol()} {pos.x:.8f} {pos.y:.8f} {pos.z:.8f}")

    atom_block = "\n".join(atom_lines)
    return gto.M(atom=atom_block, basis=basis, charge=charge, verbose=verbose)

## Define the species

We model the deprotonation of acetic acid with water as the base:

$$
\mathrm{CH_3COOH + H_2O \rightleftharpoons CH_3COO^- + H_3O^+}.
$$

Run the next cells to build all four species from SMILES.


In [None]:
# Example: acetic acid
acoh = mol_from_smiles("CC(=O)O", charge=0)
print(acoh.atom)

In [None]:
# Provide SMILES strings and total charges for the remaining three species.
# Fill in the placeholders and then run this cell.

aco_minus = mol_from_smiles("...", charge=...)
h2o = mol_from_smiles("...", charge=...)
h3o_plus = mol_from_smiles("...", charge=...)

## Gas-phase reaction thermochemistry

1. Compute RRHO thermochemistry in the gas phase (`use_pcm=False`) for each species.
2. Extract $G^\circ$ from each `thermo_info` dictionary using the key `"G_tot"`.
3. Form $\Delta_r G^\circ$ and convert it to $K_{\mathrm{rxn}}$.
4. Apply the water-concentration correction to estimate $\mathrm{pK_a}$.


In [None]:
T = 298.15  # K
P = 100000.0  # Pa (1 bar)
R = 8.314462618  # J mol^-1 K^-1
hartree_to_jmol = 2625.499748e3  # J mol^-1 per hartree (2625.5 kJ/mol)
G_key = "G_tot"
water_conc = 55.5  # mol/L

thermo_acoh_gas = compute_thermo_for_molecule(acoh, T=T, P=P, use_pcm=False)
thermo_aco_gas = compute_thermo_for_molecule(aco_minus, T=T, P=P, use_pcm=False)
thermo_h2o_gas = compute_thermo_for_molecule(h2o, T=T, P=P, use_pcm=False)
thermo_h3o_gas = compute_thermo_for_molecule(h3o_plus, T=T, P=P, use_pcm=False)

G_acoh_gas, _ = thermo_acoh_gas[G_key]
G_aco_gas, _ = thermo_aco_gas[G_key]
G_h2o_gas, _ = thermo_h2o_gas[G_key]
G_h3o_gas, _ = thermo_h3o_gas[G_key]

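# Products minus reactants, in hartree, then converted with hartree_to_jmol.
delta_G_gas = (G_aco_gas + G_h3o_gas) - (G_acoh_gas + G_h2o_gas)
delta_G_gas_Jmol = delta_G_gas * hartree_to_jmol

# K_rxn = exp(-Δ_r G° / RT) and pK_rxn = -log10(K_rxn).
K_rxn_gas = math.exp(-delta_G_gas_Jmol / (R * T))
pK_rxn_gas = -math.log10(K_rxn_gas)
# K_a ≈ K_rxn * [H2O], so pKa ≈ pK_rxn - log10([H2O]).
pKa_gas = pK_rxn_gas - math.log10(water_conc)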

print(f"Gas phase: Δ_r G° = {delta_G_gas_Jmol:.2f} J/mol")
print(f"Gas phase: K_rxn°  = {K_rxn_gas:.3e}")
print(f"Gas phase: pK_rxn  = {pK_rxn_gas:.2f}")
print(f"Gas phase: pKa ≈ {pKa_gas:.2f}")

## Including PCM water

Now repeat the calculation with PCM water (`use_pcm=True`) and compute the same quantities.

:::{admonition} Interpretation
PCM is a continuum model and RRHO is an approximate treatment of molecular motion.
For charged species, the absolute values are often less reliable than **differences** between related species.
Use your results primarily for qualitative trends.
:::


In [None]:
thermo_acoh_pcm = compute_thermo_for_molecule(acoh, T=T, P=P, use_pcm=True)
thermo_aco_pcm = compute_thermo_for_molecule(aco_minus, T=T, P=P, use_pcm=True)
thermo_h2o_pcm = compute_thermo_for_molecule(h2o, T=T, P=P, use_pcm=True)
thermo_h3o_pcm = compute_thermo_for_molecule(h3o_plus, T=T, P=P, use_pcm=True)

G_acoh_pcm, _ = thermo_acoh_pcm[G_key]
G_aco_pcm, _ = thermo_aco_pcm[G_key]
G_h2o_pcm, _ = thermo_h2o_pcm[G_key]
G_h3o_pcm, _ = thermo_h3o_pcm[G_key]

# Products minus reactants, in hartree, then converted with hartree_to_jmol.
delta_G_pcm = (G_aco_pcm + G_h3o_pcm) - (G_acoh_pcm + G_h2o_pcm)
delta_G_pcm_Jmol = delta_G_pcm * hartree_to_jmol

# Same conversions as in the gas-phase cell.
K_rxn_pcm = math.exp(-delta_G_pcm_Jmol / (R * T))
pK_rxn_pcm = -math.log10(K_rxn_pcm)
pKa_pcm = pK_rxn_pcm - math.log10(water_conc)

print(f"PCM water: Δ_r G° = {delta_G_pcm_Jmol:.2f} J/mol")
print(f"PCM water: K_rxn°  = {K_rxn_pcm:.3e}")
print(f"PCM water: pK_rxn  = {pK_rxn_pcm:.2f}")
print(f"PCM water: pKa ≈ {pKa_pcm:.2f}")

## Discussion

In your report, address at least the following points:

- How do the gas-phase and PCM reaction Gibbs free energies compare?
- How do the corresponding equilibrium constants and estimated $\mathrm{pK_a}$ values compare to experimental
  aqueous $\mathrm{pK_a}$ data for acetic acid?
- Which approximations are likely to dominate the error?
  - electronic structure method and basis set,
  - RRHO treatment of vibrations and rotations,
  - continuum solvation model (PCM),
  - neglect of explicit solvent structure and activity effects.
- What would you change if you needed a quantitatively reliable $\mathrm{pK_a}$ prediction?
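
When judging which approximations matter, it can help to translate a $\mathrm{pK_a}$ deviation into a free-energy error, since one $\mathrm{pK_a}$ unit corresponds to $RT\ln 10$. The sketch below assumes a literature value of about 4.76 for aqueous acetic acid near 25 °C; the function name `pka_error_to_kjmol` and the computed value passed to it are hypothetical and only illustrate the conversion.

```python
import math

R = 8.314462618  # J mol^-1 K^-1
PKA_EXPERIMENT = 4.76  # aqueous acetic acid near 25 °C (literature value)


def pka_error_to_kjmol(pka_computed, pka_reference=PKA_EXPERIMENT, T=298.15):
    """Free-energy error (kJ/mol) equivalent to a pKa deviation."""
    return (pka_computed - pka_reference) * R * T * math.log(10) / 1000.0


# Hypothetical computed value, used only to illustrate the conversion.
pka_example = 8.0
error = pka_error_to_kjmol(pka_example)
print(f"ΔpKa = {pka_example - PKA_EXPERIMENT:+.2f} "
      f"corresponds to {error:+.1f} kJ/mol")
```

At room temperature one $\mathrm{pK_a}$ unit is worth under 6 kJ/mol, which is small compared with the typical uncertainty of continuum solvation energies for ions.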
