# Part 3: Acid–Base Thermochemistry and Solvent Effects

## Introduction

In this part of the lab you will apply the PySCF RRHO workflow from **Part 1**
to an **acid–base equilibrium** and examine how a continuum solvent model
changes the result.

A convenient model system is the deprotonation of acetic acid. Because an
explicit free proton is difficult to treat consistently, we use the reaction
with water as the base:

$$
\mathrm{CH_3COOH + H_2O \rightleftharpoons CH_3COO^- + H_3O^+}.
$$

You will compute RRHO Gibbs free energies for all four species at a fixed
temperature and pressure, form the reaction Gibbs free energy
$\Delta_r G^\circ$, and convert it to an equilibrium constant and a
$\mathrm{pK_a}$-like quantity. You will then repeat the same workflow with **PCM water**
to obtain a solvent-shifted estimate.

This notebook assumes you are comfortable with the structure of the
`thermo_info` dictionary returned by `thermo.thermo(...)` (introduced in Part 1),
including the convention that many entries are stored as `(value, unit)` pairs.
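
As a reminder of that convention, the sketch below shows one way to unpack `(value, unit)` entries. The helper name `get_value`, the key `"example_scalar"`, and the mock dictionary are hypothetical and exist only for illustration; the numbers are placeholders, not PySCF output.

```python
def get_value(thermo_info, key):
    """Return the numeric part of a thermo_info entry.

    Entries stored as (value, unit) pairs are unpacked; anything else
    is returned unchanged.
    """
    entry = thermo_info[key]
    if isinstance(entry, tuple) and len(entry) == 2:
        value, _unit = entry
        return value
    return entry


# Mock dictionary with placeholder numbers (not real PySCF output).
mock_thermo = {"G_tot": (-1.0, "Eh"), "example_scalar": 1}
print(get_value(mock_thermo, "G_tot"))
```

The cells later in this notebook unpack the pair inline (`G, _unit = thermo_info["G_tot"]`), which is equivalent for entries that follow the pair convention.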


## From $\Delta_r G^\circ$ to $\mathrm{pK_a}$ (standard-state convention)

In this notebook we evaluate the reaction

$$
\mathrm{HA + H_2O \rightleftharpoons A^- + H_3O^+}
$$

and compute its standard reaction Gibbs free energy, $\Delta_r G^\circ$, from RRHO free energies.

Experimental $\mathrm{pK_a}$ values, however, are defined for the dissociation equilibrium in *water as solvent*,
where the activity of liquid water is taken as approximately unity ($a_{\mathrm{H_2O}}\approx 1$):

$$
K_a = \frac{a_{\mathrm{A^-}}\,a_{\mathrm{H_3O^+}}}{a_{\mathrm{HA}}}.
$$

If we instead write an equilibrium constant for the reaction that **includes water as a reactant** under
a 1 mol/L standard state for all aqueous species,

$$
K_{\mathrm{rxn}} = \frac{a_{\mathrm{A^-}}\,a_{\mathrm{H_3O^+}}}{a_{\mathrm{HA}}\,a_{\mathrm{H_2O}}},
$$

then the two are related by

$$
K_a \approx K_{\mathrm{rxn}}\,[\mathrm{H_2O}],
$$

with $[\mathrm{H_2O}]\approx 55.5\ \mathrm{mol\,L^{-1}}$ for liquid water near room temperature.
Equivalently,

$$
\Delta G^\circ_a \approx \Delta_r G^\circ - RT\ln(55.5),
\qquad
\mathrm{pK_a} = \frac{\Delta G^\circ_a}{2.303\,RT}.
$$

In the code below we therefore compute **both** (i) $K_{\mathrm{rxn}}$ directly from $\Delta_r G^\circ$ and
(ii) an estimated $\mathrm{pK_a}$ using the water-concentration correction shown above.
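
The conversion above can be sketched as a small standalone function. The name `pk_from_delta_g` is an assumption made for this illustration, and the input value is a placeholder chosen so the result is easy to check by hand.

```python
import math

R = 8.314462618  # J mol^-1 K^-1


def pk_from_delta_g(delta_g_jmol, T=298.15, water_conc=55.5):
    """Convert a reaction Gibbs free energy (J/mol) to K_rxn, pK_rxn, and pKa."""
    k_rxn = math.exp(-delta_g_jmol / (R * T))
    pk_rxn = delta_g_jmol / (math.log(10) * R * T)
    # K_a ≈ K_rxn * [H2O]  =>  pKa = pK_rxn - log10([H2O])
    pka = pk_rxn - math.log10(water_conc)
    return k_rxn, pk_rxn, pka


# Placeholder input: delta_g = 0 gives K_rxn = 1, so pKa = -log10(55.5).
k_rxn, pk_rxn, pka = pk_from_delta_g(0.0)
print(f"K_rxn = {k_rxn:.3f}, pK_rxn = {pk_rxn:.3f}, pKa = {pka:.3f}")
```

With $[\mathrm{H_2O}] = 55.5\ \mathrm{mol\,L^{-1}}$ the correction lowers the pK by $\log_{10}(55.5) \approx 1.74$ units, which corresponds to roughly $9.96\ \mathrm{kJ\,mol^{-1}}$ at 298.15 K.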

In [1]:
# Core modules
import math

In [2]:
# Import the main PySCF modules used in this workflow.
from pyscf import dft, gto
from pyscf.geomopt.geometric_solver import optimize
from pyscf.hessian import thermo

In [6]:
# RDKit
from rdkit import Chem
from rdkit.Chem import AllChem

In [7]:
def compute_thermo_for_molecule(mol, T=298.15, P=100000.0, use_pcm=False):
    """Run geometry optimization and RRHO thermochemistry for a molecule.

    Parameters
    ----------
    mol : pyscf.gto.Mole
        Molecule object.
    T : float
        Temperature in K.
    P : float
        Pressure in Pa.
    use_pcm : bool
        If True, include PCM water (dielectric constant eps=78.3553).

    Returns
    -------
    thermo_info : dict
        Dictionary of RRHO thermochemistry results. Many entries are stored as
        (value, unit) pairs; see Part 1 for how to inspect and extract values.
    """
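    # Mean-field object that drives the geometry optimization.
    mf = dft.RKS(mol)
    mf.xc = "PBE0-D4"
    if use_pcm:
        mf = mf.PCM()
        mf.with_solvent.eps = 78.3553

    # Optimize the geometry; the result is a Mole at the optimized coordinates.
    mol_opt = optimize(mf)

    # Rebuild the same model (functional and, if requested, PCM) at that geometry.
    mf_opt = dft.RKS(mol_opt)
    mf_opt.xc = "PBE0-D4"
    if use_pcm:
        mf_opt = mf_opt.PCM()
        mf_opt.with_solvent.eps = 78.3553

    # Converge the SCF before requesting the Hessian.
    _energy = mf_opt.kernel()

    # Nuclear Hessian followed by harmonic vibrational analysis.
    hess_opt = mf_opt.Hessian().kernel()
    freq_info = thermo.harmonic_analysis(mol_opt, hess_opt)

    # RRHO thermochemistry at temperature T (K) and pressure P (Pa).
    thermo_info = thermo.thermo(mf_opt, freq_info["freq_au"], T, P)
    return thermo_info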

## Building initial geometries from SMILES with RDKit

In this project, you do **not** need to draw structures in an external builder. Instead, you can
generate reasonable starting geometries directly from **SMILES strings**.

A SMILES string encodes **connectivity** (and, when needed, **formal charges**) using a compact text
representation. For the species in this notebook, the following SMILES are sufficient:

- Acetic acid (HA): `CC(=O)O`
- Acetate (A⁻): `CC(=O)[O-]`
- Water: `O`
- Hydronium: `[OH3+]`

RDKit can (1) build a molecule from SMILES, (2) add explicit hydrogens, (3) generate a 3D conformer,
and (4) perform a quick force-field relaxation to obtain initial Cartesian coordinates.
These coordinates are **only a starting point**; the PySCF geometry optimization will refine them at
the chosen level of theory.

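In the next cell, run `smiles_to_xyz(...)` to generate an XYZ block. The first two lines of that block
(the atom count and the name) are XYZ header lines; paste only the coordinate lines that follow them
into a `gto.M(...)` definition.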


In [12]:
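def smiles_to_xyz(smiles: str, name: str = "molecule") -> str:
    """Generate an XYZ geometry from a SMILES string using RDKit."""

    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    # EmbedMolecule returns -1 when no 3D conformer could be generated.
    if AllChem.EmbedMolecule(mol, randomSeed=42) != 0:
        raise ValueError(f"RDKit could not embed a 3D conformer for {smiles!r}")
    # Quick UFF relaxation when parameters exist; PySCF refines the geometry later.
    if AllChem.UFFHasAllMoleculeParams(mol):
        AllChem.UFFOptimizeMolecule(mol)

    conf = mol.GetConformer()
    lines = [str(mol.GetNumAtoms()), name]
    for atom in mol.GetAtoms():
        pos = conf.GetAtomPosition(atom.GetIdx())
        lines.append(f"{atom.GetSymbol():2s} {pos.x: .8f} {pos.y: .8f} {pos.z: .8f}")
    return "\n".join(lines)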


# Examples (uncomment to use):
# print(smiles_to_xyz("CC(=O)O", name="acetic_acid"))
# print(smiles_to_xyz("CC(=O)[O-]", name="acetate"))
# print(smiles_to_xyz("O", name="water"))
# print(smiles_to_xyz("[OH3+]", name="hydronium"))
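
Because an XYZ block starts with an atom-count line and a title line, it can help to strip those before handing the coordinates to `gto.M(...)`. The helper below is a minimal sketch; the name `xyz_to_atom_block` is hypothetical, and the example geometry is a hand-written placeholder rather than an optimized structure.

```python
def xyz_to_atom_block(xyz_text):
    """Drop the two XYZ header lines and return only 'symbol x y z' lines."""
    lines = xyz_text.strip().splitlines()
    n_atoms = int(lines[0])
    coord_lines = [line.strip() for line in lines[2:] if line.strip()]
    if len(coord_lines) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(coord_lines)}")
    return "\n".join(coord_lines)


# Hand-written placeholder geometry (not an optimized structure).
example_xyz = """3
water
O   0.00000000  0.00000000  0.00000000
H   0.00000000  0.75000000  0.58000000
H   0.00000000 -0.75000000  0.58000000"""
print(xyz_to_atom_block(example_xyz))
```

The returned string contains one atom per line and can be passed as the `atom` argument, for example `gto.M(atom=xyz_to_atom_block(...), ...)`.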

## Define Acid, Base, and Conjugate Species

Provide geometries for the four species in the model reaction

$$
\mathrm{CH_3COOH + H_2O \rightleftharpoons CH_3COO^- + H_3O^+}.
$$

Generate starting geometries from SMILES strings using the RDKit helper cell above, then paste the resulting Cartesian coordinates into the `gto.M(...)` blocks. Make sure the **total charge**
of each species is correct: 0 for acetic acid and water, −1 for acetate, and +1 for hydronium.


As in Part 1, use the same basis set and level of theory for all species.
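
One quick sanity check on the charges is to count electrons: all four species are closed-shell, so each should have an even electron count, and a charge that is off by one gives an odd count. The sketch below is pure Python and independent of PySCF; the function name `electron_count` and the small atomic-number table are assumptions made for this illustration.

```python
ATOMIC_NUMBER = {"H": 1, "C": 6, "O": 8}


def electron_count(symbols, charge):
    """Total electrons = sum of atomic numbers minus the total charge."""
    return sum(ATOMIC_NUMBER[s] for s in symbols) - charge


# Element lists and total charges for the four species in the model reaction.
species = {
    "CH3COOH": (["C", "C", "O", "O", "H", "H", "H", "H"], 0),
    "CH3COO-": (["C", "C", "O", "O", "H", "H", "H"], -1),
    "H2O": (["O", "H", "H"], 0),
    "H3O+": (["O", "H", "H", "H"], +1),
}

for name, (symbols, charge) in species.items():
    n_elec = electron_count(symbols, charge)
    status = "ok" if n_elec % 2 == 0 else "odd electron count: check charge"
    print(f"{name:8s} electrons = {n_elec:2d}  {status}")
```

An even count does not prove the charge is right (an error of two would pass), so treat this as a cheap first filter rather than a full validation.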

In [None]:
# Acid, base, and conjugate species placeholders.
# Paste Cartesian coordinates into each block.

acoh = gto.M(
    atom="""
    ... CH3COOH coordinates here ...
    """,
    basis="def2-TZVPPD",
    charge=0,
    verbose=3,
)

aco_minus = gto.M(
    atom="""
    ... CH3COO- coordinates here ...
    """,
    basis="def2-TZVPPD",
    charge=-1,
    verbose=3,
)

h2o = gto.M(
    atom="""
    ... H2O coordinates here ...
    """,
    basis="def2-TZVPPD",
    charge=0,
    verbose=3,
)

h3o_plus = gto.M(
    atom="""
    ... H3O+ coordinates here ...
    """,
    basis="def2-TZVPPD",
    charge=+1,
    verbose=3,
)

## Reaction Thermochemistry in the Gas Phase

Consider the model reaction

$$
\mathrm{CH_3COOH(g) + H_2O(g) \rightleftharpoons CH_3COO^-(g) + H_3O^+(g)}.
$$

1. Use `compute_thermo_for_molecule` with `use_pcm=False` to obtain RRHO thermochemistry
   for each species at $T = 298.15\,\mathrm{K}$ and $P = 1\,\mathrm{bar}$.
2. Extract the total Gibbs free energies (key `"G_tot"`) and form the reaction Gibbs free
   energy

   $$
   \Delta_r G^\circ(\mathrm{gas}) =
   G^\circ_{\mathrm{CH_3COO^-}} + G^\circ_{\mathrm{H_3O^+}} -
   \left(G^\circ_{\mathrm{CH_3COOH}} + G^\circ_{\mathrm{H_2O}}\right).
   $$

3. Convert $\Delta_r G^\circ$ to an equilibrium constant using
   $\Delta_r G^\circ = -RT\ln K^\circ$ and report a $\mathrm{pK_a}$-like value via
   $\mathrm{pK} = \Delta_r G^\circ/(2.303\,RT)$.
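
The bookkeeping in step 2 can be sketched with stoichiometric coefficients. The function name `reaction_delta_g` is hypothetical, and the free energies are placeholder numbers chosen only to exercise the arithmetic; they are not computed results.

```python
HARTREE_TO_KJMOL = 2625.4996395  # kJ/mol per Hartree


def reaction_delta_g(g_hartree, stoich):
    """Sum nu_i * G_i over species (products positive, reactants negative)."""
    return sum(nu * g_hartree[name] for name, nu in stoich.items())


# Stoichiometric coefficients for CH3COOH + H2O <=> CH3COO- + H3O+.
stoich = {"CH3COOH": -1, "H2O": -1, "CH3COO-": +1, "H3O+": +1}

# Placeholder free energies in Hartree (NOT real results).
g_placeholder = {"CH3COOH": -2.0, "H2O": -1.0, "CH3COO-": -1.5, "H3O+": -1.25}

dg = reaction_delta_g(g_placeholder, stoich)
print(f"Delta_r G = {dg:.4f} Eh = {dg * HARTREE_TO_KJMOL:.1f} kJ/mol")
```

Writing the reaction as a coefficient dictionary makes it harder to drop a species or flip a sign when the same workflow is reused for another acid.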


In [None]:
T = 298.15  # K
P = 100000.0  # Pa (1 bar)
R = 8.314462618  # J mol^-1 K^-1
hartree_to_jmol = 2625499.6395  # J/mol per Hartree (= 2625.4996 kJ/mol)

# Total Gibbs free energy is stored under the key "G_tot" as a (value, unit) pair.
G_key = "G_tot"

thermo_acoh_gas = compute_thermo_for_molecule(acoh, T=T, P=P, use_pcm=False)
thermo_aco_gas = compute_thermo_for_molecule(aco_minus, T=T, P=P, use_pcm=False)
thermo_h2o_gas = compute_thermo_for_molecule(h2o, T=T, P=P, use_pcm=False)
thermo_h3o_gas = compute_thermo_for_molecule(h3o_plus, T=T, P=P, use_pcm=False)

G_acoh_gas, _unit = thermo_acoh_gas[G_key]
G_aco_gas, _ = thermo_aco_gas[G_key]
G_h2o_gas, _ = thermo_h2o_gas[G_key]
G_h3o_gas, _ = thermo_h3o_gas[G_key]

delta_G_gas = (G_aco_gas + G_h3o_gas) - (G_acoh_gas + G_h2o_gas)
delta_G_gas_Jmol = delta_G_gas * hartree_to_jmol

K_gas = math.exp(-delta_G_gas_Jmol / (R * T))
pK_rxn = -math.log10(K_gas)

# Standard-state correction: treat water as solvent (a_H2O ≈ 1) rather than a 1 M reactant.
water_conc = 55.5  # mol/L
pKa = pK_rxn - math.log10(water_conc)

print(f"Gas-phase Δ_r G° = {delta_G_gas_Jmol:.2f} J/mol")
print(f"Gas-phase K_rxn°  = {K_gas:.3e}")
print(f"Gas-phase pK_rxn  = {pK_rxn:.2f}")
print(f"Gas-phase pKa ≈ {pKa:.2f}")


## Including PCM Water and Estimating an Aqueous $\mathrm{pK_a}$

Now repeat the same workflow with PCM water:

1. Recompute thermochemistry for all four species with `use_pcm=True`.
2. Form the reaction Gibbs free energy $\Delta_r G^\circ(\mathrm{PCM})$ in analogy to the gas-phase case.
3. Convert $\Delta_r G^\circ(\mathrm{PCM})$ to an equilibrium constant and an effective $\mathrm{pK_a}$.

Keep in mind:

- RRHO and PCM are approximate, especially for charged species.
- Solution equilibria are usually reported in a 1 mol/L standard state, whereas here we use an ideal-gas 1 bar standard state.
  A fully consistent treatment would include a standard-state correction; you can comment on this in your discussion.
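
To get a feel for the size of that standard-state term, the sketch below evaluates the usual ideal-gas conversion from 1 bar to 1 mol/L, $RT\ln(RT c^\circ/P^\circ)$, per species. This is an illustration under the ideal-gas assumption, and the variable names are chosen here rather than taken from the notebook.

```python
import math

R = 8.314462618  # J mol^-1 K^-1
T = 298.15       # K
P = 100000.0     # Pa (1 bar)

# Molar volume of an ideal gas at T and P, converted from m^3 to L.
molar_volume_L = R * T / P * 1000.0

# Free-energy change per species on going from 1 bar (gas) to 1 mol/L.
dG_conc = R * T * math.log(molar_volume_L * 1.0)  # 1.0 mol/L target concentration

# The net correction scales with the change in the number of species.
delta_n = (1 + 1) - (1 + 1)  # products minus reactants for HA + H2O <=> A- + H3O+
print(f"per-species correction  = {dG_conc / 1000:.2f} kJ/mol")
print(f"net reaction correction = {delta_n * dG_conc / 1000:.2f} kJ/mol")
```

Each species shifts by about 7.96 kJ/mol, but because the model reaction has two species on each side, these shifts cancel in $\Delta_r G^\circ$. The water-as-solvent factor of 55.5 is a separate correction and does not cancel.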


In [None]:
thermo_acoh_pcm = compute_thermo_for_molecule(acoh, T=T, P=P, use_pcm=True)
thermo_aco_pcm = compute_thermo_for_molecule(aco_minus, T=T, P=P, use_pcm=True)
thermo_h2o_pcm = compute_thermo_for_molecule(h2o, T=T, P=P, use_pcm=True)
thermo_h3o_pcm = compute_thermo_for_molecule(h3o_plus, T=T, P=P, use_pcm=True)

G_acoh_pcm, _unit = thermo_acoh_pcm[G_key]
G_aco_pcm, _ = thermo_aco_pcm[G_key]
G_h2o_pcm, _ = thermo_h2o_pcm[G_key]
G_h3o_pcm, _ = thermo_h3o_pcm[G_key]

delta_G_pcm = (G_aco_pcm + G_h3o_pcm) - (G_acoh_pcm + G_h2o_pcm)
delta_G_pcm_Jmol = delta_G_pcm * hartree_to_jmol

K_pcm = math.exp(-delta_G_pcm_Jmol / (R * T))
pK_rxn = -math.log10(K_pcm)

# Standard-state correction: treat water as solvent (a_H2O ≈ 1) rather than a 1 M reactant.
water_conc = 55.5  # mol/L
pKa = pK_rxn - math.log10(water_conc)

print(f"PCM Δ_r G° = {delta_G_pcm_Jmol:.2f} J/mol")
print(f"PCM K_rxn°  = {K_pcm:.3e}")
print(f"PCM pK_rxn  = {pK_rxn:.2f}")
print(f"PCM pKa ≈ {pKa:.2f}")


## Discussion

In your report, address at least the following points:

- How do the gas-phase and PCM reaction Gibbs free energies compare?
- How do the corresponding equilibrium constants and $\mathrm{pK_a}$-like values
  differ from experimental aqueous $\mathrm{pK_a}$ data for acetic acid?
- Which approximations are likely to dominate the error?
  - electronic structure method and basis set,
  - RRHO treatment of vibrations and rotations,
  - continuum solvation model (PCM),
  - neglect of explicit solvent and standard-state corrections.
- Based on your results, what would you change in the model if you needed a
  quantitatively reliable $\mathrm{pK_a}$ prediction?
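
When comparing with experiment, it can help to express a pK deviation as a free-energy error. The sketch below assumes the commonly quoted experimental aqueous $\mathrm{pK_a}$ of acetic acid near 25 °C (4.76); the function name `pka_error_in_kjmol` is hypothetical.

```python
import math

R = 8.314462618  # J mol^-1 K^-1
T = 298.15       # K
PKA_EXPT = 4.76  # commonly quoted experimental aqueous pKa of acetic acid


def pka_error_in_kjmol(pka_calc, pka_ref=PKA_EXPT, temperature=T):
    """Free-energy error (kJ/mol) implied by a deviation in pKa."""
    return (pka_calc - pka_ref) * math.log(10) * R * temperature / 1000.0


# One pK unit corresponds to RT ln(10).
print(f"{pka_error_in_kjmol(PKA_EXPT + 1.0):.2f} kJ/mol per pK unit")
```

At 298.15 K one pK unit is about 5.71 kJ/mol, so free-energy errors of a few kJ/mol per species already move the predicted $\mathrm{pK_a}$ by a unit or more.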