# Formulating the DFT problem mathematically

This notebook sets up the notation that I will use for the remainder of my lecture.

## Preliminaries

- The aim of density-functional theory (DFT) is to find an approximate description of the electronic structure, without solving the many-body Schrödinger equation.
- We consider the **discretised setting**, in which a basis of **size $N_b$** (e.g. plane waves) is used.
- We describe the unknowns of our problem in terms of **density matrices**, i.e. self-adjoint operators $P$ with $0 \leq P \leq 1$, which can be diagonalised as
  $$ P = \sum_{i=1}^{N_b} f_i |\phi_i\rangle \langle \phi_i| $$
  with **occupation numbers** $0 \leq f_i \leq 1$ and orthonormal orbitals $\phi_i$.
- As discussed, the DFT solution is found by solving a **minimisation problem**
  $$ \min_{P\in\mathcal{P}} \mathcal{E}(P) $$
  where the set of admissible density matrices is
  $$ \mathcal{P} = \left\{P \in \mathbb{C}^{N_b \times N_b} : P^\dagger = P, 0 < P < 1 \right\}. $$
- We work in the *grand-canonical ensemble*: we fix a chemical potential $\mu$ and an inverse temperature $\beta = 1/T$. Note that in this setting the number of electrons is *not fixed*. This choice simplifies the mathematical expressions presented here: fixing the number of electrons $N$ instead of $\mu$ would introduce an additional constraint, and hence additional terms in some of the expressions below.

- In this ensemble the appropriate energy to minimise is the **free energy** given by
  $$ \mathcal{E}(P) = \mathcal{E}_0(P) - \frac{1}{\beta} \mathrm{Tr}[s(P)] - \mu \mathrm{Tr}(P). $$

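- Here we introduced the fermionic entropy
  $$ s(p) = -[p \log p + (1-p) \log (1-p)], $$
  which is applied to the eigenvalues of $P$, i.e. $\mathrm{Tr}[s(P)] = \sum_{i=1}^{N_b} s(f_i)$, and a twice continuously differentiable function $\mathcal{E}_0(P)$.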
- We take $\mathcal{E}_0$ to be the **total DFT energy** of a semi-local DFT functional, i.e.
  $$ \mathcal{E}_\text{DFT}(P)
     = \mathcal{E}_\text{DFT}\left(\sum_{i=1}^{N_b} f_i |\phi_i\rangle \langle \phi_i|\right)
     = \sum_{i=1}^{N_b} f_i \int \phi_i^\ast \left(-\frac12 \Delta\right) \phi_i
    + \int V_\text{ext} \rho + \frac12 \int \left(v_c \rho\right) \rho + E_\text{xc}(\rho) + E_\text{nuclear}$$
    with
     * the electron **density** $\rho = \sum_{i=1}^{N_b} f_i |\phi_i|^2$,
       which depends on *all* orbitals;
     * $\sum_i f_i \int \phi_i^\ast \left(-\frac12 \Delta\right) \phi_i$, the
       **kinetic** energy of the electrons;
     * $\int V_\text{ext} \rho$, the **external** potential energy,
       i.e. the electron-nuclear interaction;
     * $\frac12 \int \left(v_c \rho\right) \rho$, the **Hartree** energy, where $(v_c \rho)$ is the classical Coulomb potential associated to the charge density $\rho$ in a uniform neutralising background:
     $$ - \Delta (v_c \rho) = 4\pi \left(\rho - \langle \rho \rangle\right), $$
       with $\langle \rho \rangle$ the mean of $\rho$ over the unit cell;
     * $E_\text{xc}(\rho)$, the **exchange-correlation**
       (or XC) energy, which describes the electron-electron interaction beyond
       the classical model;
     * $E_\text{nuclear}$, a fixed energy offset due to the mutual repulsion
       of the nuclei.
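To make the notation concrete, the following sketch (plain NumPy, not DFTK) builds a density matrix $P = \sum_i f_i |\phi_i\rangle\langle\phi_i|$ from random orthonormal orbitals and occupations, checks that it lies in $\mathcal{P}$, and evaluates $\mathrm{Tr}[s(P)]$ through the eigenvalues of $P$. For the free energy it assumes the simplest possible stand-in $\mathcal{E}_0(P) = \mathrm{Tr}(HP)$ with a random Hermitian $H$. This toy $\mathcal{E}_0$, the helper names and all parameter values are hypothetical; the real $\mathcal{E}_\text{DFT}$ is non-linear in $P$.

```python
import numpy as np

def fermionic_entropy(p):
    """s(p) = -[p log p + (1-p) log(1-p)], applied elementwise to values in (0, 1)."""
    p = np.asarray(p, dtype=float)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def density_matrix(orbitals, occupations):
    """P = sum_i f_i |phi_i><phi_i| for the orthonormal columns phi_i of `orbitals`."""
    # Broadcasting scales column i by f_i before multiplying with the adjoint.
    return (orbitals * occupations) @ orbitals.conj().T

def trace_entropy(P):
    """Tr[s(P)], computed from the eigenvalues of the Hermitian matrix P."""
    return fermionic_entropy(np.linalg.eigvalsh(P)).sum()

def free_energy(P, H, beta, mu):
    """Free energy with a hypothetical *linear* E_0(P) = Tr(H P)."""
    return np.trace(H @ P).real - trace_entropy(P) / beta - mu * np.trace(P).real

rng = np.random.default_rng(0)
Nb = 6
A = rng.standard_normal((Nb, Nb)) + 1j * rng.standard_normal((Nb, Nb))
H = (A + A.conj().T) / 2                  # random Hermitian toy "Hamiltonian"
orbitals, _ = np.linalg.qr(A)             # orthonormal columns phi_i
f = rng.uniform(0.05, 0.95, size=Nb)      # occupations strictly inside (0, 1)

P = density_matrix(orbitals, f)
eigs = np.linalg.eigvalsh(P)
print(np.allclose(P, P.conj().T))         # P is self-adjoint
print(eigs.min() > 0 and eigs.max() < 1)  # 0 < P < 1, i.e. P is admissible
print(free_energy(P, H, beta=10.0, mu=0.0))
```

Because the orbitals are orthonormal, the eigenvalues of $P$ are exactly the occupations $f_i$, so $\mathrm{Tr}[s(P)] = \sum_i s(f_i)$.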

## First order conditions and self-consistent field equations
- Differentiating $\mathcal{E}_0$ we get the **Kohn-Sham Hamiltonian**
  $$ H_\text{KS}(P) = \nabla \mathcal{E}_0(P).$$
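- The closure of $\mathcal{P}$ is compact and $\mathcal{E}$ is continuous on it,
  so $\mathcal{E}$ has at least one minimiser on this closure. Since the derivative
  of the entropy $s$ diverges at $p = 0$ and $p = 1$, occupations on the boundary
  can always be moved inwards to lower $\mathcal{E}$; hence minimisers lie in
  $\mathcal{P}$ itself, where $\mathcal{E}$ is differentiable.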
- Setting the gradient to zero gives the first-order condition
  of the above minimisation problem:
  $$\nabla\mathcal{E}(P) = H_\text{KS}(P) - \mu - \frac1{\beta} s'(P) = 0.$$
- We introduce the **Fermi-Dirac map**
  $$f_\text{FD}(\varepsilon) = \frac{1}{1+e^{\beta (\varepsilon-\mu)}}$$
  and note
  $$s'(p) = \log\left(\frac{1-p}{p}\right) \qquad \text{and} \qquad
  s'(f_\text{FD}(\varepsilon)) = \beta(\varepsilon - \mu).$$
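- Since $s''(p) = -\frac{1}{p(1-p)} < 0$, the map $s'$ is injective, so
  $s'(p) = \beta(\varepsilon - \mu)$ holds exactly when $p = f_\text{FD}(\varepsilon)$.
  The first-order condition shows that $P$ commutes with $H_\text{KS}(P)$;
  applying this equivalence to their common eigenvalues allows us to rewrite
  the first-order condition as the **fixed-point problem**
  $$P = f_\text{FD}(H_\text{KS}(P)),$$
  the **self-consistent field problem**.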
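The identity $s'(f_\text{FD}(\varepsilon)) = \beta(\varepsilon - \mu)$ and the resulting fixed-point characterisation can be checked numerically. The sketch below assumes a hypothetical linear $\mathcal{E}_0(P) = \mathrm{Tr}(HP)$, for which $H_\text{KS}(P) = H$ does not depend on $P$, so that $P = f_\text{FD}(H)$ solves the first-order condition directly. Matrix functions are evaluated through the eigendecomposition; all names and parameter values are illustrative assumptions.

```python
import numpy as np

def fermi_dirac(eps, beta, mu):
    """Fermi-Dirac map f_FD(eps) = 1 / (1 + exp(beta (eps - mu)))."""
    return 1.0 / (1.0 + np.exp(beta * (np.asarray(eps) - mu)))

def entropy_derivative(p):
    """s'(p) = log((1 - p) / p) for p in (0, 1)."""
    p = np.asarray(p)
    return np.log((1 - p) / p)

def matrix_function(func, M):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return (U * func(w)) @ U.conj().T

beta, mu = 2.0, 0.1

# Scalar identity: s'(f_FD(eps)) = beta (eps - mu)
eps = np.linspace(-1.0, 1.0, 11)
print(np.allclose(entropy_derivative(fermi_dirac(eps, beta, mu)), beta * (eps - mu)))

# Hypothetical linear E_0(P) = Tr(H P), so H_KS(P) = H for every P
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
H = (A + A.T) / 4                                          # random real symmetric toy "Hamiltonian"
P = matrix_function(lambda e: fermi_dirac(e, beta, mu), H) # P = f_FD(H)

# Gradient of the free energy: H_KS(P) - mu - s'(P) / beta
grad = H - mu * np.eye(5) - matrix_function(entropy_derivative, P) / beta
print(np.allclose(grad, 0.0))                              # first-order condition holds
```

For a non-linear $\mathcal{E}_0$ the Hamiltonian $H_\text{KS}(P)$ depends on $P$, and the fixed-point problem has to be solved iteratively.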
  
**Note:** The Fermi-Dirac map follows naturally
from the fermionic entropy. In practical calculations other smearing functions (e.g. Gaussian or Marzari-Vanderbilt smearing) are often used in place of $f_\text{FD}$.

## Density mixing

- The formalism presented above (based on the density matrix $P$) is general: it could, for example, also be applied to the Hartree-Fock problem or to hybrid DFT if the functional $\mathcal{E}_0$ is modified appropriately.
- From the point of view of a numerical implementation it has, however, the disadvantage of requiring density matrices, i.e. objects of size $N_b \times N_b$, to be stored and manipulated. This is particularly challenging for "large" basis sets (such as plane waves), where $N_b$ is typically of the order of $10^5$ or $10^6$.
- For semi-local DFT the problem can instead be formulated in terms of the density $\rho$ only. This is advantageous since the density is an object of size only $N_b$.
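A back-of-the-envelope estimate shows why storing $P$ is problematic. The sketch assumes a dense complex double-precision matrix (16 bytes per entry) and a real double-precision density vector (8 bytes per entry); actual codes may store these objects differently.

```python
BYTES_COMPLEX128 = 16  # one complex double-precision number (assumed storage)
BYTES_FLOAT64 = 8      # one real double-precision number (assumed storage)

def storage_bytes(Nb):
    """Memory for a dense complex N_b x N_b density matrix and a real density of length N_b."""
    return Nb * Nb * BYTES_COMPLEX128, Nb * BYTES_FLOAT64

for Nb in (10**5, 10**6):
    p_bytes, rho_bytes = storage_bytes(Nb)
    print(f"N_b = {Nb:.0e}: P ~ {p_bytes / 2**30:,.0f} GiB, rho ~ {rho_bytes / 2**20:.2f} MiB")
```

Already for $N_b = 10^5$ the dense $P$ needs about 149 GiB, while the density vector needs less than 1 MiB.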


- To see this, we collect the terms linear in $P$ (kinetic energy and external interaction with the nuclei) in the **core Hamiltonian** $H_0$ and the non-linear terms in $g$, the Hartree-exchange-correlation energy. We then write

  $$\mathcal{E}_0(P) = \mathrm{Tr}\left(H_0 P\right) + g(\text{diag} P)$$
  where $\rho = \text{diag} P$ is the density corresponding to $P$ (in a real-space representation the density is the diagonal of $P$).

- Introducing the **Hartree-exchange-correlation potential**
  $$ V(\rho) = \nabla g(\rho) $$
  we can write the gradient of $\mathcal{E}_0$ (the KS Hamiltonian) in terms of the density only:

  $$ H_\text{KS}(P) = H_0 + \text{diagm}\, V(\rho) $$

  where $\text{diagm}$ constructs the (diagonal) matrix of the multiplication operator associated with a given local potential.

- This in turn allows us to write the SCF fixed-point problem in terms of the density only:
  $$ \rho = \text{diag} \Big( f_\text{FD}\left[ H_0 + \text{diagm}\, V(\rho) \right] \Big) = D(V(\rho)) = F(\rho)$$
  where we introduced the short-hand
  $$ D(V) = \text{diag} \Big( f_\text{FD}\left[ H_0 + \text{diagm}\, V \right] \Big)$$
  for the potential-to-density map
  and the map $F$ to denote the SCF update $F(\rho) = D(V(\rho))$.

- Note that numerically computing $D(V)$ requires diagonalising
  $$ \left[ H_0 + \text{diagm}\,V  \right] \psi_i = \varepsilon_i \psi_i, $$
  from which the new density is obtained as
  $$ \rho = \sum_{i=1}^{N_b} f_\text{FD}(\varepsilon_i) |\psi_i|^2. $$
  Since $V$ itself depends on $\rho$, which in turn depends on the eigenfunctions $\psi_i$, one can alternatively view the SCF problem as a **non-linear eigenproblem**.
- The diagonalisation required to compute $D(V)$ is the **expensive step** of an SCF procedure.
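The potential-to-density map $D(V)$ can be sketched on a hypothetical toy model: a periodic 1D finite-difference grid, on which $\text{diag}$ and $\text{diagm}$ are simply NumPy's `np.diag` applied to a matrix and to a vector, respectively. The external potential, the grid and the stand-in $g(\rho) = \frac{c}{2}\sum_j \rho_j^2$ (so that $V(\rho) = c\rho$) are assumptions chosen for illustration, not a physical Hartree-exchange-correlation energy.

```python
import numpy as np

def fermi_dirac(eps, beta, mu):
    """f_FD(eps) = 1/(1 + exp(beta (eps - mu))), written via tanh to avoid overflow."""
    return 0.5 * (1.0 - np.tanh(0.5 * beta * (eps - mu)))

def core_hamiltonian(n, L, v_ext):
    """Toy H_0 = -1/2 Laplacian + V_ext on a periodic 1D grid (finite differences)."""
    h = L / n
    lap = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    lap[0, -1] = lap[-1, 0] = 1.0              # periodic boundary conditions
    return -0.5 * lap / h**2 + np.diag(v_ext)

def hxc_potential(rho, c=1.0):
    """Hypothetical V(rho) = grad g(rho) for the toy g(rho) = c/2 * sum(rho**2)."""
    return c * rho

def potential_to_density(V, H0, beta, mu):
    """D(V) = diag f_FD(H_0 + diagm V), computed by a full diagonalisation."""
    eps, psi = np.linalg.eigh(H0 + np.diag(V))  # the expensive step
    # rho_j = sum_i f_FD(eps_i) |psi_i(x_j)|^2, where psi_i are the columns of psi
    return (np.abs(psi) ** 2) @ fermi_dirac(eps, beta, mu)

n, L, beta, mu = 50, 10.0, 5.0, 0.0
x = np.arange(n) * (L / n)
v_ext = -2.0 * np.exp(-((x - L / 2) ** 2))     # hypothetical attractive well
H0 = core_hamiltonian(n, L, v_ext)

rho_in = np.full(n, 0.01)                      # some trial density
rho_out = potential_to_density(hxc_potential(rho_in), H0, beta, mu)  # F(rho_in)
```

The last line is a single application of the SCF map $F(\rho) = D(V(\rho))$; every evaluation requires a full diagonalisation, which is why this step dominates the cost.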

- When running a calculation in DFTK (as we did in the [previous notebook](0_first_practical_calculation.ipynb)), DFTK first inspects the system to construct an initial guess $\rho_0$ for the density.
- It then solves $\rho = D(V(\rho))$ by computing
  $D(V(\rho_n))$ for a sequence of iterates $\rho_n$ until input and output
  are close enough, i.e. until the residual
  $$ R(\rho_n) = D(V(\rho_n)) - \rho_n$$
  is small, at which point DFTK flags convergence.
- We will discuss these algorithms in more detail in the next notebook.
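The loop described above can be sketched on a hypothetical 1D toy model (defined inside the block so it runs on its own). To turn $D(V(\rho_n))$ into a new iterate it uses the simplest conceivable update, a damped fixed-point iteration $\rho_{n+1} = \rho_n + \alpha R(\rho_n)$ with fixed damping $\alpha$. This choice, the toy $V(\rho) = c\rho$ and all parameters are assumptions for illustration and not necessarily what DFTK does.

```python
import numpy as np

def fermi_dirac(eps, beta, mu):
    """f_FD(eps) = 1/(1 + exp(beta (eps - mu))), written via tanh to avoid overflow."""
    return 0.5 * (1.0 - np.tanh(0.5 * beta * (eps - mu)))

def toy_core_hamiltonian(n=50, L=10.0):
    """Hypothetical H_0 = -1/2 Laplacian + V_ext on a periodic 1D finite-difference grid."""
    h = L / n
    x = np.arange(n) * h
    lap = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    lap[0, -1] = lap[-1, 0] = 1.0              # periodic boundary conditions
    v_ext = -2.0 * np.exp(-((x - L / 2) ** 2))
    return -0.5 * lap / h**2 + np.diag(v_ext)

def D(V, H0, beta, mu):
    """Potential-to-density map D(V) = diag f_FD(H_0 + diagm V)."""
    eps, psi = np.linalg.eigh(H0 + np.diag(V))
    return (np.abs(psi) ** 2) @ fermi_dirac(eps, beta, mu)

def scf(H0, beta, mu, c=1.0, alpha=0.5, tol=1e-10, maxiter=200):
    """Damped fixed-point iteration rho_{n+1} = rho_n + alpha * R(rho_n)."""
    rho = np.zeros(H0.shape[0])                # crude initial guess rho_0
    for n in range(maxiter):
        V = c * rho                            # toy V(rho), gradient of c/2 * sum(rho**2)
        residual = D(V, H0, beta, mu) - rho    # R(rho_n) = D(V(rho_n)) - rho_n
        if np.linalg.norm(residual) < tol:     # input and output close enough
            return rho, n
        rho = rho + alpha * residual
    raise RuntimeError("SCF did not converge within maxiter iterations")

H0 = toy_core_hamiltonian()
rho, niter = scf(H0, beta=5.0, mu=0.0)
print(f"converged after {niter} iterations")
```

Plain damping with a fixed $\alpha$ is only the most basic way to combine input and output densities; it converges for this small toy problem but can be slow or unstable for realistic systems.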