# Analysing SCF convergence

In the previous notebook we saw that using `KerkerMixing` as a preconditioner $P^{-1}$ for the SCF problem
greatly improved the convergence of the SCF for aluminium. In this notebook we will use some numerical tools to understand why.

The standard damped, preconditioned fixed-point iterations are written as
$$ \rho_{n+1} = \rho_{n} + \alpha P^{-1} [D(V(\rho_n)) - \rho_n]. $$
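
To make the update rule concrete, here is a minimal sketch on a toy problem. The affine map `toy_map` (with made-up `M_toy` and `b_toy`) is only a hypothetical stand-in for $\rho \mapsto D(V(\rho))$, and `damped_fixed_point` is an illustrative helper, not a DFTK function.

```julia
using LinearAlgebra

# Hypothetical stand-in for ρ ↦ D(V(ρ)): an affine map with made-up numbers.
M_toy = [-1.0 0.5; 0.2 -3.0]
b_toy = [1.0, 2.0]
toy_map(ρ) = M_toy * ρ + b_toy

# Damped, preconditioned fixed-point iteration ρ ← ρ + α P⁻¹ [F(ρ) - ρ].
function damped_fixed_point(F, ρ0; α=0.2, Pinv=identity, tol=1e-10, maxiter=1000)
    ρ = copy(ρ0)
    for n in 1:maxiter
        residual = F(ρ) - ρ
        norm(residual) < tol && return (ρ=ρ, n_iter=n - 1, converged=true)
        ρ = ρ + α * Pinv(residual)
    end
    (ρ=ρ, n_iter=maxiter, converged=false)
end

res_damped   = damped_fixed_point(toy_map, zeros(2); α=0.2)
res_undamped = damped_fixed_point(toy_map, zeros(2); α=1.0, maxiter=50)
```

For this toy map the damped run reaches the fixed point, whereas the undamped run ($\alpha = 1$) diverges.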


Near the fixed point $\rho_\ast = D(V(\rho_\ast))$ the error $e_n = \rho_n - \rho_\ast$ is small and we can expand to first order:
$$ \begin{align*}
D(V(\rho_\ast + e_n)) &\simeq D[V(\rho_\ast) + V'(e_n)] \\
&\simeq D(V(\rho_\ast)) + D'(V'(e_n))\\
&= \rho_\ast + D'(V'(e_n))
\end{align*}$$

The derivatives $D'$ and $V'$ are again important quantities and are given special symbols:
- Hartree-exchange-correlation **kernel** $K_\text{Hxc} = V'$
- Independent-particle **susceptibility** $\chi_0 = D'$

The above expansion allows us to relate the **errors of successive SCF iterations** (near the fixed point):
$$ \begin{align*}
e_{n+1} = \rho_{n+1} - \rho_\ast 
&\simeq \rho_{n} - \rho_\ast + \alpha P^{-1} [\rho_\ast + \chi_0 K_\text{Hxc} e_n - \rho_n] \\
&= e_n - \alpha P^{-1} [1 - \chi_0 K_\text{Hxc}] e_n
\end{align*}$$

Introducing the adjoint of the **dielectric matrix**
$$ \varepsilon^\dagger = [1 - \chi_0 K_\text{Hxc}] $$
leads to the final relationship
$$ e_{n+1} \simeq [1 - \alpha P^{-1} \varepsilon^\dagger] e_n \simeq [1 - \alpha P^{-1} \varepsilon^\dagger]^{n+1} e_0$$
with $e_0$ being the initial error.

In other words:
$$\text{SCF converges} \qquad \Leftrightarrow \qquad \text{all eigenvalues of $1 - \alpha P^{-1} \varepsilon^\dagger$ lie strictly between $-1$ and $1$}$$
This implies that the **convergence** properties of an SCF
are related to $\varepsilon$, the dielectric operator,
which **depends on** the **dielectric properties** of the system under study.
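
The criterion can be checked numerically on a small example. The sketch below uses a made-up diagonal matrix `ε_mat` as a stand-in for $\varepsilon^\dagger$ (not computed from any real system) and evaluates the largest eigenvalue magnitude of $1 - \alpha P^{-1} \varepsilon^\dagger$ for three choices of $\alpha$ and $P^{-1}$. The helper `spectral_radius` is illustrative only.

```julia
using LinearAlgebra

# Made-up symmetric stand-in for ε† with eigenvalues 1, 5 and 30.
ε_mat = diagm([1.0, 5.0, 30.0])
identity_P = Matrix{Float64}(I, 3, 3)

# Largest eigenvalue magnitude of the error-propagation matrix 1 - α P⁻¹ ε†.
spectral_radius(α, Pinv, ε) = maximum(abs, eigvals(I - α * Pinv * ε))

r_undamped = spectral_radius(1.0,  identity_P, ε_mat)   # ≈ 29: errors grow
r_damped   = spectral_radius(0.05, identity_P, ε_mat)   # ≈ 0.95: slow convergence
r_ideal    = spectral_radius(1.0,  inv(ε_mat), ε_mat)   # ≈ 0: exact inverse as preconditioner
```

Only the last two cases satisfy the criterion. The damped one does so barely, which is why its convergence is slow.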

## Making an SCF converge

It turns out that in the vast majority of cases the eigenvalues of $\varepsilon^\dagger$ are positive. To make the SCF converge one can therefore:
- Choose $\alpha$ small enough. Even for $P = I$ this always works, but convergence can be painfully slow.
- Find a good $P^{-1} \simeq (\varepsilon^\dagger)^{-1}$. Then the eigenvalues of $(P^{-1} \varepsilon^\dagger)$ are close to 1, $\alpha \simeq 1$ is a good choice and the SCF converges in a few steps. Hooray!
- The optimal $\alpha$ and the optimal rate of convergence are related to the condition number
  $$ \kappa = \frac{\lambda_\text{max}}{\lambda_\text{min}}$$
  of the (preconditioned) dielectric matrix $P^{-1} \varepsilon^\dagger$. The smaller the condition number, the better the convergence.
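
As a short sketch of where the condition number enters, assume that all eigenvalues $\lambda$ of $P^{-1} \varepsilon^\dagger$ are real and lie in $[\lambda_\text{min}, \lambda_\text{max}]$ with $\lambda_\text{min} > 0$. The worst-case contraction factor of $1 - \alpha P^{-1} \varepsilon^\dagger$ is attained at one of the two endpoints and is smallest when both endpoint values have equal magnitude:
$$ \begin{align*}
\max_\lambda |1 - \alpha \lambda| &= \max\left( |1 - \alpha \lambda_\text{min}|, \, |1 - \alpha \lambda_\text{max}| \right), \\
\alpha_\text{opt} &= \frac{2}{\lambda_\text{min} + \lambda_\text{max}}, \\
\max_\lambda |1 - \alpha_\text{opt} \lambda| &= \frac{\lambda_\text{max} - \lambda_\text{min}}{\lambda_\text{max} + \lambda_\text{min}} = \frac{\kappa - 1}{\kappa + 1}.
\end{align*}$$
For $\kappa \to 1$ the error vanishes after a single step, while for large $\kappa$ the contraction factor approaches $1$.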

**Note:** If the preconditioner is very bad, the eigenvalues of $(P^{-1} \varepsilon^\dagger)$ might even be spread more widely than those of $\varepsilon^\dagger$, such that convergence is actually hampered.

We will now investigate the eigenvalues of $(P^{-1} \varepsilon^\dagger)$ for a few examples.

## Aluminium

We start by taking a look at a slightly cruder (thus computationally cheaper) version of our aluminium setup from before: 

In [None]:
using DFTK
using LinearAlgebra

function aluminium_setup(repeat=1; Ecut=7.0, kgrid=[1, 1, 1])
    a = 7.65339
    lattice = diagm(fill(a, 3))
    Al = ElementPsp(:Al, psp=load_psp("hgh/lda/al-q3"))
    atoms = [Al => [[0.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]]

    # Make supercell in pymatgen
    mg_struct = pymatgen_structure(lattice, atoms)
    mg_struct.make_supercell([1, 1, repeat])
    lattice = load_lattice(mg_struct)
    atoms = [Al => [s.frac_coords for s in mg_struct.sites]];

    # Construct the model
    model = model_LDA(lattice, atoms, temperature=1e-3, symmetries=false)
    PlaneWaveBasis(model; Ecut, kgrid)
end

We already know that for moderate `repeat`s the convergence without mixing / preconditioner is slow:

In [None]:
scfres = self_consistent_field(aluminium_setup(3); tol=1e-12);

whereas with the Kerker preconditioner it is much faster:

In [None]:
scfres = self_consistent_field(aluminium_setup(3); tol=1e-12, mixing=KerkerMixing());

Given an `scfres` one easily constructs functions representing $\varepsilon^\dagger$ and $P^{-1}$ with DFTK:

In [None]:
function construct_Pinv_epsilon(scfres)
    basis = scfres.basis
    
    Pinv_Kerker(δρ) = DFTK.mix_density(KerkerMixing(), basis, δρ)

    function epsilon(δρ)  # Apply ε† = 1 - χ0 Khxc
        δV   = apply_kernel(basis, δρ; ρ=scfres.ρ)
        χ0δV = apply_χ0(scfres.ham, scfres.ψ, scfres.εF, scfres.eigenvalues, δV)
        δρ - χ0δV   
    end    
    
    epsilon, Pinv_Kerker
end

Based on these functions we can find the largest eigenvalue of $\varepsilon^\dagger$ for this aluminium case using `KrylovKit`:

In [None]:
using KrylovKit

scfres = self_consistent_field(aluminium_setup(3); tol=1e-12, mixing=KerkerMixing());
epsilon, Pinv_Kerker = construct_Pinv_epsilon(scfres)

λ_large, X_large, info = eigsolve(epsilon, randn(size(scfres.ρ)), 4, :LM;
                                  tol=1e-4, eager=true, verbosity=2)
@assert info.converged ≥ 4
λ_max = maximum(real.(λ_large))

println("Largest eigenvalue: $(λ_max)")
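
The key point in the cell above is that the eigensolver only needs a function applying the operator to a vector, never the matrix itself. To illustrate this matrix-free idea, here is a minimal sketch using plain power iteration. This is a much simpler technique than the Krylov methods in `KrylovKit` and is swapped in purely for illustration. The helper `power_iteration` and the toy operator `apply_A_toy` are made up for this example.

```julia
using LinearAlgebra

# Plain power iteration: repeatedly apply the operator and normalise.
# Converges to the eigenvalue of largest magnitude (if it is unique).
function power_iteration(apply_op, x0; tol=1e-10, maxiter=10_000)
    x = x0 / norm(x0)
    λ = 0.0
    for _ in 1:maxiter
        y = apply_op(x)
        λ_new = dot(x, y)          # Rayleigh quotient for the normalised x
        x = y / norm(y)
        abs(λ_new - λ) ≤ tol * abs(λ_new) && return (λ_new, x)
        λ = λ_new
    end
    (λ, x)
end

# Toy operator given only as a function (made-up eigenvalues 1, 5 and 30).
A_toy = diagm([1.0, 5.0, 30.0])
apply_A_toy(x) = A_toy * x

λ_toy, x_toy = power_iteration(apply_A_toy, ones(3))
```

Here `λ_toy` approaches the largest eigenvalue of the toy operator, which is $30$.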

The smallest eigenvalue can also be determined using KrylovKit. Getting this to work reliably is a little more tricky, however. We will only show a simple setup, which has the disadvantage of being pretty slow.

In [None]:
λ_small, X_small, info = eigsolve(epsilon, randn(size(scfres.ρ)), 2, EigSorter(abs, rev=false);
                                  tol=1e-3, eager=true, verbosity=2)
@assert info.converged ≥ 2
λ_min = minimum(real.(λ_small))

println("Smallest eigenvalue: $(λ_min)")

In [None]:
# If running the above takes too long for you just use this estimate:
# λ_min = 0.952

To summarise our results:

In [None]:
@show λ_min
@show λ_max
cond_number = λ_max / λ_min   # named so as not to shadow LinearAlgebra.cond
@show cond_number

The condition number of $\varepsilon^\dagger$ for this system is about $30$.
This does not sound large compared to the condition numbers you might know
from linear systems.

However, this is sufficient to cause a notable slowdown, which would be even more
pronounced if we did not use Anderson acceleration, since we would then also need to drastically
reduce the damping (try it!).
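
To get a feeling for the numbers, here is a rough estimate for plain damped iterations without Anderson acceleration. It assumes real, positive eigenvalues and the optimal damping $\alpha = 2 / (\lambda_\text{min} + \lambda_\text{max})$, for which the error in the linear regime shrinks by a factor $(\kappa - 1)/(\kappa + 1)$ per step. The helper names and the comparison value $\kappa = 1.5$ are made up for illustration.

```julia
# Contraction factor per step of optimally damped iterations with condition number κ.
contraction_rate(κ) = (κ - 1) / (κ + 1)

# Number of steps needed to shrink the error by the factor `reduction`.
steps_needed(κ; reduction=1e-12) = ceil(Int, log(reduction) / log(contraction_rate(κ)))

steps_unpreconditioned = steps_needed(30.0)   # κ ≈ 30 as estimated above
steps_well_conditioned = steps_needed(1.5)    # hypothetical κ for comparison
```

Under these assumptions $\kappa \approx 30$ requires roughly 400 steps to reduce the error by twelve orders of magnitude, compared to fewer than 20 for $\kappa = 1.5$.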

**Exercise:** Find the largest eigenvalue in case the Kerker preconditioner is used.
*Hint:* You can construct the operator $P^{-1} \varepsilon^\dagger$ by simply chaining the functions (`Pinv_Kerker ∘ epsilon`). Assuming that the smallest eigenvalue is about $0.8$, what is the condition number now?
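
As a reminder of how chaining with `∘` behaves, here is a small sketch on made-up diagonal operators (not the DFTK ones). `apply_ε_toy`, `apply_Pinv_toy`, `dense_matrix` and `condition_number` are hypothetical helpers for this toy example only. Assembling a dense matrix is feasible here solely because the example is tiny.

```julia
using LinearAlgebra

# Made-up diagonal stand-ins for ε† and an approximate inverse P⁻¹.
apply_ε_toy(x)    = [1.0, 5.0, 30.0] .* x
apply_Pinv_toy(x) = x ./ [1.0, 4.0, 25.0]

# (f ∘ g)(x) == f(g(x)): apply_ε_toy acts first, then apply_Pinv_toy.
apply_Pinv_ε_toy = apply_Pinv_toy ∘ apply_ε_toy

# Dense matrix of a linear operator, built column by column (toy sizes only).
dense_matrix(op, n) = reduce(hcat, [op(e) for e in eachcol(Matrix{Float64}(I, n, n))])

function condition_number(op, n)
    λ = real.(eigvals(dense_matrix(op, n)))
    maximum(λ) / minimum(λ)
end

κ_plain          = condition_number(apply_ε_toy, 3)        # ≈ 30
κ_preconditioned = condition_number(apply_Pinv_ε_toy, 3)   # ≈ 1.25
```

Even a rough approximate inverse brings the condition number of this toy operator close to 1.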

If you want, repeat the exercise for `repeat = 6`. You can assume the smallest eigenvalue is still about $0.95$ (without preconditioner) or $0.8$ (with Kerker), respectively. How does the condition number change if you double the system size?

Keeping in mind that the condition number is linked to the convergence speed: Which setup should be employed to keep the number of required SCF iterations independent of system size?

#### Takeaways:
- For metals the condition number of the dielectric matrix increases steeply with system size.
- The Kerker preconditioner tames this and makes SCFs on large metallic systems feasible by keeping the condition number of order 1.

## Helium chain

To illustrate that a single preconditioner (like `KerkerMixing`) is not good for all systems,
we now consider an (insulating) chain of helium atoms:

In [None]:
using DFTK
using LinearAlgebra

function helium_setup(repeat=40; Ecut=7.0, kgrid=[1, 1, 1])
    a = 5
    lattice = diagm(fill(a, 3))
    He = ElementPsp(:He, psp=load_psp("hgh/lda/he-q2"))
    atoms = [He => [[0.0, 0.0, 0.0]]]

    # Make supercell in pymatgen
    mg_struct = pymatgen_structure(lattice, atoms)
    mg_struct.make_supercell([1, 1, repeat])
    lattice = load_lattice(mg_struct)
    atoms = [He => [s.frac_coords for s in mg_struct.sites]];

    # Construct the model
    model = model_LDA(lattice, atoms, temperature=1e-3, symmetries=false)
    PlaneWaveBasis(model; Ecut, kgrid)
end

Judging from the SCF runs, using `KerkerMixing` seems like a bad idea here:

In [None]:
scfres = self_consistent_field(helium_setup(40); tol=1e-12, mixing=KerkerMixing());

In [None]:
scfres = self_consistent_field(helium_setup(40); tol=1e-12);

**Exercise:** This can be confirmed by investigating the eigenvalues. Here are some good settings for you to experiment with on this problem. Find the condition numbers with and without `KerkerMixing` and explain the observations.

In [None]:
using KrylovKit

scfres = self_consistent_field(helium_setup(40); tol=1e-12);
epsilon, Pinv_Kerker = construct_Pinv_epsilon(scfres)

operator = epsilon

λ_large, X_large, info = eigsolve(operator, randn(size(scfres.ρ)), 2, :LM;
                                  tol=1e-3, eager=true, verbosity=2)
@assert info.converged ≥ 2
λ_max = maximum(real.(λ_large))
    
λ_small, X_small, info = eigsolve(operator, randn(size(scfres.ρ)), 2, EigSorter(abs, rev=false);
                                  tol=1e-3, eager=true, verbosity=2)
@assert info.converged ≥ 2
λ_min = minimum(real.(λ_small))

println("Smallest eigenvalue: $(λ_min)")
println("Largest eigenvalue:  $(λ_max)")

In [None]:
@show λ_min
@show λ_max
cond_number = λ_max / λ_min   # named so as not to shadow LinearAlgebra.cond
@show cond_number

#### Takeaways:
- For insulating systems the best approach is to not use any mixing.
- **The ideal mixing** strongly depends on the dielectric properties of the system under study (metal versus insulator versus semiconductor).
- A more detailed discussion, as well as some ideas on how to deal with inhomogeneous systems (where both metals and insulators coexist), is given in [a recently published paper](https://doi.org/10.1088/1361-648X/abcbdb).