# Analysing SCF convergence

The goal of this exercise is to explain the differing convergence behaviour of SCF algorithms depending on the choice of the preconditioner $P^{-1}$ and the underlying material.

For this we will find the largest and smallest eigenvalues of $P^{-1} \varepsilon^\dagger$ and $\varepsilon^\dagger$, where $\varepsilon^\dagger$ is the dielectric operator (see [../2_preconditioning.ipynb](../2_preconditioning.ipynb)). The ratio of the largest to the smallest eigenvalue is the condition number
  $$ \kappa = \frac{\lambda_\text{max}}{\lambda_\text{min}},$$
which can be related to the rate of convergence. The smaller the condition number, the faster the convergence.
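
To make the link between $\kappa$ and the convergence rate concrete, here is a minimal sketch for a plain damped fixed-point iteration without Anderson acceleration, assuming all eigenvalues are real and positive. In that setting the optimal damping is $2 / (\lambda_\text{min} + \lambda_\text{max})$ and the error shrinks by a factor of $(\kappa - 1) / (\kappa + 1)$ per iteration. The helper names are illustrative and not part of DFTK.

```julia
# Sketch (assumption: simple damped iteration, real positive eigenvalues).
# These helpers are hypothetical and only illustrate the role of κ.

# Damping that minimises the worst-case error reduction factor
optimal_damping(λ_min, λ_max) = 2 / (λ_min + λ_max)

# Error reduction factor per iteration at optimal damping
function convergence_rate(λ_min, λ_max)
    κ = λ_max / λ_min
    (κ - 1) / (κ + 1)
end

# Number of iterations needed to reduce the error by a factor `tol`
iterations_needed(rate, tol) = ceil(Int, log(tol) / log(rate))

for κ in (2.0, 30.0, 100.0)
    r = convergence_rate(1.0, κ)
    n = iterations_needed(r, 1e-8)
    println("κ = $κ: rate ≈ $(round(r; digits=3)), iterations for 1e-8: $n")
end
```

Anderson acceleration, as used by default in DFTK, typically converges faster than this estimate, but the qualitative trend with $\kappa$ remains.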

## (a) Aluminium metal.
We start by taking a look at a slightly cruder (thus computationally cheaper) version of our aluminium setup from above: 

In [None]:
using DFTK
using ASEconvert

function aluminium_setup(repeat=1; Ecut=7.0, kgrid=(1, 1, 1))
    # Use ASE to make an aluminium supercell
    pysystem = ase.build.bulk("Al", cubic=true) * pytuple((repeat, 1, 1))
    
    # Convert to AbstractSystem and attach pseudopotentials:
    aluminium = pyconvert(AbstractSystem, pysystem)
    system = attach_psp(aluminium; Al="hgh/lda/al-q3")

    # Construct an LDA model and discretise
    model = model_LDA(system; temperature=1e-3, symmetries=false)
    PlaneWaveBasis(model; Ecut, kgrid)
end

Before constructing functions representing $\varepsilon^\dagger$ and the Kerker preconditioner $P^{-1}$, let us recall how the SCF converges for this system.

We already know that for moderate `repeat`s the convergence without mixing / preconditioner is slow:

In [None]:
# Note: DFTK uses the self-adapting LdosMixing() by default, so to truly disable
#       any preconditioning, we need to supply `mixing=SimpleMixing()` explicitly.
scfres = self_consistent_field(aluminium_setup(3); tol=1e-8, mixing=SimpleMixing());

while when using the Kerker preconditioner it is much faster:

In [None]:
scfres = self_consistent_field(aluminium_setup(3); tol=1e-8, mixing=KerkerMixing());

Given an `scfres` one easily constructs functions representing $\varepsilon^\dagger$ and $P^{-1}$ with DFTK:

In [None]:
function construct_Pinv_epsilon(scfres)
    basis = scfres.basis
    
    Pinv_Kerker(δρ) = DFTK.mix_density(KerkerMixing(), basis, δρ)
    
    function epsilon(δρ)  # Apply ε† = 1 - χ0 Khxc
        δV   = apply_kernel(basis, δρ; ρ=scfres.ρ)
        χ0δV = apply_χ0(scfres, δV)
        δρ - χ0δV   
    end    
    
    epsilon, Pinv_Kerker
end

(i) Find the largest eigenvalue of $\varepsilon^\dagger$ for this aluminium case using `KrylovKit`. For this use the following code snippet:

In [None]:
using KrylovKit

scfres = self_consistent_field(aluminium_setup(3); tol=1e-8);
epsilon, Pinv_Kerker = construct_Pinv_epsilon(scfres)

λ_large, X_large, info = eigsolve(epsilon, randn(size(scfres.ρ)), 4, :LM;
                                  tol=1e-4, eager=true, verbosity=2)
@assert info.converged ≥ 4
λ_max = maximum(real.(λ_large))

println("Largest eigenvalue: $(λ_max)")

The smallest eigenvalue can also be determined using KrylovKit. Getting this to work reliably is a little more tricky, however. I will only show a simple setup, which has the disadvantage of being pretty slow:
```julia
λ_small, X_small, info = eigsolve(epsilon, randn(size(scfres.ρ)), 2, EigSorter(abs, rev=false);
                                  tol=1e-3, eager=true, verbosity=2)
@assert info.converged ≥ 2
λ_min = minimum(real.(λ_small))

println("Smallest eigenvalue: $(λ_min)")
```
If this takes too long on your machine, just assume that the smallest eigenvalue is $\lambda_\text{min} = 0.952$. What is the condition number in this case?

The condition number of $\varepsilon^\dagger$ for this system should be around $30$.
This does not sound large compared to the condition numbers you might know
from linear systems.

However, this is sufficient to cause a notable slowdown, which would be even more
pronounced if we did not use Anderson acceleration, since we would then also need
to drastically reduce the damping (try it!).

(ii) Having computed the eigenvalues of the dielectric matrix,
we can now also look at the eigenmodes that are responsible for the bad convergence behaviour,
for example as follows:

```julia
using Plots
using Statistics

function plot_mode(mode)
    # Average along z axis
    mode_xy = mean(real.(mode), dims=3)[:, :, 1, 1]
    heatmap(mode_xy', c=:RdBu_11, aspect_ratio=1, grid=false,
            legend=false, clim=(-0.006, 0.006))
end

plot_mode(X_large[1])
```

Keeping in mind that the origin of the metallic ill-conditioning is termed "charge sloshing", how can you interpret the obtained eigenmode?

(iii) Find the largest eigenvalue for the SCF of the aluminium supercell (`repeat=3`) in case the Kerker preconditioner is used.  
*Hint:* You can construct the operator $P^{-1} \varepsilon^\dagger$ by simply chaining the functions (`Pinv_Kerker ∘ epsilon`). Assuming that the smallest eigenvalue is about $0.8$, what is the condition number now? 
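
The chaining hint can be tried out on a toy problem first. The sketch below is not DFTK code: it uses a plain power iteration (a much simpler method than the Krylov scheme in `KrylovKit`) to show that the largest eigenvalue can be found from the action of an operator alone, and that `∘` composes two such actions. The matrices `A` and `P` are made-up stand-ins for $\varepsilon^\dagger$ and $P^{-1}$.

```julia
using LinearAlgebra

# Matrix-free power iteration: only needs a function `apply` that maps x -> A*x.
# Hypothetical helper for illustration; assumes a dominant real eigenvalue.
function power_iteration(apply, x0; maxiter=1000, tol=1e-10)
    x = x0 / norm(x0)
    λ = 0.0
    for _ in 1:maxiter
        y = apply(x)
        λ_new = real(dot(x, y))     # Rayleigh quotient estimate
        x = y / norm(y)
        if abs(λ_new - λ) < tol
            return λ_new, x
        end
        λ = λ_new
    end
    λ, x
end

# Toy stand-ins (assumptions): A plays the role of ε†, P the role of P⁻¹
A = Diagonal([1.0, 2.0, 5.0])
P = Diagonal([1.0, 0.5, 0.2])
apply_A(x) = A * x
apply_P(x) = P * x

λ_A,  _ = power_iteration(apply_A, ones(3))
λ_PA, _ = power_iteration(apply_P ∘ apply_A, ones(3))   # first A, then P

println("Largest eigenvalue of A:   $λ_A")
println("Largest eigenvalue of P*A: $λ_PA")
```

Note the order of application: `(f ∘ g)(x)` evaluates `f(g(x))`, so `Pinv_Kerker ∘ epsilon` applies `epsilon` first and the preconditioner second.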


(iv) If you want, repeat the exercise for `repeat = 6`. You can assume the smallest eigenvalue is still about $0.95$ (without preconditioner) or $0.8$ (with the Kerker preconditioner), respectively. How does the condition number change if you double the system size?

Keeping in mind that the condition number is linked to the convergence speed: Which setup should be employed to keep the number of required SCF iterations independent of system size?

#### Takeaways:
- For metals the condition number of the dielectric matrix increases steeply with system size.
- The Kerker preconditioner tames this and makes SCFs on large metallic systems feasible by keeping the condition number of order 1.
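
These takeaways can be reproduced qualitatively with a crude toy model. The sketch below is not DFTK code: it assumes a Thomas-Fermi dielectric function $\varepsilon(q) = 1 + k_\text{TF}^2 / q^2$ for the metal and a Kerker preconditioner $q^2 / (q^2 + k_\text{TF}^2)$ with exactly the same $k_\text{TF}$, evaluated on the wave vectors $q = 2\pi n / L$ of a cell of length $L$. The values of `k_TF` and `a` are hypothetical.

```julia
# Toy model (assumption): Thomas-Fermi screening, no DFTK involved.
k_TF = 1.0   # hypothetical screening wave vector
a    = 7.6   # hypothetical cell length in Bohr

ε_TF(q)   = 1 + k_TF^2 / q^2          # model dielectric function of a metal
kerker(q) = q^2 / (q^2 + k_TF^2)      # model Kerker preconditioner

# Condition numbers without and with preconditioner for `nrep` repeated cells
function toy_condition_numbers(nrep; n_modes=50)
    L  = nrep * a
    qs = [2π * n / L for n in 1:n_modes]   # smallest q shrinks as L grows
    λ      = ε_TF.(qs)
    λ_prec = kerker.(qs) .* λ
    (maximum(λ) / minimum(λ), maximum(λ_prec) / minimum(λ_prec))
end

for nrep in (1, 3, 6, 12)
    κ, κ_prec = toy_condition_numbers(nrep)
    println("repeat = $nrep:  κ(ε) ≈ $(round(κ; digits=1)),  ",
            "κ(P⁻¹ε) ≈ $(round(κ_prec; digits=3))")
end
```

In this idealised model the preconditioned operator is exactly the identity, which the real aluminium system only approximates. The growth of the unpreconditioned condition number with $L^2$ comes entirely from the smallest wave vector $2\pi / L$.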

## (b) Helium chain (insulator).

To prove the point that a single preconditioner (like `KerkerMixing`) is not good for all systems,
we now consider an (insulating) chain of Helium atoms:

In [None]:
using DFTK
using LinearAlgebra

function helium_setup(repeat=30; Ecut=7.0, kgrid=[1, 1, 1])
    # Directly make Helium supercell
    a = 5.0
    lattice = diagm([repeat * a, a, a])
    He = ElementPsp(:He, psp=load_psp("hgh/lda/he-q2"))
    atoms = fill(He, repeat)
    positions = [(i-1)/repeat * [1.0, 0.0, 0.0] for i in 1:repeat]

    # Construct the model
    model = model_LDA(lattice, atoms, positions; temperature=1e-3, symmetries=false)
    PlaneWaveBasis(model; Ecut, kgrid)
end

Running the SCFs shows that using `KerkerMixing` seems like a bad idea here:

In [None]:
scfres = self_consistent_field(helium_setup(30); tol=1e-8, mixing=KerkerMixing());

In [None]:
scfres = self_consistent_field(helium_setup(30); tol=1e-8, mixing=SimpleMixing());

Repeat the analysis from (a) for a Helium chain with `repeat=30`. To find the smallest and largest eigenvalues of $\varepsilon^\dagger$ and $P^{-1} \varepsilon^\dagger$, use the following snippet, setting `operator` to `epsilon` or `Pinv_Kerker ∘ epsilon`, respectively:

In [None]:
using KrylovKit

scfres = self_consistent_field(helium_setup(30); tol=1e-8);
epsilon, Pinv_Kerker = construct_Pinv_epsilon(scfres)

operator = epsilon

λ_large, X_large, info = eigsolve(operator, randn(size(scfres.ρ)), 2, :LM;
                                  tol=1e-3, eager=true, verbosity=2)
@assert info.converged ≥ 2
λ_max = maximum(real.(λ_large))
    
λ_small, X_small, info = eigsolve(operator, randn(size(scfres.ρ)), 2, EigSorter(abs, rev=false);
                                  tol=1e-3, eager=true, verbosity=2)
@assert info.converged ≥ 2
λ_min = minimum(real.(λ_small))

println("Smallest eigenvalue: $(λ_min)")
println("Largest eigenvalue:  $(λ_max)")
println("Condition number:    $(λ_max / λ_min)")

Then run the two SCFs with and without Kerker preconditioning, that is

In [None]:
scfres = self_consistent_field(helium_setup(30); tol=1e-8, mixing=SimpleMixing());

as well as

In [None]:
scfres = self_consistent_field(helium_setup(30); tol=1e-8, mixing=KerkerMixing());

and explain the observations with respect to convergence, taking your findings on the eigenvalues of $\varepsilon^\dagger$ and $P^{-1} \varepsilon^\dagger$ into account.

#### Takeaways:
- For insulating systems the best approach is to not use any preconditioner, i.e. to use `SimpleMixing`.
- **The ideal mixing** strongly depends on the dielectric properties of the system under study (metal versus insulator versus semiconductor).
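
Why Kerker mixing can hurt for an insulator can again be sketched with a crude toy model, which is not DFTK code. It assumes that the dielectric function of the insulator is a constant close to $1$ at all wave vectors and applies the same model Kerker preconditioner $q^2 / (q^2 + k_\text{TF}^2)$ to it. The values of `k_TF`, `a` and the constant `1.2` are hypothetical.

```julia
# Toy model (assumption): constant dielectric function for an insulator.
k_TF = 1.0   # hypothetical Kerker parameter
a    = 5.0   # cell length in Bohr, as in helium_setup

ε_insulator(q) = 1.2                   # hypothetical, q-independent
kerker(q)      = q^2 / (q^2 + k_TF^2)  # model Kerker preconditioner

function toy_insulator_condition_numbers(nrep; n_modes=50)
    L  = nrep * a
    qs = [2π * n / L for n in 1:n_modes]
    λ      = ε_insulator.(qs)
    λ_prec = kerker.(qs) .* λ          # Kerker damps the small-q modes
    (maximum(λ) / minimum(λ), maximum(λ_prec) / minimum(λ_prec))
end

for nrep in (5, 15, 30)
    κ, κ_prec = toy_insulator_condition_numbers(nrep)
    println("repeat = $nrep:  κ(ε) ≈ $(round(κ; digits=2)),  ",
            "κ(P⁻¹ε) ≈ $(round(κ_prec; digits=1))")
end
```

In this model the unpreconditioned operator is perfectly conditioned, while the preconditioner introduces small eigenvalues at long wavelengths where there was nothing to cure.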

## (c) Investigating other sources of ill-conditioning

Originating from the mathematical expression of the dielectric matrix

$$
\varepsilon^\dagger = 1 - \chi_0 K = 1 - \chi_0 (v_c + K_\text{xc})
$$

we can identify a number of sources of ill-conditioning, going beyond the case of charge sloshing we have been focusing on so far.  Without going into too many details (see [M. Herbst, A. Levitt. *J. Phys.: Condens. Matter* **33** 085503 (2021) DOI: 10.1088/1361-648x/abcbdb](http://dx.doi.org/10.1088/1361-648x/abcbdb) if you are interested), the sources of ill-conditioning in the dielectric operator are:

1. $\varepsilon^\dagger$ has small eigenvalues, i.e. eigenvalues substantially smaller than $1$.   
  It can be shown that $v_c$ is a positive operator and that $\chi_0$ is non-positive, such that $1 - \chi_0 v_c$ has eigenvalues of at least $1$. Usually the $v_c$ term of the kernel dominates over $K_\text{xc}$, such that in most cases $\varepsilon^\dagger$ has a smallest eigenvalue around $1$. This type of instability therefore requires $K_\text{xc}$ to dominate over $v_c$, which is usually associated with symmetry breaking, e.g. a paramagnetic system close to a ferromagnetic phase transition.
2. Large modes of $\chi_0$ causing large eigenvalues in $\varepsilon^\dagger$.  
  A typical case are localised $d$- or $f$-states near the Fermi level.
3. Large eigenvalues of $v_c$: This is the familiar case of **charge sloshing** and was discussed in [2_preconditioning.ipynb](/notebooks/2_preconditioning.ipynb).
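
Why the Coulomb term alone does not push eigenvalues of $\varepsilon^\dagger$ below $1$ (item 1) can be sketched as follows, assuming $\chi_0$ is Hermitian and non-positive and $v_c$ is Hermitian and positive, so that $v_c^{1/2}$ exists and is invertible:

$$
-\chi_0 v_c = v_c^{-1/2} \left( v_c^{1/2} \, (-\chi_0) \, v_c^{1/2} \right) v_c^{1/2}
$$

The operator $-\chi_0 v_c$ is thus similar to the Hermitian, non-negative operator $v_c^{1/2} (-\chi_0) v_c^{1/2}$ and has the same real, non-negative eigenvalues. Consequently all eigenvalues of $1 - \chi_0 v_c$ are at least $1$, and only the $K_\text{xc}$ term can lower them.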

**Exercise:**
There are thus plenty more situations where investigating the eigenvalues of the dielectric matrix with and without preconditioning is insightful. If you are interested in running any of these analyses, please contact us during the exercises and we will give you further instructions:

1. Near a magnetic phase transition: Forcing iron into a collinear spin solution and finding the spin-breaking mode in the dielectric matrix.
2. Investigating the convergence of an isolated oxygen atom as a proxy to understand the ill-conditioning of localised $p$, $d$ or $f$ states.
3. The effectiveness of the LDOS preconditioning on large mixed systems, investigated using a pseudo-1D model system (mixed sodium-helium-chain).