# DFTK --- the density-functional toolkit

## Landscape of DFT codes

- Plane-wave DFT is a widespread method
- [Plenty of software](https://en.wikipedia.org/wiki/Vienna_Ab_initio_Simulation_Package) already exists:
  * [ABINIT](https://www.abinit.org) (GPL, Fortran)
  * [CASTEP](http://www.castep.org) (proprietary, Fortran)
  * [VASP](https://www.vasp.at) (proprietary, Fortran)
  * [GPAW](https://wiki.fysik.dtu.dk/gpaw) (GPL, Python, C)
  * [Quantum ESPRESSO](https://www.quantum-espresso.org) (GPL, Fortran, C)
  * [KSSOLV](https://crd-legacy.lbl.gov/~chao/KSSOLV) (academic, Matlab)
- Represents hundreds of man-years of coding work!
- Why bother writing a new one?

## Problem 1: High-throughput screening

- Design of novel materials by screening through large space of systems (10000 or more)
- Of growing interest in drug design, materials science, catalysis, pharmacy, ...
- Aim: Find promising candidates for application of interest


- E.g. photovoltaics: Translates to classifiers such as band gap, excitation energies, photostability, ...
- Calculating classifiers accurately not always easy ...


- Typically


  |                 | Initial phase (> 10k candidates)        | Intermediate phase     | Final phase (10-100 candidates) | 
  |:--------------- |:--------------------------------------- |:---------------------- |:----------------------------- |
  | **Target**      | Distinguish interesting from irrelevant | Narrow down selection  | Simulate experimental results |
  | **Manual intervention** | Impossible                      | Should be avoided      | Manageable                    |
  | **Accuracy** requirement   | Low                          | Tuned from low to higher | High  |
  | **Speed**    requirement   | High                         | Less and less          | Low |
    

**Requirements:**
- Reliable algorithms: Breakdown of e.g. SCFs not acceptable in initial phase
- Tunable workflows: Accuracy needs to be systematically lowerable
- Best case: Black-box workflows. User chooses only accuracy, program decides how to get there
- Flexibility: Scriptability, interfacing to data science tools, ...
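
The "user chooses only accuracy" idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_energy` is a hypothetical stand-in for a DFT energy (a truncated series replaces the plane-wave sum), `converge_cutoff` is not a DFTK function, and the stopping rule is a simple heuristic rather than a rigorous error estimate.

```python
import math


def toy_energy(cutoff):
    """Hypothetical stand-in for a DFT energy: a series truncated at `cutoff` terms."""
    return sum(1.0 / n**4 for n in range(1, cutoff + 1))


def converge_cutoff(energy, target_accuracy, cutoff=2, max_cutoff=4096):
    """Double the cutoff until two successive energies agree to `target_accuracy`.

    Heuristic only: agreement of successive values does not bound the true error.
    """
    previous = energy(cutoff)
    while cutoff < max_cutoff:
        cutoff *= 2
        current = energy(cutoff)
        if abs(current - previous) < target_accuracy:
            return cutoff, current
        previous = current
    raise RuntimeError("target accuracy not reached below max_cutoff")


# The caller specifies only the accuracy; the routine picks the discretisation.
for accuracy in (1e-3, 1e-6):
    cutoff, value = converge_cutoff(toy_energy, accuracy)
    print(f"accuracy {accuracy:g}: cutoff {cutoff}, value {value:.8f}")
```

A looser target stops at a smaller cutoff, which is the "systematically lowerable" accuracy the initial screening phase asks for.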

**State of the art:**
- Sizable number of parameters:
    * $k$-point sampling
    * $E_\text{cut}$
    * Multi-layer convergence tolerance (SCF vs. eigensolver)
    * Basically only double precision supported
- Protocols by experience
    * **Too conservative** $\Rightarrow$ inferior performance
    * **Too optimistic** $\Rightarrow$ inferior reliability
- **Two-language-problem**
    * Interfacing with Python / bash etc. decided a priori
    
**DFTK:**
- Long-term driving force behind some of our projects (see below)

## Problem 2: Reliable SCF algorithms

- Related to Problem 1
- Convergence of SCF depends on dielectric properties of material
- $\Rightarrow$ Need to do SCF differently for metals, insulators, semiconductors, ...
- Different **mixing** methods (Simple, Kerker, Resta, ...)
- Open problem: What to do in mixed cases (e.g. surfaces, catalysis, ...)
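
As a rough illustration of why mixing matters, the sketch below applies simple (damped) mixing to a toy fixed-point problem. The names `simple_mixing` and `toy_map` are hypothetical and not taken from DFTK; the linear `toy_map` merely stands in for the SCF map, with one component chosen so that the undamped iteration diverges.

```python
def simple_mixing(scf_map, rho0, damping, tol=1e-10, maxiter=200):
    """Damped fixed-point iteration: rho <- rho + damping * (F(rho) - rho).

    Returns (rho, iterations, converged).
    """
    rho = list(rho0)
    for iteration in range(1, maxiter + 1):
        residual = [f - r for f, r in zip(scf_map(rho), rho)]
        if max(abs(x) for x in residual) < tol:
            return rho, iteration, True
        rho = [r + damping * x for r, x in zip(rho, residual)]
    return rho, maxiter, False


def toy_map(rho):
    """Hypothetical linear map with fixed point (1, 1) and slopes -2 and -0.5."""
    return [-2.0 * rho[0] + 3.0, -0.5 * rho[1] + 1.5]


# damping = 1 is the plain iteration; the slope -2 makes it diverge.
_, _, plain_ok = simple_mixing(toy_map, [0.0, 0.0], damping=1.0)
rho, steps, damped_ok = simple_mixing(toy_map, [0.0, 0.0], damping=0.5)
print("plain converged:", plain_ok)
print("damped converged:", damped_ok, "after", steps, "steps")
```

In this toy setting a single damping parameter is enough. The open problem in the list above is that for mixed systems no single simple choice of this kind is known to work well.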

**Requirements:**
- To obtain mathematical understanding need to treat reduced problems:
  * 1D or 2D
  * Small $E_\text{cut}$ and $k$-grids for fast feedback loop
  
  
- To **design an algorithm**:
  * Mix and match ideas
  * Change order of typical steps (e.g. for backtracking, line searches, ...)
  * No need to be fast, needs to be easy to try
  * $\Rightarrow$ High-level code inside key algorithms
  
  
- To **test an algorithm**:
  * Scale implementation to realistic problem sizes
  * Finer grids, performance optimisation
  * First version never works $\Rightarrow$ Should not take weeks to do this!

**DFTK:**
- Dimension-independent code base
- Toy problems *and* production-scale
- One language: Julia (high-level and fast)
- [SCF code](https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/scf/scf_solvers.jl) readable and hackable

## Problem 3: A posteriori error analysis in DFT

- A posteriori error analysis: Standard in aviation, engineering, car manufacture
- State of the art in experimental science
- **Not used** in quantum chemistry
- Instead: Trustworthiness of results determined from experience


- Sources of error in plane-wave DFT:
  * Model error
  * Discretisation error (cutoff and BZ)
  * Algorithm error
  * Arithmetic error
- Related to Problem 1:
  * Knowing the sources and sizes of the errors allows balancing them against each other
- Needs rigorous error estimates!
- Lack of mathematical results for full DFT
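
One way to make the balancing explicit is a telescoping bound over the four error sources. The symbols are assumptions introduced here for illustration: $E_\text{exact}$ is the true quantity of interest, $E_\text{model}$ the exact solution of the chosen DFT model, $E_\text{disc}$ the exact solution of the discretised model (finite cutoff and BZ sampling), $E_\text{alg}$ the result of the iterative solvers stopped at finite tolerance in exact arithmetic, and $E_\text{fp}$ the number actually computed in floating-point arithmetic. The triangle inequality then gives

$$
|E_\text{exact} - E_\text{fp}|
\le \underbrace{|E_\text{exact} - E_\text{model}|}_{\text{model}}
+ \underbrace{|E_\text{model} - E_\text{disc}|}_{\text{discretisation}}
+ \underbrace{|E_\text{disc} - E_\text{alg}|}_{\text{algorithm}}
+ \underbrace{|E_\text{alg} - E_\text{fp}|}_{\text{arithmetic}}
$$

Spending effort to shrink one term far below the others does not improve the total bound, which is why estimates of each term are needed.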

**Requirements:**
- Expand mathematical theory starting from accessible reduced problems
- Access to unusual intermediate quantities (e.g. not just value, but also extra derivatives)
- Arithmetic error requires custom floating point types (e.g. Interval arithmetic)
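
To illustrate what a custom number type buys, here is a minimal sketch of the same generic routine running with exact rationals and with a toy interval type. The `Interval` class, `enclose` and `sum_of_squares` are hypothetical illustrations, not DFTK code or a production interval library; bounds are widened outward by one floating-point step after every operation (requires Python 3.9+ for `math.nextafter`).

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Interval:
    """Toy interval [lo, hi] with outward rounding after each operation."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(math.nextafter(min(products), -math.inf),
                        math.nextafter(max(products), math.inf))


def enclose(x):
    """Interval containing the real number that the float literal `x` approximates."""
    return Interval(math.nextafter(x, -math.inf), math.nextafter(x, math.inf))


def sum_of_squares(values):
    """Generic routine: only needs `+` and `*`, so it works for any number type."""
    total = values[0] * values[0]
    for v in values[1:]:
        total = total + v * v
    return total


exact = sum_of_squares([Fraction(1, 10)] * 10)   # exact rational arithmetic
plain = sum_of_squares([0.1] * 10)               # ordinary double precision
box = sum_of_squares([enclose(0.1)] * 10)        # guaranteed enclosure
print(exact, plain, box)
```

The interval result brackets the exact value, so its width is a computed bound on the arithmetic error of this routine. The routine itself was written once and never mentions the number type.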

**DFTK:**
- Fully customisable model: No problem to treat toy problems
- Code structure independent of model: Results can be expanded to full DFT later!
- Structure of the code agrees with mathematical structure of DFT
- Support for arbitrary floating-point types


- First results [obtained recently](https://doi.org/10.1039/D0FD00048E)
  on non-SCF models:

<img src="img/si_band_errors.png" width=600 height=600 />

## Problem 4: Limitations in time and money

- Funding agencies and universities are tight on money
- Classes have an even smaller budget
- PhDs last for only 3-ish years, Master's projects even less

**Requirements:**
- Students should focus on learning and doing science
- They should spend as little time as possible on:
  * Learning obscure input formats
  * Working around interface quirks
  * Learning outdated software dialects
  * Struggling with unmaintainable / undocumented code
  * Debugging other people's code

**State of the art:**
- Not all software open-source
- Many leading packages are quite expensive (10k€ and more)
- Millions of lines of code
- Employ outdated conventions (variable length, formatting, commenting, ...)
- Developers have left (PhD is over) and documentation is scarce

**DFTK:**
- Short and manageable size (5k lines)
- Design goal: Code clear and self-explanatory
- Comments often hint at the derivation or point to publications
- Code as close to equations as possible
- Unit testing: Ideally *master* never breaks
- Documentation by examples (https://docs.dftk.org)


- Timeframes of recent projects:
  * Error estimates (problem 3): **10 weeks** to publication
  * Recent **master student** prepared his first paper (almost) during the 6 months of his project, having no prior knowledge about DFT
  * **8-week student project** made initial progress on implementing GPU routines inside DFTK, having no prior experience with Julia or DFT

## The Julia programming language

- https://julialang.org
- Started 10 years ago, 1.0 released in August 2018

- *Walks like Python, talks like Lisp, runs like Fortran*
- High-level language, but still hackable
- Just-in-time compiled (via LLVM) to *native* machine code before execution
- Stronger type system compared to Python
- Key concept from functional languages: **Multiple dispatch**
- Alongside C++ and Fortran, one of the 3 languages which have been [scaled to a complete supercomputer](https://juliacomputing.com/blog/2019/04/12/Supercomputing-julia.html)
- Rich ecosystem (Optimisation, PDEs, stochastic processes, GPUs, Machine-Learning, Statistics, Linear Algebra, ...)

- $\Rightarrow$ Greatest advantage is ease of **code reuse**:
  * High-level code, automatically hardware-specific kernel
  * E.g. CPU and GPU code look similar or even identical
  * Library code and user code decoupled extremely well
  * Simplifies mixing and matching from the ecosystem as needed
  * Key reason for the rapid development of DFTK
- One-day introductory course: https://michael-herbst.com/learn-julia
- More example code in a second

## Personal takeaway: DFTK worth the rewrite?

### Advantage of rewrite
- DFT is an interdisciplinary challenge:
    * **Mathematicians:** Toy models and unphysical edge cases
    * **Scientists:** Want to focus on science, not numerics
    * **High-performance specialists:** Exploit what hardware offers
    * **Practitioners:** Reliable, black-box, high-level interface
    
    
- I don't believe this (and issues raised above) could be fully addressed in existing codes.
- $\Rightarrow$ DFTK designed as **platform for** multidisciplinary **collaboration**


- Writing DFTK took less than a year and now we understand every line of code
- **Performance** competitive: Within a factor of 2 to 3 of established codes
- Code shown to have flexibility beyond existing packages
- Started only in April 2019, but already two publications with DFTK (and ongoing work on at least two more)

### Disadvantages of rewrite
- DFTK has [sizable set of features](https://docs.dftk.org), but nowhere near established codes
- There are faster codes on the market