# Finding convergence parameters in AlGaO$_\mathbf{3}$ DOS calculations

For this example, we use a data set of 144 calculations of AlGaO$_3$ that were performed with FHI-aims using different numerical settings and approximations. The data was produced for Ref. [1], is hosted at [NOMAD](https://nomad-lab.eu/nomad-lab/), and can be downloaded using `MADAS`.

To do so, we first define the query:

In [None]:
AlGaO_query = {
    "datasets.dataset_name:any": [
        "Numerical_Errors_FHI-aims"
    ],
    "results.material.elements:all": [
        "Al",
        "O",
        "Ga"
    ],
    "results.properties.available_properties:all": [
        "dos_electronic"
    ]
}

We use the NOMAD API provided by `MADAS`:

In [None]:
from madas.apis.NOMAD_web_API import API

The values are extracted from the NOMAD Archives using the [following functions](https://github.com/kubanmar/madas-examples/blob/master/notebooks/processing_functions.py):

In [None]:
from processing_functions import get_dos_values, get_dos_energies, get_FHIaims_kpoints, get_FHIaims_n_basis_functions

In [None]:
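# start from the processing steps of a default API instance
processing = API().processing
# drop the "archive" processing step
processing.pop("archive")
# register the extraction functions; the keys are the names under which
# the values are later available in entry.data
processing["electronic_dos_values"] = get_dos_values
processing["electronic_dos_energies"] = get_dos_energies
processing["kpoints"] = get_FHIaims_kpoints
processing["n_basis_functions"] = get_FHIaims_n_basis_functions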

Next we define our database for storing the data:

In [None]:
from madas import MaterialsDatabase

In [None]:
db = MaterialsDatabase(filename="AlGaO_convergence.db",
                       api=API(processing=processing))

In [None]:
db.fill_database(AlGaO_query)

If some entries could not be retrieved, we can retry the failed downloads and add the results to the database:

In [None]:
materials_retry = db.api.retry()
if len(materials_retry) > 0:
    db.backend.add_many(materials_retry)

In the end, the database should contain all $144$ entries:

In [None]:
len(db)

We then start generating fingerprints:

In [None]:
from madas.fingerprints import DOSFingerprint

We generate a grid for the DOS fingerprint [2] with a large number of pixels (`n_pix=2048`) and a wide energy range (`cutoff=[-8, 12]`):

In [None]:
grid = DOSFingerprint.get_default_grid().create(n_pix=2048, cutoff=[-8, 12])

We add the fingerprints to the database:

In [None]:
db.add_fingerprint("DOS", fingerprint_kwargs={"grid_id": grid.get_grid_id()})

And compute a similarity matrix from these fingerprints:

In [None]:
simat = db.get_similarity_matrix("DOS", name="DOS")
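
To illustrate what such a matrix contains, here is a minimal sketch that builds a pairwise similarity matrix from binary fingerprints. It assumes the Tanimoto coefficient as the similarity measure, which is the measure described for the DOS fingerprint in Ref. [2]; the toy fingerprints and all names (`tanimoto`, `toy_fingerprints`, `toy_matrix`) are hypothetical and not part of `MADAS`.

```python
import numpy as np


def tanimoto(a, b):
    """Tanimoto coefficient of two binary vectors: |a AND b| / |a OR b|."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.count_nonzero(a | b)
    if union == 0:
        # convention chosen for this sketch: two empty vectors are identical
        return 1.0
    return np.count_nonzero(a & b) / union


# hypothetical toy fingerprints, not real DOS fingerprints
toy_fingerprints = np.array([
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1],
])

# pairwise similarities; the matrix is symmetric with ones on the diagonal
n = len(toy_fingerprints)
toy_matrix = np.array([
    [tanimoto(toy_fingerprints[i], toy_fingerprints[j]) for j in range(n)]
    for i in range(n)
])
print(toy_matrix)
```

Each row of the matrix holds the similarities of one entry to all entries in the set, which is the quantity averaged in the next step.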

We sort the entries in ascending order of their mean similarity to the rest of the data set, which we obtain as the mean of the corresponding row of the similarity matrix.

In [None]:
import numpy as np
# tqdm provides a progress bar for the loop over all entries
from madas.utils import tqdm

In [None]:
sorted_mids = sorted(simat.mids, key=lambda x: np.mean(simat[x]))
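
The same ordering can be sketched with plain NumPy on a small symmetric matrix. This only illustrates the row-mean sorting and does not use the `MADAS` similarity matrix class; the identifiers and values (`toy_ids`, `toy_similarities`) are made up for the example.

```python
import numpy as np

# hypothetical identifiers and a hypothetical symmetric similarity matrix
toy_ids = ["a", "b", "c", "d"]
toy_similarities = np.array([
    [1.0, 0.9, 0.8, 0.2],
    [0.9, 1.0, 0.6, 0.3],
    [0.8, 0.6, 1.0, 0.1],
    [0.2, 0.3, 0.1, 1.0],
])

# mean similarity of each entry to the whole set (one value per row)
row_means = toy_similarities.mean(axis=1)

# ascending order: the entry least similar to the rest comes first
order = np.argsort(row_means, kind="stable")
toy_sorted_ids = [toy_ids[i] for i in order]

# reorder rows and columns consistently so the matrix stays symmetric
toy_sorted = toy_similarities[np.ix_(order, order)]
print(toy_sorted_ids)
```

Entries that differ strongly from the rest of the set, for example because of poorly converged settings, end up at the beginning of the sorted list.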

In [None]:
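# collect, in sorted order, the number of basis functions and the
# product of the k-point values of each entry
nfunc, kpoints = [], []
for mid in tqdm(sorted_mids):
    entry = db[mid]
    nfunc.append(entry.data["n_basis_functions"])
    kpoints.append(np.prod(entry.data["kpoints"]))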
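
As a small illustration of the `np.prod` step, and assuming that the stored `kpoints` value is a k-grid given as three subdivisions along the reciprocal lattice vectors, the product is the total number of grid points. The grids below are hypothetical and not taken from the data set.

```python
import numpy as np

# hypothetical k-grids (assumption: three subdivisions per grid)
toy_grids = {
    "coarse": [2, 2, 2],
    "medium": [4, 4, 2],
    "fine": [8, 8, 4],
}

# total number of k-points per grid = product of the subdivisions
toy_totals = {label: int(np.prod(grid)) for label, grid in toy_grids.items()}
print(toy_totals)
```

Reducing each grid to a single number makes it possible to compare settings along one axis.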

In [None]:
import matplotlib.pyplot as plt
from plotting_functions import similarity_kpoint_nfunc_plot
plt.style.use("./settings.mplstyle")

In [None]:
# reorder the similarity matrix to follow sorted_mids
simat.get_sub_matrix(sorted_mids, copy=False)

In [None]:
similarity_kpoint_nfunc_plot(simat, kpoints, nfunc, filename=None)

## References

[1] Carbogno, C., Thygesen, K.S., Bieniek, B. et al. Numerical quality control for DFT-based materials databases. npj Comput Mater 8, 69 (2022). https://doi.org/10.1038/s41524-022-00744-4

[2] Kuban, M., Rigamonti, S., Scheidgen, M. et al. Density-of-states similarity descriptor for unsupervised learning from materials data. Sci Data 9, 646 (2022). https://doi.org/10.1038/s41597-022-01754-z