# Tuning fingerprint parameters

Here, we showcase how electronic density-of-states (DOS) fingerprints can be tuned to focus on different energy ranges.

## Data download

In [None]:
from os.path import exists as path_exists

We re-use data from Ref. [1], which is based on the electronic density of states of 2D materials from the [C2DB](https://cmr.fysik.dtu.dk/c2db/c2db.html) [2,3]. We download the data and unpack it:

In [None]:
if not path_exists("C2DB_TDOS.json.zip"):
    !curl -o C2DB_TDOS.json.zip https://raw.githubusercontent.com/kubanmar/dos-fingerprints-data/master/dos_data_and_fingerprints.json.zip

In [None]:
if not path_exists("dos_data_and_fingerprints.json"):
    !unzip C2DB_TDOS.json.zip
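The two cells above rely on the `curl` and `unzip` shell tools being available in the notebook environment. Where they are not, the same two steps can be sketched with the Python standard library alone. The helper names `download_if_missing` and `unpack_if_missing` are hypothetical and not part of `MADAS`, and the sketch assumes that the archive contains `dos_data_and_fingerprints.json` at its top level, as the check in the cell above suggests.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the curl command above
DATA_URL = ("https://raw.githubusercontent.com/kubanmar/dos-fingerprints-data/"
            "master/dos_data_and_fingerprints.json.zip")


def download_if_missing(url: str, archive: str) -> Path:
    """Download `url` to `archive` unless that file already exists."""
    archive_path = Path(archive)
    if not archive_path.exists():
        urllib.request.urlretrieve(url, str(archive_path))
    return archive_path


def unpack_if_missing(archive: str, member: str, dest: str = ".") -> Path:
    """Extract `member` from the zip `archive` into `dest` unless it is already there."""
    target = Path(dest) / member
    if not target.exists():
        with zipfile.ZipFile(archive) as zf:
            zf.extract(member, path=dest)
    return target
```

Calling `download_if_missing(DATA_URL, "C2DB_TDOS.json.zip")` followed by `unpack_if_missing("C2DB_TDOS.json.zip", "dos_data_and_fingerprints.json")` would then mirror the two shell cells.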

## Generation of DOS fingerprints

In [None]:
import json

In [None]:
from madas.fingerprints import DOSFingerprint
from madas.utils import tqdm

In [None]:
# Load dos data
with open("dos_data_and_fingerprints.json", "r") as f_:
    dos_data = json.load(f_)
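The cells below access each entry as `raw_data["energy"]` and `raw_data["dos"]`, keyed by a material id. A quick sanity check of that layout can be sketched as follows. The helper `check_dos_entry` and the toy entry are made up for illustration, and the requirement that both lists have equal length is an assumption, not something the data file guarantees.

```python
def check_dos_entry(entry: dict) -> None:
    """Raise ValueError if `entry` lacks equally long "energy" and "dos" lists."""
    for field in ("energy", "dos"):
        if field not in entry:
            raise ValueError(f"missing field: {field}")
    if len(entry["energy"]) != len(entry["dos"]):
        raise ValueError("energy and dos must have the same length")


# Made-up toy data in the assumed layout: material id -> {"energy": [...], "dos": [...]}
toy_dos_data = {
    "toy-material-1": {"energy": [-1.0, 0.0, 1.0], "dos": [0.2, 0.0, 0.5]},
}
for mid, entry in toy_dos_data.items():
    check_dos_entry(entry)
```

The same loop could be run over `dos_data.items()` before fingerprints are generated.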

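We use the following `Grid` parameters from Ref. [1]:
 - $\Delta\varepsilon_{\min} = 0.05$ eV
 - $\Delta\varepsilon_{\max} = 1.05$ eV
 - $N = \Delta\varepsilon_{\max}/\Delta\varepsilon_{\min} = 21$
 - $\varepsilon_{\mathrm{ref}} = 0$ eV
 - $W = 4$ eV
 - $W_H = 4$ eV
 - $N_\rho = 512$
 - $\rho_{\min} = N_\rho \Delta\rho_{\min} = 0.25$
 - $\rho_{\max} = 2.75$
 - $N_H = \rho_{\max}/\rho_{\min} = 11$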
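
To give a feeling for what the energy parameters control, the following is a minimal sketch of a non-uniform energy grid whose intervals are narrow near the reference energy and widen away from it. It is modelled on the description in Ref. [1], with a Gaussian-shaped weight of width W, but the exact functional form used inside `MADAS` may differ. The function names `interval_width` and `upper_grid_edges` are hypothetical.

```python
import math


def interval_width(energy, e_ref=0.0, width=4.0, delta_e_min=0.05, delta_e_max=1.05):
    """Width (eV) of the grid interval that starts at `energy` (assumed form)."""
    n_max = round(delta_e_max / delta_e_min)  # N = 21 for the values above
    # Gaussian-shaped weight: 0 at e_ref, approaching 1 far away from it
    g = 1.0 - math.exp(-0.5 * ((energy - e_ref) / width) ** 2)
    # integer multiple of delta_e_min, capped at N
    n = min(n_max, math.floor(g * n_max) + 1)
    return n * delta_e_min


def upper_grid_edges(e_ref=0.0, cutoff_max=3.0, **kwargs):
    """Interval edges from e_ref upwards until e_ref + cutoff_max is reached."""
    edges = [e_ref]
    while edges[-1] < e_ref + cutoff_max:
        edges.append(edges[-1] + interval_width(edges[-1], e_ref=e_ref, **kwargs))
    return edges


edges = upper_grid_edges()
widths = [b - a for a, b in zip(edges, edges[1:])]
```

In this sketch the first interval has a width of 0.05 eV, and the widths grow in multiples of 0.05 eV with increasing distance from the reference energy, never exceeding 1.05 eV.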
 
We define the grid for the fingerprint:

In [None]:
grid_original = DOSFingerprint.get_default_grid().create(delta_e_min=0.05, 
                                                         delta_e_max=1.05, 
                                                         e_ref=0, 
                                                         width=4, 
                                                         n_pix=512, 
                                                         delta_rho_min=0.25, 
                                                         delta_rho_max=2.75, 
                                                         cutoff=[-3,3])

and compute the fingerprints for all the data entries in the data set.

In [None]:
ZERO_fps = []
for key, raw_data in tqdm(dos_data.items()):
    # We generate a new fingerprint using the grid id of our grid
    new_fp = DOSFingerprint(grid_id=grid_original.get_grid_id()).calculate(raw_data["energy"], 
                                                                           raw_data["dos"], 
                                                                           convert_data=None)
    # because we use the DOSFingerprint().calculate() method, the id of the fingerprints is not
    # set automatically. We therefore have to set it:
    new_fp.set_mid(key)
    # and append the new fingerprint to the list
    ZERO_fps.append(new_fp)

Next, we create a new grid that focuses on the conduction bands by changing the reference energy to $2$ eV. From this new grid we create the respective fingerprints.

In [None]:
grid_PLU2 = grid_original.copy()

# set reference energy to +2 eV
grid_PLU2.e_ref = 2

# the cutoff is defined w.r.t. the reference energy
# therefore we must adapt the cutoff when changing
# the reference energy
grid_PLU2.cutoff_min = -5
grid_PLU2.cutoff_max = 1

PLU2_fps = []
for key, raw_data in tqdm(dos_data.items()):
    new_fp = DOSFingerprint(grid_id=grid_PLU2.get_grid_id()).calculate(raw_data["energy"],
                                                                       raw_data["dos"],
                                                                       convert_data=None)
    new_fp.set_mid(key)
    PLU2_fps.append(new_fp)

We create another set of fingerprints with the focus of the grid in the valence bands, by setting the reference energy of the grid to $-2$ eV.

In [None]:
grid_MIN2 = grid_original.copy()

# set reference energy to -2 eV
grid_MIN2.e_ref = -2
# adapt cutoff
grid_MIN2.cutoff_min = -1
grid_MIN2.cutoff_max = 5

MIN2_fps = []
for key, raw_data in tqdm(dos_data.items()):
    new_fp = DOSFingerprint(grid_id=grid_MIN2.get_grid_id()).calculate(raw_data["energy"],
                                                                       raw_data["dos"],
                                                                       convert_data=None)
    new_fp.set_mid(key)
    MIN2_fps.append(new_fp)
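
Because the cutoff is given relative to the reference energy, the cutoff values used for the two shifted grids follow from one subtraction. The small helper below, `relative_cutoff`, is a hypothetical illustration and not part of `MADAS`. It shows that all three grids cover the same absolute window from $-3$ eV to $3$ eV.

```python
def relative_cutoff(window, e_ref):
    """Convert an absolute energy window (eV) to a cutoff relative to e_ref."""
    e_min, e_max = window
    return (e_min - e_ref, e_max - e_ref)


window = (-3, 3)  # absolute window covered by the original grid (e_ref = 0)
print(relative_cutoff(window, 2))   # (-5, 1)
print(relative_cutoff(window, -2))  # (-1, 5)
```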

## Calculation of similarity matrices

In [None]:
from madas import SimilarityMatrix

In [None]:
%%time
ZERO_simat = SimilarityMatrix().calculate(ZERO_fps)

In [None]:
%%time
PLU2_simat = SimilarityMatrix().calculate(PLU2_fps)

In [None]:
%%time
MIN2_simat = SimilarityMatrix().calculate(MIN2_fps)
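
To illustrate what a similarity matrix contains, here is a minimal sketch that compares binary fingerprints pairwise with the Tanimoto coefficient, the similarity measure used in Ref. [1]. It is not the `MADAS` implementation: the fingerprints are represented as plain Python sets of "on" bit indices, and the names `tanimoto`, `similarity_matrix`, and the toy data are made up for this example.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two sets of 'on' bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return len(bits_a & bits_b) / union


def similarity_matrix(fingerprints: dict) -> dict:
    """Symmetric nested dict {id: {id: similarity}} for all pairs of entries."""
    mids = list(fingerprints)
    matrix = {mid: {} for mid in mids}
    for i, a in enumerate(mids):
        for b in mids[i:]:  # upper triangle only, mirrored below
            s = tanimoto(fingerprints[a], fingerprints[b])
            matrix[a][b] = s
            matrix[b][a] = s
    return matrix


toy_fps = {"A": {0, 1, 2, 3}, "B": {2, 3, 4}, "C": {7, 8}}
toy_simat = similarity_matrix(toy_fps)
print(toy_simat["A"]["B"])  # 0.4
```

The number of pairwise comparisons grows quadratically with the number of fingerprints, which is why timing the three calculations above is informative.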

## Searching for most similar materials

With the similarity matrices calculated, the most similar materials can be looked up quickly. We choose a reference material:

In [None]:
ref_mid = "ZrTe2-f7ad606317e6"

and, for each similarity matrix, obtain the most similar materials along with their similarities to the reference.

In [None]:
# Most similar materials for fingerprints with focus on the Fermi energy...
ZERO_simat.get_k_most_similar(ref_mid, k=2)

In [None]:
# ... the conduction bands ...
PLU2_simat.get_k_most_similar(ref_mid, k=2)

In [None]:
# ... and the valence bands.
MIN2_simat.get_k_most_similar(ref_mid, k=2)
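
Conceptually, such a lookup sorts one row of the similarity matrix and keeps the top entries. The sketch below works on a plain dictionary of made-up similarity values. The function `k_most_similar` is a hypothetical stand-in, and the assumption that the reference material itself is excluded from the result is ours, not a documented property of `get_k_most_similar`.

```python
def k_most_similar(similarities: dict, ref_mid: str, k: int = 2) -> dict:
    """Return the k entries most similar to ref_mid, excluding ref_mid itself."""
    candidates = {mid: s for mid, s in similarities.items() if mid != ref_mid}
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


# Made-up similarities of four materials to the reference "ref"
toy_row = {"ref": 1.0, "m1": 0.42, "m2": 0.87, "m3": 0.65}
print(k_most_similar(toy_row, "ref", k=2))  # {'m2': 0.87, 'm3': 0.65}
```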

In [None]:
# create lists containing the ids of the two most similar materials
ZERO_most_similar_mids = list(ZERO_simat.get_k_most_similar(ref_mid, k=2).keys())
PLU2_most_similar_mids = list(PLU2_simat.get_k_most_similar(ref_mid, k=2).keys())
MIN2_most_similar_mids = list(MIN2_simat.get_k_most_similar(ref_mid, k=2).keys())

In [None]:
# create a list that contains all of the former
all_similar_mids = []
all_similar_mids.extend(ZERO_most_similar_mids)
all_similar_mids.extend(PLU2_most_similar_mids)
all_similar_mids.extend(MIN2_most_similar_mids)

In [None]:
# And generate a list that starts with the reference material
# (this is not strictly necessary, but helpful for the figure)
all_mids = [ref_mid] + sorted(set(all_similar_mids))

In [None]:
# We can inspect the contents of our list:
for mid in all_mids:
    print(mid)

## Comparing the most similar materials

We import an analysis tool from `MADAS` that compares two spectra. It sets the cutoff to a small value and moves the reference energy of the fingerprint grid across the whole defined energy region. This lets us analyse in which energy regions the spectra are most similar.

In [None]:
from madas.analysis import StrideSpectrumComparison

In [None]:
# We use the original grid settings.
ssc = StrideSpectrumComparison(grid_id=grid_original.get_grid_id(), show_progress=False)
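
The idea of a stride comparison can be sketched without `MADAS`: a narrow window is moved across the energy axis and a local similarity is evaluated at each position. The example below uses a simple overlap measure, sum of minima divided by sum of maxima, as a stand-in for the fingerprint similarity, so it is not what `StrideSpectrumComparison` computes internally. It assumes both spectra are non-negative and sampled on the same energy grid. All names and data are made up.

```python
def windowed_similarity(energy, dos_a, dos_b, center, half_width):
    """Overlap sum(min) / sum(max) of two spectra inside one energy window."""
    lo, hi = center - half_width, center + half_width
    pairs = [(a, b) for e, a, b in zip(energy, dos_a, dos_b) if lo <= e <= hi]
    denom = sum(max(a, b) for a, b in pairs)
    if denom == 0:
        return 1.0  # both spectra vanish (or no samples) in this window
    return sum(min(a, b) for a, b in pairs) / denom


def stride_comparison(energy, dos_a, dos_b, centers, half_width):
    """Local similarity at each window center."""
    return [windowed_similarity(energy, dos_a, dos_b, c, half_width) for c in centers]


energy = [-2.0, -1.0, 0.0, 1.0, 2.0]
dos_a = [1.0, 1.0, 0.0, 2.0, 2.0]
dos_b = [1.0, 1.0, 0.0, 1.0, 0.0]
profile = stride_comparison(energy, dos_a, dos_b, centers=[-1.5, 1.5], half_width=0.5)
print(profile)  # [1.0, 0.25]
```

In this toy case the two spectra agree below the reference energy and differ above it, which the profile reflects.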

In [None]:
# we define a function that returns a dos spectrum given an id
def get_dos_value(mid: str):
    return dos_data[mid]["energy"], dos_data[mid]["dos"]

The function that generates the comparison plot can be found [here](https://github.com/kubanmar/madas-examples/blob/master/notebooks/plotting_functions.py):

In [None]:
from plotting_functions import fingerprint_tuning_comparison_plot
import matplotlib.pyplot as plt
plt.style.use("./settings.mplstyle")

In [None]:
fingerprint_tuning_comparison_plot(get_dos_value,
                                   ssc,
                                   all_mids,
                                   ref_mid,
                                   PLU2_most_similar_mids, 
                                   MIN2_most_similar_mids, 
                                   filename=None)

In [None]:
# crystal structures and properties for reference
for mid in all_mids:
    print(f"https://cmrdb.fysik.dtu.dk/c2db/row/{mid}")

## References

[1] Kuban, M., Rigamonti, S., Scheidgen, M. _et al_. _Density-of-states similarity descriptor for unsupervised learning from materials data_. Sci Data **9**, 646 (2022). https://doi.org/10.1038/s41597-022-01754-z

[2] Haastrup, S. _et al_. _The Computational 2D Materials Database: High-Throughput Modeling and Discovery of Atomically Thin Crystals_. 2D Materials **5**, 042002 (2018). https://doi.org/10.1088/2053-1583/aacfc1

[3] Gjerding, M. N. _et al_. _Recent Progress of the Computational 2D Materials Database (C2DB)_. 2D Materials **8**, 044002 (2021). https://doi.org/10.1088/2053-1583/ac1059
