# Example of similarity score calculation for 2CO configurations on Pt(553)

### Import the necessary functions

In [1]:
import pandas as pd
import time # timing the execution of the similarity calculation
import similarity as sim
print("successfully loaded packages")

successfully loaded packages


### Load the configurations
There are two pairs of configurations (four configurations in total) subjected to the similarity calculation.\
The configurations in conf_pair1 (conf1 and conf2) are highly similar.\
The configurations in conf_pair2 (conf3 and conf4) are highly dissimilar.

conf1             |  conf2
:-------------------------:|:-------------------------:
![](conf_img/conf1.png)  |  ![](conf_img/conf2.png)

conf3             |  conf4
:-------------------------:|:-------------------------:
![](conf_img/conf3.png)  |  ![](conf_img/conf4.png)

### Note about loading the configuration

The configuration files can be stored in any location. However, the path to each file \
must be correct. The path can be either an absolute path or a path relative to this notebook. \
In this example, a relative path is used for each configuration.

In [2]:
conf1_path = '2COconf1.CONTCAR'
conf2_path = '2COconf2.CONTCAR'
conf3_path = '2COconf3.CONTCAR'
conf4_path = '2COconf4.CONTCAR'
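
Because an incorrect path is the most likely failure point at this step, a quick check before running the calculation can help. The following is a minimal sketch that uses only the standard-library `pathlib` module; the list name `conf_paths` and the `missing` variable are illustrative and not part of the `similarity` package.

```python
from pathlib import Path

# Same relative paths as in the cell above (relative to the notebook).
conf_paths = [
    '2COconf1.CONTCAR',
    '2COconf2.CONTCAR',
    '2COconf3.CONTCAR',
    '2COconf4.CONTCAR',
]

# Collect every path that does not point to an existing file.
missing = [p for p in conf_paths if not Path(p).is_file()]

if missing:
    print(f'missing configuration files: {missing}')
else:
    print('all configuration files found')
```

If any file is reported as missing, check the working directory of the notebook or switch to absolute paths.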

### Making configuration pairs
In this example, two configuration pairs are defined as previously stated. \
However, users are free to define their own configuration pairs, as long as \
each pair consists of exactly two configurations.

To run the similarity calculation on all possible pairs among the configurations of interest, \
use `itertools.combinations`, as demonstrated below.

In [3]:
conf_pair1 = [conf1_path,conf2_path]
conf_pair2 = [conf3_path,conf4_path]

### define all possible pairs among the configurations of interest
from itertools import combinations # itertools is part of the Python standard library; no installation is needed.
conf_path_arr = [conf1_path, conf2_path, conf3_path, conf4_path]
all_conf_pairs = list(combinations(conf_path_arr,2))
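
As a sanity check on the pair list, `n` configurations yield `n*(n-1)/2` unordered pairs, so the four configurations here should give six. The sketch below repeats the pair construction with the file names written out and verifies the count with `math.comb` (available in Python 3.8 and later).

```python
from itertools import combinations
from math import comb

conf_path_arr = [
    '2COconf1.CONTCAR',
    '2COconf2.CONTCAR',
    '2COconf3.CONTCAR',
    '2COconf4.CONTCAR',
]

# Each element is a 2-tuple of paths; order within the list follows the input order.
all_conf_pairs = list(combinations(conf_path_arr, 2))

# n configurations give n*(n-1)/2 unordered pairs; for n = 4 that is 6.
assert len(all_conf_pairs) == comb(len(conf_path_arr), 2)

for pair in all_conf_pairs:
    print(pair)
```

Note that `combinations` never pairs a configuration with itself and never emits both `(a, b)` and `(b, a)`.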

### Do the similarity calculations
The main function for calculating the similarity score between two configurations of interest is `sim.compare_eigval_diff`. \
The two key inputs are:
* the configuration pair, which is an array/list of the paths to the two configurations.
* `start_atom_ele`, which sets the element of the root atom for the breadth-first search (BFS) \
used to construct the adjacency matrix of each chemical environment graph ($G_{chem-env}$). \
The sequence of atoms output by the BFS does not change the results, because we use the eigenvalues \
sorted by magnitude.

It is highly recommended to use the surface atom element because surface atoms \
are always present in each $G_{chem-env}$. In this case, \
we use `Pt`, which is also the default setting.
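
The claim that the BFS atom ordering does not affect the result can be illustrated on a toy graph. The sketch below is not the implementation inside the `similarity` package: the atom labels, bonds, and helper functions (`bfs_order`, `adjacency`) are hypothetical, and `numpy` is assumed to be installed. It starts a BFS from two different Pt atoms, builds the adjacency matrix in each visiting order, and compares the sorted eigenvalues.

```python
from collections import deque
import numpy as np

# Toy graph with hypothetical atoms and bonds (not read from a CONTCAR file):
# Pt1 - Pt0 - C2 - O3
elements = {0: 'Pt', 1: 'Pt', 2: 'C', 3: 'O'}
edges = [(0, 1), (0, 2), (2, 3)]

neighbors = {i: [] for i in elements}
for a, b in edges:
    neighbors[a].append(b)
    neighbors[b].append(a)

def bfs_order(root):
    """Return the nodes in the order a breadth-first search visits them."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in neighbors[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return order

def adjacency(order):
    """Adjacency matrix with rows/columns arranged in the given node order."""
    index = {node: i for i, node in enumerate(order)}
    mat = np.zeros((len(order), len(order)))
    for a, b in edges:
        mat[index[a], index[b]] = mat[index[b], index[a]] = 1.0
    return mat

# Two different Pt atoms as BFS roots give two different atom orderings...
pt_roots = [i for i, el in elements.items() if el == 'Pt']
order_a = bfs_order(pt_roots[0])
order_b = bfs_order(pt_roots[1])
mat_a, mat_b = adjacency(order_a), adjacency(order_b)

# ...and two different matrices, but identical sorted eigenvalues.
# eigvalsh handles symmetric matrices and returns eigenvalues in ascending order.
eig_a = np.linalg.eigvalsh(mat_a)
eig_b = np.linalg.eigvalsh(mat_b)

print('orderings differ:', order_a != order_b)
print('matrices differ: ', not np.array_equal(mat_a, mat_b))
print('spectra agree:   ', np.allclose(eig_a, eig_b))
```

Reordering the atoms only permutes the rows and columns of the adjacency matrix together, which leaves the set of eigenvalues unchanged; sorting then removes any remaining dependence on order.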

In [4]:
start = time.time()
conf_pair1_score = sim.compare_eigval_diff(conf_pair1,start_atom_ele='Pt')
conf_pair2_score = sim.compare_eigval_diff(conf_pair2,start_atom_ele='Pt')
end = time.time()
t_execution = end - start

print(f'conf_pair1 score is {conf_pair1_score}')
print(f'conf_pair2 score is {conf_pair2_score}')
print(f'execution time is {t_execution} s')

for pair in all_conf_pairs:
    score = sim.compare_eigval_diff(pair,start_atom_ele='Pt')
    print(f'the sim score between {pair[0]} and {pair[1]} is {score}')

conf_pair1 score is -0.00010181489051319659
conf_pair2 score is -0.24061769247055054
execution time is 1.3386318683624268 s
the sim score between 2COconf1.CONTCAR and 2COconf2.CONTCAR is -0.00010181489051319659
the sim score between 2COconf1.CONTCAR and 2COconf3.CONTCAR is -0.18733686208724976
the sim score between 2COconf1.CONTCAR and 2COconf4.CONTCAR is -0.06794796139001846
the sim score between 2COconf2.CONTCAR and 2COconf3.CONTCAR is -0.1873398721218109
the sim score between 2COconf2.CONTCAR and 2COconf4.CONTCAR is -0.06794655323028564
the sim score between 2COconf3.CONTCAR and 2COconf4.CONTCAR is -0.24061769247055054
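
For more than a handful of pairs, a table is easier to read than printed lines. The sketch below uses `pandas` (imported at the top of this notebook) to rank the six scores printed above by magnitude. The column names and the `abs_score` helper column are illustrative. Reading a score closer to zero as "more similar" is an inference from this example, where the pair described as highly similar scores about -0.0001 and the pair described as highly dissimilar scores about -0.24.

```python
import pandas as pd

# Scores copied from the printed output above.
scores = [
    ('2COconf1.CONTCAR', '2COconf2.CONTCAR', -0.00010181489051319659),
    ('2COconf1.CONTCAR', '2COconf3.CONTCAR', -0.18733686208724976),
    ('2COconf1.CONTCAR', '2COconf4.CONTCAR', -0.06794796139001846),
    ('2COconf2.CONTCAR', '2COconf3.CONTCAR', -0.1873398721218109),
    ('2COconf2.CONTCAR', '2COconf4.CONTCAR', -0.06794655323028564),
    ('2COconf3.CONTCAR', '2COconf4.CONTCAR', -0.24061769247055054),
]

df = pd.DataFrame(scores, columns=['conf_a', 'conf_b', 'score'])

# Rank by magnitude: the smallest |score| comes first.
df['abs_score'] = df['score'].abs()
df = df.sort_values('abs_score').reset_index(drop=True)

print(df[['conf_a', 'conf_b', 'score']])
```

In a live run, the same table can be filled inside the loop over `all_conf_pairs` instead of from copied values.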
