# Tutorial for the data

This tutorial shows how to load the D-measure and SOAP distances, the literature data, and the zeolite labels computed and compiled for the article "Graph Similarity Drives Zeolite Diffusionless Transformations and Intergrowth". The implementation was written by Daniel Schwalbe-Koda. If you use this code, tutorial, or data, please cite:

D. Schwalbe-Koda, Z. Jensen, E. Olivetti, and R. Gómez-Bombarelli. "Graph similarity drives zeolite diffusionless transformations and intergrowth." _Nature Materials_ (2019). Link: https://www.nature.com/articles/s41563-019-0486-1.

If you use the literature data, please cite, additionally:

Z. Jensen et al. "A Machine Learning Approach to Zeolite Synthesis Enabled by Automatic Literature Data Extraction". _ACS Central Science_ **5** (5), 892-899 (2019). Link: https://pubs.acs.org/doi/abs/10.1021/acscentsci.9b00193

## Imports

```python
import pickle
import pandas as pd
```

## Loading the D-measure and the SOAP data

The data are stored as pickle files in the `data/` folder.

```python
with open('../data/soap.pkl', 'rb') as f:
    soap = pickle.load(f)

with open('../data/dmeasure.pkl', 'rb') as f:
    dmeasure = pickle.load(f)

with open('../data/dmeasure_matrices.pkl', 'rb') as f:
    d_matrices = pickle.load(f)
```

`soap` and `dmeasure` are dictionaries that map a pair of zeolites to the distance between them. Each key has the form `Zeo1-Zeo2`, where the two IZA codes are in alphabetical order, i.e., `Zeo1` precedes `Zeo2`. To save space, self-pairs such as `Zeo1-Zeo1` and reversed keys such as `Zeo2-Zeo1` are omitted. `d_matrices` uses the same keys and stores, for each pair, the transformation matrices applied to the two zeolites.
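Because only one ordering of each pair is stored, a lookup has to sort the two IZA codes first. The helper below is a minimal sketch: the name `get_distance` is hypothetical, it assumes Python's default string ordering matches the ordering used in the files, and returning `0.0` for a zeolite compared with itself is an assumption (self-pairs are simply absent from the dictionaries).

```python
def get_distance(distances, zeo1, zeo2):
    """Look up the distance between two zeolites in a pair-keyed dict.

    Hypothetical helper: keys are assumed to be 'Zeo1-Zeo2' with the two
    IZA codes in alphabetical order.
    """
    if zeo1 == zeo2:
        # Assumption: self-pairs are omitted because their distance is zero.
        return 0.0
    # Sort the codes so that either argument order hits the stored key.
    first, second = sorted([zeo1, zeo2])
    return distances[f'{first}-{second}']
```

With the data loaded above, `get_distance(soap, 'FAU', 'EMT')` should return the same value as `soap['EMT-FAU']`.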

### Example of distances

As an example, consider FAU and EMT, two zeolites known to form intergrowths:

```python
print('soap distance: %.4f' % soap['EMT-FAU'])
print('D-measure: %.4f' % dmeasure['EMT-FAU'])
```

```
soap distance: 0.0494
D-measure: 0.2786
```
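Beyond a single pair, one can rank all stored pairs that involve a given zeolite. The sketch below uses a hypothetical `nearest_neighbors` helper that works for either `soap` or `dmeasure`; it assumes keys follow the `Zeo1-Zeo2` layout described above, so the chosen zeolite may appear on either side of the separator.

```python
import heapq


def nearest_neighbors(distances, zeolite, k=5):
    """Return the k (partner, distance) pairs closest to `zeolite`.

    Hypothetical helper for pair-keyed dicts such as `soap` or `dmeasure`.
    """
    neighbors = []
    for key, value in distances.items():
        # Keys look like 'Zeo1-Zeo2'; the zeolite can be first or second.
        if key.startswith(zeolite + '-'):
            neighbors.append((value, key[len(zeolite) + 1:]))
        elif key.endswith('-' + zeolite):
            neighbors.append((value, key[:-(len(zeolite) + 1)]))
    # nsmallest orders by distance first, then by partner name on ties.
    return [(partner, value) for value, partner in heapq.nsmallest(k, neighbors)]
```

For instance, `nearest_neighbors(dmeasure, 'FAU', k=5)` lists the five frameworks with the smallest D-measure to FAU among the stored pairs.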


### Example of transformation matrices

```python
# Transformation matrices applied to EAB (M_A) and to SOD (M_B)
M_A, M_B = d_matrices['EAB-SOD']
```

When comparing the zeolites EAB and SOD, the transformation matrices that minimize the graph distance between the two frameworks are not both the identity. For EAB, we recover the identity matrix:

```python
print(M_A)
```

```
[[1 0 0]
 [0 1 0]
 [0 0 1]]
```


For SOD, on the other hand, the following transformation matrix converts its cubic unit cell into a hexagonal one:

```python
print(M_B)
```

```
[[-1 -1  0]
 [ 0  1 -1]
 [ 1 -1 -1]]
```


Then, the graph distance between the frameworks can be computed as described in the other tutorial.
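As a quick sanity check of the SOD matrix printed above, one can apply it to a cubic lattice and inspect the resulting cell parameters. This sketch re-creates the matrix so it runs on its own, and it assumes that lattice vectors are stored as rows and that the transformed lattice is `M_B @ lattice`; the cubic constant `a = 1.0` is a placeholder, not the actual SOD value.

```python
import numpy as np

# Same values as the M_B matrix printed above for SOD.
M_B = np.array([[-1, -1,  0],
                [ 0,  1, -1],
                [ 1, -1, -1]])

# Assumption: lattice vectors are rows; a = 1.0 is a placeholder constant.
a = 1.0
cubic = a * np.eye(3)
new_lattice = M_B @ cubic


def cell_parameters(lattice):
    """Return lengths (a, b, c) and angles (alpha, beta, gamma) in degrees."""
    va, vb, vc = lattice
    lengths = np.linalg.norm(lattice, axis=1)

    def angle(u, v):
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    # alpha is between b and c, beta between a and c, gamma between a and b.
    angles = (angle(vb, vc), angle(va, vc), angle(va, vb))
    return lengths, angles


lengths, angles = cell_parameters(new_lattice)
print(lengths)  # approximately sqrt(2), sqrt(2), sqrt(3) times a
print(angles)   # approximately 90, 90 and 120 degrees
# The determinant is the ratio between the new and the old cell volumes.
print(int(round(np.linalg.det(M_B))))  # 3
```

The result is a cell with a = b = √2·a_cubic, c = √3·a_cubic (along a body diagonal of the cube) and γ = 120°, i.e., a hexagonal metric with three times the volume of the cubic cell.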

## Loading the literature data

The literature data are stored in the `data/literature.csv` file and can be loaded as a pandas DataFrame.

```python
literature = pd.read_csv('../data/literature.csv', index_col=0)
```

```python
literature.sort_values(by=['Type', 'Zeolites', 'doi']).reset_index(drop=True)
```

| | doi | Type | Zeolites |
|---|---|---|---|
| 0 | 10.1016/j.cattod.2014.08.018 | ador | UTL-OKO |
| 1 | 10.1016/j.cattod.2015.09.036 | ador | UTL-OKO |
| 2 | 10.1038/nmat3455 | ador | UTL-OKO |
| 3 | 10.1002/anie.201406344 | ador | UTL-PCR |
| 4 | 10.1002/chem.201402887 | ador | UTL-PCR |
| 5 | 10.1016/j.cattod.2015.09.033 | ador | UTL-PCR |
| 6 | 10.1016/j.cattod.2015.09.036 | ador | UTL-PCR |
| 7 | 10.1021/jacs.7b00386 | ador | UTL-PCR |
| 8 | 10.1038/nchem.2761 | ador | UTL-PCR |
| 9 | 10.1016/j.cattod.2015.09.033 | ador | UTL-PCS |
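The `Zeolites` column is not always in alphabetical order (for instance, `UTL-OKO`), so it cannot be used directly as a key into `soap` or `dmeasure`. The following is a minimal sketch that counts distinct DOIs per pair and attaches a distance; the helper name `summarize_literature` and the output columns `n_papers` and `distance` are hypothetical, and splitting on `-` assumes the IZA codes in this column contain no `-` themselves.

```python
import pandas as pd


def summarize_literature(literature, distances):
    """Count distinct DOIs per (Type, Zeolites) and attach a pair distance.

    Hypothetical helper; `distances` is a pair-keyed dict such as `dmeasure`.
    """
    counts = (
        literature.groupby(['Type', 'Zeolites'])['doi']
        .nunique()
        .rename('n_papers')
        .reset_index()
    )
    # Reorder e.g. 'UTL-OKO' -> 'OKO-UTL' to match the alphabetical keys.
    keys = counts['Zeolites'].map(lambda pair: '-'.join(sorted(pair.split('-'))))
    # Pairs missing from the dictionary get NaN instead of raising KeyError.
    counts['distance'] = keys.map(lambda k: distances.get(k, float('nan')))
    return counts
```

With the data loaded above, `summarize_literature(literature, dmeasure)` returns one row per transformation type and zeolite pair.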


## Loading the labels

We can also load some of the labels for the IZA zeolites used in our article. For example, we load the composite building units (CBUs) of each zeolite, as extracted from the IZA website:

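```python
cbu = {}
with open('../data/iza_labels/cbu.csv', 'r') as f:
    for line in f:
        # Skip blank lines, e.g. a trailing newline at the end of the file
        if not line.strip():
            continue
        # Each line holds an IZA code and its space-separated CBUs
        iza, units = line.strip().split(',')
        # Zeolites without listed CBUs get an empty list
        cbu[iza] = units.split()
```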

```python
cbu['FER']
```

```
['mor', 'fer', 'pcr']
```
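A reverse index from each CBU to the zeolites that contain it makes it easy to find frameworks sharing building units. This is a minimal sketch; the helpers `cbu_index` and `shared_cbus` are hypothetical and only assume the `cbu` dictionary built above (IZA code mapped to a list of CBU names).

```python
from collections import defaultdict


def cbu_index(cbu):
    """Map each CBU to the sorted list of zeolites that contain it."""
    index = defaultdict(set)
    for iza, units in cbu.items():
        for unit in units:
            index[unit].add(iza)
    return {unit: sorted(zeolites) for unit, zeolites in index.items()}


def shared_cbus(cbu, zeo1, zeo2):
    """Return the CBUs common to two zeolites, sorted alphabetically."""
    return sorted(set(cbu[zeo1]) & set(cbu[zeo2]))
```

For instance, `cbu_index(cbu)['fer']` lists every zeolite whose CBUs include `fer`, and `shared_cbus(cbu, 'FER', ...)` compares FER with any other framework in the dictionary.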