# Generate Molecular Structure Descriptors for a Zeolite

Nanoporous materials such as zeolites have pore dimensions similar to those of individual molecules and are widely used in industry as adsorbents, catalysts and chemical separation membranes. The nanoscale cavities in these materials serve as shape- and size-selective sites that facilitate chemical reactions and storage, while the channels serve as molecular sieves that can be used for gas separations, replacing energetically less efficient distillation processes.

This notebook presents an approach outlined in [1] for generating computationally efficient digital representations of the molecular structure of nanoporous materials. These representations are then used to compute a number of geometric and statistical descriptors for pore structures. The described methods can identify and label the transport-relevant accessible regions in the porous crystals for any user-defined non-spherical atomic-scale morphology. These descriptors can be used as predictors for transport properties.

The notebook includes the following steps:

 1. [load a CIF file with the zeolite structure](#Load-Structure-of-Interest)
 1. [generate a voxelized representation of the molecular structure](#Generate-Voxelized-Representation-of-the-Pore-Structure)
 1. [compute conventional pore metrics](#Compute-Conventional-Pore-Metrics---PLD-and-LCD)
 1. [compute transport channels through the pore structure](#Geometric-and-Statistical-analysis-of-diffusion-pathways)

![image of zeolite](./DDR_structure.gif)

The image shows the 277x240x813 voxelized molecular structure of the unit cell of a 3D bulk zeolite (namely DDR) at a grid resolution of 0.1 Å. Red voxels correspond to oxygen and orange voxels correspond to silicon atoms.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os
import sys
sys.path.append("../../../")

In [3]:
import warnings
warnings.filterwarnings("ignore")
import os
import ase
import time
import glob
import numpy as np
import pandas as pd
import ase.io as aio
import scipy.io as sio
from pathlib import Path
from tqdm.notebook import tqdm
import matplotlib.pyplot as plt
from toolz.curried import pipe, curry, compose
from collections import defaultdict, OrderedDict

import pymks.atommks.porosity as pore
from pymks.atommks.helpers import write2vtk, save_file, load_file, generate_tubular_paths
from pymks.atommks.grid_generator import generate_grids

from pymks.atommks.canonical_paths import calc_path_distance, calc_path_distances_matrix, calc_canonical_paths

np.set_printoptions(precision=1)

In [4]:
def get_radius(atom_id, radius_type="vdw"):
    """
    Get the radius of the atom

    Args:
      atom_id: element symbol
      radius_type: "vdw" for Van der Waals or "cov" for covalent

    Returns:
      the atomic radius

    >>> get_radius('Na')
    2.27
    """
    xl = pd.ExcelFile("Elemental_Radii.xlsx")
    df = xl.parse(sheet_name=0, header=2, index_col=1)

    # Compare strings with ==; `is` tests object identity, not equality
    if radius_type == "cov":
        key = 6
    elif radius_type == "vdw":
        key = 7
    else:
        raise ValueError("radius_type not supported")
    if atom_id in df.index:
        return df.loc[atom_id][key]
    else:
        raise ValueError("Elemental symbol not found")

def get_structure_data(cif_file_path, resize_unit_cell=1):
    """
    Get the ASE atom object (a molecule in many cases) and corresponding
    radii for each atom in the molecule
    
    Args:
      cif_file_path: path to the CIF file
      resize_unit_cell: allows a resize of the atom object
      
    Returns:
      a tuple of the ASE atom object and dictionary of atom radii
    
    >>> get_structure_data('iza_zeolites/DDR.cif')[0].get_cell_lengths_and_angles()
    array([ 27.59,  27.59,  81.5 ,  90.  ,  90.  , 120.  ])
    
    """
    # Accept either a single repeat count or one count per axis
    if not hasattr(resize_unit_cell, "__len__"):
        resize_unit_cell = [resize_unit_cell] * 3
    ase_atom = aio.read(cif_file_path).repeat(resize_unit_cell)
    atom_ids = sorted(np.unique(ase_atom.get_chemical_symbols()))
    return (
        ase_atom,
        {idx: get_radius(idx) for idx in atom_ids}
    )

## Load the Zeolite Structure

In the following steps we use the `get_structure_data` function to load the ASE atom object and the corresponding atomic radii. The `get_radius` function looks up the atomic radii in the elemental radii spreadsheet from the [Cambridge Crystallographic Data Centre][cam], read here from a local `Elemental_Radii.xlsx` file. The radii give each atom a spherical volume in the voxelized representation of the structure.

[cam]: https://www.ccdc.cam.ac.uk/support-and-resources/ccdcresources/Elemental_Radii.xlsx

In [5]:
file_path = "iza_zeolites/MFI.cif"
cif = file_path.split("/")[-1][:-4]
ase_atom, radii = get_structure_data(Path(file_path), [2, 2, 1])

The ASE atom object

In [6]:
ase_atom

Atoms(symbols='O768Si384', pbc=True, cell=[40.18, 39.476, 13.142], spacegroup_kinds=...)

The atomic radii of the atom types in the structure.

In [7]:
radii

{'O': 1.52, 'Si': 2.1}

Number of atoms in the structure.

In [8]:
len(ase_atom)

1152

## Generate Voxelized Representation of the Pore Structure

The `generate_grids` function generates a voxelized representation of the structure. It returns a dictionary of grids, with each grid representing a possible state of the system. Each voxel can only be in one of these states. `n_pixel` is the number of pixels per unit length, in the units used by `ase_atom` (generally Å).
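
To make the idea of a voxelized structure concrete, here is a minimal NumPy sketch. It is not the `generate_grids` implementation: the helper name `voxelize_spheres` is hypothetical, and the sketch assumes an orthorhombic box and ignores periodic images of the atoms. Each atom is treated as a sphere, and a voxel is marked as occupied when its center lies inside any sphere.

```python
import numpy as np

def voxelize_spheres(centers, radii, box, n_pixel=10):
    """Boolean grid that is True where a voxel center lies inside any sphere.

    Assumptions of this sketch: orthorhombic box with edge lengths `box`,
    no periodic boundary handling, `n_pixel` voxels per unit length.
    """
    shape = tuple(int(np.ceil(length * n_pixel)) for length in box)
    # Coordinates of the voxel centers along each axis
    axes = [(np.arange(n) + 0.5) / n_pixel for n in shape]
    xx, yy, zz = np.meshgrid(*axes, indexing="ij")
    occupied = np.zeros(shape, dtype=bool)
    for (cx, cy, cz), radius in zip(centers, radii):
        occupied |= (xx - cx) ** 2 + (yy - cy) ** 2 + (zz - cz) ** 2 <= radius ** 2
    return occupied

# One oxygen-sized sphere (radius 1.52) in the middle of a 4 x 4 x 4 box
toy_atoms = voxelize_spheres([(2.0, 2.0, 2.0)], [1.52], box=(4.0, 4.0, 4.0))
toy_pores = ~toy_atoms  # the pore phase is everything that is not an atom
```

In this toy case the occupied voxels approximate the sphere volume, and the approximation improves as `n_pixel` increases.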

In [9]:
%%time
grid_data = generate_grids(
    ase_atom,
    n_pixel=10,
    atomic_radii=radii,
    extend_boundary_atoms=False,
    use_fft_method=False
)

CPU times: user 2.26 s, sys: 383 ms, total: 2.64 s
Wall time: 2.55 s


CPU times: user 17.2 s, sys: 1.82 s, total: 19 s
Wall time: 5.68 s

The keys include the possible states of the system: `pores` for empty voxels, `O` for oxygen and `Si` for silicon. The remaining `n_pixel` entry stores the grid resolution rather than a state.

In [10]:
grid_data.keys()

dict_keys(['pores', 'n_pixel', 'O', 'Si'])

The grids are 403x396x133 voxels, which is sufficient to capture the 2x2x1 supercell (40.18 x 39.476 x 13.142 Å) at a resolution of 0.1 Å (`n_pixel=10` means 10 pixels per Å).

In [11]:
grid_data['pores'].shape

(403, 396, 133)

In [12]:
grid_data["distance_grid"] = pore.calc_euclidean_distance(grid_data['pores'], n_pixel=grid_data['n_pixel'])
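
The distance grid stores, for each pore voxel, how far it is from the nearest atom voxel. As a rough illustration of that idea, and not the `pore.calc_euclidean_distance` implementation, the sketch below uses SciPy's `distance_transform_edt` on a toy grid. The helper name `distance_to_atoms` is hypothetical, and periodic boundaries are ignored. Twice the largest distance gives a simple estimate of the largest sphere diameter that fits in the pore phase.

```python
import numpy as np
from scipy import ndimage

def distance_to_atoms(pores, n_pixel=10):
    """Distance from each pore voxel to the nearest non-pore voxel.

    `sampling` converts voxel counts into length units (1 / n_pixel per voxel).
    """
    return ndimage.distance_transform_edt(pores, sampling=1.0 / n_pixel)

# Toy slab pore: atoms occupy the two outer planes along the first axis
toy_slab = np.ones((30, 30, 30), dtype=bool)
toy_slab[0, :, :] = False
toy_slab[-1, :, :] = False

toy_distance = distance_to_atoms(toy_slab)
# Diameter of the largest sphere that fits between the two atom planes
toy_lcd = 2 * toy_distance.max()
```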

## Compute Conventional Pore Metrics - PLD, LCD, ASA and AV

Here we compute four global pore metrics: the pore limiting diameter (PLD), the largest cavity diameter (LCD), the accessible surface area (ASA) and the accessible volume (AV). The PLD is the maximum size of a molecule that can pass through the structure in a particular direction. The LCD is the diameter of the largest sphere that can fit inside any cavity. The ASA is the combined internal surface area of all cavities that can be accessed by a foreign molecule. The AV is the combined volume of all cavities that can be accessed by a foreign molecule.

We use the `calc_pore_metrics` function to calculate the pore metrics. Internally, this uses the Euclidean distance from the pore phase to the nearest atom. The calculation is direction dependent because the PLD calculation simulates a probe molecule traversing the structure in a particular direction (by default the last axis, i.e. the z-direction for 3D structures).

In [13]:
metrics = pore.calc_pore_metrics(grid_data["distance_grid"], n_pixel=grid_data['n_pixel'])

In [14]:
metrics

{'pld': 3.8828125,
 'lcd': 6.462197780609131,
 'asa': 9198.990000000002,
 'av': 4793.882000000001}
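
As a hedged illustration of how a volume metric can be read off a distance grid, the sketch below counts the voxels that are at least a probe radius away from the nearest atom and multiplies by the voxel volume. The helper `estimate_accessible_volume` and its `r_probe` argument are assumptions made for this example. It does not check whether those voxels are connected to a channel, and `calc_pore_metrics` may define the accessible volume differently.

```python
import numpy as np

def estimate_accessible_volume(distance_grid, r_probe, n_pixel=10):
    """Volume of the voxels whose distance to the nearest atom is >= r_probe."""
    voxel_volume = (1.0 / n_pixel) ** 3  # volume of a single voxel
    n_accessible = np.count_nonzero(np.asarray(distance_grid) >= r_probe)
    return n_accessible * voxel_volume

# Toy distance grid: a 1 x 1 x 1 box in which every voxel is 1.0 from an atom
toy_uniform = np.full((10, 10, 10), 1.0)
toy_av = estimate_accessible_volume(toy_uniform, r_probe=0.5)
toy_av_large_probe = estimate_accessible_volume(toy_uniform, r_probe=1.5)
```

A larger probe can only shrink the estimate, since fewer voxels satisfy the distance threshold.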

Because the PLD is direction dependent, its value will generally differ along another direction, for example the x-direction (`axis=0` signifies the x-direction).

In [15]:
# metrics_x = pore.calc_pore_metrics(grid_data['pores'], n_pixel=grid_data['n_pixel'], axis=0)
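
One way to picture the direction dependence is as a percolation test: a probe of radius r can cross the structure along an axis only if the voxels with distance of at least r form a connected region touching both opposite faces. The sketch below is an assumption-laden illustration, not the `calc_pore_metrics` implementation. The helpers `percolates` and `estimate_pld` are hypothetical, periodic boundaries are ignored, and a bisection search finds the largest probe radius that still percolates.

```python
import numpy as np
from scipy import ndimage

def percolates(mask, axis=-1):
    """True if one connected region of `mask` touches both faces along `axis`."""
    labels, _ = ndimage.label(mask)  # face-connected components
    first = np.take(labels, 0, axis=axis)
    last = np.take(labels, -1, axis=axis)
    shared = np.intersect1d(first[first > 0], last[last > 0])
    return shared.size > 0

def estimate_pld(distance_grid, axis=-1, tol=0.01):
    """Bisect for the largest probe radius that percolates; return its diameter."""
    lo, hi = 0.0, float(distance_grid.max())
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if percolates(distance_grid >= mid, axis=axis):
            lo = mid
        else:
            hi = mid
    return 2 * lo

# Toy structure: a straight square channel running along the last axis
toy_channel = np.zeros((21, 21, 10), dtype=bool)
toy_channel[6:15, 6:15, :] = True
toy_channel_distance = ndimage.distance_transform_edt(toy_channel, sampling=0.1)

toy_pld_z = estimate_pld(toy_channel_distance, axis=2)  # along the channel
toy_pld_x = estimate_pld(toy_channel_distance, axis=0)  # across the channel walls
```

In this toy structure the channel is open only along the last axis, so the estimate along the first axis is zero.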

## Compute unique diffusion pathways through the structure

In [16]:
%%time
dists_dict, canonical_dists_dict = pore.calc_diffusion_paths(
    grid_data["distance_grid"],
    r_probe=0.5,
    n_pixel=grid_data["n_pixel"],
)

171it [00:51,  3.31it/s]

CPU times: user 52.9 s, sys: 25.8 s, total: 1min 18s
Wall time: 1min 32s
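
The algorithm behind `calc_diffusion_paths` is not shown in this notebook. As a rough picture of what a path through the accessible region means, the sketch below runs a plain breadth-first search over a boolean mask (for example `distance_grid > r_probe`) and reports the fewest face-connected voxel steps needed to get from the first slice to the last slice along an axis. The helper `shortest_path_length` is hypothetical and ignores periodic boundaries.

```python
from collections import deque

import numpy as np

def shortest_path_length(mask, axis=-1):
    """Fewest voxel steps from the first to the last slice along `axis`.

    Returns None when no face-connected path exists inside `mask`.
    """
    mask = np.moveaxis(np.asarray(mask, dtype=bool), axis, -1)
    steps = np.full(mask.shape, -1, dtype=int)  # -1 marks unvisited voxels
    queue = deque()
    # Start from every accessible voxel in the first slice
    for idx in zip(*np.nonzero(mask[..., 0])):
        start = tuple(int(i) for i in idx) + (0,)
        steps[start] = 0
        queue.append(start)
    # Unit moves along each axis (face neighbors)
    offsets = []
    for dim in range(mask.ndim):
        for sign in (-1, 1):
            offset = [0] * mask.ndim
            offset[dim] = sign
            offsets.append(tuple(offset))
    last = mask.shape[-1] - 1
    while queue:
        current = queue.popleft()
        if current[-1] == last:
            return int(steps[current])
        for offset in offsets:
            nxt = tuple(c + o for c, o in zip(current, offset))
            inside = all(0 <= n < s for n, s in zip(nxt, mask.shape))
            if inside and mask[nxt] and steps[nxt] < 0:
                steps[nxt] = steps[current] + 1
                queue.append(nxt)
    return None

# A bent 2D channel: the path has to step down one row to get across
toy_bent = np.array([[1, 1, 0, 0],
                     [0, 1, 1, 1]], dtype=bool)
toy_bent_steps = shortest_path_length(toy_bent)
```

Multiplying the step count by `1 / n_pixel` would convert it to a length, although a search restricted to face neighbors overestimates the length of diagonal paths.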





In [17]:
%%time
cif = file_path.split("/")[-1][:-4]
print(cif)
save_file(obj=canonical_dists_dict, fname=f"canonical_paths_dict_{cif}.pkl")

MFI
CPU times: user 17.4 ms, sys: 15.5 ms, total: 32.8 ms
Wall time: 126 ms


In [18]:
%%time
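# Select one group of paths (here the second-to-last key)
key = list(canonical_dists_dict.keys())[-2]
# Mark every voxel that lies on a path in the selected group
pores = np.zeros(grid_data["pores"].shape)
for path in dists_dict[key]:
    pores[path["indxs"]] = 1
# Combine the tubular paths (weight 10) with the region where the
# distance grid exceeds 1.0 (weight 1), then export for visualization
pores = generate_tubular_paths(pores)*10 + (grid_data["distance_grid"] > 1.0)*1
write2vtk(pores, "%s_pores.vtk" % cif)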

CPU times: user 11.1 s, sys: 765 ms, total: 11.9 s
Wall time: 12.9 s
