## Prediction of Density of States (DOS) using Partial Radial Distribution Function (PRDF) 

We want to study the accuracy and time performance of the featurization used in the [Schütt et al. paper](https://journals.aps.org/prb/abstract/10.1103/PhysRevB.89.205118). In part 1, we load inorganic crystalline compounds with at most 6 atoms per unit cell and compute their features.

### Importing packages 

In [None]:
import numpy as np
import pandas as pd
import pymatgen as pmg

from pymatgen.core.molecular_orbitals import MolecularOrbitals
from pymatgen import MPRester

from matminer.data_retrieval.retrieve_MP import MPDataRetrieval
from matminer.utils.conversions import dict_to_object, str_to_composition
from matminer.featurizers.composition import AtomicOrbitals
from matminer.featurizers.structure import PartialRadialDistributionFunction 

from matminer.utils.data_files.deml_elementdata import atom_num

### Loading dataset 

In [None]:
mp = MPDataRetrieval(api_key='T6QzrvW8J07u4L2O')

Retrieve all entries with at most 6 atoms per primitive unit cell

In [None]:
%%time
data = mp.get_dataframe(criteria={"nsites": {"$lte": 6}},
                        properties=["pretty_formula", "structure"])
print ("Shape of retrieved data: ", data.shape)

In [None]:
data.head(1)

In [None]:
data.reset_index(inplace=True)

Drop duplicate compounds

In [None]:
data = data.drop_duplicates(subset=['pretty_formula'])
print ("Current shape of data: ", data.shape)
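
By default, `drop_duplicates` keeps the first row for each formula, so polymorphs that share a `pretty_formula` collapse to whichever entry was returned first. A minimal sketch with made-up rows (the IDs and formulas below are placeholders, not query results):

```python
import pandas as pd

# Toy rows standing in for retrieved entries; the IDs are placeholders.
toy = pd.DataFrame({
    "material_id": ["id-1", "id-2", "id-3"],
    "pretty_formula": ["SiO2", "SiO2", "NaCl"],
})

# Default keep="first": the second SiO2 row is discarded.
deduped = toy.drop_duplicates(subset=["pretty_formula"])
kept_ids = deduped["material_id"].tolist()
print(kept_ids)
```

If a specific polymorph should survive, the rows would need to be sorted by the relevant criterion before dropping duplicates.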

Convert structure to pymatgen structure object

In [None]:
data['structure_obj'] = dict_to_object(data['structure'])

Convert formula to pymatgen composition object

In [None]:
data['composition_obj'] = str_to_composition(data['pretty_formula'])

Classify each compound by its heaviest element (`sp` or `spd`) and set compounds containing f-orbital elements to NaN.

In [None]:
data['max_atom_num'] = data['composition_obj'].apply(lambda x: max(atom_num[str(i)] for i in x))

In [None]:
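def orbital_partition(x):
    # x is the largest atomic number in the compound
    if x <= 20:
        return 'sp'
    elif x < 70:
        return 'spd'
    else:
        # heavier elements are treated as f-orbital compounds and dropped later
        return np.nan
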
data['max_orbital'] = data['max_atom_num'].apply(orbital_partition)
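
To make the partition rule concrete, here is a small self-contained sketch that applies the same thresholds to three example compounds. The `ATOMIC_NUMBER` dictionary and the `orbital_class` helper are hypothetical stand-ins for matminer's `atom_num` table and the function above.

```python
import math

# Hypothetical lookup with a few atomic numbers (the notebook uses matminer's atom_num).
ATOMIC_NUMBER = {"O": 8, "Na": 11, "Cl": 17, "Fe": 26, "U": 92}

def orbital_class(elements):
    """Classify a compound by its heaviest element, using the notebook's thresholds."""
    z_max = max(ATOMIC_NUMBER[el] for el in elements)
    if z_max <= 20:
        return "sp"
    if z_max < 70:
        return "spd"
    return math.nan  # flagged so the row can be dropped

labels = {
    "NaCl": orbital_class(["Na", "Cl"]),   # heaviest element: Cl (Z = 17)
    "Fe2O3": orbital_class(["Fe", "O"]),   # heaviest element: Fe (Z = 26)
    "UO2": orbital_class(["U", "O"]),      # heaviest element: U (Z = 92)
}
print(labels)
```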

Drop compounds with f orbitals

In [None]:
data.dropna(subset=['max_orbital'], inplace=True)

In [None]:
print ("Shape of data: ", data.shape)

Get DOS data of materials using MPRester

In [None]:
%%time
mprester = MPRester(api_key='T6QzrvW8J07u4L2O')
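def get_dos(material_id):
    # Return NaN when the DOS cannot be retrieved so the row can be dropped later
    try:
        return mprester.get_dos_by_material_id(material_id)
    except Exception:
        return np.nan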

data['dos_obj'] = data['material_id'].apply(get_dos)

Drop entries without a complete DOS object

In [None]:
data = data.dropna(subset=['dos_obj']).reset_index(drop=True)
print ("Shape of data: ", data.shape)

Compute the DOS at the Fermi level from each DOS object

In [None]:
def compute_dos(dos):
    # DOS at the energy grid point closest to the Fermi level
    try:
        total_density = sum(dos.densities.values())  # sum over both spins, if present
        min_index = np.argmin(np.abs(dos.energies - dos.efermi))
        return total_density[min_index]  # states/eV/unit cell
    except Exception:
        return np.nan

In [None]:
data['dos'] = data['dos_obj'].apply(compute_dos)

`compute_dos` returns the DOS in states/eV/unit cell. Here, we divide it by the volume of the corresponding structure to obtain states/eV/Å$^3$.

In [None]:
data['volume'] = data['structure_obj'].apply(lambda x: x.volume)
data['dos'] = np.true_divide(data['dos'], data['volume'])
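
The two steps above (pick the grid point nearest the Fermi level, then normalize by volume) can be sketched on synthetic arrays. All numbers below are made up for illustration and the variable names are assumptions, not attributes of a pymatgen DOS object.

```python
import numpy as np

# Synthetic spin-resolved DOS on a small energy grid (values are made up).
energies = np.array([-2.0, -1.0, 0.1, 1.0, 2.0])   # eV
spin_up = np.array([0.5, 1.0, 2.0, 1.5, 0.2])      # states/eV/unit cell
spin_down = np.array([0.4, 0.9, 1.0, 1.2, 0.1])    # states/eV/unit cell
efermi = 0.0                                        # eV
volume = 60.0                                       # Å^3, assumed cell volume

# Sum both spin channels, then take the grid point closest to the Fermi level.
total = spin_up + spin_down
idx = int(np.argmin(np.abs(energies - efermi)))
dos_per_cell = total[idx]

# Convert from states/eV/unit cell to states/eV/Å^3.
dos_per_volume = dos_per_cell / volume
print(idx, dos_per_cell, dos_per_volume)
```

Because the value is read from the nearest grid point rather than interpolated, the result depends on how finely the energy grid samples the region around the Fermi level.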

### Compute representation 

In [None]:
prdf = PartialRadialDistributionFunction(cutoff=16.0, bin_size=1.0)

In [None]:
prdf.fit(data['structure_obj'].tolist())

In [None]:
%%time
data = prdf.featurize_dataframe(data, col_id='structure_obj', ignore_errors=True)
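
For intuition about what the featurizer computes, the sketch below builds a toy partial radial distribution for one element pair: it histograms A-B distances into shells of width `bin_size` up to `cutoff` and divides by the shell volume. This is not matminer's implementation. It treats the atoms as an isolated cluster (no periodic images), the normalization is one common convention that may differ from the library's, and `toy_prdf` is a hypothetical helper.

```python
import numpy as np

def toy_prdf(coords, species, pair, cutoff=16.0, bin_size=1.0):
    """Shell-normalized histogram of A-B distances for an isolated cluster."""
    a, b = pair
    coords = np.asarray(coords, dtype=float)
    # Bin edges from 0 to cutoff in steps of bin_size (assumed binning).
    edges = np.arange(0.0, cutoff + bin_size, bin_size)
    idx_a = [i for i, s in enumerate(species) if s == a]
    idx_b = [i for i, s in enumerate(species) if s == b]
    # All distances from atoms of type A to atoms of type B.
    dists = [np.linalg.norm(coords[i] - coords[j])
             for i in idx_a for j in idx_b if i != j]
    counts, _ = np.histogram(dists, bins=edges)
    # Volume of each spherical shell between consecutive edges.
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    return edges, counts / (shell_vol * len(idx_a))

# One Na atom with two Cl neighbours at 2.5 Å and 3.5 Å (made-up geometry).
coords = [[0.0, 0.0, 0.0], [2.5, 0.0, 0.0], [0.0, 0.0, 3.5]]
species = ["Na", "Cl", "Cl"]
edges, prdf_na_cl = toy_prdf(coords, species, ("Na", "Cl"))
print(len(prdf_na_cl), np.nonzero(prdf_na_cl)[0])
```

Under this assumed binning, `cutoff=16.0` and `bin_size=1.0` give 16 values per element pair, so the feature vector grows with the number of distinct element pairs seen during `fit`.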

#### Save featurized data as pickle file

In [None]:
data.to_pickle('./schutt_featurized_data.pkl')