# Featurization of Elastic Tensor Data

We want to study the performance of the model proposed in [De Jong et al.'s paper](https://www.nature.com/articles/srep34256). Here, features are added to the elastic tensor data following the descriptions in that paper. The featurized data is then saved as a pickle file.

In [13]:
import numpy as np
import pandas as pd
import pymatgen as pmg

from matminer.datasets.dataframe_loader import load_elastic_tensor
from matminer.utils.conversions import str_to_composition

from matminer.featurizers.composition import ElementProperty, CohesiveEnergy
from matminer.featurizers.structure import SiteStatsFingerprint
from matminer.featurizers.site import CoordinationNumber
from matminer.featurizers.base import MultipleFeaturizer
from pymatgen import MPRester
from pymatgen.analysis.local_env import VoronoiNN

key = 'T6QzrvW8J07u4L2O'

Load data

In [2]:
data = load_elastic_tensor()

Compute a composition object from the provided formula

In [3]:
data['composition'] = str_to_composition(data['formula'])

Drop unnecessary data

In [4]:
# Pass axis by keyword; the positional form is not accepted by recent pandas versions
data = data.drop(['formula', 'nsites', 'space_group',
                  'G_Reuss', 'G_Voigt', 'K_Reuss', 'K_Voigt',
                  'compliance_tensor', 'elastic_tensor', 'elastic_tensor_original'], axis=1)

## Compute features 

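#### Hölder means of the first 8 properties in Table 1 of the De Jong et al. paper

The properties are 'group_number', 'atomic_mass', 'atomic_radius', 'row_number', 'boiling_temp', 'melting_temp', 'electronegativity' and 'atomic_number'. For each property, the Hölder mean of order $p$ is

$$\mu_p(x) = \left[\frac{\sum_{i=1}^{n} w_i x_i^p}{\sum_{i=1}^{n} w_i}\right]^{\frac{1}{p}}$$

where $x_i$ is the property value of element $i$ and $w_i$ is its weight in the composition.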
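The formula above can be sketched in plain Python as follows. This is a minimal illustration, not matminer's implementation: the function name `holder_mean` and its signature are assumptions made for this example. The case $p = 0$, which the formula leaves undefined, is handled with the conventional limit, the weighted geometric mean.

```python
import math


def holder_mean(values, weights=None, power=1):
    """Weighted Hölder (power) mean of strictly positive values.

    Hypothetical helper for illustration only; not part of matminer.
    """
    if weights is None:
        weights = [1.0] * len(values)  # unweighted case: equal weights
    total = sum(weights)
    if power == 0:
        # Limit p -> 0 of the formula: the weighted geometric mean
        log_sum = sum(w * math.log(x) for w, x in zip(weights, values))
        return math.exp(log_sum / total)
    weighted = sum(w * x ** power for w, x in zip(weights, values))
    return (weighted / total) ** (1.0 / power)


# The same range of orders used in the notebook: p = -4, ..., 4
means = {p: holder_mean([1.0, 2.0, 4.0], power=p) for p in range(-4, 5)}
```

With $p = 1$ this reduces to the arithmetic mean and with $p = -1$ to the harmonic mean. A zero-valued input combined with a negative $p$ divides by zero, so inputs are assumed to be strictly positive here.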

In [5]:
ef = ElementProperty(data_source='pymatgen', 
                    features=['group', 'atomic_mass', 'atomic_radius', 'row',
                              'boiling_point', 'melting_point', 'X', 'Z'],  
                    stats=['holder_mean::%d'%d for d in range(-4, 4+1)] + ['geom_std_dev', 'std_dev'])
data = ef.featurize_dataframe(data, col_id='composition')


divide by zero encountered in double_scalars


invalid value encountered in double_scalars



ElementProperty: 100%|██████████| 1181/1181 [00:01<00:00, 1066.73it/s]


#### Cohesive Energy 

In [6]:
%%time
ft = CohesiveEnergy(mapi_key=key)

data = ft.featurize_dataframe(data, col_id='composition', ignore_errors=True)

CohesiveEnergy: 100%|██████████| 1181/1181 [00:49<00:00, 23.68it/s]


CPU times: user 121 ms, sys: 86.4 ms, total: 208 ms
Wall time: 1min 40s


#### Formation energy per atom, Energy above hull, band gap, density 

In [7]:
mpr = MPRester(api_key=key)

In [8]:
%%time
props = ['formation_energy_per_atom', 'e_above_hull', 'band_gap', 'density']
for p in props:
    data[p] = np.nan  # default when a property cannot be retrieved
# Iterate over index labels and use .at to avoid chained assignment
for idx, n in zip(data.index, data['material_id']):
    ls = mpr.get_data(n)
    try:
        for p in props:
            data.at[idx, p] = ls[0][p]
    except (IndexError, KeyError):
        # No entry returned, or a property is missing: keep NaN
        pass

CPU times: user 21.1 s, sys: 1.37 s, total: 22.5 s
Wall time: 6min 21s
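The lookup logic in the cell above can be isolated into a small helper that is easy to test without network access. This is only a sketch: `extract_properties` and `PROPS` are hypothetical names introduced for this example, and the helper assumes the query result is a list of dictionaries, which is how the notebook indexes it (`ls[0]['band_gap']`).

```python
import math

# Properties requested in the notebook
PROPS = ['formation_energy_per_atom', 'e_above_hull', 'band_gap', 'density']


def extract_properties(records, props=PROPS):
    """Return {property: value} from the first record, NaN where unavailable.

    Hypothetical helper; `records` is assumed to be a list of dicts.
    """
    first = records[0] if records else {}
    return {p: first.get(p, math.nan) for p in props}


# Example with a made-up record that lacks most properties
row = extract_properties([{'band_gap': 1.2}])
```

Separating the extraction from the query makes the fallback behaviour explicit: an empty result or a missing key yields NaN rather than silently skipping the row.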


#### log (V) per atom 

In [9]:
def compute_log_volume(x):
    return np.log(x['volume']/x['composition'].num_atoms)

In [10]:
data['log volume per atom'] = data.apply(compute_log_volume, axis=1)

#### Voronoi-based average bond length, bond angles and mean AD and SD of composition features

In [11]:
ft = MultipleFeaturizer([
    SiteStatsFingerprint.from_preset("Composition-dejong2016_AD"), 
    SiteStatsFingerprint.from_preset("Composition-dejong2016_SD"), 
    SiteStatsFingerprint.from_preset("BondLength-dejong2016"), 
    SiteStatsFingerprint.from_preset("BondAngle-dejong2016")
])

data = ft.featurize_dataframe(data, col_id='structure')


divide by zero encountered in log



MultipleFeaturizer: 100%|██████████| 1181/1181 [10:00<00:00,  1.97it/s] 


#### Voronoi-based site coordination number

In [14]:
ft = SiteStatsFingerprint(CoordinationNumber(nn=VoronoiNN(weight='area')), 
        stats=['holder_mean::%d' % d for d in range(-4, 4 + 1)]
                        + ['std_dev', 'geom_std_dev'])

data = ft.featurize_dataframe(data, col_id='structure')

SiteStatsFingerprint: 100%|██████████| 1181/1181 [03:47<00:00,  5.19it/s]


In [15]:
print("FINAL SHAPE OF DATA: ", data.shape)
data.head(1)

FINAL SHAPE OF DATA:  (1181, 195)


|  | material_id | volume | structure | elastic_anisotropy | G_VRH | K_VRH | poisson_ratio | composition | holder_mean::-4 group | holder_mean::-3 group | ... | holder_mean::-3 CN_VoronoiNN | holder_mean::-2 CN_VoronoiNN | holder_mean::-1 CN_VoronoiNN | holder_mean::0 CN_VoronoiNN | holder_mean::1 CN_VoronoiNN | holder_mean::2 CN_VoronoiNN | holder_mean::3 CN_VoronoiNN | holder_mean::4 CN_VoronoiNN | std_dev CN_VoronoiNN | geom_std_dev CN_VoronoiNN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | mp-10003 | 194.419802 | [[0.94814328 2.07280467 2.5112 ] Nb, [5.273... | 0.030688 | 97.141604 | 194.268884 | 0.285701 | (Nb, Co, Si) | 5.495497 | 5.623652 | ... | 12.350149 | 12.602521 | 12.857143 | 13.103707 | 13.333333 | 13.540064 | 13.721244 | 13.876971 | 2.357023 | 1.020127 |
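As a sanity check on the coordination-number columns, the Hölder means and standard deviation printed for mp-10003 can be reproduced by hand. The per-site coordination numbers are not shown in the notebook, so the list `[10, 15, 15]` below is an assumption: it is simply a set of values consistent with the printed statistics. The helper is a plain-Python sketch, not matminer code.

```python
import math


def holder_mean(values, power):
    """Unweighted Hölder mean; power 0 is taken as the geometric mean."""
    n = len(values)
    if power == 0:
        return math.exp(sum(math.log(x) for x in values) / n)
    return (sum(x ** power for x in values) / n) ** (1.0 / power)


# Assumed per-site coordination numbers (illustrative, not read from the data)
cn = [10, 15, 15]

# Same stat labels as in the featurizer call: holder_mean::-4 ... holder_mean::4
stats = {'holder_mean::%d' % p: holder_mean(cn, p) for p in range(-4, 5)}

# Population standard deviation about the arithmetic mean
mean = stats['holder_mean::1']
stats['std_dev'] = math.sqrt(sum((x - mean) ** 2 for x in cn) / len(cn))
```

Under this assumption the arithmetic mean is 40/3 ≈ 13.333 and the standard deviation is √(50/9) ≈ 2.357, in line with the `holder_mean::1 CN_VoronoiNN` and `std_dev CN_VoronoiNN` columns. The `geom_std_dev` column is not reproduced here because its exact definition is not given in the notebook.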


Save the data as a pickle file

In [16]:
data.to_pickle('./dejong_featurized_data.pkl')