# Training a MODNet model on the refractive index

This notebook goes through the main functions and objects implemented in this library, using a dataset of ~4,000 entries of the form (mp_id, structure, refractive index) taken from the Materials Project (MP). The workflow can be divided into two parts. First, a MODData object is created, which stores the information about this particular dataset: the materials, the targets, and the optimal features. Second, a MODNetModel is trained, which can later be used to predict on unseen data.

In [4]:
import sys
sys.path.append('..')
from modnet.models import MODNetModel
from modnet.preprocessing import MODData

## 1. Loading the dataset

In this example, the dataset is a DataFrame saved as a pickle, but any format works as long as you can retrieve the structures and targets (and, optionally, the mpids for fast featurization).
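As a rough illustration of the "any format" point, the sketch below pulls the three ingredients (mpids, structures, targets) out of a JSON export instead of a pickle. The JSON layout and its field names are assumptions made for this example, and the structure entries are placeholders: real entries would have to be converted into the structure objects MODData expects (presumably pymatgen `Structure` objects, e.g. via `Structure.from_dict`).

```python
import json

# Hypothetical JSON export, one record per material. The field names
# ("mp_id", "structure", "ref_index") are assumptions for illustration.
# The two IDs and target values are the first two rows shown in this notebook.
raw = """
[
  {"mp_id": "mp-624234", "structure": {"placeholder": true}, "ref_index": 2.440483},
  {"mp_id": "mp-560478", "structure": {"placeholder": true}, "ref_index": 1.790685}
]
"""
records = json.loads(raw)

# Three parallel sequences are all that is needed downstream.
mpids = [r["mp_id"] for r in records]
structures = [r["structure"] for r in records]  # placeholders, not real structures
targets = [r["ref_index"] for r in records]

# Every material needs exactly one structure and one target.
assert len(mpids) == len(structures) == len(targets)
print('{} datapoints'.format(len(records)))
```

These three sequences play the same roles as `df.index`, `df['structure']`, and `df['ref_index'].values` in the cells of this notebook.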

In [5]:
import pandas as pd
df = pd.read_pickle('data/df_ref_index.pkl')
print('{} datapoints'.format(len(df)))
df.head()

4022 datapoints


Unnamed: 0,structure,ref_index
mp-624234,"[[0.67808954 1.32800354 5.90141888] Te, [1.500...",2.440483
mp-560478,"[[-0.62755181 6.55361247 9.268476 ] Ba, [4....",1.790685
mp-556346,"[[4.43332093 4.12714801 8.8721209 ] Pr, [ 1.40...",2.056131
mp-13676,"[[-0.14481557 3.41229366 4.12618551] O, [3.2...",2.023772
mp-7610,"[[ 0.12549448 3.01287591 -0.20434955] Li, [1....",1.745509


## 2. Creating a MODData instance

### (a) Structure, mpid, and target creation

In [6]:
# Structures, target values, MP IDs (used for fast featurization), and the target name
md = MODData(df['structure'], df['ref_index'].values, mpids=df.index, names=['refractive_index'])

### (b) Featurizing the data
The MODData has an integrated database containing the features of many materials from the MP. When fast featurization is enabled, the features are retrieved directly from this database instead of being computed from the structure.

In [7]:
md.featurize(fast=True,db_file='../modnet/data/feature_database.pkl')

Computing features, this can take time...
Fast featurization on, retrieving from database...
Retrieved features for 4022 out of 4022 materials
Data has successfully been featurized!


In [8]:
md.get_featurized_df().head()

Unnamed: 0,ElementProperty|MagpieData minimum Number,ElementProperty|MagpieData maximum Number,ElementProperty|MagpieData range Number,ElementProperty|MagpieData mean Number,ElementProperty|MagpieData avg_dev Number,ElementProperty|MagpieData mode Number,ElementProperty|MagpieData minimum MendeleevNumber,ElementProperty|MagpieData maximum MendeleevNumber,ElementProperty|MagpieData range MendeleevNumber,ElementProperty|MagpieData mean MendeleevNumber,...,OPSiteFingerprint|std_dev square pyramidal CN_5,OPSiteFingerprint|std_dev trigonal bipyramidal CN_5,OPSiteFingerprint|std_dev q2 CN_11,OPSiteFingerprint|std_dev q4 CN_11,OPSiteFingerprint|std_dev q6 CN_11,OPSiteFingerprint|std_dev L-shaped CN_2,OPSiteFingerprint|std_dev water-like CN_2,OPSiteFingerprint|std_dev bent 120 degrees CN_2,OPSiteFingerprint|std_dev hexagonal pyramidal CN_7,OPSiteFingerprint|std_dev pentagonal bipyramidal CN_7
mp-624234,8.0,82.0,74.0,32.0,30.0,8.0,81.0,90.0,9.0,85.875,...,0.186438,0.175091,0.021637,0.0472,0.072313,0.2280295,0.355493,0.217585,0.134621,0.163703
mp-560478,8.0,56.0,48.0,16.0,10.75,8.0,9.0,87.0,78.0,71.0625,...,0.098554,0.1012,0.029021,0.021497,0.036379,0.064974,0.051046,0.253411,0.061584,0.155998
mp-556346,8.0,59.0,51.0,22.307692,19.810651,8.0,17.0,96.0,79.0,83.692308,...,0.197575,0.19499,0.048936,0.049705,0.071292,0.1099133,0.268237,0.282694,0.12368,0.167256
mp-13676,8.0,81.0,73.0,21.333333,19.888889,8.0,76.0,87.0,11.0,84.5,...,0.032056,0.032056,0.046716,0.024166,0.059264,1.084202e-19,0.024395,0.199876,0.057122,0.193736
mp-7610,3.0,20.0,17.0,9.0,4.0,8.0,1.0,87.0,86.0,54.375,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### (c) Computing the optimal features

This runs the feature selection algorithm. First the mutual information is computed, followed by the iterative selection based on relevance and redundancy.

This step takes time, but is normally run only once before being saved.

In [9]:
md.feature_selection(n=1100)

Starting target 1/1: refractive_index ...
Computing mutual information ...
Computing optimal features...
Selected 50/1100 features...
Selected 100/1100 features...
Selected 150/1100 features...
Selected 200/1100 features...
Selected 250/1100 features...
Selected 300/1100 features...
Selected 350/1100 features...
Selected 400/1100 features...
Selected 450/1100 features...
Selected 500/1100 features...
Selected 550/1100 features...
Selected 600/1100 features...
Selected 650/1100 features...
Selected 700/1100 features...
Selected 750/1100 features...
Selected 800/1100 features...
Selected 850/1100 features...
Selected 900/1100 features...
Selected 950/1100 features...
Selected 1000/1100 features...
Selected 1050/1100 features...
Done with target 1/1: refractive_index.
Merging all features...
Done.


In [10]:
md.get_optimal_descriptors()[:10]

['ElementProperty|MagpieData maximum GSbandgap',
 'ElementFraction|Th',
 'CrystalNNFingerprint|std_dev hexagonal bipyramidal CN_8',
 'DensityFeatures|density',
 'ElementProperty|MagpieData avg_dev Number',
 'LocalPropertyDifference|mean local difference in Electronegativity',
 'BondOrientationParameter|mean BOOP Q l=2',
 'ElementProperty|MagpieData range NdValence',
 'DensityFeatures|packing fraction',
 'OPSiteFingerprint|mean sgl_bd CN_1']
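
The relevance/redundancy idea behind step (c) can be illustrated with a generic greedy selection. This is a minimal sketch, not MODNet's implementation: the scoring rule (relevance minus mean redundancy with the features already chosen) is a common textbook choice and MODNet's actual scoring function may differ. The feature names and mutual-information values are toy numbers invented for the example.

```python
def greedy_select(relevance, redundancy, n):
    """Greedily pick n features, trading relevance against redundancy.

    relevance[f]     : mutual information between feature f and the target
    redundancy[f][g] : mutual information between features f and g
    Scoring rule (an assumption of this sketch): relevance minus the mean
    redundancy with the features selected so far.
    """
    remaining = set(relevance)
    selected = []
    while remaining and len(selected) < n:
        def score(f):
            if not selected:
                return relevance[f]
            mean_red = sum(redundancy[f][g] for g in selected) / len(selected)
            return relevance[f] - mean_red
        # sorted() makes the choice deterministic if two scores tie
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy mutual-information values (hypothetical, for illustration only).
relevance = {"density": 0.9, "packing_fraction": 0.85, "max_bandgap": 0.5}
redundancy = {
    "density": {"packing_fraction": 0.8, "max_bandgap": 0.1},
    "packing_fraction": {"density": 0.8, "max_bandgap": 0.1},
    "max_bandgap": {"density": 0.1, "packing_fraction": 0.1},
}

top2 = greedy_select(relevance, redundancy, n=2)
print(top2)  # ['density', 'max_bandgap']
```

In this toy case `packing_fraction` is more relevant than `max_bandgap` on its own, but it is skipped in the second round because it is highly redundant with the already selected `density`.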

### (d) Saving the MODData

In [11]:
md.save('out/md_ref_index')

Data successfully saved!


## 3. MODNet model

### (a) Creating the MODNet

In [16]:
model = MODNetModel([[['refractive_index']]],{'refractive_index':1},n_feat=1000, num_neurons=[[128],[64],[32],[]],loss='mae', act='elu')
model.model.summary()

Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         (None, 1000)              0         
_________________________________________________________________
dense_4 (Dense)              (None, 128)               128128    
_________________________________________________________________
dense_5 (Dense)              (None, 64)                8256      
_________________________________________________________________
dense_6 (Dense)              (None, 32)                2080      
_________________________________________________________________
refractive_index (Dense)     (None, 1)                 33        
Total params: 138,497
Trainable params: 138,497
Non-trainable params: 0
_________________________________________________________________
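
The parameter counts in the summary above can be checked by hand: a fully connected layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. The short sketch below redoes that arithmetic for the layer widths shown in the summary (1000 → 128 → 64 → 32 → 1); the helper name `dense_params` is introduced here for illustration only.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


# Input width followed by the width of each dense layer in the summary.
layer_sizes = [1000, 128, 64, 32, 1]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [128128, 8256, 2080, 33]
print(sum(counts))  # 138497
```

The first layer dominates the total, so the number of input features (`n_feat`) is the main lever on model size for this architecture.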


### (b) Training the model

In [17]:
#md.shuffle()
model.fit(md,val_fraction=0.1, val_key='refractive_index', lr=0.001, epochs = 200, batch_size = 64, xscale='minmax',yscale=None)

new
(4022, 1000)
(1, 4022)
compile
fit
epoch 0: loss: 0.357, val_loss:0.072 val_refractive_index:0.186
epoch 1: loss: 0.080, val_loss:0.049 val_refractive_index:0.145
epoch 2: loss: 0.055, val_loss:0.042 val_refractive_index:0.139
epoch 3: loss: 0.044, val_loss:0.034 val_refractive_index:0.121
epoch 4: loss: 0.044, val_loss:0.043 val_refractive_index:0.144
epoch 5: loss: 0.035, val_loss:0.045 val_refractive_index:0.144
epoch 6: loss: 0.031, val_loss:0.028 val_refractive_index:0.108
epoch 7: loss: 0.029, val_loss:0.034 val_refractive_index:0.122
epoch 8: loss: 0.030, val_loss:0.029 val_refractive_index:0.116
epoch 9: loss: 0.027, val_loss:0.024 val_refractive_index:0.096
epoch 10: loss: 0.024, val_loss:0.029 val_refractive_index:0.115
epoch 11: loss: 0.023, val_loss:0.027 val_refractive_index:0.104
epoch 12: loss: 0.021, val_loss:0.024 val_refractive_index:0.099
epoch 13: loss: 0.018, val_loss:0.032 val_refractive_index:0.121
epoch 14: loss: 0.022, val_loss:0.022 val_refractive_index:0.

In [18]:
model.fit(md,val_fraction=0.1, val_key='refractive_index', lr=0.0005, epochs = 100, batch_size = 128, xscale='minmax',yscale=None)

new
(4022, 1000)
(1, 4022)
compile
fit
epoch 0: loss: 0.002, val_loss:0.010 val_refractive_index:0.056
epoch 1: loss: 0.001, val_loss:0.010 val_refractive_index:0.055
epoch 2: loss: 0.001, val_loss:0.010 val_refractive_index:0.056
epoch 3: loss: 0.001, val_loss:0.010 val_refractive_index:0.056
epoch 4: loss: 0.001, val_loss:0.010 val_refractive_index:0.054
epoch 5: loss: 0.001, val_loss:0.011 val_refractive_index:0.057
epoch 6: loss: 0.001, val_loss:0.010 val_refractive_index:0.053
epoch 7: loss: 0.001, val_loss:0.010 val_refractive_index:0.055
epoch 8: loss: 0.001, val_loss:0.010 val_refractive_index:0.054
epoch 9: loss: 0.001, val_loss:0.009 val_refractive_index:0.053
epoch 10: loss: 0.001, val_loss:0.010 val_refractive_index:0.053
epoch 11: loss: 0.000, val_loss:0.009 val_refractive_index:0.053
epoch 12: loss: 0.000, val_loss:0.011 val_refractive_index:0.057
epoch 13: loss: 0.001, val_loss:0.010 val_refractive_index:0.056
epoch 14: loss: 0.001, val_loss:0.011 val_refractive_index:0.
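
Both calls to `fit` above pass `xscale='minmax'`. Assuming this refers to the conventional min-max scaling, each feature column is mapped linearly onto [0, 1] using that column's minimum and maximum. The sketch below shows that transformation in plain Python; the helper names are hypothetical, how MODNet applies the scaling internally is not shown in this notebook, and the input rows are the first two feature columns of three materials from the featurized table above.

```python
def minmax_fit(rows):
    """Return the per-column minima and maxima of a list of feature rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def minmax_transform(rows, lo, hi):
    """Scale every column to [0, 1]; constant columns are mapped to 0.0."""
    return [
        [(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(row, lo, hi)]
        for row in rows
    ]


# Minimum and maximum atomic number for mp-624234, mp-560478, and mp-7610.
X = [
    [8.0, 82.0],
    [8.0, 56.0],
    [3.0, 20.0],
]

lo, hi = minmax_fit(X)
X_scaled = minmax_transform(X, lo, hi)
for row in X_scaled:
    print(row)
```

Scaling of this kind keeps features with large numeric ranges (such as atomic numbers) from dominating features that are already fractions.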

## 4. Saving the model

In [19]:
model.save('out/MODNet_refractive_index')

Saved model


## 5. Predicting on unseen data

See the "predicting_ref_index" notebook.