# MODNet on the experimental, HSE and PBE datasets

In this model, we will see if training first the MODNet model on the PBE, HSE and experimental datasets and then, fine tuning the model on the experimental dataset will improve the results.

In [1]:
from modnet.preprocessing import MODData
from modnet.models import MODNetModel
import numpy as np
import os
import copy
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

In [2]:
from sklearn.model_selection import KFold
from modnet.preprocessing import MODData

def shuffle_MD(data,random_state=10):
    data = copy.deepcopy(data)
    ids = data.df_targets.sample(frac=1,random_state=random_state).index
    data.df_featurized = data.df_featurized.loc[ids]
    data.df_targets = data.df_targets.loc[ids]
    data.df_structure = data.df_structure.loc[ids]
    
    return data

def MDKsplit(data,n_splits=5,random_state=10):
    data = shuffle_MD(data,random_state=random_state)
    ids = np.array(data.structure_ids)
    kf = KFold(n_splits=n_splits,shuffle=True,random_state=random_state)
    folds = []
    for train_idx, val_idx in kf.split(ids):
        data_train = MODData(data.df_structure.iloc[train_idx]['structure'].values,data.df_targets.iloc[train_idx].values,target_names=data.df_targets.columns,structure_ids=ids[train_idx])
        data_train.df_featurized = data.df_featurized.iloc[train_idx]
        #data_train.optimal_features = data.optimal_features
        
        data_val = MODData(data.df_structure.iloc[val_idx]['structure'].values,data.df_targets.iloc[val_idx].values,target_names=data.df_targets.columns,structure_ids=ids[val_idx])
        data_val.df_featurized = data.df_featurized.iloc[val_idx]
        #data_val.optimal_features = data.optimal_features

        folds.append((data_train,data_val))
        
    return folds

def MD_append(md,lmd):
    md = copy.deepcopy(md)
    for m in lmd:
        md.df_structure.append(m.df_structure)
        md.df_targets.append(m.df_targets)
        md.df_featurized.append(m.df_featurized)
    return md

In [3]:
md_exp = MODData.load('exp_gap_all')
md_exp.df_targets.columns = ['gap']
md_pbe = MODData.load('pbe_gap.zip')
md_pbe.df_targets.columns = ['gap']
md_hse = MODData.load('hse_gap.zip')
md_hse.df_targets.columns = ['gap']


If you use the ChemEnv tool for your research, please consider citing the following reference(s) :
David Waroquiers, Xavier Gonze, Gian-Marco Rignanese, Cathrin Welker-Nieuwoudt, Frank Rosowski,
Michael Goebel, Stephan Schenk, Peter Degelmann, Rute Andre, Robert Glaum, and Geoffroy Hautier,
"Statistical analysis of coordination environments in oxides",
Chem. Mater., 2017, 29 (19), pp 8346-8360,
DOI: 10.1021/acs.chemmater.7b02766



INFO:root:Loaded <modnet.preprocessing.MODData object at 0x7f631495c610> object, created with modnet version <=0.1.7
INFO:root:Loaded <modnet.preprocessing.MODData object at 0x7f6314962970> object, created with modnet version <=0.1.7
INFO:root:Loaded <modnet.preprocessing.MODData object at 0x7f628d2fa490> object, created with modnet version 0.1.8~develop


In [10]:
k = 5
random_state = 202010
folds = MDKsplit(md_exp,n_splits=k,random_state=random_state)
maes_ph1 = np.ones(5)
maes_ph2 = np.ones(5)
maes = np.ones(5)
for i,f in enumerate(folds):
    train = f[0]
    test = f[1]
    fpath = 'train_{}_{}'.format(random_state,i+1)
    if os.path.exists(fpath):
        train = MODData.load(fpath)
        train.df_targets.columns=['gap']
    else:
        train.feature_selection(n=-1)
        train.save(fpath)
        
    # assure no overlap
    assert len(set(train.df_targets.index).intersection(set(test.df_targets.index))) == 0
    
    #phase 1
    md = MD_append(train,[md_pbe,md_hse])
    
    model = MODNetModel([[['gap']]],{'gap':1})
    model.fit_preset(md,verbose=0)
    
    pred = model.predict(test)
    true = test.df_targets
    error = pred-true
    error = error.drop(pred.index[((pred['gap']).abs()>20)]) # drop unrealistic values: happens extremely rarely
    mae = np.abs(error.values).mean()
    print('mae_ph1')
    print(mae)
    maes_ph1[i] = mae
    
        # phase2
    rlr = ReduceLROnPlateau(monitor="loss", factor=0.5, patience=20, verbose=1, mode="auto", min_delta=0)
    es = EarlyStopping(monitor="loss", min_delta=0.001, patience=300, verbose=1, mode="auto", baseline=None,restore_best_weights=True)
    model.fit(train,lr=0.01, epochs = 250, batch_size = 32, loss='mae', callbacks=[rlr,es], verbose=0)
    model.fit(train,lr=0.01, epochs = 250, batch_size = 64, loss='mae', callbacks=[rlr,es], verbose=0)
    model.fit(train,lr=0.01, epochs = 250, batch_size = 128, loss='mae', callbacks=[rlr,es], verbose=0)
    model.fit(train,lr=0.01, epochs = 250, batch_size = 256, loss='mae', callbacks=[rlr,es], verbose=0)
    
    pred = model.predict(test)
    true = test.df_targets
    error = pred-true
    error = error.drop(pred.index[((pred['gap']).abs()>20)]) # drop unrealistic values: happens extremely rarely
    mae = np.abs(error.values).mean()
    print('mae')
    print(mae)
    maes[i] = mae 
    

INFO:root:Loaded DeBreuck2020Featurizer featurizer.
INFO:root:Loaded DeBreuck2020Featurizer featurizer.
INFO:root:Loaded DeBreuck2020Featurizer featurizer.
INFO:root:Loaded DeBreuck2020Featurizer featurizer.
INFO:root:Loaded DeBreuck2020Featurizer featurizer.
INFO:root:Loaded DeBreuck2020Featurizer featurizer.
INFO:root:Loaded DeBreuck2020Featurizer featurizer.
INFO:root:Loaded DeBreuck2020Featurizer featurizer.
INFO:root:Loaded DeBreuck2020Featurizer featurizer.
INFO:root:Loaded DeBreuck2020Featurizer featurizer.
INFO:root:Loaded <modnet.preprocessing.MODData object at 0x7f61eaae0190> object, created with modnet version 0.1.8~develop
INFO:root:Training preset #1/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.359
INFO:root:Training preset #2/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.369
INFO:root:Training preset #3/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 

mae_ph1
0.4132849590166959

Epoch 00036: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00117: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00151: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00215: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.


INFO:root:Compiling model...
INFO:root:Fitting model...



Epoch 00071: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00123: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00183: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00228: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.


INFO:root:Compiling model...
INFO:root:Fitting model...



Epoch 00035: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00075: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00147: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00194: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.


INFO:root:Compiling model...
INFO:root:Fitting model...



Epoch 00246: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.

Epoch 00037: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00093: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00139: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00170: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.

Epoch 00223: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.
mae
0.4229509172268944


INFO:root:Loaded <modnet.preprocessing.MODData object at 0x7f61cc8fb1f0> object, created with modnet version 0.1.8~develop
INFO:root:Training preset #1/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.360
INFO:root:Training preset #2/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.383
INFO:root:Training preset #3/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.359
INFO:root:Training preset #4/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.371
INFO:root:Training preset #5/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.390
INFO:root:Training preset #6/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.346
INFO:root:Training preset #7/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.374
INFO:root:Training preset #8/1

mae_ph1
0.35829314908498844

Epoch 00027: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00119: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00179: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00224: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.


INFO:root:Compiling model...
INFO:root:Fitting model...



Epoch 00051: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00087: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00117: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00162: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.

Epoch 00204: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.


INFO:root:Compiling model...
INFO:root:Fitting model...



Epoch 00028: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00057: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00090: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00117: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.

Epoch 00143: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.

Epoch 00180: ReduceLROnPlateau reducing learning rate to 0.00015624999650754035.

Epoch 00227: ReduceLROnPlateau reducing learning rate to 7.812499825377017e-05.


INFO:root:Compiling model...
INFO:root:Fitting model...



Epoch 00042: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00083: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00118: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00148: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.

Epoch 00176: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.

Epoch 00228: ReduceLROnPlateau reducing learning rate to 0.00015624999650754035.
mae
0.33912509566118193


INFO:root:Loaded <modnet.preprocessing.MODData object at 0x7f61ccbdae50> object, created with modnet version 0.1.8~develop
INFO:root:Training preset #1/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.424
INFO:root:Training preset #2/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.414
INFO:root:Training preset #3/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.358
INFO:root:Training preset #4/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.395
INFO:root:Training preset #5/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.398
INFO:root:Training preset #6/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.412
INFO:root:Training preset #7/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.344
INFO:root:Training preset #8/1

mae_ph1
0.4690442903929911

Epoch 00024: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00071: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00113: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00149: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.

Epoch 00198: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.


INFO:root:Compiling model...
INFO:root:Fitting model...



Epoch 00023: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00076: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00136: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00243: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.


INFO:root:Compiling model...
INFO:root:Fitting model...



Epoch 00059: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00097: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00126: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00159: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.

Epoch 00187: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.


INFO:root:Compiling model...
INFO:root:Fitting model...



Epoch 00246: ReduceLROnPlateau reducing learning rate to 0.00015624999650754035.

Epoch 00038: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00102: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00132: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00167: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.

Epoch 00196: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.

Epoch 00245: ReduceLROnPlateau reducing learning rate to 0.00015624999650754035.
mae
0.45525874754995055


INFO:root:Loaded <modnet.preprocessing.MODData object at 0x7f61f58178b0> object, created with modnet version 0.1.8~develop
INFO:root:Training preset #1/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.395
INFO:root:Training preset #2/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.429
INFO:root:Training preset #3/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.438
INFO:root:Training preset #4/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.385
INFO:root:Training preset #5/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.387
INFO:root:Training preset #6/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.380
INFO:root:Training preset #7/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.377
INFO:root:Training preset #8/1

mae_ph1
0.4107633841241524

Epoch 00021: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00072: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00110: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00227: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.


INFO:root:Compiling model...
INFO:root:Fitting model...



Epoch 00077: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00123: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00153: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00196: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.


INFO:root:Compiling model...
INFO:root:Fitting model...



Epoch 00248: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.

Epoch 00025: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00060: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00140: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00189: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.


INFO:root:Compiling model...
INFO:root:Fitting model...



Epoch 00027: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00081: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00118: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00145: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.

Epoch 00174: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.

Epoch 00209: ReduceLROnPlateau reducing learning rate to 0.00015624999650754035.

Epoch 00237: ReduceLROnPlateau reducing learning rate to 7.812499825377017e-05.
mae
0.3987802996246108


INFO:root:Loaded <modnet.preprocessing.MODData object at 0x7f61bc46cd00> object, created with modnet version 0.1.8~develop
INFO:root:Training preset #1/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.417
INFO:root:Training preset #2/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.506
INFO:root:Training preset #3/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.396
INFO:root:Training preset #4/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.411
INFO:root:Training preset #5/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.438
INFO:root:Training preset #6/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.459
INFO:root:Training preset #7/16
INFO:root:Compiling model...
INFO:root:Fitting model...
INFO:root:Validation loss: 0.402
INFO:root:Training preset #8/1

mae_ph1
0.3930739010602724

Epoch 00046: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00066: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00086: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00106: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.

Epoch 00126: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.

Epoch 00146: ReduceLROnPlateau reducing learning rate to 0.00015624999650754035.

Epoch 00166: ReduceLROnPlateau reducing learning rate to 7.812499825377017e-05.

Epoch 00186: ReduceLROnPlateau reducing learning rate to 3.9062499126885086e-05.

Epoch 00206: ReduceLROnPlateau reducing learning rate to 1.9531249563442543e-05.

Epoch 00226: ReduceLROnPlateau reducing learning rate to 9.765624781721272e-06.

Epoch 00246: ReduceLROnPlateau reducing learning rate to 4.882812390860636e-06.


INFO:root:Compiling model...
INFO:root:Fitting model...



Epoch 00056: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00079: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00108: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00134: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.

Epoch 00157: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.

Epoch 00181: ReduceLROnPlateau reducing learning rate to 0.00015624999650754035.

Epoch 00231: ReduceLROnPlateau reducing learning rate to 7.812499825377017e-05.


INFO:root:Compiling model...
INFO:root:Fitting model...



Epoch 00069: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00103: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00123: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00144: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.

Epoch 00164: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.

Epoch 00184: ReduceLROnPlateau reducing learning rate to 0.00015624999650754035.

Epoch 00204: ReduceLROnPlateau reducing learning rate to 7.812499825377017e-05.

Epoch 00224: ReduceLROnPlateau reducing learning rate to 3.9062499126885086e-05.


INFO:root:Compiling model...



Epoch 00244: ReduceLROnPlateau reducing learning rate to 1.9531249563442543e-05.


INFO:root:Fitting model...



Epoch 00031: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.

Epoch 00069: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.

Epoch 00106: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.

Epoch 00126: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.

Epoch 00146: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.

Epoch 00166: ReduceLROnPlateau reducing learning rate to 0.00015624999650754035.

Epoch 00186: ReduceLROnPlateau reducing learning rate to 7.812499825377017e-05.

Epoch 00206: ReduceLROnPlateau reducing learning rate to 3.9062499126885086e-05.

Epoch 00226: ReduceLROnPlateau reducing learning rate to 1.9531249563442543e-05.

Epoch 00246: ReduceLROnPlateau reducing learning rate to 9.765624781721272e-06.
mae
1.1986566424816412


In [11]:
maes_ph1.mean()

0.40889193673582014

In [12]:
maes.mean()

0.5629543405088558

#### Conclusion

No improvement.