***
$$\mathbf{\text{Simulation/Experimental Output Processing and ML-MF Correction}}$$<br>
$$\mathbf{\text{Author: Kenneth Kusima}}$$<br>
$\mathbf{\text{Date: 06/01}}$<br>

#### Note sw: Switching -> CO<->O2 ; CO*<->O* To match KMC specnum file


#### It also |uses the MKM input files that match this order *_sw

#### As well as the new test set KMC_NonDynamic_Data_iCovg_iRates_sw

#### Switch the pressures accordingly CO <-> O2

***
$\mathbf{\text{Model being explored: Simple 4-step CO Oxidation}}:$<br>

${\text{Corresponding Micro Kinetic Model}}:$<br>
***
$$\require{mhchem}$$       
---
Overall Reaction: 
$$ CO + \frac{1}{2} O_2 {\stackrel{\tiny{\textrm{Pt/Pd}}}{\rightleftharpoons}} CO_2 $$
---
Note Reations in the Reaction Mechanism may be reversible or irreversible

Reaction 1:&emsp;Adsorption of CO

$$ CO + * \rightleftharpoons CO^{*} $$

Reaction 2:&emsp;Adsorption of $O_2$

$$ O_2 + * \rightleftharpoons {O_2}^{*} $$

Reaction 3:&emsp;Dissociation of ${O_2}^*$ 

$$ {O_2}^* + * \rightleftharpoons 2{O}^* $$

Reaction 4:&emsp;Surface Reaction of $CO$ and $O_2$  

$$ {CO}^{*} + {O}^{*} \rightleftharpoons CO_2 + 2* $$

In [1]:
import glob, os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import shutil

***

<a id="top"></a><br>
 # Table of Contents  
1. [Loading in the labeled simulation folders](#1)   
    1. [Checking all simulations were completed as expected](#1.1)
1. [Developing ML Training Dataset for rate correction](#2) 
    1. [Generating Experimental Data Dictionary](#2.1) 
    1. [Generating MF-MKM Data Dictionary](#2.2)     
    1. [Creating Features](#2.3)      
        C1. [Log ratio](#2.3.1)     
        C2. [Percent Difference](#2.3.2)    
        
    1. [Creating Input/Feature Tensor](#2.4)
    1. [Extracting Full X (Feature) and Y(Target) datasets](#2.5)
    1. [Performing Train/Test X and Y Split datasets](#2.6)
1. [Modelling](#3)
1. [Describing Possible Machine Learning Model Algorithms](#4)
1. [Selecting and Training the Model](#5)
1. [Importing External/Experimental Data to be used in the model](#6)
    1. [Generating corresponding MF-MKModel](#6.1) 
    1. [Predicting Machine-Learned Mean-Field Corrections](#6.2)
    1. [ML Correction to MF-MKModel](#6.3)
    1. [Evaluating the ML model prediction](#6.4)
    1. [Plotting results](#6.5)
    
1. [Exploring and Evaluating possible ML options](#7)

***
<a id="1"></a> 
## 1. Loading in the labeled simulation folders
<a href="#top">Back to top</a>

In [2]:
os.getcwd()

'/Users/klkusima/Desktop/RASH_Research/CO_ox/Kinetics/OOP_Kinetics/MiKi/Input_data_KMC_Lateral/KMC_output_varied_coverage'

In [3]:
%%time
Sim_folder_names = []
i = 0
for file in glob.glob("Sim_*"):
    Sim_folder_names.append(file)
    i+=1
print('Number of simulations:',i)
print(Sim_folder_names)
#os.chdir(os.path.normpath(os.getcwd() + os.sep + os.pardir)) #Changes directory back to where this script is

Number of simulations: 55
['Sim_A_0_B_77', 'Sim_A_44_B_55', 'Sim_A_55_B_22', 'Sim_A_22_B_77', 'Sim_A_88_B_11', 'Sim_A_22_B_22', 'Sim_A_33_B_0', 'Sim_A_33_B_55', 'Sim_A_0_B_22', 'Sim_A_22_B_0', 'Sim_A_66_B_11', 'Sim_A_11_B_44', 'Sim_A_11_B_88', 'Sim_A_11_B_11', 'Sim_A_0_B_100', 'Sim_A_77_B_11', 'Sim_A_11_B_33', 'Sim_A_11_B_66', 'Sim_A_66_B_33', 'Sim_A_44_B_0', 'Sim_A_44_B_22', 'Sim_A_55_B_0', 'Sim_A_100_B_0', 'Sim_A_22_B_55', 'Sim_A_0_B_55', 'Sim_A_33_B_22', 'Sim_A_33_B_66', 'Sim_A_0_B_11', 'Sim_A_44_B_33', 'Sim_A_55_B_44', 'Sim_A_66_B_0', 'Sim_A_0_B_0', 'Sim_A_22_B_11', 'Sim_A_55_B_11', 'Sim_A_22_B_44', 'Sim_A_33_B_33', 'Sim_A_0_B_88', 'Sim_A_0_B_44', 'Sim_A_11_B_22', 'Sim_A_66_B_22', 'Sim_A_11_B_77', 'Sim_A_77_B_0', 'Sim_A_88_B_0', 'Sim_A_11_B_0', 'Sim_A_11_B_55', 'Sim_A_77_B_22', 'Sim_A_44_B_44', 'Sim_A_0_B_66', 'Sim_A_33_B_11', 'Sim_A_22_B_66', 'Sim_A_55_B_33', 'Sim_A_22_B_33', 'Sim_A_44_B_11', 'Sim_A_0_B_33', 'Sim_A_33_B_44']
CPU times: user 587 µs, sys: 580 µs, total: 1.17 ms
Wall

In [4]:
set_init_coverages = np.empty([len(Sim_folder_names),4])
#Remember: A='CO*'; B='O*'
#Reading A and B initial coverages from the KMC simulation input coverage files
c = 0 #counter
for s in Sim_folder_names:
    set_coverages = []
    for i in np.arange(len(s)):
        if i<(len(s)-2) and s[i].isdigit() and (s[i+1]).isdigit() and (s[i+2]).isdigit():
            cov_triple = int(s[i:i+3])
            set_coverages.append(cov_triple)
            
        elif i<(len(s)-1) and s[i].isdigit() and (s[i+1]).isdigit()and not((s[i-1]).isdigit()):
            cov_double = int(s[i:i+2])
            set_coverages.append(cov_double)
            
#             print(cov_double)
        elif s[i].isdigit() and not((s[i-1]).isdigit()) and not((s[i-2]).isdigit()):
            cov_single = int(s[i])
            set_coverages.append(cov_single)
                                #B_O*_covg,     A_CO*_covg,     O2*_covg,*_covg
    set_init_coverages[c,:] = [set_coverages[1],set_coverages[0],0,100-sum(set_coverages)]
    c+=1 #counter

In [5]:
os.getcwd()

'/Users/klkusima/Desktop/RASH_Research/CO_ox/Kinetics/OOP_Kinetics/MiKi/Input_data_KMC_Lateral/KMC_output_varied_coverage'

In [6]:
#Checking to see match
## Copying all the other input files into the different simulation folders
# Extracting initial coverages
#Remember: A='CO*'; B='O*'
n_points = 500 #From KMC simulation 
n_gas_species = 3 #From KMC simulation
n_surf_species = 4 #From KMC simulation

Exp_init_coverages = np.empty([len(Sim_folder_names),n_surf_species])
c = 0 #counter
for s in Sim_folder_names:
    os.chdir(s)
    file=open('specnum_output.txt','r').readlines() #Reading in the relevant file
    b=[]
    for i in np.arange(len(file)): 
        b.append(file[i].split())                   #Dividing the rows into columns
    o = pd.DataFrame(data=b)                        #Final output

#     print(o)
    #Extracting Number of Sites from the general_output file:
    inp=open('general_output.txt','r').readlines()
    for i in np.arange(len(inp)): 
        if 'Total number of lattice sites:' in inp[i]:
            val = i  #Line in text file where sentence is present

    sites = int(inp[val][35:])
    
    #Finding number of surface species
    headings = (o.iloc[0,:])
    n_ss = sum('*' in h for h in headings) #Number of surface species
    
    #Finding number of gas species
    n_gs = len(headings)-5-n_ss
    
    #Adding column to calculate number of empty sites
    n_c=(len(o.iloc[0,:])) #number of current columns
    o[n_c]=" "           #Creating new empty column 
    o.iloc[0,n_c]="*"    #Labelling the new empty column 

    st = 0 #Initializing empty site coverage vector


    for i in range(len(o.iloc[1:])):
        if n_ss==0:
            site = sites
        else:
            for j in range(n_ss):
                st = st + float(o.iloc[i+1,5+j]) #Calculating no. of empty sites #Asuming empty sites are first to be reportes (i.e @5)
            site = sites - st
            st = 0
        o.iloc[i+1,n_c] = site
    
    Sspecies = []
    for i in range(n_ss):
        Sspecies.append(5+i) 
    Sspecies.append(len(o.iloc[1,:])-1)#Including empty sites

    #Calculating itme:
    Gtime = o[2][1:].astype(float) 
    #Calculating coverages:
    Scoverages = np.empty([len(o.iloc[:,1])-1,len(Sspecies)])
    for i in range(len(Scoverages[1,:])):
        Scoverages[:,i] = o[Sspecies[i]][1:].astype(float)/sites
        
    exp_init_covg = []
    for i in np.arange(n_surf_species):    #B_O*_covg,     A_CO*_covg,     O2*_covg, *_covg
        exp_init_covg.append(Scoverages[0,i])
        
    Exp_init_coverages[c,:] = exp_init_covg
    
    c+=1
    
    
    os.chdir(os.path.normpath(os.getcwd() + os.sep + os.pardir)) #Changes directory back to where this script is

<a id="1.1"></a> 
## 1.1 Checking all simulations were completed as expected
<a href="#top">Back to top</a>

In [7]:
if np.shape(set_init_coverages) != np.shape(Exp_init_coverages):
    raise Exception('Not all simulations have been completed successfully')
    
for i in np.arange(np.shape(set_init_coverages)[0]):
    for j in np.arange(np.shape(set_init_coverages)[1]):
        norm_val = set_init_coverages[i,j]
        exp_val = round(Exp_init_coverages[i,j])
        if not( norm_val + 1 > exp_val) and not(norm_val - 1 < exp_val): #i.e if not within range
            raise Exception('Initial coverages used in the simulation are not the same as it was set')
            
        if (i==(np.shape(set_init_coverages)[0] - 1) and j==(np.shape(set_init_coverages)[1] - 1)):
            print('SIMULATIONS MATCH AS EXPECTED')

SIMULATIONS MATCH AS EXPECTED


***
<a id="2"></a> 
## 2. Developing ML Training Dataset for rate correction
<a href="#top">Back to top</a>

## Dictionary formats:

***
<a id="2.1"></a> 
## 2.A  Generating Experimental Data Dictionary
<a href="#top">Back to top</a>

In [8]:
#os.chdir(os.path.normpath(os.getcwd() + os.sep + os.pardir)) #Changes directory back to where this script is

In [9]:
%%time
## Copying all the other input files into the different simulation folders
# Extracting initial coverages
#Remember: A='CO*'; B='O*'
n = len(Sim_folder_names)

Covg = np.zeros((n,n_points,n_surf_species)) #CO*, O*, O2*, *
Rates = np.zeros((n,n_points,n_gas_species)) #CO, O2, CO2
KMC_time_Array = np.zeros((n,n_points))
init_coverages = np.empty([n,n_surf_species])
c = 0 #counter for number of simulation (folders)

for s in Sim_folder_names:
    os.chdir(s)
    file=open('specnum_output.txt','r').readlines() #Reading in the relevant file
    b=[]
    for i in np.arange(len(file)): 
        b.append(file[i].split())                   #Dividing the rows into columns
    o = pd.DataFrame(data=b)                        #Final output

#     print(o)
    #Extracting Number of Sites from the general_output file:
    inp=open('general_output.txt','r').readlines()
    for i in np.arange(len(inp)): 
        if 'Total number of lattice sites:' in inp[i]:
            val = i  #Line in text file where sentence is present

    sites = int(inp[val][34:])
    
    #Finding number of surface species
    headings = (o.iloc[0,:])
    n_ss = sum('*' in h for h in headings) #Number of surface species
    
    #Finding number of gas species
    n_gs = len(headings)-5-n_ss
    
    #Adding column to calculate number of empty sites
    n_c=(len(o.iloc[0,:])) #number of current columns
    o[n_c]=" "           #Creating new empty column 
    o.iloc[0,n_c]="*"    #Labelling the new empty column 

    st = 0 #Initializing empty site coverage vector


    for i in range(len(o.iloc[1:])):
        if n_ss==0:
            site = sites
        else:
            for j in range(n_ss):
                st = st + float(o.iloc[i+1,5+j]) #Calculating no. of empty sites #Asuming empty sites are first to be reportes (i.e @5)
            site = sites - st
            st = 0
        o.iloc[i+1,n_c] = site
    
    Sspecies = []
    for i in range(n_ss):
        Sspecies.append(5+i) 
    Sspecies.append(len(o.iloc[1,:])-1)#Including empty sites

    #Calculating itme:
    Gtime = o[2][1:].astype(float) 
    
    #Calculating coverages:
    Scoverages = np.empty([len(o.iloc[:,1])-1,len(Sspecies)])
    for i in range(len(Scoverages[1,:])):
        Scoverages[:,i] = o[Sspecies[i]][1:].astype(float)/sites
        
    Gspecies = []
    for i in range(n_gs):
        Gspecies.append(5+n_ss+i) 
        
    #Extracting the number of gas species molecules:    
    Gnmol = np.empty([len(o.iloc[:,1])-1,len(Gspecies)])
    for i in range(len(Gnmol[1,:])):
        Gnmol[:,i] = o[Gspecies[i]][1:].astype(float)
    
    ### Calculating the instantaneous rates of profuction (i.e grad/sites)
    TOF_GS = np.empty([len(o.iloc[:,1])-1,len(Gspecies)]) #initializing an array of instantaneous TOFs for gaseous species

    for i in np.arange(len(Gspecies)):
        grads = np.gradient(Gnmol[:,i],Gtime,edge_order=2)
        TOF_GS[:,i] = grads/sites
    
    
    #initializing TOF for gas species
    STOF = np.empty([n_points,n_gas_species])
    gs_names = (o.iloc[0,Gspecies].tolist())
    gs_names_colmn = []
    
    for i in np.arange(n_gas_species): #Collecting TOFs
        STOF[:,i] = pd.Series(TOF_GS[:,i])
        
    for i in gs_names: #Collecting gas names
        gs_names_colmn.append('R_'+i)
    
    Rates_p = pd.DataFrame(STOF,
                    columns = gs_names_colmn)

    init_covg = []
    for i in np.arange(n_surf_species):    #B_O*_covg,     A_CO*_covg,     O2*_covg, *_covg
        init_covg.append(Scoverages[0,i])
        
    init_coverages[c,:]= init_covg #Initial coverages
    
    KMC_time_Array[c,:]= Gtime #Time matrix
     
    Covg[c,:,:] = Scoverages #Coverage profile tensor
    
    Rates[c,:,:] = Rates_p
    
    c+=1
    
    os.chdir(os.path.normpath(os.getcwd() + os.sep + os.pardir)) #Changes directory back to where this script is

#https://stackoverflow.com/questions/49881570/python-dictionaries-appending-arrays-to-a-dictionary-for-a-specific-key
data_KMC_dict = {'init_covg': init_coverages, 'sim_time': KMC_time_Array, 'covg_prof': Covg, 'iRates': Rates}

CPU times: user 3.29 s, sys: 16.6 ms, total: 3.31 s
Wall time: 3.32 s


## Starting to Create ML dataset
### x:n_features ; y = n_points ; z = n_simulations

<a id="2.4"></a> 
## 2.D  Creating Input/Feature Tensor
<a href="#top">Back to top</a>

In [10]:
%%time
## Creating Simulation file names input

Sim_names_tens = np.empty((n,n_points,1),dtype=np.dtype('U100'))
for i in np.arange(n):
    for j in np.arange(n_points):
        Sim_names_tens[i,j,:] = Sim_folder_names[i]

## Creating Init coverages tensor input
ini_covg_tens = np.empty((n,n_points,n_surf_species),dtype=float)
for i in np.arange(n):
    for j in np.arange(n_points):
        ini_covg_tens[i,j,:] = data_KMC_dict['init_covg'][i,:]

## Creating time tensor input
sim_time_tens = np.empty((n,n_points,1),dtype=float)
for i in np.arange(n):
    for z in np.arange(1):
        sim_time_tens[i,:,z] = data_KMC_dict['sim_time'][i,:]

CPU times: user 42.2 ms, sys: 4.87 ms, total: 47.1 ms
Wall time: 49.3 ms


## Tenosr_To_Array Function

In [11]:
def Tensor_To_Array(Sim_tens):
    a = Sim_tens
    m,n,r = a.shape
    sim_arr = np.column_stack((np.repeat(np.arange(m),n),a.reshape(m*n,-1)))
    return sim_arr

## Creating Dataframe

In [12]:
out_df = pd.DataFrame(Tensor_To_Array(Sim_names_tens),columns= ['Sim_ndex','Sim_names'])

In [13]:
#Adding initial coverages
surf_names = (o.iloc[0,Sspecies].tolist())
for i in np.arange(n_surf_species):
    spec = surf_names[i]
    out_df['Init_Covg_'+spec] = pd.DataFrame(Tensor_To_Array(ini_covg_tens))[1+i]

In [14]:
#Adding time
out_df['Time'] = pd.DataFrame(Tensor_To_Array(sim_time_tens))[1]

In [15]:
## Adding KMC results

In [16]:
#Adding coverage profiles of surface species
surf_names = (o.iloc[0,Sspecies].tolist())
for i in np.arange(n_surf_species):
    spec = surf_names[i]
    out_df['KMC_Covg_'+spec] = pd.DataFrame(Tensor_To_Array(data_KMC_dict['covg_prof']))[1+i]

In [17]:
#Adding iRates profiles of gaseous species
gs_names = (o.iloc[0,Gspecies].tolist())
for i in np.arange(n_gas_species):
    spec = gs_names[i]
    out_df['KMC_iRates_'+spec] = pd.DataFrame(Tensor_To_Array(data_KMC_dict['iRates']))[1+i]

***
<a id="2.2"></a> 
## 2.B  Generating MF-MKModel Data Dictionary
<a href="#top">Back to top</a>

In [18]:
%%time
from main import *
MKM = MKModel('Atomic_sw.csv','Stoich_sw.csv','Param_sw.csv')
MKM.set_rxnconditions(Pr=[(1.0e-4*0.1),(1.0e-4*1.0e-5), 0]) #From KMC #Make sure it matches 02 CO CO2
MKM_init_coverages = np.empty([len(Sim_folder_names),n_surf_species])

def MKModelling(*fit_params):
    n_points = 500 #From KMC simulation 
    n_gas_species = 3 #From KMC simulation
    n_surf_species = 4 #From KMC simulation
    MKM_Covg = np.zeros((n,n_points,n_surf_species)) # O*,CO*, O2*, * #Make sure KMC order of species matches MKM inputs
    MKM_Rates = np.zeros((n,n_points,n_gas_species)) #O2, CO, CO2     #Make sure KMC order of species matches MKM inputs
    time_MKM_Array = np.zeros((n,n_points))

    MKM.set_limits_of_integration(Ti=data_KMC_dict['sim_time'][0][0],Tf=data_KMC_dict['sim_time'][-1][-1])
    
    MKM.k = np.array(fit_params)
    
    #Remember: A='CO*'; B='O*'
    #Reading A and B initial coverages from the KMC simulation input coverage file names!
    c = 0 #counter
    for s in Sim_folder_names:
        set_coverages = []
        for i in np.arange(len(s)):
            if i<(len(s)-2) and s[i].isdigit() and (s[i+1]).isdigit() and (s[i+2]).isdigit():
                cov_triple = int(s[i:i+3])
                set_coverages.append(cov_triple)

            elif i<(len(s)-1) and s[i].isdigit() and (s[i+1]).isdigit()and not((s[i-1]).isdigit()):
                cov_double = int(s[i:i+2])
                set_coverages.append(cov_double)

            elif s[i].isdigit() and not((s[i-1]).isdigit()) and not((s[i-2]).isdigit()):
                cov_single = int(s[i])
                set_coverages.append(cov_single)
                                    #B_O*_covg,     A_CO*_covg,     O2*_covg,*_covg  #Note: Special case: Simulation naming switches from KMC and MKM order
        init_covgs = [set_coverages[1]/100,set_coverages[0]/100,0,(100-sum(set_coverages))/100]
        
        MKM.set_initial_coverages(init=init_covgs)
        MKM_init_coverages[c,:] = [float(i) for i in init_covgs]
        
        sola,solta = MKM.solve_coverage(Tf_eval=data_KMC_dict['sim_time'][0],plot=False)
        time_MKM_Array[c,:]= solta #Time matrix
        MKM_Covg[c,:,:] = sola #Coverage profile tensor

        solb,soltb = MKM.solve_rate_production(Tf_eval=data_KMC_dict['sim_time'][0],plot=False)
        MKM_Rates[c,:,:] = solb[:,0:n_gas_species] 

        c+=1 #counter
    return {'init_covg': MKM_init_coverages, 'sim_time': time_MKM_Array, 'covg_prof': MKM_Covg, 'iRates': MKM_Rates}

Mass is conserved. 

CPU times: user 288 ms, sys: 69.3 ms, total: 357 ms
Wall time: 595 ms


In [19]:
%%time
### Fitting rate constant function due to the feature = iRates
def MKM_k_fitting(x,*fit_params,feature = 'iRates'):
    data_MKM_dict  = MKModelling(*fit_params)    
    return np.reshape(data_MKM_dict[feature],data_MKM_dict[feature].size)

CPU times: user 2 µs, sys: 1 µs, total: 3 µs
Wall time: 4.05 µs


***
<a id="2.3.1"></a> 
## 2.B.1 Optimizing Rate Constant:
<a href="#top">Back to top</a>

In [20]:
# %%time
# x_values = data_KMC_dict['sim_time'] #Normalized Input Time variables (Independent Variable) (eg. KMC Time)
# y_values = np.reshape(data_KMC_dict['iRates'],data_KMC_dict['iRates'].size) #Normalized Input Dependent variable(s) (eg. KMC coverages)

# initial_vals = np.array(MKM.k)

# params, params_covariance = optimize.curve_fit(MKM_k_fitting, x_values, y_values
#                                             ,method = 'trf', bounds=(0,1e10), maxfev=1e3, xtol=1e3, ftol=1e3
#                                             ,p0=initial_vals)

In [21]:
#From the KMC fitting attempt
##Perfect Fits
params = np.array([5.60983514e+07, 6.26873886e-10, 2.09163588e+04, 8.74022929e+09, 1.03443448e+10, 2.75903622e-01, 1.03031697e+08, 5.00000000e+02])

In [None]:
%%time
MKM.ODE_Tolerances(Dplace=50,reltol=1e-5,abstol=1e-8)
data_MKM_dict = MKModelling(*params)

***
<a id="2.3.2"></a> 
## Adding MKM to dataframe
<a href="#top">Back to top</a>

In [None]:
#Adding coverage profiles of surface species
surf_names = (o.iloc[0,Sspecies].tolist())
for i in np.arange(n_surf_species):
    spec = surf_names[i]
    out_df['MKM_Covg_'+spec] = pd.DataFrame(Tensor_To_Array(data_MKM_dict['covg_prof']))[1+i]

In [None]:
#Adding iRates profiles of gaseous species
gs_names = (o.iloc[0,Gspecies].tolist())
for i in np.arange(n_gas_species):
    spec = gs_names[i]
    out_df['MKM_iRates_'+spec] = pd.DataFrame(Tensor_To_Array(data_MKM_dict['iRates']))[1+i]

In [None]:
out_df

In [None]:
# ## RMSE Evaluation of Fit
# from math import sqrt
import numpy as np
# rmse_fit_covg = []
# rmse_fit_iRates = []

from sklearn.metrics import mean_squared_error

rmse_matrix = []
for i in np.arange(len(set(out_df['Sim_ndex']))):
    
    df = out_df.loc[out_df['Sim_ndex'] == str(i)]
    
    #calculating covg  ---------------------------------------------------------------------------------
    df = out_df.loc[out_df['Sim_ndex'] == str(i)] #Extracting dataframe only corresponding to simulation i
    kmc_dat_covg = df[[col for col in df if 'KMC_Covg' in col]] #Extracting KMC comp data 
    mkm_dat_covg = df[[col for col in df if 'MKM_Covg' in col]] #Extracting MKM comp data 
    
    ls = kmc_dat_covg.columns.to_list()
    covg_nm = [string[3:] for string in ls] #surface_species names
     
    rmse_covg = []
    for i in np.arange(len(covg_nm)):
        rmse_covg.append(sqrt(mean_squared_error(kmc_dat_covg['KMC'+covg_nm[i]], mkm_dat_covg['MKM'+covg_nm[i]])))
        
    
    #calculating irates ---------------------------------------------------------------------------------
    
    kmc_dat_irates = df[[col for col in df if 'KMC_iRates' in col]] #Extracting KMC comp data 
    mkm_dat_irates = df[[col for col in df if 'MKM_iRates' in col]] #Extracting MKM comp data 
    
    ls = kmc_dat_irates.columns.to_list()
    irates_nm = [string[3:] for string in ls] #gas_species names
    
    rmse_irates = []
    for i in np.arange(len(irates_nm)):
        rmse_irates.append(sqrt(mean_squared_error(kmc_dat_irates['KMC'+irates_nm[i]], mkm_dat_irates['MKM'+irates_nm[i]])))
        
    rmse_matrix.append(rmse_covg+rmse_irates)
    
    rmse_names = covg_nm+irates_nm
    
#Creating the RMSE Dataframe

RMSE_Dataframe = pd.DataFrame(list(set(out_df['Sim_names'])), columns = ['Sim_names'])

for i in np.arange(len(rmse_names)):
    spec = rmse_names[i]
    RMSE_Dataframe['RMSE'+spec] = pd.DataFrame(rmse_matrix).applymap(lambda x: round(x, 3))[i]

RMSE_Dataframe

# Visual Evaluation of Fitting Results:

In [None]:
%%time
#Part 1: Choose the feature of which you we will compare the k-opt values

# Comp = 'iRates'
Comp = 'Covg'

print('Comparison of KMC vs fitted-k MKM results for' + Comp)

#Part 2: Plot comparison results for fitting analysis
for i in np.arange(len(set(out_df['Sim_ndex']))): #For each simulation:
        #Extracting KMC results: ------------------------------------------------
        df = out_df.loc[out_df['Sim_ndex'] == str(i)] #Extracting dataframe only corresponding to simulation i
        kmc_dat = df[[col for col in df if 'KMC_'+Comp in col]].to_numpy() #Extracting KMC comp data as array
        Time = df['Time'].to_numpy()

        #Plotting KMC result : ------------------------------------------------          
        fig = plt.figure()
        ax = fig.add_subplot(111)
        
        for i in np.arange(len(kmc_dat[0,:])):
            ax.plot(Time, kmc_dat[:,i],'--')
                        
        if Comp =='iRates':
            leg_nd = (o.iloc[0,Gspecies].tolist())
            ax.set_ylim([-0.2,0.2])
        elif Comp == 'Covg':
            leg_nd = (o.iloc[0,Sspecies].tolist())
            
        ax.set_xlabel('Time, t, [s]')
        if Comp =='iRates':
            ax.set_ylabel(r"Rates of Production, $R_i$")
            ax.set_title('Rates of production versus Time_ for Simulation_'+ df['Sim_ndex'].iloc[i] +': _'+df['Sim_names'].iloc[0]+'| A:CO* ; B:O*')
        elif Comp == 'Covg':
            ax.set_ylabel(r"Coverage, $\theta_i, [ML]$")
            ax.set_title('Coverages versus Time_for Simulation_'+ df['Sim_ndex'].iloc[i] +': _'+df['Sim_names'].iloc[0]+'| A:CO* ; B:O*')
        
        
        #Extracting MKM results: ------------------------------------------------
        mkm_dat = df[[col for col in df if 'MKM_'+Comp in col]].to_numpy() #Extracting MKM comp data as array
        Time = df['Time'].to_numpy()
            
            
        #Adding to the plot, MKM result : ------------------------------------------------     
        for i in np.arange(len(mkm_dat[0,:])):
            ax.plot(Time, mkm_dat[:,i])
        
        #Plotting all the legends together
        ax.legend([f"{string}_KMC" for string in leg_nd]+[f"{string}_MKM" for string in leg_nd],fontsize=10, loc='upper right',facecolor='white', edgecolor ='black', framealpha=1)


# Representation in 3D

### Selecting only a few simulations to be represented in 3D

In [None]:
import random
p_test = 0.1 #Percent test data
print('Percent of test data selected:',p_test*100,'%')


max_sim_number = int(len(set(out_df['Sim_ndex']))) #MAx_number of simulations present #Count is starting from 0
n_test_sim = int(p_test*max_sim_number) #Number of simulations being used as test
sim_nums = list(set(out_df['Sim_ndex'])) # List of unique simulation numbers

test_sims = random.sample(sim_nums,n_test_sim) #Random sim_numbers for testing

print('\n','The list of simulations used in the test dataset:\n',test_sims)

# params = np.array([5.60983514e+07, 6.26873886e-10, 2.09163588e+04, 8.74022929e+09, 1.03443448e+10, 2.75903622e-01, 1.03031697e+08, 5.00000000e+02])
New_out_df = out_df.loc[out_df['Sim_ndex'].isin(test_sims)]
New_out_df

In [None]:
%matplotlib notebook
##Constructing a waterfall plot
from matplotlib.collections import PolyCollection
from matplotlib.collections import LineCollection
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib import colors as mcolors
import numpy as np
import matplotlib

# Comp = 'iRates'
Comp = 'Covg'

axes=plt.axes(projection="3d")
plt.figure(Comp+'Exp_only')
def colors(arg):
    return mcolors.to_rgba(arg, alpha=6)

x1 = New_out_df.loc[New_out_df['Sim_ndex'] == New_out_df.iloc[0][0]]['Time'].to_numpy()  #Sim_ndex = fist sim on the list and then find array of corresponding time
verts1 = []
verts2= []
Sim_len = len(set(New_out_df['Sim_ndex']))
sims = np.array(list(set(New_out_df['Sim_ndex'])))
z1 = np.arange(Sim_len)
for z in z1:
    df = New_out_df.loc[New_out_df['Sim_ndex'] == str(sims[z])] #Extracting only the dataframe corrsponding to simulation z
    y1 = df[[col for col in df if 'KMC_'+Comp+'_O*' in col]].to_numpy()
    y2 = df[[col for col in df if 'KMC_'+Comp+'_CO*' in col]].to_numpy()
#     y1[0], y1[-1] = 0, 0
    verts1.append(list(zip(x1, y1)))
    verts2.append(list(zip(x1, y2)))
    
facecolors = [matplotlib.cm.jet(x) for x in np.random.rand(Sim_len)]

poly1 = LineCollection(verts1,color = facecolors,linewidths=(1,),zorder=2,linestyle='-')
poly2 = LineCollection(verts2,color = facecolors,linewidths=(1,),zorder=2,linestyle='--')

# Removes shaded region
poly1.set_facecolor(None)
poly2.set_facecolor(None)


poly1.set_alpha(0.6)
poly2.set_alpha(0.6)
axes.add_collection3d(poly1, zs=z1, zdir='y')
axes.add_collection3d(poly2, zs=z1, zdir='y')

axes.set_xlabel('X : Time')
axes.set_xlim3d(0, x1[-1])
axes.set_ylabel('Y : Simulation')
axes.set_ylim3d(0, Sim_len,auto=False)
# axes.yticks(sims.astype(float))
# axes.set_yticks(sims.astype(float))
axes.set_yticklabels(np.array(list(set(New_out_df['Sim_names']))),fontdict={'fontsize': 6,'fontweight': 10,'verticalalignment': 'baseline'})
axes.set_zlabel(Comp)
# axes.set_zlim3d(0, 1)
# axes.set_zlim3d(-0.2, 0.2)
axes.set_title("KMC_Results")

line_1 = Line2D([0], [0], color='black', linewidth=0.6, linestyle='-') 
line_2 = Line2D([0], [0], color='black', linewidth=0.6, linestyle='--')

label_1 = 'KMC_'+'Covg_O*'
label_2 = 'KMC_'+'Covg_CO*'

lines = [line_1,line_2]
labels = [label_1,label_2]
axes.legend(lines, labels, title = "A: CO* | B: O*", loc='best',fontsize=6,title_fontsize=6)


plt.show()

## iRates

In [None]:
%matplotlib notebook
#iRates
##Constructing a waterfall plot
from matplotlib.collections import PolyCollection
from matplotlib.collections import LineCollection
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib import colors as mcolors
import numpy as np
import matplotlib

Comp = 'iRates'
g_sp = (o.iloc[0,Gspecies].tolist()) #Gas Species list

for i in np.arange(len(g_sp)):
    plt.figure(Comp+str(i))
    axes=plt.axes(projection="3d")

    def colors(arg):
        return mcolors.to_rgba(arg, alpha=6)

    x1 = New_out_df.loc[New_out_df['Sim_ndex'] == New_out_df.iloc[0][0]]['Time'].to_numpy()  #Sim_ndex = fist sim on the list and then find array of corresponding time
    verts1 = []
    verts2= []
    Sim_len = len(set(New_out_df['Sim_ndex']))
    sims = np.array(list(set(New_out_df['Sim_ndex'])))
    z1 = np.arange(Sim_len)
    for z in z1:
        df = New_out_df.loc[New_out_df['Sim_ndex'] == str(sims[z])] #Extracting only the dataframe corrsponding to simulation z
        KMC_vals = df[[col for col in df if 'KMC_'+Comp+'_'+g_sp[i] in col]].to_numpy()
        MKM_vals = df[[col for col in df if 'MKM_'+Comp+'_'+g_sp[i] in col]].to_numpy()

        if g_sp[i]=='CO':
            y1 = KMC_vals[:,0]
            y2 = MKM_vals[:,0]
        else:
            y1 = KMC_vals
            y2 = MKM_vals

        verts1.append(list(zip(x1, y1)))
        verts2.append(list(zip(x1, y2)))

    facecolors = [matplotlib.cm.jet(x) for x in np.random.rand(Sim_len)]

    poly1 = LineCollection(verts1,color = facecolors,linewidths=(1,),zorder=2,linestyle='-')
    poly2 = LineCollection(verts2,color = facecolors,linewidths=(1,),zorder=2,linestyle='--')

    # Removes shaded region
    poly1.set_facecolor(None)
    poly2.set_facecolor(None)


    poly1.set_alpha(0.6)
    poly2.set_alpha(0.6)
    axes.add_collection3d(poly1, zs=z1, zdir='y')
    axes.add_collection3d(poly2, zs=z1, zdir='y')

    axes.set_xlabel('X : Time')
    axes.set_xlim3d(0, x1[-1])
    axes.set_ylabel('Y : Simulation')
    axes.set_ylim3d(0, Sim_len,auto=False)
    axes.set_yticklabels(np.array(list(set(New_out_df['Sim_names']))),fontdict={'fontsize': 6,'fontweight': 10})
    axes.set_zlabel(Comp)
    axes.set_zlim3d(-0.2, 0.2)
    axes.set_title(Comp+"_Results")

    line_1 = Line2D([0], [0], color='black', linewidth=0.6, linestyle='-') 
    line_2 = Line2D([0], [0], color='black', linewidth=0.6, linestyle='--')

    label_1 = 'KMC_'+Comp+'_'+g_sp[i]
    label_2 = 'MKM_'+Comp+'_'+g_sp[i]

    lines = [line_1,line_2]
    labels = [label_1,label_2]
    axes.legend(lines, labels, title = "A: CO* | B: O*", loc='best',fontsize=6,title_fontsize=6)

    plt.show()


## Coverages

In [None]:
%matplotlib notebook
#Coverages
##Constructing a waterfall plot
from matplotlib.collections import PolyCollection
from matplotlib.collections import LineCollection
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib import colors as mcolors
import numpy as np
import matplotlib

Comp = 'Covg'
s_sp = (o.iloc[0,Sspecies].tolist()) #Gas Species list

for i in np.arange(len(g_sp)):
    plt.figure(Comp+str(i))
    axes=plt.axes(projection="3d")

    def colors(arg):
        return mcolors.to_rgba(arg, alpha=6)

    x1 = New_out_df.loc[New_out_df['Sim_ndex'] == New_out_df.iloc[0][0]]['Time'].to_numpy()  #Sim_ndex = fist sim on the list and then find array of corresponding time
    verts1 = []
    verts2= []
    Sim_len = len(set(New_out_df['Sim_ndex']))
    sims = np.array(list(set(New_out_df['Sim_ndex'])))
    z1 = np.arange(Sim_len)
    for z in z1:
        df = New_out_df.loc[New_out_df['Sim_ndex'] == str(sims[z])] #Extracting only the dataframe corrsponding to simulation z
        KMC_vals = df[[col for col in df if 'KMC_'+Comp+'_'+s_sp[i] in col]].to_numpy()
        MKM_vals = df[[col for col in df if 'MKM_'+Comp+'_'+s_sp[i] in col]].to_numpy()

        y1 = KMC_vals
        y2 = MKM_vals

        verts1.append(list(zip(x1, y1)))
        verts2.append(list(zip(x1, y2)))

    facecolors = [matplotlib.cm.jet(x) for x in np.random.rand(Sim_len)]

    poly1 = LineCollection(verts1,color = facecolors,linewidths=(1,),zorder=2,linestyle='-')
    poly2 = LineCollection(verts2,color = facecolors,linewidths=(1,),zorder=2,linestyle='--')

    # Removes shaded region
    poly1.set_facecolor(None)
    poly2.set_facecolor(None)


    poly1.set_alpha(0.6)
    poly2.set_alpha(0.6)
    axes.add_collection3d(poly1, zs=z1, zdir='y')
    axes.add_collection3d(poly2, zs=z1, zdir='y')

    axes.set_xlabel('X : Time')
    axes.set_xlim3d(0, x1[-1])
    axes.set_ylabel('Y : Simulation')
    axes.set_ylim3d(0, Sim_len,auto=False)
    axes.set_yticklabels(np.array(list(set(New_out_df['Sim_names']))),fontdict={'fontsize': 6,'fontweight': 10})
    axes.set_zlabel(Comp)
    axes.set_zlim3d(0, 1)
    axes.set_title(Comp+"_Results")

    line_1 = Line2D([0], [0], color='black', linewidth=0.6, linestyle='-') 
    line_2 = Line2D([0], [0], color='black', linewidth=0.6, linestyle='--')

    label_1 = 'KMC_'+Comp+'_'+s_sp[i]
    label_2 = 'MKM_'+Comp+'_'+s_sp[i]

    lines = [line_1,line_2]
    labels = [label_1,label_2]
    axes.legend(lines, labels, title = "A: CO* | B: O*", loc='best',fontsize=6,title_fontsize=6)

    plt.show()


## Dropping KMC columns to clean up ML dataset

In [None]:
out_df = out_df[out_df.columns.drop(list(out_df.filter(regex='KMC_')))]
out_df

***
<a id="2.3"></a> 
## 2.C  Creating Features:
<a href="#top">Back to top</a>

<a id="2.3.2"></a> 
## 2.C.1  Percent difference
<a href="#top">Back to top</a>

$$ p_{diff} = \frac{|r_{MKM}-r_{KMC}|}{\frac{(r_{MKM}+r_{KMC})}{2}} $$

In [None]:
rx,ry,rz = np.shape(data_KMC_dict['iRates'])
P_diff = np.zeros((rx,ry,rz)) 

In [None]:
import math
MKM_values = data_MKM_dict['iRates']
KMC_values = data_KMC_dict['iRates']

for i in np.arange(rx):
    for j in np.arange(ry):
        for k in np.arange(rz):
            mkmr = KMC_values[i,j,k]
            kmcr = MKM_values[i,j,k]      
            #Preventing nan
            if float(mkmr) == 0:
                mkmr = 1e-20
            if float(kmcr) == 0:
                kmcr = 1e-20
            val = abs(mkmr-kmcr)/((mkmr+kmcr)/2) 
            P_diff[i,j,k] = val
            if math.isinf(val) or math.isnan(val):
                raise Exception('ERROR: inf or nan is present')

<a id="2.3.1"></a> 
## 2.C.2  Log ratio correction factor
<a href="#top">Back to top</a>

$$ corr_{fac} = ln\frac{r_{KMC}}{r_{MKM}}$$

In [None]:
rx,ry,rz = np.shape(data_KMC_dict['iRates'])
Corr_fac = np.zeros((rx,ry,rz)) 

In [None]:
import math
MKM_values = data_MKM_dict['iRates']
KMC_values = data_KMC_dict['iRates']

for i in np.arange(rx):
    for j in np.arange(ry):
        for k in np.arange(rz):
            num = KMC_values[i,j,k]
            den = MKM_values[i,j,k]
            #Preventing log(0)
            if float(num) == 0:
                num = 1e-20
            if float(den) == 0:
                den = 1e-20
                
            frac = num/den
            if float(frac) < 0: #(i.e the rates are either being calculated as consumed versus produced)
                frac = abs(frac)
                
            val = np.log(frac)
            Corr_fac[i,j,k] = val
            if math.isinf(val) or math.isnan(val):
                raise Exception('ERROR: inf or nan is present')

# Adding Descriptors/Correction features

In [None]:
#Percent Diff
gs_names = (o.iloc[0,Gspecies].tolist())
for i in np.arange(n_gas_species):
    spec = gs_names[i]
    out_df['P_diff_'+spec] = pd.DataFrame(Tensor_To_Array(P_diff))[1+i]

#Log Correc
gs_names = (o.iloc[0,Gspecies].tolist())
for i in np.arange(n_gas_species):
    spec = gs_names[i]
    out_df['Corr_fac_'+spec] = pd.DataFrame(Tensor_To_Array(Corr_fac))[1+i]

In [38]:
out_df

Unnamed: 0,Sim_ndex,Sim_names,Init_Covg_O*,Init_Covg_CO*,Init_Covg_O2*,Init_Covg_*,Time,MKM_Covg_O*,MKM_Covg_CO*,MKM_Covg_O2*,MKM_Covg_*,MKM_iRates_O2,MKM_iRates_CO,MKM_iRates_CO2,P_diff_O2,P_diff_CO,P_diff_CO2,Corr_fac_O2,Corr_fac_CO,Corr_fac_CO2
0,0,Sim_A_0_B_77,0.777724,0.000000,0.0,0.222276,0.0,0.770000,0.000000e+00,0.000000e+00,0.230000,-0.048108,-0.012903,0.000000,-2.045625,-0.009120,2.000000,-4.484944,0.009120,40.036521
1,0,Sim_A_0_B_77,0.777724,0.000000,0.0,0.222276,0.1,0.746245,1.851449e-10,1.818909e-11,0.253755,0.105901,-0.014235,0.014235,2.020598,-0.064958,1.306832,-5.274003,0.064981,-1.562474
2,0,Sim_A_0_B_77,0.777724,0.000000,0.0,0.222276,0.2,0.724674,2.068636e-10,1.747295e-11,0.275326,0.095129,-0.015445,0.015445,2.022943,-0.147428,0.922216,-5.166740,0.147696,-0.997435
3,0,Sim_A_0_B_77,0.777724,0.000000,0.0,0.222276,0.3,0.705010,2.278200e-10,1.686238e-11,0.294990,0.085680,-0.016548,0.016548,2.000000,-0.078672,0.369000,-43.594565,0.078712,-0.373275
4,0,Sim_A_0_B_77,0.777724,0.000000,0.0,0.222276,0.4,0.687022,2.480413e-10,1.633778e-11,0.312978,0.077332,-0.017558,0.017558,2.000000,-0.059244,0.237191,-43.492059,-0.059261,-0.238312
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
27495,54,Sim_A_33_B_44,0.444444,0.333333,0.0,0.222222,49.5,0.435818,7.048435e-10,1.169093e-11,0.564182,-0.015825,-0.031650,0.031650,-0.447839,-0.360601,0.468218,0.455557,0.364586,0.477064
27496,54,Sim_A_33_B_44,0.444444,0.333333,0.0,0.222222,49.6,0.435818,7.048435e-10,1.169093e-11,0.564182,-0.015825,-0.031650,0.031650,-0.597142,-0.388898,0.300915,0.615900,0.393914,0.303217
27497,54,Sim_A_33_B_44,0.444444,0.333333,0.0,0.222222,49.7,0.435818,7.048435e-10,1.169093e-11,0.564182,-0.015825,-0.031650,0.031650,-0.621995,-0.360601,0.366339,0.643299,0.364586,0.370521
27498,54,Sim_A_33_B_44,0.444444,0.333333,0.0,0.222222,49.8,0.435818,7.048435e-10,1.169093e-11,0.564182,-0.015825,-0.031650,0.031650,-0.405412,-0.447840,0.447840,0.411106,0.455558,0.455558


***
<a id="2.5"></a> 
## 2.E  Extracting Full X (Feature) and Y(Target) datasets
<a href="#top">Back to top</a>

In [39]:
All_columns = out_df.columns.to_list()
target_columns = list(filter(lambda x: ('Corr') in x or ('P_diff') in x, All_columns))

input_columns = [colmn for colmn in All_columns if colmn not in target_columns]

In [40]:
X_all = out_df[input_columns] #Ignoring the first two columns(index and simulation name)

In [41]:
Y_all = out_df[['Sim_ndex','Sim_names']+target_columns]

In [42]:
X_all

Unnamed: 0,Sim_ndex,Sim_names,Init_Covg_O*,Init_Covg_CO*,Init_Covg_O2*,Init_Covg_*,Time,MKM_Covg_O*,MKM_Covg_CO*,MKM_Covg_O2*,MKM_Covg_*,MKM_iRates_O2,MKM_iRates_CO,MKM_iRates_CO2
0,0,Sim_A_0_B_77,0.777724,0.000000,0.0,0.222276,0.0,0.770000,0.000000e+00,0.000000e+00,0.230000,-0.048108,-0.012903,0.000000
1,0,Sim_A_0_B_77,0.777724,0.000000,0.0,0.222276,0.1,0.746245,1.851449e-10,1.818909e-11,0.253755,0.105901,-0.014235,0.014235
2,0,Sim_A_0_B_77,0.777724,0.000000,0.0,0.222276,0.2,0.724674,2.068636e-10,1.747295e-11,0.275326,0.095129,-0.015445,0.015445
3,0,Sim_A_0_B_77,0.777724,0.000000,0.0,0.222276,0.3,0.705010,2.278200e-10,1.686238e-11,0.294990,0.085680,-0.016548,0.016548
4,0,Sim_A_0_B_77,0.777724,0.000000,0.0,0.222276,0.4,0.687022,2.480413e-10,1.633778e-11,0.312978,0.077332,-0.017558,0.017558
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
27495,54,Sim_A_33_B_44,0.444444,0.333333,0.0,0.222222,49.5,0.435818,7.048435e-10,1.169093e-11,0.564182,-0.015825,-0.031650,0.031650
27496,54,Sim_A_33_B_44,0.444444,0.333333,0.0,0.222222,49.6,0.435818,7.048435e-10,1.169093e-11,0.564182,-0.015825,-0.031650,0.031650
27497,54,Sim_A_33_B_44,0.444444,0.333333,0.0,0.222222,49.7,0.435818,7.048435e-10,1.169093e-11,0.564182,-0.015825,-0.031650,0.031650
27498,54,Sim_A_33_B_44,0.444444,0.333333,0.0,0.222222,49.8,0.435818,7.048435e-10,1.169093e-11,0.564182,-0.015825,-0.031650,0.031650


***
<a id="2.6"></a> 
## 2.F  Performing Train/Test X and Y Split datasets
<a href="#top">Back to top</a>

In [43]:
import random
p_test = 0.1 #Percent test data
print('Percent of test data selected:',p_test*100,'%')


max_sim_number = int(X_all[input_columns[0]].iloc[-1]) #MAx_number of simulations present #Count is starting from 0
n_test_sim = int(p_test*max_sim_number) #Number of simulations being used as test
sim_nums = list(set(X_all['Sim_ndex'])) # List of unique simulation numbers

test_sims = random.sample(sim_nums,n_test_sim) #Random sim_numbers for testing


Percent of test data selected: 10.0 %


In [44]:
X_test = X_all.loc[X_all['Sim_ndex'].isin(test_sims)]
Y_test = Y_all.loc[Y_all['Sim_ndex'].isin(test_sims)]

X_train = X_all[~X_all['Sim_ndex'].isin(test_sims)]
Y_train = Y_all[~Y_all['Sim_ndex'].isin(test_sims)]

In [45]:
#Making sure no null values are present
P_diff = [string for string in All_columns if string.startswith("P_diff_")]#Extracting all different forms of P_diff
Y_all[['Sim_ndex'] +P_diff].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27500 entries, 0 to 27499
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Sim_ndex    27500 non-null  object 
 1   P_diff_O2   27500 non-null  float64
 2   P_diff_CO   27500 non-null  float64
 3   P_diff_CO2  27500 non-null  float64
dtypes: float64(3), object(1)
memory usage: 859.5+ KB


In [46]:
#REMOVING THE SIM_NDEX AND SIM_NAMES COLUMNS\
X_test = X_test.drop(columns=['Sim_ndex','Sim_names'])
Y_test = Y_test.drop(columns=['Sim_ndex','Sim_names'])

X_train = X_train.drop(columns=['Sim_ndex','Sim_names'])
Y_train = Y_train.drop(columns=['Sim_ndex','Sim_names'])

In [47]:
X_train

Unnamed: 0,Init_Covg_O*,Init_Covg_CO*,Init_Covg_O2*,Init_Covg_*,Time,MKM_Covg_O*,MKM_Covg_CO*,MKM_Covg_O2*,MKM_Covg_*,MKM_iRates_O2,MKM_iRates_CO,MKM_iRates_CO2
500,0.555501,0.444444,0.0,0.000054,0.0,0.550000,4.400000e-01,0.000000e+00,0.010000,-0.002092,-0.000561,2.493367e+07
501,0.555501,0.444444,0.0,0.000054,0.1,0.123558,3.861532e-09,1.053168e-11,0.876442,-0.091271,-0.049167,4.915865e-02
502,0.555501,0.444444,0.0,0.000054,0.2,0.136676,3.440478e-09,1.051058e-11,0.863324,-0.088711,-0.048431,4.844858e-02
503,0.555501,0.444444,0.0,0.000054,0.3,0.149359,3.103847e-09,1.049502e-11,0.850641,-0.086194,-0.047720,4.776408e-02
504,0.555501,0.444444,0.0,0.000054,0.4,0.161612,2.822581e-09,1.048461e-11,0.838388,-0.083722,-0.047032,4.699936e-02
...,...,...,...,...,...,...,...,...,...,...,...,...
27495,0.444444,0.333333,0.0,0.222222,49.5,0.435818,7.048435e-10,1.169093e-11,0.564182,-0.015825,-0.031650,3.164966e-02
27496,0.444444,0.333333,0.0,0.222222,49.6,0.435818,7.048435e-10,1.169093e-11,0.564182,-0.015825,-0.031650,3.164966e-02
27497,0.444444,0.333333,0.0,0.222222,49.7,0.435818,7.048435e-10,1.169093e-11,0.564182,-0.015825,-0.031650,3.164966e-02
27498,0.444444,0.333333,0.0,0.222222,49.8,0.435818,7.048435e-10,1.169093e-11,0.564182,-0.015825,-0.031650,3.164966e-02


In [48]:
Y_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 25000 entries, 500 to 27499
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   P_diff_O2     25000 non-null  float64
 1   P_diff_CO     25000 non-null  float64
 2   P_diff_CO2    25000 non-null  float64
 3   Corr_fac_O2   25000 non-null  float64
 4   Corr_fac_CO   25000 non-null  float64
 5   Corr_fac_CO2  25000 non-null  float64
dtypes: float64(6)
memory usage: 1.3 MB


In [49]:
X_train

Unnamed: 0,Init_Covg_O*,Init_Covg_CO*,Init_Covg_O2*,Init_Covg_*,Time,MKM_Covg_O*,MKM_Covg_CO*,MKM_Covg_O2*,MKM_Covg_*,MKM_iRates_O2,MKM_iRates_CO,MKM_iRates_CO2
500,0.555501,0.444444,0.0,0.000054,0.0,0.550000,4.400000e-01,0.000000e+00,0.010000,-0.002092,-0.000561,2.493367e+07
501,0.555501,0.444444,0.0,0.000054,0.1,0.123558,3.861532e-09,1.053168e-11,0.876442,-0.091271,-0.049167,4.915865e-02
502,0.555501,0.444444,0.0,0.000054,0.2,0.136676,3.440478e-09,1.051058e-11,0.863324,-0.088711,-0.048431,4.844858e-02
503,0.555501,0.444444,0.0,0.000054,0.3,0.149359,3.103847e-09,1.049502e-11,0.850641,-0.086194,-0.047720,4.776408e-02
504,0.555501,0.444444,0.0,0.000054,0.4,0.161612,2.822581e-09,1.048461e-11,0.838388,-0.083722,-0.047032,4.699936e-02
...,...,...,...,...,...,...,...,...,...,...,...,...
27495,0.444444,0.333333,0.0,0.222222,49.5,0.435818,7.048435e-10,1.169093e-11,0.564182,-0.015825,-0.031650,3.164966e-02
27496,0.444444,0.333333,0.0,0.222222,49.6,0.435818,7.048435e-10,1.169093e-11,0.564182,-0.015825,-0.031650,3.164966e-02
27497,0.444444,0.333333,0.0,0.222222,49.7,0.435818,7.048435e-10,1.169093e-11,0.564182,-0.015825,-0.031650,3.164966e-02
27498,0.444444,0.333333,0.0,0.222222,49.8,0.435818,7.048435e-10,1.169093e-11,0.564182,-0.015825,-0.031650,3.164966e-02


In [50]:
Y_train

Unnamed: 0,P_diff_O2,P_diff_CO,P_diff_CO2,Corr_fac_O2,Corr_fac_CO,Corr_fac_CO2
500,-41.402960,2.131444,2.000000,-0.096687,3.447801,-16.677200
501,-1.873367,-1.810774,1.819603,-3.420583,-3.002648,3.052743
502,-1.573799,-0.861889,1.739990,-2.126474,-0.922112,2.666119
503,-1.286354,-0.385967,1.624183,-1.527148,-0.390869,2.266282
504,-1.021129,-0.105528,1.502937,-1.126985,-0.105626,1.952641
...,...,...,...,...,...,...
27495,-0.447839,-0.360601,0.468218,0.455557,0.364586,0.477064
27496,-0.597142,-0.388898,0.300915,0.615900,0.393914,0.303217
27497,-0.621995,-0.360601,0.366339,0.643299,0.364586,0.370521
27498,-0.405412,-0.447840,0.447840,0.411106,0.455558,0.455558


***
<a id="3"></a> 
## 3. MODELLING
<a href="#top">Back to top</a>

***
<a id="4"></a> 
## 4. Describing Possible Machine Learning Model Algorithms
<a href="#top">Back to top</a>

In [51]:
def ML_model(algorithm="XGBoost", X_train = X_train, Y_train = Y_train):
    
    #XGBoost Algorithm
    #https://xgboost.readthedocs.io/en/stable/python/python_api.html
    if algorithm=="XGBoost":  
        import xgboost as xgb

        reg = xgb.XGBRegressor(booster='gbtree',    
                               n_estimators=1500,
                               objective='reg:squarederror',
                               max_depth=20,
                               learning_rate=0.01)
        reg.fit(X_train, Y_train,
                eval_set=[(X_train, Y_train)],
                verbose=False)
    
    #Artificial Neural Network
    #https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor
    elif algorithm=="ANN":
        from sklearn.neural_network import MLPRegressor
        No_H_nodes_per_layer = 128
        print('Number of Hidden layer nodes per layer : ',No_H_nodes_per_layer)
        No_H_layers = 4
        print('Number of Hidden layers: ',No_H_layers)

        hidden_layers = No_H_nodes_per_layer*np.ones(No_H_layers) 
        hidden_layer_sizes = tuple(tuple(int(item) for item in hidden_layers))
        reg = MLPRegressor(hidden_layer_sizes=hidden_layer_sizes,
                            activation='relu', 
                            solver='adam')
#                            ,random_state=42, 
#                             max_iter=300)

        reg.fit(X_train, Y_train)
    
    #K-Nearest Neighbor
    #https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html
    elif algorithm=="KNN":
        from sklearn.neighbors import KNeighborsRegressor

        reg = KNeighborsRegressor(n_neighbors=50, weights='distance',p=1)
        reg.fit(X_train, Y_train)
    
    #RandomForest 
    #https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html#sklearn.ensemble.RandomForestRegressor
    elif algorithm=='RandomForest':
        from sklearn.ensemble import RandomForestRegressor
        
        reg = RandomForestRegressor(n_estimators=1000, random_state=42)

        reg.fit(X_train, Y_train)
        
        print('Features:',X_train.columns)
        
        print('\nFeature Importance:\n',reg.feature_importances_) #Shows which features are chosen most when doing splits #gives the most information
        
    elif algorithm=='DecisionTree':
        from sklearn import tree
        reg = tree.DecisionTreeRegressor()#criterion='poisson',max_depth=20,min_samples_leaf=10,min_samples_split=20
        
        reg.fit(X_train, Y_train)
        
        print('Features:',X_train.columns)
        
        print('\nFeature Importance:\n',reg.feature_importances_) #Shows which features are chosen most when doing splits #gives the most information

    return reg

***
<a id="5"></a> 
## 5. Selecting the Training Model
<a href="#top">Back to top</a>

In [52]:
import time
######### OPTIONS: 'XGBoost','ANN','KNN','RandomForest'#########
################################################################
ALGORITHM_NAME = "KNN"
################################################################

start_time = time.time()
reg = ML_model(algorithm = ALGORITHM_NAME)
end_time = time.time()

elapsed_time = end_time - start_time
print("\nElapsed Model Training Time: \n", elapsed_time, "seconds \n", elapsed_time/60, "minutes")


Elapsed Model Training Time: 
 0.11791729927062988 seconds 
 0.0019652883211771647 minutes


***
<a id="6"></a> 
## 6. Importing External/Experimental Data to be used in the model
<a href="#top">Back to top</a>

In [53]:
KMC_Data_EXP = pd.read_csv('KMC_NonDynamic_Data_iCovg_iRates_sw.csv')
KMC_Data_EXP_rates= KMC_Data_EXP.iloc[:,-n_gas_species:] #To be used to later to compare and asses ML prediction results

#Creating repeated init covs matrix
test_data_ini_cov = KMC_Data_EXP.iloc[0,1:5].values
test_data_time_interv = KMC_Data_EXP.iloc[:,0].values
matrix_test_data_ini_cov = np.empty((len(test_data_time_interv),len(test_data_ini_cov)))
for i in np.arange(len(test_data_time_interv)):
    matrix_test_data_ini_cov[i] = test_data_ini_cov


***
<a id="6.1"></a> 
## 6.A Generating corresponding MF-MKModel
<a href="#top">Back to top</a>

In [54]:
%%time
from main import *
MKM_test_inp = MKModel('Atomic_sw.csv','Stoich_sw.csv','Param_sw.csv')
MKM_test_inp.set_rxnconditions(Pr=[(1.0e-4*0.1),(1.0e-4*1.0e-5), 0]) #From KMC #Make sure it matches 02 CO CO2

MKM_test_inp.set_limits_of_integration(Ti=float(KMC_Data_EXP['Time'].head(1)),Tf=float(KMC_Data_EXP['Time'].tail(1)))
MKM_test_inp.ODE_Tolerances(Dplace=50,reltol=1e-5,abstol=1e-8)
MKM_test_inp.k = params #From fitting or external

MKM_Covg_test_inp = np.zeros((len(test_data_time_interv),len(test_data_ini_cov))) #CO*, O*, O2*, *
MKM_Rates_test_inp = np.zeros((len(test_data_time_interv),n_gas_species)) #CO, O2, CO2

MKM_test_inp.set_initial_coverages(init=test_data_ini_cov)

sola,solta = MKM_test_inp.solve_coverage(Tf_eval=test_data_time_interv,plot=False)
MKM_Covg_test_inp = sola #Coverage profile matrix

solb,soltb = MKM_test_inp.solve_rate_production(Tf_eval=test_data_time_interv,plot=False)
MKM_Rates_test_inp = (solb[:,0:n_gas_species])


Mass is conserved. 

CPU times: user 2.24 s, sys: 17.9 ms, total: 2.26 s
Wall time: 2.3 s


In [55]:
Exp_surf_name = KMC_Data_EXP.columns.to_list()[1:n_surf_species+1]
Exp_gas_name = [i[-2:] for i in KMC_Data_EXP.columns.to_list()[n_surf_species+1:]]

In [56]:
Test_input = pd.DataFrame()

#Adding initial coverages
surf_names = Exp_surf_name
for i in np.arange(n_surf_species):
    spec = surf_names[i]
    Test_input['Init_Covg_'+spec] = pd.DataFrame(matrix_test_data_ini_cov)[i]

#Adding Time
Test_input['Time'] = pd.DataFrame(test_data_time_interv)

#Adding coverage profiles of surface species
surf_names = Exp_surf_name
for i in np.arange(n_surf_species):
    spec = surf_names[i]
    Test_input['MKM_Covg_'+spec] = pd.DataFrame(MKM_Covg_test_inp)[i]
    
#Adding iRates profiles of gaseous species
gs_names = (o.iloc[0,Gspecies].tolist())
for i in np.arange(n_gas_species):
    spec = gs_names[i]
    Test_input['MKM_iRates_'+spec] = pd.DataFrame(MKM_Rates_test_inp)[i]

In [57]:
Test_input.head(10)

Unnamed: 0,Init_Covg_O*,Init_Covg_CO*,Init_Covg_O2*,Init_Covg_*,Time,MKM_Covg_O*,MKM_Covg_CO*,MKM_Covg_O2*,MKM_Covg_*,MKM_iRates_O2,MKM_iRates_CO,MKM_iRates_CO2
0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,-0.2091635880000000350381798273641951412943046...,-0.0560983514000000020037784839650069314220299...,0.0
1,0.0,0.0,0.0,1.0,0.1,0.016831,3.118391e-08,1.08787e-11,0.983169,-0.1105608879035077669140510764485022376942958...,-0.0551541802364910902060996739044973018268594...,0.05407558585355573543829166283103404566645622...
2,0.0,0.0,0.0,1.0,0.2,0.033194,1.527356e-08,1.080637e-11,0.966806,-0.1077704048766015905026716454692446495029480...,-0.0542362131754244155196599747082489894476218...,0.05223633608044318116236937044050137046724557...
3,0.0,0.0,0.0,1.0,0.3,0.049093,1.044358e-08,1.074246e-11,0.950907,-0.1050036082800393984321297276494346046233218...,-0.0533443397936447275448489136970192398076322...,0.05282457109249135557282883723928534891456365...
4,0.0,0.0,0.0,1.0,0.4,0.064528,8.027593e-09,1.068658e-11,0.935472,-0.1022635143091418929115884023744269111362680...,-0.0524784301386458902001992039629267963325894...,0.05337097658260952365294471633205830585211515...
5,0.0,0.0,0.0,1.0,0.5,0.079504,6.627157e-09,1.063829e-11,0.920496,-0.0995531111637587943702888332081898377070076...,-0.0516383058544899316993550843938612532351268...,0.05428592869743011778416175161510182078927755...
6,0.0,0.0,0.0,1.0,0.6,0.094024,5.273235e-09,1.059722e-11,0.905976,-0.0968751486380757785825954495253431717869156...,-0.0508237761034714637963803270416886798004792...,0.05108405840746021303599988527821551542729139...
7,0.0,0.0,0.0,1.0,0.7,0.108091,4.464733e-09,1.056294e-11,0.891909,-0.0942323164784485213666187647583183796670167...,-0.0500346073955799529323760976772867374374157...,0.04972296942771153166473752094134397339075803...
8,0.0,0.0,0.0,1.0,0.8,0.121712,3.784331e-09,1.053505e-11,0.878288,-0.0916271280798244901862986916056576372792586...,-0.0492705220380946228031640385108055018985665...,0.04745614991549722272434763681303593330085277...
9,0.0,0.0,0.0,1.0,0.9,0.13489,3.277554e-09,1.051316e-11,0.86511,-0.0890620423447665135465714721701577462955773...,-0.0485312348234408841745794332600288763676596...,0.04555132870236358599846937522670486941933631...


***
<a id="6.2"></a> 
## 6.B Predicting Machine-Learned Mean-Field Corrections
<a href="#top">Back to top</a>

In [58]:
Test_input

Unnamed: 0,Init_Covg_O*,Init_Covg_CO*,Init_Covg_O2*,Init_Covg_*,Time,MKM_Covg_O*,MKM_Covg_CO*,MKM_Covg_O2*,MKM_Covg_*,MKM_iRates_O2,MKM_iRates_CO,MKM_iRates_CO2
0,0.0,0.0,0.0,1.0,0.0,0.000000,0.000000e+00,0.000000e+00,1.000000,-0.2091635880000000350381798273641951412943046...,-0.0560983514000000020037784839650069314220299...,0.0
1,0.0,0.0,0.0,1.0,0.1,0.016831,3.118391e-08,1.087870e-11,0.983169,-0.1105608879035077669140510764485022376942958...,-0.0551541802364910902060996739044973018268594...,0.05407558585355573543829166283103404566645622...
2,0.0,0.0,0.0,1.0,0.2,0.033194,1.527356e-08,1.080637e-11,0.966806,-0.1077704048766015905026716454692446495029480...,-0.0542362131754244155196599747082489894476218...,0.05223633608044318116236937044050137046724557...
3,0.0,0.0,0.0,1.0,0.3,0.049093,1.044358e-08,1.074246e-11,0.950907,-0.1050036082800393984321297276494346046233218...,-0.0533443397936447275448489136970192398076322...,0.05282457109249135557282883723928534891456365...
4,0.0,0.0,0.0,1.0,0.4,0.064528,8.027593e-09,1.068658e-11,0.935472,-0.1022635143091418929115884023744269111362680...,-0.0524784301386458902001992039629267963325894...,0.05337097658260952365294471633205830585211515...
...,...,...,...,...,...,...,...,...,...,...,...,...
246,0.0,0.0,0.0,1.0,24.6,0.435817,7.770800e-10,1.169092e-11,0.564183,-0.0158252145983088236484042742647730427285029...,-0.0316497303982379459783917147499768195338097...,0.03489320578564896607742795708873018156737089...
247,0.0,0.0,0.0,1.0,24.7,0.435817,7.793632e-10,1.169092e-11,0.564183,-0.0158251951490796674741534802497150687833049...,-0.0316497267619731873167933558824781907274617...,0.03499573101781112277608443150711536873131990...
248,0.0,0.0,0.0,1.0,24.8,0.435817,7.779264e-10,1.169092e-11,0.564183,-0.0158251768677596946723092751660776142192642...,-0.0316497233378776715176604552705218784504832...,0.03493121831077659888986630676299682818353176...
249,0.0,0.0,0.0,1.0,24.9,0.435817,7.718553e-10,1.169092e-11,0.564183,-0.0158251594340028680939185749238373153938306...,-0.0316497200625545014272015294676000669253734...,0.03465861629447254677049983229153440333902835...


In [59]:
Test_output = reg.predict(Test_input)

Pred_corr = Test_output[:,-len(Exp_gas_name):] #extracting correction factors  #O2 #CO #CO2

<a id="6.3"></a> 
## 6.C ML Correction to MF-MKModel
<a href="#top">Back to top</a>

In [60]:
#Calculating extracted ML predicted rates
ML_Rates_pred = np.zeros((len(test_data_time_interv),len(Exp_gas_name)))  #O2, #CO, CO2
for i in np.arange(np.shape(ML_Rates_pred)[0]):
    for j in np.arange(np.shape(ML_Rates_pred)[1]):
        ML_Rates_pred[i,j] = MKM_Rates_test_inp[i,j]*np.exp(Pred_corr[i,j])

<a id="6.4"></a> 
## 6.D Evaluating the ML model prediction
<a href="#top">Back to top</a>

In [61]:
from math import sqrt
from sklearn.metrics import mean_squared_error
#Calculating the root mean squared of the test set
print('Root Mean Squared Error:\n',sqrt(mean_squared_error(KMC_Data_EXP_rates, ML_Rates_pred)))

Root Mean Squared Error:
 0.01546011067533951


<a id="6.5"></a> 
## 6.E Plotting
<a href="#top">Back to top</a>

In [62]:
%%time
#%matplotlib notebook
plt.figure(figsize = (8, 6))
plt.plot(test_data_time_interv, KMC_Data_EXP_rates.values[:,0],'r*', label='O2_kMC')        
plt.plot(test_data_time_interv, KMC_Data_EXP_rates.values[:,1],'g*', label='CO_kMC') 
plt.plot(test_data_time_interv, KMC_Data_EXP_rates.values[:,2], 'b*', label='CO2_kMC') 

plt.plot(test_data_time_interv, MKM_Rates_test_inp[:,0],'ro', label='O2_MKM')        
plt.plot(test_data_time_interv, MKM_Rates_test_inp[:,1],'go', label='CO_MKM') 
plt.plot(test_data_time_interv, MKM_Rates_test_inp[:,2], 'bo', label='CO2_MKM') 

plt.plot(test_data_time_interv, ML_Rates_pred[:,0],'r-', label='O2_ML')        
plt.plot(test_data_time_interv, ML_Rates_pred[:,1],'g-', label='CO_ML') 
plt.plot(test_data_time_interv, ML_Rates_pred[:,2], 'b-', label='CO2_ML') 

plt.xlabel('Time, s')
plt.ylabel("Rates_production, $r$")
plt.title('ML_rate_correction_Results')
# plt.ylim([-0.2,0.2])
plt.legend(fontsize=5, loc='best')
plt.show()

<IPython.core.display.Javascript object>

CPU times: user 23.5 ms, sys: 4.47 ms, total: 28 ms
Wall time: 25.9 ms


***
<a id="7"></a> 
## 7. Exploring and Evaluating possible ML options
<a href="#top">Back to top</a>
***

In [62]:
%%time
#%matplotlib notebook
import time
from math import sqrt
from sklearn.metrics import mean_squared_error

algs = ['ANN','KNN','RandomForest','DecisionTree']
rmse_vec = []
for v in np.arange(len(algs)):
    
    print("-"*50)
    print('Algorithm:',algs[v],'\n')
    
    tart_time = time.time()  
    reg = ML_model(algorithm=algs[v])
    end_time = time.time()

    elapsed_time = end_time - start_time
    print("\nElapsed Model Training Time: \n", elapsed_time, "seconds \n", elapsed_time/60, "minutes")
    
    Test_output = reg.predict(Test_input)

    Pred_corr = Test_output[:,-3:] #extracting correction factors #CO #O2 #CO2
    
    #Calculating extracted ML predicted rates
    ML_Rates_pred = np.zeros((len(test_data_time_interv),3)) #CO, O2, CO2
    for i in np.arange(np.shape(ML_Rates_pred)[0]):
        for j in np.arange(np.shape(ML_Rates_pred)[1]):
            ML_Rates_pred[i,j] = MKM_Rates_test_inp[i,j]*np.exp(Pred_corr[i,j])
    
    #Calculating the root mean squared of the test set
    rmse = sqrt(mean_squared_error(KMC_Data_EXP_rates, ML_Rates_pred))
    print('\nRoot Mean Squared Error when using',algs[v],':\n',rmse)
    rmse_vec.append(rmse)
    
    plt.figure()
    plt.plot(test_data_time_interv, KMC_Data_EXP_rates.values[:,0],'r*', label='O2_kMC')        
    plt.plot(test_data_time_interv, KMC_Data_EXP_rates.values[:,1],'g*', label='CO_kMC') 
    plt.plot(test_data_time_interv, KMC_Data_EXP_rates.values[:,2], 'b*', label='CO2_kMC') 

    plt.plot(test_data_time_interv, MKM_Rates_test_inp[:,0],'ro', label='O2_MKM')        
    plt.plot(test_data_time_interv, MKM_Rates_test_inp[:,1],'go', label='CO_MKM') 
    plt.plot(test_data_time_interv, MKM_Rates_test_inp[:,2], 'bo', label='CO2_MKM') 

    plt.plot(test_data_time_interv, ML_Rates_pred[:,0],'r-', label='O2_ML')        
    plt.plot(test_data_time_interv, ML_Rates_pred[:,1],'g-', label='CO_ML') 
    plt.plot(test_data_time_interv, ML_Rates_pred[:,2], 'b-', label='CO2_ML') 

    plt.xlabel('Time, s')
    plt.ylabel("Rates_production, $r$")
    plt.title('Algorithm: {}'.format(algs[v]))
    plt.legend(fontsize=5, loc='best')

    plt.show()

--------------------------------------------------
Algorithm: ANN 

Number of Hidden layer nodes per layer :  128
Number of Hidden layers:  4

Elapsed Model Training Time: 
 50.01718878746033 seconds 
 0.8336198131243387 minutes

Root Mean Squared Error when using ANN :
 13.899150601301798


<IPython.core.display.Javascript object>

--------------------------------------------------
Algorithm: KNN 


Elapsed Model Training Time: 
 50.0748028755188 seconds 
 0.8345800479253133 minutes

Root Mean Squared Error when using KNN :
 0.015462016840793243


<IPython.core.display.Javascript object>

--------------------------------------------------
Algorithm: RandomForest 

Features: Index(['Init_Covg_O*', 'Init_Covg_CO*', 'Init_Covg_O2*', 'Init_Covg_*', 'Time',
       'MKM_Covg_O*', 'MKM_Covg_CO*', 'MKM_Covg_O2*', 'MKM_Covg_*',
       'MKM_iRates_O2', 'MKM_iRates_CO', 'MKM_iRates_CO2'],
      dtype='object')

Feature Importance:
 [0.19787579 0.02685934 0.         0.00061753 0.09021048 0.0112738
 0.13878435 0.         0.19513041 0.08675235 0.11340689 0.13908905]

Elapsed Model Training Time: 
 247.19891095161438 seconds 
 4.119981849193573 minutes

Root Mean Squared Error when using RandomForest :
 0.015210472597655676


<IPython.core.display.Javascript object>

--------------------------------------------------
Algorithm: DecisionTree 

Features: Index(['Init_Covg_O*', 'Init_Covg_CO*', 'Init_Covg_O2*', 'Init_Covg_*', 'Time',
       'MKM_Covg_O*', 'MKM_Covg_CO*', 'MKM_Covg_O2*', 'MKM_Covg_*',
       'MKM_iRates_O2', 'MKM_iRates_CO', 'MKM_iRates_CO2'],
      dtype='object')

Feature Importance:
 [7.55010642e-02 2.65658798e-05 0.00000000e+00 7.34263571e-05
 8.30911155e-03 1.77008750e-03 2.89890644e-03 0.00000000e+00
 7.09506358e-02 7.86871924e-01 3.80949460e-03 4.97887837e-02]

Elapsed Model Training Time: 
 247.68900394439697 seconds 
 4.128150065739949 minutes

Root Mean Squared Error when using DecisionTree :
 0.015485941992775618


<IPython.core.display.Javascript object>

CPU times: user 6min 43s, sys: 39.6 s, total: 7min 22s
Wall time: 4min 5s


In [63]:
print("The ML model giving the smallest rmse of", rmse_vec[np.argmin(rmse_vec)],"is : \n",algs[np.argmin(rmse_vec)]) 

The ML model giving the smallest rmse of 0.015210472597655676 is : 
 RandomForest
