Paper Topic: High-throughput screening of bimetallic catalysts enabled by machine learning
Author: Zheng Li, Siwen Wang, Wei Shan Chin, Luke E. Achenie and Hongliang Xin *
Abstract:
    We present a holistic machine-learning framework for rapid screening of bimetallic catalysts with the aid of the descriptor-based kinetic analysis. A catalyst database, which contains the adsorption energies of *CO and *OH on {111}-terminated model alloy surfaces and fingerprint features of active sites from density functional theory calculations with the semi-local generalized gradient approximation (GGA), is established and used in optimizing the structural and weight parameters of artificial neural networks. The fingerprint descriptors, rooted at the d-band chemisorption theory and its recent developments, include the sp-band and d-band characteristics of an adsorption site together with tabulated properties of host-metal atoms. Using methanol electro-oxidation as the model reaction, the machine-learning model trained with the existing dataset of ~1000 idealized alloy surfaces can capture complex, non-linear adsorbate/metal interactions with the RMSE ~ 0.2 eV and shows predictive power in exploring the immense chemical space of bimetallic catalysts. Feature importance analysis sheds light on the underlying factors that govern the adsorbate/metal interactions and provides the physical origin of bimetallics in breaking energy-scaling constraints of *CO and *OH, the two most commonly used reactivity descriptors in heterogeneous catalysis.
DOI: 10.1039/c7TA01812F
Publisher: Journal of Materials Chemistry A

In [None]:
'''
A demo of predicting the adsorption (binding) energy of CO on bimetallic surfaces.

input features:
    1. f: Filling of a d-band
    2. εd: Center of a d-band
    3. Wd: Width of a d-band
    4. γ1: Skewness of a d-band
    5. γ2: Kurtosis of a d-band
    6. W: Work function
    7. r0: Radius of a metal atom
    8. rd: Spatial extent of d-orbitals
    9. IE: Ionization potential
    10. EA: Electron affinity
    11. χ0: Pauling electronegativity
    12. χ: Local Pauling electronegativity
    13. Vd: Adsorbate-metal interatomic d coupling matrix element squared
    
output features:
    1. ΔE*CO: Binding energy of adsorbed CO on a metal surface
'''

Data

In [None]:
import pandas as pd

data = pd.read_excel('/Users/precious/work/catalyst_demo/data/Input_features_and_CO_adsorption_energies_on_bimetallic_surfaces.xlsx')
data.head()

In [None]:
# Columns 1-13 hold the input features; column 14 holds the CO adsorption energy.
input_features = data.iloc[:, 1:14].apply(pd.to_numeric)
output_features = pd.to_numeric(data.iloc[:, 14])
output_features.head()

Linear Regression

In [None]:
from sklearn.linear_model import LinearRegression

linear_regressor = LinearRegression()
linear_regressor.fit(input_features, output_features)
linear_regressor.coef_

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

corr = np.corrcoef(output_features, linear_regressor.predict(input_features))[0,1]

plt.scatter(output_features, linear_regressor.predict(input_features))
plt.plot((-2, 0), (-2, 0), color='red')
bbox = dict(boxstyle="round", fc='1',alpha=0.5)
plt.text(-2, -0.03, '$R^2=%.2f$' % (corr**2), size=12, bbox=bbox)
plt.xlabel('true_value',fontsize=13)
plt.ylabel('pred_value',fontsize=13)
plt.title("LinearRegression",fontsize=14)
plt.show()
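
The plot above scores the model on the same rows it was fitted to, so the R² it reports is optimistic. A minimal sketch of a held-out check follows. It uses synthetic data in place of the spreadsheet; the array shapes, the 80/20 split, and the random seed are assumptions for illustration, not settings from this notebook. RMSE is reported alongside R² because the paper quotes its accuracy as an RMSE in eV.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 13 fingerprint features and the CO adsorption energy.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 13))
true_coef = rng.normal(size=13)
y_demo = X_demo @ true_coef + rng.normal(scale=0.1, size=200)

# Hold out 20% of the rows; the model never sees them during fitting.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0)

demo_model = LinearRegression().fit(X_tr, y_tr)
y_hat = demo_model.predict(X_te)

# RMSE is in the units of the target; R^2 is unitless.
rmse_test = float(np.sqrt(mean_squared_error(y_te, y_hat)))
r2_test = r2_score(y_te, y_hat)
print(f"held-out RMSE = {rmse_test:.3f}, R^2 = {r2_test:.3f}")
```

The same pattern should apply to `input_features` and `output_features` once they are split before calling `fit`.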

Polynomial Regression

In [None]:
from sklearn.preprocessing import PolynomialFeatures

# Generate the design matrix for a degree-4 polynomial model
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(input_features)
# polynomial regression model
poly_reg_model = LinearRegression()
poly_reg_model.fit(X_poly, output_features)

corr = np.corrcoef(output_features, poly_reg_model.predict(X_poly))[0,1]

plt.scatter(output_features, poly_reg_model.predict(X_poly))
plt.plot((-2, 0), (-2, 0), color='red')
bbox = dict(boxstyle="round", fc='1',alpha=0.5)
plt.text(-2, -0.1, '$R^2=%.2f$' % (corr**2), size=12, bbox=bbox)
plt.xlabel('true_value',fontsize=13)
plt.ylabel('pred_value',fontsize=13)
plt.title("PolynomialRegression",fontsize=14)
plt.show()
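
A degree-4 expansion of 13 features produces far more columns than a dataset of this size can constrain, so the in-sample R² above says little about predictive accuracy. The sketch below only counts the generated terms. The 13-column placeholder array is an assumption standing in for `input_features`.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n_features = 13  # number of fingerprint descriptors listed at the top of the notebook
placeholder = np.zeros((1, n_features))  # hypothetical stand-in for input_features

for degree in (1, 2, 3, 4):
    expander = PolynomialFeatures(degree=degree).fit(placeholder)
    n_terms = expander.n_output_features_
    # Including the bias column, the count equals C(n_features + degree, degree).
    assert n_terms == comb(n_features + degree, degree)
    print(degree, n_terms)

n_terms_degree4 = PolynomialFeatures(degree=4).fit(placeholder).n_output_features_
```

At degree 4 the expansion has 2380 columns, which exceeds the roughly 1000 surfaces mentioned in the abstract, so an unregularized least-squares fit can reproduce the training targets almost exactly without generalizing.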

In [None]:
'''
A demo of predicting the adsorption energies of CH3, CH2, CH, C and H.
'''

In [None]:
data1 = pd.read_excel("/Users/precious/work/catalyst_demo/data/Input_features_with_CH3_CH2_CH_C_and_H.xlsx", sheet_name="Sheet2")
data1.head()

In [None]:
data2 = pd.read_excel("/Users/precious/work/catalyst_demo/data/Input_features_with_CH3_CH2_CH_C_and_H.xlsx", sheet_name="Sheet1")
data2.head()

In [None]:
import numpy as np
from sklearn.ensemble import RandomForestRegressor
forest_regressor = RandomForestRegressor(
    n_estimators = 300,
    random_state = 0
)
X = data1.iloc[:, 2:]
y = data2.iloc[:, 1:]
X = np.array(X.apply(pd.to_numeric))
y = np.array(y.apply(pd.to_numeric))
forest_regressor.fit(X, y)
pred = forest_regressor.predict(X)
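
The abstract mentions feature importance analysis, and a random forest exposes a simple version of it. The sketch below is an illustration on synthetic data, not the analysis from the paper: the six-column array, the target definition, and `n_estimators=200` are assumptions. It also enables the out-of-bag score, which gives an accuracy estimate without predicting on the training rows.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 6))
# Only the first two columns carry signal in this synthetic target.
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=300)

demo_forest = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
demo_forest.fit(X_demo, y_demo)

# Out-of-bag R^2: each row is predicted only by trees whose bootstrap sample excluded it.
print("OOB R^2:", round(demo_forest.oob_score_, 3))

# Impurity-based importances sum to 1; larger values mean more variance reduction.
ranking = np.argsort(demo_forest.feature_importances_)[::-1]
print("features ranked by importance:", ranking.tolist())
```

For the notebook's own model, the equivalent attributes would be `forest_regressor.feature_importances_` and, if `oob_score=True` is passed at construction, `forest_regressor.oob_score_`.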

In [None]:
# Index of the target column (adsorbate) to plot.
number = 3
true_value = y[:, number]
prediction = pred[:, number]

corr = np.corrcoef(true_value, prediction)[0,1]

plt.scatter(true_value,prediction,color='blue')
plt.plot((-10, 0), (-10, 0), color='red')
bbox = dict(boxstyle="round", fc='1',alpha=0.5)
plt.text(-2, -0.03, '$R^2=%.2f$' % (corr**2), size=12, bbox=bbox)
plt.xlabel('true_value',fontsize=13)
plt.ylabel('pred_value',fontsize=13)
# Title the plot with the name of the selected target column.
plt.title(str(data2.columns[1 + number]),fontsize=14)
plt.show()

In [None]:
from sklearn.preprocessing import StandardScaler

# Features come from Sheet2 (data1) and the five adsorption-energy targets from Sheet1 (data2).
input_features = np.array(data1.iloc[:, 2:].apply(pd.to_numeric))
output_features = np.array(data2.iloc[:, 1:].apply(pd.to_numeric))

# The first 31 rows are used for training and the remaining rows for testing.
n_train = 31
# Fit the scalers on the training rows only so that test statistics do not leak into training.
scaler_input = StandardScaler().fit(input_features[:n_train])
scaler_output = StandardScaler().fit(output_features[:n_train])
input_features = scaler_input.transform(input_features)
output_features = scaler_output.transform(output_features)
input_features_train = input_features[:n_train]
input_features_test = input_features[n_train:]
output_features_train = output_features[:n_train]
output_features_test = output_features[n_train:]

In [None]:
from keras.models import Sequential
from keras.layers import Dense, PReLU
from keras.callbacks import ModelCheckpoint
import os

model = Sequential()
model.add(Dense(128))
model.add(PReLU('ones'))
model.add(Dense(64))
model.add(PReLU('ones'))
model.add(Dense(48))
model.add(PReLU('ones'))
model.add(Dense(32))
model.add(PReLU('ones'))
model.add(Dense(24))
model.add(PReLU('ones'))
model.add(Dense(8))
model.add(PReLU('ones'))
model.add(Dense(5))
model.compile(optimizer='adam', loss='mse')

filepath = '/Users/precious/work/catalyst_demo/model/model1'
os.makedirs(filepath, exist_ok=True)
# Save the model only when the validation loss improves.
checkpoint = ModelCheckpoint(
    filepath=filepath, monitor='val_loss', verbose=1, save_best_only=True)
callback_list = [checkpoint]

model.fit(input_features_train, output_features_train, epochs=50, batch_size=1, validation_split=0.1, callbacks=callback_list)

In [None]:
import keras.models

# Reload the best checkpoint saved during training.
model = keras.models.load_model(filepath)
pred = model.predict(input_features_test)

In [None]:
# Index of the target column (adsorbate) to plot.
number = 4
# Undo the standardization, then select a single target column.
true_value = scaler_output.inverse_transform(output_features_test)[:, number]
prediction = scaler_output.inverse_transform(pred)[:, number]

corr = np.corrcoef(true_value, prediction)[0,1]

plt.scatter(true_value,prediction,color='blue')
plt.plot((-10, -0), (-10, -0), color='red')
bbox = dict(boxstyle="round", fc='1',alpha=0.5)
plt.text(-2, -0.03, '$R^2=%.2f$' % (corr**2), size=12, bbox=bbox)
plt.xlabel('true_value',fontsize=13)
plt.ylabel('pred_value',fontsize=13)
# Title the plot with the name of the selected target column.
plt.title(str(data2.columns[1 + number]),fontsize=14)
plt.show()

In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense, PReLU, Conv1D, Flatten, Dropout, AvgPool1D, BatchNormalization
from keras.callbacks import ModelCheckpoint
import keras.models
import os
import matplotlib.pyplot as plt


In [42]:
x_test_1 = pd.read_csv("/Users/precious/work/catalyst_demo/data/c7ta01812f1.csv").drop(['name'], axis=1)

In [43]:
x_test_2 = pd.read_csv("/Users/precious/work/catalyst_demo/data/c7ta01812f2.csv").drop(['name'], axis=1)

In [44]:
train_CO = pd.read_csv("/Users/precious/work/catalyst_demo/data/c7ta01812f3.csv")
x_train = train_CO.drop(['name', 'CO'], axis=1)
x_train.head()

Unnamed: 0,filling,center,sigma_c,skewness,kurtosis,local_Pauling,IonizationPotential,ElectronAffinity,Pauling,WorkFunction,dBandCenter,AtomicRadius,OrbitalRadius
0,0.866845,-1.058236,1.225608,0.37087,5.860201,1.906994,7.64,1.16,1.91,5.77,-1.29,1.38,0.71
1,0.870556,-1.180083,1.210548,0.343019,5.964618,1.992738,7.64,1.16,1.91,5.77,-1.29,1.38,0.71
2,0.862962,-1.19684,1.219834,0.279196,6.02093,2.014206,7.64,1.16,1.91,5.77,-1.29,1.38,0.71
3,0.866355,-1.127358,1.15632,0.28475,6.080364,1.992738,7.64,1.16,1.91,5.77,-1.29,1.38,0.71
4,0.882387,-0.806609,1.012896,0.197427,10.777396,1.915978,7.64,1.16,1.91,5.77,-1.29,1.38,0.71


In [66]:
train_OH = pd.read_csv("/Users/precious/work/catalyst_demo/data/c7ta01812f4.csv")
y_train = pd.concat((train_OH['OH'], train_CO['CO']), axis=1)
y_train.head()

Unnamed: 0,OH,CO
0,0.502531,-1.438149
1,0.148383,-1.666744
2,0.245582,-1.618728
3,0.176632,-1.69597
4,0.032181,-1.856567
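
`pd.concat(..., axis=1)` pairs rows by index position. If the OH file has fewer rows than the CO file, the shorter column is padded with NaN, and rows may end up pairing energies from different surfaces. The sketch below shows the effect on two small made-up tables and an alternative that joins on the surface name. That both CSV files share a usable `name` key is an assumption, and the values are invented for illustration.

```python
import pandas as pd

# Hypothetical miniature versions of the CO and OH tables; "name" identifies the surface.
co_demo = pd.DataFrame({"name": ["A", "B", "C", "D"], "CO": [-1.4, -1.7, -1.6, -1.9]})
oh_demo = pd.DataFrame({"name": ["A", "B", "D"], "OH": [0.50, 0.15, 0.03]})

# Positional concat pads the shorter column with NaN and pairs D's OH with C's CO.
by_position = pd.concat((oh_demo["OH"], co_demo["CO"]), axis=1)
print(by_position)

# Joining on the surface name keeps only surfaces that have both energies.
by_name = co_demo.merge(oh_demo, on="name", how="inner")
print(by_name)
```

With the joined table, the feature rows would need to be filtered to the same surfaces before training.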


In [67]:
def handle_outlier(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Replace values beyond mean ± 2 std with the nearest value inside that range."""
    data = dataframe.copy()

    for column in data.columns:
        mean = data[column].mean()
        std = data[column].std()
        top_num = mean + 2 * std
        bottom_num = mean - 2 * std
        # The largest value below the upper bound replaces the high outliers.
        replace_value1 = data[column][data[column] < top_num].max()
        data.loc[data[column] > top_num, column] = replace_value1
        # The smallest value above the lower bound replaces the low outliers.
        replace_value2 = data[column][data[column] > bottom_num].min()
        data.loc[data[column] < bottom_num, column] = replace_value2

    return data
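
For comparison, a common alternative is to clamp outliers to the threshold itself rather than to the nearest in-range observation. The sketch below uses `DataFrame.clip` for that. `clip_outliers`, the `n_std` parameter, and the demo column are hypothetical names introduced here, and the results differ from `handle_outlier`: on the demo column below, `handle_outlier` would replace 25.0 with 1.2, whereas clipping replaces it with mean + 2 std.

```python
import pandas as pd

def clip_outliers(frame: pd.DataFrame, n_std: float = 2.0) -> pd.DataFrame:
    """Clamp each column to [mean - n_std * std, mean + n_std * std]."""
    lower = frame.mean() - n_std * frame.std()
    upper = frame.mean() + n_std * frame.std()
    # axis=1 aligns the per-column bounds (indexed by column name) with the columns.
    return frame.clip(lower=lower, upper=upper, axis=1)

demo = pd.DataFrame({"a": [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 25.0]})
clipped = clip_outliers(demo)
print(clipped["a"].max())
```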

In [72]:
def handle_data(data: pd.DataFrame, is_x=True, train_len=700):
    """Clip outliers, standardize, and split into the first train_len rows and the rest."""
    data = handle_outlier(data)

    data = data.apply(pd.to_numeric)
    data = np.array(data)
    if is_x:
        scaler = StandardScaler().fit(data)
        data = scaler.transform(data)
        # data = data.reshape(971, 13, 1)
        train_data = data[:train_len]
        test_data = data[train_len:]

        return train_data, test_data, scaler

    else:
        # StandardScaler expects 2D input, so each target is kept as a single column.
        data_OH = data[:, 0].reshape(-1, 1)
        data_CO = data[:, 1].reshape(-1, 1)
        # StandardScaler ignores NaN when fitting and passes it through transform,
        # so rows with a missing target stay NaN and must be dropped before training.
        scaler_OH = StandardScaler().fit(data_OH)
        scaler_CO = StandardScaler().fit(data_CO)
        data_OH = scaler_OH.transform(data_OH)
        data_CO = scaler_CO.transform(data_CO)
        train_data_OH = data_OH[:train_len]
        test_data_OH = data_OH[train_len:]
        train_data_CO = data_CO[:train_len]
        test_data_CO = data_CO[train_len:]

        return train_data_OH, test_data_OH, train_data_CO, test_data_CO, scaler_OH, scaler_CO


In [73]:
x_train_train, x_train_test, x_scaler = handle_data(x_train)
y_train_train_OH, y_train_test_OH, y_train_train_CO, y_train_test_CO, y_scaler_OH, y_scaler_CO = handle_data(y_train, is_x=False)

In [55]:
model = Sequential()

# model.add(Conv1D(8, 3))
# model.add(Conv1D(16, 3))
# model.add(PReLU('ones'))
# model.add(Conv1D(32, 3))
# model.add(Conv1D(64, 3))
# model.add(PReLU('ones'))
# model.add(AvgPool1D())
# model.add(BatchNormalization())
# model.add(Conv1D(128, 1))
# model.add(Dropout(0.1))
# model.add(Conv1D(256, 1))
# model.add(PReLU('ones'))
# model.add(BatchNormalization())
# model.add(Conv1D(64, 1))
# model.add(Conv1D(32, 1))
# model.add(PReLU('ones'))
# model.add(Flatten())
# model.add(Dense(128))
# model.add(PReLU('ones'))
# model.add(Dense(32))
# model.add(PReLU('ones'))
# model.add(Dense(24))
# model.add(PReLU('ones'))
# model.add(Dense(8))
# model.add(PReLU('ones'))
# model.add(Dense(2))

model.add(Dense(5, activation='sigmoid'))
model.add(Dense(5, activation='sigmoid'))
model.add(Dense(1))

In [56]:
import keras.backend as K
import tensorflow as tf

def root_mean_squared_error(y_true, y_pred):
    # Custom RMSE loss; not used below, where MSE is the loss and RMSE is tracked as a metric.
    return K.sqrt(K.mean(K.square(y_pred - y_true)))

model.compile(optimizer="sgd",  # alternative: 'adam'
              loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError(name='rmse')])
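
Because the targets are standardized before training, the `rmse` metric is in units of the target's standard deviation, not eV. To compare with the RMSE quoted in the abstract, the value has to be scaled back. A minimal sketch with made-up numbers follows; the energies, the perturbations, and the name `target_scaler` are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical adsorption energies in eV.
y_ev = np.array([0.50, 0.15, 0.25, 0.18, 0.03, -0.40, 1.10, 0.90])
# 1D targets must be reshaped to a single column before fitting the scaler.
target_scaler = StandardScaler().fit(y_ev.reshape(-1, 1))
y_scaled = target_scaler.transform(y_ev.reshape(-1, 1)).ravel()

# Hypothetical model outputs in standardized units.
pred_scaled = y_scaled + np.array([0.1, -0.2, 0.05, 0.0, -0.1, 0.3, -0.15, 0.2])
rmse_scaled = np.sqrt(np.mean((pred_scaled - y_scaled) ** 2))

# Standardization divides by the fitted standard deviation,
# so errors scale back to eV by the same factor.
rmse_ev = rmse_scaled * target_scaler.scale_[0]

# Cross-check by inverting the transform explicitly.
pred_ev = target_scaler.inverse_transform(pred_scaled.reshape(-1, 1)).ravel()
rmse_ev_direct = np.sqrt(np.mean((pred_ev - y_ev) ** 2))
print(rmse_scaled, rmse_ev, rmse_ev_direct)
```

In this notebook the corresponding factor would be `y_scaler_OH.scale_[0]` for the OH model.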

In [58]:
filepath = '/Users/precious/work/catalyst_demo/model/model_OH'
os.makedirs(filepath, exist_ok=True)
# 'rmse' is the training-set metric; the model is saved whenever it improves.
checkpoint = ModelCheckpoint(
    filepath=filepath, monitor='rmse', verbose=1, save_best_only=True)
callback_list = [checkpoint]

In [59]:
val_rate = 0.1
batch_rate = 0.001

model.fit(x_train_train, y_train_train_OH, epochs=500,
          # batch_size=int(len(x_train_train)*batch_rate),
          batch_size=10,
          validation_split=val_rate,
          callbacks=callback_list, shuffle=True)

Epoch 1/500
Epoch 1: rmse improved from inf to 0.99968, saving model to /Users/precious/work/catalyst_demo/model/model_OH
Epoch 2/500
Epoch 2: rmse improved from 0.99968 to 0.99610, saving model to /Users/precious/work/catalyst_demo/model/model_OH
Epoch 3/500
Epoch 3: rmse improved from 0.99610 to 0.99077, saving model to /Users/precious/work/catalyst_demo/model/model_OH
Epoch 4/500
Epoch 4: rmse improved from 0.99077 to 0.98592, saving model to /Users/precious/work/catalyst_demo/model/model_OH
Epoch 5/500
Epoch 5: rmse improved from 0.98592 to 0.98228, saving model to /Users/precious/work/catalyst_demo/model/model_OH
Epoch 6/500
Epoch 6: rmse improved f

<keras.callbacks.History at 0x2ee82fdf0>

In [61]:
model = keras.models.load_model(filepath)
pred = model.predict(x_train_test)




In [63]:
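# The model above was trained on the OH targets only, so use the OH test split and scaler.
true_value = y_scaler_OH.inverse_transform(y_train_test_OH.reshape(-1, 1)).ravel()
prediction = y_scaler_OH.inverse_transform(pred.reshape(-1, 1)).ravel()

# Skip rows whose OH target is missing (NaN) before computing the correlation.
mask = np.isfinite(true_value)
true_value, prediction = true_value[mask], prediction[mask]

corr = np.corrcoef(true_value, prediction)[0,1]

plt.scatter(true_value,prediction,color='blue')
plt.plot((-3, 1), (-3, 1), color='red')
bbox = dict(boxstyle="round", fc='1',alpha=0.5)
plt.text(-3, 0.5, '$R^2=%.2f$' % (corr**2), size=12, bbox=bbox)
plt.xlabel('true_value',fontsize=13)
plt.ylabel('pred_value',fontsize=13)
plt.title("OH",fontsize=14)
plt.show()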
