**Sequential Training of Unconstrained Hybrid Gaussian RAdial Basis Neural Networks (GRAB-NN) Models**

This code demonstrates the sequential training algorithms developed for hybrid GRAB-NN models. The algorithms exploit the model architecture by solving each submodel with a different training / optimization algorithm, while an outer-layer optimization ensures overall convergence. The hidden layer consists of two types of nodes: ANN nodes with a sigmoid activation function and RBF nodes with a Gaussian activation function. The centers and widths of the RBF nodes in the hidden layer are optimized by IPOPT. The hidden layer weights for the ANN nodes are estimated by a typical backpropagation-based first-order approach. The output layer weights for the overall GRAB-NN model are estimated by the Orthogonal Least Squares (OLS) algorithm. The order of the ANN and RBF nodes in the hidden layer is immaterial since the network is a fully-connected NN model.
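
As a minimal sketch of the hidden layer described above (not the implementation used in this work), the forward pass through a mixed layer of Gaussian and sigmoid nodes can be written in NumPy. The helper name `hybrid_hidden_forward`, its argument names, and the random toy values are illustrative assumptions; the Gaussian form with a single shared width mirrors the RBF subproblem defined later in this notebook.

```python
import numpy as np

def hybrid_hidden_forward(X, centers, sigma, wh):
    """Hidden-layer outputs of a hybrid layer (hypothetical helper).

    X       : (ni, tn) inputs, one column per time step
    centers : (ni, nh_RBF) RBF centers
    sigma   : scalar width shared by all RBF nodes
    wh      : (ni, nh_ANN) hidden-layer weights of the sigmoid nodes
    """
    # Squared Euclidean distance between every center and every sample: (nh_RBF, tn)
    d2 = ((X[:, None, :] - centers[:, :, None]) ** 2).sum(axis=0)
    y_rbf = np.exp(-d2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    # Sigmoid nodes: (nh_ANN, tn)
    y_ann = 1.0 / (1.0 + np.exp(-wh.T @ X))
    # Stack the RBF rows first, then the ANN rows
    return np.vstack((y_rbf, y_ann))

rng = np.random.default_rng(0)
ni, tn, nh_RBF, nh_ANN = 5, 20, 3, 4
X = rng.random((ni, tn))
y_h = hybrid_hidden_forward(X, rng.random((ni, nh_RBF)), 1.0, rng.random((ni, nh_ANN)))
print(y_h.shape)  # (7, 20)
```

Because the output layer is a linear combination of all rows of `y_h`, permuting the rows only permutes the output weights, which is why the node order does not matter.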

The model structure can be optimized simultaneously / sequentially by the MINLP approaches proposed in this work. However, this code simply compares the predictive performances of different combinations of model architectures by comprehensive enumeration for a fixed size / structure of the network, i.e., for a fixed total number of hidden layer nodes in the overall GRAB-NN model.
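
The enumeration can be sketched as follows, assuming the parameter counts used later in this notebook (hidden-layer weights or centers, output-layer weights, and one shared RBF width whenever an RBF node is present). The helper name `enumerate_architectures` and the toy sizes are hypothetical.

```python
def enumerate_architectures(nh, ni, no):
    """List (nh_ANN, nh_RBF, k) for every split of nh hidden nodes (illustrative helper)."""
    splits = []
    for nh_ANN in range(nh + 1):
        nh_RBF = nh - nh_ANN
        # ni*nh hidden-layer parameters, nh*no output-layer weights, plus one
        # shared RBF width whenever at least one RBF node is present
        k = ni * nh + nh * no + (1 if nh_RBF > 0 else 0)
        splits.append((nh_ANN, nh_RBF, k))
    return splits

for nh_ANN, nh_RBF, k in enumerate_architectures(nh=3, ni=5, no=4):
    print(nh_ANN, nh_RBF, k)
```

For a fixed `nh`, the parameter count differs by at most one across all splits, so the comparison between architectures is made at essentially equal model size.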

The model parameters are initialized by cubic spline interpolation during implementation (not included in this code). This code, however, provides a simple demonstration of how the sequential training is performed for the hybrid GRAB-NN models. Furthermore, with the same initialization, the simultaneous and sequential approaches led to practically the same solution, but the sequential approach was significantly faster computationally.

*Load the training and validation datasets and specify the input and output variables for the RBF models. Note that the user can consider any dynamic dataset for training and validation. The rows signify the time steps for transient data and the columns signify the input and output variables.*

*The nonlinear dynamic continuous stirred tank reactor (CSTR) system is chosen for demonstration of the proposed approach.*

In [1]:
import numpy as np
import scipy as sp
import pyomo.environ as pyo
from pyomo.opt import SolverFactory
from idaes.core.solvers import get_solver
get_solver()
import matplotlib.pyplot as plt
import pandas as pd
from pyDOE import *
import math as mt
import time
import json
import pickle

In [2]:
# Loading the data for model development

data = pd.read_excel("Dynamic CSTR Data.xlsx","Data", header=None).values
data = data[2:1202, 1:]

# For this specific system, the first five columns are the model inputs and the following four columns are the model outputs

input_data = data[:,0:5]
output_data = data[:,5:]

**Defining three individual optimization (sub)problems**

In [None]:
def RBF_hidden(tn,ni,Imat_t,dsr_RBF_t,nh_RBF):

    nh = nh_RBF

    # Setting up the optimization problem

    M = pyo.ConcreteModel()

    M.I1 = pyo.RangeSet(1, ni)
    M.I2 = pyo.RangeSet(1, nh)
    M.I3 = pyo.RangeSet(1, 1)
    M.I4 = pyo.RangeSet(1,tn)
    
    M.x1 = pyo.Var(M.I1,M.I2, bounds = (1e-3,2.5), initialize = 0.5)  # centermat (RBF)
    M.x2 = pyo.Var(M.I3, bounds = (1e-3,4), initialize = 1)           # sigma (RBF)

    M.y1 = pyo.Var(M.I2,M.I4)                     # PhiofD / y_h (RBF)
    
    @M.Expression(M.I2, M.I4)
    def D(M,i,j):
        return (sum((Imat_t[k-1,j-1] - M.x1[k,i])**2 for k in M.I1))**0.5   
    
    def constraint_rule_1(M,i,j):
        return M.y1[i,j] == (1/pyo.sqrt(2*mt.pi*M.x2[1]**2))*pyo.exp(-(M.D[i,j] * M.D[i,j])/(2*M.x2[1]**2)) 
    
    M.constraint_1 = pyo.Constraint(M.I2, M.I4, rule = constraint_rule_1)
           
    def GRBF_optim_det(M):             
        obj_value = sum(sum((dsr_RBF_t[i-1,j-1] - M.y1[i,j]) ** 2 for i in M.I2) for j in M.I4)
        return obj_value
    
    M.obj = pyo.Objective(rule = GRBF_optim_det, sense = pyo.minimize)
    
    solver = pyo.SolverFactory('ipopt')
    solver.options['max_iter'] = 350
    
    results = solver.solve(M, tee = True)
    
    yRBF = np.zeros((nh,tn))
    for (i,j) in M.y1:
        yRBF[i-1,j-1] = pyo.value(M.y1[i,j])
    
    return yRBF

In [None]:
def NN_BP_hidden(tn,ni,Imat_t,dsr_ANN_t,nh_ANN):

    nh = nh_ANN
    wh = np.random.rand(ni,nh)
    max_iter = 10000
    eta = 0.01
    tol = 1e-4

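    for _ in range(max_iter):

        # Forward pass through the sigmoid hidden nodes
        yANN = 1/(1+np.exp(-np.dot(wh.T,Imat_t)))
        err = dsr_ANN_t - yANN
        mse = np.mean(err**2)

        # Stop as soon as the fit is within tolerance
        if mse < tol:
            break

        # Delta-rule (gradient descent) update of the hidden-layer weights
        delta_y = yANN*(1 - yANN)*err
        wh = wh + eta*np.dot(Imat_t,delta_y.T)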

    return yANN
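
The weight update in `NN_BP_hidden` is a gradient-descent step on the half sum of squared errors of the sigmoid nodes. A small finite-difference check, given here as an illustrative sketch with made-up toy dimensions and a hypothetical helper `half_sse`, compares the delta-rule direction with the numerical gradient of that loss.

```python
import numpy as np

def half_sse(wh, X, target):
    """0.5 * sum of squared errors of the sigmoid hidden layer (hypothetical helper)."""
    y = 1.0 / (1.0 + np.exp(-wh.T @ X))
    return 0.5 * np.sum((target - y) ** 2)

rng = np.random.default_rng(1)
ni, nh, tn = 3, 2, 10
X = rng.random((ni, tn))
target = rng.random((nh, tn))
wh = rng.random((ni, nh))

# Delta-rule update direction, expressed in the (ni, nh) shape of wh
y = 1.0 / (1.0 + np.exp(-wh.T @ X))
delta_y = y * (1.0 - y) * (target - y)
step = (delta_y @ X.T).T

# Central finite-difference gradient of the loss with respect to wh
grad = np.zeros_like(wh)
eps = 1e-6
for a in range(ni):
    for b in range(nh):
        e = np.zeros_like(wh)
        e[a, b] = eps
        grad[a, b] = (half_sse(wh + e, X, target) - half_sse(wh - e, X, target)) / (2 * eps)

# The update direction should equal the negative gradient
print(np.allclose(step, -grad, atol=1e-6))
```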

In [None]:
def RBFANN_output_OLS(y_h,dsr_t):

    wo = np.dot(np.transpose(np.linalg.pinv(y_h)) , dsr_t.T)

    yRBFANN = np.dot(y_h.T,wo)

    return yRBFANN
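
The output-layer step above computes linear least-squares weights through a pseudoinverse. As a quick sanity check on toy data (the array sizes and random values below are assumptions for illustration only), the same weights are obtained by posing the problem directly to `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(2)
nh, no, tn = 4, 2, 50
y_h = rng.random((nh, tn))      # hidden-layer outputs, one row per node
dsr_t = rng.random((no, tn))    # targets, one row per output

# Pseudoinverse form used in RBFANN_output_OLS
wo_pinv = np.linalg.pinv(y_h).T @ dsr_t.T

# Same problem posed as min || y_h.T @ wo - dsr_t.T ||
wo_lstsq, *_ = np.linalg.lstsq(y_h.T, dsr_t.T, rcond=None)

print(wo_pinv.shape)
print(np.allclose(wo_pinv, wo_lstsq))
```

The weights have shape `(nh, no)`, so the model prediction `y_h.T @ wo` has one row per time step and one column per output.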

**Defining the Outer Optimization Problem for Overall Convergence**

In [None]:
def outer_optim_RBFANN(tn,ni,nh_RBF,nh_ANN,Imat_t,dsr_t):

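    tol = 1e-2      # convergence tolerance on the change in hidden-layer outputs
    flag = 1        # outer iteration counter (the loop below allows at most 2 iterations)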
    
    nh = nh_RBF + nh_ANN

    error_h = 1e2

    # Initializing intermediate inputs y_h
    w0 = np.random.rand(ni,nh)
    y_h = np.dot(w0.T,Imat_t)

    while(error_h > tol and flag < 3):        
    
        y_RBFANN = RBFANN_output_OLS(y_h,dsr_t)
    
        dsr_RBF_t = y_h[:nh_RBF]
        dsr_ANN_t = y_h[nh_RBF:]
    
        if not (nh_RBF == 0):
            yRBF_h = RBF_hidden(tn,ni,Imat_t,dsr_RBF_t,nh_RBF)

        if not (nh_ANN == 0):
            yANN_h = NN_BP_hidden(tn,ni,Imat_t,dsr_ANN_t,nh_ANN)
    
        if (nh_RBF == 0):
            yRBFANN_h = yANN_h
        elif (nh_ANN == 0):
            yRBFANN_h = yRBF_h
        else:
            yRBFANN_h = np.vstack((yRBF_h,yANN_h))
    
        error_h = np.linalg.norm((y_h - yRBFANN_h)**2)
        
        y_h = yRBFANN_h
        flag += 1
        

    return y_RBFANN 

In [None]:
# Specification of Model Inputs and Target Outputs

data = np.concatenate((input_data, output_data), axis = 1)
ni = input_data.shape[1]
no = output_data.shape[1]
nt = ni+no

tt = data.shape[0]
tn = int(np.floor(1*tt))

# Normalizing the input and output variables

norm_mat = np.zeros((tt,nt))
delta = np.zeros((1,nt))
for i in range(nt):
    delta[:,i] = max(data[:,i]) - min(data[:,i])
    norm_mat[:,i] = (data[:,i] - min(data[:,i]))/delta[0,i]

Imat = norm_mat[:,0:ni].T
dsr = norm_mat[:,ni:ni+no].T

# Selecting the time steps used for training (all time steps here, since tn = tt)

tr_steps = np.random.choice(tt, tn, replace=False)
tr_steps = np.sort(tr_steps) 

dsr_t = np.zeros((no,tn))
Imat_t = np.zeros((ni,tn))

for i in range(tn):
    ts = tr_steps[i]
    dsr_t[:,i] = dsr[:,ts]
    Imat_t[:,i] = Imat[:,ts]
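
The column-wise min-max scaling used above can also be written without an explicit loop. The sketch below uses a random stand-in array (`data_toy` and the other names are hypothetical) and checks that the inverse transform, which is the same un-normalization applied to the model outputs later, recovers the original values.

```python
import numpy as np

rng = np.random.default_rng(3)
data_toy = rng.random((8, 3)) * 10.0   # stand-in for the (time steps x variables) array

col_min = data_toy.min(axis=0)
delta_toy = data_toy.max(axis=0) - col_min
norm_toy = (data_toy - col_min) / delta_toy   # each column scaled to [0, 1]

# Inverting the scaling recovers the original values
recovered = norm_toy * delta_toy + col_min
print(norm_toy.min(axis=0), norm_toy.max(axis=0))
print(np.allclose(recovered, data_toy))
```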

In [None]:
# Comprehensive Enumeration of all possible combinations of RBF and ANN nodes in the Hidden Layer

# Specify the size of the hidden layer (i.e., number of nodes)
nh = 15

check = 1e6

k_v = []
mse_v = []

target_unnorm = np.zeros((tn,no))
for i in range(no):
    target_unnorm[:,i] = np.transpose(dsr_t[i])*delta[0,ni+i] + min(data[:,ni+i])

for i in range(nh+1):
    nh_ANN = i
    nh_RBF = nh - nh_ANN

    # Note: mse below is the mean squared relative error on the un-normalized outputs
    if (nh_RBF == 0):
        yANN = outer_optim_RBFANN(tn,ni,nh_RBF,nh_ANN,Imat_t,dsr_t)
        yANN_unnorm = np.zeros((tn,no))
        for j in range(no):
            yANN_unnorm[:,j] = yANN[:,j]*delta[0,ni+j] + min(data[:,ni+j])
        mse = np.mean((np.divide((target_unnorm - yANN_unnorm),target_unnorm))**2)
        k = ni*nh + nh*no
    elif (nh_ANN == 0):
        yRBF = outer_optim_RBFANN(tn,ni,nh_RBF,nh_ANN,Imat_t,dsr_t)
        yRBF1_unnorm = np.zeros((tn,no))
        for j in range(no):
            yRBF1_unnorm[:,j] = yRBF[:,j]*delta[0,ni+j] + min(data[:,ni+j])
        mse = np.mean((np.divide((target_unnorm - yRBF1_unnorm),target_unnorm))**2)
        k = ni*nh + 1 + nh*no
    else:
        yRBFANN = outer_optim_RBFANN(tn,ni,nh_RBF,nh_ANN,Imat_t,dsr_t)
        yRBFANN_unnorm = np.zeros((tn,no))
        for j in range(no):
            yRBFANN_unnorm[:,j] = yRBFANN[:,j]*delta[0,ni+j] + min(data[:,ni+j])
        mse = np.mean((np.divide((target_unnorm - yRBFANN_unnorm),target_unnorm))**2)
        k = ni*nh + 1 + nh*no

    k_v.append(k)
    mse_v.append(mse)

    # Note that in this code, in the absence of cubic spline interpolation, the monotonicity of the MSE values may not be guaranteed. Random initialization may
    # lead to local optima. The number of parameters in all the different combinations has been kept the same / similar to ensure a fair comparison

    if (mse_v[-1] < check and nh_RBF > 0 and nh_ANN > 0):
        check = mse_v[-1]
        yopt = yRBFANN

In [None]:
# Evaluation of the optimal model architecture, i.e., optimal values of nh_ANN and nh_RBF

nh_ANN_opt = np.argmin(mse_v)
nh_RBF_opt = nh - nh_ANN_opt

print('Optimal GRABNN architecture ==> nh_ANN:', nh_ANN_opt, 'and nh_RBF:', nh_RBF_opt)