In [6]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Extreme Learning Machine 
## Concept 

ELM is a machine learning algorithm, developed by Guang-Bin Huang and co-authors in 2006, that eliminates gradient-based learning. Instead, it uses the Moore-Penrose generalised inverse (pseudo inverse) of a matrix to calculate the output weights. Essentially it is a single-hidden-layer feedforward neural network in which the weights and biases between the features and the hidden layer are initialised randomly and then kept fixed, so the output of the hidden layer (H) is constant for a given input. The weights between the hidden layer and the output layer (Beta) are then calculated using the pseudo inverse of the hidden-layer output. This algorithm provides much shorter training time than generic gradient-based learning techniques.

Mathematically, a matrix B* will be a pseudo inverse of matrix B if it satisfies the following four conditions:
1) BB* B = B
2) B* BB* = B*
3) (BB* )^H = BB*
4) (B* B )^H = B* B
where B^H is the conjugate transpose of matrix B. 
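
The four conditions can be checked numerically. The sketch below is only an illustration: the matrix `B` is an arbitrary random example, and `conj_t` and `penrose_checks` are hypothetical helper names that are not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(5, 3))        # arbitrary rectangular example matrix
B_pinv = np.linalg.pinv(B)         # Moore-Penrose pseudo inverse, written B* in the text


def conj_t(M):
    # Conjugate transpose (M^H); for a real matrix this is the plain transpose.
    return M.conj().T


penrose_checks = {
    "B B* B = B": np.allclose(B @ B_pinv @ B, B),
    "B* B B* = B*": np.allclose(B_pinv @ B @ B_pinv, B_pinv),
    "(B B*)^H = B B*": np.allclose(conj_t(B @ B_pinv), B @ B_pinv),
    "(B* B)^H = B* B": np.allclose(conj_t(B_pinv @ B), B_pinv @ B),
}
for name, holds in penrose_checks.items():
    print(name, "->", holds)
```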

The minimum-norm (shortest length) least-squares solution of the linear system Ax = c is x = A* c, where A* is the pseudo inverse of A.
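
As a small illustration of the shortest-length property, the sketch below uses an underdetermined system, which has infinitely many exact solutions. The matrix `A`, the vector `c` and all variable names are made-up examples. The pseudo inverse solution is compared with `np.linalg.lstsq` and with a second exact solution built by adding a null-space component.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 5))        # 3 equations, 5 unknowns
c = rng.normal(size=3)

A_pinv = np.linalg.pinv(A)
x_pinv = A_pinv @ c                # x = A* c

# np.linalg.lstsq also returns the minimum-norm least-squares solution.
x_lstsq, *_ = np.linalg.lstsq(A, c, rcond=None)

# Any vector of the form (I - A* A) z lies in the null space of A,
# so adding it to x_pinv gives another exact solution of Ax = c.
null_part = (np.eye(5) - A_pinv @ A) @ rng.normal(size=5)
x_other = x_pinv + null_part

print("norm of pseudo inverse solution:", np.linalg.norm(x_pinv))
print("norm of the other solution:     ", np.linalg.norm(x_other))
```

Both vectors solve the system, but the pseudo inverse solution has the smaller Euclidean norm.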

The implementation of the ELM algorithm is as follows:
1) Initialise random weights and biases
2) Calculate H where H = activation(XW + b), with XW the matrix product of the features X and the random weights W. In this implementation the sigmoid activation function is used.
3) Derive Beta = H* y, where y is the vector of true values and H* is the pseudo inverse of H.

The pseudo inverse is calculated with NumPy's np.linalg.pinv function. Beta is therefore the least-squares solution of the linear system H Beta = y.
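
The three steps can be sketched end to end on a toy problem. Everything here is an assumption made for illustration: the synthetic sine data, the 50 hidden nodes, the uniform(-1, 1) initialisation and the `toy_` names do not come from the notebook, which works on the catalyst dataset with its own settings.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


rng = np.random.default_rng(42)

# Toy regression data (assumed): 200 samples, 1 feature, y = sin(3x).
toy_X = rng.uniform(-1.0, 1.0, size=(200, 1))
toy_y = np.sin(3.0 * toy_X[:, 0])

n_hidden = 50                       # assumed hidden layer size

# Step 1: random weights and biases, never trained.
toy_W = rng.uniform(-1.0, 1.0, size=(toy_X.shape[1], n_hidden))
toy_b = rng.uniform(-1.0, 1.0, size=(1, n_hidden))

# Step 2: hidden-layer output matrix H, one row per sample.
toy_H = sigmoid(toy_X @ toy_W + toy_b)

# Step 3: output weights from the pseudo inverse of H.
toy_Beta = np.linalg.pinv(toy_H) @ toy_y

toy_fit = toy_H @ toy_Beta
toy_train_mse = np.mean((toy_fit - toy_y) ** 2)
print("Beta shape:", toy_Beta.shape)
print("training MSE:", toy_train_mse)
```

No gradient step appears anywhere: the only fitted quantity is `toy_Beta`, obtained in a single linear-algebra call.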

Source: Guang-Bin Huang, Qin-Yu Zhu, Chee-Kheong Siew, Extreme learning machine: Theory and applications, Neurocomputing, Volume 70, Issues 1–3, 2006, Pages 489-501, ISSN 0925-2312.
https://doi.org/10.1016/j.neucom.2005.12.126.
(https://www.sciencedirect.com/science/article/pii/S0925231206000385)

# Sections

The first section defines three functions: the sigmoid activation function used in the hidden layer, the ELM algorithm itself, and the preds function, which prints predicted and true values from the test set. Because this is a regression problem, the output layer is linear and no activation is applied to the predictions.


In [22]:
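def sigmoid(x):
    # Element-wise logistic function used as the hidden-layer activation.
    return 1 / (1 + np.exp(-x))


def ELM(weights_shape, bias_shape, train_shape, train_X, train_y):
    # train_shape is kept so existing calls still work, but it is not used.
    np.random.seed(42)  # fixed seed so the random hidden layer is reproducible
    # Random input-to-hidden weights and biases; these are never trained.
    weights = np.random.rand(weights_shape[0], weights_shape[1])
    bias = np.random.rand(bias_shape[0], bias_shape[1])

    # Hidden-layer output matrix H, one row per training sample.
    H = sigmoid(np.dot(train_X, weights) + bias)
    # Output weights: least-squares solution of H @ Beta = train_y.
    H_pinv = np.linalg.pinv(H)
    Beta = np.dot(H_pinv, train_y)

    print("Beta shape: ", Beta.shape)

    return Beta, weights, bias


def preds(Beta, test_X, test_y, weights, bias):
    # Rebuild the hidden layer with the same random weights, then apply Beta.
    H_pred = sigmoid(np.dot(test_X, weights) + bias)
    y_pred = np.dot(H_pred, Beta)

    # Print the first eight predictions next to the true values.
    print("predictions: ", y_pred[:8])
    print("true values: ", test_y[:8])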

The following section reads the dataset into a Pandas dataframe. The data has already been normalised (the file name calls this "regularised") with the following formula:
(x - x_mean) / x_range. This puts all features on a similar scale, which improves the generalisation power of the NN model. After the data is read, the target values (in this case the total CH4 conversion percentage) are split off as the label dataset (y), and the remaining columns form the features dataset (X). Both are converted into NumPy arrays and split into training and testing datasets in the ratio 9:1.
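
The scaling formula can be sketched as follows. This is not the preprocessing script that produced the Excel file, which is not shown in the notebook. The raw values, the column names and the choice to leave the target unscaled are assumptions made only to illustrate (x - x_mean) / x_range.

```python
import pandas as pd

# Hypothetical raw values; the column names are illustrative only.
raw = pd.DataFrame({
    "temperature_C": [500.0, 600.0, 700.0, 800.0],
    "Ni_loading": [0.0, 5.0, 10.0, 5.0],
    "target": [20.0, 35.0, 60.0, 80.0],
})

feature_cols = ["temperature_C", "Ni_loading"]
features = raw[feature_cols]

# x_range = max - min per column; a constant column would give 0,
# so it is replaced by 1 to avoid dividing by zero.
col_range = features.max() - features.min()
col_range = col_range.replace(0.0, 1.0)

normalised = raw.copy()
normalised[feature_cols] = (features - features.mean()) / col_range
print(normalised)
```

After this transformation each feature column has mean 0 and spans a range of exactly 1.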

In [36]:
data = pd.read_excel("regularised_data.xlsx")
print(data.head())
y = data["target"]
X = data.drop(columns=["target"])
X = X.values
y = y.values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.1)

   Unnamed: 0   Co  Ba_promoter  Ca_promoter  Cu_promoter   Ce  Mn   La  \
0           4  0.0          0.0          0.0            0  0.0   0  0.0   
1           5  0.0          0.0          0.0            0  0.0   0  0.0   
2           6  0.0          0.0          0.0            0  0.0   0  0.0   
3           7  0.0          0.0          0.0            0  0.0   0  0.0   
4           8  0.0          0.0          0.0            0  0.0   0  0.0   

         Ni   Pt  ...  ZSM-5  H-ZSM-5  react Temperature (C)  reac Ar %vol  \
0  0.100000  0.0  ...      0        0               0.636364           0.0   
1  0.100000  0.0  ...      0        0               0.727273           0.0   
2  0.100000  0.0  ...      0        0               0.818182           0.0   
3  0.166667  0.0  ...      0        0               0.454545           0.0   
4  0.166667  0.0  ...      0        0               0.545455           0.0   

   reac N2 %vol  reac He %vol  GHSV (mgmin/mL)  Time on stream (min)  \
0       

It has been reported in the literature that setting the number of hidden nodes equal to the number of data points in the dataset gives a powerful model with impressive results on some of the most common datasets. Hence I have taken the number of hidden nodes to be 4967, equal to the number of data points in our training set.

In [30]:
HIDDEN_NODES = 4967
weights_shape = (X_train.shape[1], HIDDEN_NODES)
bias_shape = (1, HIDDEN_NODES)
train_shape = X_train.shape  # shape of the training set (not used inside ELM)

In [33]:
B, weights, bias = ELM(weights_shape, bias_shape, train_shape, X_train, y_train)

Beta shape:  (4967,)


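With the newly calculated Beta, predictions are made on the test set and printed alongside the true values to give a rough idea of the model's performance. So far, it isn't that impressive: several of the printed predictions miss the true value by more than 20 percentage points. From the looks of it, the model is seriously underfitting the dataset, although the training error would be needed to confirm this.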

TO DO: More work is needed on the model complexity, further data processing, and the implementation of a performance metric, preferably mean squared error and/or mean absolute error.

In [34]:
preds(B, X_test, y_test, weights, bias)

predictions:  [115.07421875  36.9296875   67.77734375  38.546875    49.55078125
  62.0078125   59.7890625   38.1875    ]
true values:  [63.7    18.9097 89.0187 37.1753 26.0982 56.7    97.7427 43.7551]
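
As a possible starting point for the performance metrics mentioned in the TO DO, the sketch below computes mean squared error and mean absolute error with plain NumPy on the eight value pairs printed above. The helper names `mse` and `mae` are hypothetical, and eight samples are far too few to judge the model, so the full test set should be used in practice.

```python
import numpy as np

# The eight predictions and true values printed by preds() above.
y_pred_sample = np.array([115.07421875, 36.9296875, 67.77734375, 38.546875,
                          49.55078125, 62.0078125, 59.7890625, 38.1875])
y_true_sample = np.array([63.7, 18.9097, 89.0187, 37.1753,
                          26.0982, 56.7, 97.7427, 43.7551])


def mse(y_true, y_pred):
    # Mean squared error: average of the squared residuals.
    return float(np.mean((y_true - y_pred) ** 2))


def mae(y_true, y_pred):
    # Mean absolute error: average of the absolute residuals.
    return float(np.mean(np.abs(y_true - y_pred)))


sample_mse = mse(y_true_sample, y_pred_sample)
sample_mae = mae(y_true_sample, y_pred_sample)
print("MSE: ", sample_mse)
print("RMSE:", sample_mse ** 0.5)
print("MAE: ", sample_mae)
```

Having preds return y_pred instead of only printing it would let these functions run on the whole test set.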
