# Extreme Learning Machines from Scratch
### Author: [Glenn Paul Gara](glenngara.github.io)
<!--The notebook is originally uploaded to the [GitHub repository](github.com).-->

<!---
Extreme Learning Machines (ELMs) are single-hidden-layer feedforward neural networks (SLFNs) that learn very fast compared to gradient-based learning techniques. An ELM resembles a classical one-hidden-layer neural network without an iterative learning process: the hidden-layer weights are never tuned, which makes training faster than backpropagation and, as claimed by the author, gives better generalization performance.

ELM is based on the Universal Approximation Theorem which states that:

> _"A feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of $\mathbb{R}^n$, under mild assumptions on the activation function."_

This means that an ELM can solve classification and regression tasks with significant accuracy, provided it has enough hidden neurons and enough training data to fit them.

In this tutorial, we will build an Extreme Learning Machine model for `MNIST handwritten digits`.

### Outline

1. Package Imports
2. Dataset Loading and Preprocessing
3. Network Initialization
-->
---

Let's import the packages needed for this tutorial.

In [24]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import MinMaxScaler
from scipy.linalg import pinv as pinv2  # pinv2 was removed in SciPy 1.9; pinv replaces it
from tqdm.notebook import tqdm

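Next, we load the datasets used to train and test our ELM. Each CSV file is expected to hold the label in its first column and the 784 pixel values in the remaining columns.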

In [2]:
train = pd.read_csv('dataset/mnist_train.csv')
test = pd.read_csv('dataset/mnist_test.csv')

We will use a `MinMaxScaler` to normalize our features to the range [0, 1], and a `OneHotEncoder` to transform our targets into a one-hot encoding.

In [3]:
onehotencoder = OneHotEncoder(categories='auto')
scaler = MinMaxScaler()

X_train = scaler.fit_transform(train.values[:,1:])
y_train = onehotencoder.fit_transform(train.values[:,:1]).toarray()

# Reuse the scaler and encoder fitted on the training set so both splits share one mapping.
X_test = scaler.transform(test.values[:,1:])
y_test = onehotencoder.transform(test.values[:,:1]).toarray()
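
A common convention is to fit the scaler and the encoder on the training split only and then reuse them on the test split. The sketch below illustrates that convention on a few made-up rows. All `toy_*` names and values are assumptions for illustration and are not part of the MNIST data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy stand-ins for the MNIST arrays (hypothetical values, not real data).
toy_train_pixels = np.array([[0.0, 255.0], [128.0, 0.0], [255.0, 64.0]])
toy_test_pixels = np.array([[64.0, 128.0]])
toy_train_labels = np.array([[0], [1], [2]])
toy_test_labels = np.array([[1]])

toy_scaler = MinMaxScaler()
toy_encoder = OneHotEncoder(categories='auto')

# Fit on the training split only, then reuse the fitted transformers.
toy_X_train = toy_scaler.fit_transform(toy_train_pixels)
toy_X_test = toy_scaler.transform(toy_test_pixels)
toy_y_train = toy_encoder.fit_transform(toy_train_labels).toarray()
toy_y_test = toy_encoder.transform(toy_test_labels).toarray()

print(toy_X_test)
print(toy_y_test)
```

Each test pixel is divided by the training maximum of its column, and the test label is encoded with the categories learned from the training labels.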

In [4]:
X_test.shape

(10000, 784)

To initialize our network, we need to identify the following:
1. The size of the input layer, which is the number of input features
2. Number of hidden neurons
3. Input to hidden weights
4. Hidden layer activation function

The size of the input layer equals the number of input features in the dataset, which is 784 for the flattened 28×28 MNIST images.

In [5]:
input_size = X_train.shape[1]

Let's set the number of hidden neurons to 1000.

In [6]:
hidden_size = 1000

Next, we initialize the input weights and biases by drawing them randomly from a standard Gaussian distribution. These values stay fixed and are never trained.

In [7]:
input_weights = np.random.normal(size=[input_size,hidden_size])
biases = np.random.normal(size=[hidden_size])
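
Because the weights are random, the reported accuracy will vary slightly from run to run. If reproducibility matters, one option is a seeded NumPy `Generator`. This is a minimal sketch; the seed value and the `demo_*` names are arbitrary assumptions, not part of the original notebook.

```python
import numpy as np

# Seeded generator: the same seed always yields the same weights.
rng = np.random.default_rng(seed=0)  # seed value is arbitrary

demo_input_size, demo_hidden_size = 784, 1000
demo_input_weights = rng.normal(size=(demo_input_size, demo_hidden_size))
demo_biases = rng.normal(size=demo_hidden_size)

print(demo_input_weights.shape, demo_biases.shape)  # (784, 1000) (1000,)
```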

We will use the rectified linear unit (ReLU) as the hidden layer activation function.

In [8]:
def relu(x):
    return np.maximum(x, 0, x)
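
Note that the third positional argument of `np.maximum` is the output array, so `relu` writes its result back into `x`. That saves memory for a large hidden layer, but it also overwrites the caller's array. The sketch below contrasts the two behaviors; `relu_inplace` and `relu_copy` are hypothetical names used only for this comparison.

```python
import numpy as np

def relu_inplace(x):
    # Same call as relu above: the third positional argument is the output array.
    return np.maximum(x, 0, x)

def relu_copy(x):
    # Allocates a new array and leaves x untouched.
    return np.maximum(x, 0)

a = np.array([-2.0, -0.5, 0.0, 1.5])
b = a.copy()

out_copy = relu_copy(a)        # a keeps its negative entries
out_inplace = relu_inplace(b)  # b itself now holds the result

print(a.tolist())  # [-2.0, -0.5, 0.0, 1.5]
print(b.tolist())  # [0.0, 0.0, 0.0, 1.5]
```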

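Now we can compute the hidden layer output `H` and solve for the output weights in a single step using the Moore-Penrose pseudoinverse, with no iterative training.

In [22]:
def hidden_nodes(X):
    # Project the inputs onto the random hidden layer and apply the activation.
    G = np.dot(X, input_weights)
    G = G + biases
    H = relu(G)
    return H

# Least-squares output weights: pseudoinverse of H times the one-hot targets.
output_weights = np.dot(pinv2(hidden_nodes(X_train)), y_train)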
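
In equation form, the training step above can be summarized as follows. The symbols are assumptions introduced here for illustration: $X$ stands for `X_train`, $W$ for `input_weights`, $b$ for `biases`, $g$ for `relu`, $T$ for `y_train`, and $\hat{\beta}$ for `output_weights`.

```latex
H = g(XW + b), \qquad
\hat{\beta} = H^{\dagger} T, \qquad
\hat{\beta} \in \arg\min_{\beta} \lVert H\beta - T \rVert_F
```

Here $H^{\dagger}$ denotes the Moore-Penrose pseudoinverse of $H$. Among all least-squares minimizers, $H^{\dagger} T$ is the one with the smallest norm.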

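To make predictions, we pass the inputs through the same hidden layer and multiply the result by the learned output weights.

In [19]: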
def predict(X):
    out = hidden_nodes(X)
    out = np.dot(out, output_weights)
    return out
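
To see the whole pipeline on a small scale, here is a sketch that trains and applies an ELM on synthetic data and checks that the pseudoinverse solution agrees with NumPy's least-squares solver. The data, the sizes, and every `toy_*` name are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: 200 samples, 5 features, 3 classes.
toy_X = rng.normal(size=(200, 5))
toy_labels = rng.integers(0, 3, size=200)
toy_T = np.eye(3)[toy_labels]  # one-hot targets

# Fixed random hidden layer with 20 neurons.
toy_W = rng.normal(size=(5, 20))
toy_b = rng.normal(size=20)

def toy_hidden(X):
    # ReLU applied to the random projection.
    return np.maximum(X @ toy_W + toy_b, 0)

toy_H = toy_hidden(toy_X)

# Output weights via the pseudoinverse, and via a least-squares solver for comparison.
beta_pinv = np.linalg.pinv(toy_H) @ toy_T
beta_lstsq, *_ = np.linalg.lstsq(toy_H, toy_T, rcond=None)

toy_scores = toy_hidden(toy_X) @ beta_pinv
print(toy_scores.shape)  # (200, 3)
print(np.allclose(beta_pinv, beta_lstsq, atol=1e-6))
```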

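Finally, we evaluate the model on the test set by comparing the index of the largest output score with the index of the true class for every sample.

In [23]: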
prediction = predict(X_test)
correct = 0
total = X_test.shape[0]

for i in range(total):
    predicted = np.argmax(prediction[i])
    actual = np.argmax(y_test[i])
    correct += 1 if predicted == actual else 0
accuracy = correct/total
print('Accuracy for ', hidden_size, ' hidden nodes: ', accuracy)

Accuracy for  1000  hidden nodes:  0.9427
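
The same accuracy can also be computed without an explicit Python loop. The following is a minimal sketch on made-up scores; `toy_prediction` and `toy_targets` are hypothetical arrays standing in for `prediction` and `y_test`.

```python
import numpy as np

# Hypothetical scores and one-hot targets for four samples and three classes.
toy_prediction = np.array([[0.1, 0.8, 0.1],
                           [0.7, 0.2, 0.1],
                           [0.2, 0.3, 0.5],
                           [0.6, 0.3, 0.1]])
toy_targets = np.array([[0, 1, 0],
                        [1, 0, 0],
                        [0, 1, 0],
                        [1, 0, 0]])

# Compare predicted and true class indices row by row, then average the matches.
toy_accuracy = np.mean(
    np.argmax(toy_prediction, axis=1) == np.argmax(toy_targets, axis=1)
)
print(toy_accuracy)  # 0.75
```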
