# ELM class example

**The ELM class implements a single-hidden-layer ELM.**

As an example we will use the popular MNIST dataset. 

In [None]:
import os
import tensorflow as tf
import keras
from keras.datasets import mnist;

Using TensorFlow backend.


In [2]:
train, test = mnist.load_data(os.getcwd() + "/elm_tf_test" + "mnist.txt");
x_train, y_train = train
x_test, y_test = test
del train, test
y_train = keras.utils.to_categorical(y_train, num_classes=10)
y_test = keras.utils.to_categorical(y_test, num_classes=10)

# the input has to be flattened before it can be fed to the network

x_train = x_train.reshape(-1, 28 * 28)
x_test = x_test.reshape(-1, 28 * 28)


In [3]:
input_size = 28*28 
output_size = 10 # mnist has 10 output classes 

### Creating ELM classifier

In [4]:
from tfelm.elm import ELM

elm1 = ELM(input_size=input_size, output_size=output_size, l2norm=10e1, name='elm1')

This creates an ELM network with 784 input neurons and 10 output neurons. 
The l2norm argument is a regularization parameter used in training (10e1 equals 100). 
For now the hidden layer size hasn't been specified. 
The hidden layer is added to the network through the add_layer method:

In [5]:
elm1.add_layer(n_neurons=1024); 


This adds a hidden layer of 1024 neurons. 

By default the activation is set to tf.sigmoid and the initialization of weights and biases of the hidden layer is a modified He initialization: 

The weights are initialized by sampling from a random normal distribution with variance of 2/n_in, where n_in is the size of the previous layer, in this case the input layer. 
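
As a rough illustration of the sampling rule described above, here is a minimal NumPy sketch of plain He-style initialization. The function name he_normal and the seed are hypothetical, and the library's "modified" variant may differ in details not covered here.

```python
import numpy as np

def he_normal(n_in, n_out, seed=0):
    """Sample an (n_in, n_out) weight matrix from N(0, 2 / n_in)."""
    rng = np.random.default_rng(seed)
    std = np.sqrt(2.0 / n_in)  # standard deviation giving variance 2 / n_in
    return rng.normal(loc=0.0, scale=std, size=(n_in, n_out))

# 784 inputs (flattened MNIST image) feeding 1024 hidden neurons
w = he_normal(n_in=784, n_out=1024)
print(w.shape)
print(w.var(), 2.0 / 784)  # empirical variance vs. target variance
```

The empirical variance of the sampled matrix should land close to 2/784, since the matrix has several hundred thousand entries.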

Actual network initialization is done via the compile method:


In [6]:
elm1.compile()

elm1 has been initialized


### Training the Network: fit method

To train the network there are two main methods: train and fit. 
fit is the simpler of the two, but it is suitable only for small datasets. 

It takes two numpy arrays as input, the training instances and the labels, plus an optional batch_size argument. 
Internally, a TensorFlow Dataset and a TensorFlow Iterator are created from the numpy arrays, as this is the most efficient way to train a model according to the TensorFlow documentation. 

It should be noted that, unlike in conventional neural networks, the batch_size doesn't change the outcome of the training; it only affects the training time and the memory required. The smaller the batch, the less memory is required but the longer the training takes.
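
To see why the batch size need not affect the result, consider the usual regularized least-squares formulation of an ELM, where the output weights depend on the data only through sums over samples. The NumPy sketch below assumes that formulation. It is not taken from the tfelm source, and every name in it (hidden, solve_output_weights, l2) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_in, n_hidden, n_out = 600, 20, 50, 3
l2 = 1e-2  # hypothetical regularization strength

X = rng.normal(size=(n_samples, n_in))
T = np.eye(n_out)[rng.integers(0, n_out, size=n_samples)]  # one-hot targets

# fixed random hidden layer, never trained
W = rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_hidden))
b = rng.normal(size=n_hidden)

def hidden(x):
    """Sigmoid hidden-layer activations."""
    return 1.0 / (1.0 + np.exp(-(x @ W + b)))

def solve_output_weights(batch_size):
    """Accumulate H^T H and H^T T batch by batch, then solve once."""
    HtH = np.zeros((n_hidden, n_hidden))
    HtT = np.zeros((n_hidden, n_out))
    for start in range(0, n_samples, batch_size):
        H = hidden(X[start:start + batch_size])
        HtH += H.T @ H
        HtT += H.T @ T[start:start + batch_size]
    return np.linalg.solve(HtH + l2 * np.eye(n_hidden), HtT)

beta_small = solve_output_weights(batch_size=50)
beta_full = solve_output_weights(batch_size=n_samples)
print(np.allclose(beta_small, beta_full, atol=1e-6))
```

Only the two accumulators are kept between batches, so memory use scales with the batch size and the hidden layer size rather than with the dataset size.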


In [7]:
elm1.fit(x_train, y_train, batch_size=500)

Creating Dataset and Iterator from tensors
25/120 ETA:0:00
50/120 ETA:0:00
75/120 ETA:0:00
100/120 ETA:0:00
Training of ELM elm1 ended in 0:0:0.672470
####################################################################################################
Evaluating network performance
Accuracy: 0.9210833


0.92108333

Now that the network has been trained, we can evaluate its performance on the test set via the evaluate method. 

In [8]:
elm1.evaluate(x_test, y_test, batch_size=500) # evaluation is also batched, so bigger datasets can be evaluated, as we will see

Evaluating network performance
Accuracy: 0.9181999


0.9181999

To get a numpy array with the actual predictions, there is a predict method:


In [9]:
pred = elm1.predict(x_test, batch_size=500)

Predicting...
Done
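
If the returned array holds one row of class scores per sample (an assumption, since the shape of the predict output is not shown here), class labels and accuracy could be recovered as in the sketch below. The helper accuracy_from_scores and the toy arrays are hypothetical.

```python
import numpy as np

def accuracy_from_scores(scores, one_hot_labels):
    """Compare the argmax of each score row with the one-hot label."""
    predicted = np.argmax(scores, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == actual))

# toy stand-ins for pred and y_test
scores = np.array([[0.1, 0.7, 0.2],
                   [0.8, 0.1, 0.1],
                   [0.3, 0.3, 0.4]])
labels = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0]])
acc = accuracy_from_scores(scores, labels)
print(acc)  # two of the three rows match
```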


These are the most basic functionalities offered by the API. They are suitable for small and medium datasets, since the whole dataset must fit into memory as an array. 

## Training and evaluating bigger Datasets which cannot be fitted into memory

We will use the same MNIST dataset, for illustration only. 

Instead of calling the fit method, we call the train method.

The train method requires a TensorFlow Iterator object, which must therefore be created externally from the dataset. 

There are various ways to create a TF Iterator from a dataset; which one to use depends strongly on your input pipeline and on the format of your dataset.

Tutorials, documentation and examples on Dataset and Iterator are available at: https://www.tensorflow.org/

**As an example, suppose we have an input pipeline in which we want to do some pre-processing and data augmentation on the original MNIST dataset:**

In [10]:
from keras.preprocessing.image import ImageDataGenerator # we will use keras imagedatagen for simplicity

In [11]:
batch_size = 2500
n_epochs = 10 # as the dataset is augmented, training now runs for more "epochs": the resulting dataset is 10 times the size of the original.
              # It could be argued that calling these epochs is not strictly correct, as each "epoch" differs from the previous one: 
              # the dataset is augmented via random transformations

# keras ImageDataGenerator requires a 4-D tensor as input:
x_train = x_train.reshape(-1, 28, 28, 1)

# random width and height shifting
datagen = ImageDataGenerator(
    width_shift_range=0.05,
    height_shift_range=0.05
)


datagen.fit(x_train)

In [12]:
batches_per_epochs = len(x_train) // batch_size

def gen():
    n_it = 0
    
    for x, y in datagen.flow(x_train, y_train, batch_size=batch_size):
        x = x.reshape(batch_size, 28 * 28) # the network requires a flattened array as input, so we flatten here
        if n_it % 100 == 0:
            print("generator iteration: %d" % n_it)
        yield x, y
        n_it += 1
        if n_it >= batches_per_epochs * n_epochs:
            break

data = tf.data.Dataset.from_generator(generator=gen,
                                      output_shapes=((batch_size, 28 * 28,), (batch_size, 10,)),
                                      output_types=(tf.float32, tf.float32))

Here we have defined a Python generator around the Keras ImageDataGenerator and used it to create a TensorFlow Dataset. This is needed because a Dataset cannot be created directly from the Keras generator.
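
The Keras flow generator loops indefinitely, which is why gen counts batches and breaks once enough have been produced. The same capping pattern can be sketched in plain Python with itertools.islice. The names endless_batches and capped are hypothetical stand-ins, not part of Keras or tfelm.

```python
import itertools

def endless_batches():
    """Stand-in for an endless source such as datagen.flow(...)."""
    i = 0
    while True:
        yield i
        i += 1

def capped(source, n_batches):
    """Stop iterating after n_batches items have been produced."""
    return itertools.islice(source, n_batches)

batches = list(capped(endless_batches(), n_batches=5))
print(batches)  # [0, 1, 2, 3, 4]
```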

In [13]:
iterator = data.make_one_shot_iterator() # a TF iterator is created from the Dataset

elm1.train(iterator, n_batches=batches_per_epochs*n_epochs)


generator iteration: 0
25/240 ETA:0:47
50/240 ETA:0:39
75/240 ETA:0:34
100/240 ETA:0:29
generator iteration: 100
125/240 ETA:0:23
150/240 ETA:0:18
175/240 ETA:0:13
200/240 ETA:0:08
generator iteration: 200
225/240 ETA:0:03
Training of ELM elm1 ended in 0:0:50.281698
####################################################################################################


The train method has the optional n_batches argument, which serves only to estimate the ETA. 
Note that the train method does not return the network performance. 
This should be done via evaluate. 

In [14]:
elm1.evaluate(x_test, y_test,  batch_size=1024)

Evaluating network performance
Accuracy: 0.9127730


0.912773

Because ELMs are not trained with gradient descent like conventional neural networks, performance on the training set is not obtained during training. To find it, call the evaluate method and pass it an iterator over the training set. 

Note that the training set seen during evaluation differs from the one used for training, because of the random data augmentation. Unfortunately this is the only way to assess training performance in such a scenario without saving and reloading the augmented dataset, or resorting to gradient descent to train the ELM and thus giving up fast training.  

As the iterator was created as one-shot, it must be re-created: 

In [15]:
iterator = data.make_one_shot_iterator()
elm1.evaluate(tf_iterator=iterator, batch_size=1024)


Evaluating network performance
generator iteration: 0
generator iteration: 100
generator iteration: 200
Accuracy: 0.8673667


0.8673667

**Instead of creating the iterator twice, a better way is to create an initializable iterator in the first place, before training:**

In [16]:
iterator = data.make_initializable_iterator()

The iterator should be initialized: 

In [17]:
# the ELM object has its own session, so the initializer is run there;
# opening a separate tf.Session is not needed
elm1.sess.run(iterator.initializer)

We have accessed the TF session attribute of the ELM object, which has its own session, and initialized the iterator inside that session.
Now that the iterator has been initialized, it can be used for training:

In [18]:
elm1.train(iterator, n_batches=batches_per_epochs*n_epochs)

generator iteration: 0
25/240 ETA:0:47
50/240 ETA:0:39
75/240 ETA:0:34
100/240 ETA:0:28
generator iteration: 100
125/240 ETA:0:23
150/240 ETA:0:18
175/240 ETA:0:13
200/240 ETA:0:08
generator iteration: 200
225/240 ETA:0:03
Training of ELM elm1 ended in 0:0:49.798366
####################################################################################################


In [19]:
# this re-initializes the iterator before calling evaluate on the training set

elm1.sess.run(iterator.initializer)


elm1.evaluate(tf_iterator=iterator, batch_size=1024)

Evaluating network performance
generator iteration: 0
generator iteration: 100
generator iteration: 200
Accuracy: 0.8676217


0.86762166

Note how the training accuracy is slightly different, because of the random transformations applied by the data augmentation. 

This concludes this brief tutorial on training with the ELM class. 

### Custom Activation and Weights and Bias initialization 

In [20]:
def softsign(x):
    y = x / (1 + tf.abs(x))
    return y

# this is a simple function which implements a softsign activation function 

In [21]:
elm1.add_layer(1024, activation=softsign)

This softsign function can be passed to the add_layer method, in the same way as any predefined TensorFlow function such as tf.nn.relu or tf.nn.elu.
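
To get a feel for the shape of this activation without building a TensorFlow graph, the same formula can be evaluated with NumPy. This is only an illustrative sketch, and softsign_np is a hypothetical name.

```python
import numpy as np

def softsign_np(x):
    """NumPy version of the softsign formula x / (1 + |x|)."""
    return x / (1.0 + np.abs(x))

x = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])
y = softsign_np(x)
print(y)  # values stay strictly inside (-1, 1)
```

Unlike the sigmoid, the output is centered on zero and approaches its limits of -1 and 1 only polynomially.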

The add_layer method also supports custom weight and bias initialization.

For example, to initialize the ELM with an orthogonal weight matrix and a bias drawn from TensorFlow's uniform unit scaling initializer: 

In [22]:
ortho_w = tf.orthogonal_initializer()
uni_b = tf.uniform_unit_scaling_initializer()

init_w = tf.get_variable(name='init_w', shape=[input_size, 1024], initializer=ortho_w)
init_b = tf.get_variable(name='init_b', shape=[1024], initializer=uni_b)

elm1.add_layer(1024, activation=softsign, w_init=init_w, b_init=init_b)

Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.


We have used pre-made TensorFlow initializers, but note that numpy or any other function can be used.

**The important thing is that w_init and b_init are passed TensorFlow variables holding the desired values.**
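
As one possible way to produce such values with NumPy alone, the sketch below builds an orthogonal weight matrix from a QR decomposition. The helper orthogonal_matrix is hypothetical, and this is not necessarily how tf.orthogonal_initializer works internally. The resulting array would still need to be wrapped in a TensorFlow variable before being passed to add_layer.

```python
import numpy as np

def orthogonal_matrix(n_rows, n_cols, seed=0):
    """Return an (n_rows, n_cols) float32 matrix whose rows (if n_rows < n_cols)
    or columns (otherwise) are orthonormal."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(max(n_rows, n_cols), min(n_rows, n_cols)))
    q, _ = np.linalg.qr(a)  # reduced QR: q has orthonormal columns
    if n_rows < n_cols:
        q = q.T  # transpose so the result has the requested shape
    return q.astype(np.float32)

# same shape as the hidden-layer weights used in this notebook
w = orthogonal_matrix(784, 1024)
print(w.shape)
print(np.allclose(w @ w.T, np.eye(784), atol=1e-4))
```

Casting to float32 is a design choice made here on the assumption that the network runs in single precision.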


In [24]:
# with numpy 
import numpy as np 

init_w = tf.Variable(name='init_w', initial_value=np.random.uniform(low=-1, high=1, size=[input_size, 1024]))

elm1.add_layer(1024, activation=softsign, w_init=init_w, b_init=None)


**Note that when using custom initialization both w_init and b_init should be specified; setting b_init to None creates a network without bias.**