## Training neural networks with NALU, as proposed by Trask et al. in [Neural Arithmetic Logic Units](https://arxiv.org/abs/1808.00508)

The main equations that define the Neural Accumulator (NAC) and the Neural Arithmetic Logic Unit (NALU) are shown below.

![NAC and NALU equations](https://i.ibb.co/RhSvJby/NAC-NALU.png)
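In case the image does not render, here are the same equations in LaTeX, following the paper's notation: $\hat{W}$, $\hat{M}$ and $G$ are learned matrices, $\sigma$ is the logistic sigmoid, $\odot$ is element-wise multiplication and $\epsilon$ is a small constant that keeps the logarithm finite. The NAC weight $W$ is shared by the additive and the multiplicative paths.

$$
\begin{aligned}
W &= \tanh(\hat{W}) \odot \sigma(\hat{M}) \\
a &= W x \qquad \text{(NAC)} \\
m &= \exp\big(W \log(|x| + \epsilon)\big) \\
g &= \sigma(G x) \\
y &= g \odot a + (1 - g) \odot m \qquad \text{(NALU)}
\end{aligned}
$$

Because $\tanh$ lies in $(-1, 1)$ and $\sigma$ in $(0, 1)$, every entry of $W$ lies strictly between $-1$ and $1$ and is pushed towards $-1$, $0$ or $1$ as the two factors saturate. The NAC therefore adds and subtracts its inputs rather than scaling them, and the log-space path $m$ turns those same additions and subtractions into multiplication and division.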



## Imports

```python
import torch
from torch import nn
# Local modules implementing the NAC and NALU layers (not shown here)
from NAC import NAC
from NALU import NALU
from torch.optim import RMSprop, Adam

import pandas as pd
import numpy as np
```

## Experiments: synthetic arithmetic tasks (see Appendix B of the paper)

- Generate sample data
- List the arithmetic operations
- Train a shallow network for each arithmetic operation
- Evaluate the NALU-based networks

### Generate data

Generate 2-D data points (1-D for squaring and square root) sampled from a uniform distribution over [*min_val*, *max_val*), together with the target value of each arithmetic operation.

```python
def generate_data(min_val, max_val, observations, op):
    """Sample inputs uniformly from [min_val, max_val) and compute the target of `op`."""
    # Squaring and square root take one input column; the other operations take two
    n_inputs = 1 if op in ('^2', 'sqrt') else 2
    data = np.random.uniform(min_val, max_val, size=(observations, n_inputs))
    if op == '+':
        target = data[:, 0] + data[:, 1]
    elif op == '-':
        target = data[:, 0] - data[:, 1]
    elif op == '*':
        target = data[:, 0] * data[:, 1]
    elif op == '/':
        target = data[:, 0] / data[:, 1]
    elif op == '^2':
        target = data[:, 0] ** 2
    elif op == 'sqrt':
        target = np.sqrt(data[:, 0])
    else:
        raise ValueError('Unsupported operation: {}'.format(op))
    return data, target
```

### List the arithmetic operations

```python
ops = ['+', '-', '*', '/', 'sqrt', '^2']
```

### Utility function for training the network

```python
def train_network(model, X, y, epochs, criterion, optimizer):
    # Convert the NumPy arrays to float32 tensors; targets become a column vector
    X = torch.from_numpy(X).float()
    y = torch.from_numpy(y).float().view(-1, 1)
    model.train()
    for epoch in range(1, epochs + 1):
        # Forward pass over the full training set
        y_pred = model(X)
        # Compute the loss
        loss = criterion(y_pred, y)
        # Log the loss every 500 epochs
        if epoch % 500 == 0:
            print('epoch: ', epoch, ' loss: ', loss.item())
        # Reset the gradients accumulated in the previous step
        optimizer.zero_grad()
        # Backward pass (backpropagation)
        loss.backward()
        # Update the parameters
        optimizer.step()

    return model
```
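`train_network` does full-batch training: each epoch runs one forward pass over all samples, computes the MSE loss, backpropagates and takes one optimizer step. The framework-free sketch below runs the same loop for a single 2-input NAC learning addition, with the gradients of $W = \tanh(\hat{W}) \odot \sigma(\hat{M})$ derived by hand via the chain rule. It is an illustration under stated assumptions, not the `NAC` module imported above: it swaps RMSprop for plain gradient descent, draws inputs from [0, 1] instead of [20, 30] so that a fixed learning rate stays stable, and `train_nac_addition` is a hypothetical helper.

```python
import numpy as np

def train_nac_addition(steps=5000, lr=0.5, n=1000, seed=0):
    """Hypothetical helper: fit a 2-input NAC to y = x1 + x2 with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(n, 2))      # assumption: inputs in [0, 1]
    y = x[:, 0] + x[:, 1]
    w_hat = rng.normal(0.0, 0.1, size=2)        # unconstrained parameters
    m_hat = rng.normal(0.0, 0.1, size=2)
    for _ in range(steps):
        t = np.tanh(w_hat)
        s = 1.0 / (1.0 + np.exp(-m_hat))
        w = t * s                               # NAC weight W = tanh(W_hat) * sigmoid(M_hat)
        residual = x @ w - y                    # forward pass a = W x, minus the target
        grad_w = 2.0 * x.T @ residual / n       # gradient of the MSE with respect to W
        # Chain rule through W = tanh(W_hat) * sigmoid(M_hat)
        grad_w_hat = grad_w * (1.0 - t ** 2) * s
        grad_m_hat = grad_w * t * s * (1.0 - s)
        w_hat -= lr * grad_w_hat
        m_hat -= lr * grad_m_hat
    return np.tanh(w_hat) / (1.0 + np.exp(-m_hat))

print(train_nac_addition())  # both weights should end up close to 1
```

The weights approach 1 only asymptotically, because reaching exactly 1 would require both $\tanh$ and the sigmoid to saturate fully; this is the mechanism by which the NAC parameterization biases $W$ towards exact coefficients.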

### Helper function for model validation

```python
def validate_model(op, model, validation_data):
    # Unpack the data (op is unused here but kept for the call sites below)
    X_valid, y_valid = validation_data

    # Convert to float32 tensors; targets become a column vector
    X_valid = torch.from_numpy(X_valid).float()
    y_valid = torch.from_numpy(y_valid).float().view(-1, 1)

    # model.eval() switches layers such as dropout and batch norm to inference mode,
    # while torch.no_grad() disables gradient tracking:
    # https://discuss.pytorch.org/t/model-eval-vs-with-torch-no-grad/19615
    model.eval()
    with torch.no_grad():
        preds = model(X_valid)
    # Convert into flat NumPy arrays
    original = y_valid.cpu().numpy().flatten()
    prediction = preds.cpu().numpy().flatten()
    # A prediction counts as correct if it is within a 0.1 % relative tolerance of the target
    accuracy = np.isclose(prediction, original, rtol=1e-3)
    # Return the percentage of correct predictions
    return accuracy.astype(np.int32).mean() * 100
```
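`validate_model` counts a prediction as correct when `np.isclose` accepts it, i.e. when |prediction - target| <= atol + rtol * |target|, with `rtol=1e-3` and NumPy's default `atol=1e-8`. This is a relative tolerance of 0.1 %, not agreement to three decimal places, so the allowed error grows with the size of the target. The NumPy-only sketch below applies the same rule to a few hand-picked values; `relative_accuracy` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def relative_accuracy(prediction, original, rtol=1e-3):
    """Percentage of predictions within a relative tolerance of the targets (same rule as validate_model)."""
    hits = np.isclose(prediction, original, rtol=rtol)
    return hits.astype(np.int32).mean() * 100

original = np.array([100.0, 100.0, 1.0, 1.0])
prediction = np.array([100.05, 100.2, 1.0005, 1.002])
# The allowed error is about 0.1 for targets of 100 but only about 0.001 for targets of 1
print(np.isclose(prediction, original, rtol=1e-3))  # [ True False  True False]
print(relative_accuracy(prediction, original))      # 50.0
```

Because the tolerance scales with the target, a fixed absolute error that passes for large products can fail for small quotients or square roots.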

### Train and evaluate a shallow network for each arithmetic operation
> Adam with its default settings fails badly on these tasks, so RMSprop is used instead.

```python
test_scores = {}
for op in ops:
    # Training and validation inputs come from [20, 30); test inputs come from the
    # wider range [10, 40), so the test set also measures extrapolation
    X_train, y_train = generate_data(20, 30, 10000, op)
    X_valid, y_valid = generate_data(20, 30, 3000, op)
    X_test, y_test = generate_data(10, 40, 7000, op)

    # Define the network
    if op == '^2':
        # A slightly deeper network for squaring: https://github.com/Nilabhra/NALU
        model = nn.Sequential(NALU(X_train.shape[1], 2), NALU(2, 1))
    else:
        model = NALU(X_train.shape[1], 1)
    # Mean squared error loss, optimized with RMSprop (default settings)
    criterion = nn.MSELoss()
    optimizer = RMSprop(model.parameters())

    # Train the network
    print('------- Training the model for [{}] -------'.format(op))
    trained_model = train_network(model, X_train, y_train, 10000, criterion, optimizer)
    print('------- Training completed! -------')

    # Model validation
    print('------- Validating the model for [{}] -------'.format(op))
    valid_acc = validate_model(op, trained_model, (X_valid, y_valid))
    print('Validation Accuracy for op[{}] {}'.format(op, valid_acc))

    # Model performance on the test data (computed once and stored)
    print('------- Testing the model for [{}] -------'.format(op))
    test_acc = validate_model(op, trained_model, (X_test, y_test))
    print('Test Accuracy for op[{}] {}'.format(op, test_acc))
    test_scores[op] = test_acc
```

```text
------- Training the model for [+] -------
epoch:  500  loss:  2.474065065383911
epoch:  1000  loss:  0.2002781629562378
epoch:  1500  loss:  0.01778424344956875
epoch:  2000  loss:  0.0016054622828960419
epoch:  2500  loss:  0.00014577888941857964
epoch:  3000  loss:  1.3271685020299628e-05
epoch:  3500  loss:  1.209169568028301e-06
epoch:  4000  loss:  1.1231373520104171e-07
epoch:  4500  loss:  1.1934092825072184e-08
epoch:  5000  loss:  2.0231571973283735e-09
epoch:  5500  loss:  7.221577225102749e-10
epoch:  6000  loss:  4.369590800301637e-10
epoch:  6500  loss:  3.7942810005020533e-10
epoch:  7000  loss:  1.8777208954379176e-10
epoch:  7500  loss:  1.8777208954379176e-10
epoch:  8000  loss:  1.8777208954379176e-10
epoch:  8500  loss:  1.8777208954379176e-10
epoch:  9000  loss:  1.5406258213612745e-10
epoch:  9500  loss:  1.5406258213612745e-10
epoch:  10000  loss:  8.952629609870755e-11
------- Training completed! -------
------- Validting the model for [+] -------
Validation Acc
```
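Squaring is the only operation that gets a two-layer model. A plausible reason, assuming the imported `NALU` follows the paper's parameterization: every entry of $W = \tanh(\hat{W}) \odot \sigma(\hat{M})$ lies strictly between -1 and 1, so a single NALU with one input can only produce $|x|^{w}$ with $|w| < 1$ on its multiplicative path and at most $w x$ on its additive path, which cannot represent $x^2$ over a range of inputs. Two stacked layers can: the first copies $x$ into two outputs through its additive path, and the second multiplies them in log space. The NumPy sketch below sets the weights by hand to show this; `nalu_forward` is a hypothetical helper written from the paper's equations, and the saturating value 20 is an arbitrary choice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nalu_forward(x, w_hat, m_hat, g_mat, eps=1e-7):
    """Forward pass of one NALU layer for a batch x of shape (n, in_dim)."""
    w = np.tanh(w_hat) * sigmoid(m_hat)          # shared NAC weight, shape (out_dim, in_dim)
    a = x @ w.T                                  # additive path (NAC)
    m = np.exp(np.log(np.abs(x) + eps) @ w.T)    # multiplicative path in log space
    g = sigmoid(x @ g_mat.T)                     # gate between the two paths
    return g * a + (1.0 - g) * m

big = 20.0  # large enough to saturate tanh and the sigmoid
# Layer 1 (1 -> 2): W close to [[1], [1]], gate open for positive x, so it outputs [x, x]
layer1 = (np.full((2, 1), big), np.full((2, 1), big), np.full((2, 1), big))
# Layer 2 (2 -> 1): W close to [[1, 1]], gate closed, so it outputs exp(log x + log x) = x^2
layer2 = (np.full((1, 2), big), np.full((1, 2), big), np.full((1, 2), -big))

x = np.array([[3.0], [25.0]])
print(nalu_forward(nalu_forward(x, *layer1), *layer2))  # close to [[9], [625]]
```

Even with saturated parameters, each weight stays just below 1, which is why the exponent 2 has to come from adding two log terms rather than from a single weight.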

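```python
pd.DataFrame.from_dict(test_scores, orient='index', columns=['Accuracy'])
```

| Operation | Test accuracy (%) |
|-----------|------------------:|
| `+`       | 100.0             |
| `-`       | 99.842857         |
| `*`       | 100.0             |
| `/`       | 0.242857          |
| `sqrt`    | 0.185714          |
| `^2`      | 0.0               |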


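Performance is uneven: addition, subtraction and multiplication reach near-perfect test accuracy, while division, square root and squaring score close to zero.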
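One factor to keep in mind when reading these scores: the models are trained and validated on inputs from [20, 30) but tested on inputs from [10, 40), so the test accuracy mostly measures extrapolation. The NumPy sketch below estimates how much of such a test set lies outside the training range, assuming the same uniform sampling as `generate_data`; `extrapolation_fraction` is a hypothetical helper. Analytically, the fraction is 2/3 for the one-input operations and 1 - (1/3)^2 = 8/9 for the two-input ones.

```python
import numpy as np

def extrapolation_fraction(x, train_min=20.0, train_max=30.0):
    """Fraction of rows with at least one input outside the training range."""
    outside = (x < train_min) | (x > train_max)
    return outside.any(axis=1).mean()

rng = np.random.default_rng(0)
x_test_unary = rng.uniform(10, 40, size=(7000, 1))   # like '^2' and 'sqrt'
x_test_binary = rng.uniform(10, 40, size=(7000, 2))  # like '+', '-', '*', '/'
print(extrapolation_fraction(x_test_unary))   # about 0.667
print(extrapolation_fraction(x_test_binary))  # about 0.889
```

Extrapolation alone does not account for the low scores: roughly 11 % of the `/` test inputs and 33 % of the `sqrt` and `^2` test inputs fall inside the training range, yet fewer than 1 % of predictions pass the 0.1 % relative tolerance, so those models also miss most in-range points.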

## Acknowledgements

I am grateful to the authors of the following repositories, which I used as references:
- [NALU by Nilabhra Roy Chowdhury](https://github.com/Nilabhra/NALU/)
- [NALU by Valeri](https://github.com/vrxacs/NALU)