<a href="https://colab.research.google.com/github/nathanaelsee/diffplasticity-RNN/blob/master/diffplastRNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Extension of Uber's Differentiable Plasticity:
https://github.com/uber-research/differentiable-plasticity/

## Examples/tutorials used:

https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html  
https://jovianlin.io/pytorch-with-gpu-in-google-colab/  

# Check Python version and GPU

In [0]:
import sys
print(sys.version)
!nvidia-smi

# Install PyTorch

In [0]:
!pip3 install http://download.pytorch.org/whl/cu92/torch-0.4.1-cp36-cp36m-linux_x86_64.whl

# Import PyTorch

In [43]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

print("PyTorch version:", torch.__version__)
print("CUDA version:\t", torch.version.cuda)
print("cuDNN version:\t", torch.backends.cudnn.version())

PyTorch version: 0.4.1
CUDA version:	 9.2.148
cuDNN version:	 7104


# Download Dataset

Dataset of names and their linguistic backgrounds, taken from https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html

In [0]:
!wget https://download.pytorch.org/tutorial/data.zip
!unzip data.zip

In [53]:
from __future__ import unicode_literals, print_function, division
from io import open
import glob
import os
import unicodedata
import string

all_letters = string.ascii_letters + " .,;'"

# Turn a Unicode string to plain ASCII, thanks to http://stackoverflow.com/a/518232/2809427
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )

print('Ślusàrski ->', unicodeToAscii('Ślusàrski'))

# Build the category_lines dictionary, a list of names per language
category_lines = {}
all_categories = []

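# Return the paths of all files matching a glob pattern
def findFiles(path):
    return glob.glob(path)

# Read a file and split into lines
def readLines(filename):
    with open(filename, encoding='utf-8') as f:
        lines = f.read().strip().split('\n')
    return [unicodeToAscii(line) for line in lines]

for filename in findFiles('data/names/*.txt'):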
    category = os.path.splitext(os.path.basename(filename))[0]
    all_categories.append(category)
    lines = readLines(filename)
    category_lines[category] = lines

print(all_categories)
print(category_lines['Italian'][:5])

Ślusàrski -> Slusarski
['Irish', 'French', 'Arabic', 'Polish', 'German', 'Korean', 'Portuguese', 'Scottish', 'Greek', 'Vietnamese', 'Spanish', 'Czech', 'Russian', 'Dutch', 'Chinese', 'English', 'Japanese', 'Italian']
['Abandonato', 'Abatangelo', 'Abatantuono', 'Abate', 'Abategiovanni']
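
The character-level tutorial this dataset comes from feeds each name to the network one letter at a time, as a one-hot vector over `all_letters`. The sketch below shows that encoding on plain Python lists so it runs without PyTorch. The helper names `letter_to_index` and `line_to_one_hot` are made up for illustration, and in the notebook the rows would presumably be wrapped in a tensor.

```python
import string

all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)

def letter_to_index(letter):
    """Position of `letter` in all_letters (hypothetical helper).

    Raises ValueError for characters outside all_letters, which
    unicodeToAscii is expected to have filtered out already.
    """
    return all_letters.index(letter)

def line_to_one_hot(line):
    """Encode a name as one row per character, each row a one-hot list."""
    rows = []
    for letter in line:
        row = [0] * n_letters
        row[letter_to_index(letter)] = 1
        rows.append(row)
    return rows

encoded = line_to_one_hot("Abate")
print(len(encoded), len(encoded[0]))  # prints: 5 57
```

Each row has exactly one non-zero entry, so the input size of the network equals `n_letters`.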


# Build the model

Credits to Uber: https://github.com/uber-research/differentiable-plasticity/  
Modifications made to original sample code.

In [0]:
class DiffPlastRNN(nn.Module):
    def __init__(self, input_size, output_size):
        
        super(DiffPlastRNN, self).__init__()
        self.input_size  = input_size
        self.output_size = output_size
        
        # Initialize trainable parameters
        self.w     = nn.Parameter(.01 * torch.randn(input_size, output_size), requires_grad=True) # The matrix of fixed (baseline) weights
        self.alpha = nn.Parameter(.01 * torch.randn(input_size, output_size), requires_grad=True) # The matrix of plasticity coefficients
        self.eta   = nn.Parameter(.01 * torch.ones(1),                        requires_grad=True) # The "learning rate" of plasticity

    # Run the network for one timestep
    def forward(self, input, yin, hebb):
        
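        # using tanh as non-linearity, as per Uber's implementation
        # (torch.tanh replaces the deprecated F.tanh)
        yout = torch.tanh( yin.mm(self.w + torch.mul(self.alpha, hebb)) + input )
        
        # Oja's rule. yin, yout are row vectors (dim (1,N)); index [0] drops the leading
        # dimension so that yin[0].unsqueeze(1) is (N,1) and yout[0].unsqueeze(0) is (1,N)
        hebb = hebb + self.eta * torch.mul((yin[0].unsqueeze(1) - torch.mul(hebb , yout[0].unsqueeze(0))) , yout[0].unsqueeze(0))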
        
        return yout, hebb

    # Return an initialized, all-zero hidden state
    def initialZeroState(self):
        return torch.zeros(1, self.input_size)

    # Return an initialized, all-zero Hebbian trace
    def initialZeroHebb(self):
        return torch.zeros(self.input_size, self.output_size)
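
Read from the `forward` code above, one timestep appears to compute the following. The symbols are shorthand chosen here, not notation from the notebook: $x$ for `input`, $y^{in}$ and $y^{out}$ for `yin` and `yout`, $W$ for `w`, $\alpha$ for `alpha`, $H$ for `hebb`, $\eta$ for `eta`, and $\odot$ for the element-wise product.

```latex
y^{out} = \tanh\!\left( y^{in} \left( W + \alpha \odot H \right) + x \right)

H_{ij} \leftarrow H_{ij} + \eta \, y^{out}_j \left( y^{in}_i - H_{ij} \, y^{out}_j \right)
```

The second line is Oja's rule: the $-H_{ij}\,y^{out}_j$ term acts as a decay that keeps the Hebbian trace from growing without bound.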

In [0]:
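# The sizes must be defined before the network is constructed
n_letters = len(all_letters)
n_categories = len(all_categories)
net = DiffPlastRNN(n_letters, n_categories)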
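
To see what a single plastic update does numerically, here is a dependency-free sketch of the same arithmetic as `forward`, on plain Python lists. The function name `plastic_step` and the toy values are invented for illustration. This is not the notebook's implementation, only a small model of it under the assumption that `input` has one entry per output unit.

```python
import math

def plastic_step(x, y_in, w, alpha, hebb, eta):
    """One timestep on plain lists; returns (y_out, new_hebb).

    w, alpha and hebb are n_in x n_out nested lists, y_in has n_in
    entries and x has n_out entries.
    """
    n_in, n_out = len(w), len(w[0])
    # Effective weight = fixed weight + plasticity coefficient * Hebbian trace
    y_out = [
        math.tanh(
            sum(y_in[i] * (w[i][j] + alpha[i][j] * hebb[i][j]) for i in range(n_in))
            + x[j]
        )
        for j in range(n_out)
    ]
    # Oja's rule: Hebbian growth with a decay term proportional to the trace
    new_hebb = [
        [hebb[i][j] + eta * y_out[j] * (y_in[i] - hebb[i][j] * y_out[j])
         for j in range(n_out)]
        for i in range(n_in)
    ]
    return y_out, new_hebb

zeros = [[0.0, 0.0], [0.0, 0.0]]
y_out, hebb = plastic_step(
    x=[0.5, -0.5], y_in=[1.0, 0.0], w=zeros, alpha=zeros, hebb=zeros, eta=0.1
)
print(y_out)
print(hebb)
```

With an all-zero trace, only the row of the active input unit (`y_in[0] = 1.0`) changes, and each entry moves in the direction of the corresponding output.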

# Enable GPU

_**Note**: Set `use_cuda = True` in the cell below to run the code on the GPU; the code falls back to the CPU if CUDA is not available._

In [48]:
use_cuda = True
if use_cuda and torch.cuda.is_available():
    net.cuda()
    print("Using CUDA!")
else:
    net.cpu()
    print("Using CPU!")

Using CUDA!


# Initialize Hyperparameters

In [0]:
# Hyperparameters
n_letters = len(all_letters)
n_categories = len(all_categories)
learning_rate = 1e-6
loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)

The cells below are sample code from the original tutorial and have not been adapted to the model above; ignore them.

# Training the FNN Model

Training might take around 3 to 5 minutes depending on your machine. Detailed explanations are given as comments (#) in the following code.

In [0]:
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):   # Load a batch of images and labels (i is the batch index)
        images = Variable(images.view(-1, 28*28))         # Flatten each 28 x 28 image into a vector of size 784
        labels = Variable(labels)
        
        if use_cuda and torch.cuda.is_available():
            images = images.cuda()
            labels = labels.cuda()
        
        optimizer.zero_grad()                             # Reset the gradients left over from the previous step
        outputs = net(images)                             # Forward pass: compute the class scores for each image
        loss = loss_func(outputs, labels)                 # Compute the loss between the predicted scores and the given labels
        loss.backward()                                   # Backward pass: compute the gradients
        optimizer.step()                                  # Optimizer: update the weights using the gradients
        
        if (i+1) % 100 == 0:                              # Logging
            print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
                 %(epoch+1, num_epochs, i+1, len(train_dataset)//batch_size, loss.item()))


Epoch [1/5], Step [100/600], Loss: 0.3270
Epoch [1/5], Step [200/600], Loss: 0.2756
Epoch [1/5], Step [300/600], Loss: 0.2773
Epoch [1/5], Step [400/600], Loss: 0.1931
Epoch [1/5], Step [500/600], Loss: 0.1273
Epoch [1/5], Step [600/600], Loss: 0.2067
Epoch [2/5], Step [100/600], Loss: 0.1348
Epoch [2/5], Step [200/600], Loss: 0.1073
Epoch [2/5], Step [300/600], Loss: 0.1898
Epoch [2/5], Step [400/600], Loss: 0.1743
Epoch [2/5], Step [500/600], Loss: 0.0714
Epoch [2/5], Step [600/600], Loss: 0.0631
Epoch [3/5], Step [100/600], Loss: 0.0785
Epoch [3/5], Step [200/600], Loss: 0.1645
Epoch [3/5], Step [300/600], Loss: 0.0878
Epoch [3/5], Step [400/600], Loss: 0.1399
Epoch [3/5], Step [500/600], Loss: 0.0438
Epoch [3/5], Step [600/600], Loss: 0.0766
Epoch [4/5], Step [100/600], Loss: 0.1236
Epoch [4/5], Step [200/600], Loss: 0.0239
Epoch [4/5], Step [300/600], Loss: 0.0508
Epoch [4/5], Step [400/600], Loss: 0.1045
Epoch [4/5], Step [500/600], Loss: 0.1045
Epoch [4/5], Step [600/600], Loss:

# Testing the FNN Model

As in training, we load batches of test images and collect the outputs. The differences are:

1. No loss or gradient calculation
2. No weight update
3. The number of correct predictions is counted


In [0]:
correct = 0
total = 0
for images, labels in test_loader:
    images = Variable(images.view(-1, 28*28))
    
    if use_cuda and torch.cuda.is_available():
        images = images.cuda()
        labels = labels.cuda()
    
    
    outputs = net(images)
    _, predicted = torch.max(outputs.data, 1)  # Choose the best class from the output: The class with the best score
    total += labels.size(0)                    # Increment the total count
    correct += (predicted == labels).sum().item()  # Increment the correct count (as a Python int)
    
print('Accuracy of the network on the 10K test images: %d %%' % (100 * correct / total))

Accuracy of the network on the 10K test images: 97 %
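
The accuracy figure above comes from picking the highest-scoring class in each output row and comparing it with the label. The sketch below reproduces that arithmetic on plain Python lists. The scores and labels are toy values and the helper names are made up for illustration.

```python
def argmax(scores):
    """Index of the largest score in a list."""
    return max(range(len(scores)), key=lambda i: scores[i])

def accuracy_percent(outputs, labels):
    """Percentage of rows whose top-scoring class equals the label."""
    correct = sum(
        1 for scores, label in zip(outputs, labels) if argmax(scores) == label
    )
    return 100.0 * correct / len(labels)

# Toy values: four samples, three classes; the last row is misclassified
outputs = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9], [0.4, 0.3, 0.2]]
labels = [1, 0, 2, 1]
print(accuracy_percent(outputs, labels))  # prints: 75.0
```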


# Save the trained FNN Model for future use

We save the trained model's parameters (its `state_dict`) as a pickle file that can be loaded and used later.

In [0]:
torch.save(net.state_dict(), 'fnn_model.pkl')