In the lecture, we talked about RBF networks and LSTM networks. We will use RBF networks to show how to implement a custom layer in TensorFlow.

# RBF Networks

Implementing a custom layer in TensorFlow is simple. We just have to write a subclass of `tf.keras.layers.Layer` with a `build` method that initializes the parameters according to the size of the input, and a `call` method that implements the computation of the layer itself.

In [None]:
import tensorflow as tf

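class RBFLayer(tf.keras.layers.Layer):
    def __init__(self, num_outputs, init=None, **kwargs):
        super().__init__(**kwargs)
        self.num_outputs = num_outputs
        self.init = init  # meant for initializing the centers (see the exercise below); not used yet

    def build(self, input_shape):
        # TODO add initializer to centers
        # one center per output unit, with the input's dimension: (num_outputs, input_dim)
        # (keyword arguments work with both Keras 2 and Keras 3, whose first positional argument is `shape`)
        self.centers = self.add_weight(name="centers", shape=(self.num_outputs, int(input_shape[-1])))
        # one width parameter per output unit
        self.beta = self.add_weight(name="beta", shape=(self.num_outputs,))

    def compute_output_shape(self, input_shape):
        return input_shape[0], self.num_outputs

    def call(self, x):
        # (num_outputs, input_dim, 1) - (input_dim, batch) broadcasts to (num_outputs, input_dim, batch)
        C = tf.expand_dims(self.centers, -1)
        # transposing a 3-D tensor reverses its axes: (batch, input_dim, num_outputs)
        H = tf.transpose(C - tf.transpose(x))
        # exp(-beta_j * ||x - c_j||^2), shape (batch, num_outputs)
        return tf.math.exp(-self.beta * tf.reduce_sum(tf.pow(H, 2), axis=1))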
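
The `call` method evaluates, for every input $x$ and every center $c_j$, the Gaussian activation $\varphi_j(x) = \exp(-\beta_j \lVert x - c_j \rVert^2)$; the `expand_dims` and transposes only make the broadcasting line up. As a sanity check, the following NumPy sketch (the function names `rbf_broadcast` and `rbf_loop` are illustrative, not part of the notebook) repeats the same trick and compares it with a plain double loop on random data:

```python
import numpy as np

def rbf_broadcast(x, centers, beta):
    # x: (batch, d), centers: (k, d), beta: (k,)
    C = np.expand_dims(centers, -1)                 # (k, d, 1)
    H = np.transpose(C - np.transpose(x))           # (k, d, batch) -> (batch, d, k)
    return np.exp(-beta * np.sum(H ** 2, axis=1))   # (batch, k)

def rbf_loop(x, centers, beta):
    # reference version: one activation per (sample, center) pair
    out = np.empty((x.shape[0], centers.shape[0]))
    for i, xi in enumerate(x):
        for j, cj in enumerate(centers):
            out[i, j] = np.exp(-beta[j] * np.sum((xi - cj) ** 2))
    return out

rng = np.random.default_rng(0)
x_demo = rng.normal(size=(5, 4))         # 5 samples with 4 features (like iris)
c_demo = rng.normal(size=(3, 4))         # 3 RBF units
b_demo = rng.uniform(0.1, 1.0, size=3)   # positive widths

out = rbf_broadcast(x_demo, c_demo, b_demo)
print(out.shape)                                      # (5, 3)
print(np.allclose(out, rbf_loop(x_demo, c_demo, b_demo)))  # True
```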

In [None]:
from sklearn import datasets
import numpy as np

iris = datasets.load_iris()
x, y = iris.data, iris.target

# TODO implement KMeans initializer
init = None

model = tf.keras.Sequential([
    RBFLayer(10, init=init),
    tf.keras.layers.Dense(3, activation=tf.nn.softmax),
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])

model.fit(x, y, epochs=1000, verbose=0)
# accuracy on the training data
np.mean(np.argmax(model.predict(x, verbose=0), axis=1) == y)

## Exercise

We can see that our implementation does not work very well. In the lecture, we mentioned that the centers of the RBF neurons are commonly initialized using the $k$-means algorithm. Try to change our implementation so that it uses this method. (Hint: the `add_weight` method has an `initializer` argument.)
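
For reference, here is a minimal NumPy sketch of plain $k$-means (Lloyd's algorithm). It assumes the centers should come out as an array of shape `(num_outputs, n_features)`, the shape of `self.centers` in `RBFLayer` and of the `init` array accepted by the PyTorch `RBFModule` later in this notebook. The helper name `kmeans_centers` is hypothetical; `sklearn.cluster.KMeans(n_clusters=10).fit(x).cluster_centers_` returns an array of the same shape. Wiring the array into `add_weight` is left as the exercise.

```python
import numpy as np

def kmeans_centers(data, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns the centers as an array of shape (k, n_features)."""
    rng = np.random.default_rng(seed)
    # start from k distinct data points picked at random
    centers = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # squared distance from every point to every center: (n_samples, k)
        dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # move each center to the mean of its points (keep it if its cluster is empty)
        new_centers = np.array([data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

# toy check on two well-separated blobs around (0, 0) and (10, 10)
rng_demo = np.random.default_rng(1)
blobs = np.vstack([rng_demo.normal(0, 0.5, size=(50, 2)), rng_demo.normal(10, 0.5, size=(50, 2))])
found = kmeans_centers(blobs, k=2)
print(found[np.argsort(found[:, 0])])  # rows close to (0, 0) and (10, 10)
```

On the iris data, `kmeans_centers(x, 10)` would give a `(10, 4)` array, matching `RBFLayer(10)` on four input features.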

# LSTM networks

LSTM networks are used to process text and time-series data. We will show how they can be used to generate text, using Nietzsche's writings as the training set.
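
Text generation is turned into a supervised problem by sliding a window over the corpus: each input is `maxlen` consecutive characters and the target is the character right after them, with the window advancing `step` characters at a time (so consecutive inputs overlap, hence "semi-redundant"). A toy-sized sketch of the same slicing as in the cell below, using a hypothetical helper `make_windows` and a very short string:

```python
def make_windows(text, maxlen, step):
    # same slicing as in the training cell: an input window and the character that follows it
    sentences, next_chars = [], []
    for i in range(0, len(text) - maxlen, step):
        sentences.append(text[i: i + maxlen])
        next_chars.append(text[i + maxlen])
    return sentences, next_chars

sentences_demo, targets_demo = make_windows("thus spoke zarathustra", maxlen=5, step=3)
for s, c in zip(sentences_demo, targets_demo):
    print(repr(s), '->', repr(c))
# first line printed: 'thus ' -> 's'
```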

In [None]:
import numpy as np
import random
import sys

'''
    Example script to generate text from Nietzsche's writings.
    At least 20 epochs are required before the generated text
    starts sounding coherent.
    It is recommended to run this script on GPU, as recurrent
    networks are quite computationally intensive.
    If you try this script on new data, make sure your corpus
    has at least ~100k characters. ~1M is better.
'''

path = tf.keras.utils.get_file('nietzsche.txt', origin="https://s3.amazonaws.com/text-datasets/nietzsche.txt")
with open(path, encoding='utf-8') as f:
    text = f.read().lower()
print('corpus length:', len(text))

# sort the characters so that the character-to-index mapping is the same in every run
chars = sorted(set(text))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

# cut the text in semi-redundant sequences of maxlen characters
maxlen = 20
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))

print('Vectorization...')
X = np.zeros((len(sentences), maxlen, len(chars)), dtype=np.bool_)
y = np.zeros((len(sentences), len(chars)), dtype=np.bool_)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1
    
print('Build model...')
model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(maxlen, len(chars))))
model.add(tf.keras.layers.LSTM(512, return_sequences=True))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.LSTM(512, return_sequences=False))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(len(chars), activation=tf.nn.softmax))

model.compile(loss='categorical_crossentropy', optimizer='rmsprop')


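def sample(a, temperature=1.0):
    # helper function to sample an index from a probability array;
    # float64 avoids np.random.multinomial rejecting float32 probabilities whose sum exceeds 1 by rounding
    a = np.log(np.asarray(a).astype('float64')) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    return np.argmax(np.random.multinomial(1, a, 1))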

# train the model, output generated text after each iteration
for iteration in range(1, 60):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    model.fit(X, y, batch_size=128, epochs=1)

    start_index = random.randint(0, len(text) - maxlen - 1)

    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print()
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for _ in range(400):
            x = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x[0, t, char_indices[char]] = 1.

            preds = model.predict(x, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()
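
The `diversity` values are sampling temperatures $T$. `sample` divides the log-probabilities by $T$ and renormalizes, i.e. $p_i^{(T)} \propto p_i^{1/T}$: $T < 1$ sharpens the distribution towards the most likely character, $T > 1$ flattens it, and $T = 1$ leaves it unchanged. A small NumPy sketch of just this reweighting step (the helper `reweight` and the example distribution are made up for illustration):

```python
import numpy as np

def reweight(p, temperature):
    # same transformation as in sample(): softmax(log(p) / T)
    logp = np.log(np.asarray(p, dtype=np.float64)) / temperature
    q = np.exp(logp - logp.max())  # subtract the max for numerical stability
    return q / q.sum()

p_demo = np.array([0.5, 0.3, 0.2])
for t in [0.2, 0.5, 1.0, 1.2]:
    print(t, np.round(reweight(p_demo, t), 3))
```

At $T = 0.2$ almost all of the mass lands on the first character, which is why low-diversity samples tend to repeat common words.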


Running this cell on a PC without a GPU would take several hours, so I ran it on the Google Colab platform; you can [check the results](https://colab.research.google.com/drive/1B7zys275xmpPqahPwNvuYMPLmgvlV3l5).

## PyTorch version

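For the model definition, PyTorch is simpler (no `build()`, no weird initializer), but since there is no deferred `build()`, the input size has to be passed to the constructor explicitly. Broadcasting works the same way as in TensorFlow, because both follow the NumPy rules, so the shapes have to be lined up in the same way: the centers need an extra trailing axis (`unsqueeze(-1)`, the counterpart of `tf.expand_dims` and similar to `np.newaxis`) before the transposed batch is subtracted, while `-self.beta * torch.sum(...)` broadcasts on its own, just like in TensorFlow. Try removing `unsqueeze()` and look at the errors :-)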
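
A quick way to see the shapes involved, assuming 150 iris samples with 4 features and 10 RBF units (the tensors are random; only the shapes matter):

```python
import torch

centers_demo = torch.rand(10, 4)   # (num_outputs, n_features), like the RBF centers
batch_demo = torch.rand(150, 4)    # (batch, n_features), like the iris data

# with the extra trailing axis: (10, 4, 1) - (4, 150) broadcasts to (10, 4, 150)
diff = centers_demo.unsqueeze(-1) - batch_demo.T
print(diff.shape)  # torch.Size([10, 4, 150])

# without it: (10, 4) - (4, 150) has no common broadcast shape
try:
    centers_demo - batch_demo.T
    broadcast_error = False
except RuntimeError as err:
    broadcast_error = True
    print('RuntimeError:', err)
```

Adding a leading axis to the batch as well (`batch_demo.T.unsqueeze(0)`, shape `(1, 4, 150)`) is allowed but gives the same result, since broadcasting prepends missing axes automatically.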

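The training code is again longer than a Keras `fit()` call, and more sensitive to data types: the inputs must be converted to `float32` (`.float()`) and the class labels to integer `long` tensors (`.long()`).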
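
For example, NumPy floating-point arrays are `float64` by default, while the weights of `nn.Linear` are `float32`, so feeding `torch.tensor(x)` directly into the model fails. A small sketch with random data:

```python
import numpy as np
import torch
from torch import nn

layer = nn.Linear(4, 3)                          # weights are float32
batch64 = torch.tensor(np.random.rand(8, 4))     # NumPy floats become torch.float64
print(batch64.dtype)                             # torch.float64

try:
    layer(batch64)
    dtype_error = False
except RuntimeError as err:                      # float64 input vs float32 weights
    dtype_error = True
    print('RuntimeError:', err)

logits = layer(batch64.float())                  # works after converting to float32
labels = torch.tensor(np.array([0, 1, 2, 0, 1, 2, 0, 1])).long()  # class indices as int64
loss = nn.CrossEntropyLoss()(logits, labels)
print(loss.item())
```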

In [18]:
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader
from torch import optim
import torch.nn.functional as F


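class RBFModule(nn.Module):
    def __init__(self, input_shape, num_outputs, init=None):
        super().__init__()
        self.num_outputs = num_outputs
        self.init = init

        # centers: (num_outputs, input_dim), random or taken from init (e.g. k-means centers)
        centers = torch.rand(num_outputs, int(input_shape)) if init is None else torch.tensor(init).float()
        # nn.Parameter registers the tensors, so mod.parameters() (and hence the optimizer) includes them
        self.centers = nn.Parameter(centers)
        self.beta = nn.Parameter(torch.rand(num_outputs))

    def forward(self, x):
        # (num_outputs, input_dim, 1) - (input_dim, batch) -> (num_outputs, input_dim, batch),
        # then permute to (batch, input_dim, num_outputs)
        H = torch.permute(self.centers.unsqueeze(-1) - x.T, (2, 1, 0))
        # exp(-beta_j * ||x - c_j||^2), shape (batch, num_outputs)
        return torch.exp(-self.beta * torch.sum(torch.pow(H, 2), dim=1))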
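
A detail that is easy to miss when porting a layer: `Module.parameters()`, and therefore the optimizer, only sees attributes that are `nn.Parameter` objects or submodules. A plain tensor stored on `self` is treated as a constant and is never updated. A minimal sketch with two hypothetical modules:

```python
import torch
from torch import nn

class PlainTensor(nn.Module):
    def __init__(self):
        super().__init__()
        self.centers = torch.rand(10, 4)                # plain tensor: not registered, never trained

class WithParameter(nn.Module):
    def __init__(self):
        super().__init__()
        self.centers = nn.Parameter(torch.rand(10, 4))  # registered as a trainable parameter

print(len(list(PlainTensor().parameters())))    # 0
print(len(list(WithParameter().parameters())))  # 1
```

For tensors that should not be trained but should still follow the module in `.to(device)`, `register_buffer` is the usual alternative.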

In [None]:
# reload iris: the Keras LSTM cell above reuses the names x and y
import numpy as np
from sklearn import datasets
x, y = datasets.load_iris(return_X_y=True)

mod = nn.Sequential(
    RBFModule(4, 10, init=init),
    nn.Linear(10, 3)
)

# sanity check: one row of 3 logits per sample
print(mod(torch.tensor(x).float()).shape)

# create DataLoader for batching
x_tensor, y_tensor = torch.tensor(x).float(), torch.tensor(y).long()
train_data = TensorDataset(x_tensor, y_tensor)
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)

# loss and optimizer; CrossEntropyLoss works on raw logits, so the model has no softmax layer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(mod.parameters())

# train the network
for epoch in range(1000):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        output = mod(inputs)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()

# predict and report accuracy on the training data
with torch.no_grad():
    predictions = mod(x_tensor)
    predictions = F.softmax(predictions, dim=1).numpy()
    print(np.mean(np.argmax(predictions, axis=1) == y))

## LSTM

Apart from the longer training code, note that there is no softmax in the last output layer. This is because `nn.CrossEntropyLoss` expects raw logits and applies log-softmax *inside*. This is why we have to call softmax manually in the `sample` function to get actual probabilities instead of logits.
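
The two `sample` functions draw from the same distribution: the Keras version receives probabilities $p$ and computes $\operatorname{softmax}(\log p / T)$, while the PyTorch version receives logits $z$ and computes $\operatorname{softmax}(z / T)$. Since $\log \operatorname{softmax}(z) = z - \log\sum_j e^{z_j}$ differs from $z$ only by a constant, and softmax ignores constant shifts, both give the same result. A quick numerical check with made-up logits, using NumPy in place of `torch.softmax`:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())  # shift for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, -0.5, 0.3])   # made-up scores for 4 characters
probs = softmax(logits)                     # what the Keras model would output

for T in [0.2, 0.5, 1.0, 1.2]:
    from_probs = softmax(np.log(probs) / T)  # Keras-style sample()
    from_logits = softmax(logits / T)        # PyTorch-style sample()
    print(T, np.allclose(from_probs, from_logits))  # True for every T
```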

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import random
import sys
import requests

from tqdm import tqdm

'''
    Example script to generate text from Nietzsche's writings.
    At least 20 epochs are required before the generated text
    starts sounding coherent.
    It is recommended to run this script on GPU, as recurrent
    networks are quite computationally intensive.
    If you try this script on new data, make sure your corpus
    has at least ~100k characters. ~1M is better.
'''

def download_nietzsche_text():
    url = "https://s3.amazonaws.com/text-datasets/nietzsche.txt"
    response = requests.get(url)
    text = response.text.lower()
    return text

text = download_nietzsche_text()
print('corpus length:', len(text))

chars = sorted(set(text))
print('total chars:', len(chars))
char_indices = {char: i for i, char in enumerate(chars)}
indices_char = {i: char for i, char in enumerate(chars)}

# cut the text in semi-redundant sequences of maxlen characters
maxlen = 20
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))

print('Vectorization...')
X = np.zeros((len(sentences), maxlen, len(chars)), dtype=np.float32)
y = np.zeros((len(sentences), len(chars)), dtype=np.float32)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

class CharRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(CharRNN, self).__init__()
        self.hidden_size = hidden_size
        self.lstm1 = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.dropout = nn.Dropout(0.2)
        self.lstm2 = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm1(x)
        out = self.dropout(out)
        out, _ = self.lstm2(out)
        out = self.dropout(out)
        out = self.fc(out[:, -1, :])
        return out

device = 'cuda' if torch.cuda.is_available() else 'cpu'

input_size = len(chars)
hidden_size = 512
output_size = len(chars)
model = CharRNN(input_size, hidden_size, output_size).to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.RMSprop(model.parameters(), lr=0.001)

def sample(logits, temperature=1.0):
    # helper function to sample an index from a vector of logits (unnormalized scores)
    logits = logits / temperature
    prob = torch.softmax(logits, dim=-1)
    return torch.multinomial(prob, 1).item()

# Training parameters
batch_size = 128
num_epochs = 60

# train the model, output generated text after each iteration
for epoch in range(num_epochs):
    print()
    print('-' * 50)
    print('Epoch', epoch+1)

    # enable dropout for training
    model.train()

    # shuffle by permuting indices instead of copying the whole data set every epoch
    indices = np.random.permutation(len(X))

    pbar = tqdm(range(0, len(X), batch_size))
    for i in pbar:
        optimizer.zero_grad()
        batch_idx = indices[i:i+batch_size]
        batch_X = torch.tensor(X[batch_idx], dtype=torch.float32)
        batch_y = torch.tensor(np.argmax(y[batch_idx], axis=1), dtype=torch.long)

        output = model(batch_X.to(device))
        loss = criterion(output, batch_y.to(device))
        loss.backward()
        optimizer.step()

        pbar.set_postfix({'loss': loss.item()})

    # disable dropout while generating text (Keras does this automatically in predict)
    model.eval()
    start_index = random.randint(0, len(text) - maxlen - 1)

    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print()
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for _ in range(400):
            x = torch.zeros(1, maxlen, len(chars), dtype=torch.float32)
            for t, char in enumerate(sentence):
                x[0, t, char_indices[char]] = 1.

            with torch.no_grad():
                logits = model(x.to(device))
            next_index = sample(logits.detach().cpu()[0], diversity)
            next_char = indices_char[next_index]

            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()
